What is Box?
Box is the market leader for Cloud Content Management. Our mission is to power how the world works together. Box is partnering with enterprise organizations to accelerate their digital transformation by creating a single platform for secure content management, collaboration, and workflow. We have an amazing opportunity to further establish ourselves as leaders in this space, and we need strong advocates to help us achieve that goal.
Today, Box powers over 100,000 businesses, including 70% of the Fortune 500 who trust Box to manage their content in the cloud. Our Warsaw office is an incredibly exciting addition to our EMEA expansion. We're already in the UK, France, and Germany, and the new Poland location will act as a global engineering and product development hub alongside our headquarters in Redwood City, California.
Why Box Needs You?
The main focus of the Observability Team is to build frameworks and systems that can manage the performance of Box systems while scaling to billions of events per second. We are also responsible for standardizing observability across engineering teams, driving designs for high-performing services, and fostering great observability practices. We build, scale, and operate low-latency, high-throughput data systems that underpin the resiliency of Box systems. You will help us execute on this vision and ensure that Box continues to ship scalable services that meet our customers' high performance expectations.
The Observability Platforms team provides an end-to-end experience that enables Box engineers to better understand the behavior of the features, services, and infrastructure they own and maintain, through frameworks, tools, APIs, and visualisations. The team also educates product, infrastructure, and systems teams on how to appropriately monitor the features and services they own, provides visualisations for monitoring distributed systems, gives guidance on reducing operational overhead, and supports the delivery of unmatched availability to our customers.
Ideally, we need a Software Engineer who has designed, implemented, and operated observability frameworks at very large scale and who is well versed in operating scaled architectures. You should have operational knowledge of distributed systems and know how to avoid their limitations through innovative design.
We are looking for big thinkers and innovators who have experience working with scalable distributed systems and have a passion for high performance and reliability. We are a small team with big ambitions that values impact and is not afraid of huge, gnarly problems. If this excites you, come join us!
What You'll Do?
You're going to have the unique opportunity to build, improve, and support our Observability (o11y) platform. You will get to work with cutting-edge technologies that are defining the future of Box's cloud platforms. You will have visibility and impact across all of Engineering.
That means you will:
- Provide o11y products like ELK, Splunk, Sensu, Prometheus, AppDynamics, Dynatrace, etc. to engineering teams for centralized logging, APM tooling, monitoring and alerting, and distributed tracing.
- Collaborate with other engineers on the team to foster solid engineering principles and represent our engineering values.
- Manage, maintain and scale the infrastructure responsible for telemetry frameworks used throughout Box's infrastructure, cloud services, and products to capture, transport, store and analyze the telemetry data.
- Scale the observability infrastructure to support petabytes of logs and billions of metric data points daily.
- Collaborate with, influence, and drive improvement across scrum teams.
- Provide additional support and run proofs of concept (POCs) for new observability projects and frameworks.
- Define observability best practices from an SRE perspective and educate platform consumers on them.
- Participate in deep technical design discussions within your team and across partner teams, and ensure that we’re building the right systems.
Who You Are?
- You have experience building automation and frameworks, preferably with Python and/or Go.
- You have an understanding of infrastructure automation tools (Puppet, Ansible, or the like).
- You have experience in using industry standard DevOps CI/CD frameworks (Jenkins/Spinnaker, or the like).
- You have production service troubleshooting skills that span applications, systems, and networks within a primarily Linux environment.
- You take an SRE-centric approach to everything you build/manage, ensuring reliability, availability and security.
- You act like an owner and strive to do work you're proud of, both technically and in your team interactions.
- You are fluent in English.
- You have experience running containerised services in private or public clouds (GCP, AWS).
- You have a fair understanding of technologies like Elasticsearch, Apache Storm or other DAG technologies, and streaming technologies like Kafka, Pub/Sub, or Kinesis.
- You have built distributed, high-throughput and low-latency systems with a strong focus on availability, resilience, and durability.
- You have experience managing observability (o11y) and building and managing metrics- and data-driven observability platforms and peripherals.
Want to learn more?
- Immerse yourself in the Box Platform: Create a developer account at developer.box.com
- Box Engineers share on opensource.box.com
- Get under the hood, comment on your favorite architecture deep-dive at tech.blog.box.com
- Grab a free account and make your content more valuable: Hit us up at box.com
- And check out some of the work our teams have done:
  - Moving to Microservice Infrastructure: https://architecht.io/box-co-founder-on-moving-to-microservices-and-the-promise-of-kubernetes-a49f01b1c0c0
  - Kubernetes Case study: https://kubernetes.io/case-studies/box/