The DC/OS SRE Team is responsible for running our internal infrastructure services. The team consists mostly of DevOps engineers with a strong slant towards operations and tool building. All of us come from an operations or site reliability background, and we heavily dogfood DC/OS in everything we do.

We don't mind getting into the weeds with hard-to-diagnose networking issues, and we troubleshoot such problems by drawing on years of frontline experience firefighting in large-scale web operations. Some of us had experience with Mesos before coming on board at Mesosphere, and some of us didn’t. Either way, a strong understanding of distributed systems and systems engineering is key to our success. We’ve been solving Site Reliability Engineering problems through code since before SRE or DevOps became terms. We take pride in creating software that people rely on and that is a joy to use.

Responsibilities

  • Architect, build, and maintain systems that our engineering team and customers rely on
  • Contribute to documentation for both our customers and other engineers
  • Make DC/OS the easiest operating system to deploy, manage, and monitor at scale
  • Take responsibility for the third-party services and production infrastructure on which DC/OS operates
  • Partner with other engineers to design, build, and maintain critical systems
  • Consistently work to make our software simpler
  • Effectively estimate time to implement designs
  • Challenge yourself and your peers to always improve

Basic Qualifications

  • Expert-level knowledge of at least one high-level programming language such as Python or Go
  • 3+ years of experience with production infrastructure
  • Experience designing and operating large-scale infrastructure running on AWS, GCP, Azure, or other cloud providers
  • Able to debug, troubleshoot, and resolve complex technical issues reported by customers
  • Background in system administration, operations or site reliability
  • Understanding of network protocols and networking in general
  • Deep knowledge of Linux fundamentals

Preferred Qualifications

  • Production experience with service-oriented architectures and distributed systems like Mesos, Kafka, Cassandra, Hadoop, ZooKeeper, etc.
  • An extremely clear, concise, and effective communicator
  • Production experience with container systems like Docker or rkt
  • Strong sense of ownership, urgency, and drive
  • Self-driven and motivated, with a strong work ethic and a passion for problem solving

About Mesosphere

Mesosphere is leading the enterprise transformation toward distributed computing and hybrid cloud. We combine the rich capability you get from public cloud providers with the freedom and control of choosing your own infrastructure.

Mesosphere DC/OS is the premier platform for building, deploying, and elastically scaling modern applications and big data. DC/OS makes running containers, data services, and microservices easy across your own hardware and cloud instances.

Mesosphere helps businesses accelerate time to market, ensure resilient applications, and save on cloud and infrastructure costs. Backed by T. Rowe Price, Andreessen Horowitz, Khosla Ventures, Microsoft, HPE, Data Collective, and Fuel Capital, Mesosphere is headquartered in San Francisco with a second office in Hamburg, Germany.



