Strava is Swedish for “strive,” which epitomizes our attitude and ambition: We’re a passionate and committed team, unified by our mission to build the most engaged community of athletes in the world. Every day, we’re searching for new ways to inspire athletes and make the sports they love even more fun. But it’s not only about achieving – we’re an inclusive team, dedicated to elevating each other and the members of our community. That balanced approach has helped us revolutionize our industry, and we’re just getting started. Millions of athletes are on Strava, and millions more will come. When you’re ready for a challenge and a team that will support you along the way, join us.
About this Job
As a Senior Site Reliability Engineer, you'll work with a talented team of other SREs and platform engineers to deliver highly scalable and reliable systems. This role is for an engineer who has strong knowledge of system administration and appreciates well-written, well-tested code.
We're running entirely on AWS and have implemented a microservice architecture behind a large Rails application, with components written primarily in Ruby and Scala and a large and growing percentage of them running on Mesos, Marathon, and Docker. We aim to be state of the art when it comes to deployment, monitoring, and scaling. In addition to handling web, mobile, and API traffic, we perform background stream and batch data processing on a variety of workloads, from just-in-time upload processing to building heat maps for the world. We’ve grown significantly, and we’re looking for someone to help define and bring to reality the next phase of growth of our system infrastructure. This role is based in San Francisco, CA or Denver, CO.
We'd love to talk to you about the future of Strava’s server infrastructure and your role in it. Please take a look at the links below to learn about the exciting work we are doing.
- The Engineering Blog covers a wide range of topics, from how we rebuilt our leaderboard systems to how we have refined our interview process.
- Strava Labs shows off some of our R&D efforts and gives a sense of the power and scale of Strava’s datasets.
Responsibilities
- Design, build and maintain highly scalable, fault-tolerant and performant systems in conjunction with our product engineering teams, with a focus on security.
- Build, improve and maintain our cloud-based infrastructure - from the load balancers to the databases - and internal services such as configuration management, monitoring, load testing, and deployment.
- Help define how server-side software at Strava is built now and into the future.
- Perform cloud-based migrations with close to zero downtime.
- Troubleshoot issues and outages across the entire stack.
- Participate in a 24/7 on-call rotation.
Qualifications
- B.S. in Computer Science or equivalent work experience.
- Experience with cloud-based services, especially AWS.
- Familiarity with security best practices in a cloud environment.
- Experience scaling high-traffic web applications.
- Familiarity with configuration management tools such as Ansible, Puppet or Chef.
- Ability to prioritize tasks and work independently.
- Disciplined approach to testing and quality assurance.
- Proficiency in at least one scripting language (Ruby, Python, Perl).
- Proficiency with Linux.
- Experience with Ruby on Rails, Scala, and Finagle.
- Experience administering MySQL (especially RDS/Aurora), Redis, Kafka, and Cassandra.
- Experience with vulnerability detection and mitigation, and with AWS credential and certificate management.
- Experience with Mesos, Marathon, and Spark.