About the Role
As a Site Reliability Engineer on this team, you will be responsible for the infrastructure of our core backend system and any future services. You’ll optimize our current infrastructure, identify areas for improvement, and build tools and systems that automate processes wherever possible. With you at the helm, our systems should become more performant, reliable, and available.
Manage the full lifecycle of services, from initial setup and release to day-to-day CI/CD. You should be passionate about building software that streamlines and automates service deployment and operation for engineers.
Maintain services by monitoring and measuring availability, latency, and overall system health, and drive iterative improvement in these areas by introducing new technologies or developing new tools.
Provide vision and guidance for the evolution of our service architecture, with a focus on scalability and reliability.
Assist in early-stage planning for new services through architecture and system design reviews.
Evangelize and practice effective incident response management.
5+ years of experience in backend software engineering or systems engineering.
Knowledge of Ruby, Java, C, C++, or Python.
Ability to analyze and debug complex software and infrastructure issues, and to develop tools and systems for task automation.
Excellent communication skills, including the ability to manage communication effectively across multiple engineering teams.
Experience with AWS, web application deployment platforms such as Heroku, and tools for deploying and monitoring cloud applications. Experience setting up and maintaining CI/CD pipelines with Jenkins or a similar tool.