As the senior manager of site reliability engineering (SRE), you own the hosting infrastructure, hosting strategy, and roadmap for edX, ensuring that our infrastructure is highly scalable, resilient, and aligned with business needs. edX is an online platform where over 30 million people from around the world go to learn. Your work will enable learners to find courses that move their careers forward and change their lives.
The SRE squad is split into two work streams: Infrastructure Site Reliability (3 people) and Product Site Reliability (5 people). The Infrastructure SREs own and manage central shared services such as our CI/CD, centralized logging, and Kubernetes clusters. The Product SREs support the operations needs of our product delivery teams by creating automations to eliminate rote tasks, training developers in new operations skills, and augmenting teams with additional capacity.
- Act as the technical product owner for our infrastructure
- Meet and coordinate with product delivery teams to understand their needs and identify opportunities to prioritize new projects, ongoing support, and technical debt
- Be responsible for the hosting budget, cost containment, and vendor management of AWS (cloud infra), Cloudflare (CDN), and New Relic (monitoring)
- Own hosting security and compliance, partnering closely with the security working group, compliance officers, and engineering teams to drive improvements across all areas
- Own the nascent embedded SRE program to up-skill delivery teams and safely create more team autonomy
- Invest in the skill and career growth of your direct reports through regular one-on-one meetings
- Work with the open-source community to coordinate our infrastructure roadmap with community needs
- Manage the backlog and initiative priorities for the two SRE teams
- Prior experience managing people
- A track record with roadmap/prioritization decisions
- Experience with and a commitment to modern SRE practices like configuration as code, infrastructure as code, error budgets, etc.
- Hands-on cloud infrastructure experience and familiarity with containerization and orchestration systems, ideally Kubernetes
- Outcomes focus
- Contract/Vendor management
- Tools / Internal customer experience
- Experience managing leads, with leadership coaching skills
- Experience with AWS (what we use) and other cloud services
We understand that applying for a job can be intimidating. Applicants rarely meet every single job requirement, and we know there are many skills and backgrounds that will contribute to success in this role.
That’s why we provide new employees with:
- Employee on-boarding and training sessions
- Personalized 30/60/90+ day plans
- Individual quarterly and annual goals
- Career pathways
And much more to support you in your personal journey at edX! That said, if this role looks like a great next step for you, please apply… even if you can’t “check every box.” We’d love to hear from you!
edX is the education movement for restless learners. Together with our founding partners Harvard and MIT, we’ve brought together over 30 million learners, the majority of top-ranked universities in the world, and industry-leading companies onto one online learning platform that supports learners at every stage. And we’re not stopping there—as a global nonprofit, we’re relentlessly pursuing our vision of a world where every learner can access education to unlock their potential, without the barriers of cost or location.