As the manager of the brand new Incident Management team, you will be responsible for co-creating and evolving a critical new capability for edX and its millions of learners: enabling our engineering squads to respond as fast as possible to production incidents while fostering the maximum amount of organizational learning. Your span of influence is all of edX engineering, product, and support, a majority of the company.
Your team will accomplish this by offering a number of services such as issue intake, routing, measurement, reporting, and communication. The team will also offer a level of pager defense and runbook execution for components that meet your bar.
Responsibilities:
Lead a team of Incident Management Specialists composed of employees and long-term contractors
Define and roll out an incident response process in collaboration with your stakeholders
Coach, mentor, and provide general HR management for your team
Liaise between Support and Product Delivery to build relationships, foster transparency, and promote learning
Manage automated issue routing based on team ownership
Use data to understand trends and propose necessary technical and process improvements to increase efficiency
Establish off-hours coverage and execute runbooks
Qualifications & Skills:
5+ years of industry experience in a technical capacity related to software engineering, quality measurement, or incident management
Experience leading a team or mentoring staff, both technically and professionally
Strength in project management with the ability to remain calm under pressure
Excellent cross-team communication and collaboration skills, including conflict resolution, presentation, and negotiation skills
Ability to identify risks in the system and pull in expertise when needed
Data-driven mindset: using data to inform your decision-making and proactively update processes. A knack for understanding which metrics are most important for the business outcomes we want.
Ability to speak both technical and non-technical languages to bridge engineering with other parts of the organization to facilitate problem-solving
Familiarity with full-stack development tools such as Jira, New Relic, Google Analytics, Splunk, Python, Django, and MySQL
Ability to design and build automation and create technical integrations between the tools we use: Zendesk, Jira, Google Sheets, and Tableau
A belief that process works for us; we don’t work for it. Process is a tool, not a belief system.
Education technology domain knowledge
Experience leading during high-pressure incidents
Experience working cross-functionally as well as with remote teams