At Magic Leap, we are looking for a Director of Production Engineering to own live site operations, observability platforms & incident management.
In this role, you would participate in architecture and standards discussions and bring thought leadership on well-architected design, recovery of systems from incidents and outages, scaling, monitoring, and logging. This role would be a key leader working across engineering, product management, ops, and customer care teams. You would own incident management, set the discipline of production engineering using sound engineering principles, and reestablish a physical NOC and SOC.
Operate mission-critical services at a globally distributed scale, using cloud hosting providers such as AWS and GCP
Work with engineering leaders to build out an observability platform and set SLAs for each service
Own the day-to-day management of live site operations and incident response, and reestablish a physical NOC and SOC
Build and manage a world-class team of engineers and operators capable of scaling through a period of continued high growth
Attract top-tier talent to match this level of growth
10 or more years of management experience
Hands-on knowledge of DevOps, SRE, and CI/CD practices in a public cloud environment
Experience implementing instrumentation and monitoring solutions for distributed systems at scale
Experience in a public cloud deployment environment with Kubernetes, supported by industry-standard provisioning and monitoring tools such as Terraform, Sumo Logic, Datadog, and Prometheus
Experience with Linux/Unix internals and system services such as DNS, DHCP, TFTP, iptables, and SMTP
Experience with networking and cybersecurity operations
BS or MS in Computer Science, Engineering, or a related technical discipline, or equivalent experience
All your information will be kept confidential according to Equal Employment Opportunities guidelines.