Principal Cloud Engineer
Magic Leap is looking for a Principal Cloud Engineer for the Site Reliability team. In this role, you will make the best use of cloud platform infrastructure, services, and processes to improve infrastructure and system efficiency, productivity, scalability, and quality at 10X scale.
In this role, you will dive deep into technical problems, learn the current customer pain points, system domain, and platform limitations, and then work closely with engineering teams and software leaders to design and architect new cloud-based systems and services that scale 10X better and are reliable and highly available. This includes providing system architecture artifacts, taking accountability for cloud infrastructure choices that perform and scale at the 10X level while remaining efficient, and giving guidance on machine types and network designs.
- Serve as a technical architect on the cloud infrastructure layer.
- Take 100% accountability for the quality, architecture, and design of systems and platforms.
- Solve complex technical problems in our multi-cloud environment and improve operational excellence.
- Dive deep into the cloud architecture of current systems to provide strategic partnership and improvements in network usage, role management, CDN, domains, etc.
- Analyze complex distributed production deployments and recommend ways to optimize performance and automate processes by managing continuous integration servers and using monitoring and testing tools.
- Identify opportunities to make disruptive improvements in cloud infrastructure usage, operations, and services with a high degree of systematic automation.
- Possess expert knowledge of performance (millisecond latencies), scalability, availability (99.99% uptime), and enterprise architecture best practices.
- Strong technical, analytical, and design capability to understand common and shared platforms, web and service APIs, and their underlying data, and to provide appropriate network and infrastructure recommendations.
- Exert technical influence over multiple teams, increasing their productivity and effectiveness by sharing your deep knowledge and experience.
- Strong problem-solving skills, analytical capabilities, and attention to detail.
- Strong cultural change management experience.
- Sound fundamentals in UNIX-based systems including proficiency with UNIX tools like SSH, grep, sed, awk, find, etc.
- A solid understanding of networking and core Internet protocols (e.g., TCP/IP, DNS, TLS, SMTP, HTTP).
- Strong programming skills in a modern language (Go, Python, Node.js, etc.).
- Ability to script in a shell language (Bash or POSIX Shell).
- Experience with public cloud providers (AWS, Google Cloud Platform, etc.).
- Experience working with container runtimes (Docker, containerd, etc.).
- Experience working with container-orchestration systems (Kubernetes, ECS, etc.).
- Comfort with frequent, incremental code testing and deployment.
- Strong grasp of automation tools (Terraform, GitLab CI, Concourse CI, etc.).
- Ability to remain calm under pressure and take command of a recovery effort.
- Excellent cross-group collaboration and outstanding verbal and written communication skills.
- 12+ years of experience working in a software engineering or development role.
- 5+ years of experience leading system design and architecture using AWS/GCP services.
- 5+ years of experience building high-performance, highly available, and scalable distributed systems in the cloud.
- BA/BS in Computer Science or equivalent experience.
All your information will be kept confidential according to Equal Employment Opportunities guidelines.