Argo AI was founded to tackle one of the most challenging applications in computer science, robotics and artificial intelligence: self-driving vehicles. We are developing and deploying the latest advancements in artificial intelligence, machine learning and computer vision to help build safe and efficient self-driving vehicles. The challenges are significant, but we are a team that believes in tackling hard, meaningful problems to improve the world.
We are building a high-performance team that is excited by complex engineering challenges and passionate about making transportation safer, more affordable and more accessible for all.
Argo AI Site Reliability Engineers are responsible for building and running our mission-critical systems. Through the implementation of monitoring and automation, our SREs constantly ensure the health, reliability, scalability, and performance of Argo AI’s infrastructure. The Site Reliability team works together with engineering teams, IT, and Security to address unique business challenges through comprehensive solutions while taking into account system uptime, reliability, and maintainability. Members of the team are expected to promote the importance of resiliency patterns to other teams within Argo AI, as well as contribute to a culture of continuous learning.
What you’ll do:
- Design and implement scalable distributed systems to facilitate the development of self-driving vehicles
- Monitor and maintain mission-critical production services to ensure maximum uptime
- Document actions to build a comprehensive library of runbooks, which will act as a knowledge base and foundation for automation
- Scale the reliability and velocity of our systems and processes through increased automation
- Participate in an on-call rotation and contribute to a culture of continuous improvement through blameless postmortems
What we’re looking for:
- Degree in Computer Engineering, Computer Science, Electrical Engineering, Robotics or a related field
- Expertise in at least one scripting language (e.g. Bash, Python)
- Fundamental understanding of Linux operating system internals, TCP/IP networking, and storage subsystems
- Strong experience scaling and securing services in the cloud (AWS, GCP) or cloud native environments
- Experience using infrastructure-as-code principles to automate the creation of infrastructure resources (e.g. Terraform, CloudFormation)
- Understanding of engineering design limitations and ability to provide guidance to teams to scale their services to achieve desired performance within budget
- Experience implementing and debugging cloud native and open source tools such as Kubernetes, etcd, Prometheus, Fluentd and Istio
- Strong communication skills and the ability to work effectively in a diverse and distributed team
At Argo AI, we place a strong emphasis on creating a highly effective team environment, so we seek candidates who can work effectively with others across a broad range of disciplines.
Argo AI is an equal opportunity employer that believes in diversity as a strength and is committed to creating an inclusive environment for all employees.
We know it takes competitive benefits to fuel a team that works hard and enjoys the challenge. At Argo AI, you can expect stellar perks to support your best self:
- High-quality individual and family health, dental, and vision insurance
- Competitive compensation packages
- Employer-matched 401(k) retirement plan
- Paid parental leave
- Unlimited vacation
- Daily catered lunches and snacks
- Free onsite or adjacent parking
- Commuter reimbursement
- Fitness reimbursement
- Professional development reimbursement
Argo AI is a LinkedIn Top 50 Startup