Company: Argo AI GmbH
Who we are:
Argo AI is a global self-driving products and services company on a mission to make the world’s streets and roadways safe, accessible, and useful for all. Our technology is built to enable commercial services for autonomous delivery and ridesharing in cities.
With experienced leaders in the field and collaborative partnerships with some of the world’s top consumer brands, we’re working block by block, city by city to empower people and businesses to be more successful. We’re individuals driven by strong values to solve complex problems together. Come join us to reimagine the human journey.
Meet the team:
The Cloud Compute Platform team is building a hybrid-cloud platform that enables teams to run the different types of workloads that contribute to Argo AI’s success. From large-scale batch job processing that helps our business make data-driven decisions to microservices that deliver functionality to our end customers, we are on a mission to provide a first-class platform that satisfies our customers’ unique use cases so they can focus on their goals.
As a Site Reliability Engineer on the team, you will help build and run this platform. You will contribute to the whole stack, with a deeper focus on (but not limited to) building tools that ensure our platform stays healthy and that its reliability is measurable as it evolves to keep up with the state of the art.
What you’ll do:
- Build monitoring to ensure our platform is healthy and its reliability is measurable
- Build alerting and a set of runbooks to enable faster detection and remediation of platform issues
- Debug complex issues that may span multiple components of the stack and ensure proper fixes are implemented to prevent these issues from recurring
- Participate in an on-call rotation and a culture of continuous improvement through blameless postmortems
- Design and implement components of the platform to enable features that make the work of our customers possible, simpler and more efficient
- Build Kubernetes controllers to automate operations
What you'll need to succeed:
- Degree in Computer Engineering, Computer Science, Electrical Engineering, Robotics or a related field
- Fundamental understanding of Linux operating system internals, TCP/IP networking, and storage subsystems
- Hands-on development experience in Go or Python, creating robust software that runs reliably in production
- Strong experience scaling and securing services in the cloud (AWS, GCP) or cloud-native environments
- Experience using infrastructure-as-code principles to automate the creation of infrastructure resources (e.g. Terraform, CloudFormation)
- Experience authoring and maintaining Kubernetes controllers in Go
- Experience running Kubernetes and related core components in a large-scale, production environment
- Experience with metrics (e.g. Prometheus), logging (e.g. Elasticsearch, Loki) and tracing (e.g. Jaeger, Tempo) systems
- Understanding of engineering design limitations and ability to provide guidance to teams to scale their services to achieve desired performance within budget
- A focus on increasing service reliability through defining and adhering to SLOs
- Strong communication skills and the ability to work effectively in a diverse and distributed team
Nice to have:
- Experience managing Kubernetes clusters in hybrid-cloud environments
- Experience with declarative cluster lifecycle management tooling (e.g. Cluster API, Crossplane)
- Kubernetes certifications such as CKA, CKS, or CKAD
What we offer you:
- Competitive compensation packages
- 30 vacation days
- Subsidized daily lunches, beverages, and snacks
- Professional development reimbursement
- Global employee assistance program (offerings include work-life balance support, mindfulness programs, life coaching, new parent coaching, and more)
- Local and global discount programs
- Company and team bonding outlets: employee resource groups, quarterly team activity stipend, and wellness initiatives
Argo AI was founded in 2016 by industry experts with extensive experience building robotic systems for commercial applications. Our once-small team has since grown into a company of more than 1,700 people, with strategic partnerships with some of the world’s leading consumer brands. With global headquarters in Pittsburgh, we operate in eight cities across the U.S. and Germany, in areas where self-driving technology can have the biggest impact on improving safety, traffic, and transportation equity.
At Argo AI, we believe that embracing differences delivers superior results. We are an equal opportunity employer that is committed to an inclusive environment for all employees.