Rivian is on a mission to keep the world adventurous forever. This goes for the emissions-free Electric Adventure Vehicles we build, and the curious, courageous souls we seek to attract.
As a company, we constantly challenge what’s possible, never simply accepting what has always been done. We reframe old problems, seek new solutions and operate comfortably in areas that are unknown. Our backgrounds are diverse, but our team shares a love of the outdoors and a desire to protect it for future generations.
We operate development centers in Plymouth, Michigan; Southern California (Irvine, Carson & LA); Silicon Valley (San Jose and Palo Alto); Vancouver, British Columbia; and Surrey, England; as well as a manufacturing facility in Normal, Illinois.
Rivian’s Digital Technology Team is responsible for the end-to-end implementation of the digital experience outside the vehicle (e.g. vehicle configurator, payment gateway, vehicle delivery management, service scheduling) across web, mobile app and in-store. To that end, we are developing a world-class technology platform that will make learning about and purchasing electric adventure vehicles intuitive, seamless and fun.
We are seeking a Principal Engineer who will guide and develop our platform infrastructure across cloud, engineering productivity, and site reliability, creating best practices and solutions that keep the Rivian Digital Technology sites and applications highly available and reliable. This is an exciting role working with software engineering teams from the ground up to build cloud-based solutions using the latest technologies, tools, and practices. The right candidate will be passionate about platform infrastructure and automation, and will be extremely hands-on, writing code and working on solutions.
- Work across the organization and engineering teams to deliver high quality products and solutions that delight Rivian customers.
- Work with engineering teams to design robust cloud-based architectures and redundant, fault-tolerant solutions utilizing practices around CI/CD, blue-green deployments, canary testing, and traffic management.
- Define non-functional requirements (NFRs) for engineering teams around security, logging, monitoring, alerting, configuration, and testing, and work with those teams as they implement apps and services.
- Develop runbooks and standard operating procedures (SOPs) for each service and application to ensure DevOps and SRE teams can detect incidents or issues before customers are impacted and act quickly to restore impacted services.
- Define practices and procedures around postmortems and root cause analysis to ensure service quality and maintainability KPIs are improving and downtime and service interruption are negligible.
- Train and develop the abilities of less experienced team members and help build a culture of responsibility and ownership.
- Work collaboratively with various stakeholders to provide team-based solutions, creating a culture of inclusion and diversity of skillsets.
- Participate in a 24x7 on-call rotation and define and implement on-call practices and procedures.
- 15+ years in a technical role such as senior engineer, lead, or architect in software engineering, DevOps, or SRE functions.
- 10+ years of experience being responsible for the uptime and reliability of customer-facing web applications, critical services, or mobile systems.
- 10+ years of experience maintaining and administering large-scale Linux-based environments with best practices for security and automation.
- 10+ years of experience providing and maintaining cloud-based infrastructure on platforms such as AWS, GCP, or Azure, with broad experience in Infrastructure as Code (IaC) tools such as Terraform, Terragrunt, and Atlantis.
- 7+ years implementing and maintaining monitoring and alerting systems, creating service level indicators (SLIs) and service level objectives (SLOs), and focusing on systems that self-heal or alert teams to take action before downtime occurs.
- 7+ years designing and operating fault-tolerant systems with little to no downtime.
- Expert knowledge of Kubernetes (K8s), distributed computing, containers, and scaling.
- Expert knowledge of network architectures, security, and troubleshooting of connectivity or latency issues.
- Comfortable managing deployments of several thousand nodes and the automation it takes to ensure system uptime and redundancy.
- Bachelor’s degree in computer science, electrical engineering, information systems, or equivalent work experience.
This is where you’ll work:
Department: Digital Technology
Location: Palo Alto
Rivian is an Equal Opportunity Employer and Prohibits Discrimination and Harassment of Any Kind: Rivian is committed to the principle of equal employment opportunity for all employees and to providing employees with a work environment free of discrimination and harassment. All employment decisions at Rivian are based on business needs, job requirements and individual qualifications, without regard to race, color, religion or belief, family or parental status, or any other status protected by the laws or regulations in the locations where we operate. Rivian will not tolerate discrimination or harassment based on any of these characteristics. Rivian encourages applicants of all ages.
We take your privacy seriously. For details please see our Candidate Privacy Notice.