Rivian is on a mission to keep the world adventurous forever. This goes for the emissions-free Electric Adventure Vehicles we build, and the curious, courageous souls we seek to attract.
As a company, we constantly challenge what’s possible, never simply accepting what has always been done. We reframe old problems, seek new solutions and operate comfortably in areas that are unknown. Our backgrounds are diverse, but our team shares a love of the outdoors and a desire to protect it for future generations.
The Sr. DevOps Manager will manage teams of DevOps-focused engineers who work directly with Software Engineering teams to build cloud infrastructure, CI/CD automation and deployments, and site reliability. You will be involved in designing and implementing technical solutions to support the needs of software engineering teams across our e-commerce services, service operations, charging, fleet management, and more. This is an exciting role working with software engineering teams from the ground up to build cloud-based solutions using the latest technologies, tools, and practices. The right candidate will be passionate about DevOps philosophy, CI/CD, and reliability, and will have years of experience helping engineering teams with compute and storage infrastructure, along with building automation to deploy applications and services in a continuous delivery approach.
- Work across the organization and engineering teams to deliver high quality products and solutions that delight Rivian customers.
- Work with engineering teams to design robust cloud-based architectures and redundant, fault-tolerant solutions using CI/CD practices, blue-green deployments, canary testing, and traffic management.
- Ensure infrastructure architecture is fully compliant with security and safety guidelines.
- Perform cost review of existing resources; assess opportunities for reducing costs or infrastructure footprint.
- Define non-functional requirements (NFRs) for engineering teams around security, logging, monitoring, alerting, configuration, and testing, and work with those teams as they implement their apps and services.
- Develop runbooks and standard operating procedures (SOPs) for each service and application to ensure DevOps and SRE teams can detect incidents or issues before customers are impacted and act quickly to restore impacted services.
- Train and develop the abilities of less experienced team members and help build a culture of responsibility and ownership.
- Work collaboratively with various stakeholders to provide team-based solutions, creating a culture of inclusion and diversity of skillsets.
- Participate in a 24x7 on-call rotation and help define and implement on-call practices and procedures for your teams.
- Bachelor’s degree in computer science, electrical engineering, information systems or equivalent work experience.
- 6+ years in a technical role such as senior engineer, lead, or architect in software engineering, DevOps, or SRE functions.
- 3+ years of experience managing a team of engineers, providing feedback reviews, career guidance, and leading technical direction for the team.
- Experience owning the uptime and reliability of customer-facing web applications and critical services.
- 10+ years of experience maintaining and administering large-scale Linux-based environments with best practices for security and automation.
- 10+ years of experience providing and maintaining cloud-based infrastructure such as AWS, GCP, or Azure, or internal data center solutions based on vSphere, OpenStack, etc.
- 7+ years implementing and maintaining monitoring and alerting systems, creating service level indicators (SLIs), service level objectives (SLOs), and focusing on systems that self-heal or alert teams to take action before system downtime.
- 7+ years designing and operating fault-tolerant systems with little to no downtime.
- Knowledge of network architectures, security, and troubleshooting of connectivity or latency issues.
- Comfortable managing deployments of several thousand nodes and the automation it takes to ensure system uptime and redundancy.
- Experience in Infrastructure as Code (IaC) using Terraform or CloudFormation and configuration management solutions such as Chef/Puppet/Ansible.
- Experience with Kubernetes (K8s)/EKS and container or serverless (Lambda) architectures.
- Experience implementing robust, automated CI/CD pipelines.
- Experience with DynamoDB, or other cloud database services.
Rivian is an equal opportunity employer and complies with all applicable federal, state, and local fair employment practices laws. All qualified applicants will receive consideration for employment without regard to race, color, religion, national origin, ancestry, sex, sexual orientation, gender, gender expression, gender identity, genetic information or characteristics, physical or mental disability, marital/domestic partner status, age, military/veteran status, medical condition, or any other characteristic protected by law.
Rivian is committed to ensuring that our hiring process is accessible for persons with disabilities. If you have a disability or limitation, such as those covered by the Americans with Disabilities Act, that requires accommodations to assist you in the search and application process, please email us at email@example.com.
We take your privacy seriously. For details please see our Candidate Privacy Notice.
Please note that we are currently not accepting applications from third party application services.