Sr. Site Reliability Engineer
Summary:
The Cloud Platform team at Lucid is seeking a Senior Site Reliability Engineer. In this position, you will be responsible for the reliability of the Cloud Platform that enables Lucid Motors' cloud-based applications on various public and private cloud infrastructure.
Our ideal candidate exhibits a can-do attitude and approaches the work with vigor and determination. We are looking for a hands-on Software Engineer who will collaborate with other engineers to build, automate, and maintain the Cloud Platform and meet its SLAs.
Responsibilities:
- Provide reliability engineering for one or more cloud services deployed and managed in the KSA region.
- Implement continuous delivery for various platform services using ArgoCD.
- Autoscale and monitor the performance of Kubernetes and the applications running on it using Prometheus, Grafana, or similar tools.
- Perform SRE activities such as availability and reliability monitoring and reporting.
- Deploy, configure, and maintain tools and services such as EMQX, Kafka, Spark, Trino, Airflow, MQTT, and microservices.
- Set up infrastructure as a service using Terraform.
- Set up and operate code repositories with GitLab or Bitbucket.
- Support NoSQL databases such as MongoDB and Elasticsearch, and other databases such as VictoriaMetrics and Hive.
- Set up and monitor various applications and services. Continuously enhance alerts and automate the recovery process.
- Participate in the on-call rotation to maintain service SLAs per business needs.
- Work with the Product Owner, Engineering Manager, and team members using Agile Scrum and Kanban.
- Perform impact analysis during incidents and take appropriate action.
- Containerize and deploy microservices and data pipelines on Kubernetes using Helm.
- Advocate for a DevOps culture of automation, self-service, and engineering best practices to enable development teams.
Qualifications:
- B.S. or M.S. degree in Computer Science, Engineering, or equivalent work experience.
- Fluent spoken English for communicating with teams across geographical regions.
- 5+ years of experience in SRE or DevOps Engineering.
- 3-6 years of experience deploying and maintaining applications built with Docker and orchestrated on Kubernetes on public or private cloud providers.
- 3-6 years of experience using Cloud Automation tools such as Terraform, Cluster API, or other frameworks.
- 3-6 years of experience in programming or scripting languages such as Python, Go, Bash/Shell, or others.
- 3-6 years of experience administering RDBMSs such as Postgres and NoSQL databases such as Cassandra, MongoDB, or others.
- 3-6 years of experience with CD tools such as ArgoCD.
- Experience running large-scale distributed computing infrastructure for data platforms using Spark, Hive, Trino, Zookeeper, and Kafka.
- Experience with various debugging tools and with troubleshooting performance bottlenecks at the infrastructure or application tier.
- Nice to have: experience with configuration management and automation using Ansible, Chef, Puppet, or others.
Lucid maintains your privacy according to its Candidate Privacy Notice. If you are a California resident, please refer to our California Candidate Privacy Notice.
At Lucid, we don’t just welcome diversity - we celebrate it! Lucid Motors is proud to be an equal opportunity workplace. We are committed to equal employment opportunity regardless of race, color, national or ethnic origin, age, religion, disability, sexual orientation, gender, gender identity and expression, marital status, and any other characteristic protected under applicable State or Federal laws and regulations.
By submitting your application, you understand and agree that your personal data will be processed in accordance with our Candidate Privacy Notice.