Site Reliability Engineer
Tusimple was founded in 2015 with the goal of bringing the top minds in the world together to achieve the dream of a driverless truck solution. With a foundation in computer vision, algorithms, mapping, and artificial intelligence, Tusimple is working to create the first commercially viable autonomous truck driving platform with L4 (SAE) levels of safety.
Job Description
The Site Reliability Engineer will be part of a team working on a variety of software engineering tasks to create and maintain scalable, reliable software systems for our autonomous truck platform. You will have the opportunity to impact backend services such as fleet monitoring, machine learning, and continuous integration, among others.
Responsibilities
- Understand, deploy and provide technical support for infrastructure systems
- Engage in system design and development from an SRE perspective
- Identify and mitigate real and potential system problems
- Ensure and improve security, stability, and scalability by writing new code and scripts
- Perform coding, code commenting, debugging, bug fixing, and other related activities
- Diagnose software, system, and hardware failures that impact deployment
- Automate installations and configuration management
Qualifications
- M.S. or B.S. in Computer Engineering and/or Computer Science
- Strong communication skills and the ability to work across technical teams
- Experience with root cause analysis of a system or program
- Enterprise Linux administration experience (Debian-based systems preferred)
- In-depth experience deploying, maintaining, monitoring, and logging network devices
- Fundamental knowledge of the TCP/IP stack and application protocols
- Strong understanding of Linux networking operations and functionality including DHCP, DNS, and static addressing
- Comfortable using Git and GitHub for source control
- Experience with programming in Python
- Strong unit testing and debugging skills
- Familiarity with automated deployment strategies
- Understanding of basic Layer 2 / Layer 3 networking concepts such as subnetting and VLANs
- Experience with container-based architecture, such as Docker and Kubernetes
- Familiarity with CI/CD concepts or platforms
- Experience with basic automation/configuration management tools, such as Ansible, Chef, or Puppet
- Availability to participate in a rotating on-call schedule
Bonus Points
- Experience with Atlassian products, such as Jira or Confluence
- Experience with switching (channeling, trunking, virtualization, stacking, QoS, spanning-tree)
- Familiarity with automotive grade or ruggedized servers/systems/hardware
- Experience compiling Linux kernels and using kernel compilation parameters
- Experience with Intel or Nvidia hardware in Linux environments
- Understanding of basic Linux security systems and implementation
- Experience with web frameworks (Flask, Node.js, Django, Vue, React, Bootstrap, jQuery, etc.)
Perks
- Competitive salary and benefits
- Bonus/paid vacations/insurance
- Daily breakfast, lunch, and dinner
- Full kitchen with unlimited snacks and fruit
- Medical, vision, and dental insurance plans
- Company 401(k) program
- Company-paid life insurance
Tusimple is an Equal Opportunity Employer. The company does not discriminate in employment and personnel practices on the basis of race, sex, age, handicap, religion, national origin, or any other basis prohibited by applicable law. Hiring, transfer, and promotion decisions are made without regard to the factors listed above.