Senior Software Engineer, Site Reliability Engineer - Infrastructure
Site Reliability Engineering (SRE) is an engineering discipline that combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. SRE ensures that DiDi's services have reliability and uptime appropriate to users' needs and a fast rate of improvement while keeping an ever-watchful eye on capacity and performance.
- Engage in and improve the whole lifecycle of services—from inception and design, through deployment, operation, and refinement.
- Support services through activities such as developing software platforms and frameworks, system design consulting and capacity planning.
- Maintain services by measuring and monitoring availability, latency and overall system health.
- Scale systems sustainably through mechanisms like automation, and evolve systems by pushing for changes that improve reliability and velocity.
- Practice sustainable incident response and blameless postmortems.
- B.S. in computer science or a related technical field and 5+ years of related experience.
- Experience with algorithms, data structures, complexity analysis and software design.
- Experience with Unix/Linux operating systems internals and administration.
- Experience with one or more of the following: C, C++, Java, Go, Python, Perl or Shell.
- Experience with PostgreSQL or MySQL relational databases.
- Interest in designing, analyzing and troubleshooting large-scale distributed systems.
- A systematic problem-solving approach, coupled with strong communication skills and a sense of ownership and drive.
- Ability to debug and optimize code and automate routine tasks.
- Experience with networking (e.g., TCP/IP, routing, network topologies and hardware, SDN).
- Experience with containers and virtualization.
- Fluency in spoken Mandarin Chinese.