The mission of the OpsDev team is to strengthen TechOps' ability to control and manage massive resources and traffic in a highly efficient, accurate, and consistent way. The team provides production-grade software, intelligent engines, and stable system architectures, and is devoted to building a DevOps ecosystem that integrates all resources and tools and eliminates the gap between Ops and Dev. Its main scope covers the Global Traffic Schedule and Management Platform (NLB, ALB, GSLB, Hybrid CDN, DNS, etc.), the Hybrid Cloud Resource Schedule and Management Platform (Bromo, Hybrid Cloud Management, Mesos, Kubernetes, containers, physical servers, VMs, CI/CD, etc.), and internal systems (CMDB, SPACE, TOC, etc.).
- Design and develop the Shopee Cloud Native Computing Platform, including provisioning, virtualization, container runtime, scheduling and orchestration; Evolve Shopee's Cloud Native infrastructure and empower Shopee businesses via Cloud Native technology stacks.
- Improve the stability, scalability, sustainability and security of the Shopee Cloud Native Computing Platform; Ensure that the platform runs smoothly.
- Improve resource utilization of the Shopee Computing Platform; Optimize the scheduling model for co-locating online services and batch jobs at large scale.
- Enhance workload isolation on the Shopee Computing Platform; Improve the resource control of containers and virtual machines in memory bandwidth, disk IO, and network QoS.
- Make the Shopee Cloud Native Computing Platform easy to use and maintain; Optimize processes in the Shopee Computing Platform and reduce its learning cost based on daily support feedback and business requirements.
- Develop and implement automation and engineering solutions to reduce unnecessary manual operations and improve response time; Detect and fix potential problems in advance via TDD, chaos engineering and regular fire drills, and react quickly to incidents.
- Bachelor's or higher degree in Computer Science or related fields.
- Passionate about coding and programming, innovation, and solving challenging problems.
- In-depth understanding of computer science fundamentals (data structures and algorithms, operating systems, networks, databases, etc.).
- In-depth understanding of Linux internals, such as cgroups v2, namespaces, KVM, etc.
- Strong hands-on experience with at least one of the following programming languages: Go, Python, C++, Java.
- Familiar with Linux dynamic tracing and performance profiling.
- Experience with software troubleshooting. Strong logical thinking abilities.
The following skills are optional but preferred:
- SRE background with hands-on experience in massive-scale systems.
- Experience with the Cloud Native technology stack, such as Kubernetes, Prometheus, CoreDNS, Istio, Helm, etcd, and Jenkins.
- Experience in the design and development of large-scale systems and platforms.
- Contributions to open-source projects.
- Published papers at top conferences such as ASPLOS, EuroSys, NSDI, and OSDI.