The mission of the OpsDev team is to strengthen TechOps' ability to control and manage massive resources and traffic in a highly efficient, accurate, and consistent way. The team provides production-grade software, intelligent engines, and stable system architectures, and is devoted to building a DevOps ecosystem that integrates all resources and tools and eliminates the gap between Ops and Dev. Its main scope covers the Global Traffic Schedule and Management Platform (NLB, ALB, GSLB, Hybrid CDN, DNS, etc.), the Hybrid Cloud Resource Schedule and Management Platform (Bromo, Hybrid Cloud Management, Mesos, Kubernetes, Container, Physical Server, VM, CI/CD, etc.), and Internal Systems (CMDB, SPACE, TOC, etc.).
- Design and develop Shopee Cloud Native Traffic Scheduling Systems such as DNS, Load Balancer, CDN, API Gateway and Service Discovery; Evolve Shopee Cloud Native infrastructures and empower Shopee businesses via Cloud Native technology stacks.
- Continuously optimize the throughput and latency of Shopee Cloud Native Traffic Scheduling Systems.
- Improve Shopee Cloud Native Traffic Scheduling Systems' availability, stability, security and extensibility; Ensure the smooth running of Shopee Cloud Native Traffic Scheduling Systems.
- Make Shopee Cloud Native Traffic Scheduling Systems easier to use and more maintainable; Optimize processes in systems and reduce their learning cost based on daily support feedback and business requirements.
- Build automation and engineering solutions; Detect and fix potential problems in advance via TDD, chaos engineering and regular fire drills; Improve auto-healing to reduce unnecessary manual operations.
- Bachelor's or higher degree in Computer Science or related fields.
- Passionate about coding and programming, innovation, and solving challenging problems.
- In-depth understanding of computer science fundamentals (data structures and algorithms, operating systems, networks, databases, etc).
- Familiar with Linux development environments and multi-threaded programming; Experience in the development of large-scale distributed systems.
- Familiar with Linux dynamic tracing and performance profiling; Experience with software troubleshooting.
- Strong hands-on experience with at least one of the following programming languages: Go, Python, C++, Java.
- Strong logical thinking abilities.
The following skills are optional but preferred:
- SRE background with hands-on experience in massive-scale systems.
- Experience with DPDK, XDP, HttpDNS, VxLAN.
- Experience with Nginx, Tengine, OpenResty, LVS.
- Experience with Cloud Native technology stacks such as Kubernetes, Prometheus, CoreDNS, Istio, Helm, etcd, Jenkins, etc.
- Experience in the design and development of large-scale systems and platforms.
- Contributed to open-source projects.
- Published papers at top conferences such as ASPLOS, EuroSys, NSDI, OSDI, etc.