- Collaborate and engage with product developers throughout the whole development lifecycle, from consulting on system design through to deployment, operation, and refinement.
- Drive cross-team efforts to improve availability, reliability, and performance of services by influencing designs, architectures, standards, and methods.
- Conduct service capacity planning and demand forecasting.
- Build and operate systems that improve the availability, scalability, latency, and efficiency of services.
- Maintain health of systems and services by implementing real-time monitoring, visualization, and alerting.
- Solve problems relating to critical services and build automation to prevent problem recurrence.
- Participate in periodic on-call rotation.
- Bachelor's degree in Computer Science or a related technical discipline, or equivalent practical experience.
- Minimum 2 years of relevant work experience.
- Experience with one or more programming languages, such as C, C++, Java, Python, Go, or shell scripting.
- Knowledge and understanding of Unix/Linux operating systems and administration.
- Knowledge and understanding of networking, including common protocols (TCP/IP, UDP, ICMP), DNS, and the OSI model.
- Knowledge and understanding of virtualization and container technologies.
- Familiarity with maintaining web services at scale.
- Expertise in designing, analyzing, and troubleshooting large-scale distributed systems.
- Experience with Kubernetes and advanced features such as service mesh.
- Experience with Google Cloud Platform or another public cloud.
- Experience in deploying, maintaining, and tuning commonly used middleware.
- Experience with monitoring tools such as Prometheus and Grafana.
- Experience with configuration management tools such as Ansible and SaltStack.
- Experience with load balancing tools such as Nginx and HAProxy.