Responsibilities:

  • Collaborate with product developers throughout the development lifecycle, from consulting on system design through deployment, operation, and refinement.
  • Drive cross-team efforts to improve availability, reliability, and performance of services by influencing designs, architectures, standards, and methods.
  • Conduct service capacity planning and demand forecasting.
  • Build and operate systems that improve the availability, scalability, latency, and efficiency of services.
  • Maintain health of systems and services by implementing real-time monitoring, visualization, and alerting.
  • Solve problems relating to critical services and build automation to prevent problem recurrence.
  • Participate in periodic on-call rotation.

Minimum Qualifications:

  • Bachelor's degree in Computer Science or a related technical discipline, or equivalent practical experience.
  • At least 2 years of relevant work experience.
  • Experience with one or more programming languages, such as C, C++, Java, Python, Go, or shell scripting.
  • Knowledge and understanding of Unix/Linux operating systems and administration.
  • Knowledge and understanding of networking, including common protocols (TCP/IP, UDP, ICMP), DNS, and the OSI model.
  • Knowledge and understanding of virtualization and container technologies.
  • Familiarity with maintaining web services at scale.

Preferred Qualifications:

  • Expertise in designing, analyzing, and troubleshooting large-scale distributed systems.
  • Experience with Kubernetes and related advanced capabilities such as service mesh.
  • Experience with Google Cloud Platform or another public cloud.
  • Experience in deploying, maintaining, and tuning commonly used middleware.
  • Experience with monitoring tools such as Prometheus and Grafana.
  • Experience with configuration management tools such as Ansible and SaltStack.
  • Experience with load balancing tools such as Nginx and HAProxy.

 
