Job Responsibilities:

* Deploy, manage and maintain new and existing services.
* Own capacity and resource management. Use stress testing to measure, tune, and optimize system performance.
* Identify and resolve problems relating to critical service operations. Participate in documenting Standard Operating Procedures (SOPs).
* Manage high-severity and high-customer-impact incidents, focusing on fast detection and recovery.
* Develop automated technical operation tools/systems to eliminate repetitive manual operations.

Job Requirements:

* Bachelor's or higher degree in Computer Science, Information Systems or related fields.
* Hands-on experience with at least one of the following programming languages: Bash, Go, or Python.
* Good command of the Linux environment. Deep understanding of the Linux operating system, including the kernel, memory, processes, threads, static and shared libraries, IPC, and signals.
* Understanding of standard networking protocols such as HTTP, DNS, SSL, TCP/IP, and ICMP.
* Experience in large-scale distributed environments. Familiarity with distributed-systems concepts such as the CAP theorem and microservices.
* Experience with container technologies such as Docker and Kubernetes.
* Experience with monitoring tools such as Prometheus and Zabbix.
* Demonstrated strong sense of ownership, customer service, and integrity.
* Passion for eliminating repetitive manual processes using automation.
* Fast learner and a good team player.
* Fluency in both written and spoken English is a must. Fluency in Mandarin is preferred.
