Site Reliability Engineers are on a mission to manage all aspects of technical operations, ensuring that all Sea services and platforms (Shopee, AirPay, and Garena) run healthily 24/7 and deliver a superior user experience to our customers. As part of this team, your tasks include, but are not limited to, setting up and maintaining monitoring systems, designing and building high-availability service architecture, managing large numbers of servers using automation tools, and building operational platforms.
- Set up, manage and maintain Sea product (Shopee, AirPay and Garena) applications and services;
- Participate in product system design, optimization and capacity planning;
- Set up and maintain monitoring of the technical performance and statistics of Sea (Shopee, AirPay and Garena) products;
- Communicate and coordinate with Product Managers, Developers and Infra team;
- Perform regular and ad-hoc server-side deployments, releases and troubleshooting;
- Prepare routine operation documentation.
- Bachelor’s or higher degree in Computer Science, Engineering, Information Systems or related fields;
- Extensive, hands-on knowledge of the Linux operating system (Ubuntu, CentOS, etc.);
- Knowledge of computer networking (TCP/IP, DNS, etc.), computer organisation and operating systems;
- Hands-on experience with at least one of the following programming languages: Bash, Python, Lua;
- Strong analytical and problem-solving skills, with the ability to thrive in difficult and stressful situations;
- Good time management skills to work efficiently;
- Passion for the work and a strong sense of responsibility;
- Fast learner and a good team player;
- Detail-oriented, cautious and prudent;
- Open to fresh graduates who are passionate about technical operations of internet products.
The skills below are optional but preferred:
- Experience with automation tools like Ansible;
- Experience with monitoring tools like Nagios, Zabbix, Grafana, Prometheus, etc.;
- Experience with load-balancing tools like LVS, Nginx, OpenResty or HAProxy;
- Experience with container technology such as Docker, Kubernetes, Apache Mesos;
- Experience with high-availability system design and server deployment processes;
- Experience with DevOps.