- Coordinate datacenter operations tasks with remote DCOE staff (server/rack/row/cage provisioning, rolling replacements, power & temperature management, etc.)
- Design and develop automation toolsets to help drive efficiency in Agoda’s IT infrastructure (bare metal deployment, software installation/patching, monitoring and remediation, etc.).
- Conduct performance tuning and troubleshooting investigations, working across the entire organization
- Provide expert advice and guidance to other infrastructure team staff and software developers; effectively mentor less experienced staff.
- Lead and manage implementation projects from end to end, working across multiple teams and departments.
- Manage and operate the internal platform systems (OpenStack, Kubernetes, internal configuration management and deployment tools)
- Manage incidents and daily operational tasks on production and development environments, occasionally outside of business hours
- At least 5 years of IT operations experience with Windows and Linux servers in large environments is a MUST
- Practical experience selecting, deploying and managing HPE servers.
- Excellent troubleshooting skills: able to break down issues into testable hypotheses, develop tools to assist during troubleshooting, and troubleshoot “full stack” issues
- Good knowledge of networking architecture within complex e-commerce environments
- High sense of ownership. Actively looks for lingering problems and proactively fixes them
- Good English skills, strong analytical skills, eager to learn new things
- Able to work under pressure and deliver projects on time.
- Self-motivated, approachable, and adaptable, with excellent communication skills (both written and verbal)
- Practical knowledge of Kubernetes, Docker, or OpenStack operations and APIs
- A DevOps background with experience in CI/CD procedures would be an advantage