Why this job is excellent:
TripAdvisor is the world’s largest online travel site, visited by 390 million travellers each month, and our business – TripAdvisor Rentals – is a fascinating and ever-evolving part of the organization. Rentals are a significant revenue generator and a key growth area not just for TripAdvisor, but also for the travel industry as a whole. It’s a category that’s rapidly gaining popularity; around two-thirds of travellers say they plan to stay in a rental this year, up from 50% two years ago. Our job is to ensure that we fulfil this demand and, ultimately, help travellers all over the world take amazing holidays, whether they’re staying in a studio in Sydney, a tree house in Costa Rica, or a cottage in the Cotswolds.
As part of the Operational team, you’ll be on the front lines of a rapidly growing infrastructure. We’re looking for a self-starter with deep OS knowledge and programming skills who keeps repeatable, sustainable systems management in mind.
We rely heavily on automation and software systems to improve our operational efficiency and accuracy, from inventorying and provisioning systems to monitoring and auto-remediating issues in our environment. We also continuously codify the behaviors and rules that are important for the systems to operate correctly and efficiently. A successful candidate must be like-minded and bring hybrid software and systems engineering experience and insight to help build, operate, and maintain our rapidly growing infrastructure.
What You’ll Do:
- You will be a part of a fast moving team in a growing and constantly evolving production environment
- You will be responsible for the reliability, availability and security of our infrastructure, continuously improving it and sharing a rotating on-call schedule with the team
- You will troubleshoot issues across the entire stack: hardware, software, application and network
- You will improve the reliability and resilience of our infrastructure through root-cause analysis and by reviewing gaps in its design and implementation
- You will help maintain services once they are live by measuring and monitoring availability, latency and overall system health
Who You Are:
- Extensive experience handling services in a large-scale environment
- Strong experience building and maintaining production systems within the AWS ecosystem
- Expertise with systems in AWS, on premises and in the datacentre
- Experience administering an RDBMS in a production environment, preferably MySQL
- Strong knowledge of UNIX and TCP/IP network fundamentals
- Thorough understanding of configuration management concepts, with hands-on experience (Puppet)
- Expertise creating continuous integration servers with tools like Jenkins and creating/maintaining production-quality Docker images
- Experience with monitoring, metrics, and visualization tools (Icinga, Graphite, Prometheus, ELK, etc.)
- Ability to code really well in at least one programming language, with a track record of using it to enhance existing software systems
- Strong analytical, problem-solving, and communication skills
- Organized, with good attention to detail, and able to work both independently and with a team
- B.S. or higher degree in Computer Science or equivalent experience