TripAdvisor is looking for a Senior Platform Software Engineer to join the team responsible for our private cloud platform. The platform gives our engineering teams cloud-like features and convenience while retaining the cost benefits and performance of on-premises data centers and bare-metal servers. This role focuses on our compute platform, specifically the part powered by Kubernetes.
Today we focus heavily on on-premises Kubernetes clusters running on bare-metal servers, which we offer to our internal engineering teams at large scale across multiple data centers in multiple regions. Our main mission is to evolve this compute platform so that it provides the functionality our internal engineering teams need to deliver their products and meet their objectives rapidly. We also provide tooling and processes that simplify and streamline their use of the platform, while maintaining high reliability and performance.
In this role you will:
- help identify pain points and inefficiencies in how our compute platform is used, and help expand its functions,
- collaborate with users to design solutions,
- build business cases to justify our investment and effort,
- design and implement the solutions, focusing not just on technical capability but also usability and supportability,
- iterate on the implemented solutions as the environment and usage change.
One example project we are working on is how we handle auto-scaling of deployments, both vertically and horizontally. Kubernetes provides a well-understood mechanism for horizontal auto-scaling, and the community is working on a mechanism for vertical auto-scaling. However, for an engineering team that just wants to make sure its deployments scale correctly according to a set of metrics, both mechanisms are too low-level and require specific knowledge of the platform to use. Our opportunity here is to combine our knowledge of the users' goals with what the platform would expect users to do given their usage and metrics, and to make those scaling decisions automatically on the users' behalf.
Qualifications and skills
- BS or MS in Computer Science or related technical field
- 5+ years of full life cycle software development experience
- Strong understanding of data structures and algorithms
- Strong knowledge of UNIX and TCP/IP network fundamentals
- Strong knowledge of virtualization and containerization
- Ability to code very well in at least one programming language, with experience using it to enhance existing software systems, and to take advantage of tools that help you code better
- Strong understanding of large-scale Internet service architectures and their components, such as load balancing, DNS, CDNs, and HTTP/HTTPS proxies
- Proven ability to pick up new technologies and tools very quickly
- Ability to take calculated risks in order to move fast, while having a plan for when things go wrong
- Experience in an operations role supporting a 24/7 production environment
- Organized, with good attention to detail, and able to work both independently and as part of a team
- Strong written and verbal communication skills in English