As a Site Reliability Engineer, you'll work closely with other SREs and developers to ensure Vimeo remains available and fast. You'll build infrastructure tools, manage production changes, design alerts, and work with developers to ensure new systems are scalable, predictable, and reliable. You'll be instrumental in building the best video streaming experience on the 'net.
What you'll do:
In your first 30 days, you'll understand Vimeo's web serving stack and monitoring systems. You'll identify deficiencies in these systems or the tools associated with them, and work with the rest of the team to update them.
In your first 90 days, you'll overhaul existing systems or set up new ones to increase our availability or strengthen our automation. You'll identify architectural deficiencies in our infrastructure and work with other SREs to address them. You'll probably make changes that touch all of our production machines, or every request we handle.
By the end of your first year, you'll be the expert on at least one piece of our infrastructure, and you'll collaborate with other SREs and development teams to design scalable, resilient services.
Day-to-day, you'll retrofit existing systems and work on longer-term projects to overhaul critical infrastructure. You'll build tools used by all of Vimeo engineering. You'll maintain alerts, and troubleshoot and triage issues with our hardware, software, and network. You'll take part in a 24x7 on-call rotation.
What we're expecting:
4+ years of experience with Linux
Experience programming in at least one of Python, Go, Ruby, PHP, or Bash
Experience with configuration management (including tools like Chef, Puppet, Ansible, or others)
Experience managing high-throughput, high-availability systems
Knowledge of networking protocols, including familiarity with TCP/IP, HTTP, SSL, and DNS
Knowledge of monitoring systems and strategies for writing useful alerts
BS or MS in Computer Science or related field
Experience with Linux containerization technologies
Experience with cluster managers like Mesos, Kubernetes, or Nomad