As a Site Reliability Engineer, you will work to ensure the safe, swift, and reliable delivery of services to our customers. This role combines software and systems engineering to build highly scalable, distributed, fault-tolerant systems. You enjoy creating solutions to operations problems. You have a holistic knowledge of our systems and services, can re-engineer processes when needed, and can clearly communicate the necessary changes. You understand how various development teams operate and can reduce the effort it takes them to deliver new services. You have the grace to stay calm when production services are down and the courage to ask the right people for help as needed to bring them back up. You enjoy collaborating with people from other teams and disciplines to make plans a reality.
This position is offered with a range of compensation levels and titles. Additionally, we welcome applicants working remotely from the following countries: the UK, Canada, Switzerland, Belgium, Israel, New Zealand, Germany, and the US.
Develops solutions to increase service stability through automation and process re-engineering.
Builds and supports tools and systems that the enterprise will use to deploy its software into production.
Participates in rotating on-call duties for a team providing 24x7x365 coverage.
Keeps job knowledge current by studying state-of-the-art tools and techniques, participating in educational opportunities, reading professional publications, maintaining personal networks, and participating in professional organizations.
Sound fundamentals in Linux-based systems, including proficiency with commands such as ssh, grep, sed, awk, and find.
A solid understanding of networking and core Internet protocols (e.g., TCP/IP, DNS, TLS, HTTP).
Programming skills in a modern language such as Go or Python.
Ability to script in a shell language (Bash or POSIX shell).
Proven experience working with container-orchestration systems (Kubernetes, ECS, etc.) in production environments.
Experience with public cloud providers (AWS, Google Cloud Platform, etc.).
Experience building and optimizing container images.
Comfort with frequent, incremental code testing and deployment.