Site Reliability Engineer - London
Live shows make us feel good. They’re a time to hang with our friends, discover new artists or lose ourselves on a dancefloor. We’re on a mission to bring all of this to more fans, more often – and that’s where you come in.
We're looking for a Site Reliability Engineer to join our Engineering team and help shape our new infrastructure to deliver a world-class service and disrupt a global industry.
At DICE, you’ll be part of the company that’s redefining live entertainment. It’s a place where you can be yourself, influence the culture, and create work that you’re proud of.
About the role
Our production infrastructure is managed with Terraform and runs on Kubernetes. We recently moved one of our two Development Kubernetes clusters from AWS to OVH Dedicated Hosts, and our remaining infrastructure is either in AWS or composed of third-party services, including Stripe. We are fully committed to being cloud agnostic, and we avoid third-party services wherever we can use something self-hosted. For the same reason, we avoid getting locked in to any particular vendor's cloud services.
As a Site Reliability Engineer, you’ll be part of a fully remote team, helping to automate and streamline our processes and operations. You’ll troubleshoot and resolve production issues, ensuring the security and stability of our infrastructure.
You’ll be
- Proactively looking for possible improvements that will shape our new infrastructure.
- Improving the security and reliability of the platform.
- Supporting production during the on-call rotations.
- Working with teams across the business to improve DICE services through testing and release procedures.
- Creating sustainable systems and services through automation and improvements.
You are
- Passionate, humble and talented.
- A fan of music and culture.
- Actively responsible.
- Proactive in identifying problems, performance bottlenecks and areas for improvement.
You’ll need
- Extensive experience of AWS or a similar cloud platform, e.g. Google Cloud or Azure.
- Solid knowledge of Kubernetes, Helm and Puppet.
- Proven experience with Bare Metal servers and Xen Hypervisor.
- Experience in Linux and Network Debugging.
- Deep knowledge of Terraform. Experience using it with Terragrunt is a plus.
- Experience using monitoring solutions, including Prometheus, Sysdig, Zabbix and Datadog.
- Solid knowledge of Kubernetes Operators, including Prometheus-operator and Spark-operator.
- Experience with Mutating Admission Webhooks.
- An understanding of Mongo, Postgres, Kafka and Redis. The ability to perform basic operations on these systems, such as dumping and restoring data, is a plus.
- A basic understanding of network security, e.g. vulnerabilities such as Heartbleed.
Our teams work from London, New York, Los Angeles, Barcelona, Paris, Berlin and Milan. We’re building products that will revolutionise the industry for fans, artists and venues – and we’re growing fast. Read about our global expansion and our ongoing mission to transform ticketing.
We know that having a variety of perspectives makes us a better company – it’s why we strongly encourage members of underrepresented communities to apply. Find out how we’re creating a more diverse, equitable and inclusive DICE.
Benefits
- Unlimited holiday
- Private health insurance
- Workplace pension
- Free therapy and coaching
During the pandemic, we’ve learnt that working from home can help us focus, but many of us are missing the buzz of the office. We’re still figuring out the best way for us all to work together in the future, and we’ll involve the whole team in any decisions we make.
Our process usually involves a quick chat on the phone, a portfolio review or task, and a couple of interviews where you’ll meet the people you’ll work with. We’ll keep you fully informed along the way.