Seldon is looking for a Site Reliability Engineer (SRE) to join our team. We are focused on making it easy for machine learning models to be deployed and managed at scale in production. We provide Cloud Native products that run on top of Kubernetes and are open-core with several successful open source projects including Seldon Core, Alibi:Explain and Alibi:Detect. We also contribute to open source projects under the Kubeflow umbrella including KFServing.

 

This role will help Seldon expand its product range with confidence across various cloud and on-prem installations. As a key initial SRE for the company, you will bring core best practices to the whole product range.

About the role

  • Manage our marketplace releases, including the Google and Red Hat marketplaces.
  • Extend open and closed products to allow for Cloud Marketplace publishing.
  • Build observability and monitoring for internal services to ensure reliability.
  • Manage internal services on cloud infrastructure.
  • Set up and maintain CI/CD pipelines for core open and closed source projects.
  • Develop and improve the reliability process across the engineering team.
  • Manage team IAM across infrastructure used by the team.

Essential skills

  • A degree or higher-level academic background in a scientific or engineering subject.
  • Experience working with Linux and the Unix shell.
  • At least 2 years of experience in industry or academia showing completed projects.

Core skills (the role focuses on these skills, so we expect existing experience or a demonstrable desire to learn them)

  • Familiarity with cloud infrastructure (GCP, AWS).
  • Implementing "Infrastructure as Code" (Terraform or similar technologies).
  • Managing Kubernetes and Docker environments.
  • Knowledge of monitoring and alerting technologies.
  • Knowledge of CI/CD systems.
  • Strong programming skills (Golang, Python, Bash).
  • Interest in using and contributing to Open Source tools.

Nice-to-have

  • Experience deploying or maintaining machine learning models in production.


Benefits

  • Share options to align you with the long-term success of the company.
  • An exciting, fast-paced start-up phase with an ambitious team and unlimited potential for professional growth.
  • Access to discounted lunches, gyms, shopping and cinema tickets.
  • Healthcare benefits.
  • Flexible work-from-home policy.
  • Cycle to Work scheme.

Logistics

Our interview process normally consists of a phone interview, a coding task, and a final interview of 2-3 hours (carried out virtually). We promise not to ask you any brain teasers or trick questions. We might design a system together on a whiteboard, the same way we often work together, but we won’t make you write code on one. Our recruitment process takes 3 weeks on average.

 
