Our services teams are looking for a skilled professional who will be responsible for building and implementing tooling that helps the ops, support, and delivery teams for products that power one of the world's top 10 most visited websites. This person will also need to develop strong relationships with our DevOps department and with external partners to help make key infrastructure decisions. As the number of developers on this thriving team continues to grow rapidly, our internal tools are becoming increasingly important. The Site Reliability Engineer (SRE) will work closely with the team's other developers to design and maintain a modern CI/CD pipeline that reliably moves changes from development to stable production, and will support efforts to improve monitoring and testing of the platforms.

What you'll be doing: 

  • Develop and maintain strong relationships with our DevOps department as well as external partners to help make key infrastructure decisions
  • Assist in planning and implementing solutions for current and future concerns of internal tools and development environments
  • Analyze application performance, monitoring, and infrastructure-related features, and suggest improvements
  • Work with application developers to maintain useful and up-to-date CI/CD pipelines
  • Work with QA to help them create better alerts
  • Ensure infrastructure concerns are handled efficiently, from resource management issues that might arise to streamlining and standardizing cross-cutting behavior across applications
  • Be on call for emergency and incident response
  • Participate in postmortems
  • Consolidate and share knowledge to reduce the time needed to fix any breakdown that may occur, and improve documentation of internal tooling and infrastructure-related solutions
  • Automate as much as possible
  • Plan infrastructure capacity

What you'll need to be successful: 

Must Haves: 

  • 2+ years of experience working in DevOps, Development, and/or Systems Administration
  • Working experience with:
    • Unix-based systems
    • Docker and Kubernetes
    • Azure / GCP
    • Nginx
    • ELK stack / Distributed tracing / OpenTelemetry
    • Monitoring tools (e.g. Grafana, Prometheus)
  • Some experience with .NET / C#, JavaScript, or PHP

Nice to Have:

  • Helm
  • GitLab
  • Opsgenie
  • Linkerd
  • Envoy
  • In-depth experience with .NET / C#, JavaScript, or PHP
  • Service mesh
  • NoSQL databases
  • Message queues
  • OAuth2

As an equal opportunity employer, we celebrate diversity and are committed to creating an inclusive environment for all employees.

In this role, you may be exposed to adult content.
