Overview 

At Segment, we believe companies should be able to send their data wherever they want, whenever they want, with no fuss. We make this easy with a single platform that collects, stores and sends data to hundreds of business tools with the flip of a switch. Our goal is to make using data easy, and we’re looking for people to join us on the journey. We are excited about building toward a world where engineers at other companies spend their time working on their core product, rather than spending nights and weekends tweaking their customer data into various formats for third-party tools.

Site Reliability Engineers (SREs) at Segment are members of the engineering team whose primary goal is to ensure the reliability, flexibility, and cost-effectiveness of our production infrastructure.
 
While these responsibilities are shared with the entire engineering team, SREs build and maintain the portions of our stack that ensure the entire engineering team can confidently ship software day in and day out. They complement other engineers with their deeper knowledge of the fundamental pieces of technology that underpin our production infrastructure. The SRE team are our in-house experts on building reliable, maintainable systems, and they are responsible for setting the direction for how we build and deploy our production environment.
 

Core Responsibilities: 

  • Build software that improves the reliability, performance, and efficiency of Segment’s high-throughput, large-scale SaaS platform.
  • Collaborate with the entire engineering team on projects as the expert on reliability, performance, and efficiency.
  • Automate away the process of managing capacity, safely deploying software, and mitigating failures.
  • Troubleshoot and mitigate the thorniest problems in our most mission-critical systems. Advise the team during postmortems on effectively avoiding repeated incidents.
  • Share a 24x7 on-call rotation with the other engineers in your focus area.
  • Work with cutting-edge technology, share with others through open source, and spread your expertise through contributions to our engineering blog.

Requirements: 

  • CS Degree and/or a demonstrable, solid understanding of CS fundamentals.
  • Proficient coder: strong with at least one programming language.
  • Solid grasp of Linux systems and networking concepts.
  • Drive to dig into problems and burrow until the solution is found.
  • Excellent communicator; writes great documentation.

Bonus: 

  • Experience operating large-scale, distributed systems on top of cloud infrastructure such as Amazon Web Services or Google Cloud Platform.
  • Broad understanding of the OS and of networking protocols with demonstrated ability to apply this understanding to solve real problems.
  • Strong proficiency with OS tuning and expertise at the application of debugging tools.
  • Strong sense of urgency and ownership over critical problem areas.
  • Demonstrable experience effectively coordinating response for outages and incidents.
  • Rare ability to inspire engineering teams to up their reliability game.

 
