We are looking for an experienced Site Reliability Engineer to join our Infrastructure Team. We provide tools and services to all teams at OfferUp for managing an increasingly complex production infrastructure that handles billions of requests per day. Our success is measured by our ability to let every team stand up and deploy services quickly with no downtime. In this role, you will be at the forefront of driving and developing the technology that improves the availability, scalability, performance and reliability of OfferUp.

Responsibilities

  • Work with other SREs to build a comprehensive set of tools to automate and monitor our production infrastructure
  • Work with other engineering teams to build performant, resilient, operable, self-healing services
  • Participate in reasonable on-call rotations with the rest of Engineering
  • Practice sustainable incident response and blameless postmortems

Experience

  • Knowledge of various aspects of service design, including messaging protocols and behavior, caching strategies and software design practices
  • 2+ years managing groups of servers at scale, preferably in AWS
  • Demonstrable knowledge of TCP/IP, HTTP, web application security, and experience supporting multi-tier web application architectures
  • Proficiency in a modern scripting language, Python preferred
  • Experience with system, runtime and application performance profiling
  • Experience building load testing frameworks

Nice to have

  • Experience with distributed tracing or APM vendors
  • Contribution to open source projects
  • An active interest in serverless computing and containerization

Our team

  • Collaborates and works as a team
  • Avoids doing things twice
  • Solves hard problems for tomorrow, not just for today
  • Stays positive and prefers fixing problems to complaining about them
  • Investigates, considers and adopts new technology where it makes sense
  • Doesn’t tolerate brilliant jerks
