The Site Reliability Engineering (SRE) team needs another manager: Someone with experience writing performant, distributed software, and managing projects of different sizes. A decision maker, equally comfortable with high-level architectures and teeny-tiny details. A mentor who can help their team improve technical and non-technical abilities.

They should also be familiar with Scala, Golang, Ruby, or Java—ideally more than one.

What you'll do

  • Test and tune network, hardware, and software configurations to maximize performance.
  • Create tools and infrastructure used by the rest of the Tumblr engineering teams.
  • Manage the availability, scalability, and performance of Tumblr platforms.
  • Set short- and long-term priorities and goals for your team.
  • Coordinate cross-team projects.
  • Mentor your peers and reports through individual instruction and code review.
  • Help hire, onboard, and train new members of your team.

What we're looking for

  • Experience managing a fast-moving, highly skilled infrastructure engineering team.
  • A problem-solver who evaluates every possible solution.
  • Ability to troubleshoot large-scale distributed systems.
  • Previous experience scaling high-traffic websites and apps.
  • Familiarity with Unix systems administration, and solid scripting skills.
  • Willingness (nay, eagerness) to perform on-call duties, plus previous experience doing so.
  • Knowledge of data structures and algorithms.
  • A sense of ownership, initiative, and drive.
  • Persistence and resourcefulness when obstacles arise.

Tools we like

  • Nginx, Varnish, and HAProxy
  • Memcached and Redis
  • MySQL (InnoDB)
  • Puppet
  • git and GitHub
  • Ruby, Go, Scala, PHP
  • Asynchronous services and queues like Oozie and Gearman
  • Hadoop, Pig, ZooKeeper, and other Java/JVM projects
  • Nagios, Icinga2, PagerDuty, OpenTSDB
  • OpenStack, Docker, Mesos