Make Tumblr fast, reliable, and available for hundreds of millions of users all over the world. As a site reliability engineer, you are a software developer with a love of highly performant, fault-tolerant, massively distributed systems.

What You'll Do:

  • Manage the availability, scalability and performance of Tumblr platforms
  • Create the tools and infrastructure leveraged by the rest of the Tumblr engineering teams
  • Diagnose and repair network, application, and hardware bottlenecks
  • Test and tune network, hardware, and software configurations to maximize performance
  • Deploy and manage monitoring and diagnostic tools
  • Guide our product and platform teams to keep new features fast and stable

What We’re Looking For:

  • Experience scaling high-traffic websites
  • Experience with Unix systems administration, including solid scripting skills in Ruby, PHP, or Python
  • Expertise in data structures and algorithms
  • Expertise in troubleshooting large-scale distributed systems
  • Smarts, humility, and equal willingness to learn and teach
  • A sense of ownership, initiative, and drive

Tools We Like:

  • Nginx, Varnish and HAProxy
  • Memcached and Redis
  • MySQL (InnoDB)
  • Puppet
  • PHP 5, pushed to its limits
  • git and GitHub
  • Ruby, Scala and PHP
  • Asynchronous services and queues
  • Hadoop, Pig, ZooKeeper, and other Java/JVM projects
  • Nagios/Icinga, OpenTSDB