The Site Reliability Engineering (SRE) team needs another manager: Someone with experience writing performant, distributed software, and managing projects of different sizes. A decision maker, equally comfortable with high-level architectures and teeny-tiny details. A mentor who can help their team improve technical and non-technical abilities.
They should also be familiar with Scala, Golang, Ruby, or Java—ideally more than one.
What you'll do
Test and tune network, hardware, and software configurations to maximize performance.
Create tools and infrastructure used by the rest of the Tumblr engineering teams.
Manage the availability, scalability, and performance of Tumblr platforms.
Set short- and long-term priorities and goals for your team.
Coordinate cross-team projects.
Mentor your peers and reports through individual instruction and code review.
Help hire, onboard, and train new members of your team.
What we're looking for
Experience managing a fast-moving, highly skilled infrastructure engineering team.
A problem-solver who evaluates every possible solution.
Ability to troubleshoot large-scale distributed systems.
Previous experience scaling high-traffic websites and apps.
Familiarity with Unix systems administration, and solid scripting skills.
Willingness—nay, eagerness—to perform on-call duties. And previous experience doing so.
Knowledge of data structures and algorithms.
A sense of ownership, initiative, and drive.
Persistence and resourcefulness when obstacles arise.
Tools we like
Nginx, Varnish, and HAProxy
Memcached and Redis
git and GitHub
Ruby, Go, Scala, PHP
Asynchronous services and queues like Oozie and Gearman
Hadoop, Pig, ZooKeeper, and other Java/JVM projects