We're looking for an exceptionally talented engineer to help manage our growing infrastructure, ensure our site stays up and performs well, and refine our processes for operating our production systems. Working closely with the rest of our engineering team, you'll have a great deal of authority in designing and implementing the hardware and software systems we use to host, manage and monitor our production environment.
Thumbtack's infrastructure has always been managed by our small team and a single SRE, and while there haven't been any major disasters, we recognize it's time to take our operations to the next level. Our Python deploys could be much smoother, our monitoring could be more systematic and accessible, and our alerting could be much less noisy. We are actively moving our infrastructure from dedicated hardware to the AWS cloud to improve development speed and make our platform more scalable.
We're looking for someone to work with our nascent engineering operations team and push us forward. As an authority on operations, you'll help plan and execute how we manage and monitor our platform as it grows. You'll continually look for new ways to make our systems more reliable and easier to manage, incorporating third-party tools when available and writing software of your own when nothing else fits the bill. You'll anticipate performance bottlenecks and provision new hardware as necessary. And finally, we'd love to find someone who's excited to learn and grow, expanding skills and expertise as our systems continue to grow and develop.
Our current infrastructure:
- Our platform operates primarily on a few dozen dedicated Linux machines running RHEL, Ubuntu, and Debian, all managed via Puppet; we additionally run a small number of machines and services on AWS
- Our main data stores are Postgres (website backend) and Mongo (internal analytics); we also make use of DynamoDB, Riak, and Memcached
- We use DataDog, New Relic, Munin, Graphite and a handful of custom tools for monitoring and alerting
- We practice continuous deployment using a custom one-click deployment system written in Python (Fabric). Auxiliary systems are deployed directly via Puppet.
What we're looking for:
- Expert with Linux administration, security and configuration management
- Deep knowledge of the steps involved in serving a web request, including a strong understanding of TCP/IP, and experience dealing with the corresponding infrastructure components
- Fanatic about monitoring
- Enjoy diagnosing and fixing misbehaving and underperforming Linux servers
- Fluent with the shell and comfortable writing tools in Python to automate our operations and development processes
- Experience with AWS is a plus
- Experience tuning database performance is a plus
- Comfortable working with a great deal of autonomy
- Excited to continually learn, grow and share knowledge
Please visit www.thumbtack.com/jobs to return to our careers page.