“The front page of the internet,” Reddit brings over 330 million people together each month through their common interests, inviting them to share, vote, comment, and create across thousands of communities. Come for the cats, stay for the empathy.
Reddit is poised to innovate and grow faster than at any other time in its history. This is a unique opportunity to leave your mark on one of the most influential and trafficked corners of the internet.
The Ad Delivery team is one of the core engineering teams in the Ads group. It is responsible for building and maintaining critical components that make up Reddit's ad serving platform. This includes:
- Service-oriented architecture responsible for serving a high volume (500+ million per day) of ad requests under a strict latency SLA of 100 milliseconds per request
- Mission-critical real-time streaming and batch processing systems used for pacing, billing, and analytics
- Control system that paces advertising budgets intelligently to ensure optimal ROI for advertisers
- Real-time reporting backend system that provides advertisers insights into how their ad campaigns are performing
As a Site Reliability Engineer, you’ll use your knowledge of operating distributed systems to improve the consistency, reliability, and performance of our growing ecosystem of services. You’ll also use your development experience to contribute to the internal Infrastructure Product that all of Reddit Engineering uses to develop, deploy, and operate their services.
- Collaborate with all Ads Engineering teams to design and develop systems that are resilient and highly performant at tremendous scale
- Build tools and systems that will help support and scale the operation of Reddit’s advertising infrastructure and services
- Draw on your knowledge of distributed systems to identify and fix network, system, and service-level issues
- Design systems and processes that all Ads engineers will use to manage and deploy software in production
- Lead efforts to improve observability and performance of the ad serving platform and reduce costs
- 4+ years of Infrastructure, Operations, or Site Reliability Engineering experience
- Experience with the development and operation of high-traffic backend systems
- Understanding of Docker containers and runtimes
- A demonstrated ability to debug, fix, and optimize code
- Troubleshooting skills that span applications, networking (TCP/IP), and systems
- Strong working knowledge of Linux (or UNIX) and TCP/IP
- Excellent communication and collaborative skills
- Proficiency with Mesos or DC/OS
- Experience managing processes built in at least two of Golang, Java, Scala, and Python
- Experience managing high-scale systems, preferably in Ads
- Experience with alerting and metrics collection via StatsD
- Experience with managing Kafka