"The front page of the internet," Reddit brings over 430 million people together each month through their common interests, inviting them to share, vote, comment, and create across thousands of communities. Come for the cats, stay for the empathy.
As a data engineer, you will build and maintain the data infrastructure tools used by the Ads Monetization Org to generate, ingest, and access petabytes of raw data. A focus on performance and optimization will enable you to write scalable, fault-tolerant code in collaboration with a team of top engineers, all while learning about and contributing to one of the most powerful streaming event pipelines in the world.
Not only will your work directly impact hundreds of millions of users around the world, but your output will also shape the data culture across all of Reddit!
Note: we are open to candidates based in and authorized to work anywhere in the United States or Canada.
- Design, build, and maintain streaming data infrastructure systems, such as Kafka and Kafka consumers built with Flink or Spark, used by all of Reddit’s Ads engineering teams
- Design alerting and testing systems to ensure the accuracy and timeliness of these pipelines (e.g., improve instrumentation, optimize logging)
- Debug production issues across services and levels of the stack
- Plan for the growth of Reddit’s Ads infrastructure
- Build a great customer experience for developers using your infrastructure
- Work with teams to build and continue to evolve data models and data flows that enable data-driven decision-making
- Identify the shared data needs across Reddit Ads, understand each team's specific requirements, and build efficient, scalable data pipelines that meet those needs
- A strong engineering background and exposure to data engineering work
- Experience developing, maintaining, and debugging distributed systems built with open-source tools
- Experience building infrastructure as a product centered on users' needs
- Experience optimizing the end-to-end performance of distributed systems
- Experience with scaling distributed systems in a rapidly moving environment
- Experience managing and designing data pipelines
- Can follow the flow of data through various pipelines to debug data issues
- Familiarity with ETL design (both implementation and maintenance)
- Experience working on stream processing systems such as Kafka and Flink
- Experience working on real-time analytics systems such as Druid
- Experience with Java, Golang, Scala, and Python
- Experience with Lambda Architecture systems
- Experience with (or desire to learn) Kubernetes
Reddit is committed to providing reasonable accommodations for qualified individuals with disabilities and disabled veterans in our job application procedures. If you need assistance or an accommodation due to a disability, please contact us at ApplicationAssistance@Reddit.com.