“The front page of the internet,” Reddit brings over 430 million users together each month through their common interests, inviting them to share, vote, comment, and create across thousands of communities. That community of users generates almost 50B analytics events per day, each of which is ingested by the Data Platform team into a data warehouse that sees 15,000 daily queries.
As the Sr. Engineering Manager of Data Infrastructure, you will formulate, socialize, and execute on Reddit’s strategy for Data Engineering. You will enable your teams to push our products to the next level and dictate how data is accessed and consumed across the company. Through this, you will be a key individual in setting the data culture for Reddit.
How You’ll Have Impact:
Google, Microsoft, Facebook, and OpenAI have all trained their premier text models using Reddit data because we have one of the single most important corpora of conversational data in the world. We are the 6th most trafficked site on the internet and, as such, generate an enormous volume of content and activity on a daily basis. The decisions your team drives will influence the experience of millions of users around the world.
You will partner with a multitude of stakeholders to understand Reddit’s services across all of our product lines, from consumer products to safety products to ads. Through this, you will chart a path and deliver technology that defines the role data plays in each service, with a focus on shepherding the quality, reliability, and accessibility of data for all partners and powering a range of services, from production-facing applications through to BI applications.
You will be accountable for building, growing, and mentoring a world-class team of engineers to help Reddit reach its goal of bringing community and belonging to everyone.
What You’ll Do:
- Lead architecture and development of all components of a robust and scalable real-time big data streaming platform.
- Build cross-functional relationships with Eng Infra, Security/Privacy, and Product Engineering teams to maintain a service-oriented delivery model that enables success across Reddit tech orgs.
- Coordinate cross-functional and cross-team efforts to stay on track with team-level and company-level OKRs.
- Define measurable data quality metrics for the foundational data and drive data quality across product verticals.
- Drive day-to-day execution using an agile Scrum development process to maintain a consistent sprint velocity.
Who You Might Be:
- 5+ years of hands-on development experience building data pipelines using multiple database technologies, with a focus on Big Data and real-time streaming technology stacks.
- Proven track record of architecting large-scale data pipelines, and experience with technologies like Airflow, Kafka, Kubernetes, Flink, Terraform, and similar.
- Experience managing data monitoring and alerting tools to provide more visibility into scalability and performance metrics.
- Experience deploying cloud infrastructure at scale on AWS and GCP.
- Ability to communicate effectively with different audiences: engineers, partner teams, stakeholders, and leadership.
- Strong sense of ownership