Role Summary: We are seeking a highly skilled Senior Data Engineer to join our team. This role requires deep technical expertise in real-time data streaming, distributed computing, and big data technologies. The successful candidate will have a proven track record of designing and implementing scalable, high-performance data pipelines. As a Senior Data Engineer, you will guide the technical direction of our data infrastructure, mentor our data teams, and ensure our data solutions remain robust and scalable.
Responsibilities:
- Design and implement complex data pipelines for both batch and real-time processing.
- Lead the architectural design of streaming data platforms, ensuring scalability, performance, and data reliability.
- Collaborate with data scientists, analysts, and business stakeholders to gather requirements and translate them into technical specifications.
- Develop high-quality, maintainable data processing solutions using modern streaming technologies.
- Oversee the development and maintenance of data lakes, data warehouses, and streaming platforms.
- Mentor junior data engineers, fostering a culture of continuous learning and improvement.
- Conduct code reviews to ensure adherence to best practices and data engineering standards.
- Stay abreast of emerging big data technologies and industry trends.
- Optimize data pipelines for maximum throughput and minimal latency.
- Ensure data quality, consistency, and reliability across all data platforms.
- Troubleshoot and debug complex issues in distributed systems.
Must-have Qualifications:
- Bachelor's degree in Computer Science, Information Technology, or a related field.
- 10+ years of experience in data engineering.
- Strong expertise in Apache Spark, including Structured Streaming, for large-scale data processing.
- Extensive experience with Apache Kafka for real-time data streaming and event processing.
- Proficiency in building and maintaining real-time analytics platforms, particularly with Apache Druid.
- Strong programming skills in Python, Scala, or Java.
- Deep understanding of distributed systems and big data architectures.
- Extensive experience with both batch and stream processing paradigms.
- Strong knowledge of data modeling and optimization techniques.
- Experience with major cloud platforms (AWS, Azure, GCP) and their data services.
- Excellent problem-solving skills and meticulous attention to detail.
- Strong communication and collaboration skills.
Nice-to-have Qualifications:
- Master's degree in Computer Science, Information Technology, or a related field.
- Experience with additional streaming technologies like Apache Flink and Apache Storm.
- Knowledge of data governance and data security best practices.
- Experience deploying real-time machine learning models.
- Familiarity with modern data lake technologies like Delta Lake and Apache Iceberg.
- Experience with NoSQL databases like Cassandra and MongoDB.
- Proficiency in data visualization tools such as Grafana and Kibana.
- Experience with infrastructure-as-code and CI/CD for data pipelines.
- Certifications in relevant cloud platforms or big data technologies.