Title: Senior Data Engineer - Python, Spark
Location: Mountain View, CA or Irvine, CA
Lab Summary:
Samsung is the world’s largest consumer electronics company and the leading provider of smartphones and smart TVs. Samsung smart TVs connect homes to the Internet, providing a full range of intelligent capabilities such as speech recognition, gesture recognition, advanced video processing, and personalized recommendations.
The VD intelligence lab at Samsung Research America is building a next-generation data platform to support Smart TV products and services. We have two office locations in California: Irvine and Mountain View. Our research and development include TV analytics, ads targeting, and personalized services.
General Description
We are looking for engineers with experience building batch and/or streaming jobs. We use Spark for batch jobs and Flink for real-time streaming jobs. Experience with Hadoop, Hive, and AWS S3 is also an asset.
Responsibilities
- Create new, and maintain existing, Spark jobs written in Scala
- Create new, and maintain existing, Flink jobs written in Scala
- Produce unit and system tests for all code
- Participate in design discussions to improve our existing frameworks
- Define scalable calculation logic for interactive and batch use cases
- Interact with infrastructure and data teams to produce complex analysis across data
Required Qualifications:
- A minimum of 2 years of experience with Scala and/or Java
- A minimum of 5 years of programming experience
- Experience with Hadoop and Spark
- Knowledge and experience with cloud-based technologies
- Experience in batch or real-time data streaming
- Ability to adapt quickly to conventional big-data frameworks and open-source tools as project needs demand
- Knowledge of design strategies for developing scalable, resilient, always-on data lakes
- Strong development/automation skills
- Must be very comfortable with reading and writing Scala code
- An aptitude for analytical problem solving
- Deep knowledge of troubleshooting and tuning Spark applications and Hive scripts to achieve optimal performance
- Good understanding of Hadoop and HDFS architecture and their components, such as JobTracker, TaskTracker, NameNode, DataNode, and HDFS high availability (HA), and of the MapReduce programming paradigm
- Experience working with various Hadoop distributions (Cloudera, Hortonworks, MapR, Amazon EMR) to fully implement and leverage new Hadoop features
- Experience in developing Spark applications using Spark RDD, Spark SQL, Spark on YARN, Spark MLlib, and DataFrame APIs
- Experience with real-time data processing and streaming techniques using Spark Streaming and Kafka, and with moving data into and out of HDFS and RDBMSs
- Familiarity with open source configuration management and development tools
Preferred Qualifications:
- Hands-on experience with, and production use of, Hadoop/Cassandra, Spark, Flink, and other distributed technologies is a plus
- Other technologies:
  - ScalaTest
  - Gradle/Maven
  - Airflow
  - SQL
  - AWS
Additional Information
Work Hours
The incumbent must be available during core business hours.
Physical Requirements
This position will be performed in an office setting. The position will require the incumbent to sit and stand at a desk, communicate in person and by telephone, frequently operate standard office equipment, such as telephones and computers, and reach with hands and arms.
EEO Statement
Samsung is committed to encouraging a diverse workplace and is proud to be an equal opportunity employer. Because we highly value diversity in our current and future employees, we do not discriminate (including in our hiring and promotion practices) based on race, religion, color, national origin, gender, gender expression, sexual orientation, age, marital status, veteran status, disability status, or any other characteristic protected by law.
If you have a disability or special need that requires accommodation, please let us know.
All your information will be kept confidential according to EEO guidelines.