As a Data Engineer, you will be responsible for developing next-generation data pipelines for IoT wearable devices. We are seeking an engineer with experience in the domain who is proficient in the SDLC and engineering cycles and has deep experience with big data software stacks. You are up to the challenge of evolving StrongArm's big data systems, scaling them to hundreds of thousands of industrial athletes, and leveraging advanced techniques such as ML edge computing, geolocation analysis, and computer vision.
You must uphold quality engineering principles and enjoy taking responsibility for solving advanced problems.
We are highly collaborative. Our open office environment is people-focused: we joke, have fun, and enjoy each other's company when we can, but we aren't afraid to roll up our sleeves and get things done. We are looking for driven, open-minded engineers with exceptional analytical abilities who can wear a number of hats across data engineering, DevOps, software design, software development, system administration, and system architecture.
- Work as an individual contributor in Python, with a system trajectory toward Scala
- Provide technical leadership in Data Engineering design principles
- Design and implement highly available, scalable data pipelines using Apache Spark for ETL jobs that process data from hundreds of thousands of sensors
- Build big data interfaces compatible across IoT systems using SWIG
- Partner with the Data Science team to implement advanced statistical models and machine learning that run on edge devices
- Partner with the Embedded Engineering team to build interfaces between IoT devices and data pipelines for ingesting data
- Own data modeling implementation at massive scale
- Optimize data pipelines for performance and scalability
- Establish automated mechanisms to improve data integrity across all big data sets
- Leverage strategic and analytical skills to understand and solve customer- and business-centric questions
- Monitor and troubleshoot performance issues for production pipelines
- Learn about new technologies and add to StrongArm’s Big Data tech stack
- Data Warehouse management in Databricks using Delta Lake
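The responsibilities above center on validated sensor ingestion and automated data-integrity checks. As an illustrative sketch only (the field names and thresholds are hypothetical, and a production version would run as a Spark job rather than plain Python), here is the kind of integrity gate such a pipeline might apply to incoming wearable-sensor records:

```python
from dataclasses import dataclass

@dataclass
class SensorReading:
    device_id: str          # hypothetical identifier field
    timestamp: float        # Unix seconds
    accel_magnitude: float  # acceleration magnitude in m/s^2

def partition_valid(readings, max_accel=160.0):
    """Split readings into (valid, rejected) using simple integrity rules.

    The rules are illustrative: a non-empty device id, a positive
    timestamp, and a physically plausible acceleration bound.
    """
    valid, rejected = [], []
    for r in readings:
        if r.device_id and r.timestamp > 0 and 0.0 <= r.accel_magnitude <= max_accel:
            valid.append(r)
        else:
            rejected.append(r)
    return valid, rejected

readings = [
    SensorReading("dev-001", 1_700_000_000.0, 9.81),
    SensorReading("", 1_700_000_001.0, 9.70),          # missing device id
    SensorReading("dev-002", 1_700_000_002.0, 999.0),  # implausible spike
]
valid, rejected = partition_valid(readings)
```

In a Spark pipeline the same predicate would typically become a DataFrame filter, with rejected rows routed to a quarantine table for inspection rather than dropped.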
- 2+ years experience in data engineering
- Experience building systems using Apache Spark that have processed terabytes of data in production
- Experience productionizing data science models and algorithms to run at scale
- Experience with distributed data streaming frameworks such as Spark Structured Streaming, Apache Flink, or Kinesis
- Advanced experience with RDBMS and SQL
- Experience with automated testing for distributed systems in Spark (unit testing, end-to-end testing, QA, CI/CD)
- Experience designing end-to-end pipeline architectures
- Experience managing data warehouses in a production environment (Delta Lake, Snowflake, Redshift, BigQuery, Presto)
- Scala and Python proficiency
- Experience leveraging cloud systems to build data pipelines (BigQuery, Redshift, AWS Kinesis, AWS S3, GCS)
- Linux proficiency; this is the means by which things are engineered well
- Experience extending Apache Spark (DataSource API, Catalyst Optimizer)
- Experience productionizing machine learning at scale and A/B testing new models: scikit-learn, TensorFlow, PyTorch
- Experience building systems to train machine learning models at terabyte scale
- Experience working with Hadoop
- Experience using workflow management systems like Airflow or equivalent
- Experience working with datasets that measure physical phenomena
- NoSQL solutions: Cassandra, HDFS, and/or Elasticsearch
- Experience as an open source contributor
- Experience with BI tools (Looker, Redash)
- Experience building systems with data governance
- Strong Mathematics background (Linear Algebra, Statistics, Physics, Complex Variables, Calculus)
StrongArm Technologies is an equal opportunity employer.