- Design infrastructure for data, especially for but not limited to consumption in machine learning applications
- Define database architecture needed to combine and link data, and ensure integrity across different sources
- Ensure performance of data systems for machine learning to customer-facing web and mobile applications using cutting-edge open source frameworks, to highly available RESTful services, to back-end Java based systems
- Work with large, fast, complex data sets to solve difficult, non-routine analysis problems, applying advanced data handling techniques if needed
- Build data pipelines, includes implementing, testing, and maintaining infrastructural components related to the data engineering stack.
- Work closely with Data Engineers, ML Engineers and SREs to gather data engineering requirements to prototype, develop, validate and deploy data science and machine learning solutions
Requirements to be successful in this role:
- Strong knowledge and experience in Python, Pandas, Data wrangling, ETL processes, statistics, data visualisation, Data Modelling and Informatica.
- Strong experience with scalable compute solutions such as in Kafka, Snowflake
- Strong experience with workflow management libraries and tools such as Airflow, AWS Step Functions etc.
- Strong experience with data engineering practices (i.e. data ingestion pipelines and ETL)
- A good understanding of machine learning methods, algorithms, pipelines, testing practices and frameworks
- Preferred) MEng/MSc/PhD degree in computer science, engineering, mathematics, physics, or equivalent (preference: DS/ AI)
- Experience with designing and implementing tools that support sharing of data, code, practices across organizations at scale