We are seeking a hands-on team lead for our data engineering team to help us build out and manage our data infrastructure, which will need to operate reliably at scale using a high degree of automation in setup and maintenance.
The role will involve taking ownership of the data engineering roadmap in order to continue the initial setup of our data infrastructure, build new systems where required and, importantly, prepare our systems and processes for the massive scale anticipated in the year ahead.
At a high level, responsibilities will extend to:
- Empowering the team to build and optimise key ETL pipelines on both batch and streaming data
- Working with data DevOps / SRE team members to design, implement, operate and scale infrastructure
- Working with machine learning engineers to set up and optimise our MLOps infrastructure
- Ensuring data quality is high by maintaining and extending our data quality framework and by ensuring thorough, automated test coverage of code
- Ensuring data governance tooling is implemented and that policies are adhered to
- Collaborating with data security team members
- Overhauling systems not configured for massive scale while maintaining business continuity
The ability to work with both technical teams, including product, engineering, BI/analytics and data science, and non-technical financial teams from fraud, risk and compliance is essential.
On a day-to-day basis, responsibilities include:
- Working with the data product owner to groom the backlog and assign work to team members
- Reviewing pull/merge requests in order to conduct code quality checks
- Handling the deployment lifecycle
- Collaborating with the team and various stakeholders, both technical and non-technical, to identify technical problems, design solutions for them and help implement those solutions where required
- Assisting in growing the team in both size and capability
- Taking responsibility for upskilling team members
The individual will also need to be able to work with technical leadership to make well-informed architectural choices when required. A high degree of empathy is required for the needs of the downstream consumers of the data artefacts produced by the data engineering team, such as software engineers, data scientists and business intelligence analysts, and the individual needs to be able to produce transparent and easily navigable data pipelines. The individual should value consistently producing high-quality metadata to support discoverability and consistency of calculation and interpretation.
A solid understanding of the retail banking domain is highly desirable, but not required.
Candidates should have broad experience across the following platforms, systems, languages and capabilities:
- Ideally GCP, but strong experience in another platform such as AWS or Azure will suffice
- Event streaming platforms such as Kafka
- CDC technology such as Debezium
- Stream analytics frameworks such as Flink, Spark or GCP Dataflow
- Workflow schedulers such as Apache Airflow
- Cloud data warehouses such as BigQuery, Redshift or Snowflake
- Fluency in using Kubernetes
- Infrastructure as Code tools such as Terraform
- Java and Python
- Writing detailed design documents and holding design meetings