Responsibilities:

  • Create robust ETL pipelines for heterogeneous data (including, but not limited to, on-chain data from various blockchains, social media data, and market data) that can be consumed efficiently both online and offline
  • Assess the reliability and availability of data from vendors’ APIs, and create a data acquisition strategy where data is missing
  • Support all research-related data identification, extraction, manipulation, and analysis
  • Prototype and build statistical models for production

Requirements:

  • Experience architecting, maintaining, and deploying data ingestion pipelines for a wide variety of structured and unstructured data
  • Strong knowledge of designing microservice-oriented, real-time systems on GCP
  • Experience with data pipeline orchestration tools (e.g., Airflow, MLflow, Neptune)
  • Software engineering experience in Python, including pandas, Jupyter, and scikit-learn
  • Ability to create simple data visualizations and the backend storage that feeds them
  • Understanding of data structures, networking, and databases
  • Experience working in a Linux environment
  • Background in computer science and statistics
  • Capable of multitasking and prioritizing across different projects to iterate quickly in high-pressure situations (read: hacking skills)
  • Resourceful, solutions-oriented thinker; comfortable operating in an ambiguous environment with little structure
  • Excellent communication skills and the ability to work with multiple teams across the firm, including operations, accounting, and security, as needed
  • Fresh graduate, or up to one year of post-graduate work experience
  • Time zone and location flexible
