You will work as part of an established and growing research team, and help to develop frameworks that will be used to train large models on a distributed cluster. You’ll closely collaborate with researchers, platform and software engineers, and leverage your knowledge of infrastructure and training libraries to accelerate the rollout of a broad array of models.
Your Core Responsibilities:
- Develop & support libraries for:
  - Accelerating training in a distributed environment
  - Automating ML workflows, experiment evaluation and management, hyperparameter tuning (AutoML) & retraining
  - Making efficient use of GPU hardware in the research cluster (data pipeline design, leveraging libraries such as GPUDirect & DALI)
- Evaluate and roll out third-party tooling (e.g. MLflow, Neptune, Ray, …)
- Dig into the internals of open-source ML tools to extend their capabilities and fix fundamental bugs
- Leverage experience with scheduling tools (Kubernetes, Slurm, etc.) to support the effective use of contended resources across a broad research team
Your Skills and Experience:
- MS degree in CS or a similar field, or equivalent experience
- 3+ years of relevant work experience
- Experience with core ML frameworks such as PyTorch and TensorFlow
- Proficiency in Python; experience in C++ is highly desired
About Us
IMC is a leading trading firm, known worldwide for our advanced, low-latency technology and world-class execution capabilities. Over the past 30 years, we’ve been a stabilizing force in the financial markets – providing the essential liquidity our counterparties depend on. Across offices in the US, Europe, and Asia Pacific, our talented employees are united by our entrepreneurial spirit, exceptional culture, and commitment to giving back. It's a strong foundation that allows us to grow and add new capabilities, year after year. From entering dynamic new markets, to developing a state-of-the-art research environment and diversifying our trading strategies, we dare to imagine what could be and work together to make it happen.