High Performance Computing Platform Engineer
PDT’s quantitative investment business is powered by a world-class technology platform spanning research computing, low-latency trading, application development, and core engineering domains. Researching new ideas, trading them in real time, and supporting the underlying systems all fall under the purview of platform engineering teams at PDT. They are trusted partners tasked with solving real-world technical problems to propel PDT forward.
We are looking for talented HPC Platform Engineers to join our research-aligned HPC platform team. Members of this team are responsible for the design, implementation, and operational support of scalable and performant HPC systems that are critical to our success. Success on this team requires an excellent technology foundation, good problem-solving abilities, the curiosity and discipline to carry projects from idea to delivery, and strong communication and collaboration skills. The HPC platform team works closely with other platform teams that specialize in different platform domains, including Container Orchestration, CI/CD, Linux, Cloud, and Network Engineering.
Why join us?
PDT Partners has a stellar 25+ year track record and a reputation for excellence. Our goal is to be the best quantitative investment manager in the world, measured by the quality of our products, not their size. PDT’s very high employee retention and mobility speak for themselves. Our people are intellectually extraordinary, and our community is close-knit, down-to-earth, and diverse. Our engineers love to work on challenging and complicated problems, and in return, they have a chance to make a direct impact on our bottom line, without the attitude and bureaucracy of a typical Wall Street firm.
Like all platform teams at PDT, the HPC team is a small, flat group of experts who help research, implementation, and software engineering teams take their ideas from inception to production. Responsibilities and patterns of work include:
- Designing, implementing, and delivering scalable and performant systems. Succeeding in our fast-moving environment means projects typically involve equal parts engineering and operations. You will be expected to do both on projects small and large, working with a mix of open-source and proprietary tools.
- Implementing automation. We will always choose working smart over working hard. You will be responsible for conceiving and implementing new automation for the platform infrastructure that your team owns, from CI/CD pipelines to production metrics and beyond.
- Obsessive user focus. All members of platform teams collaborate closely with peer engineers and/or researchers to build high-quality, efficient, and reliable systems. This includes adapting to change and at times diving into new domains to deeply understand stakeholder needs.
- Capacity management and benchmark optimization. Our demand for scale and performance is constant and involves challenging optimization problems for workloads critical to research and trading.
- Running our platform systems day-to-day. Our platforms are mission critical for the firm’s success and are very stable, and we want to keep it that way. Everybody on the team contributes to the support of our platforms, which we strive to make light through automation and quality work.
Below is a list of skills and experiences we think are relevant. Even if you don’t think you’re a perfect match, we still encourage you to apply, because we are committed to developing our people:
- Experience with systems programming and/or software engineering
- Practical experience supporting, debugging, and improving production systems and services
- Experience using Linux and other open-source software
- Experience with configuration management and infrastructure-as-code frameworks
- Production experience working with a public cloud, AWS preferred
- Required: experience with one or more of the following HPC-specific technologies:
  - Distributed parallel filesystems (Lustre, GPFS, parallel NFS)
  - Batch scheduling systems (Slurm, TORQUE, SGE, AWS Batch, AWS ParallelCluster)
  - High-performance networking
Bachelor's or Master's degree in an Engineering or Applied Sciences field from a rigorous academic program, or equivalent professional experience.