About Graphcore
How often do you get the chance to build a technology that transforms the future of humanity?
Graphcore products have set the standard in made-for-AI compute hardware and software, gaining global attention and industry acclaim. Now we are developing the next generation of artificial intelligence compute with systems that will allow AI researchers to develop more advanced models, help scientists unlock exciting new discoveries, and power companies around the world as they put AI at the heart of their business.
Graphcore recently joined SoftBank Group – bringing large and ongoing investment from one of the world’s leading backers of innovative AI companies.
Job Summary
As a Senior Software Engineer in the Collectives Team, you will drive the design and development of the Collectives Communication Library, which enables users to utilize large computing clusters. The ideal candidate will have extensive experience in designing, developing, and maintaining complex software systems involving custom hardware. You will be responsible for leading development efforts, mentoring junior engineers, and driving technical excellence in our projects.
The Team
The Collectives team is responsible for building the Collectives Communication Library for the new AI hardware Graphcore is developing. The library provides communication primitives optimized to achieve high bandwidth and low latency at very high scale.
Responsibilities and Duties
- Designing, implementing, testing and documenting the Collectives Communication Library for a new AI hardware accelerator
- Collaborating with other teams to design, implement and test new features
- Troubleshooting and resolving complex technical issues
- Ensuring seamless integration of new hardware with the existing AI ecosystem
- Participating in agile development – working as part of a scrum team
Candidate Profile
Essential:
- Extensive experience in software development using C++
- Experience with Python and C programming
- Excellent problem-solving skills and ability to debug and resolve complex issues
- Strong knowledge of multithreading and inter-process communication (IPC) techniques for development of efficient concurrent applications
- Experience with unit testing frameworks such as Boost.Test and Google Test
- Proficiency with build tools such as CMake, Make and Ninja
- Strong understanding of version control systems (Git preferred)
- Ability to work within a multinational team and with multinational customers
- Excellent written and verbal communication skills
Desirable:
- Experience with RDMA networking libraries (for example libibverbs, libfabric)
- Knowledge of multithreading and parallel computing concepts, including experience with parallel algorithms and optimization for AI/ML and HPC systems
- Experience with Continuous Integration/Continuous Delivery (CI/CD) pipelines, including setting up automated workflows and deployments (for example GitHub Actions, GitLab CI)
- Experience with communication libraries (for example NCCL, MPI)
- Knowledge of machine learning frameworks (for example PyTorch)
- Knowledge of modern C++ standards (C++17/20)
Benefits
In addition to a competitive salary, Graphcore offers flexible working, a generous annual leave policy, private medical insurance and health cash plan, a dental plan, pension (matched up to 5%), life assurance and income protection. We have a generous parental leave policy and an employee assistance programme (which includes health, mental wellbeing, and bereavement support). We offer a range of healthy food and snacks at our central Bristol office and have our own barista bar!
We welcome people of different backgrounds and experiences; we’re committed to building an inclusive work environment that makes Graphcore a great home for everyone. We offer an equal opportunity process and understand that there are visible and invisible differences in all of us. We can provide a flexible approach to interview and encourage you to chat to us if you require any reasonable adjustments.