Graphcore has created a completely new processor, the Intelligence Processing Unit (IPU), specifically designed for artificial intelligence. The IPU’s unique architecture means developers can run current machine learning models orders of magnitude faster. More importantly, it lets AI researchers undertake entirely new types of work, not possible using current technologies, to drive the next great breakthroughs in general machine intelligence.
Our team is at the forefront of the artificial intelligence revolution, enabling innovators from all industries and sectors to expand human potential with technology.
What we do really makes a difference.
We are looking for a Systems Operations Engineer to work with our engineering teams, IT team and external datacentres to maintain and develop our fleet of cutting-edge AI systems. As part of our Engineering organisation, you will be involved in the maintenance, performance optimisation, reliability and development of our high-performance custom solutions. These include in-house AI systems containing ground-breaking IPU processors alongside off-the-shelf high-performance servers, switches and storage solutions. This is a hands-on role requiring a solid background in IT and an appreciation of high-performance infrastructure and systems. You may have been working in an IT organisation, in a datacentre or as a developer of orchestration, server or infrastructure components.
The Engineering team at Graphcore is responsible for developing ground-breaking AI technology involving custom ASIC design, advanced hardware platforms and complex software. We build this into large-scale solutions for our customers, and the Engineering Operations team is responsible for providing such systems to our internal users. These internal systems often run pre-release hardware and software that is still under development.
Your day-to-day responsibilities will include:
- Monitor and report the status of IPU systems used by internal teams.
- Monitor and manage utilisation of IPU systems used by internal teams, adapting allocation and queues as required.
- Manage the configuration of internal systems, providing planned and unplanned system maintenance and updates to keep the fleet operational for users.
- Configure and test new IPU hardware and systems as they are rolled out internally and in external datacentres.
- Provide statistics on the stability of internal systems and clear reporting of any issues uncovered. Work with users to give Engineering clear information about any issues.
- Drive corrective actions for systems that are not operating correctly, working with IT, Engineering and datacentres as required.
What we are looking for:
- Solid software engineering or IT experience with a proven track record of delivering technical output as an individual contributor.
- Strong Linux scripting ability.
- Knowledge of configuration management, orchestration, scheduling and monitoring tools such as Puppet, Ansible, Kubernetes, SLURM and Grafana.
- Experience of high-performance storage or networking solutions would be an advantage.
- Good communication and presentation skills.
- An ability to work independently without daily oversight.
- An understanding of priorities, risks, issues, impacts and constraints.
- Bachelor's degree, HND or equivalent practical experience in a relevant subject.
We welcome people of different backgrounds and experiences and are committed to building an inclusive work environment that makes Graphcore a great home for everyone. We are an equal opportunity employer and want to build a work environment where everyone is happy, productive and respectful so they can do their best work. If you have a disability or additional need that requires accommodation, just let us know.
Please note that for roles based in Bristol, UK or Oslo, Norway, we are only considering candidates who have an established right to work in that country.