Are you ready to make your mark in the forefront of technological innovation? As an HPC Cluster Engineer, you'll play a pivotal role in shaping the future of AI, deep learning, and machine learning initiatives. Join us and leverage Nvidia's cutting-edge GPU technology to drive groundbreaking discoveries and revolutionize industries.

Sustainable Talent is thrilled to partner with Nvidia, a global powerhouse with over 25 years of trailblazing advancements in computer graphics, gaming, and accelerated computing.

This is a full-time W-2 contract position based in Santa Clara, CA, with a hybrid work option. We offer competitive pay based on factors such as experience, education, and location, and provide full benefits, PTO, and an amazing company culture!

Additional locations: US, MA, Westford; US, NC, Durham; US, TX, Austin.

What you'll be doing:

  • You'll lead the charge in optimizing our InfiniBand network and managing Lustre and GPFS storage solutions, ensuring seamless performance for our cutting-edge initiatives.
  • Your expertise in the SLURM job scheduler will be instrumental in orchestrating the smooth operation of our clusters, from scheduling tasks to managing resources efficiently.
  • As a Linux sysadmin guru, you'll be responsible for maintaining the stability and security of our systems, leveraging your deep understanding of Linux environments.
  • Harnessing the power of Ansible, you'll automate routine tasks and streamline operations, freeing up time for innovation and optimization.
  • Advanced Python and Bash scripting will drive automation efforts and enable dynamic solutions to complex challenges.

What We Need to See:

  • Demonstrated experience with SLURM, coupled with a solid understanding of InfiniBand networks and Lustre/GPFS storage systems, is essential.
  • A proven track record in Linux system administration, ensuring robustness and security in our computing environment.
  • Proficiency in Ansible is a must-have, enabling you to automate tasks and workflows efficiently.
  • Strong scripting abilities in Python and Bash are critical for developing custom solutions and optimizing cluster performance.

Ways to Stand Out From the Crowd:

  • Showcase your knowledge of best practices in HPC cluster operations, automation, and upgrades, setting you apart as a seasoned professional in the field.

Sustainable Talent is an M/F+, disabled, and veteran equal employment opportunity and affirmative action employer.
