Lambda's mission is to accelerate human progress with computation. Our Deep Learning workstations, servers, and cloud services power ML engineers at the forefront of AI research, fueling advancements in quantum computing, cancer detection, autonomous aircraft, drug discovery, self-driving cars, and much more.

Lambda provides Artificial Intelligence and Machine Learning infrastructure to organizations like Apple, Intel, Microsoft, Amazon Research, Tencent, Kaiser Permanente, MIT, Harvard, Stanford, Caltech and the Department of Defense.  

Join us and work at a profitable startup where we’re building powerful research computers and software for Machine Learning and Artificial Intelligence experts around the world. 

 

About the Role

You’ll build software that will be used by some of the world’s top AI research labs. You’ll write the software tools that help Fortune 500 companies and top research universities train state-of-the-art neural networks. You’ll make it possible to scale from a single server up to an entire data center with minimal setup and maintenance. The software you’ll write will enable some of the world’s top scientists and technologists to make world-changing advances in Artificial Intelligence.

 

What You’ll Do

  • Learn about what it takes to build and run HPC clusters for Deep Learning
  • Build operating software for managing GPU hardware infrastructure for Machine Learning
  • Automate the process of creating, provisioning, and expanding HPC clusters for use in machine learning applications
  • Build telemetry systems for clusters to enhance visibility, utilization, and performance
  • Create monitoring and alerting dashboards to improve cluster uptime and reliability
  • Create workflow tools to help scientists manage their experiments

 

Experience that’s great to have

  • Systems-level design and development of OS-level tools
  • Understanding of and experience with interconnects, both on-board (PCIe) and network-level (Ethernet, TCP/IP)
  • Extensive experience with Linux
  • Experience developing in a systems programming language (C, C++, Go, Rust, similar)
  • Comfort developing web-based graphical interfaces
  • Strong Python and Bash scripting skills

 

Nice to Have 

  • Experience programming GPUs
  • Experience working with High Performance Computing clusters
  • Specific experience with the details of Ubuntu or Debian systems
  • Experience with hardware management systems like IPMI
  • Experience building RESTful APIs
  • Grafana or Prometheus experience

 

About Lambda

  • We offer generous cash & equity compensation.
  • Investors include Gradient Ventures, Google’s AI-focused venture fund.
  • We are experiencing extremely high demand for our systems, with quarter-over-quarter and year-over-year profitability.
  • Our research papers have been accepted into top machine learning and graphics conferences, including NeurIPS, ICCV, SIGGRAPH, and TOG.
  • We have a wildly talented team of 30 and are growing fast.
  • Depending on the role, our workforce is remote across the U.S., with headquarters in San Francisco.
  • Health, dental, and vision coverage for you and your dependents.
  • Commuter/Work from home stipends.
  • 401k Plan.
  • 3 weeks of annual paid time off.

 

Equal Opportunity Employer

Lambda Labs is an Equal Opportunity employer. Applicants are considered without regard to race, color, religion, creed, national origin, age, sex, gender, marital status, sexual orientation and identity, genetic information, veteran status, citizenship, or any other factors prohibited by local, state, or federal law.
