The multimodal team at xAI creates magical AI experiences beyond text, enabling understanding and generation of content across various modalities, including image, video, and audio.
To accomplish this, we are looking for experienced data and infrastructure engineers to develop and optimize data pipelines for multimodal data (such as images, videos, and audio), including acquisition, preprocessing, data loading, visualization, and management.
Location
The role is based in the Bay Area (San Francisco and Palo Alto). Candidates are expected to be located near the Bay Area or open to relocation.
Focus
Building tools to assist the acquisition of multimedia data.
Building petabyte-scale, high-throughput data processing systems for multimodal data (including text, images, videos, and audio).
Building high-throughput, low-latency data decoding and loading pipelines to support efficient large-scale training of multimodal models.
Building visualization and management tools for all categories of in-house datasets.
Ideal Experience
Expert in developing software for large-scale distributed machine learning systems.
Expert in Spark, GPUs, Kubernetes, and JAX (or PyTorch).
Experienced in standard software engineering best practices (CI/CD), with care for code quality, testing, and performance.
Tech Stack
Python
JAX
Rust
Spark
CUDA
Interview Process
After you submit your application, the team reviews your CV and statement of exceptional work. If your application passes this stage, you will be invited to a 15-minute interview (“phone interview”) during which a member of our team will ask some basic questions. If you clear the initial phone interview, you will enter the main process, which consists of four technical interviews:
One-on-one research discussion & coding interviews (three meetings total)
Meet the Team: Present your past exceptional work and your vision with xAI to a small audience.
Every application is reviewed by a member of our technical team. All interviews will be conducted via Google Meet.