Location: Remote
A DRW portfolio company is seeking a highly skilled and motivated Lead GPU Benchmarking Engineer to join their team. The ideal candidate will have extensive hands-on experience with GPU hardware, benchmarking tools, performance analysis, programming, and automation. This role involves designing and executing rigorous testing protocols to assess the reliability of GPUs, as well as leading the development and implementation of comprehensive GPU benchmarking frameworks. The candidate should also have the potential to lead and operate at a larger scope, with an eye towards leadership roles such as Chief Technology Officer (CTO).
Key Responsibilities:
- Test Design and Execution:
  - Develop and implement comprehensive test plans to evaluate GPUs under prolonged heavy workloads using stress testing software.
  - Monitor key metrics such as frame rates, temperature, peak and average power consumption, peak FLOPS, sustained FLOPS, cross-node bandwidth, and stability over time.
  - Benchmark GPUs using industry-standard benchmarking tools to measure and analyze performance.
  - Provide leadership and mentorship to a team of engineers, fostering a culture of innovation and technical excellence.
- Data Collection and Analysis:
  - Conduct baseline tests on new GPUs to establish initial performance benchmarks.
  - Track performance metrics over time to detect and analyze any degradation.
  - Utilize GPU driver APIs to collect low-level telemetry under various operating conditions.
- Performance Comparison and Validation:
  - Compare performance metrics across different cluster configurations to identify comparative strengths and weaknesses.
  - Perform statistical analyses to ensure the validity and reliability of the test results.
  - Repeat tests to ensure consistency and accuracy of data.
- Reporting and Documentation:
  - Prepare detailed reports outlining test setups, methodologies, and data-driven conclusions.
  - Clearly communicate findings, insights, and recommendations to team members and stakeholders.
- Cloud Computing Integration:
  - Configure, deploy, and maintain cloud infrastructure for automation, orchestration, and integration.
  - Utilize cloud computing resources to create scalable and efficient testing environments.
  - Optimize cloud platform usage for benchmarking and data analysis tasks.
Required Qualifications:
- Bachelor's degree in Computer Science, Electrical Engineering, or a related field.
- Proven experience in compute benchmarking, stress testing, and performance analysis.
- Proficiency with benchmarking tools such as 3DMark, CUDA and OpenCL benchmarks, FurMark, MSI Kombustor, SPECviewperf, Unigine Heaven, and Unigine Superposition.
- Strong understanding of GPU cluster architectures and relevant performance metrics.
- Experience using GPU driver APIs to collect raw telemetry data directly.
- Strong programming and scripting skills, including experience with Python, C/C++, Bash, or PowerShell.
- Familiarity with cloud computing platforms and environments.
- Excellent analytical, problem-solving, and communication skills.
Preferred Qualifications:
- Experience with statistical analysis tools and techniques.
- Familiarity with Tensor, GPU cluster testing methodologies, and large-scale data analysis.
- Demonstrated leadership experience or potential to grow into a Chief Technology Officer (CTO) role.
What We Offer:
- Competitive salary and benefits package.
- Opportunity to work with cutting-edge technology and innovative projects.
- A collaborative and dynamic work environment.
- Career growth and development opportunities.
The annual base salary range for this position is $150,000 to $250,000, depending on the candidate’s experience, qualifications, and relevant skill set. Base salary is only a portion of total compensation, which may also include variable compensation and/or benefits.
For more information about DRW's processing activities and our use of job applicants' data, please view our Privacy Notice at https://drw.com/privacy-notice.
California residents, please review the California Privacy Notice for information about certain legal rights at https://drw.com/california-privacy-notice.