Why work at Voltron Data?
- We are Going for Impact: We are a Series A, venture-backed startup assembling a global team to build a new foundation for data analytics with Apache Arrow. This foundation will usher in a wave of innovation in data processing that can take full advantage of the speed and efficiency offered by modern hardware.
- We are Committed to Bridging Open Source Communities: We are a collection of open source maintainers who have been driving open source ecosystems over the last 15 years, particularly in the C++, Python, and R programming ecosystems.
- We are Building a Diverse, Inclusive Company: We are creating a representative, equitable, and respectful workplace that prioritizes employee growth. Everyone at Voltron Data is bought into the company’s success; all voices are critical to shaping the organization’s future.
Below is a rough timeline of what you can expect at different points in this role.
- Spending time learning about the Apache Arrow memory layout, compute primitives, and APIs.
- Familiarizing yourself with the compute kernels and the query execution engine built on Apache Arrow.
- Learning and embracing the Apache development process.
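To give a flavor of the first bullet, the sketch below illustrates the core idea behind the Arrow columnar memory layout for a nullable integer array: a validity bitmap plus a contiguous values buffer. This is a toy illustration in plain Python, not the actual Arrow implementation.

```python
# Toy sketch of the Arrow-style columnar layout for a nullable int array:
# a validity bitmap (one bit per slot, LSB-first within each byte, as in
# the Arrow format) plus a contiguous values buffer.

def to_arrow_like(values):
    """Split a list with None entries into (validity_bitmap, values_buffer)."""
    bitmap = bytearray((len(values) + 7) // 8)
    buf = []
    for i, v in enumerate(values):
        if v is not None:
            bitmap[i // 8] |= 1 << (i % 8)  # mark slot i as valid
            buf.append(v)
        else:
            buf.append(0)  # slot still occupies space; its value is undefined
    return bytes(bitmap), buf

bitmap, buf = to_arrow_like([1, None, 3])
print(bitmap, buf)  # b'\x05' [1, 0, 3]  (bits 0 and 2 set)
```

Because values live in one contiguous buffer and nulls are tracked out-of-band, compute kernels can scan the data with tight, cache-friendly loops.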
Within a month:
- Implementing new high-performance storage and I/O primitives.
- Benchmarking existing I/O library functions to determine where there are bottlenecks.
- Discovering and implementing optimizations in data reads and writes.
- Participating in peer code review of PRs related to file storage and filesystem interactions.
- Contributing to technical discussions and technical design documents.
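As a hint of what the benchmarking work above involves, here is a minimal micro-benchmark sketch: timing the same total write issued in different chunk sizes to expose syscall overhead. The chunk and file sizes are arbitrary illustrations, and real Arrow benchmarks are written in C++ against the library's own benchmark suite.

```python
# Minimal I/O micro-benchmark sketch: write the same number of bytes
# using different chunk sizes and time each run. Small chunks incur
# more write() calls, which tends to show up as a bottleneck.
import os
import tempfile
import time

def time_write(chunk_size, total_bytes):
    """Return seconds taken to write total_bytes in chunk_size pieces."""
    chunk = b"x" * chunk_size
    fd, path = tempfile.mkstemp()
    try:
        start = time.perf_counter()
        with os.fdopen(fd, "wb") as f:
            for _ in range(total_bytes // chunk_size):
                f.write(chunk)
            f.flush()
            os.fsync(f.fileno())  # include the flush-to-disk cost
        return time.perf_counter() - start
    finally:
        os.remove(path)

for size in (4096, 65536, 1 << 20):
    elapsed = time_write(size, 8 << 20)  # 8 MiB total
    print(f"{size:>8} B chunks: {elapsed:.4f} s")
```

Runs like this, repeated across local disk, networked, and cloud backends, are how bottlenecks in read/write paths get located before optimization work starts.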
Within 6 months:
- Developing a comprehensive set of low-level benchmarks for I/O functions targeting various local, networked, and cloud storage technologies to enable monitoring for performance regressions.
- Ensuring that all filesystem interactions are compatible and performant across platforms (Linux, macOS, and Windows).
- Identifying and building reusable software components to ensure a high quality and maintainable codebase.
Within 12 months:
- Analyzing I/O throughput in a massively parallel and distributed query engine to identify inefficiencies and crafting solutions to tackle those inefficiencies.
- Ensuring that everything related to storage is built to the highest quality possible, balancing performance, usability, and maintainability across the Voltron Data and Apache Arrow ecosystems.
Previous experience that could be helpful:
- Strong experience developing in C++, especially using Modern C++.
- Experience developing against and using cloud object storage technologies such as S3, Google Cloud Storage, and Azure Blob Storage.
- Building and using distributed networked file systems such as HDFS or Ceph.
- Experience working with technologies such as io_uring, DMA, RDMA, or GPUDirect Storage.
- Experience with different data storage file formats such as ORC, Parquet, and Avro.
- Experience with data lake table formats such as Iceberg, Delta Lake, and Hudi.
US Compensation - The salary range for this role is between $140,000 and $165,000. We have a global market-based pay structure which varies by location. Please note that the base pay range is a guideline, and for candidates who receive an offer, the exact base pay will vary based on factors such as actual work location and the skills and experience of the candidate. This position is also eligible for additional incentives such as equity awards.