Beacon Biosignals is seeking an Ops Engineer to build and maintain cloud infrastructure that supports large-scale machine learning on terabytes of biosignal data.
As an Ops Engineer, you'll work closely with Beacon's data scientists to...
- ...fine-tune the EKS clusters that power our data scientists' hefty distributed numerical workloads.
- ...augment data scientists' efforts to apply more robust CI/CD practices to machine learning model development.
- ...equip our data science teams with new tools (e.g. Argo) to ergonomically orchestrate complex experimentation workflows.
- ...build and maintain observability infrastructure that makes it easy for users to monitor and trace their workloads and to identify bugs and resource utilization issues.
At Beacon, we believe a diverse team builds more robust systems and achieves higher impact. We encourage people from all backgrounds to apply!
- You optimize in pursuit of lower human costs (temporal and/or fiscal), not lower machine costs.
- You have a battle-tested workflow for debugging performance issues and selecting the layer of the stack that actually merits optimization.
- You enjoy balancing highly reactive/collaborative support work with forward-facing work.
- You want to work in an organization that treats tool development as a company-wide practice instead of an individual role.
- You are familiar with the idiosyncrasies of storing, streaming, and analyzing large volumes of dense artifactual data in the cloud (e.g. audio, video, sensor data, models).
- You derive immense satisfaction from rendering opaque systems observable and from "leveling up" a team by equipping them with new tools and capabilities.
- You have experience facilitating data scientists' compute cluster usage for distributed, coprocessor-enabled numerical workloads (training, inference, data preprocessing, etc.) in a cloud environment.
- You prefer open-source dependencies to closed-source dependencies because you have a compulsion to read the code that you're running.
- You're excited to work in an environment that makes heavy use of AWS, K8s, Docker, Terraform, GitHub Actions, Argo, and Julia.