Vault Health is a leading virtual-first healthcare platform that specializes in delivering remote diagnostics and specialty care to consumers directly, through their employers, and through their local public health agencies. Vault also leverages its virtual platform to facilitate decentralized clinical trials for companies in the pharmaceutical and biotech industries. Vault is a leading provider of at-home FDA-approved COVID-19 testing in the U.S., and its solution has been deployed to numerous local and state governments, airlines, universities, professional athletic teams, companies, and organizations. Today, Vault employs more than 500 people across the country and expects to continue growing as we expand our products and services.
The Opportunity -
Vault is looking for a Full Stack Customer Reliability Engineer to work in a brand new engineering function invented at Vault. This is a 100% remote position. You will join a rapidly growing digital healthcare company while it is still at an early stage and on the cusp of making a huge impact on improving people’s lives. As an early employee, you’ll have the chance to define and work on the foundational elements of the company. In addition to our mission, we provide a great learning environment as well as a competitive wage and stock options in an early-stage company.
Customer Reliability Engineering (CRE) is a novel concept invented at Vault. Since the majority of work in a large networked software application with millions of users happens after a feature goes live, we are looking to solve a traditionally operational task with a novel approach: building a team of Full Stack Engineers. This approach is based on modern DevOps principles. We believe that operational and product engineers should share a similar set of tools and development culture, so that troubleshooting is efficient and root cause remediation is done correctly within the broader context of the architecture and process.
The mission is to create an engineering team focused on customer reliability, system performance and scalability, development operations, and application security. The approach is to build a team of full stack engineers who can learn and do both software engineering and systems engineering work, so that we can operate and improve large networked computer systems efficiently. The efficiency will come from toil reduction via automation, using tools like Terraform, New Relic, Splunk, Harness.io, and a myriad of AWS services, and from a systematic approach to improving software architecture. CREs therefore need to code and to understand the system end to end. This is a high-potential engineering role that can lead to significant growth in engineering.
Our core tech stack is React.js, GraphQL, Python, Flask, PostgreSQL, Terraform, and AWS. Engineers on the CRE team will spend at least 50% of their time on coding, automation, and improvement, with the rest of their time spent on troubleshooting, communication, and documentation. The job has a day-to-day working tempo similar to that of a typical software engineer at Vault.
- This job has two major responsibilities: operational tasks and system improvement
- On the operational front, you will triage, resolve, and manage the escalation of incoming customer inquiries from the Tier2 Escalation group, as well as alerts from monitoring tools
- Triage incoming escalation of customer inquiries and system alerts
- Provide initial resolution for tickets that can be mitigated quickly
- For more complex issues, provide accurate troubleshooting data to the product engineering team to collaborate on issue resolution
- Provide timely communication to other departments about wide-reaching major issues
- On the system improvement front, you will develop a deep understanding of our systems in order to build tools to monitor and alert on the most important system and customer metrics
- Improve our approach to platform robustness by monitoring and analyzing key customer ticket metrics and sharing that intelligence with the rest of the product and engineering teams
- Improve CI automation, especially in the areas of platform anomaly detection automation and customer configuration validation
- Research and implement tools and processes to help us achieve customer reliability
- Build and own the workflows and systems that govern how issues flow and how they are triaged, tracked, measured, and prioritized for future resolution
"Toil reduction" is one of the major focuses of this function. A traditional tier3 support team works in almost 100% toil: tickets, once resolved, can keep coming back for the same reason. While some toil is unavoidable, we want to apply engineering skills by continuously improving tooling and automation to eliminate root causes and speed up troubleshooting.
- Passion for customer success and excellent communication skills
- Good troubleshooting skills, technical aptitude and a willingness to learn
- Past experience working as a software engineer
- Experience with object-oriented design and languages such as Python, Java, .NET, or C/C++
- A B.S. in Computer Science or equivalent is helpful, but we, as a company, do not have a degree requirement
Bonus points for:
- DevOps, SRE, QA Automation, or CI/CD experience
- Experience in web performance optimization, Flask, and GraphQL
- Experience in the healthcare space and startup environments
- Experience working in a support function
Vault Health is an equal opportunity employer. All applicants will receive consideration for employment without regard to race, color, religion, sex, gender identity, national origin, age, disability, or veteran status.