Job Title: Senior Staff Software Engineer, Data Platform

Location: San Francisco/Boston 

About Us 

Valo Health is a biotechnology company that was created with the belief that drug discovery and development should be faster and less expensive, with a much higher probability of success. To achieve this goal, we are pioneering a novel, fully integrated approach that combines data and machine learning insights at every step of the process. We are a multi-disciplinary team that brings together experts at every phase of software and drug development to create a cohesive platform. Our end goal is to create life-changing medical treatments by combining expertise in technology and life sciences with a comprehensive view of the entire drug discovery and development process. 

Valo is committed to hiring a world-class team that brings together a wide variety of different skills and experiences. We are committed to inclusion across race, gender, age, religion, identity, and experience, and believe that diversity makes us stronger by bringing in new ideas and perspectives. We strive to create a workplace that cultivates bold innovation through collaboration and empowers our people to unleash their full potential. 

About The Role 

Valo is looking for an experienced Sr. Staff Software Engineer to build out our integrated data platform for Opal to deliver regulatory-grade analysis, better train our algorithms and models, and identify unique insights by enabling data fusion across disparate data sets. We are taking on hard engineering problems that are found in few other places throughout industry, so we are looking for engineers with the flexibility and ingenuity to match.

Opal is a fully integrated, AI-powered, cloud-native platform that leverages human-centric data to create new approaches to drug discovery and development, enabling researchers to minimize the cost and time associated with the discovery, development, and delivery of novel therapeutics. The predictive insights produced by Opal rely on high-quality, high-density human-centric data that is sourced from multiple data sets and processed both remotely and on site through a highly complex process.

You will be responsible for designing and implementing features as well as onboarding users to Valo’s data platform. This is a multi-faceted role that includes understanding, designing, coordinating, and implementing features of the platform that relate to ingestion, normalization, processing, ontologies, data cataloging, security, data isolation, ML model building, discoverability, data lineage, and reproducibility. Our platform must support all human-centric data relevant to drug discovery and development: everything from preclinical assays to publicly available ‘Omics data to real-world data (RWD) to clinical trial data, and all of it needs to be harmonized, joinable, and reusable to enable our cutting-edge Data Science and Machine Learning.

The role will require you to exhibit strong technical judgement and mentorship skills to help shape direction and grow other engineers.  

Successful candidates will exhibit the following leadership traits: 

  • Customer Obsession: Mission- and vision-oriented product evangelist. Comfortable with being the face and voice of the product and mission in the market and across the company. Builder mentality. Sees ambiguity as opportunity, and obstacles as chances to build. 
  • Think Big: Innovative and creative, with a vision that transcends what is visible today. 
  • Earn Trust: Ability to build credibility and rapport with the executive team to drive collaboration and coordination with key stakeholders. High Emotional Quotient (EQ) with strong communication and influencing skills, to create corporate-wide alignment around product vision. 
  • Dealing with Ambiguity: Comfortable charting new territory, navigating with imperfect information, and weighing trade-offs when making decisions.
  • Bias for Action: High energy, low ego, and focus on finding data-driven solutions.

What You’ll Do… 

  • Lead the definition of platform architecture, including storage designs, pipelines, data APIs and self-service tooling for diverse types of data (structured and unstructured), and diverse workloads/dataflows (transactional, analytics, ML pipelines, research data science) 
  • Design individual components of the platform architecture (storage, pipelines, data APIs, and self-service tooling) across the same range of data types and workloads/dataflows
  • Practice engineering excellence daily, from establishing requirements and design processes to code development and robust testing strategies for data pipelines and software features
  • Champion the self-service infrastructure and features of the data platform through well-written documentation and by partnering with data owners as they learn the platform
  • Integrate data governance, isolation, and security into all facets of the platform
  • Mentor and develop other engineers

What You Bring... 

  • 8+ years of software engineering experience with at least 5 years focused in the data discipline (data systems, data engineering, data governance, and data pipelining)
  • Experience building data infrastructure using modern cloud technologies and working with data ecosystem tools: Data Processing (e.g. Spark), Workflow Orchestration (e.g. Airflow), Cloud Data Warehouse/Data Lake technologies (e.g. Snowflake, Databricks), Data Observability and Cataloging
  • Product mindset for internal customers to champion adoption of data platform infrastructure and technologies
  • Development experience in a professional setting using Python/Java, SQL/Spark
  • A strong understanding of key AWS technologies: S3, Glue, EC2, EMR, etc.
  • B.S. or M.S. in Computer Science or a similar technical field, or equivalent experience

You May Also Bring... 

  • Experience in drug discovery and development and working with laboratory systems and real-world medical or genomic data
  • Experience implementing controls for regulatory compliance and data governance applications such as HIPAA, GDPR, 21 CFR Part 11, FDA data submissions using Real World Data, data isolation and use restrictions, etc.
  • Experience translating disparate datasets into unified, coherent data models (entity relationship models, hierarchical ontologies, and/or graph models)

More on Valo 

Valo Health, LLC (“Valo”) is a technology company built to transform the drug discovery and development process using human-centric data and artificial intelligence-driven computation. As a digitally native company, Valo aims to fully integrate human-centric data across the entire drug development life cycle into a single unified architecture, thereby accelerating the discovery and development of life-changing drugs while simultaneously reducing costs, time, and failure rates. The company’s Opal Computational Platform™ is an integrated set of capabilities designed to transform data into valuable insights that may accelerate discoveries and enable Valo to advance a robust pipeline of programs across cardiovascular metabolic renal, oncology, and neurodegenerative disease. Founded by Flagship Pioneering and headquartered in Boston, MA, Valo also has offices in Lexington, MA, San Francisco, CA, Princeton, NJ, and Branford, CT. To learn more, visit www.valohealth.com.


Voluntary Self-Identification

For government reporting purposes, we ask candidates to respond to the below self-identification survey. Completion of the form is entirely voluntary. Whatever your decision, it will not be considered in the hiring process or thereafter. Any information that you do provide will be recorded and maintained in a confidential file.

As set forth in Valo Health’s Equal Employment Opportunity policy, we do not discriminate on the basis of any protected group status under any applicable law.

If you believe you belong to any of the categories of protected veterans listed below, please indicate by making the appropriate selection. As a government contractor subject to the Vietnam Era Veterans Readjustment Assistance Act (VEVRAA), we request this information in order to measure the effectiveness of the outreach and positive recruitment efforts we undertake pursuant to VEVRAA. Classification of protected categories is as follows:

A "disabled veteran" is one of the following: a veteran of the U.S. military, ground, naval or air service who is entitled to compensation (or who but for the receipt of military retired pay would be entitled to compensation) under laws administered by the Secretary of Veterans Affairs; or a person who was discharged or released from active duty because of a service-connected disability.

A "recently separated veteran" means any veteran during the three-year period beginning on the date of such veteran's discharge or release from active duty in the U.S. military, ground, naval, or air service.

An "active duty wartime or campaign badge veteran" means a veteran who served on active duty in the U.S. military, ground, naval or air service during a war, or in a campaign or expedition for which a campaign badge has been authorized under the laws administered by the Department of Defense.

An "Armed forces service medal veteran" means a veteran who, while serving on active duty in the U.S. military, ground, naval or air service, participated in a United States military operation for which an Armed Forces service medal was awarded pursuant to Executive Order 12985.


Form CC-305

OMB Control Number 1250-0005

Expires 05/31/2023

Voluntary Self-Identification of Disability

Why are you being asked to complete this form?

We are a federal contractor or subcontractor required by law to provide equal employment opportunity to qualified people with disabilities. We are also required to measure our progress toward having at least 7% of our workforce be individuals with disabilities. To do this, we must ask applicants and employees if they have a disability or have ever had a disability. Because a person may become disabled at any time, we ask all of our employees to update their information at least every five years.

Identifying yourself as an individual with a disability is voluntary, and we hope that you will choose to do so. Your answer will be maintained confidentially and not be seen by selecting officials or anyone else involved in making personnel decisions. Completing the form will not negatively impact you in any way, regardless of whether you have self-identified in the past. For more information about this form or the equal employment obligations of federal contractors under Section 503 of the Rehabilitation Act, visit the U.S. Department of Labor’s Office of Federal Contract Compliance Programs (OFCCP) website at www.dol.gov/ofccp.

How do you know if you have a disability?

You are considered to have a disability if you have a physical or mental impairment or medical condition that substantially limits a major life activity, or if you have a history or record of such an impairment or medical condition.

Disabilities include, but are not limited to:

  • Autism
  • Autoimmune disorder, for example, lupus, fibromyalgia, rheumatoid arthritis, or HIV/AIDS
  • Blind or low vision
  • Cancer
  • Cardiovascular or heart disease
  • Celiac disease
  • Cerebral palsy
  • Deaf or hard of hearing
  • Depression or anxiety
  • Diabetes
  • Epilepsy
  • Gastrointestinal disorders, for example, Crohn's disease or irritable bowel syndrome
  • Intellectual disability
  • Missing limbs or partially missing limbs
  • Nervous system condition, for example, migraine headaches, Parkinson’s disease, or multiple sclerosis (MS)
  • Psychiatric condition, for example, bipolar disorder, schizophrenia, PTSD, or major depression

PUBLIC BURDEN STATEMENT: According to the Paperwork Reduction Act of 1995 no persons are required to respond to a collection of information unless such collection displays a valid OMB control number. This survey should take about 5 minutes to complete.