Job Title: Data Scientist, Knowledge Representation and Data Curation
About Us
Valo Health is a biotechnology company that was created with the belief that drug discovery and development should be faster and less expensive, with a much higher probability of success. To achieve this goal, we are pioneering a novel, fully integrated approach that combines data and machine learning insights at every step of the process. We are a multi-disciplinary team that brings together experts at every phase of software and drug development to create a cohesive platform. Our end goal is to create life-changing medical treatments by combining expertise in technology and life sciences with a comprehensive view of the entire drug discovery and development process.
Valo is committed to hiring a world-class team that brings together a wide variety of skills and experiences. We are committed to inclusion across race, gender, age, religion, identity, and experience, and believe that diversity makes us stronger by bringing in new ideas and perspectives. We strive to create a workplace that cultivates bold innovation through collaboration and empowers our people to unleash their full potential.
About the Role…
As a Data Scientist, Knowledge Representation and Data Curation, you will execute key initiatives in data strategy and knowledge management at Valo Health. You will extract, interpret, and synthesize an extremely diverse set of biotech data (electronic health records, computational chemistry and biology datasets, digital health and patient engagement data, imaging, clinical trials, etc.) with the goal of standardizing this data and making it machine-readable, interoperable, and more discoverable and accessible. Successful candidates will be excellent communicators with a growth mindset and the ability to harmonize both people and data.
What You’ll Do…
- Collaborate with all stakeholders of the data, from producers and submitters to consumers, by communicating with software engineers, data scientists, scientific domain experts, and clinicians and influencing integration among them
- Define and refine rules and standards (as data types and user requirements evolve) for a wide variety of biological, chemical, clinical, and other biotechnology datasets.
- Create, augment, maintain, and deploy ontologies (both formal and informal vocabularies, taxonomies, and dictionaries)
- Develop, test, and execute data curation pipelines
- Perform data quality assurance tasks, including developing algorithms and tooling for data quality control
- Write and maintain a data wiki, data dictionaries, and schemas
- Employ leading NLP and text mining algorithms to generate training and test data for machine learning applications.
What You Bring…
- BS or MS in biology, chemistry, biochemistry, computer science, epidemiology, data science, biomedical engineering, public health, or a related field
- Experience with ontologies and graph-based representations of data (RDF, OWL, Neo4j)
- Experience with REST APIs and version control
- Working knowledge of SQL and of Python and/or R
- Demonstrated experience in the collection, storage, transformation, standardization, harmonization, and analysis of data stored in a variety of formats (CSV, JSON, relational databases such as MySQL, PostgreSQL, SQL Server, and Oracle, and unstructured text)
- Experience or education in one or more areas of drug development (biology, chemistry, electronic health records, clinical trials, etc.)
- Demonstrated experience in self-directed learning and exploring new technologies
- Strong written communication and interpersonal skills
You May Also Bring…
- PhD in computational biology, computational chemistry, biomedical engineering, computer science, data science, epidemiology, public health, or a related field
- Experience or skills in project management and process implementation
- Experience deploying ontologies in machine learning applications
- Experience with text mining tools and text embedding/language models, and development experience with NLP models, libraries, and platforms (e.g., BERT, BioBERT, spaCy, NLTK, cTAKES, SciBite, Linguamatics)
More on Valo
Valo Health is a privately held company founded by David Berry, a General Partner at Flagship Pioneering who has founded over 30 leading companies across life sciences, technology, and sustainability, including three companies valued at over $3B (among them Indigo Agriculture, #1 on the CNBC Disruptor list), and roughly 15 that have gone public or been acquired. David was instrumental in creating Flagship VentureLabs, where the firm's 100+ companies, worth more than $38B in aggregate value, have been conceived, incubated, and launched. In addition, David has been broadly recognized as a world-leading innovator: he was elected a Young Global Leader by the World Economic Forum, named Innovator of the Year by Technology Review from among its annual TR35 list, and selected as one of 12 Innovators Reshaping Reality by the U.S. State Department, alongside pioneers such as Tim Berners-Lee. David and his companies have received over 150 additional awards and honors, and he holds over 200 patents and patent applications. David currently serves on the United Nations Sustainable Development Solutions Network (UN SDSN), where he was a Founding Leadership Council Member.