Overview

Black Canyon Consulting (BCC) is searching for a Data Wrangler to support the National Center for Biotechnology Information (NCBI). This is a full-time position based at NIH-NCBI in Bethesda, MD, with remote work also possible.

The National Center for Biotechnology Information (NCBI) is part of the National Library of Medicine (NLM) at the National Institutes of Health (NIH). NCBI is the world’s premier biomedical information center, serving more than six million users daily who seek research, clinical, genetic, and other information that directly affects biomedical research and public health.

Job Description

The candidate should have extensive Python experience, including scripting and the design of data processing workflows. The individual will work with the lead curator to support the Human Variation team in assessing whether external variation resources are acceptable for submission to dbSNP and dbVar, and in developing appropriate data import methodologies and pipelines. This work aims to improve the breadth and accuracy of variation data while minimizing the manual curation the process requires. The role also includes other bioinformatics tasks and analyses that ensure the data are well prepared, consistent, and suitable for downstream applications, so that genetic variation data and subsequent analyses yield relevant biological insights.

The overall goal of the Data Wrangler position is to improve the efficiency, accuracy, and reliability of genetic variation data in the Human Variation team's databases, specifically dbSNP and dbVar. The candidate is expected to apply their Python expertise across a range of bioinformatics tasks and analyses.

Educational Requirements

B.S. in a STEM field (Engineering, Computer Science, Mathematics, Physics)
Alternatively, equivalent industry experience in bioinformatics or a related field

Required Skills:

Data collection, integration, and cleaning
Data transformation, normalization, and preprocessing
Scripting and automation in Bash, Python, or other scripting languages
Implementation of custom analyses
SQL queries for data extraction, transformation, and loading (ETL)
Experience running operations in a large and complex environment, preferably in data operations
Ability to troubleshoot an operational pipeline, identify the highest-priority problems, and develop solutions
Team collaboration, onboarding, and documentation

Other Desired Experience/Expertise:

Knowledge of existing workflow languages and frameworks
Work experience with production-level bioinformatics databases and pipelines
Familiarity with technical environments, complex databases, and process flows
Experience with XML schemas
Familiarity with Jira and Confluence
Experience with Agile processes, especially scrum

Bonus Skills

Strong presentation skills

We attract the best people in the business with a competitive benefits package that includes medical, dental, and vision coverage, a 401(k) plan with employer contribution, paid holidays, vacation, and tuition reimbursement. If you enjoy being part of a high-performing organization focused on professional services and technology, please apply today!

