Black Canyon Consulting (BCC) is searching for a Data Engineer to support the National Center for Biotechnology Information (NCBI). This is a full-time opportunity, based at the NIH-NCBI in Bethesda, MD and/or remote.
The National Center for Biotechnology Information (NCBI) is part of the National Library of Medicine (NLM) at the National Institutes of Health (NIH). NCBI is the world’s premier biomedical center, hosting over six million daily users who seek research, clinical, genetic, and other information that directly impacts biomedical research and public health.
We attract the best people in the business with our competitive benefits package, which includes medical, dental, and vision coverage, a 401(k) plan with employer contribution, paid holidays, vacation, and tuition reimbursement. If you enjoy being part of a high-performing, professional-service, and technology-focused organization, please apply today!
The data engineer will work with a talented group of NCBI scientists and software developers to analyze, monitor, and improve the data analysis pipelines in NCBI's world-premier biomedical data resources, such as GenBank, which is part of the International Nucleotide Sequence Database Collaboration (INSDC) and exchanges data daily with the DNA Data Bank of Japan (DDBJ) and the European Nucleotide Archive (ENA). Currently, the openings are for the Sequence Read Archive (SRA), the world's largest publicly available repository of high-throughput sequencing data, which is available through multiple cloud providers and on NCBI servers and is also part of INSDC. SRA is a Big Data archive measured in tens of petabytes of stored data. Future development of SRA will make this data more useful for a wide variety of fields: Medical Health (genetic diseases, cancer, etc.), Public Health (food safety monitoring, antimicrobial resistance, viral outbreaks, etc.), microbial diversity, and many more.
- B.S. in a STEM field (Engineering, Computer Science, Mathematics, Physics)
- Alternatively, equivalent industry experience in bioinformatics or a related field
- Experience running operations in a large and complex environment, preferably in data operations
- Experience with relational databases and SQL
- Scripting in Bash, Python, or other scripting languages
- Experience with Linux/UNIX
- Ability to troubleshoot an operational pipeline, identify the highest-priority problems, and find solutions
- Excellent interpersonal skills and the ability to work as part of a team
Other Desired Experience/Expertise:
- Knowledge of existing workflow languages and frameworks
- Work experience with production-level bioinformatics databases and pipelines
- Familiarity with technical environments, complex databases, and process flows
- Experience with the NCBI Sequence Read Archive (SRA) or GenBank databases, and with tools such as BLAST or other DNA sequence analysis software
- Experience with cloud technologies such as Kubernetes, Athena, and BigQuery
- Experience with XML schemas
- Familiarity with Jira and Confluence
- Experience with Agile processes, especially Scrum
- Background in virology is a plus
- Strong presentation skills