Coin Metrics is a leading provider of cryptoasset data for institutions. We deliver transparent and actionable data and analytics to various industry stakeholders including asset managers, custodians, trading venues, research desks, and data/application providers. Coin Metrics’ data empowers its clients and the public to better understand, use and value open crypto networks.
Join a fast-paced startup pioneering novel metrics, data products, and intelligence solutions that offer insights into the economics, markets, usage, health, and other aspects of public cryptocurrency blockchains such as Bitcoin, Ethereum, and other crypto networks.
You will be surrounded by talented people passionate about decentralized economies and the data behind them. Break new ground, create exciting new data-driven research and products, and help shape the future of finance.
Coin Metrics is recruiting a Site Reliability Engineer to monitor production and staging systems and help continuously improve our blockchain node and market data hosting infrastructure. The role also involves collecting reliability statistics, creating and improving monitoring and alerting systems, and assisting our Infrastructure Engineers with ongoing tasks related to deploying and maintaining our platform technology, including the company’s data storage, collection, and processing capabilities.
Key responsibilities include:
- Ongoing monitoring of our infrastructure platform, including servers, networks, applications, databases and other platform technologies.
- Maintaining and improving monitoring, alerting and security of existing infrastructure.
- Building advanced monitoring and alerting tools.
- Compiling and managing infrastructure and security documentation including systems inventory, alert procedures, and process run books.
- Troubleshooting and assisting in diagnosing, solving or escalating infrastructure problems.
- Monitoring databases, queues, container orchestration systems and other fundamental technologies, and helping devise a platform reliability strategy.
Requirements:
- 4+ years of Linux systems administration experience
- Previous SRE experience and knowledge of best practices
- Experience with monitoring and alerting technologies (Prometheus, Grafana)
- Experience with PostgreSQL; understanding of replication, failover, backups
- Docker experience
- Solid command of Ansible, Salt, or similar technologies
- Solid command of scripting languages (Python, Bash, etc.)
- Capability to write concise technical documentation
- Understanding of reverse proxying and load balancing (NGINX, HAProxy, etc.)
- Experience with Git
- Strong sense of ownership, entrepreneurial spirit, and/or startup-like experience, capable of driving towards solutions independently while seeking feedback when appropriate.
Nice to have:
- Cryptocurrency experience
- Experience managing large PostgreSQL clusters and highly available PostgreSQL setups
- Experience operating cryptocurrency nodes beyond Bitcoin and Ethereum.
- Experience with advanced filesystems, specifically ZFS
- Experience with CI pipelines (GitLab CI, Jenkins, etc.)
- Familiarity with overlay networking (e.g., Docker Swarm or WireGuard)
- Familiarity with AWS or GCP.