MobileCoin is building the future of digital payments. We want to make a cryptocurrency you can use every day to pay for anything.
We are passionate about implementing scientific and mathematical methods to explore, isolate, and solve problems in the global financial markets while respecting user privacy. We believe that career fulfillment and enterprise success converge when smart, hard-working, and intellectually curious people come together with a shared goal of innovation and the pursuit of excellence.
The SRE Role:
The Site Reliability Engineer will join MobileCoin's infrastructure team with a focus on system performance, reliability, and observability. You will work with the Head of Infrastructure and the engineering team to develop and grow the MobileCoin infrastructure to meet the demands of clients and node operators alike.
This is a unique opportunity for a seasoned engineer and technologist to have a large impact in a senior and brilliant team at an early stage of development. It is an opportunity to hone and develop your skills in DevOps and software engineering in a system that is refreshingly and challengingly different from standard multi-tier web-based microservice systems.
Responsibilities
- Maintain, monitor and improve our Kubernetes clusters.
- Maintain, improve, scale and secure our Azure infrastructure and Ubuntu Linux systems.
- Assist our development teams in running, packaging, deploying and troubleshooting applications.
- Work with developers on streamlining deployment processes with Jenkins and other tooling.
- Be responsible for maintaining and improving multiple internal services, for example Kubernetes, Prometheus, and logging.
- Monitor, triage and respond to alerts in our 24/7/365 environment.
- Participate in design and code reviews, and ensure that the foundation for our services is best in class.
- Evaluate new technologies, and design and implement them as appropriate.
- Identify automation opportunities and implement them with custom or off-the-shelf solutions.
Requirements
- You have 3+ years of experience working in cloud-based systems operations as a Linux systems administrator, SRE or DevOps engineer.
- You’re very comfortable with the Linux command line.
- You have extensive experience with Docker (building and running containers) and container orchestration (Kubernetes preferred).
- You have experience with Prometheus and Grafana (preferred), or other monitoring systems (InfluxDB, StatsD, Graphite, etc.).
- You have experience with CI pipelines and Jenkins.
- You are security-minded and follow standard security best practices (least privilege, common attack defenses, etc.).
- You have a good understanding of computer networking, TCP/IP, load balancing, distributed computing, web services, and the fundamental protocols used by the internet (HTTP, HTTPS, DNS, etc.).
- You have experience supporting production workloads and are familiar with monitoring concepts and tooling. You’re able to take part in an on-call rotation.
- You're proficient in at least one scripting language and familiar with a few (Python, Bash, etc.).
- You're enthusiastic about working in a small, growing team. You are open and empathetic, and you care about putting the best ideas forward in a collaborative and helpful manner.
- You can work independently and are able to deliver results without supervision.
Nice to Have
- Experience with Azure
- Experience with Terraform
- Experience with Rust and/or C/C++
- Experience with advanced CPU features in a container environment (SGX, GPU, etc.)