As a System Reliability Engineer, you'll be responsible for maintaining and monitoring our hosted customers as well as managing our SaaS applications. InfluxData runs over 600 AWS instances on our stack, which includes InfluxDB, Telegraf, Kapacitor and, going forward, Chronograf, our monitoring application. Our infrastructure is CoreOS running Docker containers, plus some in-house tools written in Go that handle deployment and management of the applications running in the cloud.
We're looking for candidates who value continually mastering the craft of coding and building tools that benefit the open source development community. You'll need to be open to experimenting, which means that sometimes you might fail. (That's okay.) Along with that, you'll need the persistence to keep going and try again. You'll also be working closely with your team, so you'll need to be empathic, supportive, and excited to teach as well as learn.
- Maintain existing hosted customers and deployment applications
- Write automation tools in Go to keep customers up and running
- Write alerting and monitoring scripts in Kapacitor’s TICKscript language to report and fix errors
- Handle outages and propose new automation methods to keep the systems live
- 2-3 years of experience with modern Go
- Some experience with Python
- Experience automating system administration tasks
- Desire to start working with open source software
- Experience with AWS tools and APIs
- Extensive experience with Git
- Open-source contributions
- You must be authorized to work in the United States
We are funded by Mayfield Fund, Trinity Ventures, and Battery Ventures. We cover 100% of medical, dental, and vision insurance for employees and dependents.