Honey is a fast-growing startup based in Los Angeles. Our online shopping platform offers users a smarter way to shop. We open up instant access to exclusive savings, deals, rewards, and discovery, all powered by the collective knowledge of Honey's community of online shoppers. We are helping millions save when they shop online, and we're hiring! We are actively seeking a Senior Site Reliability Engineer to join our Los Angeles team.
If you aren't already in the Los Angeles area, don't fret - we'll move you here!
As a member of our Site Reliability team, you’ll recommend and implement changes across our systems and environments, evaluate new technologies, and contribute to our technical direction. We primarily use Google Cloud Platform, Terraform, Python, Node.js, and CircleCI, and have a microservice-based architecture using Docker and running on Kubernetes. We value individuals who are curious, collaborative, able to communicate effectively, and passionate about open-source software and new system architecture trends.
We’re looking for a Senior Site Reliability Engineer to design and implement infrastructure solutions that improve the scalability and efficiency of Honey’s services. The ideal candidate has a background in systems and/or software engineering, automation, cloud computing, and build tooling, as well as strong problem-solving abilities.
- Collaborative, curious, and able to communicate effectively
- Experience leading teams and/or mentoring team members
- Strong experience with architecture, ideally in cloud-native environments
- Production experience with major public cloud providers (we use GCP, but experience with AWS or Azure is great)
- Experience managing and resolving production incidents
- Experience with containers and container orchestration (Docker, Kubernetes)
- Expertise in monitoring and metrics (Datadog, Prometheus, New Relic)
- Familiarity with IaC / infrastructure automation (Terraform, Chef, Puppet, Ansible)
- Comfort with databases and in-memory key/value stores (MySQL, Postgres, Redis, MongoDB)
- Solid knowledge of Linux/UNIX and networking fundamentals
In this role you’ll:
- Maintain the core infrastructure
- Manage, monitor, and improve highly scalable, distributed systems to create highly available services
- Collaborate with engineers in the deployment and scaling of new product features
- Investigate production incidents, help determine contributing factors, and implement fixes
- Identify and automate repetitive, manual tasks
- Develop effective tooling, alerts, and responses to identify and address reliability risks
- Debug software at the code and infrastructure level
- Plan for the growth of Honey’s infrastructure and help define best practices
- Participate in an on-call rotation
Bonus Points For:
- Experience with chaos engineering and related disciplines
- Experience with Golang
- Previous experience with GCP
- Experience with service discovery or service meshes