Honey is a fast-growing startup based in Los Angeles. Our online shopping platform offers users a smarter way to shop. We open up instant access to exclusive savings, deals, rewards and discovery, all powered by the collective knowledge of Honey's community of online shoppers. We are helping millions save when they shop online, and we're hiring! We are actively seeking a Senior Site Reliability Engineer to join our Los Angeles team.
As a Senior Site Reliability Engineer, you will focus on managing and maintaining some of Honey's core infrastructure and build tooling; advising on cloud architecture, monitoring, incident management, and containerization best practices; and helping maintain build and deployment pipelines. You'll take ownership of one or more projects in these areas, such as multi-regional deployments or service meshes. You can expect to work closely with other SREs on the team, as well as with other software engineering teams within Honey. Your contributions will help us maintain a high degree of autonomy and empowerment for developers while improving reliability, observability, and stability.
What we're looking for:
- Collaborative, curious, and able to communicate effectively
- Experience leading teams and / or mentoring team members
- Strong experience with architecture, ideally in cloud-native type environments
- Production experience with major public cloud providers (we use GCP, but experience with AWS or Azure is also welcome)
- Experience managing and resolving production incidents
- Experience with containers and container orchestration (Docker, Kubernetes)
- Expertise in monitoring and metrics (Datadog, Prometheus, New Relic)
- Familiarity with IaC / infrastructure automation (Terraform, Chef, Puppet, Ansible)
- Comfort with databases and in-memory key/value stores (MySQL, Postgres, Redis, MongoDB)
- Solid knowledge of Linux/UNIX and networking fundamentals
In this role you’ll:
- Maintain Honey's core infrastructure
- Manage, monitor, and improve highly scalable, distributed systems to create highly available services
- Collaborate with engineers in the deployment and scaling of new product features
- Investigate production incidents, help determine contributing factors, and implement fixes
- Identify and automate repetitive, manual tasks
- Develop effective tooling, alerts, and responses to both identify and address reliability risks
- Debug software at the code and infrastructure level
- Plan for the growth of Honey’s infrastructure and help define best practices
- Participate in an on-call rotation
Bonus Points For:
- Experience with chaos engineering and related disciplines
- Experience with Golang
- Previous experience with GCP
- Experience with service discovery or service meshes
At Honey, we are committed to building a diverse and inclusive company. We seek to create a culture where everyone can belong because we believe that people do their best work when they can show up every day as their authentic selves. We welcome people of different backgrounds, experiences, abilities, and perspectives.
Honey is an equal opportunity employer. We do not make hiring or employment decisions on the basis of race, religion, color, national origin, gender, gender identity, sexual orientation, age, marital status, veteran status, disability status or genetic information, in compliance with applicable federal, state and local law.