Klaviyo is growing fast, and we have openings for all skill levels across all of our teams. Learn more about our engineering culture at https://klaviyo.tech
Site Reliability Engineering (SRE) is what you get when you treat system operations as a software engineering problem. The mission of the Site Reliability Engineering team is to ensure uninterrupted service for Klaviyo customers and act as a force multiplier for Klaviyo product teams to deliver better software faster.
The SRE team builds foundational backend services as well as tooling and automation to allow product teams to release and scale their software reliably and predictably. SREs are team players who embed themselves within product teams as needed to advance the architecture and performance of software systems and train their peers in topics such as debugging distributed systems, building self-healing applications and eking out every drop of performance possible.
As a Site Reliability Engineer you will own foundational Klaviyo services and make a big impact on the productivity of our product engineering teams.
How You'll Make a Difference
Ship foundational services to enable Klaviyo engineering to move faster with confidence
Design and develop systems and processes that make Klaviyo’s services highly available and scalable
Achieve breakthroughs in system throughput by identifying and eliminating bottlenecks
Leverage technologies such as Python, AWS, Django, Kubernetes, Bash, Terraform, MySQL, Redis, Cassandra, and PostgreSQL to advance Klaviyo’s platform
Champion best practices by actively collaborating with other teams in a culture that values whiteboarding and technical design review
Contribute to the company in multiple areas, constantly pushing yourself to be a better engineer and to level up your peers within your team and across Klaviyo
Design, write and deliver software to dramatically improve the availability, scalability, latency, and efficiency of Klaviyo’s services
Participate in periodic on-call duties with a focus on solving issues when they are discovered, preventing recurrences, and minimizing alert fatigue
Implement architectural improvements that achieve breakthrough results in the operational scalability and reliability of Klaviyo’s systems
Work hand-in-hand with product-facing engineers and other SREs to ship impactful code
Perform quantitative analysis to understand and scale Klaviyo systems
Uncover and advocate for preventative, upstream solutions with internal stakeholders
Evangelize Site Reliability best practices across the engineering organization
Who You Are
BA or BS degree in Computer Science, a related field, or equivalent experience
Ability to stay composed during outages and to drive failures through root cause analysis to the prevention of future issues
Understanding of Linux (we run Ubuntu) and all layers of the networking stack
Experience working on an engineering team building software
Experience writing code using best practices in a language such as Python, Ruby, or Go
Get to know Klaviyo
Klaviyo is a world-leading marketing automation platform dedicated to accelerating revenue and customer connection for online businesses. Klaviyo makes it easy to store, access, analyze, and use transactional and behavioral data to power highly targeted customer and prospect communications. The company's hybrid customer-data and marketing-platform model allows companies to grow by fostering direct relationships with customers, without giving up their valuable data to popular big-tech ad platforms. Over 265,000 innovative companies like Unilever, Custom Ink, Living Proof and Huckberry sell more with Klaviyo. Learn more at www.klaviyo.com.
Klaviyo does not tolerate and prohibits discrimination, harassment or retaliation of or against job applicants, contractors, interns, volunteers or employees by another employee, supervisor, vendor, customer or any third party.