At Elastic, we have a simple goal: to solve the world's data problems with products that delight and inspire. As the company behind the popular open source projects Elasticsearch, Kibana, Logstash, and Beats, we help people around the world do great things with their data. From stock quotes to real-time Twitter streams, Apache logs to WordPress blogs, our products are extending what's possible with data, delivering on the promise that good things come from connecting the dots. The Elastic family unites employees across 30+ countries into one coherent team, while the broader community spans more than 100 countries.
Thanks to our ongoing expansion, we have the opportunity to grow our Site Reliability team. We're part of the Elastic Cloud engineering team, focused on solving Cloud operations problems and keeping the SaaS offering online, and we aren't afraid to get our hands dirty. We are the first line of consumers for Elastic's products, and our experience helps influence the direction of the stack. While most organizations have a single Elastic Stack deployment or a handful of them, here you'll be responsible for identifying, troubleshooting and reporting platform problems to product engineers (or fixing the code yourself) to ensure that the thousands of Elasticsearch clusters we manage provide a stable and reliable service. We're looking for people who are just as passionate about troubleshooting distributed systems as they are about automating, coding and collaborating to solve problems.
What you will be doing
- You will design and implement networking and/or load balancing in the Elastic Cloud infrastructure services and collaborate on issues with production engineering colleagues
- You will participate in SRE software engineering, writing code that continually reduces human intervention in operational tasks and automates processes
- You will monitor the Elastic Cloud platform and Cloud infrastructure, respond to incidents, correct and improve systems to prevent future incidents, and plan capacity
- You will manage Cloud provider infrastructure, system deployments and product releases
- You will be an escalation path in resolving Elastic Cloud customer support issues
- You will demonstrate and promote best practices for teams using Cloud platforms
- You will participate in 24/7 on-call schedules
What you bring
- You are either an experienced sysadmin with professional skills in Linux networking, preferably on distributed systems at scale; or a network engineer with real interest, and ideally some experience, in Cloud systems and automation
- You have professional exposure to Linux container networking, firewalls and VPN design and administration; and/or have been responsible for edge networking and load balancing using at least one Cloud service provider
- You are comfortable working alongside security engineers to design and implement Cloud security requirements
- You have significant professional experience using a public Cloud such as AWS, GCP, Azure, SoftLayer or OpenStack
- You are comfortable writing software to automate API-driven tasks at scale. Our SREs use Python and Go regularly, but are also encouraged to contribute to the product codebase in Java, Scala, and Python.
- You have used Ansible, Chef, Puppet, SaltStack or another configuration management suite
The details we are looking for
- Well-developed knowledge of Linux and Cloud service provider networking, and/or of designing and managing load balancing at scale. You know Cloud network security, kernel network tuning and container networking
- You've worked with a virtual routing platform (VyOS, Vyatta, Cisco CSR 1000v, FRR, Quagga, Bird, etc.)
- Have a good understanding of network underlay and overlay technologies
- Have implemented or have a good understanding of SD-WAN
- Must be comfortable reviewing packet captures
- Have a strong understanding of IPsec/IKE
- Well-developed knowledge of Internet protocols, including BGP, DNS and TCP
- Healthy knowledge of Linux (may have compiled your own kernel at some point and know how to trace syscalls)
- Relentless desire to automate and build software tools
- Desire to track work in Git, driven by a GitHub workflow of issues and pull requests
- Enjoy working remotely and the communication it requires
- Love a diverse environment, working with people all over the world
Experience in these areas is a plus:
- You're familiar with VXLAN/EVPN, VRFs and/or L2/L3 isolation
- You can tune a NIC for 10G, 25G, or 100G
- FRR, Bird, Calico, or Flannel mean something to you
- Love open source development, and have contributed to some project somewhere (doesn't have to be ours), whether through mailing lists, patches or documentation
We're looking to hire team members invested in realising the goal of making real-time data exploration easy and available to anyone. As a distributed company, we believe that diversity drives our vibe! Whether you're looking to launch a new career or grow an existing one, Elastic is the type of company where you can balance great work with great life.
- Competitive pay based on the work you do here and not your previous salary
- Global minimum of 16 weeks of fully paid parental leave (moms & dads)
- Generous vacation time and one week of volunteer time off
- Your age is only a number. It doesn't matter if you're just out of college or your children are; we need you for what you can do.
Elastic is an equal opportunity employer committed to the principles of equal employment opportunity and affirmative action for all applicants and employees. Qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender perception or identity, national origin, age, marital status, protected veteran status, disability status, or any other basis protected by federal, state or local law, ordinance or regulation. Elastic also makes reasonable accommodations for disabled employees consistent with applicable law.