Site Reliability Engineer, Integrated Systems
At Wayfair, we are looking to strengthen and grow our Production Operations team by bringing on talented site reliability engineers to join the platform team that manages the large-scale physical and virtual server environment underpinning our global e-commerce platform. In this role, you will be exposed to a wide variety of systems and technologies. The team is moving aggressively toward an “Infrastructure as Code” model, and we are looking for someone with an “automation” mindset.
What You’ll Do:
- Own SLA for Production Systems
- Drive faster MTTD/MTTR for critical systems
- Independently troubleshoot Sev1/Sev2 incidents
- Own, manage, and support baseline operating system images and templates
- Support CI/CD
- Maintain and review DSC/Puppet modules and support the provisioning infrastructure
- Own and operate all package repos (Python, Java, RPM, etc.) using tools such as Pulp and Artifactory
- Drive efficiencies across the hybrid cloud (GCP, Azure, on-premises)
- Find innovative ways to improve production operational efficiency
- Drive scalability and operability of supported systems/infrastructure
- Own production systems scaling & throughput
- Consult with other teams on systems architecture for new and existing production systems
- Participate in the on-call rotation
- Create and maintain detailed documentation
Some of our larger initiatives include:
- End-to-end automation of system builds
- Ongoing scaling of our platform to support forecasted holiday traffic
- Puppet module standardization and improving automation of systems, processes, and services
- Data center expansion and moving to the cloud
What You’ll Need:
- BA or BS degree from a four-year college or university desired
- Minimum of three years of experience in systems administration, site reliability, platform, or DevOps roles
- Experience with infrastructure, including data center operations, server hardware, web servers (IIS, JBoss, etc.), databases (MS SQL, MySQL, MongoDB), virtualization (VMware), networking, storage, and monitoring
- Experience with structured programming languages (PowerShell, Python, etc.)
- Experience with .NET and REST APIs is a plus
- Experience with continuous integration platforms such as Jenkins, Bamboo, or GitLab CI
- Understanding of Agile, ITIL, and DevOps practices such as CI/CD and automated testing
- Experience with JVM tuning and optimal configuration
Wayfair is one of the world’s largest online destinations for the home. Whether you work in our global headquarters in Boston or Berlin, or in our warehouses or offices throughout the world, we’re reinventing the way people shop for their homes. Through our commitment to industry-leading technology and creative problem-solving, we are confident that Wayfair will be home to the most rewarding work of your career. If you’re looking for rapid growth, constant learning, and dynamic challenges, then you’ll find that amazing career opportunities are knocking.
No matter who you are, Wayfair is a place you can call home. We’re a community of innovators, risk-takers, and trailblazers who celebrate our differences, and know that our unique perspectives make us stronger, smarter, and well-positioned for success. We value and rely on the collective voices of our employees, customers, community, and suppliers to help guide us as we build a better Wayfair – and world – for all. Every voice, every perspective matters. That’s why we’re proud to be an equal opportunity employer. We do not discriminate on the basis of race, color, ethnicity, ancestry, religion, sex, national origin, sexual orientation, age, citizenship status, marital status, disability, gender identity, gender expression, veteran status, or genetic information.