Bloomreach is the leader in digital and commerce experience™. Our flagship product, brX, is the only digital experience platform built specifically for brands, retailers and B2B companies that want to grow their revenue online while delivering a premium, personalized experience to each of their customers. brX combines content and experience management with market-leading, AI-driven search, merchandising and personalization in one efficient, modern platform.
Bloomreach serves over 250 companies globally, including Neiman Marcus, Staples, NHS Digital, Bosch, Puma, and Marks & Spencer. Its global network of certified partners includes Accenture Interactive, WPP, and market-leading commerce platforms.
Our India team is a critical product development and innovation hub for the company and has created and developed many of its key products and technology initiatives. Bloomreach Search and Merchandising (brSM) was built and shipped out of our India office.
All our employees currently have the flexibility to work from home until it is deemed safe to return to the office.
We are looking for a Lead Site Reliability Engineer who will be responsible for the following:
- Design, build and maintain infrastructure through reusable code and tooling
- Automate provisioning and application deployment
- Contribute to system architecture and design
- Define and own Service Level Objectives (SLOs) and Service Level Indicators (SLIs) for applications
- Proactively test the flexibility and resilience of the system
- Improve service observability by measuring latency, traffic, error rate and saturation
- Interact with other engineering teams to help them improve availability, reliability and resilience of our infrastructure and systems
- Debug and optimize code, and automate routine tasks
- Serve as a point of reference for the team and lead overall delivery
The desired profile is:
- Bachelor's degree in Computer Science or a related technical field involving software/systems engineering
- 6 to 8 years of experience in an SRE/Operations/DevOps role running distributed systems in production
- Experience in leading teams
- Experience with programming in at least one of the following languages: C, C++, Java, Python
- Experience with algorithms and data structures
- Experience with AWS (or GCP) Infrastructure Services
- Deep understanding of Kubernetes and Docker along with Linux internals
- Experience with scaling, monitoring and troubleshooting actively running systems
- Familiarity with configuration management tools such as Ansible, Chef, or Puppet
- Experience with large-scale data processing (e.g. Hadoop), Linux serving systems, and distributed database systems (e.g. Cassandra)
- Understanding of Data Science concepts