Making a job change is a big decision. Why consider Aptos?

You will join a team of remarkable colleagues who are committed and passionate about creating and delivering leading-edge solutions to the retail market. You will be part of an exciting growth journey where we will do everything possible to help you reach and exceed your career dreams. Our colleagues have access to industry-leading training and development opportunities, and the chance to work in a global, diverse culture with offices in 13 countries. You will be part of an inclusive culture that is grounded in our Company's purpose: to make a difference for every colleague, every client, every day.

With years of deep retail DNA, Aptos has provided a market-leading platform that drives the world’s largest retailers’ product, promotion, commerce and merchandising decisions across online and brick-and-mortar operations. The opportunity at Aptos has never been greater, as we transition our solutions to a cloud-native, microservices architecture. More than 135,000 retail locations, generating nearly $2 trillion in annual revenue across fashion, grocery, drug, convenience, general merchandise, discount and sporting goods stores, are optimized with Aptos’ solutions. We hope you’ll be a part of taking innovative solutions to market with the leader in Unified Commerce.

We’re looking for a passionate and talented teammate to help us scale and accelerate infrastructure and software deployment at Revionics. If you’re excited about cloud-native technologies and infrastructure-as-code, we would love to hear from you. We are focused on implementing advanced cloud-native technologies and practices while strengthening the organization’s continuous delivery posture.

Our ideal candidate is a self-starter with excellent communication skills. Our collaborative environment relies heavily on innovation, technical savvy, and problem-solving skills. This is a full-time, in-office position at our Bangalore, India location. As a Senior Site Reliability Engineer, you’ll be a major contributor to the company’s success. You’ll work with teams across the organization to build performant, reliable and highly scalable software systems. Your technical leadership will help drive continuous integration and delivery for our market-leading AI SaaS products for the retail industry. Our next-gen infrastructure stack is based on GCP, Linux, Windows, Terraform, Kubernetes, and GitLab.

Required Skills: 

  • Passion for reliable, scalable, observable software with a strong sense of ownership.
  • 6+ years of experience developing and monitoring mission-critical systems.
  • Understanding of and ability to drive site reliability engineering concepts and practices such as SLOs, SLIs, error budgets, and their practical application in cloud environments.
  • Hands-on experience with Docker and Kubernetes, preferably on Google Cloud.
  • Proficiency with, and understanding of, a containerized development workflow.
  • Strong background in Linux/UNIX administration (e.g., Red Hat/CentOS 7/Alpine Linux).
  • Strong background in Windows administration and troubleshooting (Windows Server 2019+).
  • Experience collaborating with engineering and operations teams to architect scalable solutions, conduct capacity planning, and optimize resource utilization on GCP.
  • Experience leading incident management, root cause analysis, and resolution efforts for critical incidents affecting GCP services, using automation to ensure swift resolution and minimal impact on operations.
  • Expertise in Infrastructure as Code (IaC) tools such as Packer and Terraform.
  • Experience with configuration management tools such as Puppet or Ansible.
  • Experience deploying large-scale Docker-based environments with Kubernetes, OpenShift, or a similar product.
  • Experience with languages such as Bash, Python, or Go.
  • Experience with Kubernetes networking components (e.g., CNI plugins and service mesh technologies such as Istio).
  • Experience implementing application clustering and load-balancing concepts and technologies.
  • Proficiency with networking fundamentals, diagnostics, and troubleshooting.
  • Proficiency with command-line tools to quickly triage and fix production issues.
  • Understanding of protocols and technologies such as HTTP, SSL, LDAP, SQL, HTML, and XML.

Responsibilities:

  • Lead best practices for building and operating highly reliable systems.
  • Ensure high availability, reliability, and performance of systems and services by implementing and adhering to reliability engineering practices.
  • Lead automation efforts to improve monitoring and observability and to reduce cloud spend.
  • Lead efforts to implement robust monitoring systems, define key metrics and alerts, and enhance continuous system health monitoring to proactively identify and address potential issues.
  • Identify bottlenecks, analyze system performance, and optimize configurations to enhance the efficiency and performance of systems and services using automation.
  • Create and maintain documentation, best-practice guides, and runbooks, and share knowledge with the team to ensure a consistent understanding of systems and processes.

We offer a competitive total rewards package including a base salary determined based on the role, experience, skill set, and location. For those in eligible roles, discretionary incentive compensation may be awarded in recognition of individual achievements and contributions. We also offer a range of benefits and programs to meet employee needs, based on eligibility. 

We are an Equal Opportunity Employer. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability, or status as a protected veteran. By submitting an application for this job, you acknowledge that any personal data or personally identifiable information that you provide to us will be processed in accordance with our Candidate Privacy Notice.
