As a Site Reliability Expert (SRE) on our global Cloud Platform & Operations team, you'll support Lightspeed's growing development teams with the infrastructure and tools needed to run our products reliably, efficiently, and securely, by implementing, advising on, and advocating for well-established DevOps principles.

What you’ll be responsible for

  • Initiating and contributing to the continuous improvement of our software delivery processes and practices within a multi-location, multidisciplinary team, to empower and accelerate product development
  • Using automation extensively to design, configure, manage, and monitor systems in support of our product development teams
  • Contributing to the development of CI/CD pipelines that adhere to the performance and security standards defined by the organization, with an emphasis on cloud platform integration and self-service workflows
  • Assisting with infrastructure and tooling hardening to meet business and compliance requirements
  • Designing and architecting operational solutions with the specific goal of increasing the standardization, automation, repeatability, cost-efficiency, and consistency of operational tasks
  • Working with developers and other SREs to design and build scalable, reliable, and cost-efficient cloud infrastructure
  • Writing and maintaining architecture, stakeholder, policy, and process documentation
  • Adhering to and advocating for best practices, including Infrastructure as Code, monitoring, high availability, disaster recovery, security, and DevOps methodologies
  • Collaborating with development teams and drawing on intuition, experience, and an understanding of each service to define SLIs, SLOs, and SLAs
  • Providing timely assistance and remediation during critical situations and production incidents to help resolve service problems; you will be part of an on-call rotation

What you’ll be bringing to the team

  • Proficiency in one or more programming languages, such as Python, Golang, Ruby, PHP, or JavaScript
  • Experience delivering scalable CI/CD solutions to organizations
  • Good knowledge of Amazon Web Services and/or Google Cloud Platform
  • Good understanding of Agile development and continuous delivery best practices, software engineering tools, processes, methods, and testing
  • Strong experience with Docker, Kubernetes, Helm, Linux systems, and databases (SQL and/or NoSQL)
  • Strong experience with monitoring and alerting tools (e.g., New Relic, PMM)
