Stacklet accelerates how organizations manage their cloud security, operations, compliance, asset visibility, and cost optimization. Our company was founded by the creators and maintainers of Cloud Custodian, an open source cloud governance project used today by thousands of well-known global brands. 

Stacklet’s cloud Governance-as-Code SaaS platform was designed in partnership with several Fortune 1000 organizations. It extends the open-source Cloud Custodian with robust management capabilities, real-time cloud asset visibility, out-of-the-box policy packs, and other features that help organizations innovate securely in the cloud at scale. Stacklet enables organizations to:

  • Meet ever-changing compliance needs with best-practice policies for common security, operations, and cost optimization use cases, as well as compliance frameworks such as NIST CSF, PCI-DSS, HIPAA, and CIS Benchmarks. The platform lets organizations easily customize policies to their unique needs.
  • Control costs by automatically identifying, right-sizing, and de-provisioning unnecessary resources.
  • Improve security posture through real-time policy enforcement and automated remediation across multiple cloud platforms and services.
  • Reduce the operational overhead and developer friction of policy implementation and inconsistent cloud management by using one standard, easy-to-use policy language and platform across all your clouds.

As a Site Reliability Engineer (SRE), you are responsible for keeping all user-facing services and other Stacklet production systems running smoothly. SREs are a blend of pragmatic operators and software craftspeople who apply sound engineering principles, operational discipline, and mature automation to our environments and the Stacklet codebases.

As an SRE you will:

  • Design, build, and maintain the core infrastructure that allows the Stacklet platform to scale successfully
  • Debug production issues across services and levels of the stack
  • Partner with development teams to improve services through rigorous testing and release procedures
  • Join a PagerDuty rotation to respond to availability incidents and support service engineers during customer incidents
  • Use your on-call shift to prevent incidents from ever happening
  • Tune monitoring and alerting to fire on early symptoms rather than on full outages
  • Document every action so your findings turn into repeatable solutions, and then into automation
  • Improve the deployment process to make it as boring as possible
  • Participate in system design consulting, platform management, and capacity planning
  • Create sustainable systems and services through automation and uplifts
  • Balance feature development speed and reliability with well-defined service level objectives

You may be a fit for this role if you have some of these inclinations:

  • Think in systems: edge cases, failure modes, behaviors, specific implementations
  • Know your way around Linux and the Unix shell
  • Have strong programming skills in Python and/or Go
  • Have experience with infrastructure as code, including Terraform
  • Have an urge to collaborate and communicate asynchronously
  • Have an urge to document all the things so you don't need to learn the same thing twice
  • Have an enthusiastic, go-for-it attitude. When you see something broken, you can't help but fix it
  • Have an urge to deliver quickly and iterate fast

Nice to have

  • Prior startup experience
  • Public cloud certifications
  • Knowledge of cloud security

Stacklet believes a diverse workforce enhances our ability to deliver world-class products and services. We are committed to ensuring equal employment opportunities for all qualified individuals. Qualified applicants will receive consideration for employment without regard to their race, color, religion, gender, gender identity or expression, sexual orientation, national origin, genetics, disability, age, or veteran status.