Sysdig is the secure DevOps company, and we’re at the forefront of the container and Kubernetes revolution. We are passionate, technical problem-solvers, continually innovating and delivering powerful solutions to secure and operate cloud-native applications in production. Our consistent contributions to open source software projects reflect our commitment to the open cloud movement. 

We value diversity and open dialog to spur ideas, working closely together to achieve goals. And we're a great place to work too -- we were awarded the 2019 Bay Area Best Places to Work Award from San Francisco Business Times and the Silicon Valley Business Journal. We are looking for team members who share our commitment to customers and are willing to dig deeper, understand problems, and deliver innovative solutions. Does this sound like the right place for you?

Your Opportunity

As a Site Reliability Engineer on our Infrastructure team, you will help improve Sysdig's provisioning, monitoring, and cloud platform management. You have an aptitude for analytical and creative problem solving, and you are excited to use the power of automation to manage the stability, availability, and scale of our infrastructure.

Your Responsibilities

You will join a highly skilled and globally distributed team of SREs, and you can expect to:

  • Build solutions to enhance the observability, availability, performance, and resilience of the Sysdig SaaS and On-Premise products
  • Implement reliability improvement initiatives, including performance tuning and infrastructure optimization
  • Maintain and support the production environments and communicate directly with customer stakeholders
  • Participate in an on-call rotation with other SREs

Your Background

  • Experience managing Kubernetes clusters in a production environment
  • Solid understanding of Linux systems and networking
  • Proficiency with infrastructure as code/configuration management tools. We love Terraform, but you may have experience with Ansible, Chef, Puppet or SaltStack
  • Familiarity with monitoring tools such as Sysdig, Prometheus, Nagios, Icinga, Zabbix
  • Experience managing multi-tenant solutions with Cassandra, Elasticsearch, Kafka or Redis
  • Proficiency with SQL relational databases, preferably PostgreSQL and MySQL
  • Command of a scripting language such as Python or Bash
  • Knowledge of CI/CD concepts; hands-on experience is a strong plus
  • Experience supporting a customer-facing product hosted in a public or private cloud ecosystem
  • Experience diagnosing and troubleshooting complex problems in high-throughput web applications and network services
  • Strong sense of ownership and a focus on customer delight 

Key Technologies

Kubernetes, Docker, Python, Cassandra, Kafka, Elasticsearch, Redis, Terraform, PostgreSQL, AWS

Why work at Sysdig?

  • We’re a well-funded startup that already has a large enterprise customer base
  • We have a pragmatic, approachable culture, from the CEO down
  • We have an organizational focus on delivering value to customers
  • Our open-source tools are widely used and loved by technologists & developers

Additionally, we offer a variety of benefits and perks, such as:

  • Flexible vacation policy
  • A monthly allowance that can be used for the following types of expenses: employee wellness, housecleaning services, home internet, phone expenses, office supplies, and office furniture


