We are looking for a Production Reliability Engineer. You will play a crucial role in ensuring the stability and performance of our critical system infrastructure.
What You’ll Be Doing:
- Implementing monitoring solutions for real-time metrics.
- Developing alerting systems to identify and respond to issues.
- Implementing automation for deployment, configuration, and maintenance tasks.
- Utilizing tools like Ansible, Puppet, or Chef for configuration management.
- Implementing measures to protect against threats and unauthorized access.
- Communicating technical information to non-technical stakeholders.
What We Look For In You:
- Deep understanding of Linux systems and networking.
- Familiarity with profiling tools for identifying performance issues.
- Excellent knowledge of building Infrastructure as Code (IaC) with Ansible and Terraform.
- Experience developing CI/CD pipelines, and understanding DevOps principles and implementing them in production environments.
- In-depth expertise in ensuring the scalability, stability, and availability of high-load production systems.
Nice-to-have:
- Understanding of SRE practices.
- Experience with self-healing and zero-downtime projects.
Why Should You Join Our Team?
- Great challenges with many opportunities to prove yourself.
- A welcoming group of highly qualified international professionals.
- Great corporate culture with internal events and a strong commitment to fostering a supportive and empowering environment for employees.
- Cutting-edge hardware and technology.
- Work from a comfortable Dubai office or remotely from anywhere in the world.
- Flexible schedule.
- 40 paid days off.
- Competitive salary.