LearnUpon is looking for a Site Reliability Engineer to join our team in Dublin.
LearnUpon LMS helps organisations train their employees, partners, and customers. Businesses can manage, track, and achieve their unique learning goals, all through a single, powerful solution. With offices in Dublin (our HQ), Philadelphia, Belgrade, and Sydney, we are a team that puts our customers' experience at the heart of everything we do. We're always striving for the best solution (not the easy one), and we go the extra mile to deliver work we're proud of. Our culture fosters open, collaborative environments where team and individual accomplishments are celebrated and encouraged. At LearnUpon, we work together as a friendly, supportive team who, most importantly, like to have fun.
About the Team:
The SRE Team sits within LearnUpon’s Engineering group and is focused on maintaining and expanding our cloud infrastructure and app services. Our charter is to ensure platform scalability and site availability as we look to grow threefold over the next few years.
About the Role:
As a Site Reliability Engineer you will be responsible for the day-to-day operation and management of the LearnUpon platform infrastructure.
While our tool stack is predominantly AWS, Terraform, Ansible and Packer, we welcome anyone with experience in similar technologies to be part of our journey. We prefer to choose the right technology for each problem, so you’ll have plenty of space to grow your skills.
What will I be doing?
- Ensure System Reliability and Efficiency: Continuously monitor and improve the reliability, scalability, and performance of our SaaS platform, ensuring high availability and optimal functioning of services in a cloud-based environment.
- Incident Management and Response: Lead and participate in incident response and post-mortem analysis to effectively manage and resolve production issues, minimise downtime, and implement preventative measures for future incidents.
- Automation and Tool Development: Develop and maintain automation tools and scripts to streamline operations, reduce manual efforts, and increase system efficiency. Focus on automating routine tasks and deployment processes to enhance system stability.
- Cross-functional Collaboration: Work closely with development teams to integrate best practices and reliability into the software development lifecycle. Collaborate with product and support teams to understand customer needs and provide technical solutions.
- Capacity Planning and Resource Optimisation: Engage in capacity planning and resource optimisation strategies to manage workload demands and resource utilisation, ensuring cost-effective scalability and performance.
- Performance Monitoring and Reporting: Implement and manage monitoring tools to track system performance and health. Generate regular reports and insights on system metrics, identifying trends and recommending improvements.
- Participate in the on-call rota.
What skills do I need?
- At least three years of production system administration/SRE experience.
- At least two years of experience serving a large-scale SaaS web application on AWS or a similar cloud provider.
- Experience implementing infrastructure as code (e.g. CloudFormation, Terraform), automation tooling (e.g. Puppet, Ansible), and CI/CD (e.g. Jenkins, Travis CI, GitLab).
- Ability to analyse and optimise performance in high-traffic internet applications.
- Thorough understanding of common Internet protocols (e.g. HTTP, DNS, SMTP).
- Ability to solve complex, high-impact problems.
- Excellent communication skills and a strong team player.
- In-depth experience with MySQL in a web scale environment is a plus.
- Knowledge of AWS services (e.g. EC2, SES/SNS, IAM) would be an advantage.
- Experience deploying microservice environments, using containerisation technologies such as Docker and Kubernetes.
Don’t worry if you don’t tick every box; we’re always happy to review applications and take all experience into consideration. We do our best to provide feedback where we can!
Why work with us?
- Competitive salary and company ESOP.
- Comprehensive private health insurance scheme and company pension scheme.
- 25 days' annual leave plus one annual company wellness day off.
- Work in a fun and supportive environment with regular team events.
- Excellent career progression - take LearnUpon where you think it can go.
- LUPWell Programme, because we know that positive mental wellbeing plays a major role in both your personal and professional success.
What is the Hiring Process?
Applicants for the position can expect the following hiring process:
- Qualified applicants will be invited to schedule a 30-minute call.
- Successful candidates will then be invited to a series of practical interviews.
- Finally, candidates will have a short interview with our CEO/CTO.
- Successful candidates will be contacted with an offer to join our team.
LearnUpon is an Equal Opportunities Employer.
We do not discriminate on the basis of gender, marital status, family status, age, disability, sexual orientation, race, religion, membership of the Traveller community, or any other legally protected status.