LearnUpon is looking for a Site Reliability Engineer to join our team in Dublin.

LearnUpon LMS helps organizations train their employees, partners, and customers. Businesses can manage, track, and achieve their unique learning goals — all through a single, powerful solution.

With offices in Dublin (our HQ), Belgrade, Philadelphia, Salt Lake City and Sydney, we are a global team with diverse cultures, backgrounds, and experiences, and we put our customers' experience at the heart of everything we do.
Our culture fosters an open, collaborative and supportive environment where our accomplishments are celebrated and encouraged. We strive to live by our values, act like owners, lead with curiosity and deliver quality for our customers. We’re proud of our success and we’re humble and hungry to achieve more.

The SRE Team sits within LearnUpon’s Engineering group and is focused on maintaining and expanding our cloud infrastructure and app services. Our charter is to ensure platform scalability and site availability as we look to grow threefold over the next few years. 

As a Site Reliability Engineer you will be responsible for the day-to-day operation and management of the LearnUpon platform infrastructure. While our tool stack is predominantly AWS, Terraform, Ansible and Packer, we welcome anyone with experience with similar technologies to be part of our journey. We prefer choosing the right technology for the right problem so you’ll have plenty of space to grow your skills.

What will I be doing?

  • Ensure System Reliability and Efficiency: Continuously monitor and improve the reliability, scalability, and performance of our SaaS platform, ensuring high availability and optimal functioning of services in a cloud-based environment
  • Incident Management and Response: Lead and participate in incident response and post-mortem analysis to effectively manage and resolve production issues, minimise downtime, and implement preventative measures for future incidents
  • Automation and Tool Development: Develop and maintain automation tools and scripts to streamline operations, reduce manual efforts, and increase system efficiency. Focus on automating routine tasks and deployment processes to enhance system stability
  • Cross-functional Collaboration: Work closely with development teams to integrate best practices and reliability into the software development lifecycle. Collaborate with product and support teams to understand customer needs and provide technical solutions
  • Capacity Planning and Resource Optimisation: Plan capacity and optimise resource utilisation to meet workload demands, ensuring cost-effective scalability and performance
  • Performance Monitoring and Reporting: Implement and manage monitoring tools to track system performance and health. Generate regular reports and insights on system metrics, identifying trends and recommending improvements
  • Participate in the on-call rota

What skills do I need?

  • At least three years of production system administration/SRE experience
  • At least two years of experience serving a large-scale SaaS web application on AWS or a similar cloud provider
  • Experience with implementing infrastructure as code (e.g. CloudFormation, Terraform etc.), automation tooling (e.g. Puppet, Ansible etc.), CI/CD (e.g. Jenkins, Travis CI, GitLab etc.)
  • You are able to analyse and optimise performance in high-traffic internet applications
  • Thorough understanding of common Internet protocols (e.g. HTTP, DNS, SMTP)
  • Ability to solve complex, high-impact problems
  • Excellent communication skills, team player
  • In-depth experience with MySQL in a web scale environment is a plus
  • Knowledge of AWS services (e.g. EC2, SES/SNS, IAM etc.) would be an advantage
  • Experience deploying microservice environments, using containerisation technologies such as Docker and Kubernetes

Don’t worry if you don’t tick every box. We’re always happy to review applications and take all experience into consideration. We do our best to provide feedback where we can!

Why work with us?

  • Work in a fun and supportive environment with regular social events
  • Excellent career progression - take LearnUpon where you think it can go
  • Structured learning environment
  • Competitive salary and company ESOP
  • Employer Contributed Pension
  • Private health insurance
  • 25 days annual leave + 1 annual company wellness day off
  • Flexible Working Arrangements

What is the Hiring Process?

Applicants for the position can expect the following hiring process:

  • Qualified applicants will be invited to schedule a 30-minute call.
  • Successful candidates will then be invited to a series of practical interviews.
  • Finally, candidates will have a short interview with our CTO.
  • Successful candidates will be contacted with an offer to join our team.

LearnUpon is an Equal Opportunities Employer. 

We do not discriminate on the basis of gender, marital status, family status, age, disability, sexual orientation, race, religion, membership of the Traveller community, or any other legally protected status.

Visit our Careers site to find out more about working for LearnUpon, and check us out on Instagram.
