LearnUpon is looking for a Site Reliability Engineer to join our team in Dublin.
LearnUpon LMS helps organizations train their employees, partners, and customers. Businesses can manage, track, and achieve their unique learning goals — all through a single, powerful solution.
With offices in Dublin (our HQ), Philadelphia, Belgrade, and Sydney, we are a team that puts our customers' experience at the heart of everything we do. We're always striving for the best solution (not the easy one), and we’re committed to producing work that we can be proud of.
Our offices are open, collaborative environments where our team and individual accomplishments are celebrated and encouraged. Join LearnUpon, where we work together as a friendly, supportive team who, most importantly, like to have fun.
You’ll be joining our experienced infrastructure team, based in Dublin, and will get to work on a wide variety of projects including resource provisioning, application deployment, monitoring, scaling, and automation. Our ideal candidate will also be responsible for researching best practices and new automation techniques to improve the stability and reliability of our infrastructure and applications.
What will I be doing?
System Administration & Site Reliability
- Use common orchestration tools (e.g. Terraform, Ansible, Puppet) to manage and improve infrastructure.
- Participate in the operations on-call rotation, triaging and addressing production issues as they arise.
- Contribute to internal tools that help us improve our operations processes, manage our infrastructure, and scale our systems.
- Whiteboard a fix to a scaling problem -- and then make it happen.
- Install new servers or rebuild existing ones, and configure hardware, peripherals, services, settings, directories, storage, etc. in accordance with standards and project/operational requirements.
Operations and Support
- Perform daily system monitoring, verifying the integrity and availability of all hardware, server resources, systems and key processes, reviewing system and application logs, and verifying completion of scheduled jobs such as backups.
- Perform regular security monitoring to identify intrusion patterns.
- Perform daily backup operations, including restore testing.
- Monitor resource utilisation and recommend solutions.
- Manage user provisioning and automated provisioning systems.
- Provide escalation engineering support to other teams.
- Repair and recover from hardware or software failures. Coordinate and communicate with affected stakeholders.
- Assist in applying OS patches and upgrades on a regular basis, and upgrade administrative tools and utilities. Configure or add new services as necessary.
- Contribute to system configuration and asset management applications.
What skills do I need?
- Bachelor's (4-year) degree with a technical major, such as engineering or computer science.
- At least three years of production system administration/SRE experience.
- At least two years supporting a large-scale SaaS web application on AWS or a similar cloud provider.
- Ability to analyze and optimize performance in high-traffic internet applications.
- Thorough understanding of common Internet protocols (e.g. HTTP, DNS, SMTP).
- Familiarity with APIs used for monitoring, management, user provisioning, and SSO.
- Ability to solve complex, high-impact problems.
- Ability to digest issues and solutions and discuss them with team members who may not be familiar with the terminology or technologies involved.
- Excellent communication skills, team player.
You don’t need to tick every box in order to apply. We’re always happy to review applications and take all experience into consideration, and we do our utmost to provide feedback where we can!
Not required but considered a big plus
- Experience maintaining uptime of a production Ruby/Rails app.
- Experience with CI/CD tools.
- Scripting language experience (e.g. Perl, Python, or similar languages) and DevOps responsibilities.
- Certification in AWS, any PaaS, and/or related technologies.
Why work with us?
- Work in a fun and supportive environment with regular team events.
- Excellent career progression - take LearnUpon where you think it can go.
- Structured learning environment.
- Competitive salary and company ESOP.
- Employer-contributed pension.
- Private health insurance.
- 22 days annual leave.
What is the hiring process?
Applicants for the position can expect the following hiring process:
- Qualified applicants will be invited to schedule a 30-minute call.
- Successful candidates will then be invited to a practical interview.
- Finally, candidates will have a short interview with our Head of Security and Infrastructure.
- Successful candidates will be contacted with an offer to join our team.