Running a business is hard; we make it easier. We provide an all-in-one business credit card and spend management platform built for SMEs. Over 200,000 small businesses have spent more than £10 billion on their Capital on Tap Business Credit Cards.

The Role

Our Site Reliability Engineers work closely with our Platform and Engineering teams to ensure that our applications are designed and built with reliability and speed in mind, and that our application infrastructure is robust and scalable.

As a Site Reliability Engineer at Capital on Tap, you will be responsible for designing, building and monitoring systems that maximise platform uptime and efficiency for the best possible end-user experience. You will also be tasked with identifying and resolving potential outages and performance issues before they become problems.

Responsibilities

Manage Azure services and resources, Cloudflare edge security and traffic management as code.
Create, manage and monitor development resources within Kubernetes clusters and serverless services (e.g. Function Apps, Automation Accounts) for Product Engineering teams.
Own Terraform / Ansible / Pulumi Infrastructure as Code for each Product Engineering team.
Continuously identify opportunities for improvement in systems, processes, and technologies, and implement changes to improve the overall reliability and performance of the platforms.
Improve monitoring to provide insights into uptime and availability, and work towards the agreed SLOs.
Work with the Product team to identify the company SLAs and objectives for all core services and applications.
Work with Platform Engineers to deliver end-to-end automated solutions and pipelines.
Work with software developers and stakeholders to improve the user experience through pipeline management and infrastructure improvements.
Proactively support Platform services and tooling (TeamCity, Octopus, Azure DevOps and more to come).
Improve the reliability, quality and time-to-market of our suite of software solutions through practices such as load testing, chaos engineering and improved deployment strategies.
Own and lead the troubleshooting of incidents that impact the customer experience.

About you

Experience managing public cloud processes
Experience with Azure DevOps, Octopus and other CI/CD tools
Experience with PowerShell, Bash or other scripting languages
Experience with Terraform
Experience working with a cloud monitoring solution (Datadog experience is advantageous)
Experience with Kubernetes and Docker (advantageous)

What you'll get in return

🏥 Private Healthcare through Vitality

✈️ Worldwide travel insurance through Vitality

🎁 Anniversary Rewards (£250, £500, £750, 4-week fully paid sabbatical)

👛 Salary Sacrifice Pension Scheme with a 4-7% match

🏖️ 28 days holiday (plus bank holidays)

📖 Annual Learning Budget

👪 Enhanced Parental Leave

🚲 Cycle to Work Scheme

🚂 Season Ticket Loan

💬 6 free therapy sessions per year

🐶 Dog Friendly Offices

🍫 Free drinks and snacks in our Offices

Check out more of our benefits, values and mission here.

If you want to keep updated on future opportunities at Capital on Tap follow our company page here.

Diversity and Inclusion

We want to be a place where a diverse mix of talented people want to come and do their best work and, most importantly, feel included and able to be their authentic selves. We welcome, consider and encourage applications from anyone who shares this passion.

How to Apply

If you want to progress your career within a fast-growing, profitable fintech, click Apply (all we ask for is your CV and contact details) and we will get back to you within 3 working days.
