Tickets.com, an MLB company, delivers innovative, cutting-edge technologies to enable frictionless and unforgettable fan experiences in venues across the globe. Together with MLB, Tickets.com is changing the landscape of the live sports and entertainment industry, delivering new digital venue and ticketing experiences to millions of fans. Our Technology team builds platforms and products that provide a new smart ticketing solution and venue experience. Using cutting-edge technology, our platform and applications are consumed by fans, stadiums, and MLB teams.
We are assembling a world-class team to build on these experiences and to scale platforms and products that anticipate emerging opportunities, including dynamic pricing and offers and digital, contactless ticketing. Our mission is to provide premium, innovative live experiences for our clients and their patrons.
Tickets.com is looking for an Associate Site Reliability Engineer passionate about building engaging products for our fans.
The Opportunity: The Associate Site Reliability Engineer will be responsible for providing consulting services and tools to teams across MLB and Tickets.com. This role will work closely with our existing Network Operations Center team members to monitor systems, applications, and services, taking action on alerts, troubleshooting and resolving issues, or escalating as the situation or playbook requires. The Associate Site Reliability Engineer will also work closely with our Site Reliability team to evangelize their mission.
Essential Job Functions:
Assist in creating Incident Response Policies and adopt tools used to respond, troubleshoot, and conduct retrospectives.
Assist in automating telemetry collection and in maintaining and enhancing observability tooling to identify and address service health issues and debug running systems.
Optimize user experience through the adoption of SLIs, SLOs, and Error Budgets
Adopt cost optimization techniques to maximize the value of our SaaS solutions, including GCP, DataDog, FireHydrant, and PagerDuty
Train teams on the value of Kubernetes operations and when they should be applied.
Train teams on the value of disaster recovery planning and high availability.
Write code to complement/fill gaps left by SaaS solutions.
Extensively utilize Terraform for infrastructure as code
Engage in incident response.
Use and administer Grafana, including developing and maintaining plugins and Dashboards.
Assist with migrating observability to go-forward tools, including DataDog
Administer DataDog
Educate and help teams to adopt observability best practices.
Support FinOps program by guiding teams to maximize the value of SaaS solutions.
Enhance observability tooling to make the above possible.
Improve and maintain high-alerting tools
Work with SREs to build ‘batteries included’ solutions that are extensible across the organization.
Requirements:
Bachelor's Degree in Computer Science or equivalent experience.
Experience writing code in a compiled language that runs in production
Experience with observability tools, including DataDog and Grafana
Experience working in Terraform
Understanding of APM concepts such as tracing, logs, and real user monitoring
Excellent Customer Service Experience
Excellent Collaboration Skills
Dedicated to continuous improvement of your skills and our SRE capabilities
Experience with Cloud Providers: Oracle Cloud Infrastructure, Google Cloud Platform, AWS
APIs and microservices: REST, Web, Graph
Software development tools: Jira, Git, Jenkins, ArgoCD, Terraform