As a Site Reliability Engineer, you will help implement best practices within a highly technical and evolving environment that includes Infrastructure as Code (IaC), immutable infrastructure, and a dedicated approach to organization-wide infrastructure and DevOps automation. You will be responsible for the availability, performance, monitoring, and incident response, among other things, of the platforms and services that our company manages and owns.
- Participate as a technical leader, providing solutions to operations problems.
- Implement and enforce Information Security Policies and Procedures as directed
- Technology support with on-call rotation schedule (for escalations and incidents).
- User management / systems maintenance (mobile devices, laptops, etc.)
- Documentation of key activities for legislative compliance
- Corporate network(s) maintenance & security, including endpoint security and compliance
- Configuration and asset management
- Participate in disaster recovery planning and implementation
- Ensure proper backup of systems and test backups periodically
- Manage identity and access policies across multiple environments
- Make high-value impacts on productivity throughout the software development lifecycle
- Enhance and maintain technology tools and infrastructure
- Contribute to new product implementation and provide upkeep to existing products and services.
- Ensure 99.9% production SLA, with 24x7x365* availability in emergency situations
- Mentor team members on best practices and security implementations as required
- Strong documentation, planning, problem-solving, organizational and customer service skills
- Software debugging ability in any language (we primarily use Python and PHP) is a strong plus
- Experience with firewall, router, switch, DNS, VPN configuration, troubleshooting and management
- Experience with relational database engines (we primarily use MySQL and PostgreSQL) is a plus
- Proven ability to manage multiple processes, priorities, and conflicting demands in a calm and professional manner
- Strong team player who communicates clearly and concisely, selecting the written, spoken, or visual material that best suits the situation and intended audience
- Cloud management experience (we use AWS and GCP)
- Experience with continuous integration platforms (we use CircleCI and GitHub Actions)
- Experience with container technologies (we use Docker)
- Experience with provisioning and deployments at scale (we use Rancher/Kubernetes)
- Monitoring experience (e.g., PagerDuty, Datadog, CloudWatch, Prometheus)
- Take initiative in seeking out tasks, communicating, and updating status;
- Ability to prioritize, multi-task, and perform effectively under pressure;
- Be willing to ask for assistance when roadblocks impede progress;
- Ability to define requirements and manage tasks with multiple deadlines under minimal supervision;
- Display reliability, integrity and trustworthiness;
- Demonstrate a willingness to learn new technologies;
- Possess strong analytical and problem-solving skills;
- Display creative and innovative thinking;
- Acceptance of differences in team members and a strong commitment to teamwork, cooperation and healthy behaviours.
- 3+ years of experience in a systems administrator, DevOps, or SRE role
If you are looking for a new and exciting challenge, and the opportunity to work with some of the most successful professional sports franchises in the world, then look no further. We at Kinduct are looking forward to hearing from you, and to welcoming you to our ever-growing family of Kinducterdactyls!