We are looking for an experienced Engineering Manager for our Cloud Infrastructure, SRE (Site Reliability Engineering), and DevOps team. We are keen on leaders with a learning mentality. You will need excellent communication and collaboration skills, as you will work with senior members of teams across the entire company. Reporting to the VP, Engineering, you will work closely with the CTO, Principal and Staff Developers, and your team to create the strategic plan for infrastructure and DevOps. You will plan and prioritize work so that we have a stable, efficient, observable, and resilient technology environment that allows the company to meet its technology and business goals. We’re looking for a manager who has experience leading and managing a team of developers, is proactive, and can dive into projects while understanding the importance of open and clear communication within and between all teams.
This position is ideal if you enjoy being a player/coach and want to focus on optimizing efficiency and collaborating with the engineering team to ensure the development engine runs smoothly. As the manager of this team, you will bring technical knowledge to defining and implementing CloudOps, DevOps, and SRE best practices. Your team will be responsible for managing and optimizing Coconut’s DevOps and infrastructure (based on AWS, Kubernetes, and Terraform) and building tools to automate the management of a stable, efficient, observable, and resilient technology environment.
Note: this position will participate in our on-call team rotation.
YOU ARE FIRED UP TO
Demonstrate Team Leadership
- Lead by example - act in accordance with our CHEERS values
- Mentor, coach and inspire a team of DevOps, Infrastructure and SRE professionals
- Foster a collaborative and high-performance work environment
- Hire and train team members
- Be accountable for the Infrastructure and DevOps roadmap and the results the team attains
- Work with the Principal DevOps Engineer to set strategic plans and priorities for this function
Oversee Infrastructure and Site Reliability
- Work with your team to design, implement, and manage a cloud-based infrastructure ecosystem for scalability and reliability
- Ensure best practices for infrastructure as code (IaC) and configuration management are applied
- Work closely with the application development teams to ensure a manageable migration into a secure and reliable product environment and to implement new tools
- Automation and CI/CD:
  - Develop and maintain automated deployment pipelines
  - Promote continuous integration and delivery (CI/CD) practices
  - Design and develop automation and processes to enable teams to deploy, manage, configure, scale and monitor applications
- Monitoring and Alerting:
  - Ensure robust monitoring to proactively identify and resolve issues
  - Configure and manage alerting systems for real-time status and incident response
- Reliability Engineering:
  - Define and measure service level objectives (SLOs) and service level indicators (SLIs) to ensure system reliability
  - Lead incident response and post-incident reviews to improve system resilience
  - Develop innovative and technical tooling to improve production stability and enable faster recovery
- Security and Compliance:
  - Collaborate with security team to implement best practices for securing infrastructure and applications
  - Ensure compliance with industry standards and regulations
- Resource Optimization:
  - Optimize resource utilization to reduce costs while maintaining performance and reliability
  - Monitor & report on hosting & tooling costs
- Documentation and Training:
  - Maintain comprehensive documentation of systems, processes, and procedures
  - Provide training and knowledge sharing within the team
WHAT YOU BRING TO THE TEAM
- Proven experience in managing/leading a DevOps/SRE/Infrastructure team in a fast-paced environment
- Expertise in cloud platforms and infrastructure management, preferably AWS and Kubernetes
- Experience with provisioning, vendor management, and monitoring resources in a cloud-based environment
- Experience configuring and managing data sources such as MySQL, Postgres, and Redis
- System configuration experience with automation tools such as Puppet/Chef/Ansible
- Proficiency in automation and CI/CD tools such as Spinnaker, CircleCI, Travis CI, or GitLab CI/CD
- Experience with containerization and orchestration techniques and tools (e.g., Docker, Kubernetes)
- Experience with infrastructure as code tools, such as Terraform
- Experience leading the analysis of complex application, database, network, and OS issues for customer-facing systems in a high-uptime environment
- Experience with monitoring and alerting tools (e.g. DataDog, Sentry, OpsGenie)
- Experience with Perl/Python/Java/Bash scripting
- Experience working with large enterprise customer bases
- Experience reporting on key metrics, costs, and tooling to the organization and making recommendations for improvements
- Excellent problem-solving, collaboration, and communication skills
- Effective at nurturing relationships and managing multiple stakeholders across different teams
- Strong project management, leadership and cost management abilities
Our Investment in You:
- “Cabana Days” - our version of a flexible work week!
To enable our employees to do their best work, we offer flexibility to prioritize what is important and to take the time needed for rest and rejuvenation when possible, based on business and operational needs.
- Ability to do your job in a supported, but still flexible environment
- Supported professional development, learning & career opportunities - be supported in your growth journey!
- Regular 1:1 coaching with your leader and regular connection to a passionate executive team
- Work in a team big enough for growth but lean enough to make a real impact
A full range of benefits to keep you happy & healthy:
- Competitive Salaries - we pay fairly based on experience and expertise, not your ability to negotiate!
- Health & Dental Benefits, Virtual Care, & Disability top up - all starting from day 1!
- Virtual mental health and EAP platform
- WealthSimple GRSP & Matching
- Annual Wellness Benefit ($1000 per year)
- Opportunity to work remotely - anywhere in Canada!
- Employee Options - everyone shares in our success!
- Internet Subsidy on each paycheck
- Tiki Bucks Incentive Program - everyone is entitled to earn bonuses!
- A People First Company - 4.4 rating on Glassdoor
- Recently named one of the "Most Admired Corporate Cultures" and ranked #6 in "Best Workplaces in Canada" in our company size category
Who we are, and what we do:
Mission
Match customers with the right expert, at the right time, so no opportunity is lost.
Values
Collaboration. Honesty. Empathy. Elevate. Resilience. Service Excellence.
Coconut Software makes it effortless for customers to connect with their bank or credit union. Our appointment scheduling, queue management, and video banking solutions are used by leading financial institutions across North America, including RBC, Arvest Bank, Vancity, and Rogue Credit Union. Organizations that use Coconut benefit from a seamless customer experience that improves NPS, reduces wait times, and increases conversion rates.
To date we have raised close to 40M and have been doubling revenue year after year. The team at Coconut has ambitious growth plans to continue to scale the business to new heights by owning the North American market and delivering innovative solutions to our customers.
Coconut has a company culture that is best in class. We foster a community that is unconditionally inclusive, and in return ask that our people contribute their differing perspectives, ideas, and experiences for one common purpose: to advance the way people live and work in an environment of diversity, equity, inclusion, and workplace belonging.
Coconut Software is committed to treating all people in a way that allows them to maintain their dignity and independence. We believe in integration and equal opportunity. We are committed to meeting the needs of people with disabilities in a timely manner, and will do so by preventing and removing barriers to accessibility and meeting accessibility requirements under the Accessibility for Ontarians with Disabilities Act, 2005.