Who We Are
At Platform Science, we’re working to connect everything that moves.
Founded in 2015, we are an open IoT platform that partners with innovative fleets, application developers, vehicle manufacturers, and equipment providers in the transportation industry to deliver revolutionary solutions to supply chain professionals across the globe.
Our employees are an engaging, diverse group of people who believe in the power of great ideas. We hire people with different experiences and perspectives to build a company culture that fuels growth through innovation.
We value thoughtful actions and empathy for others. We approach challenges with resiliency and creativity, while encouraging transparency because, no matter our backgrounds or responsibilities, we are one team.
About the Role
The Site Reliability Engineering (SRE) Manager will lead a high-performing team that ensures system reliability, scalability, and efficiency while championing SRE principles across the organization. This role involves coaching the team, promoting best practices, and enabling development teams to deliver observable, maintainable, and production-ready applications. The SRE Manager oversees multiple projects, requests, and initiatives while maintaining clear communication and keeping the team aligned and productive.
Essential Responsibilities
- Recruit, train, and mentor a team of Site Reliability Engineers to deliver operational excellence.
- Foster a culture of innovation, collaboration, and adherence to SRE principles like SLOs, error budgets, and production readiness.
- Standardize and train development teams on observability tools such as Prometheus, Grafana, and Datadog.
- Enhance developer and release workflows using CI/CD best practices, GitOps methodologies, and tools like Jenkins, ArgoCD, and Docker.
- Drive application and system resilience through chaos engineering, load testing, and automation.
- Collaborate with teams to define SLIs and SLOs and to manage error budgets.
- Manage on-call rotation schedules, optimize alerting processes, and ensure 24/7 production application support.
- Serve as the escalation point for incident resolution, providing guidance and technical expertise.
- Build tools, dashboards, and processes to improve incident response, production health, and system reliability.
- Conduct quarterly "State of the Service" reviews to assess performance, sustainability, and risks.
- Track and prioritize multiple initiatives while ensuring the team stays focused and aligned with organizational goals.
- Maintain detailed documentation on team projects, requests, policies, and best practices.
- Communicate effectively across teams, departments, and stakeholders to ensure alignment and a clear understanding of SRE initiatives.
- Evangelize SRE practices across the organization and ensure consistent adoption of reliability-focused processes.
Education and Experience
- 5+ years of experience in software engineering or SRE roles.
- 2+ years in a leadership or management position.
- Proven expertise with Kubernetes, ArgoCD, AWS, Prometheus, Grafana, Datadog, Fluentd, Jenkins, and Docker.
- Strong knowledge of CI/CD and GitOps practices.
- Excellent verbal and written communication skills.
- Demonstrated ability to track and prioritize multiple projects, requests, and initiatives effectively.
- Bachelor’s degree in Computer Science, Engineering, or equivalent experience.
Platform Science Benefits Highlights
The company offers various benefits to regular, full-time employees, including:
- Medical, dental, and vision insurance
- Short-term and long-term disability insurance
- AD&D and life insurance
- 401k plan
- Paid vacation, sick leave, and holidays
- Six weeks of paid parental leave
For more information, see the Benefits Highlights brochure for regular, full-time employees by copying and pasting this link into your browser: https://www.platformscience.com/benefits.
This is an exempt role. Our job titles for each posting may span more than one job level. The estimated base salary for this role is between $134,550 and $200,000. The range displayed on each job posting reflects the minimum and maximum target range for new hire base salaries across all US locations. Compensation packages are based on many factors unique to each candidate, including but not limited to skill set, work experience, relevant training and certifications, business needs, market demands, and specific geographical location. The base pay range is subject to change and may be modified in the future. This role may also be eligible for bonus, equity, and benefits.
Please note that the compensation details listed in US role postings reflect the base salary only, and do not include bonus, equity, or benefits.
Platform Science collects your personal information to support its business operations, including for human resources, employment, benefits administration, health and safety, and other business-related purposes, and to comply with legal requirements. You can review further details of such collection and use in our Privacy Policy (link for browser: https://www.platformscience.com/privacy-notice).
Qualified applicants with arrest or conviction records will be considered for employment in accordance with the Los Angeles County Fair Chance Ordinance for Employers and the California Fair Chance Act.
At this time we only consider candidates in these states: AL, AR, AZ, CA, CO, FL, GA, ID, IL, KY, MA, MD, MI, MN, MO, NC, NH, NV, NY, OH, OK, OR, PA, SC, TN, TX, UT, VA, WA, and WI. In the future we plan to add more states.
Beware job scams! Our recruiters use @platformscience.com emails only. We don’t interview via text/message. We don't ask for software downloads (except Zoom) or sensitive info (like SSN/bank). Suspect fraud? Report it to law enforcement & peopleops@platformscience.com.