Who is Blueprint?
Blueprint is a technology company that focuses on digital transformation. We specialize in cloud and infrastructure, data platform and engineering, data science and analytics, organizational modernization and customer experience optimization. We have a nationwide presence with offices across multiple regions and we serve customers in multiple industry verticals.
We are innovators. Motivators. Thought provokers. Our collective backgrounds bring diverse perspectives that enable us to consistently think differently. We want you to bring your biggest and best ideas to help positively impact our culture, clients and the community around us. We believe in the importance of a healthy and happy team, which is why our benefits include full medical, dental and vision coverage, as well as paid time off, 401k, paid volunteer hours and tuition reimbursement.
What will I be doing?
Blueprint is seeking an experienced Site Reliability Engineer (SRE) to be responsible for the reliability, resiliency, and performance of the technology systems supporting our multibillion, multi-channel e-commerce business. We are hiring high-quality engineers with a diverse set of experiences and skill sets to support all e-commerce systems, including customer mobile apps, loyalty systems, and our back-end tier of large-scale, distributed, and highly available services. The position is highly technical and balances engineering operations with software development to enable rapid product development.
- Leads the team to collect metrics, build proactive dashboards and improve service monitoring to detect problems before customers are impacted.
- Designs, builds and operates software and infrastructure to enable reliable and rapid deployment of microservices with effective proactive monitoring.
- Drives a continuous improvement mindset with the team, embracing a DevOps culture and constantly finding ways to make our systems more reliable.
- Works with product teams to establish SLAs around performance that can then be integrated into our monitoring/alerting solutions.
- Engages in improving uptime through service capacity planning and demand forecasting, software performance analysis and system tuning.
- Understands, experiments with, and adopts emerging industry practices in the site reliability engineering space.
- Practices, coaches, and evangelizes reliability best practices.
- Develops a solid understanding and working knowledge of the team’s guest experience, business, and systems.
Required skills (a minimum of 5 years of experience with the following):
- Experience in an SRE/DevOps role working on highly scalable distributed systems.
- Strong experience working with a variety of monitoring tools, log aggregation, APM, and alerting tools.
- Strong experience with object-oriented (OO) principles and proficiency in high-level languages such as C# or Java.
- Experience working with public cloud providers such as Azure or AWS.
- High degree of professionalism, customer service orientation, initiative, flexibility, and the ability to multi-task.
- Excellent problem-solving approach coupled with strong communication skills and a sense of ownership and drive.
- Experience in designing, analyzing, scaling and troubleshooting large-scale distributed systems.
- Experience with Continuous Integration and Release Pipelines with quality gates.
- Bachelor's degree in Computer Science.
- Knowledge of deployment strategies such as blue/green and canary deployments.
- A passion for building and taking part in highly effective teams and agile/lean development methodologies.