Slyce is the market leader in the emerging technology of image recognition and visual search. Over the past few years, we've been busy raising capital, growing our team, and signing deals with more than 25 leading US retailers, including Home Depot, Bed Bath & Beyond, Neiman Marcus, and Macy's. We are a close-knit team with ambitious goals, and we're excited to drive our cutting-edge technology further into the marketplace and have fun doing it.
Our technology lets consumers submit a photo or scan an image with their mobile device. Slyce then recognizes what the image contains and matches it to products sold by retailers that have integrated our technology. Our approach focuses on white-labeling our core services into retailers’ apps and mobile web. We also drive experiences through our own consumer apps, including SnipSnap, which reaches more than 5 million monthly users.
As a Site Reliability Engineer, you will be an integral part of our Cloud Platform team, enabling exciting new features for our visual search and image recognition services. The ideal candidate has extensive site reliability knowledge and experience, and has previously managed modern cloud, big data, and/or Internet of Things systems in a fast-paced, agile environment.
The Site Reliability Engineer focuses on configuration, automation, and optimization of the development and production platform. Responsibilities include:
- Instrument code and build tools and dashboards to visualize and understand real-time system health, usage, and performance metrics.
- Oversee the infrastructure and service health monitoring process to enable proactive issue mitigation and expedited issue resolution.
- Design, manage, and maintain internal tools to support engineering, operations, research and/or support processes.
- Troubleshoot and resolve issues in our development, test and production environments.
- Work with the platform team to identify and fix software/system performance bottlenecks and stability issues.
- Contribute to overall system scalability to ensure Slyce’s ability to deliver high-availability, low-latency services.
- Understand, implement, and automate security controls, governance processes, and compliance validation.
- Oversee and manage ongoing system vulnerability and penetration testing.
- Oversee the continuous integration and continuous deployment (CI/CD) toolchain so that code for the high-availability, mission-critical cloud platform supporting all core Slyce products and services is reliably tested and predictably released.
- Stay up to date on relevant technologies, plug into user groups, and understand trends and opportunities to ensure we are using the best possible techniques and tools.
The ideal candidate brings the following skills and experience:
- Strong background in Linux/Unix administration and scripting
- Extensive experience managing and configuring public cloud providers, specifically Google Cloud
- Extensive experience with Docker and Docker Compose
- Extensive experience with Kubernetes
- Experience using Istio or a similar tool
- Experience with monitoring and analytics using Prometheus/Grafana or similar
- Experience with configuring and maintaining Jenkins and Jenkins Pipeline
- Experience with log parsing and monitoring using Graylog, ELK, or related
- Knowledge of best-practice security and networking techniques for public-facing systems
- Strong experience managing distributed messaging through a broker such as Apache Kafka, RabbitMQ, or similar
- Strong experience with MongoDB, PostgreSQL, and related database technologies
- Experience configuring, managing, and scaling Elasticsearch
- Knowledge of IT operations best practices for an always-up, always-available, mission-critical service
- Experience writing production-ready code in Python and/or C++