Founded in 2013, GIPHY Inc. is the first and largest GIF search engine, where thousands of artists, brands, and pop culture moments make today’s expression, entertainment, and information a little more moving. GIPHY lets users not only search for their favorite GIFs but also post, embed, share, and create their own, and more.
GIPHY is integrated into thousands of platforms, including iMessage, Facebook, Instagram, Snapchat, Twitter, Tinder, Slack, and WhatsApp. We serve over 7 billion GIFs per day to more than 500 million daily active users, who watch more than 11 million hours of GIFs every day.
We’re a creative and passionate group of GIF-obsessed individuals continuing to build out what we believe is the future of communication. We have big goals and are looking for talented people to join us.
GIPHY serves over 7 billion GIFs per day, our mobile products have millions of users, and our API is integrated into some of the biggest digital platforms in the world. We’re looking for a Senior Site Reliability Engineer to help run and improve our platforms so we can deliver the best, most relevant content in real time. You will join a team that supports and scales the production infrastructure behind these applications.
What You’ll Do:
- Build and manage software delivery, systems integration, and developer support tools
- Design and deploy applications using components of the AWS stack, focusing on high availability, fault tolerance, and disaster recovery
- Work with developers to troubleshoot, monitor, analyze, and optimize microservices to maintain SLOs
- Conduct performance tuning and load testing, optimize information and data processing, and maintain and support production and development environments
- Serve as a technical expert on the most difficult support and troubleshooting problems
Who You Are:
- 3+ years of experience in Site Reliability or Infrastructure Engineering
- Expert-level knowledge of the Amazon Web Services (AWS) ecosystem
- Strong background in Linux/Unix Administration
- Experience monitoring and supporting 24x7, high-availability systems that include web, application, and database servers and load-balancing systems
- Hands-on experience with a container orchestration system (especially Kubernetes), including running, deploying, and debugging containerized microservices in production
- Experience with a scripting language (Python, Ruby, Bash) is required; background in software development (Go, Python, Scala, or Java) is a plus
- Experience with configuration management (Ansible, SaltStack, or an equivalent) and with defining infrastructure as code (Terraform or CloudFormation)
- Experience with relational databases (MySQL, Postgres) is required; experience managing and optimizing databases and data models is a plus
- Working knowledge of web servers, proxies, and caches (e.g., Nginx, Varnish, HAProxy)
- Experience with build (Jenkins, Travis) and deployment automation (Spinnaker, CircleCI) tools for managing software delivery
- Experience with log aggregation tools (Splunk, Elasticsearch, Scalyr)
- Experience with metrics infrastructure tools (Datadog, New Relic, Prometheus)
This is a full-time salaried position, including stock options, fully covered health insurance, a 4% 401(k) match, 4-month maternity leave (plus an additional 2 months of transition), 1-month paternity leave (plus an additional 2 months of transition), free lunch every day, a free gym membership, and lots of other fun perks.