At Cheetah Digital, we are marketers at heart. Our mission is to help the best brands in the world create meaningful and profitable relationships with their customers. Our technology and services solve complex marketing challenges and drive exceptional results for enterprise brands across the globe.
As Cheetahs, we are builders and believers who are comfortable disrupting the norm. We’re shaping the future of the marketing technology industry and are looking for like-minded people to help us do it!
The Principal Site Reliability Engineer, Monitoring is a fundamental part of the Site Reliability Engineering team. Site Reliability Engineering is accountable for the availability, reliability, and performance of the services and platforms, as well as the design and architecture of the systems and services provided by Cheetah Digital. Provisioning is responsible for installation, configuration, maintenance, and support in a highly transactional 24x7 environment across all platforms, ensuring that all of our customers globally have a great experience using the Cheetah Digital Marketing Suite and other campaign platforms.
What You’ll Do
- Help drive resolution of system and application failures based on defined standard operating procedures, provide ongoing feedback on these procedures, and coordinate internal resources to apply fixes for recurring alerts.
- Provide engineering, installation, configuration, maintenance and support in a highly transactional 24x7 environment
- Provide system and alert troubleshooting and problem escalation management.
- Demonstrate ownership and command of the tools used to perform regular checks and troubleshoot application issues.
- Lead in support of application / product architecture review to vet project design across the organization from the Monitoring perspective
- Determine impact and follow critical incident escalation procedures when required.
- Translate business requirements into specifications
- Develop and lead the reliability process in a manner that will have a direct impact on equipment uptime, efficiency, cost management, quality, safety, customer satisfaction, innovation, and ultimately profitability.
- Help set up monitoring scripts and drive automation to increase productivity and visibility.
- Work with management to review and define global coding standards and best practices
- Work with other Site Reliability Engineers to ensure the reliability and maintainability of the infrastructure supporting our SaaS platforms.
- Act as the first tier for managing the monitoring tools and ensuring they work correctly.
- Evaluate and build improvements in monitoring capabilities to remove technical impediments for the team.
- Handle scripting work or the major or complex parts of assigned projects, and provide help and supervision to team members.
- Train and mentor team members.
- Take responsibility for the design and architecture of the systems and services provided by the Monitoring team.
- Help solve business needs by actively participating in selecting the best technology to ensure customer satisfaction, and by ensuring the right solutions are in place to proactively tackle issues.
What We’re Looking For
- Bachelor’s Degree in Computer Science, Information Systems; or equivalent combination of education and experience
- 8+ years of hands-on experience providing daily support of a company's application infrastructure, including monitoring and troubleshooting to ensure high availability and uptime.
- 4+ years of hands-on experience with Windows/Linux/Unix production environments using command-line tools, networking, and security concepts.
- Advanced experience working with databases such as MSSQL and/or Oracle.
- Demonstrated experience working in a cross-functional team environment that includes Technical Support, Developers, SAs, DBAs, Systems administrators, etc.
- Strong experience with scripting languages and with writing small programs that run automated and scheduled tasks to pull data.
- Strong experience with system administration tasks on Linux or Windows environments.
- Experience with the ITIL framework. Certificate preferred.
- Strong experience building and sharing technical documentation and mentoring team members.
- Basic experience with project and product management (CAPM, PMP, SCM). Certificate preferred.
- Hands-on experience in designing and implementing enterprise-wide monitoring solutions. Must be detail-oriented and possess strong problem-solving skills.
- Must have an understanding of monitoring industry best practices to aid in evolving processes and standards. Experience with DevOps and SRE best practices.
- Solution Architecture Professional Certification.
- Technical Skills
- Databases: SQLServer, Oracle
- Operating Systems: Linux, Windows
- Monitoring tools: knowledge of and hands-on experience with Nagios, SolarWinds, Shinken, New Relic. Nagios certificate preferred.
- Languages: Python, Perl, PowerShell, Bash
- Version control: Git
- Excellent communication skills, both verbal and written
- Demonstrated ability to collaborate with local and remote teams in different time zones
- Demonstrated ability to compose clear and concise technical documentation
- Demonstrated ability to present deeply technical topics to small and large less-technical audiences
- Strong interpersonal and leadership skills, including communication, collaboration, facilitation, and negotiation
Why Cheetah Digital?
We are dedicated to marketers
Marketers have a more challenging job today than ever before, and the status quo in marketing technology isn’t cutting it. We’re focused on product innovation and expert services that break norms to help marketers from some of the world’s largest brands execute complex marketing campaigns at scale.
We are innovators and operators
Our business is growing rapidly and transforming globally, and we rely on people who can adapt quickly, shift direction with agile reflexes, and tackle challenges head-on.
We are global leaders
Headquartered in Chicago, we operate across 13 countries with 1,500+ employees. We combine the pace of a fast-growing company with the scale and stability of a proven leader in the global market for 20 years.
We are progressive
We pride ourselves on creating a globally diverse and inclusive workforce and offer competitive employee benefits, including volunteer time, parental leave, tuition reimbursement, and retirement savings.
Cheetah Digital provides equal employment opportunity without regard to an applicant's race, sex, pregnancy, sexual orientation, gender identity or expression, genetic information, national origin, age, physical or mental disability, medical condition, religion, marital status or veteran status.
Applicants with disabilities may be entitled to reasonable accommodation under the terms of the Americans with Disabilities Act and certain state or local laws. A reasonable accommodation is a change in the way things are normally done which will ensure an equal employment opportunity without imposing an undue hardship on Cheetah Digital. Please inform us if you need assistance completing any forms or to otherwise participate in the application process.