AlphaSense provides an AI-based search engine for market intelligence, used by the largest and fastest-growing firms globally. Our mission is to curate and semantically index the world’s market and company information, including the vast high-value content sets that traditional web search engines cannot reach. With 1000+ enterprise clients, AlphaSense helps knowledge professionals become dramatically more productive and gain an information edge by discovering critical data points and trends that others miss.
We are seeking a passionate Site Reliability Engineer to help create the next big thing in data analysis and search solutions.
You will join our Cloud infrastructure team, supporting our development engineers and taking care of the AlphaSense platform. We will pair you up with world-class talent in cloud and software engineering and provide a position and environment for continuous learning.
The ideal candidate has strong cloud system configuration, monitoring, support and scripting skills. They are passionate about system engineering, scalability and stability, and never want to stop learning. Experience with AWS is essential.
Your responsibilities will include:
Establish tools and instrumentation to measure and monitor availability, latency and overall system health
Provide ongoing help with the cloud-native transition
Provide sustainable incident response and blameless postmortems
Learn the system far and wide and know all its weak points
Troubleshoot production and development issues
Provide help with deployments and tooling support
Help drive the team towards continuous deployment
Improve system stability through close communication with developers about the system's weak points
Bring a passion for solving engineers' issues in a cloud-native way
Keep the system green and stable
With the help of our strong development team, find ways to prevent incidents instead of just fixing them
Create and maintain operational runbooks
Requirements:
BS / MS degree in Computer Science or a related discipline preferred
Experience with AWS
At least basic experience with Kubernetes (Helm, operators)
Strong skills in scripting languages (shell scripts, Perl, Python)
Experience with Prometheus, Grafana and other open-source monitoring and logging solutions
Interest in designing and troubleshooting large-scale distributed systems
Strong communication skills as well as a problem-solving mind
Ability to automate routine tasks
Good working knowledge of relational and NoSQL databases
Nice to have:
Understanding of continuous deployment and how to get there
Infrastructure as code experience (Ansible, Terraform, CloudFormation)
Experience with logging setup, configuration and maintenance (EKS, Fluentd, Logstash)