- Strong understanding of Linux administration.
- Strong understanding of AWS/GCP/Azure cloud.
- In-depth understanding of networking.
- Manage multiple data-pipeline infrastructures.
- Tune servers according to their roles, such as Kafka, ZooKeeper, Elasticsearch, Cassandra, and InfluxDB.
- Ensure high availability of servers and the applications running on them, using monitoring and automation scripts and tools.
- Write scripts for the procurement, configuration, and deployment of instances and for managing system administration tasks.
- Manage AWS services such as VPC, EC2, ELB, Route 53, RDS, S3, ElastiCache, and more.
- Manage system resources using Salt, Ansible, Puppet, or similar configuration management tools.
- Manage high-availability, low-latency applications. Focus on security best practices and assist in security and compliance activities.
- Sound understanding of, and working experience with, high-availability, high-performance systems.
- Keep learning continuously; staying on top of your game is part of the job.
- Leverage both system and software engineering skills to address the needs of the teams.
- Expert in troubleshooting performance and behavioral problems in Linux-based systems.
- 5+ years' experience in software engineering and DevOps, site reliability engineering, or an equivalent field.
- Strong Python and Bash/shell scripting skills. (Must)
- Experience with GCP (Google Cloud Platform)/AWS. (Must)
- Experience with Linux. (Must)
- Experience with Ansible/Chef/Puppet. (Must)
- Experience with log management tools such as ELK (Elasticsearch, Logstash, Kibana) or Splunk. (Added advantage)
- Knowledge of big data systems such as InfluxDB, Elasticsearch, or Cassandra. (Added advantage)
- In-depth knowledge of networking, UNIX, and low-level OS internals. (Added advantage)