Maintain, improve, and support cloud infrastructure for highly available and scalable production operations at Rakuten. Work with team members and other teams to troubleshoot problems, find solutions, and complete projects.
Responsibilities
Maintain, develop, and operate a high-availability, high-throughput production environment.
Assist with launching new and upgrading existing applications.
Upgrade system software, apply patches, and implement security standards.
Mentor, advise, and train other team members.
Use monitoring tools to identify operational anomalies and take action to resolve them.
Follow, create, and edit SOPs and documentation.
Demonstrate the ability to be a flexible, organized, thoughtful, and proactive team player who works with minimal guidance and communicates effectively.
Explore new technologies and solutions to enhance operational excellence and cost efficiency.
Qualifications
5+ years overall experience with the following skill sets:
Linux server administration (Ubuntu, Red Hat/CentOS)
Server virtualization (KVM, VMware, Hyper-V)
Scripting (Bash, Python)
Cloud experience (Google Cloud Platform, Amazon Web Services, OpenStack)
Docker and container orchestration (e.g. Kubernetes)
Automation (e.g. CI/CD with Jenkins or CircleCI)
Monitoring (Prometheus and Grafana preferred)
Configuration and Change Management (such as Ansible and Chef)
Must be able to work on multiple in-flight projects.
Must have an aptitude and a desire for proactive support of research and innovation. A thoughtful, collaborative approach with end users, staff colleagues, and other IT professionals is essential.
Must possess excellent communication and organizational skills, as well as the ability to work under minimal supervision in a flexible, team-oriented environment.
Must be able to perform on-call duties as part of a rotation.
Preferences (nice-to-have)
Experience with highly available, clustered server environments a plus.
Ability to provide technical support and perform system maintenance in high-availability production environment.
Relies on experience and judgment to plan and accomplish goals.
Experience with performance engineering and troubleshooting a plus.
Experience with NoSQL databases (e.g. MongoDB, Couchbase)
Experience with HashiCorp products (e.g. Terraform, Vault, Packer, Consul)
Experience with BIND DNS server administration and configuration.
Good understanding of cloud platform features and services.