Sr. Principal Software Engineer – Site Reliability

Company Overview

Coupang is the largest e-commerce company in Korea, delivering millions of items, including fresh groceries, within hours to millions of people, 365 days a year. Our mission is to create a world in which customers wonder, ‘How did I ever live without Coupang?’ Korea is one of the fastest growing e-commerce markets in the world, and Coupang is a leader in this fast-growing industry. Powered by innovative technology and operations, we have set out to transform the customer experience journey, from revolutionizing last-mile delivery to rethinking how customers search and discover on a truly mobile platform. We have invested heavily in infrastructure and technology, building an integrated system that we control from end to end. This enables us to improve as we grow, build new services, and break the tradeoffs between price, selection, and quality that consumers are too often forced to take for granted.

We have been named as one of the ‘50 Smartest Companies in the World’ by MIT Technology Review, and as one of Forbes magazine’s ‘30 Global Game Changers.’ In 2020, we placed second on CNBC’s ‘Disruptor 50’ list.

About this Role

The Site Reliability team will be responsible for availability and uptime, and for improving the overall alerting, monitoring, and incident management strategy for all Coupang services, to ensure a more stable, available, and reliable platform for our end customers. The team will work closely with domain teams to monitor and triage service failures. It will also establish processes and build technical expertise with subject matter experts from those teams to improve our overall SLAs for uptime and site availability.

Key Responsibilities

  • Good overall sense of troubleshooting products from cloud service providers such as AWS, Azure, Oracle, or Google Cloud
  • Familiarity with POSIX systems and general troubleshooting, such as reviewing log data (Splunk, Graylog, etc.)
  • Familiarity with alerting and monitoring tools such as VictorOps and CloudWatch, and with handling the incident management process
  • Strong DevOps/SRE background, including use of Python, Perl, or another programming language, and the ability to understand the technology stack behind Java-based applications

Essential Qualifications

  • Strong understanding of Java, Linux and AWS services
  • Experience in dealing with alerting and monitoring tools like Dynatrace, DataDog, VictorOps, GrayLog, Splunk, CloudWatch etc.

Preferred Qualifications

  • Good overall sense of troubleshooting products from cloud service providers such as AWS, Azure, Oracle, or Google Cloud
  • Strong customer support skills and the judgment to escalate issues when needed

 

 
