Site Reliability Engineer (GCP)

Carousell Group
2 hours ago
Full-time
On-site
Ho Chi Minh City, Ho Chi Minh City, Vietnam
Developer

Company Description

Chợ Tốt’s technology foundation is expanding to power our next level of growth, serving tens of millions of Vietnamese users. Our Site Reliability Engineering (SRE) team works day-to-day with open-source CNCF projects, building robust platforms, automation, and data engineering pipelines that enable the continuous release of hundreds of microservices.

Join us to solve large-scale distributed systems problems in a fast-paced, agile environment. We serve hundreds of millions of requests and manage data pipelines with over a billion messages daily. Because we are part of the larger Carousell Group, solving a technical problem here means your solution can make a regional impact!

Job Description

Why You’ll Love Working With Us:

  • True Blameless Culture: We tackle incidents as a team. Our strict policy is: Fix the incident first, investigate the root cause later—absolutely no finger-pointing.
  • 100% Cloud & Massive Scale: We run entirely on the Google Cloud (GCP) ecosystem and Google Kubernetes Engine (GKE), and manage automatic scale-up for high-traffic events.
  • AI Integration: We are actively leveraging AI to speed up daily tasks, automate log analysis/troubleshooting, and accelerate software releases.
  • Empowerment & Trust: Access rights start at a minimum but scale up based on your capability. Master the system, and you’ll be granted the highest level of system access.

Key Responsibilities (50% Automation / 50% Operations):

This is a key role requiring solid engineering knowledge, production experience, and hands-on implementation ability. You will:

  • Act as the first line of defense for incident handling, tackling issues manually and promptly when they occur.
  • Ensure the highest levels of production system performance, availability, and scalability.
  • Automate the provisioning of cloud infrastructure, systems, and software.
  • Design and operate build & release pipelines, configuration management, and code deployments to multiple environments.
  • Work closely with the development team to integrate new deployment processes and strategies.
  • Seek out problems or opportunities in critical high-impact areas and solve them.

Your First 6 Months:

  • Months 1-2 (Learning Phase): Dedicate time to adapting to Chợ Tốt's core infrastructure. We will sponsor your learning via Coursera so you can study for and pass the mandatory Google Cloud / K8s certifications. You will gain a working understanding of the infrastructure across all 3 environments.
  • Months 3-6 (Execution Phase): Fully master the infrastructure, especially Production. You will handle support requests from Engineers, take on Group-level tasks, and participate in on-call duties.

Qualifications

Minimum Qualifications (Must-Haves):

  • Tech Stack: Hands-on experience with Python AND at least one of Bash/Perl/Golang.
  • Cloud & Orchestration: Hands-on experience with a cloud ecosystem (GCP or AWS) and its tooling. Experience managing, scaling, and troubleshooting containerized workloads using Kubernetes or similar orchestration platforms.
  • Infrastructure as Code: Hands-on experience with Terraform or similar tools such as Ansible/Chef.
  • Fundamentals: Solid foundation and knowledge of operating systems, databases, and distributed systems fundamentals.
  • DevOps Culture: Thorough understanding of, and experience with, DevOps culture, principles, and practices.
  • Soft Skills & Persona: Self-motivated, detail-oriented, and responsible. You must have a "low ego," be open to feedback, and have strong interpersonal skills to handle cross-team collaboration smoothly. You are careful in execution—always preparing clear, detailed plans and documentation.
  • Language: Very good command of English in both reading and writing (equivalent to IELTS 5.0 - 6.0).

Preferred Qualifications (Nice-to-Haves):

  • Experience with Golang, React, or NodeJS.
  • Production experience in using and operating software services that utilize: Kafka, PostgreSQL, MongoDB, ElasticSearch, Prometheus.
  • Leveraging AI to automate processes and tasks is a massive plus.
  • Ability to drive sound architecture, implementation, and technical investigations through hands-on development, plus systematic planning and execution.
  • Enjoy learning and ramping up on new technologies quickly.
  • Professional Certification preferred, ideally in Google Cloud Platform.
  • BSc or MSc in Computer Science or Engineering, or equivalent.

Additional Information

What’s in it for you:

  • 15 days of annual leave and 5 days of sick leave per year
  • Health insurance for employees
  • 13th-month salary and annual performance bonus
  • Young team and ambitious goals
  • Hard-working and delightful colleagues to work with
  • Vibrant FRUIT culture across the company
  • Online learning access to Coursera, Udemy and O'Reilly

Thank you for taking the time to read our job description, and thank you in advance if you decide to apply for this position. Shortlisted candidates will be contacted within 2 weeks of applying; otherwise, we hope to meet when another opportunity arises.

By proceeding with your application, you agree to adhere to our PDPA policies. If you would like to know more, read our Candidates Personal Data Privacy Statement.