Site Reliability Engineering Lead

Job Title:	Site Reliability Engineering Lead
Location:	Kuala Lumpur
Job Category:	IT
Salary:	MYR 120,000 - 180,000 (Annual)
Job Reference No.:	PR/158845
Date Posted:	2025/04/07 12:21
Work Arrangement:	Hybrid

COMPANY OVERVIEW
A well-established client of ours in Kuala Lumpur is seeking a Site Reliability Engineering Lead.

JOB RESPONSIBILITIES

○ Lead and mentor a team of SREs, fostering a culture of ownership, collaboration, and continuous improvement.

○ Define clear goals, performance metrics, and development plans for the team.

○ Design and implement strategies to improve system reliability, scalability, and performance.

○ Conduct root cause analysis of production incidents and develop preventive solutions.

○ Oversee the deployment, monitoring, and management of production environments.

○ Collaborate with development teams to design cloud-native infrastructure and architecture.

○ Drive automation of operational processes to reduce manual intervention and response times.

○ Optimize CI/CD pipelines to ensure smooth and rapid deployments.

○ Establish incident response protocols and lead efforts during major incidents.

○ Ensure robust monitoring and alerting systems are in place to proactively detect issues.

○ Act as a liaison between engineering, operations, and other teams to align objectives.

○ Share insights and best practices with internal stakeholders to enhance overall system resilience.

JOB REQUIREMENTS

○ Strong experience with cloud platforms (AWS, Azure, Google Cloud) and infrastructure-as-code tools (Terraform, Ansible, etc.).

○ Proficiency in programming/scripting languages (Python, Go, Shell, etc.).

○ Deep knowledge of Kubernetes, containerization, and distributed systems.

○ Proven track record of leading SRE or DevOps teams and managing large-scale production environments.

○ Strong decision-making, prioritization, and problem-solving capabilities.

○ Expertise in implementing and using monitoring tools (Prometheus, Grafana, Datadog, etc.) and logging systems.

○ Familiarity with service-level objectives (SLOs), service-level agreements (SLAs), and error budgets.

○ Excellent communication and collaboration skills to work effectively with cross-functional teams.

○ Ability to mentor and upskill team members, fostering a learning-oriented culture.

○ At least 8 years of experience in SRE, DevOps, or related roles with a focus on reliability engineering.