AVP/VP, Observability & SRE Engineering, Technology Group

Posted on Oct. 24, 2025 by GIC

Location: Singapore, SG
Job Function: Technology Group
Job Type: Permanent
Req ID: 16808
GIC is one of the world’s largest sovereign wealth funds. With over 2,000 employees across 11 locations around the world, we invest in more than 40 countries globally across asset classes and businesses. Working at GIC gives you exposure to an extraordinary network of the world’s industry leaders. As a leading global long-term investor, we Work at the Point of Impact for Singapore’s financial future, and the communities we invest in worldwide.

Technology Group
We experiment, design, and lead a 24×7 global business where we support core capabilities in asset management, trading, investment operations, and risk management. We deliver secure, reliable, and integrated solutions, and provide insights on new and emerging technologies.

What impact will you make in this role?
The AVP/VP, Observability & SRE Engineering is responsible for developing and executing the enterprise observability and service reliability strategy across all infrastructure and application domains within GIC.
You will drive proactive monitoring, automation, and resilience initiatives using Datadog, Dynatrace, AWS, and Azure platforms, ensuring operational stability, risk control, and compliance with financial regulatory standards.
The role also establishes SRE frameworks, Site Reliability metrics (SLO/SLI/Error Budgets), and automation pipelines to improve system reliability, reduce Mean Time to Detect (MTTD) and Mean Time to Resolve (MTTR), and strengthen overall operational resilience.

What will you do as an AVP/VP, Observability & SRE Engineering?
Observability Strategy & Governance
  • Define and own the Enterprise Observability Architecture aligned with operational resilience mandates (e.g., MAS TRM, DORA, APRA CPS 230).
  • Deploy and optimize observability platforms such as Datadog, Dynatrace, and Splunk for full-stack visibility (infrastructure, application, network, and user experience).
  • Establish governance standards for telemetry data (metrics, logs, and traces), ensuring consistency, retention compliance, and security controls.
  • Integrate observability platforms with incident management, ITSM, and AIOps systems for predictive alerting and anomaly detection.

Reliability Engineering & Automation
  • Contribute to implementing SRE frameworks for infrastructure and business-critical applications.
  • Drive initiatives to automate runbooks, alerts, self-healing actions, and auto-remediation workflows via Python, Ansible, and Terraform.
  • Partner with Application, Infrastructure, and Cyber teams to codify operational reliability into the delivery lifecycle.
  • Conduct resilience testing, chaos engineering, and capacity validation in alignment with business continuity standards.
  • Develop error budget policies and reliability scorecards for key production services.

Cloud Observability & Platform Engineering
  • Architect and manage observability for cloud-native workloads hosted in AWS and Azure, ensuring visibility of compute, storage, and network layers.
  • Integrate cloud observability into landing zones and CI/CD pipelines, ensuring continuous compliance with deployment controls.
  • Implement infrastructure-as-code (IaC) models using Terraform and Ansible for consistent, auditable provisioning.
  • Collaborate with Cloud, DevOps, and Security teams to ensure real-time telemetry aligns with audit and compliance requirements.

Operational Excellence & Stakeholder Management
  • Drive reduction in incident recurrence, MTTR, and manual intervention through observability-led automation.
  • Partner with Service Delivery, Cyber, and Application teams to achieve predictive incident prevention and root cause transparency.
  • Deliver executive dashboards that highlight availability, reliability KPIs, and operational risk indicators.
  • Act as a technical advisor to senior management during major incidents, post-incident reviews, and technology audits.

What makes you a successful candidate?
  • Bachelor’s or Master’s degree in Computer Science, Engineering, or a related discipline.
  • 12+ years’ experience in Infrastructure, Cloud, or SRE roles, with 5+ years as an SRE subject-matter expert (SME) in financial institutions or regulated environments.
  • Proven hands-on expertise in:
    • Observability Platforms: Datadog, Dynatrace, Splunk, ELK
    • Automation / IaC: Terraform, Ansible, Python, CI/CD tools
    • Cloud Platforms: AWS (CloudWatch, X-Ray, CloudTrail), Azure (Monitor, Log Analytics, App Insights)
  • Deep understanding of SRE principles, service health modelling, error budgets, and auto-remediation design.
  • Familiarity with financial sector operational resilience frameworks, regulatory compliance, and incident governance.

Candidates must possess certifications in at least one of the following areas:
  • Datadog Certified Observability Professional/Dynatrace Certified Associate
  • Terraform/Ansible/Python Certified Expert

The following certifications are highly desirable:
  • AWS Certified DevOps Engineer/Azure DevOps Expert
  • SRE Foundation/Practitioner (DevOps Institute)
  • ITIL v4 Managing Professional

Work at the Point of Impact
We need to be forward-looking to attract the right people to help us become the Leading Global Long-term Investor. Join our ambitious, agile, and diverse teams, where you will be empowered to push boundaries and pursue innovative ideas, share your views, and be heard. Be anchored on our PRIME Values: Prudence, Respect, Integrity, Merit and Excellence, which guide us in how we make our day-to-day decisions. We strive to inspire. To make an impact.

GIC is a Great Place to Work
At GIC, our offices are vibrant hubs for ideation, professional growth, and interpersonal connection. At the same time, we believe that flexibility allows us to do our best work and be our best selves. Thus, our teams come into the office four days per week to harness the benefits of in-person collaboration, but have the flexibility to choose which days they work from home and adjust this arrangement as situational needs arise. This role will be in our Tampines Office.

GIC is an equal opportunity employer
As an employer, we passionately believe that every individual brings a unique diversity of thought and perspective that meaningfully enriches GIC teams and drives competitive performance. An inclusive environment yields exceptional contribution.

Learn more about our Technology Group here:
https://gic.careers/group/technology-group/


Advertised until: Nov. 23, 2025