gkcloudai/README.md

Gaurav Kumar

Senior Technical Program Manager | Cloud Infrastructure, AI/ML Platforms, and Responsible AI Governance

I lead large, cross-functional infrastructure and platform programs in financial services and high-scale tech. 14+ years turning ambiguous, multi-team technical bets into shipped, measurable outcomes: resilient platforms, zero-downtime migrations, and governance that scales.

Focus: Cloud infrastructure (GCP / GKE) | AI/ML and LLM platforms | Responsible AI governance | Cybersecurity | FinOps and capacity optimization

Selected work

| Program | Problem solved | Outcome |
| --- | --- | --- |
| Distributed Authentication Platform | Legacy auth failing under peak load and regional outages | Multi-region, zero-downtime design at 7K+ TPS, 99.9% availability |
| Cloud Cost Intelligence Platform | No real-time cost attribution across teams | ~$2.3M annual savings, ~30 to 35% BigQuery reduction, 8+ teams onboarded |
| Responsible AI Governance Framework | Inconsistent, unauditable model risk decisions | Operating model, lifecycle gates, and policy for AI/ML governance |
| LLM Platform Program | No structured path from LLM prototype to production | Reference architecture, eval rubric, and phased rollout plan |
| Model Eval and Release Pipeline | Models shipped without consistent quality gates | Runnable evaluation and release-gating pipeline with CI |
| Cloud Migration Readiness Framework | No visibility into dependencies and go-live risk | Centralized readiness and dependency governance across 40+ components |

Also delivered (production programs)

  • MongoDB Atlas blue/green sharding migration: sub-50ms p99, zero-downtime cutover
  • AIOps and responsible-AI governance program: ~40% engineering velocity gain

Toolbox

GCP / GKE | AWS | MongoDB Atlas | BigQuery | Kafka | Terraform | Grafana | Jira / Confluence | GitHub Copilot

Connect

LinkedIn


Repositories here are program case studies based on real work. Metrics are from production programs; code is illustrative unless a repo states otherwise.

Popular repositories

  1. gkcloudai

  2. Distributed-Authentication-Platform

    Multi-region, zero-downtime authentication platform on GKE. Program case study.

    Language: HCL

  3. Cloud-Migration-Readiness-Framework

    Readiness, dependency, and go/no-go governance for large cloud migrations.

  4. Cloud-Cost-Intelligence-Platform

    Real-time cloud cost attribution and FinOps governance. Program case study.

  5. Responsible-AI-Governance-Framework

    Operating model, checklists, and policy for responsible AI/ML governance in regulated orgs.

  6. LLM-Platform-Program

    Program brief, reference architecture, eval rubric, and rollout plan for an internal LLM platform.