Available for Senior/Staff SDE | SRE | MLE Roles | GPU/AI Infrastructure Specialist
Bill Hsu

Hi, I'm Bill Hsu

Architected Cross-Cluster Infrastructure for 10,000+ GPUs 🚀

Site Reliability Engineer II @ Alibaba Cloud | Ex-Amazon AGI
Building the infrastructure that powers next-gen AI


What I Bring to Your Team

🏗️

Cross-Cluster Architecture Expert

Designed a Dual-Layer Virtual Kubelet architecture spanning 10,000+ heterogeneous GPUs

  • Unified Resource Orchestration across clusters
  • Federated Identity Mesh with secure AuthN/AuthZ
  • Hybrid Network Fabric for low-latency communication
  • 25x scaling for Unitree G1 robot training

💰

Massive Cost Savings

Proven track record of multi-million-dollar cost optimizations

  • $9M annual savings at Amazon AGI
  • 40% cost reduction at Alibaba Cloud
  • Migration from Serverless to Reserved instances
  • 3,000+ scaling requests handled efficiently

AIOps & Self-Healing Systems

Architecting autonomous infrastructure that fixes itself

  • 90% MTTR reduction (10hr → 1hr)
  • Custom Kubernetes Controllers for auto-remediation
  • Closed-loop telemetry pipeline with SysOM
  • Zero data loss in isolated GPU sandboxes

🔐

Security & Reliability at Scale

Enterprise-grade security for AI/ML infrastructure

  • Novel credential injection via Service Accounts
  • 9-hour token rotation for Cross-Cluster auth
  • Secure Enclave telemetry with DCGM metrics
  • 100 engineer-hours saved monthly via automation

Technical Expertise

Cloud Native & Kubernetes

K8s Internals (Operators, CRDs, Virtual Kubelet): 95%
Cross-Cluster Architecture: 95%
AWS EKS / Helm / Docker: 90%
Service Mesh & Identity Management: 90%

Infrastructure & Automation

Terraform / AWS CDK / CloudFormation: 90%
Golang (Primary Language): 95%
Python / TypeScript: 85%
VPC Networking / gRPC / eBPF: 85%

AI Infrastructure & GPU

NVIDIA A100/H100 Optimization: 90%
Ray Cluster/Serve: 85%
PyTorch Distributed Training: 80%
Prometheus / Grafana / DCGM: 90%

Certifications & Achievements

  • 🎓 UCSD MS Computer Science
  • 📚 3 Publications
  • 🏆 Amazon Nova Co-Author
  • ⚙️ Kubernetes Expert

Systems I Could Be Managing for You

LIVE SIMULATION

  • Active GPUs: 9,847
  • Requests/sec: 1,245
  • Uptime: 99.99%
  • Active Nodes: 342

What People Say About My Work

💡 These are simulated testimonials based on actual impact metrics
"Bill's Dual-Layer Virtual Kubelet architecture revolutionized our cross-cluster GPU management. His solution enabled 25x scaling for Unitree G1 robot training while achieving a 40% cost reduction."

Engineering Director, AnalyticDB AI Platform, Alibaba Cloud

Let's Build Something Amazing Together

Why Schedule a Call?

  • Discuss how I can solve your infrastructure challenges
  • Share ideas about scaling AI/ML systems
  • Explore potential collaboration opportunities
  • Get insights from my experience at scale

My Availability

Pacific Time (PST/PDT)

Mon-Fri: 9 AM - 6 PM

Response within 24 hours