About Me
Message for Recruiters & Hiring Managers
I am Yan-Cheng (Bill) Hsu, an Infrastructure Software Engineer II at Alibaba Cloud, specializing in resource management for AI training platforms. I hold a Master's degree from UC San Diego and have deep expertise in building and managing large-scale GPU infrastructure for AI/ML workloads.
At Alibaba Cloud, I architect a cross-cluster AI training platform for a humanoid-robotics customer. The platform centralizes massive-scale heterogeneous GPU fleets, cutting TCO by ~40% and scaling capacity 25x. It is the foundation for two 0→1 SaaS products (interactive dev workstations and a distributed training and simulation scheduler) and a trust-first multi-agent root-cause analysis (RCA) kernel.
Previously, at Amazon AGI Org, I built the GPU infrastructure powering Amazon Nova, architecting systems that delivered multimillion-dollar annual savings and reduced troubleshooting time by ~90% per incident. I am a co-author of the Amazon Nova technical report.
My research includes publications in the journal Sensors and at IEEE APSIPA ASC 2023 on deep learning and time series transformers. I combine strong systems engineering skills with AI/ML expertise to build reliable, cost-effective infrastructure at scale.
