How we deployed an 8B-parameter model on GCP, from oversized Docker images to a low-latency Vertex AI endpoint with vLLM. Real data, real tradeoffs.