Google Cloud VMs can get stuck or become inaccessible when the disk fills up, often because Docker images, containers, and volumes left behind by CI/CD pipelines accumulate over time. Here’s a practical step-by-step guide to restoring your VM and cleaning up Docker.
Scenario
- VM stuck and SSH blocked with disk full error
- VM runs Docker (e.g., GitLab Runner)
- You added an extra disk for recovery but couldn’t free space from inside the VM
- Need to clean Docker system to regain space
Step 1: Create a Rescue VM and Attach the Corrupt Disk
- Create a new temporary rescue VM in the same zone as your corrupt instance (e.g., us-central1-a).
- Stop the corrupt VM if it is still running; a boot disk can only be detached from a stopped instance.
- Detach the boot disk from the corrupt VM.
- Attach the corrupt disk as an additional disk (e.g., /dev/sdb) to the rescue VM.
gcloud compute instances stop corrupt-vm --zone=us-central1-a
gcloud compute instances detach-disk corrupt-vm --disk=corrupt-disk --zone=us-central1-a
gcloud compute instances attach-disk rescue-vm --disk=corrupt-disk --zone=us-central1-a
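Detaching tends to fail if the instance has not fully stopped yet, so it can help to poll the instance status first. The following is a minimal sketch, not part of the original procedure: wait_for_status and POLL_INTERVAL are hypothetical names, and the instance and zone values simply mirror the examples above.

```shell
#!/bin/sh
# Hypothetical helper: poll until an instance reports the expected status.
# Arguments: instance name, zone, wanted status, optional number of attempts.
wait_for_status() {
    instance=$1 zone=$2 want=$3 tries=${4:-30}
    status=
    while [ "$tries" -gt 0 ]; do
        # value(status) prints only the status field, e.g. RUNNING or TERMINATED.
        status=$(gcloud compute instances describe "$instance" \
            --zone="$zone" --format='value(status)')
        [ "$status" = "$want" ] && return 0
        tries=$((tries - 1))
        sleep "${POLL_INTERVAL:-5}"
    done
    echo "timed out waiting for $instance to reach $want (last: $status)" >&2
    return 1
}
```

A stopped Compute Engine instance reports TERMINATED, so calling wait_for_status corrupt-vm us-central1-a TERMINATED before the detach-disk command is one way to sequence the two steps.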
Step 2: Mount the Corrupt Disk on the Rescue VM
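SSH into the rescue VM and confirm the device name with lsblk (the commands below assume the attached disk's root partition is /dev/sdb1), then:
sudo mkdir -p /mnt/recovery
sudo mount /dev/sdb1 /mnt/recovery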
Verify mount success:
df -h /mnt/recovery
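The device name /dev/sdb1 is only an example, and some images carry additional small partitions (an EFI partition, for instance) next to the root filesystem. As a rough sketch, the hypothetical helper below filters lsblk output down to the partitions of one disk so you can pick the right one before mounting. It assumes lsblk from util-linux and awk are available on the rescue VM.

```shell
#!/bin/sh
# Hypothetical helper: read `lsblk -lnpo NAME,TYPE` output on stdin and
# print the partitions whose device path starts with the given disk path.
partitions_of() {
    disk=$1
    awk -v disk="$disk" '$2 == "part" && index($1, disk) == 1 { print $1 }'
}
```

Usage on the rescue VM would look like lsblk -lnpo NAME,TYPE | partitions_of /dev/sdb; the root filesystem is normally the largest partition listed by a plain lsblk.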
Step 3: Inspect Disk Usage on the Corrupt Disk
Check which directories consume the most space:
sudo du -xh /mnt/recovery --max-depth=1 | sort -h
Usually, /var/lib/docker is the culprit.
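To see which part of the Docker directory is responsible, the same du approach can be repeated one level deeper. This is a small sketch under the assumption that GNU du is installed; largest_subdirs and the optional SUDO variable are hypothetical names introduced here for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the n largest immediate subdirectories of a
# directory, smallest first, with sizes in KiB.
# Set SUDO=sudo when the directory is only readable by root.
largest_subdirs() {
    dir=$1 n=${2:-5}
    # -x stays on one filesystem; -k gives plain KiB numbers that sort -n
    # can order reliably. The awk filter drops the total line for dir itself.
    ${SUDO-} du -xk --max-depth=1 "$dir" 2>/dev/null |
        awk -F'\t' -v d="$dir" '$2 != d' |
        sort -n | tail -n "$n"
}
```

For example, SUDO=sudo largest_subdirs /mnt/recovery/var/lib/docker 5 lists the five biggest entries under the mounted Docker directory.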
Step 4: Clean Up Docker Files Manually
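Since the corrupt VM’s Docker daemon is not running, you cannot run docker system prune directly. Instead, manually remove Docker files from the mounted disk. The commands go through sudo sh -c so that the wildcard is expanded as root; /var/lib/docker is normally readable only by root, so an unprivileged shell would pass a literal * and delete nothing:
sudo sh -c 'rm -rf /mnt/recovery/var/lib/docker/containers/*'
sudo sh -c 'rm -rf /mnt/recovery/var/lib/docker/image/*'
sudo sh -c 'rm -rf /mnt/recovery/var/lib/docker/volumes/*'
This frees up space by deleting containers, image metadata, and volumes; any data stored in Docker volumes is lost. Image layers themselves typically live in the storage driver directory (overlay2 with the default driver), so if /var/lib/docker still dominates the du output, that directory is the next place to look.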
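Because rm -rf on a mounted disk is unforgiving, a dry run before the real deletion can be worthwhile. The sketch below is one possible wrapper, not the method used above: clean_docker_dirs and DRY_RUN are hypothetical names, and it assumes a find that supports -mindepth and -maxdepth (GNU findutils does).

```shell
#!/bin/sh
# Hypothetical helper: empty selected Docker data directories under a
# mounted root. Arguments: mount root, then subdirectory names.
# DRY_RUN=1 (the default) only lists what would be removed.
clean_docker_dirs() {
    root=$1; shift
    # Refuse to run against an empty path or the live root filesystem.
    case $root in
        ''|/) echo "refusing to operate on '$root'" >&2; return 1 ;;
    esac
    for sub in "$@"; do
        target="$root/var/lib/docker/$sub"
        if [ ! -d "$target" ]; then
            echo "skip: $target (not a directory)" >&2
            continue
        fi
        if [ "${DRY_RUN:-1}" = 1 ]; then
            find "$target" -mindepth 1 -maxdepth 1 -print
        else
            # Remove the contents but keep the directory itself.
            find "$target" -mindepth 1 -maxdepth 1 -exec rm -rf -- {} +
        fi
    done
}
```

Run from a root shell (for example after sudo -s), clean_docker_dirs /mnt/recovery containers image volumes prints the candidates first; setting DRY_RUN=0 and repeating the call performs the deletion.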
Step 5: Unmount and Detach the Disk from Rescue VM
sudo umount /mnt/recovery
Then detach the disk:
gcloud compute instances detach-disk rescue-vm --disk=corrupt-disk --zone=us-central1-a
Step 6: Reattach the Disk to the Original VM and Start It
gcloud compute instances attach-disk corrupt-vm --disk=corrupt-disk --zone=us-central1-a --boot
gcloud compute instances start corrupt-vm --zone=us-central1-a
Step 7: Connect via SSH and Verify
Once the VM is running, SSH in and verify free space:
df -h /
You should see available space restored.
By using a rescue VM to mount the corrupt disk and manually cleaning Docker artifacts, you can restore a stuck Google Cloud VM without touching any data outside Docker's own directories. Integrate Docker cleanup into your build pipeline to avoid repeating this process.
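As one way to build cleanup into a pipeline or a cron job, the sketch below prunes unused Docker data only once disk usage crosses a threshold. The function names, the 80% default, and the 72-hour age filter are illustrative assumptions rather than recommended values; tune them to your runner.

```shell
#!/bin/sh
# Print the used percentage of the filesystem holding a path, as an integer.
disk_used_pct() {
    # df -P guarantees one line per filesystem; column 5 is "Capacity".
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Hypothetical cleanup hook. Arguments: path to check, threshold in percent.
prune_if_full() {
    path=${1:-/} threshold=${2:-80}
    used=$(disk_used_pct "$path")
    if [ "$used" -ge "$threshold" ]; then
        # Removes stopped containers, unused networks, unused images and
        # build cache older than 72 hours. Volumes are left alone.
        docker system prune --all --force --filter "until=72h"
    else
        echo "disk usage ${used}% is below ${threshold}%, nothing to do"
    fi
}
```

Calling prune_if_full / 80 at the end of a job, or from a daily cron entry on the runner, keeps the prune from running when there is plenty of space and image caches are still useful.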