Recover a Corrupt Google Cloud VM with Disk Space Issues Caused by Docker

Google Cloud VMs can hang or become unreachable over SSH when the boot disk fills up, often because Docker images, containers, and volumes left behind by CI/CD pipelines keep accumulating. This step-by-step guide shows how to restore the VM and clean up Docker.

Scenario

  • The VM is stuck and SSH fails with a disk-full error
  • The VM runs Docker (e.g., GitLab Runner)
  • You added an extra disk for recovery but couldn’t free space from inside the VM
  • You need to clean up Docker's data to regain space

Step 1: Create a Rescue VM and Attach the Corrupt Disk

  1. Create a new temporary rescue VM in the same zone as your corrupt instance (e.g., us-central1-a).
  2. Stop the corrupt VM. A boot disk can only be detached while the instance is stopped.
  3. Detach the boot disk from the corrupt VM.
  4. Attach the corrupt disk as an additional disk to the rescue VM (it typically appears as /dev/sdb).

gcloud compute instances stop corrupt-vm --zone=us-central1-a
gcloud compute instances detach-disk corrupt-vm --disk=corrupt-disk --zone=us-central1-a
gcloud compute instances attach-disk rescue-vm --disk=corrupt-disk --zone=us-central1-a
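
The sequence is easier to review if it is scripted and printed before it runs. The following is a minimal sketch, not a definitive procedure: the instance names, disk name, and zone are the example values from this guide, run is a hypothetical helper, and the device name recovery is an assumed label. The script stops the VM first because a boot disk can only be detached from a stopped instance. With --device-name set, the disk should also be reachable on the rescue VM as /dev/disk/by-id/google-recovery, whatever letter the kernel assigns it.

```shell
#!/bin/sh
# Sketch: move the boot disk from the broken VM to the rescue VM.
# Names and zone are the example values used in this guide; adjust them.
ZONE="us-central1-a"
BROKEN_VM="corrupt-vm"
RESCUE_VM="rescue-vm"
DISK="corrupt-disk"

# Hypothetical helper: print the command when DRY_RUN=1 (the default),
# otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Each step only runs if the previous one succeeded.
move_disk_to_rescue() {
  run gcloud compute instances stop "$BROKEN_VM" --zone="$ZONE" &&
  run gcloud compute instances detach-disk "$BROKEN_VM" --disk="$DISK" --zone="$ZONE" &&
  run gcloud compute instances attach-disk "$RESCUE_VM" --disk="$DISK" --zone="$ZONE" --device-name=recovery
}

move_disk_to_rescue
```

As written the script only prints the three commands. Running it with DRY_RUN=0 in the environment executes them.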

Step 2: Mount the Corrupt Disk on the Rescue VM

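SSH into the rescue VM, confirm the device and partition names with lsblk, then mount the root partition (/dev/sdb1 in this example):

lsblk
sudo mkdir -p /mnt/recovery
sudo mount /dev/sdb1 /mnt/recovery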

Verify mount success:

df -h /mnt/recovery
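
The /dev/sdb1 name is not guaranteed: the device letter depends on attachment order, and some images carry small boot or EFI partitions next to the root filesystem. As a rough sketch, the hypothetical helper below reads lsblk output and picks the largest partition on a given disk. Treating the largest partition as the root filesystem is a heuristic and an assumption, not a rule, so check the result before mounting.

```shell
# Sketch: pick the largest partition on a given disk from lsblk output.
# Expects lines of "NAME SIZE_IN_BYTES TYPE" on stdin, as produced by:
#   lsblk -b -l -n -o NAME,SIZE,TYPE
largest_partition() {
  disk="$1"
  awk -v disk="$disk" '
    # Keep partitions whose name starts with the disk name (sdb -> sdb1, sdb15).
    $3 == "part" && index($1, disk) == 1 && $2 + 0 > max { max = $2 + 0; name = $1 }
    END { if (name != "") print "/dev/" name }
  '
}

# Example usage on the rescue VM (assumes the attached disk is sdb):
#   part=$(lsblk -b -l -n -o NAME,SIZE,TYPE | largest_partition sdb)
#   sudo mount "$part" /mnt/recovery
```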

Step 3: Inspect Disk Usage on the Corrupt Disk

Check which directories consume the most space:

sudo du -xh /mnt/recovery --max-depth=1 | sort -h

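On a Docker host the largest entry is usually var. Repeat the command on /mnt/recovery/var/lib/docker to confirm that Docker is the culprit.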
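
To see where the space goes inside Docker's own directory, the same du approach can be wrapped in a small helper. This is a sketch under two assumptions: top_dirs is a hypothetical name, and GNU du is available (it provides the --max-depth option already used above). Docker's directories are readable only by root, so the helper has to run in a root shell.

```shell
# Sketch: list the N largest entries directly under a directory, sizes in KiB.
top_dirs() {
  dir="$1"
  n="${2:-5}"
  # -x stays on one filesystem; -k prints KiB so a plain numeric sort works.
  # The last line printed is the total for the directory itself.
  du -xk --max-depth=1 "$dir" 2>/dev/null | sort -n | tail -n "$n"
}

# Example usage from a root shell on the rescue VM:
#   top_dirs /mnt/recovery/var/lib/docker 6
```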

Step 4: Clean Up Docker Files Manually

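Since the corrupt VM’s Docker daemon is not running, you cannot run docker system prune against it. Instead, manually remove Docker's data from the mounted disk. This permanently deletes every container, image, and volume, so copy anything you need out of /mnt/recovery/var/lib/docker/volumes first:

sudo sh -c 'cd /mnt/recovery/var/lib/docker && rm -rf containers/* image/* overlay2/* volumes/*'

The command runs inside a root shell because Docker's directories are readable only by root; with a plain sudo rm -rf, the * would be expanded by your unprivileged shell, match nothing, and the command would silently delete nothing. The overlay2 directory holds the image and container layers when Docker uses its default overlay2 storage driver, and it is usually where most of the space goes. Removing only the image metadata would leave those layers orphaned on disk.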
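
If the volumes hold data worth keeping, a less destructive first step is to empty the container logs only. That is often enough to let the VM boot, after which docker system prune can run inside it with the daemon's help. The sketch below assumes Docker's default json-file logging driver, which writes one file named after the container ID with a -json.log suffix into each container directory. The function name truncate_container_logs is hypothetical.

```shell
# Sketch: empty Docker's json-file container logs under a mounted root.
# Assumes the default json-file logging driver, which writes
#   <root>/var/lib/docker/containers/<id>/<id>-json.log
truncate_container_logs() {
  root="$1"
  logdir="$root/var/lib/docker/containers"
  [ -d "$logdir" ] || { echo "no such directory: $logdir" >&2; return 1; }
  # Truncate in place instead of deleting, so paths and ownership survive.
  find "$logdir" -type f -name '*-json.log' -exec truncate -s 0 {} +
}

# Example usage from a root shell on the rescue VM:
#   truncate_container_logs /mnt/recovery
```

Only the log files are touched. Container configuration, images, and volumes stay as they were.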

Step 5: Unmount and Detach the Disk from Rescue VM

sudo umount /mnt/recovery

Then detach the disk:

gcloud compute instances detach-disk rescue-vm --disk=corrupt-disk --zone=us-central1-a

Step 6: Reattach the Disk to the Original VM and Start It

gcloud compute instances attach-disk corrupt-vm --disk=corrupt-disk --zone=us-central1-a --boot
gcloud compute instances start corrupt-vm --zone=us-central1-a

Step 7: Connect via SSH and Verify

Once the VM is running, SSH in and verify free space:

df -h /

You should see available space restored.

By using a rescue VM to mount the corrupt disk and manually cleaning Docker artifacts, you can restore a stuck Google Cloud VM without rebuilding it, and everything outside Docker's data directory is preserved. Integrate Docker cleanup into your build pipeline to avoid repeating this process.
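
One way to automate that cleanup is a small script that prunes only when the disk is getting full. This is a sketch, not a recommended policy: the 80 percent threshold and the 7-day (168h) age filter are assumed values, and usage_pct and prune_if_needed are hypothetical names. The prune call leaves volumes alone.

```shell
# Sketch: prune unused Docker data once filesystem usage passes a threshold.
THRESHOLD="${THRESHOLD:-80}"   # assumed percentage; tune per runner

# Print the used percentage of the filesystem holding the given path,
# without the trailing % sign.
usage_pct() {
  df -P "$1" | awk 'NR == 2 { sub("%", "", $5); print $5 }'
}

prune_if_needed() {
  pct=$(usage_pct "${1:-/}")
  if [ "$pct" -ge "$THRESHOLD" ]; then
    # Removes stopped containers, unused networks, unused images, and
    # build cache that are older than 7 days. Volumes are not touched.
    docker system prune -af --filter "until=168h"
  fi
}

# Example: call prune_if_needed / at the end of this script and run it
# from cron or as a post-job step on the runner host.
```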