
How to rescue instances?

OpenStack offers a rescue mode to recover VMs. It boots a VM from a different image, which is useful when the virtual machine fails to boot due to a kernel panic or a full disk, or when you have simply lost access to the private key. After booting from a different image, you can mount the instance's original disk, edit its files and fix the problem.

Symptoms

Kernel Panic

Check your instance Console Log (web UI: Instances > <your instance> > Log)

```sh
[    1.041853] Loading compiled-in X.509 certificates
[    1.043433] Loaded X.509 cert 'CentOS Linux kpatch signing key:ea0413152cde1d98ebdca3fe6f0230904c9ef717'
[    1.046556] Loaded X.509 cert 'CentOS Linux Driver update signing key:7f421ee0ab69461574bb358861dbe77762a4201b'
[    1.050310] Loaded X.509 cert 'CentOS Linux kernel signing key:d4115f110055db56c8d605ab752173cfb1ac54d8'
[    1.053448] registered taskstats version 1
[    1.055861] Key type trusted registered
[    1.057771] Key type encrypted registered
[    1.059249] IMA: No TPM chip found, activating TPM-bypass! (rc=-19)
[    1.061680]   Magic number: 14:548:18
[    1.063246]  ep_81: hash matches
[    1.064844] rtc_cmos 00:00: setting system clock to 2018-08-23 08:02:54 UTC(1535011374)
[    1.067954] md: Waiting for all devices to be available before autodetect
[    1.069982] md: If you don't use raid, use raid=noautodetect
[    1.072041] md: Autodetecting RAID arrays.
[    1.073689] md: autorun ...
[    1.074976] md: ... autorun DONE.
[    1.076358] List of all partitions:
[    1.077771] No filesystem could mount root, tried: 
[    1.079600] Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)
[    1.082286] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 3.10.0-862.11.6.el7.x86_64 #1
[    1.085033] Hardware name: Fedora Project OpenStack Nova, BIOS 0.5.1 01/01/2011
[    1.087639] Call Trace:
[    1.088800]  [<ffffffff871135d4>] dump_stack+0x19/0x1b
[    1.090453]  [<ffffffff8710d11f>] panic+0xe8/0x21f
[    1.091982]  [<ffffffff8776c761>] mount_block_root+0x291/0x2a0
[    1.093704]  [<ffffffff8776c7c3>] mount_root+0x53/0x56
[    1.095394]  [<ffffffff8776c902>] prepare_namespace+0x13c/0x174
[    1.097281]  [<ffffffff8776c3df>] kernel_init_freeable+0x1f8/0x21f
[    1.099244]  [<ffffffff8776bb1f>] ? initcall_blacklist+0xb0/0xb0
[    1.101131]  [<ffffffff87101bc0>] ? rest_init+0x80/0x80
[    1.102813]  [<ffffffff87101bce>] kernel_init+0xe/0xf0
[    1.104497]  [<ffffffff871255f7>] ret_from_fork_nospec_begin+0x21/0x21
[    1.106367]  [<ffffffff87101bc0>] ? rest_init+0x80/0x80
[    1.107997] Kernel Offset: 0x5a00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
```

The log shows that the instance could not boot because the kernel could not mount the root filesystem: "Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)". The fix is to boot a previous, working kernel. Since you cannot boot the server, you have to fix the boot files on the volume from another instance.

Access denied

The problem can be as simple as:

$ ssh cloud-user@<floating-ip>
cloud-user@<floating-ip>: Permission denied (publickey,gssapi-keyex,gssapi-with-mic).

How to fix the issue, nova rescue

There are always several ways to fix a problem; this FAQ shows one of them. Also note that while you are allowed to edit GRUB boot parameters, root access through single-user mode is disabled by default for security reasons. The procedure to perform a rescue is as follows:

  1. You need to have the OpenStack command line tools installed and be logged in. See Configure your terminal environment for OpenStack for reference.

  2. Get the server's ID and store it in an environment variable called INSTANCE_UUID:

    $ openstack server list
    +--------------------------------------+-----------+--------+----------------------------+-------+----------------+
    | ID                                   | Name      | Status | Networks                   | Image | Flavor         |
    +--------------------------------------+-----------+--------+----------------------------+-------+----------------+
    | 55555566-ffff-4a52-5735-356251902325 | comp1     | ACTIVE | net=192.168.211.211        |       | standard.small |
    +--------------------------------------+-----------+--------+----------------------------+-------+----------------+
    $ export INSTANCE_UUID=55555566-ffff-4a52-5735-356251902325
    
  3. Shut down the instance:

    openstack server stop $INSTANCE_UUID
    
  4. Check that the VM is stopped:

    openstack server show $INSTANCE_UUID
    

    The power_state should be Shutdown.

  5. You are now ready to rescue the instance:

    nova rescue $INSTANCE_UUID --image <image-name>
    

    Ignore the password the command shows; SSH password login is disabled in all our images.

    Warning

    There is also a command named openstack server rescue which is almost the same as nova rescue but is missing the --image flag which is almost always required when rescuing servers.

    In this step, you should use the same image as your instance. You can list the available images with:

    openstack image list
    

    Cirros

    Cirros is a small image designed for rescue operations when access has been lost. It provides a default username and password that can be used in Pouta's web console.

  6. Make sure that the instance is in rescue mode with:

    openstack server show $INSTANCE_UUID
    
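
The status checks in steps 4 and 6 can also be scripted. The sketch below is one possible way to do it, assuming a POSIX shell and an already configured `openstack` client. `wait_for_status` is a hypothetical helper name, not part of the OpenStack tools. It polls `openstack server show` until the server reports the wanted status, which is `SHUTOFF` after a stop and `RESCUE` once rescue mode is active.

```sh
# Hypothetical helper (not part of the OpenStack CLI): poll the server
# status until it matches the wanted value or the attempts run out.
# Usage: wait_for_status <server-id> <wanted-status> [attempts] [delay-seconds]
wait_for_status() {
    server=$1
    wanted=$2
    attempts=${3:-30}
    delay=${4:-5}
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # "-f value -c status" prints only the status field, e.g. ACTIVE
        status=$(openstack server show "$server" -f value -c status)
        if [ "$status" = "$wanted" ]; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    echo "server $server is still $status, expected $wanted" >&2
    return 1
}

# Example (assumes INSTANCE_UUID is set as in step 2):
#   openstack server stop "$INSTANCE_UUID" && wait_for_status "$INSTANCE_UUID" SHUTOFF
#   nova rescue "$INSTANCE_UUID" --image <image-name> && wait_for_status "$INSTANCE_UUID" RESCUE
```

The helper returns a non-zero exit code on timeout, so it can be chained with `&&` as in the example comments.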

Connecting

Using ssh

SSH into the instance; the user and IP address are the same as usual.

ssh <default-user>@<floating-ip>

You will get this warning: WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED! Host keys are stored on the VM's disk, and they differ now because the instance booted from a different disk. Fix it by removing the line for your instance's IP address from the file ~/.ssh/known_hosts. Alternatively, run the following command, with INSTANCE_IP set to the floating IP of your instance:

ssh-keygen -f ~/.ssh/known_hosts -R "$INSTANCE_IP"

Using Pouta's web console (cirros)

Log in to Pouta's web interface: https://pouta.csc.fi. Find your instance and click Console.


The username and password should be printed in the console text, above the login.

Mount the disk

  1. Check what volumes you have. If you don't have any other volumes attached it should look something like this:

    $ lsblk
    NAME   MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
    vda    253:0    0  10G  0 disk
    └─vda1 253:1    0  10G  0 part
    vdb    253:16   0  80G  0 disk
    └─vdb1 253:17   0  80G  0 part /
    
  2. Now mount vdb1 on /tmp/mnt:

    $ sudo mkdir -p /tmp/mnt
    $ sudo mount /dev/vdb1 /tmp/mnt/
    

Change bootloader (Grub)

  1. Take a backup of the GRUB configuration:

    $ sudo cp /tmp/mnt/boot/grub2/grub.cfg /tmp/mnt/root/grub.cfg.bak-$(date +"%F")
    
  2. Open /tmp/mnt/boot/grub2/grub.cfg with your favorite text editor. Remove the first menuentry section.

    NOTE: This might not be the correct solution for your specific problem. The first menuentry is normally your latest and default kernel.
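
Before deleting anything, it can help to see which entries grub.cfg defines and in what order. The following is a minimal sketch, assuming a grub2-style file in which each entry starts with `menuentry '<title>'` at the beginning of a line. `list_menuentries` is a hypothetical helper name.

```sh
# Hypothetical helper: print the title of each top-level menuentry in a
# grub.cfg, numbered from 0 in file order.
# Assumes titles are wrapped in single quotes and start at column 0.
list_menuentries() {
    awk -F"'" '/^menuentry / { printf "%d: %s\n", n++, $2 }' "$1"
}

# Example:
#   list_menuentries /tmp/mnt/boot/grub2/grub.cfg
```

Entries nested inside a submenu are indented and are therefore not listed by this sketch.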

Use chroot to change the / folder

If your instance has issues due to broken packages or drivers, you can switch to your original root filesystem and fix the problems using the following commands:

$ sudo mv /tmp/mnt/etc/resolv.conf{,.bak}
$ sudo cp /etc/resolv.conf /tmp/mnt/etc/resolv.conf
$ sudo chroot /tmp/mnt

The chroot has now changed your root folder / to /tmp/mnt/ (your VM's disk partition). You can now make any fix or change, such as uninstalling or reinstalling a package.
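
Some package operations inside a chroot, such as reinstalling a kernel, commonly expect /dev, /proc and /sys to be available. The following sketch is not part of the original procedure. `chroot_prep_commands` is a hypothetical helper that only prints the mount commands for a given mount point, so that you can review them and run them before `chroot`. It assumes the mount point contains no spaces.

```sh
# Hypothetical helper: print (not run) the commands that expose the rescue
# system's /dev, /proc and /sys inside the target before chrooting into it.
# Usage: chroot_prep_commands [mount-point]   (defaults to /tmp/mnt)
chroot_prep_commands() {
    target=${1:-/tmp/mnt}
    target=${target%/}    # drop one trailing slash, if any
    echo "sudo mount --bind /dev $target/dev"
    echo "sudo mount -t proc proc $target/proc"
    echo "sudo mount -t sysfs sys $target/sys"
}

# Example: review the commands first, then pipe them to a shell to run them:
#   chroot_prep_commands /tmp/mnt
#   chroot_prep_commands /tmp/mnt | sh
```

Printing the commands instead of running them keeps the helper free of side effects, which is convenient on a rescue system where a wrong mount is easy to miss.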

Get out of rescue

  1. Log out of the instance and unrescue it:

    nova unrescue $INSTANCE_UUID
    
  2. It is a good idea to verify that a restart works after the fix:

    ssh <default-user>@<floating-ip> sudo reboot
    

    Wait for the instance to boot and SSH to it again:

    ssh <default-user>@<floating-ip>
    

    It should work as before the incident happened.
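
Waiting for the instance to come back after the reboot can be automated. The following is a minimal sketch, assuming key-based login works again. `wait_for_ssh` is a hypothetical helper that retries a no-op SSH command until it succeeds.

```sh
# Hypothetical helper: retry a no-op SSH command until the host accepts
# the connection or the attempts run out.
# Usage: wait_for_ssh <user@host> [attempts] [delay-seconds]
wait_for_ssh() {
    dest=$1
    attempts=${2:-30}
    delay=${3:-10}
    i=0
    while [ "$i" -lt "$attempts" ]; do
        # BatchMode=yes fails instead of prompting for a password;
        # ConnectTimeout=5 bounds the connection phase of each attempt.
        if ssh -o BatchMode=yes -o ConnectTimeout=5 "$dest" true; then
            return 0
        fi
        i=$((i + 1))
        sleep "$delay"
    done
    return 1
}

# Example:
#   wait_for_ssh <default-user>@<floating-ip> && echo "instance is reachable"
```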


Last update: January 23, 2023