[nova][glance][cinder] How to do consistent snapshots with qemu-guest-agent


[apologies for the top-post]

Hi Ralf,

it looks like you've met all the necessary prerequisites. Basically,

1. The image you are booting from must have the hw_qemu_guest_agent=yes
property set (this configures the Nova instance with a virtual serial
device consumed by qemu-guest-agent).

2. The instance must run the qemu-guest-agent daemon.

3. The image you are booting from should have the os_require_quiesce=yes
property set. This isn't strictly necessary, as libvirt should always
try to send the freeze/thaw commands over the serial device if your
instance is configured with hw_qemu_guest_agent, but if
os_require_quiesce is set then the snapshot will actually fail if
libvirt can't freeze, which is what you probably want.

4. The filesystem used within the guest must support fsfreeze. This
includes btrfs, ext2/3/4, xfs, and a few others. vfat on Linux does
not support being frozen. Windows guests running the Windows QEMU
Guest Agent apparently do support freezing if VSS is enabled, but I am
no expert on Windows guests.
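
As a rough sketch of prerequisites 1 and 3, the following Python helper
checks a dictionary of image properties and builds the "openstack image
set" command line that would add whatever is missing. The helper names and
the example image name are made up for illustration, the check only
recognizes the literal value "yes" used in this thread, and nothing is
executed against a cloud.

```python
# Illustrative only: helper names and the image name are hypothetical.
REQUIRED_PROPERTIES = {
    "hw_qemu_guest_agent": "yes",
    "os_require_quiesce": "yes",
}


def missing_properties(image_properties):
    """Return the required properties that are absent or not set to 'yes'."""
    return {
        key: value
        for key, value in REQUIRED_PROPERTIES.items()
        if str(image_properties.get(key, "")).lower() != value
    }


def image_set_command(image, image_properties):
    """Build (but do not run) an `openstack image set` argument list."""
    missing = missing_properties(image_properties)
    if not missing:
        return []  # nothing to change
    argv = ["openstack", "image", "set"]
    for key, value in missing.items():
        argv += ["--property", f"{key}={value}"]
    argv.append(image)
    return argv


if __name__ == "__main__":
    # Example: the image already has the guest agent property only.
    props = {"hw_qemu_guest_agent": "yes"}
    print(" ".join(image_set_command("my-ubuntu-image", props)))
```

Note that properties on an image only affect instances booted after the
change, so the resulting command would need to be run before launching.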

What happens under the covers is that qemu-guest-agent invokes the
FIFREEZE ioctl on each mounted filesystem in the guest, as seen here:

https://git.qemu.org/?p=qemu.git;a=blob;f=qga/commands-posix.c#l1327
(the comments immediately above that line explain under which
circumstances the FIFREEZE ioctl may fail).

The FIFREEZE ioctl maps to the kernel freeze_super() function, which
flushes the filesystem superblock, syncs the filesystem, and then
disallows any further I/O. To answer your other question: yes, this
should indeed persist all in-flight I/O to disk. Unfortunately, nothing
in the code path (that I know of) issues a printk on success, so dmesg
won't tell you that the filesystem has been flushed/frozen successfully.
You'd only see "VFS:Filesystem freeze failed" in your guest's kernel log
on error. The same is true for FITHAW/thaw_super(), which thaws the
superblock and makes the filesystem writable again.

However, at least on an Ubuntu guest, you can create a file named
/etc/default/qemu-guest-agent, in which you can define DAEMON_ARGS like
this:

DAEMON_ARGS="--logfile /var/log/qemu-ga.log --verbose"

Then, while you are creating a snapshot with "nova image-create" or
"openstack server image create", /var/log/qemu-ga.log should
be populated with log entries related to the fsfreeze events. The same
should be true for creating a snapshot from Horizon.
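
If you want to check that log from a script, something like the following
Python sketch could do. I am not assuming anything about the exact log
line format here; it simply keeps every line that mentions "fsfreeze",
which should match the guest-fsfreeze-freeze and guest-fsfreeze-thaw
command names. The function names are hypothetical, and the path is the
one from the DAEMON_ARGS example above.

```python
def fsfreeze_lines(lines):
    """Return the lines that mention fsfreeze (case-insensitive match)."""
    return [line.rstrip("\n") for line in lines if "fsfreeze" in line.lower()]


def read_fsfreeze_lines(path="/var/log/qemu-ga.log"):
    """Read the agent log and keep only fsfreeze-related entries."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return fsfreeze_lines(handle)


if __name__ == "__main__":
    for entry in read_fsfreeze_lines():
        print(entry)
```

Run it inside the guest right after taking a snapshot; an empty result
would suggest that no freeze request reached the agent.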

On Ubuntu bionic, you should also make sure that you are running
qemu-guest-agent from bionic-security (or a recent daily build of an
Ubuntu cloud image), because at least in the initial bionic release
qemu-guest-agent was suffering from a packaging issue, described in
https://bugs.launchpad.net/ubuntu/+source/qemu/+bug/1820291.

For RBD-backed Nova/libvirt, things are a bit more complicated still,
due to what appears to be somewhat inconsistent/unexpected behavior in
Nova. See the discussion in:

https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/3YQCRO4JP56EDJN5KX5DWW5N2CSBHRHZ/

Does this give you enough information so you can verify whether or not
freeze/thaw is working as expected for you?

Cheers,
Florian


On 14/08/2019 10:41, Teckelmann, Ralf, NMU-OIP wrote:
> Hello,
> 
> 
> Having worked my way through documentation and articles, I am totally
> lost on the matter.
> 
> All I want to know is whether
> 
> - issuing "openstack snapshot create ...."
> 
> - clicking "Create Snapshot" in Horizon for an instance
> 
> will produce a consistent snapshot (of all volumes in question).
> By "consistent", I mean that all data in memory is written to
> disk before the snapshot starts.
> 
> I hope someone can clarify whether the setup described below is
> sufficient to achieve this goal, or whether I have to do something in
> addition.
> 
> 
> If you have any questions, I will answer as quickly as possible.
> 
> 
> Setup:
> 
> 
> We have a Stein-based OpenStack deployment with cinder backed by ceph.
> 
> Instances are created with cinder volumes. Boot volumes are based on an
> image having the properties:
> 
> - hw_qemu_guest_agent='yes'
> - os_require_quiesce='yes'
> 
> 
> The image is Ubuntu 16.04 or 18.04 with the qemu-guest-agent package
> installed and the service running (no additional configuration besides
> the distro default):
> 
> 
> qemu-guest-agent.service - LSB: QEMU Guest Agent startup script
>    Loaded: loaded (/etc/init.d/qemu-guest-agent; bad; vendor preset:
> enabled)
>    Active: active (running) since Wed 2019-08-14 07:42:21 UTC; 9min ago
>      Docs: man:systemd-sysv-generator(8)
>    CGroup: /system.slice/qemu-guest-agent.service
>            └─2300 /usr/sbin/qemu-ga --daemonize -m virtio-serial -p
> /dev/virtio-ports/org.qemu.guest_agent.0
> 
> Aug 14 07:42:21 ulthwe systemd[1]: Starting LSB: QEMU Guest Agent
> startup script...
> Aug 14 07:42:21 ulthwe systemd[1]: Started LSB: QEMU Guest Agent startup
> script.
> 
> I can see the socket on the compute node and send pings successfully:
> 
> ~# ls /var/lib/libvirt/qemu/*.sock
> /var/lib/libvirt/qemu/org.qemu.guest_agent.0.instance-0000248e.sock
> root@pcevh2404:~# virsh qemu-agent-command instance-0000248e
> '{"execute":"guest-ping"}'
> {"return":{}}
> 
> 
> I can also send freeze and thaw successfully:
> 
> ~# virsh qemu-agent-command instance-0000248e
> '{"execute":"guest-fsfreeze-freeze"}'
> {"return":1}
> 
> ~# virsh qemu-agent-command instance-0000248e
> '{"execute":"guest-fsfreeze-thaw"}'
> {"return":1}
> 
> Sending a simple write (echo "bla" > blub.file) in the "frozen" state
> will be blocked until "thaw" as expected.
> 
> Best regards
> 
> 
> Ralf T.