Michal Nowak

Use raw ZFS volume for VirtualBox guest

If you’ve ever run virtual guests on platforms like KVM, Xen, Hyper-V, VMware, or VirtualBox, you probably think of the disk attached to a guest as an image file in a special format (qcow2, VHDX, VMDK, VDI) which lives somewhere on the host file system. Lesser known is that most of these platforms support other storage options, namely network file systems (NFS, SMB), logical volumes (partitions), whole disks, and, most notably here, ZFS volumes.

A file-based arrangement certainly has value for some users. In my experience it is the most convenient arrangement for guests you want to rapidly create and destroy, or when you don’t care about anything other than plain storage.

But it has drawbacks too. Here’s one of them: you rely on virtualization-specific designs which do not translate well from one platform to another. As someone who has tried to convert an image from qcow2 to VMDK or VHDX, or to create an instant JeOS-like VM image for VMware with open source tools, I have seen that in the former case one may end up with a fixed-size image and, in the latter case, with an image booting only from a slow virtual IDE controller. Some of those designs may have fatal flaws.

Even if you think you will stick to one platform and are not interested in platform interoperability, you may want to use your operating system's native tools with your guests. In OpenIndiana, an illumos distribution, system components like IPS (package manager), Zones (OS-level virtualization/"containers"), and KVM (HW virtualization) are well integrated with ZFS, the advanced file system and volume manager, and can leverage its features like snapshots and encryption.

Thanks to this level of integration on OpenIndiana we can run KVM guests in a Zone and thus make them well isolated from other tenants on the host, control their resources, and more.

Interestingly, VirtualBox can leverage a raw device as storage for a guest. For our purposes a ZFS volume is an ideal device. First, I will show you how to create a VirtualBox guest running off a ZFS volume, then we will use the ZFS snapshotting feature to save the state of the guest, later on we will send the guest to another ZFS pool, and finally we will run the guest from an encrypted ZFS volume.

If you want to try this on your systems, here are the prerequisites:

  • A Host with a ZFS file system. I use OpenIndiana 2019.04 updated to the latest packages to get ZFS encryption; other Unices may work as well, but I haven't tried them. I have two ZFS pools present: rpool is the default one, tank is one I use for some of the examples.
  • VirtualBox with the Oracle VM VirtualBox Extension Pack installed and working (I used version 6.0.8).
  • Another system with a graphical interface where you can display the guest's graphical console, unless you display it on your Host.
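
A quick sanity check of the environment before you start (the exact versions do not matter much, as long as the Extension Pack is installed and both pools are online):

Host $ VBoxManage --version
Host $ VBoxManage list extpacks
Host $ zpool list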

Legend: $ introduces a user command, # introduces a root command. Guest/Host/Workstation denotes the environment where the command is being executed.

I advise caution: VirtualBox guests write to raw devices, and using the wrong volume may result in data loss.

Create guest with ZFS volume

First, as root create the rpool/vboxzones/myvbox volume of 10 GB:

Host # zfs create rpool/vboxzones
Host # zfs create -V 10G rpool/vboxzones/myvbox
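
The volume now shows up as block and character (raw) device nodes under /dev/zvol; the raw one is what we will hand to VirtualBox. A quick check, not strictly necessary:

Host $ ls -l /dev/zvol/dsk/rpool/vboxzones/myvbox /dev/zvol/rdsk/rpool/vboxzones/myvbox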

Then, again as root, create the VirtualBox guest with a SATA controller; port 1 is reserved for the disk:

Host # VBoxManage createvm --name myvbox --register
Host # VBoxManage modifyvm myvbox --boot1 disk --boot2 dvd --cpus 1 \
  --memory 2048 --vrde on --vram 4 --usb on --nic1 nat --nictype1 82540EM \
  --cableconnected1 on --ioapic on --apic on --x2apic off --acpi on \
  --biosapic=apic
Host # VBoxManage storagectl myvbox --name SATA --add sata \
  --controller intelahci --bootable on
Host # VBoxManage storageattach myvbox --storagectl SATA --port 0 \
  --device 0 --type dvddrive --medium OI-hipster-text-20190511.iso
Host # VBoxManage internalcommands createrawvmdk \
  -filename /rpool/vboxzones/myvbox/myvbox.vmdk \
  -rawdisk /dev/zvol/rdsk/rpool/vboxzones/myvbox

The last command creates a pass-through VMDK image which gets attached to the guest, but in reality nothing gets written to this file; the /dev/zvol/rdsk/rpool/vboxzones/myvbox volume is used directly instead. Attach the image to the guest:

Host # VBoxManage storageattach myvbox --storagectl SATA --port 1 \
  --device 0 --type hdd --medium /rpool/vboxzones/myvbox/myvbox.vmdk \
  --mtype writethrough
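
If you want to convince yourself that the VMDK really is just a small descriptor and all data goes to the volume, inspect it; output is omitted here, but you should see only a short descriptor referencing the raw device:

Host # ls -lh /rpool/vboxzones/myvbox/myvbox.vmdk
Host # head /rpool/vboxzones/myvbox/myvbox.vmdk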

Start the guest and connect to it via an RDP client (e.g. rdesktop or FreeRDP on Unix):

Host # VBoxManage startvm myvbox --type headless
Workstation $ xfreerdp /v:$(Host_IP)

If you see the guest’s bootloader, you can proceed and install the guest from the ISO image. In the "Disks" screen of the installer you’ll see your disk:

[Screenshot: the installer's "Disks" screen showing the attached disk]

Create a snapshot of the disk

Record the approximate time of the snapshot by creating a file with a timestamp in it:

Guest $ date > date

Now with this arrangement, we can create a snapshot of the volume:

Host # zfs snapshot rpool/vboxzones/myvbox@first
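
You can list the snapshot and watch how much space it starts to hold as the guest keeps writing (just an optional check):

Host $ zfs list -t snapshot -r rpool/vboxzones/myvbox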

Wait some time and rewrite the timestamp:

Guest $ date > date

Power off the guest and roll back to the first snapshot, with the -r option so that ZFS lets us roll back to any snapshot, even an older one:

Guest # poweroff
Host # zfs rollback -r rpool/vboxzones/myvbox@first

Start the guest and check that the timestamp is set to the one from the first snapshot:

Guest $ cat date

Hooray! We demonstrated that a VirtualBox guest works with ZFS snapshots.

Send guest disk to another ZFS pool

Now let's see if we can send the guest disk to another ZFS pool and run it from there.

Guest # poweroff
Host # zpool list
NAME    SIZE  ALLOC   FREE  CKPOINT  EXPANDSZ   FRAG    CAP  DEDUP  HEALTH  ALTROOT
rpool   118G  40.2G  77.8G        -         -    18%    34%  1.00x  ONLINE  -
tank    232G  22.2G   210G        -         -     1%     9%  1.00x  ONLINE  -
Host # zfs send rpool/vboxzones/myvbox | zfs receive -v tank/vboxzones/myvbox
receiving full stream of rpool/vboxzones/myvbox@--head-- into tank/vboxzones/myvbox@--head--
received 9.34GB stream in 153 seconds (62.5MB/sec)
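
Before rewiring anything, it does not hurt to confirm that the received volume and its device node exist on the target pool:

Host $ zfs list -r tank/vboxzones
Host $ ls -l /dev/zvol/rdsk/tank/vboxzones/myvbox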

Now, “rewire” the pass-through disk so it points to the new location:

Host # VBoxManage storageattach myvbox --storagectl SATA --port 1 \
  --device 0 --type hdd --medium emptydrive
Host # VBoxManage closemedium disk /rpool/vboxzones/myvbox/myvbox.vmdk \
  --delete
0%...10%...20%...30%...40%...50%...60%...70%...80%...90%...100%
Host # VBoxManage internalcommands createrawvmdk \
  -filename /rpool/vboxzones/myvbox/myvbox.vmdk \
  -rawdisk /dev/zvol/rdsk/tank/vboxzones/myvbox
RAW host disk access VMDK file /rpool/vboxzones/myvbox/myvbox.vmdk created successfully.
Host # VBoxManage storageattach myvbox --storagectl SATA --port 1 \
  --device 0 --type hdd --medium /rpool/vboxzones/myvbox/myvbox.vmdk \
  --mtype writethrough
Host # VBoxManage startvm myvbox --type headless
Waiting for VM "myvbox" to power on…
VM "myvbox" has been successfully started.

On boot the guest notices that the disk's device ID does not match the recorded one, corrects it to what we actually have, and won't warn on the next boot:

[Screenshot: guest boot messages noting the changed device ID]

So we proved that the guest's disk can be sent to another ZFS pool and the guest successfully started from there.

Start guest from an encrypted volume

Now, let's see if the guest starts from an encrypted ZFS volume.

First, create an encrypted file system tank/vboxzones_encrypted with a passphrase entered at the prompt:

Host # zfs create -o encryption=on -o keyformat=passphrase \
  -o keylocation=prompt tank/vboxzones_encrypted
Enter passphrase:
Re-enter passphrase:
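
You can verify the encryption state and key status of the new dataset through its properties (just a sanity check, not required for the next steps):

Host # zfs get encryption,keyformat,keystatus tank/vboxzones_encrypted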

Then send the stream to the new encrypted dataset:

Host # zfs send rpool/vboxzones/myvbox | zfs receive -v tank/vboxzones_encrypted/myvbox
receiving full stream of rpool/vboxzones/myvbox@--head-- into tank/vboxzones_encrypted/myvbox@--head--
received 9.34GB stream in 166 seconds (57.6MB/sec)

And recreate the guest pass-through disk to point to the encrypted dataset:

Host # VBoxManage storageattach myvbox --storagectl SATA --port 1 \
  --device 0 --type hdd --medium emptydrive
Host # VBoxManage closemedium disk /rpool/vboxzones/myvbox/myvbox.vmdk \
  --delete
0%...10%...20%...30%...40%...50%...60%...70%...80%...90%...100%
Host # rm /rpool/vboxzones/myvbox/myvbox.vmdk
Host # VBoxManage internalcommands createrawvmdk \
  -filename /rpool/vboxzones/myvbox/myvbox.vmdk \
  -rawdisk /dev/zvol/rdsk/tank/vboxzones_encrypted/myvbox
RAW host disk access VMDK file /rpool/vboxzones/myvbox/myvbox.vmdk created successfully.
Host # VBoxManage storageattach myvbox --storagectl SATA --port 1 \
  --device 0 --type hdd --medium /rpool/vboxzones/myvbox/myvbox.vmdk \
  --mtype writethrough
Host # VBoxManage startvm myvbox --type headless
Waiting for VM "myvbox" to power on…
VM "myvbox" has been successfully started.

To prove that we really run from an encrypted volume, let's unload the encryption key from memory manually (or by restarting the Host) and start the guest again:

Host # zfs unmount /tank/vboxzones_encrypted
Host # zfs unload-key tank/vboxzones_encrypted
Host # VBoxManage startvm myvbox --type headless
Waiting for VM "myvbox" to power on…
VBoxManage: error: Could not open the medium '/rpool/vboxzones/myvbox/myvbox.vmdk'.
VBoxManage: error: VD: error VERR_ACCESS_DENIED opening image file '/rpool/vboxzones/myvbox/myvbox.vmdk' (VERR_ACCESS_DENIED)
VBoxManage: error: Details: code NS_ERROR_FAILURE (0x80004005), component MediumWrap, interface IMedium

As expected it failed, because the encrypted volume is unreadable for VirtualBox. Let's load the key and start the guest again:

Host # zfs load-key tank/vboxzones_encrypted
Enter passphrase for 'tank/vboxzones_encrypted':
Host # VBoxManage startvm myvbox --type headless
Waiting for VM "myvbox" to power on…
VM "myvbox" has been successfully started.

After decryption the guest starts correctly.


In these three examples we

  1. created a snapshot and then rolled back to it,
  2. sent the guest disk as a ZFS stream to another ZFS pool, and
  3. started the guest off an encrypted dataset.

Thus we demonstrated that we can leverage OS-level file system features in VirtualBox guests, thanks to the little-known VirtualBox ability to use plain storage devices.

Concerns and future directions

Unfortunately not everything works ideally. Whatever I try, the disk I/O of the raw ZFS volume is significantly lower than the I/O performance with a VDI disk (or in the Global Zone, which I treat here mostly as a baseline for comparison).

I have run the “seqwr” and “seqrd” test modes of the fileio test from the sysbench 0.4.1 benchmark on an old Samsung 840 PRO SSD (MLC) disk which seems to show some wear:

Operation / Storage type      | Global Zone (host) | VDI disk (guest) | ZFS dataset (guest)
Sequential Write (2 GB file)  | 36.291 Mb/sec      | 54.302 Mb/sec    | 11.847 Mb/sec
Sequential Read (2 GB file)   | 195.63 Mb/sec      | 565.63 Mb/sec    | 155.21 Mb/sec
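
For reference, a sysbench 0.4 fileio run looks roughly like this; this is a sketch of the invocation, not the exact command lines behind the numbers above:

$ sysbench --test=fileio --file-total-size=2G prepare
$ sysbench --test=fileio --file-total-size=2G --file-test-mode=seqwr run
$ sysbench --test=fileio --file-total-size=2G --file-test-mode=seqrd run
$ sysbench --test=fileio --file-total-size=2G cleanup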

There are two things of concern:

  1. the ZFS volume is three times slower on sequential write, and ~20 % slower on sequential read, than the Global Zone;
  2. the VDI disk performance can actually be higher (sic) than in the Global Zone.

The first takeaway may mean that an unneeded fsync() occurs on some layer, and the second may mean that something fishy is happening with synchronizing files to the underlying file system on the hypervisor side. Perhaps fsync() is being ignored to some degree? (In any case these numbers are to be taken with a grain of salt and are good only to show the contrast between the various storage types and their potential limitations.)
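
One cheap (and unsafe) way to test the fsync() theory would be to temporarily disable synchronous writes on the volume, rerun the benchmark, and compare; this is only a diagnostic, as it risks losing recent writes on power failure, so restore the default afterwards:

Host # zfs set sync=disabled rpool/vboxzones/myvbox
Host # zfs set sync=standard rpool/vboxzones/myvbox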

To summarize, it turns out that the level of integration we see with KVM on various illumos distributions can be met with VirtualBox, as we seem to have the necessary building blocks. But it remains to be investigated what the culprit is on the I/O performance side, and how Zones can be added to the mix so that additional tenant isolation is achieved (especially because VirtualBox is run as root due to the use of raw ZFS volumes).

Acknowledgement: Johannes Schlüter’s ZFS and VirtualBox blog post inspired me to write this article.

Top comments (5)

Mike Gerdts:

Things to explore as to why it could be slower on zfs:

  • Depending on how vbox is doing the writes, this could be sync writes for every write that is happening in the guest. It would be useful to measure the number of iops seen by the host in each of your tests.
  • If sync writes are happening such that they don't align with the dataset's block size (recordsize for filesystem, volblocksize for volume) you could be seeing write inflation. In the host, look for a lot of reads during a write workload and look to see if the amount of data written by the host is significantly more than that written to the guest.
  • If you are doing writes that are at least 32k (zfs_immediate_write_sz) but vbox is chopping them up into smaller writes, the data may be written to the zil (zil exists even if log devices don't) and again to its permanent home.
  • I believe that the Samsung 840 Pro advertises a physical sector size of 512b, which is almost certainly not true. It is likely 4k or 8k. The zpool will have been created with ashift=9, and all allocations will fall on a 512 byte boundary. If that 512 byte boundary just happens to also be a boundary that is the real sector size (4k or 8k), you will avoid the drive having to do a read-modify-write cycle when writing multiples of the drive's actual physical sector size. If the real sector size is 4k, a 512 byte allocation only has a 12.5% chance of landing on the right boundary. Repeat your benchmarks many times, looking for wild variations.
  • ZFS is doing more work than it would be without ZFS. The protections offered by copy-on-write, checksumming, etc. are quite cheap compared to the cost of operations on hard disks. The cost becomes more noticeable on faster disks. Normally I think of these costs being noticeable on NVMe and not so much on an earlier generation SATA SSD.
  • If the tests are with encryption enabled, I'd be interested to see what it's like without encryption.
Michal Nowak:

Thanks Mike! Very much appreciate your comment. I will go through those things and see what difference they make; I just started with ashift, which was indeed set to 9 (512 B) but should be 13 (8 K).
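
(For reference, the ashift actually used by a pool's vdevs can be read out of the pool configuration, roughly like this; zdb output format varies between versions:)

Host # zdb -C rpool | grep ashift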

Jon Forrest:

Minor typo:

"control it's resources, and more." ->
"control its resources, and more."

Michal Nowak:

Thanks, Jon. Typo fixed.

Clive Da:

A brilliant use of virtualisation! Using it to experiment with new file systems WITHOUT messing up your host or native env.
