fix: Call KVM_KVMCLOCK_CTRL not only after pause but also before snapshot resume #5494
codecov bot commented Oct 28, 2025 • edited
Codecov Report
✅ All modified and coverable lines are covered by tests.
Coverage diff, main vs. #5494:
Coverage: 82.84% (main), 82.84% (#5494)
Files: 269, 269
Lines: 27737, 27737
Hits: 22978, 22978
Misses: 4759, 4759
301f879 to ff48e0e
ff48e0e to 3fc6bf3
64f27a2 to 0dfdc66
Manciukic commented Oct 28, 2025
To double-check the impact on the resume path, I've kicked off a Buildkite A/B build: https://buildkite.com/firecracker/performance-a-b-tests/builds/736
zulinx86 commented Oct 28, 2025
thanks!
07cf216 to ab0f55e

KVM_KVMCLOCK_CTRL ioctl sets the `pvclock_set_guest_stopped_request` flag of `kvm_vcpu_arch` [1]. On the next guest time update, if the flag is set, KVM ORs in `PVCLOCK_GUEST_STOPPED` and `kvm_setup_guest_pvclock()` pushes the `hv_clock` into the guest's pvclock page [2]. If the `hv_clock` has not been written to the guest's pvclock page when taking a snapshot, it is not saved in the snapshot memory (i.e. `PVCLOCK_GUEST_STOPPED` isn't set in resumed VMs). So we should call KVM_KVMCLOCK_CTRL ioctl before resuming a VM in addition to after pausing a VM.

[1]: https://elixir.bootlin.com/linux/v6.16.3/source/arch/x86/kvm/x86.c#L5734
[2]: https://elixir.bootlin.com/linux/v6.16.3/source/arch/x86/kvm/x86.c#L3286-L3295

Signed-off-by: Takahiro Itazuri <itazur@amazon.com>
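The ordering problem described in this commit message can be illustrated with a toy model. The following is only a sketch under stated assumptions, not Firecracker or KVM code: `ToyVcpu`, `ToyGuestMemory`, and `resumed_guest_sees_stop` are hypothetical names, and the model assumes that the pending request lives only in host-side vCPU state, that it is not carried into the restored VM, and that no guest time update runs between the pause and the snapshot.

```rust
/// Flag bit the guest reads from its pvclock page
/// (in Linux, PVCLOCK_GUEST_STOPPED is defined as 1 << 1).
const PVCLOCK_GUEST_STOPPED: u8 = 1 << 1;

/// Hypothetical stand-in for the host-side per-vCPU state kept by KVM.
#[derive(Default)]
struct ToyVcpu {
    /// Models `pvclock_set_guest_stopped_request`; not part of guest memory.
    stopped_request: bool,
}

/// Hypothetical stand-in for guest memory; only the pvclock flags are modeled.
#[derive(Default, Clone)]
struct ToyGuestMemory {
    pvclock_flags: u8,
}

impl ToyVcpu {
    /// Models the KVM_KVMCLOCK_CTRL ioctl: it only records a request.
    fn kvmclock_ctrl(&mut self) {
        self.stopped_request = true;
    }

    /// Models the next guest time update: a pending request is ORed into
    /// the flags written to the guest's pvclock page, then cleared.
    fn guest_time_update(&mut self, mem: &mut ToyGuestMemory) {
        if self.stopped_request {
            mem.pvclock_flags |= PVCLOCK_GUEST_STOPPED;
            self.stopped_request = false;
        }
    }
}

/// Pause, snapshot, restore into a fresh VM, resume.
/// Returns whether the resumed guest observes PVCLOCK_GUEST_STOPPED.
fn resumed_guest_sees_stop(call_before_resume: bool) -> bool {
    let mut vcpu = ToyVcpu::default();
    let mem = ToyGuestMemory::default();

    // After pause: the request is recorded on the host side only
    // (assumption: no guest time update runs before the snapshot).
    vcpu.kvmclock_ctrl();

    // Snapshot: only guest memory is captured in this model.
    let snapshot = mem.clone();

    // Restore: a fresh vCPU with no pending request (model assumption).
    let mut restored_vcpu = ToyVcpu::default();
    let mut restored_mem = snapshot;

    if call_before_resume {
        // The behavior this PR adds: request the flag again before resuming.
        restored_vcpu.kvmclock_ctrl();
    }

    // Resume: the first guest time update publishes any pending request.
    restored_vcpu.guest_time_update(&mut restored_mem);
    (restored_mem.pvclock_flags & PVCLOCK_GUEST_STOPPED) != 0
}
```

In this model, `resumed_guest_sees_stop(false)` shows the flag being lost across a snapshot, while `resumed_guest_sees_stop(true)` shows it reaching the guest once the request is repeated before resume.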
ab0f55e to 0894dbc
d33011c into firecracker-microvm:main
Also includes this patch: firecracker-microvm/firecracker#5494

I'm interested in these bug fixes since v1.11.0 (which we were on previously):
firecracker-microvm/firecracker#5122
firecracker-microvm/firecracker#5260

We should also look into enabling PCI. From their release notes:

> In our micro-benchmarks, we measured up to 50% better latency for block and network, up to 70% better block throughput, and up to 25% higher network throughput (results depend on instance type and kernel).

Also I had previously cherry-picked a firecracker patch that fixed some of the rcu_sched failures. It was formally merged into the firecracker repo [here](firecracker-microvm/firecracker#5494) and this final version has some implementation differences from the patch we'd applied. I can't directly apply the updated patch on v1.11.0 due to some merge conflicts, but it applies cleanly on top of v1.13.0
Fixes #5322.
Changes
Reason
KVM_KVMCLOCK_CTRL ioctl sets the `pvclock_set_guest_stopped_request` flag of `kvm_vcpu_arch` [1]. On the next guest time update, if the flag is set, KVM ORs in `PVCLOCK_GUEST_STOPPED` and `kvm_setup_guest_pvclock()` pushes the `hv_clock` into the guest's pvclock page [2]. If the `hv_clock` has not been written to the guest's pvclock page when taking a snapshot, it is not saved in the snapshot memory (i.e. `PVCLOCK_GUEST_STOPPED` isn't set in resumed VMs). So we should call KVM_KVMCLOCK_CTRL ioctl before resuming a VM in addition to after pausing a VM. That covers both the pause-and-resume case and the restore-and-resume case.
License Acceptance
By submitting this pull request, I confirm that my contribution is made under
the terms of the Apache 2.0 license. For more information on following Developer
Certificate of Origin and signing off your commits, please check
CONTRIBUTING.md.
PR Checklist
tools/devtool checkbuild --all to verify that the PR passes build checks on all supported architectures.
tools/devtool checkstyle to verify that the PR passes the automated style checks.
how they are solving the problem in a clear and encompassing way.
[ ] I have updated any relevant documentation (both in code and in the docs) in the PR.
CHANGELOG.md.
[ ] When making API changes, I have followed the Runbook for Firecracker API changes.
integration tests.
[ ] I have linked an issue to every new TODO.
rust-vmm.