Had a bit of a scare today -- my blog disappeared! Seems to be back now, so either it was a glitch or someone thought I was spamming. Gotta be a little less overzealous about linking, I guess -- consider me chastised!
Edit: just read an article spelling out the problem, which is
worth a look if you have a minute.
So I’m still running a Linux box that hosts a legacy business app (which is about to be replaced) and a few legacy VPNs. It was set up ages ago, when I didn’t have the experience I have today, and the setup on the machine was a mess: originally installed from the testing release of what was to become Debian 3.1, with several custom packages (Postfix, Apache, OpenVPN, etc.). It has been overdue for some fixup work for quite some time.
As a disclaimer, I realize that Debian in any version isn’t a supported OS on Hyper-V R2 – I just want to share my experiences with this unsupported configuration.
The hardware, an aging IBM xSeries 306m with a Pentium 4 CPU, wasn’t getting any younger, and after a drive failure about half a year ago that led to a system crash (no data loss though – it just crashed the machine, that’s software RAID for you), it was finally time to modernize this.
The plan is to consolidate all our DMZ workloads (ISA, OCS Edge, XMPP Gateway, Exchange Edge) on Hyper-V 2008 R2, and doing the trickiest part first seemed like a good idea.
So I created a new VM using SCVMM 2008 R2, selected Other Linux 32bit as the guest OS, inserted a Debian 5.0 netboot CD, and that’s where the problems started. While the installation worked well in general, the framebuffer used by the Debian installer is awfully slow, so it took me about half an hour just to get the install done (on a 5GB partition of the 80GB VHD).
After finishing the installation, I formatted the rest of the disk appropriately and then used rsync to transfer the machine contents over. After briefly reconfiguring Grub, I could choose to boot either the transferred OS with its kernel, or the Debian 5 rescue system I had installed alongside.
Booting the transferred system worked well enough, but the tulip driver (needed for Hyper-V’s emulated legacy NIC) wasn’t compiled into that (custom) kernel, and building the module failed. So I read up a bit and realized that the newest kernel (2.6.32.8) ships with experimental Hyper-V VMbus drivers, which allow synthetic NICs to be used.
I tried to compile the kernel after chrooting into the old installation, but it failed because gcc was too old. Not to worry: I compiled it in the rescue system, but then couldn’t install the Debian package that make-kpkg created into the old installation. So I installed the kernel manually, which worked pretty well.
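Before spending time on a kernel build like this, it can help to check that the relevant drivers are actually enabled in the .config. The following is only a minimal sketch: the option names (CONFIG_STAGING, CONFIG_HYPERV, CONFIG_HYPERV_NET, CONFIG_TULIP) are my assumption of what the staging Hyper-V drivers and the tulip driver are called and should be verified against the Kconfig files of the kernel tree in question.

```python
import re

# ASSUMPTION: these option names are illustrative; verify them against the
# Kconfig files of the kernel tree you are actually building.
WANTED = ("CONFIG_STAGING", "CONFIG_HYPERV", "CONFIG_HYPERV_NET", "CONFIG_TULIP")


def parse_kernel_config(text):
    """Map option name -> value ('y', 'm', 'n', ...) from kernel .config text."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        enabled = re.match(r"^(CONFIG_\w+)=(.*)$", line)
        if enabled:
            options[enabled.group(1)] = enabled.group(2)
            continue
        disabled = re.match(r"^# (CONFIG_\w+) is not set$", line)
        if disabled:
            options[disabled.group(1)] = "n"
    return options


def missing_options(text, wanted=WANTED):
    """Return the wanted options that are neither built in (y) nor modular (m)."""
    options = parse_kernel_config(text)
    return [name for name in wanted if options.get(name, "n") not in ("y", "m")]
```

To use it, read the .config from the build directory and pass its text to `missing_options`; an empty list means all listed options are set to `y` or `m`.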
One reboot later, I was back in business with the extremely verbose Hyper-V drivers cluttering up dmesg, but the synthetic NICs showed up as seth0 – seth2. After quickly changing all the necessary configuration files, everything was working.
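Renaming the interfaces in the configuration files is mostly mechanical. As a rough sketch, assuming the old configuration refers to eth0, eth1, ... and that each ethN maps one-to-one onto sethN (which is an assumption, not something the drivers guarantee), the rewrite could look like this:

```python
import re

# ASSUMPTION: every ethN in the old configuration corresponds to sethN.
# The word boundary keeps already-renamed "seth0" entries untouched.
ETH_RE = re.compile(r"\beth(\d+)\b")


def rename_interfaces(config_text):
    """Replace ethN with sethN in the text of a configuration file."""
    return ETH_RE.sub(r"seth\1", config_text)
```

Run it over a copy of each file (for example /etc/network/interfaces or firewall scripts) and diff the result before putting it in place, since a blind search-and-replace can hit unrelated strings.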
After a bit more testing, I disconnected the physical machine from the network and plugged the VM into the production VLANs.
I tested everything thoroughly and didn’t find any issues, so I sent out a notification mail and continued on my merry way.
Half an hour later, I decided to do a quick systems check again – and realized that the external interface (seth2 in this case) wasn’t working anymore. tcpdump showed no packets being received, and other machines in the same VLAN didn’t see any answers to their ARP requests either. So I rebooted the VM, and everything was working again. There were no error messages of any kind, neither in dmesg nor in the system logs nor on the Hyper-V host.
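Since the interface failed silently, one way to notice it sooner is to watch the receive counters in /proc/net/dev. This is a minimal sketch, not what I actually ran: it assumes that a busy production VLAN always delivers at least some packets (ARP broadcasts, for instance) within the polling interval, so an unchanged RX packet counter is treated as a stall. The function names and the 60-second interval are made up for illustration.

```python
import time


def read_rx_packets(text):
    """Parse /proc/net/dev text into {interface: received packet count}."""
    counters = {}
    for line in text.splitlines()[2:]:  # the first two lines are headers
        if ":" not in line:
            continue
        name, rest = line.split(":", 1)  # older kernels omit the space after ':'
        fields = rest.split()
        counters[name.strip()] = int(fields[1])  # fields[0] is RX bytes
    return counters


def stalled_interfaces(before, after, watch):
    """Return watched interfaces whose RX packet counter did not change."""
    return [name for name in watch if after.get(name) == before.get(name)]


def watch_forever(watch, interval=60, path="/proc/net/dev"):
    """Poll the counters and print a warning for every stalled interface."""
    with open(path) as handle:
        previous = read_rx_packets(handle.read())
    while True:
        time.sleep(interval)
        with open(path) as handle:
            current = read_rx_packets(handle.read())
        for name in stalled_interfaces(previous, current, watch):
            print("no packets received on %s in the last %d seconds" % (name, interval))
        previous = current
```

Calling `watch_forever(["seth2"])` would then print a line whenever seth2 goes quiet; on a genuinely idle link this produces false alarms, so the interval needs to fit the traffic.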
Hoping this was just a fluke, I waited to see whether it would happen again – which it did, roughly 10 minutes later. So I decided to skip the synthetic devices and go with emulated NICs and the tulip driver.
Everything came back up and two of the interfaces worked, but I couldn’t ping any devices on the eth0 VLAN right from the start.
After a few more tries, I arrived at a configuration that has now been stable for 4 hours and 26 minutes, which sounds good so far. For this, I configured a single synthetic NIC as a replacement for the non-working eth0, plus three tulip NICs (of which the first is unused).
There are other things that also worry me:
Every reboot of the Linux machine created the following event log entry on the Hyper-V host:
'LINUX' was reset because an unrecoverable error occurred on a virtual processor that caused a triple fault. If the problem persists, contact Product Support. (Virtual machine ID [])
Loading the synthetic NIC drivers logs the following in the event log on the Hyper-V host:
Networking driver on 'LINUX' loaded but has a different version from the server. Server version 3.2 Client version 0.2 (Virtual machine ID []). The device will work, but this is an unsupported configuration. This means that technical support will not be provided until this problem is resolved. To fix this problem, upgrade the integration services. To upgrade, connect to the virtual machine and select Insert Integration Services Setup Disk from the Action menu.
Loading the synthetic NIC drivers also logs all this on the Linux side of things:
VMBUS_DRV: Vmbus initializing.... current log level 0x1f1f0006 (1f1f,6)
VMBUS: +++++++ Build Date=Feb 17 2010 12:37:00 +++++++
VMBUS: +++++++ Build Description=Version 2.0 +++++++
VMBUS: +++++++ Vmbus supported version = 13 +++++++
VMBUS: +++++++ Vmbus using SINT 2 +++++++
VMBUS: Windows hypervisor detected! Retrieving more info...
VMBUS: Vendor ID: Microsoft Hv
VMBUS: Interface ID: Hv#1
VMBUS: OS Build:7600-6.1-16-0.16485
VMBUS: Hypercall page VA=f80c9000, PA=0x36afe000
VMBUS_DRV: irq 0x5 vector 0x35
VMBUS: SynIC version: 1
VMBUS: Vmbus connected!!
VMBUS_DRV: generating uevent - VMBUS_DEVICE_CLASS_GUID={c5295816-f63a-4d5f-8d1a4daf999ca185}
VMBUS: Channel offer notification - child relid 1 monitor id 0 allocated 1, type {32412632-86cb-44a2-9b5c50d1417354f5} instance {00000000-0000-8899-0000000000000000}
hv_netvsc: module is from the staging directory, the quality is unknown, you have been warned.
NETVSC_DRV: Netvsc initializing....
VMBUS_DRV: child driver (f80dc570) registering - name netvsc
VMBUS: Channel offer notification - child relid 2 monitor id 255 allocated 0, type {cfa8b69e-5b4a-4cc0-b98b8ba1a1f3f95a} instance {58f75a6d-d949-4320-99e1a2a2576d581c}
VMBUS_DRV: generating uevent - VMBUS_DEVICE_CLASS_GUID={32412632-86cb-44a2-9b5c50d1417354f5}
VMBUS_DRV: child device (f73a8634) registered
VMBUS: Channel offer notification - child relid 9 monitor id 1 allocated 1, type {f8615163-df3e-46c5-913ff2d2f965ed0e} instance {9d44a66e-4b09-41d5-80d807ae24bf537d}
VMBUS_DRV: generating uevent - VMBUS_DEVICE_CLASS_GUID={cfa8b69e-5b4a-4cc0-b98b8ba1a1f3f95a}
VMBUS_DRV: child device (f73a5a34) registered
VMBUS: Channel offer notification - child relid 1 monitor id 0 allocated 1, type {32412632-86cb-44a2-9b5c50d1417354f5} instance {00000000-0000-8899-0000000000000000}
VMBUS_DRV: generating uevent - VMBUS_DEVICE_CLASS_GUID={f8615163-df3e-46c5-913ff2d2f965ed0e}
VMBUS_DRV: device object (f73a5ee4) set to driver object (f80dc5c0)
VMBUS: Channel offer notification - child relid 2 monitor id 255 allocated 0, type {cfa8b69e-5b4a-4cc0-b98b8ba1a1f3f95a} instance {58f75a6d-d949-4320-99e1a2a2576d581c}
VMBUS: Channel offer notification - child relid 9 monitor id 1 allocated 1, type {f8615163-df3e-46c5-913ff2d2f965ed0e} instance {9d44a66e-4b09-41d5-80d807ae24bf537d}
VMBUS: channel f73aac00 open success!!
NETVSC: *** NetVSC channel opened successfully! ***
NETVSC: Sending NvspMessageTypeInit...
NETVSC: NvspMessageTypeInit status(1) max mdl chain (34)
NETVSC: Sending NvspMessage1TypeSendNdisVersion...
NETVSC: Establishing receive buffer's GPADL...
NETVSC: Sending NvspMessage1TypeSendReceiveBuffer...
NETVSC: Receive sections info (count 1, offset 0, endoffset 1048000, suballoc size 1600, num suballocs 655)
NETVSC: Establishing send buffer's GPADL...
NETVSC: Sending NvspMessage1TypeSendSendBuffer...
NETVSC: *** NetVSC channel handshake result - 0 ***
NETVSC: Device 0xf6552e80 mac addr 00155d031a09
NETVSC: Device 0xf6552e80 link state up
VMBUS_DRV: child device (f73a5e34) registered
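With this much driver chatter, the two lines that matter (MAC address and link state of each synthetic NIC) are easy to miss. The sketch below pulls them out of dmesg text; the regular expressions are written against the exact message format shown above and are an assumption for any other driver version.

```python
import re

# Patterns match the NETVSC messages as they appear in the log above.
MAC_RE = re.compile(r"NETVSC: Device (0x[0-9a-f]+) mac addr ([0-9a-f]{12})")
LINK_RE = re.compile(r"NETVSC: Device (0x[0-9a-f]+) link state (\w+)")


def summarize_netvsc(log_text):
    """Return {device address: {'mac': ..., 'link': ...}} from dmesg text."""
    devices = {}
    for line in log_text.splitlines():
        mac_match = MAC_RE.search(line)
        if mac_match:
            raw = mac_match.group(2)
            # Insert colons: 00155d031a09 -> 00:15:5d:03:1a:09
            mac = ":".join(raw[i:i + 2] for i in range(0, 12, 2))
            devices.setdefault(mac_match.group(1), {})["mac"] = mac
            continue
        link_match = LINK_RE.search(line)
        if link_match:
            devices.setdefault(link_match.group(1), {})["link"] = link_match.group(2)
    return devices
```

Feeding it the output of dmesg gives one entry per synthetic NIC, which makes it easier to match the MAC addresses against the ones assigned on the Hyper-V host.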
So, it works, but not without trouble. I’ve still got the physical machine to fall back on, but I sure hope Microsoft will get this to work better.
These issues are the reason why I decided to deploy my private server using ESXi instead of Hyper-V – because I need both Linux and Windows guests.