    # xm list
    Name        ID   Mem VCPUs State    Time(s)
    Domain-0     0  1024     8 r-----   76770.8
    caliban     72   256     1 -b----    4768.3
Here we're going to demonstrate connectivity between the domain caliban (IP address 192.0.2.86) and the dom0 (at 192.0.2.67).
    # arping 192.0.2.67
    ARPING 192.0.2.67 from 192.168.42.86 eth0
    Unicast reply from 192.0.2.67 [00:12:3F:AC:3D:BD] 0.752ms
    Unicast reply from 192.0.2.67 [00:12:3F:AC:3D:BD] 0.671ms
    Unicast reply from 192.0.2.67 [00:12:3F:AC:3D:BD] 2.561ms
Note that the dom0 replies with its MAC address when queried via ARP.
    # tcpdump -i vif72.0
    tcpdump: WARNING: vif72.0: no IPv4 address assigned
    tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
    listening on vif1.0, link-type EN10MB (Ethernet), capture size 96 bytes
    18:59:33.704649 arp who-has caliban (00:12:3f:ac:3d:bd (oui Unknown)) tell 192.168.42.86
    18:59:33.707406 arp reply caliban is-at 00:12:3f:ac:3d:bd (oui Unknown)
    18:59:34.714986 arp who-has caliban (00:12:3f:ac:3d:bd (oui Unknown)) tell 192.168.42.86
The ARP queries show up correctly in the dom0.
Now, most of the time, you will see appropriate output in tcpdump as shown. This tells you that Xen is moving packets from the domU to the dom0. Do you see a response to the ARP who-has? (It should be ARP is-at.) If not, it's possible your bridge in the dom0 isn't set up correctly. One easy way to check the bridge is to run brctl show:
    # brctl show
    bridge name    bridge id            STP enabled    interfaces
    eth0           8000.00304867164c    no             caliban
                                                       prospero
                                                       ariel
Note: In Xen.org versions before Xen 3.2, the bridge name is, by default, xenbr0 for network-bridge. Xen 3.2 and later, however, name the bridge eth0 (0, in this case, is the number of the related network interface). RHEL/CentOS, by default, creates another bridge, virbr0, which is part of the libvirt stuff. In practical terms, it functions like network-nat, with a DHCP server handing out private addresses on the dom0.
Now, for troubleshooting purposes, a bridge is like a switch. Make sure the bridge (switch) your domU interface is connected to is also connected to an interface that touches the network you want the domU on, usually a pethX device. (As explained in Chapter 5, network-bridge renames ethX to pethX and creates a fake ethX device from vif0.x when it starts up.)
Check the easy stuff. Can anything else on the bridge see traffic from the outside world? Run tcpdump -n -i peth0. Are the packets flowing properly?
Check your routes. Don't forget higher-level stuff, like DNS servers.
The DomU Interface Number Increments with Every Reboot
When Xen creates a domain, it looks at the vif=[] statement. Each string within the [ ] characters (it's a Python array) is another network device. If I just say vif=["",""] it creates two network devices for me, with random MAC addresses. In the domU, they are (ideally) named eth0 and eth1. In the dom0, they are named vifX.0 and vifX.1, where X is the domain number.
Most modern Linux distros, by default, lock ethX to a particular MAC address on the first boot. In RHEL/CentOS, the setting is HWADDR= in /etc/sysconfig/network-scripts/ifcfg-ethX. Most other distros use udev to handle persistent MAC addresses, as described in Chapter 5. We circumvent the problem by specifying the MAC address on the vif= line in the xm config file:
    vif=["mac=00:16:3E:AA:AA:AB","mac=00:16:3E:AA:AA:AC"]
Here we're using the XenSource MAC prefix, 00:16:3E. If you start your MAC with that prefix, you know it won't conflict with any assigned hardware MAC addresses.
If you don't specify the MAC address, it'll be randomly generated every time the domU boots, which causes some inconvenience if your domU OS has locked down ethX to a particular MAC. For more on the possible effects and why it's a good idea to specify a MAC address, see Chapter 5.
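If you manage many domains, generating the fixed addresses by script can save some typing. The following is only a sketch: the helper names random_xen_mac and vif_line are ours, not part of Xen, and we assume that any three trailing octets are acceptable after the 00:16:3E prefix.

```python
import random

XEN_OUI = "00:16:3E"  # the XenSource prefix mentioned in the text


def random_xen_mac(rng=random):
    """Return a MAC with the XenSource prefix and three random octets.

    Assumption: any value 0x00-0xFF is acceptable for each trailing octet.
    """
    tail = [rng.randint(0x00, 0xFF) for _ in range(3)]
    return XEN_OUI + ":" + ":".join("%02X" % octet for octet in tail)


def vif_line(macs):
    """Format a vif= line for an xm config file from a list of MAC strings."""
    entries = ", ".join('"mac=%s"' % mac for mac in macs)
    return "vif = [" + entries + "]"


# Generate once, then paste the result into the domain config so the
# addresses stay the same across reboots.
print(vif_line([random_xen_mac() for _ in range(2)]))
```

The point is to run this once per domain and keep the output in the config file; regenerating the addresses at every boot would bring the original problem back.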
iptables
The iptables rules can also be a source of trouble with Xen. As with any iptables setup, it's easy to mess up in subtle ways and break everything. The best way we've found to make sure that iptables rules are working is to send packets through and watch what happens to them. Run iptables -L -v to see counters for how many packets have hit each rule or have been affected by the chain policy.
Note: The interface counters for vifs that are examined from the dom0 end will be inverted; outgoing traffic will report as incoming, and vice versa. See Chapter 5 for more information about why that happens.
You may also have trouble getting antispoof to work. If you enable antispoof but find you can still spoof arbitrary IP addresses in the domU, add the following to your network startup:
    echo 1 > /proc/sys/net/bridge/bridge-nf-call-iptables
This will cause packets sent through the bridges to traverse the forward chain, where Xen puts the antispoof rules. We added the command to the end of /etc/xen/scripts/network-bridge.
Another problem can occur if you're using vifnames, as we suggest in Chapter 5. Make sure the names are short: eight characters or fewer. Longer names can get truncated, and different parts of the system truncate at different lengths (at least in CentOS 5.0). In our particular case, we saw problems where the actual vifnames were truncated at one length, and our firewall rules (for antispoof) were truncated at another length, blocking all packets from the domain in question. It is better to avoid the problem and keep the vifnames short.
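A quick check before deploying a batch of configs can catch overlong names. This sketch assumes you already have the vifnames in a Python list; the function name overlong_vifnames is our own invention, and the limit of eight is simply the figure given above.

```python
MAX_VIFNAME_LEN = 8  # the limit recommended in the text


def overlong_vifnames(names, limit=MAX_VIFNAME_LEN):
    """Return the names that exceed the limit and so risk truncation."""
    return [name for name in names if len(name) > limit]


# Hypothetical names for illustration only.
for name in overlong_vifnames(["caliban", "prospero", "sebastian0"]):
    print("vifname too long: %s (%d characters)" % (name, len(name)))
```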
Memory Issues
Xen (or rather, the Linux driver domain) can act rather strangely when memory is running low. Because Xen and the dom0 require a certain amount of contiguous, unswappable memory, it's surprisingly easy (in our experience) to find the oom-killer snacking on processes like candy. This even happens when there's plenty of swap available.
The best solution we've found (and we freely admit that it's not perfect) is to give dom0 more memory. We also prefer to fix its memory allocation at something like 512MB so that it doesn't have to cope with Xen constantly adjusting its memory size.
The basic way of tuning dom0's memory allocation is by adjusting the dom0_mem kernel parameter, which sets an upper limit, and the dom0-min-mem parameter in /etc/xen/xend-config.sxp, which sets a lower limit. Again, we usually set both of these to the same value.
To set the maximum amount of memory available to the dom0, edit menu.lst and put the option after the kernel line, like this:
    kernel /xen.gz dom0_mem=512M noreboot
In the absence of units, Xen will assume that the value is in KB.
Next, edit /etc/xen/xend-config.sxp and add a line that says:[85]
    (dom0-min-mem 512)
We do this because we've seen the dom0 have problems with ballooning. Ballooning usually works, but, like taking backups from a nonquiescent filesystem, "usually works" is not good enough for something as important as the dom0.
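The unit rule for dom0_mem is easy to trip over: leave off the M and you have asked for a very small dom0. As a rough illustration, here is a sketch that converts a dom0_mem value to KB. The function name is ours, and we assume only the K, M, and G suffixes; a bare number is treated as KB, as described above.

```python
# Multipliers expressed in KB; the set of suffixes is an assumption.
UNIT_KB = {"K": 1, "M": 1024, "G": 1024 * 1024}


def dom0_mem_kb(value):
    """Convert a dom0_mem value such as '512M' to a number of KB."""
    value = value.strip().upper()
    if value and value[-1] in UNIT_KB:
        return int(value[:-1]) * UNIT_KB[value[-1]]
    return int(value)  # no suffix: the value is already in KB


print(dom0_mem_kb("512M"))  # what we meant
print(dom0_mem_kb("512"))   # what we get if we forget the suffix
```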
[85] Recent versions of Xen also support the option (enable-dom0-ballooning no).
Other Messages
xenconsole: Could not read tty from store: No such file or directory
This message usually shows up in response to an attempt to connect to a domain's virtual console (especially when Xen's kernel doesn't match its userland; for example, if we've upgraded Xen's supporting tools without changing the hypervisor).
If this is a paravirtualized domain, first try killing and restarting the xenconsoled process. Make sure it dies. We have seen cases where xenconsoled hangs and must be killed with a -9.
    # pkill xenconsoled && /usr/sbin/xenconsoled
Then reconnect with xm console.
If the problem persists, you're most likely trying to access a domain that doesn't have the necessary Xen frontend console device configured in. There are several possibilities:
If this is a custom kernel, you may have simply forgotten to include the console driver. Check the configuration of the domain's kernel and the initrd for the xvc driver.
If you are accessing an HVM domain running a default (nonenlightened) kernel that doesn't include the console driver, try using the framebuffer or booting a different kernel. You might also be able to set serial=pty in the domain config file and set the domU OS to use com1 as the console. See Chapter 12 for details.
VmError: (22, "Invalid argument")
This error can mean a number of things. Often the problem is a version mismatch between the tools and the running Xen hypervisor. Although the binaries installed in /usr/sbin may be correct, the underlying Python modules may be wrong. Check that they're correct using whatever evidence is available: dates, comments in the files themselves, output of xm info, and so on.
The error can also indicate a PAE mismatch. In this case xend-debug.log will give a succinct description of the problem:
    # tail /var/log/xen/xend-debug.log
    ERROR: Non PAE-kernel on PAE host.
    ERROR: Error constructing guest OS
Incidentally, your dom0 (which is, after all, just a special Xen guest domain) can also suffer from this problem. If it happens, the hypervisor will report a PAE mismatch in a large boxed-off error message at boot time and immediately reboot.
"no version for struct_module found: kernel tainted"
We got this error while trying to install the binary Xen distribution on a Slackware machine. The binary distro comes with a very minimal kernel, so it needs an initrd with appropriate modules. For some reason, the default script loaded modules in the wrong order, causing some loads to fail with the preceding message.
We fixed the problem by changing the load order in the initrd; specific directions would depend on your distro.
A Constant Stream of 4GiB seg fixup Messages
Sometimes, on booting a newly installed i386 domain, you'll be greeted with screens full of messages like this:
    4gb seg fixup, process init (pid 1), cs:ip 73:b7ec2fc5
These are related to the /lib/tls problem: Xen is complaining because it's having to emulate a 4GiB segment for the benefit of some process that's using negative offsets to access the stack. You may also see a giant message at boot, reminding you to address this issue.
To solve this problem, you want to use a glibc that does not do this. You can compile glibc with the -mno-tls-direct-seg-refs option or install the appropriate libc6-xen package for your distribution (both Red Hat-like and Debian-like distros have created packages to address this problem).
With Red Hat (and its derived distros), you can also run these commands:
    # echo "hwcap 0 nosegneg" > /etc/ld.so.conf.d/libc6-xen.conf
    # ldconfig
This will instruct the dynamic loader to avoid that particular optimization.
For Debian-based distros (using the 2.6.18 kernel), you can simply run:
    # apt-get install libc6-xen
If all else fails (or if you are just too lazy to find a version of gcc with no-tls-direct-seg-refs), you can do as the error message advises and move the TLS library out of the way:
    # mv /lib/tls /lib/tls.disabled
In our experience, there isn't any problem with moving the library. Everything will continue to function as expected.
The Importance of Disk Drivers (initrd Problems)
Often when using a distro kernel, a Xen domU will boot but be unable to locate its root device. For example:
    VFS: Cannot open root device "sda1" or unknown-block(0,0)
    Please append a correct "root=" boot option
    Kernel panic - not syncing: VFS: Unable to mount root fs on unknown-block(0,0)
The underlying problem here (at least in this case) is that the domU kernel doesn't have the necessary drivers compiled in, and the ramdisk was not specified. A look at the boot output confirms this, with the messages:
    XENBUS: Device with no driver: device/vbd/769
    XENBUS: Device with no driver: device/vbd/770
    XENBUS: Device with no driver: device/vif/0
Nearly all distro kernels come with a minimal kernel and require an initrd with the disk driver to finish booting. These messages may simply come from the kernel before the initrd has loaded, or they can indicate a serious problem if the initrd doesn't contain the necessary drivers.
If the kernel managed to load its initrd correctly and failed to switch to its real root, you'll find yourself stuck in the initrd with a very limited selection of files. In this case, make sure that your devices exist (/dev/sda1 in this example) and that you've got the Xen disk frontend kernel module.
We also commonly see this within PyGRUB domUs after a kernel upgrade (and new initrd) if the modules config (/etc/modules on Debian, /etc/modprobe.conf on Red Hat) didn't specify xenblk. For RHEL/CentOS domUs, you can solve this problem by running mkinitrd with the --preload xenblk switch.
If you use an external kernel and want to use a distro kernel, you must specify a ramdisk= line in the domain config file, and specify a ramdisk that includes the xenblk (and xennet, if you want network before boot) drivers.
Another solution to this problem would be to compile Xen from source and build a sufficiently generic domU kernel, with the xenblk and xennet drivers already compiled in. Even if you continue to boot the dom0 from the distro kernel (probably a good idea), this will sidestep the distro-specific issues found with both Red Hat and Debian kernels.
This may cause problems with some domU distros because the expected initrd won't be there. Sometimes it can be difficult to build an initrd against a kernel with disk drivers built in. However, the generic kernel will usually at least boot.
We often find it useful to keep these generic kernels as a secondary rescue boot option within the domU PyGRUB config because they work no matter how badly the initrd is messed up.
XenStore
Sometimes the XenStore gets corrupted, or xenstored dies, or for various other reasons the XenStore ceases to store and report information. For example, this may happen if the block device holding the XenStore database becomes full.
The most obvious symptom is that xm list will report domain names incorrectly, for example:
    # xm list
    Name         ID Mem(MiB) VCPUs State    Time(s)
    Domain-0      0     2554     2 r-----   16511.2
    Domain-10    10      127     1 -b----    1671.5
    Domain-11    11      255     1 -b----     442.0
    Domain-14    14       63     1 -b----    1758.2
    Domain-15    15       62     1 -b----    7507.7
    Domain-16    16      127     1 -b----   11194.9
    Domain-6      6       94     1 -b----    5454.2
    Domain-7      7       62     1 -b----     270.8
    Domain-9      9      127     1 -b----    1715.7
Obviously, this is problematic. For one thing, it means that all commands that can take a name or ID, such as xm console, will no longer recognize names.
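If you monitor your hosts with scripts, this symptom is easy to test for. The sketch below takes the text of an xm list run and reports any domU whose name is just Domain- followed by its own ID. The function name and the whitespace-separated column layout are our assumptions; check them against the output of your version of xm.

```python
def placeholder_domains(xm_list_output):
    """Return (name, id) pairs for domUs named 'Domain-<id>'.

    Domain-0 is skipped because that is its legitimate name.
    """
    suspects = []
    for line in xm_list_output.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 2 or not fields[1].isdigit():
            continue
        name, dom_id = fields[0], int(fields[1])
        if dom_id != 0 and name == "Domain-%d" % dom_id:
            suspects.append((name, dom_id))
    return suspects


# Sample text for illustration, mixing healthy and suspect rows.
sample = """Name       ID Mem(MiB) VCPUs State   Time(s)
Domain-0    0     2554     2 r-----  16511.2
Domain-10  10      127     1 -b----   1671.5
caliban    72      256     1 -b----   4768.3
"""
print(placeholder_domains(sample))
```

A nonempty result does not prove that the XenStore is broken (someone could, in principle, name a domain that way), but it is a good prompt to go and look.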
Unfortunately, xenstored cannot be restarted, so you'll have to reboot. If you're running a version of Xen prior to 3.1 (including the RHEL 5.x version), you'll have to remove /var/lib/xenstored/tdb first, then reboot.
Xen's Logs
These error messages make a good start for Xen troubleshooting, but sometimes they're not helpful enough to solve the problem. In these cases, we need to dig deeper.
dmesg and xm dmesg
Although the output of xm dmesg isn't a log in the usual sense of a log file, it's an important source of diagnostic output. If you've got a problem whose source isn't obvious from the error message, begin by looking at the Xen kernel message buffer. As you probably know, the Linux dmesg command prints out the Linux kernel's message buffer, which ordinarily contains all kernel messages since the system's last boot (or, if the system's been up for a while, it displays a succession of boring status messages).
Because Xen could be said to act as a kernel in its own right, it includes an equivalent tool, xm dmesg, to print out messages from the hypervisor boot (the lines that begin with (XEN) in the startup messages). For example:
    # xm dmesg | tail -3
    (XEN) (file=platform_hypercall.c, line=129) Domain 0 says that IO-APIC REGSEL is good
    (XEN) microcode: error! Bad data in microcode data file
    (XEN) microcode: Error in the microcode data
In this case, the errors are harmless. The processor simply runs on its factory-installed microcode.
Note: Like the kernel, Xen retains only a fixed-size message buffer. Older messages go off into oblivion.
Logs and What Xen Writes to Them
If xm dmesg isn't enlightening, Xen's next line of communication is its extensive logging. Let's look at the various logs that Xen uses and what we can do with them.
We can summarize Xen's logs as follows, in rough order of importance:
    /var/log/xen/xend.log
    /var/log/xen/xend-debug.log
    /var/log/xen/xen-hotplug.log
    /var/log/syslog
    /var/log/debug
Most of your Xen troubleshooting will involve the first two logs. xend.log is the main xend log, as you might suppose. It records domain startups, shutdowns, device creation, miscellaneous debugging output, and occasionally includes giant incomprehensible Python dumps. It's the first thing to check.
xend-debug.log has information relating to more experimental features of Xen, such as the framebuffer. It'll also have verbose tracebacks when Xen runs into trouble.
Because xend uses the syslog facility, messages from Xen also show up in the system-wide /var/log/syslog and /var/log/debug.
Note: We hasten to add that syslog is almost humorously configurable. Even the term system-wide only applies to the default configuration; syslog can consolidate logs across multiple hosts, categorize messages into various channels, write to arbitrary files, and so on, but we're going to assume that, if you've configured syslog, you can translate what we say about Xen's use of it to apply to your configuration.
Finally, if you're using HVM, qemu-dm will write its own logs. By and large, you can safely ignore these. In our experience, problems with HVM domains haven't been the fault of QEMU's device emulation.
If the kernel messages prove to be unenlightening, it's time to take a look at the log files. First, let's configure Xen to ensure that they're as round, firm, and fully packed as possible.
THE IMPORTANCE OF A DEBUG BUILD
For troubleshooting (and, in fact, general use) we recommend building Xen with all of its debugging options turned on. This makes the error messages more informative and plentiful, making it easier to figure out where problems are coming from and, with any luck, eliminate them.
Although it might seem that copious debugging output would cause a performance hit, in our experience it's negligible when running Xen normally. A debug build gives you the option of running Xen with excessive debugging output, but it performs about as well as a normal build when you're not using that mode.
If you find that the error messages are unhelpful, it might be a good idea to make sure that you have all the debugging knobs set to full. To enable full output for the hypervisor, add the options loglvl=all guest_loglvl=all to your hypervisor command line (usually in /boot/grub/menu.lst).
See Chapter 14 for more information on building Xen, including how to set the debugging options.
Applying the Debugger
If even the maximum-verbosity logging isn't enough, it's time to attack the problem at the Python level, with the debugger.
One investigation to try is to run the xend server in the foreground and watch its debug output. This will let you see somewhat more information than simply following the logs.
With current versions of Xen, the debug functionality is included in the releases.[86] Enable the debug output with the following:
    # export XEND_DEBUG=1
    # export XEND_DAEMONIZE=0
    # xend start
This will start xend in the foreground and tell it to print debug messages as it goes along.
You can also get copious debugging information for the XenStore by setting XENSTORED_TRACE=1 somewhere that xend's environment will pick it up, perhaps at the top of /etc/init.d/xend or in root's .bashrc.
Xen's Backend Architecture: Making Sense of the Debug Information
Of course, all this debugging output is more useful with some idea of how Xen is structured.
If you take a look at the actual xend executable, the first thing you'll notice is that it's really very short. There's not much to it; all of the heavy lifting is done in external Python libraries, which live in /xen/xend/server in one of the Python library directories. (In the case of the system I'm sitting in front of, this is /usr/lib/python2.4/site-packages/xen/xend/server.) Likewise, xm is also a short Python script. The take-home message here is that most of the error messages that you'll see emanate from somewhere in this directory tree, and they'll helpfully print the responsible file and line number so you can examine the Python script more closely. For example, look at this line from /var/log/xen/xend.log:
    [2007-08-07 20:14:26 6008] WARNING (XendAPI:672) API call: VM.get_auto_power_on not found
At the beginning is the date, time, and xend's Process ID (PID). Then comes the severity of the error (in this case, WARNING, which is merely irritating). After that is the file and line number where the error occurred, followed by the contents of the error message.
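When a log runs to thousands of lines, it can help to pull these fields apart mechanically. Here is a minimal sketch of a parser for the layout just described. The regular expression and the function name are ours, and we assume single spaces between the fields; adjust the pattern if your xend.log differs.

```python
import re

# Date, time, and PID in brackets; severity; (module:line); then the message.
LOG_LINE = re.compile(
    r"^\[(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) (?P<pid>\d+)\] "
    r"(?P<level>[A-Z]+) \((?P<module>\w+):(?P<line>\d+)\) (?P<message>.*)$"
)


def parse_xend_log_line(text):
    """Split one xend.log line into a dict of fields, or return None."""
    match = LOG_LINE.match(text)
    if match is None:
        return None  # tracebacks and continuation lines will not match
    fields = match.groupdict()
    fields["pid"] = int(fields["pid"])
    fields["line"] = int(fields["line"])
    return fields


entry = parse_xend_log_line(
    "[2007-08-07 20:14:26 6008] WARNING (XendAPI:672) API call: "
    "VM.get_auto_power_on not found"
)
print(entry["level"], entry["module"], entry["line"])
```

With the module and line number in hand, you know which Python file to open, which is where the next section picks up.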
XEN'S HIERARCHY OF INFORMATIVE MESSAGES
WARNING is only one point along the continuum of messages. At the lowest extreme of severity, we have DEBUG, which the developers use for whatever output strikes their fancy. It's often useful, but it generates a lot of data to wade through. Slightly more significantly, we have INFO. Messages at this level are supposed to be interesting or useful to the administrator but not indicative of a problem.
Then comes WARNING, which indicates a problem, but not a critical one. For example, the previous message tells us that we'd have trouble if we're relying on the VM.get_auto_power_on function but that nothing bad will happen if we don't try to use it.
Finally, Xen uses ERROR for genuine, beyond-denial errors, the sort of thing that can't be put off or ignored. Generally this means that a domain is exiting abnormally.
Armed with this information, you can do several things. To continue our earlier example, we'll open /usr/lib/python2.5/site-packages/xen/xend/XendAPI.py and add a line near the top of the file to import the debugger module, pdb:
    import pdb
Having done that, you can set a breakpoint. Just add a line near line 672:
    pdb.set_trace()
Then try rerunning the server (or redoing whatever other behavior you're concerned with) and note that xend starts the debugger when it hits your new breakpoint.
At this point you can do everything that you might expect in a debugger: change the values of variables, step through a function, step into subroutines, and so forth. In this case, we might backtrace, figure out why it's trying to call VM.get_auto_power_on, and maybe wrap it in an error-handling block.
Domain Stays in Blocked State
This heading is a bit of a misnomer. The reality is that the "blocked" state reported by tools like xm list simply means that the domain is idle. The true problem is that the domain seems unresponsive.
Usually we find that this problem is related to the console; for example:
    # xm create -c sebastian.cfg
    Using config file "/etc/xen/sebastian.cfg".
    Going to boot Fedora Core (2.6.18-1.2798.fc6xen)
      kernel: /vmlinuz-2.6.18-1.2798.fc6xen
      initrd: /initrd-2.6.18-1.2798.fc6xen.img
    Started domain sebastian
    rtc: IRQ 8 is not free.
    i8042.c: No controller found.
(and then an indefinite hang). Upon breaking out and looking at the output of xm list, we note that the domain stays in a blocked state and consumes very little CPU time.
    # xm list
    Name         ID Mem(MiB) VCPUs State    Time(s)
    Domain-0      0     3476     2 r-----     407.1
    sebastian    13      499     1 -b----      19.9
A quick look at /var/log/xen/xend-debug.log suggested an answer:
    10/09/2007 20:11:48 Autoprobing TCP port
    10/09/2007 20:11:48 Autoprobing selected port 5900
Port 5900 is VNC. Aha! The problem was that Xen wasn't using the virtual console device that xm console connects to. In this case, we traced it to user error. We specified the framebuffer and forgot about it. The kernel, as instructed, used the framebuffer as console rather than the emulated serial console that we were expecting. When we started a VNC client and connected to port 5900, it gave us the expected graphical console.
Note: If we had put a getty on xvc0, even though we wouldn't have seen boot output, we'd at least get a login prompt when the machine booted.
Debugging Hotplug
Xen makes extensive use of udev to create and destroy virtual devices, both in the dom0 and the domU. Most of its interaction with Linux's hotplug subsystem gets logged in /var/log/xen/xen-hotplug.log. (We're going to treat hotplug as synonymous with udev because we can't think of any system that still uses the pre-udev hotplug implementation.)
First, we examine the effects of the script. In this case, we use udevmonitor to see udev events. It should show an add event for each vif and vbd as well as an online event for the vif. These go through the rules in /etc/udev/rules.d/xen-backend.rules, which executes appropriate scripts in /etc/xen/scripts.
At this point you can add some extra logging. At the top of the script for the device you're interested in (e.g., blktap), put:
    set -x
    exec 2>>/var/log/xen-hotplug.log
This will cause the shell to expand the commands in the script and write them to xen-hotplug.log, enabling you (hopefully) to trace down the source of the problem and eliminate it.
Hotplug can also act as a bit of a catchall for any virtual device problem. Some hotplug-related errors take the form of the dreaded "Hotplug scripts not working" message, like the following:
    Error: Device 0 (vkbd) could not be connected. Hotplug scripts not working.
This seems to be associated with messages like the following:
    DEBUG (DevController:148) Waiting for devices irq.
    DEBUG (DevController:148) Waiting for devices vkbd.
    DEBUG (DevController:153) Waiting for 0.
    DEBUG (DevController:539) hotplugStatusCallback /local/domain/0/backend/vkbd/4/0/hotplug-status
In this case, however, these messages turned out to be red herrings. The answer came out of xend-debug.log, which said:
    /usr/lib/xen/bin/xen-vncfb: error while loading shared libraries: libvncserver.so.0: cannot open shared object file: No such file or directory
As it developed, libvncserver was installed in /usr/local, which the runtime linker had been ignoring. After adding /usr/local/lib to /etc/ld.so.conf, xen-vncfb started up happily.
strace
One important generic troubleshooting technique is to use strace to look at what the Xen control tools are really doing. For example, if Xen is failing to find an external binary (like xen-vncfb), strace can reveal that problem with a command like the following:
    # strace -e trace=open -f xm create prospero 2>&1 | grep ENOENT | less
Unfortunately, it'll also give you a lot of other, entirely harmless output while Python proceeds to pull in the entirety of its runtime environment based on crude guesses about filenames.
Another example of strace's usefulness comes from when we were setting up PyGRUB:
    # strace xm create -c prospero
    (snipped)
    mknod("/var/lib/xen/xenbl.4961", S_IFIFO|0600) = -1 ENOENT (No such file or directory)
As it turned out, we didn't have a directory required by PyGRUB's backend. Thus:
    # mkdir -p /var/lib/xen/
and everything works fine.
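If you save the strace output to a file, a few lines of Python can reduce it to just the paths that were not found, which is easier to skim than grep output. This is a sketch only: the function name is ours, the first two sample lines are made up for illustration, and we assume the path is the first quoted argument of each call.

```python
def missing_paths(strace_output):
    """Return the first quoted argument of every call that hit ENOENT."""
    paths = []
    for line in strace_output.splitlines():
        if "ENOENT" not in line:
            continue
        start = line.find('("')           # opening of the first argument
        end = line.find('"', start + 2)   # its closing quote
        if start != -1 and end != -1:
            paths.append(line[start + 2:end])
    return paths


# The mknod line is from the example above; the open lines are hypothetical.
sample = '''open("/usr/lib/libvncserver.so.0", O_RDONLY) = -1 ENOENT (No such file or directory)
open("/etc/ld.so.cache", O_RDONLY) = 3
mknod("/var/lib/xen/xenbl.4961", S_IFIFO|0600) = -1 ENOENT (No such file or directory)
'''
for path in missing_paths(sample):
    print(path)
```

Most of what this turns up will still be Python groping for modules, so expect to filter the list further by eye.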