In a default RHEL install, the kdump service is configured to reserve an area of memory for capturing a kernel dump in the event of a kernel panic. The kernel line of the GRUB boot entry contains a parameter such as "crashkernel=128M@16M", which tells the kernel to reserve 128 MB of memory, starting at the 16 MB physical address, for this special capture kernel.
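On a running system you can confirm the reservation by looking for the parameter on the kernel command line in /proc/cmdline. The sketch below is a hypothetical helper (the name "crashkernel_param" is made up for illustration, not part of kdump) that pulls the crashkernel= value out of a command line and splits it into its size and offset parts. It assumes the simple "SIZE@OFFSET" or "SIZE" forms.

```bash
# crashkernel_param: hypothetical helper, not part of kdump.
# Prints the size and offset of the crashkernel= reservation found on a
# kernel command line. With no argument it reads the running kernel's
# command line from /proc/cmdline.
crashkernel_param() {
    local cmdline value
    cmdline=${1:-$(cat /proc/cmdline)}
    # Put one parameter per line and keep the first crashkernel= value.
    value=$(printf '%s\n' "$cmdline" | tr -s ' \t' '\n\n' \
        | sed -n 's/^crashkernel=//p' | head -n 1)
    if [ -z "$value" ]; then
        echo "no crashkernel= parameter found" >&2
        return 1
    fi
    case $value in
        *@*) echo "size=${value%%@*} offset=${value#*@}" ;;  # SIZE@OFFSET form
        *)   echo "size=$value offset=unspecified" ;;        # SIZE only
    esac
}

# Example: prints "size=128M offset=16M"
crashkernel_param "ro root=/dev/VolGroup00/LogVol00 rhgb quiet crashkernel=128M@16M"
```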
A utility called "crash" can analyze the contents of these dumps, which are saved as /var/crash/YYYY-MM-DD-HH:MM/vmcore.
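When several dumps have accumulated, the newest one is usually the one you want. Because the RHEL 5 directory names (YYYY-MM-DD-HH:MM) are fixed-width dates, they sort chronologically as plain strings. The hypothetical helper below relies on that; treat it as a sketch that assumes this naming scheme.

```bash
# latest_vmcore: hypothetical helper. Prints the newest vmcore under a
# crash directory (default /var/crash), assuming RHEL 5-style
# YYYY-MM-DD-HH:MM subdirectory names, which sort chronologically.
latest_vmcore() {
    local crashdir=${1:-/var/crash} latest="" f
    for f in "$crashdir"/*/vmcore; do
        [ -e "$f" ] || continue   # the glob stays literal when nothing matches
        latest=$f                 # globs expand in sorted order, so keep the last
    done
    if [ -z "$latest" ]; then
        echo "no vmcore found under $crashdir" >&2
        return 1
    fi
    echo "$latest"
}
```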
First, check whether the utility is installed (if it isn't, you can install it with "yum install crash"):
[root@saturn ~]# which crash
/usr/bin/crash
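The same check can be scripted so that crash is installed only when it is missing. This is a minimal sketch with a hypothetical function name; the yum step needs root and a configured RHEL repository.

```bash
# ensure_crash: hypothetical helper. Reports where crash is installed,
# or installs it with yum if it is not on the PATH (requires root).
ensure_crash() {
    if command -v crash >/dev/null 2>&1; then
        echo "crash is installed at $(command -v crash)"
    else
        echo "crash not found; installing it with yum" >&2
        yum -y install crash
    fi
}
```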
The crash utility also needs the kernel debuginfo packages that match the kernel that was running when the crash occurred. These aren't in the default yum repositories, but for RHEL 5 they can be downloaded from: ftp://ftp.redhat.com/pub/redhat/linux/enterprise/5Server/en/os/x86_64/Debuginfo/
[root@saturn ~]# uname -a
Linux saturn.coldcache.com 2.6.18-194.26.1.el5 #1 SMP Fri Oct 29 14:21:16 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux
[root@saturn ~]# rpm -ivh ftp://ftp.redhat.com/pub/redhat/linux/enterprise/5Server/en/os/x86_64/Debuginfo/kernel-debuginfo-common-2.6.18-194.26.1.el5.x86_64.rpm
[root@saturn ~]# rpm -ivh ftp://ftp.redhat.com/pub/redhat/linux/enterprise/5Server/en/os/x86_64/Debuginfo/kernel-debuginfo-2.6.18-194.26.1.el5.x86_64.rpm
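Both package names follow the pattern kernel-debuginfo-common-RELEASE.ARCH.rpm and kernel-debuginfo-RELEASE.ARCH.rpm, where RELEASE is the kernel release ("uname -r" for the running kernel, or the RELEASE line in the crash header). The hypothetical helper below builds both URLs for a given release. It is a sketch that hardcodes the RHEL 5 Server x86_64 directory used above and assumes a standard kernel; other kernel variants (Xen, for example) may use differently named debuginfo packages.

```bash
# debuginfo_urls: hypothetical helper. Prints the two debuginfo package
# URLs for a kernel release (default: the running kernel). The base path is
# the RHEL 5 Server x86_64 directory used above.
debuginfo_urls() {
    local release=${1:-$(uname -r)}
    local base=ftp://ftp.redhat.com/pub/redhat/linux/enterprise/5Server/en/os/x86_64/Debuginfo
    printf '%s\n' \
        "$base/kernel-debuginfo-common-$release.x86_64.rpm" \
        "$base/kernel-debuginfo-$release.x86_64.rpm"
}

# Install both packages in one rpm transaction, for the kernel that crashed:
#   rpm -ivh $(debuginfo_urls 2.6.18-194.26.1.el5)
```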
Once those are installed, we're ready to run crash, pointing it at the debug vmlinux and the vmcore file.
[root@saturn ~]# crash /usr/lib/debug/lib/modules/2.6.18-194.26.1.el5/vmlinux /var/crash/2011-02-12-07\:04/vmcore
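Once kernel-debuginfo is installed, the debug vmlinux lives under /usr/lib/debug/lib/modules/RELEASE/vmlinux. The hypothetical wrapper below builds that path from a kernel release string and checks that both input files are readable before starting crash, so a missing debuginfo package produces a clear message. Quoting the vmcore path is enough here; the backslash before the colon in the command above is harmless but not required.

```bash
# vmlinux_path / run_crash: hypothetical helpers, not part of the crash package.
# vmlinux_path prints where kernel-debuginfo installs the uncompressed
# vmlinux for a given kernel release.
vmlinux_path() {
    echo "/usr/lib/debug/lib/modules/$1/vmlinux"
}

# run_crash RELEASE VMCORE: start crash only if both input files are readable.
run_crash() {
    local release=$1 vmcore=$2 vmlinux
    vmlinux=$(vmlinux_path "$release")
    if [ ! -r "$vmlinux" ]; then
        echo "missing $vmlinux (is kernel-debuginfo-$release installed?)" >&2
        return 1
    fi
    if [ ! -r "$vmcore" ]; then
        echo "cannot read $vmcore" >&2
        return 1
    fi
    crash "$vmlinux" "$vmcore"
}

# Usage, matching the session above:
#   run_crash 2.6.18-194.26.1.el5 "/var/crash/2011-02-12-07:04/vmcore"
```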
crash 4.1.2-8.el5
Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009 Red Hat, Inc.
Copyright (C) 2004, 2005, 2006 IBM Corporation
Copyright (C) 1999-2006 Hewlett-Packard Co
Copyright (C) 2005, 2006 Fujitsu Limited
Copyright (C) 2006, 2007 VA Linux Systems Japan K.K.
Copyright (C) 2005 NEC Corporation
Copyright (C) 1999, 2002, 2007 Silicon Graphics, Inc.
Copyright (C) 1999, 2000, 2001, 2002 Mission Critical Linux, Inc.
This program is free software, covered by the GNU General Public License,
and you are welcome to change it and/or distribute copies of it under
certain conditions. Enter "help copying" to see the conditions.
This program has absolutely no warranty. Enter "help warranty" for details.
GNU gdb 6.1
Copyright 2004 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB. Type "show warranty" for details.
This GDB was configured as "x86_64-unknown-linux-gnu"...
KERNEL: /usr/lib/debug/lib/modules/2.6.18-194.26.1.el5/vmlinux
DUMPFILE: /var/crash/2011-02-12-07:04/vmcore
CPUS: 8
DATE: Sat Feb 12 07:03:02 2011
UPTIME: 00:01:06
LOAD AVERAGE: 0.77, 0.25, 0.09
TASKS: 211
NODENAME: saturn.coldcache.com
RELEASE: 2.6.18-194.26.1.el5
VERSION: #1 SMP Fri Oct 29 14:21:16 EDT 2010
MACHINE: x86_64 (2813 Mhz)
MEMORY: 39.4 GB
PANIC: "Oops: 0002 [1] SMP " (check log for details)
PID: 5534
COMMAND: "insmod"
TASK: ffff81087f591080 [THREAD_INFO: ffff81087cc5e000]
CPU: 4
STATE: TASK_RUNNING (PANIC)
crash>
The summary above shows that the kernel panicked (an "Oops") on Feb 12 at 07:03, only about a minute after boot (UPTIME 00:01:06), while the "insmod" command (PID 5534) was running on CPU 4.
You're then dropped at the crash> prompt, where you can run other commands to get more information.
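When comparing several dumps, it can help to pull individual fields such as PANIC, PID or COMMAND out of this header. The function below is a hypothetical awk sketch that assumes the "LABEL: value" layout shown above, with optional leading spaces before the label.

```bash
# summary_field: hypothetical helper. Reads a crash session header on stdin
# and prints the value of the first line labelled "KEY:".
summary_field() {
    awk -v key="$1" '
        {
            line = $0
            sub(/^[ \t]+/, "", line)              # allow indented labels
            if (index(line, key ":") == 1) {
                sub(/^[^:]*:[ \t]*/, "", line)    # drop "KEY:" and following blanks
                print line
                exit
            }
        }
    '
}
```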
To view the kernel message buffer (what dmesg would have shown) at the time of the crash, type "log":
crash> log
(output truncated for brevity)
kobject_add failed for ipmi_bmc.17 with -EEXIST, don't try to register things with the
same name in the same directory.
Call Trace:
[] kobject_add+0x170/0x19b
[] device_add+0x85/0x372
[] platform_device_add+0xd8/0x129
[] :ipmi_msghandler:ipmi_register_smi+0x5cc/0xab7
[] autoremove_wake_function+0x0/0x2e
[] :ipmi_si:try_smi_init+0x494/0x685
[] :ipmi_si:ipmi_pci_probe+0xa0/0x17f
[] pci_device_probe+0x104/0x184
[] driver_probe_device+0x52/0xaa
[] __driver_attach+0x65/0xb6
[] __driver_attach+0x0/0xb6
[] bus_for_each_dev+0x43/0x6e
[] bus_add_driver+0x76/0x110
[] __pci_register_driver+0x51/0xa6
[] :ipmi_si:init_ipmi_si+0x5f6/0x746
[] sys_init_module+0xaf/0x1f2
[] tracesys+0xd5/0xe0
ipmi_msghandler: Unable to register bmc device: -17
ipmi_si: Unable to register device: error -17
Unable to handle kernel NULL pointer dereference at 0000000000000000 RIP:
[] _spin_lock+0x0/0xa
PGD 36b8c4067 PUD 36b9b4067 PMD 0
Oops: 0002 [1] SMP
last sysfs file: /devices/pci0000:40/0000:40:0b.0/0000:4f:00.0/host1/rport-1:0-1/target1:0:1/1:0:1:1/timeout
CPU 4
Modules linked in: ipmi_si(U) ipmi_devintf(U) ipmi_msghandler(U) autofs4 hidp l2cap bluetooth
lockd sunrpc dm_round_robin dm_multipath scsi_dh video backlight sbs power_meter i2c_ec i2c_core dell_wmi
wmi button battery asus_acpi acpi_memhotplug ac ipv6 xfrm_nalgo crypto_api parport_pc
lp parport ide_cd k8temp serio_raw shpchp cdrom bnx2 hwmon k8_edac edac_mc sg hpilo pcspkr
dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod
qla2xxx scsi_transport_fc cciss sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
Pid: 5534, comm: insmod Tainted: G 2.6.18-194.26.1.el5 #1
RIP: 0010:[] [] _spin_lock+0x0/0xa
RSP: 0018:ffff81087cc5fc90 EFLAGS: 00010292
RAX: 0000000000000000 RBX: ffff81037994cc38 RCX: ffff81037fe1d800
RDX: ffff81037f96b000 RSI: ffffffff801510b5 RDI: 0000000000000000
RBP: ffff81037994cc10 R08: ffff81087cc5e000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000080 R12: 0000000000000000
R13: ffffffff8033bce0 R14: 0000000000000000 R15: 0000000000000000
FS: 00002b05aa7f9210(0000) GS:ffff81068710d440(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 0000000373272000 CR4: 00000000000006e0
Process insmod (pid: 5534, threadinfo ffff81087cc5e000, task ffff81087f591080)
Stack: ffffffff8028397c ffffffffffffffff ffff81037994cc00 ffff81036e2ab000
ffffffff801c5f3f ffff81036e2ab000 ffff81037994cc00 ffff81037f96b000
ffff81036e2ab000 ffff81037f96b000 ffffffff801c96ce ffff81037f96b034
Call Trace:
[] klist_del+0x15/0x2a
[] device_del+0x22/0x1a9
[] platform_device_unregister+0x9/0x12
[] :ipmi_msghandler:cleanup_bmc_device+0xde/0xe9
[] :ipmi_msghandler:cleanup_bmc_device+0x0/0xe9
[] kref_put+0x6f/0x7a
[] :ipmi_msghandler:ipmi_bmc_unregister+0x6a/0x79
[] :ipmi_msghandler:ipmi_unregister_smi+0xc/0xf4
[] :ipmi_si:try_smi_init+0x59e/0x685
[] :ipmi_si:ipmi_pci_probe+0xa0/0x17f
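Frames written as ":module:function" in these call traces belong to loadable modules rather than the core kernel, and that is what points at the IPMI drivers here. Inside crash, command output can be redirected to a file with the usual shell syntax (for example "log > /tmp/vmcore-log.txt"). The hypothetical helper below then lists the distinct modules that appear in the traces, in order of first appearance. It assumes the RHEL 5 frame format shown above (":module:function+offset/size").

```bash
# trace_modules: hypothetical helper. Reads kernel log text on stdin and
# prints each module named in ":module:function+0x..." call-trace frames,
# once, in order of first appearance. Core-kernel frames have no prefix.
trace_modules() {
    grep -o ':[[:alnum:]_]*:[[:alnum:]_.]*+0x' \
        | cut -d: -f2 \
        | awk '!seen[$0]++'
}
```

Run against the log excerpt above, it prints ipmi_msghandler followed by ipmi_si.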
Show the parent hierarchy of PID 5534 (the chain of processes that started it) with "ps -p":
crash> ps -p 5534
PID: 0 TASK: ffffffff80308b60 CPU: 0 COMMAND: "swapper"
PID: 1 TASK: ffff81010c4a97a0 CPU: 4 COMMAND: "init"
PID: 3141 TASK: ffff81037fd260c0 CPU: 6 COMMAND: "rc"
PID: 4358 TASK: ffff81087f1450c0 CPU: 2 COMMAND: "S91hpasm"
PID: 4373 TASK: ffff81087f521100 CPU: 6 COMMAND: "sh"
PID: 5306 TASK: ffff81087f71a100 CPU: 6 COMMAND: "sh"
PID: 5434 TASK: ffff81087f11f7e0 CPU: 6 COMMAND: "hp-OpenIPMI"
PID: 5534 TASK: ffff81087f591080 CPU: 4 COMMAND: "insmod"
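The chain above shows that insmod was started by the hp-OpenIPMI script, which was itself launched (via two intermediate sh processes) by the S91hpasm startup script during boot. Together with the log, in which the oops occurs in the ipmi_si and ipmi_msghandler modules, this points to the IPMI kernel driver loaded by these startup scripts as the culprit. Disabling or removing that driver resolved the kernel panic on this server.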
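For a report or ticket, the "ps -p" output can be condensed into a single line. This awk sketch is hypothetical and assumes each line ends with COMMAND: "name", as in the output above.

```bash
# ps_chain: hypothetical helper. Reads crash "ps -p" output on stdin and
# prints the command names from the top of the hierarchy down to the task.
ps_chain() {
    awk -F'COMMAND: ' '
        NF > 1 {
            gsub(/"/, "", $2)                     # strip the quotes around the name
            chain = chain (chain == "" ? "" : " -> ") $2
        }
        END { print chain }
    '
}
```

Fed the output above, it prints: swapper -> init -> rc -> S91hpasm -> sh -> sh -> hp-OpenIPMI -> insmod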