I'm using the watchdog built into my servers, which are based on Intel S5500BC and Intel SE7501BR2 boards:<br />
<br />
ACPI: SSDT 000000008f7d1000 001D8 (v02 INTEL IPMI 00004000 INTL 20061109)<br />
<br />
[root@phobos ~]# bmc-watchdog -g<br />
Timer Use: SMS/OS<br />
Timer: Running<br />
Logging: Enabled<br />
Timeout Action: Hard Reset<br />
Pre-Timeout Interrupt: None<br />
Pre-Timeout Interval: 0 seconds<br />
Timer Use BIOS FRB2 Flag: Clear<br />
Timer Use BIOS POST Flag: Clear<br />
Timer Use BIOS OS Load Flag: Clear<br />
Timer Use BIOS SMS/OS Flag: Clear<br />
Timer Use BIOS OEM Flag: Clear<br />
Initial Countdown: 480 seconds<br />
Current Countdown: 439 seconds<br />
<br />
[root@phobos ~]# dmesg | grep -i watchdog<br />
NMI watchdog enabled, takes one hw-pmu counter.<br />
iTCO_wdt: Intel TCO WatchDog Timer Driver v1.07rh<br />
<br />
The watchdog worked fine until I updated the kernel to 2.6.32-358.2.1. Since the update, the watchdog has been resetting my systems unexpectedly, for no apparent reason:<br />
<br />
[root@phobos ~]# ipmitool -I open sel list | grep Watchdog<br />
42f | 07/28/2011 | 04:50:25 | Watchdog 2 #0x03 | Hard reset | Asserted<br />
434 | 07/28/2011 | 05:03:00 | Watchdog 2 #0x03 | Hard reset | Asserted<br />
440 | 07/28/2011 | 05:26:04 | Watchdog 2 #0x03 | Hard reset | Asserted<br />
445 | 07/28/2011 | 05:38:14 | Watchdog 2 #0x03 | Hard reset | Asserted<br />
44a | 07/28/2011 | 05:50:26 | Watchdog 2 #0x03 | Hard reset | Asserted<br />
44f | 07/28/2011 | 06:02:38 | Watchdog 2 #0x03 | Hard reset | Asserted<br />
686 | 07/05/2012 | 04:42:12 | Watchdog 2 #0x03 | Hard reset | Asserted<br />
6db | 03/19/2013 | 10:58:05 | Watchdog 2 #0x03 | Hard reset | Asserted<br />
6df | 03/21/2013 | 11:22:54 | Watchdog 2 #0x03 | Hard reset | Asserted<br />
6e3 | 03/25/2013 | 06:16:11 | Watchdog 2 #0x03 | Hard reset | Asserted<br />
6e7 | 03/25/2013 | 13:25:49 | Watchdog 2 #0x03 | Hard reset | Asserted<br />
6eb | 03/26/2013 | 04:05:54 | Watchdog 2 #0x03 | Hard reset | Asserted<br />
6f1 | 03/30/2013 | 04:06:23 | Watchdog 2 #0x03 | Hard reset | Asserted<br />
6f5 | 04/01/2013 | 02:35:23 | Watchdog 2 #0x03 | Hard reset | Asserted<br />
6f9 | 04/01/2013 | 07:30:58 | Watchdog 2 #0x03 | Hard reset | Asserted<br />
6fd | 04/01/2013 | 16:40:48 | Watchdog 2 #0x03 | Hard reset | Asserted<br />
701 | 04/01/2013 | 20:11:23 | Watchdog 2 #0x03 | Hard reset | Asserted<br />
705 | 04/02/2013 | 17:21:29 | Watchdog 2 #0x03 | Hard reset | Asserted<br />
709 | 04/03/2013 | 19:42:51 | Watchdog 2 #0x03 | Hard reset | Asserted<br />
(times in this listing are in UTC; times in the system logs are in UTC+6)<br />
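<br />
Since the SEL and the system logs use different clocks, a small conversion helps when matching a watchdog event against /var/log/messages. This is only a sketch: `sel_to_local` is a made-up helper name, and it assumes UTC+6 is a fixed offset with no daylight saving.<br />

```python
from datetime import datetime, timedelta, timezone

# Assumption: "UTC+6" is a fixed offset (no DST changes).
LOCAL = timezone(timedelta(hours=6))


def sel_to_local(date_str, time_str):
    """Convert an ipmitool SEL date/time pair (given in UTC) to UTC+6."""
    # ipmitool prints SEL dates as MM/DD/YYYY and times as HH:MM:SS.
    utc = datetime.strptime(f"{date_str} {time_str}", "%m/%d/%Y %H:%M:%S")
    return utc.replace(tzinfo=timezone.utc).astimezone(LOCAL)


if __name__ == "__main__":
    # SEL entry 6f9 from the listing above.
    local = sel_to_local("04/01/2013", "07:30:58")
    print(local.strftime("%Y-%m-%d %H:%M:%S"))
```

For SEL entry 6f9 (04/01/2013 07:30:58 UTC) this gives 13:30:58 in UTC+6, which is the time to look for in /var/log/messages.<br />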
<br />
[root@phobos ~]# less /var/log/yum.log | grep -i ipmi<br />
Mar 29 11:46:19 Installed: ipmitool-1.8.11-14.el6_4.1.x86_64<br />
[root@phobos ~]# less /var/log/yum.log | grep -i kernel<br />
Feb 14 09:06:01 Updated: kernel-firmware-2.6.32-279.22.1.el6.noarch<br />
Feb 14 09:06:23 Installed: kernel-2.6.32-279.22.1.el6.x86_64<br />
Mar 17 11:38:39 Updated: dracut-kernel-004-303.el6.noarch<br />
Mar 17 11:38:47 Updated: kernel-firmware-2.6.32-358.2.1.el6.noarch<br />
Mar 17 11:39:14 Installed: kernel-2.6.32-358.2.1.el6.x86_64<br />
<br />
This appears in /var/log/messages before a reset occurs (times are in UTC+6):<br />
Apr 1 12:36:09 phobos kernel: IPMI message handler: BMC returned incorrect response, expected netfn 7 cmd 35, got netfn 7 cmd 22<br />
Apr 1 12:59:10 phobos kernel: IPMI message handler: BMC returned incorrect response, expected netfn 7 cmd 35, got netfn 7 cmd 22<br />
Apr 1 13:32:33 phobos kernel: imklog 5.8.10, log source = /proc/kmsg started.
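<br />
<br />
To make the "incorrect response" lines easier to read, the netfn/cmd pairs can be turned into command names. This is a rough sketch under two assumptions: that the kernel prints these values in hexadecimal, and that the names in the small table (taken from the IPMI v2.0 App command list) are the relevant ones. `decode` and `APP_COMMANDS` are names invented for this example.<br />

```python
import re

# Assumption: the kernel prints netfn and cmd in hexadecimal.
# Names follow the IPMI v2.0 App command table (request netfn 0x06).
APP_COMMANDS = {
    0x22: "Reset Watchdog Timer",
    0x24: "Set Watchdog Timer",
    0x25: "Get Watchdog Timer",
    0x35: "Read Event Message Buffer",
}

PATTERN = re.compile(
    r"expected netfn ([0-9a-f]+) cmd ([0-9a-f]+), "
    r"got netfn ([0-9a-f]+) cmd ([0-9a-f]+)"
)


def _name(netfn, cmd):
    # A response netfn is the request netfn with the low bit set,
    # so 0x07 is the response to an App (0x06) request.
    if (netfn & ~1) == 0x06:
        return APP_COMMANDS.get(cmd, f"unknown App cmd 0x{cmd:02x}")
    return f"netfn 0x{netfn:02x} cmd 0x{cmd:02x}"


def decode(line):
    """Return (expected, got) command names, or None if the line does not match."""
    m = PATTERN.search(line)
    if m is None:
        return None
    exp_netfn, exp_cmd, got_netfn, got_cmd = (int(g, 16) for g in m.groups())
    return _name(exp_netfn, exp_cmd), _name(got_netfn, got_cmd)


if __name__ == "__main__":
    line = ("IPMI message handler: BMC returned incorrect response, "
            "expected netfn 7 cmd 35, got netfn 7 cmd 22")
    print(decode(line))
```

If the hex assumption holds, the messages above would mean the driver was waiting for a Read Event Message Buffer response and received a Reset Watchdog Timer response instead.<br />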