This issue occurred on a large RAID10 array with a failing disk: instead of dropping the disk from the array, the kernel would crash. The machine is in production, so I had to resolve it by replacing the disk, but I still have the crash dumps.<br />
<br />
I found a Debian bug report that someone thought may describe the same issue, along with the upstream commit below. I'm not sure whether this fix has made it into the CentOS kernel.<br />
<br />
<a href="http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=682233">http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=682233</a><br />
<a href="http://git.kernel.org/?p=linux/kernel/git/jejb/scsi.git;a=commit;h=98dc81b0d6c483a3eb256764ae10f156ccefdbbb">http://git.kernel.org/?p=linux/kernel/git/jejb/scsi.git;a=commit;h=98dc81b0d6c483a3eb256764ae10f156ccefdbbb</a><br />
<br />
The drives are SATA WD RAID Edition disks (WDC WD5003ABYX-01WERA1) on an LSI 9211-8i, connected through an LSI SAS2X36 expander.<br />
<br />
I was originally running older LSI firmware and driver versions, but I am now running the latest of both, and it still crashes.<br />
<br />
RAID info (it is currently rebuilding onto the spare):<br />
<br />
/dev/md4:<br />
Version : 1.1<br />
Creation Time : Mon Sep 17 11:42:08 2012<br />
Raid Level : raid10<br />
Array Size : 5372224000 (5123.35 GiB 5501.16 GB)<br />
Used Dev Size : 488384000 (465.76 GiB 500.11 GB)<br />
Raid Devices : 22<br />
Total Devices : 23<br />
Persistence : Superblock is persistent<br />
<br />
Intent Bitmap : Internal<br />
<br />
Update Time : Mon Jan 21 13:23:50 2013<br />
State : active, degraded, recovering <br />
Active Devices : 21<br />
Working Devices : 23<br />
Failed Devices : 0<br />
Spare Devices : 2<br />
<br />
Layout : near=2<br />
Chunk Size : 512K<br />
<br />
Rebuild Status : 37% complete<br />
<br />
Name : ???.a2hosting.com:4 (local to host ???.a2hosting.com)<br />
UUID : 248488c8:93b3e4bc:971a6676:3d77fb4d<br />
Events : 447295<br />
<br />
Number Major Minor RaidDevice State<br />
0 8 1 0 active sync /dev/sda1<br />
1 8 161 1 active sync /dev/sdk1<br />
2 8 17 2 active sync /dev/sdb1<br />
3 8 177 3 active sync /dev/sdl1<br />
4 8 33 4 active sync /dev/sdc1<br />
5 8 193 5 active sync /dev/sdm1<br />
6 8 49 6 active sync /dev/sdd1<br />
7 8 209 7 active sync /dev/sdn1<br />
8 8 65 8 active sync /dev/sde1<br />
9 8 225 9 active sync /dev/sdo1<br />
10 8 81 10 active sync /dev/sdf1<br />
11 8 241 11 active sync /dev/sdp1<br />
12 8 97 12 active sync /dev/sdg1<br />
13 65 1 13 active sync /dev/sdq1<br />
14 8 113 14 active sync /dev/sdh1<br />
15 65 17 15 active sync /dev/sdr1<br />
16 8 129 16 active sync /dev/sdi1<br />
17 65 33 17 active sync /dev/sds1<br />
18 8 145 18 active sync /dev/sdj1<br />
22 65 97 19 spare rebuilding /dev/sdw1<br />
20 65 65 20 active sync /dev/sdu1<br />
21 65 81 21 active sync /dev/sdv1<br />
<br />
23 65 113 - spare /dev/sdx1<br />
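As a quick cross-check of the mdadm output above, the Python sketch below recomputes the array size for a near=2 RAID10 (each block is stored twice, so usable size is Used Dev Size × Raid Devices / 2) and maps the Major/Minor columns back to device names. The helper names are hypothetical, and the name mapping assumes the classic sd numbering (16 minors per disk, with majors 8 and 65 to 71 covering the first 128 disks).

```python
# Hypothetical cross-check of the mdadm --detail output for /dev/md4.

KIB_PER_GIB = 1024 ** 2

def raid10_near_size_kib(used_dev_kib: int, raid_devices: int, near_copies: int) -> int:
    """Usable size of an md RAID10 'near' layout: each block is stored near_copies times."""
    return used_dev_kib * raid_devices // near_copies

# Assumption: block majors used by the sd driver for its first 128 disks (16 disks per major).
SD_MAJORS = [8, 65, 66, 67, 68, 69, 70, 71]

def sd_name(major: int, minor: int) -> str:
    """Map an sd (major, minor) pair to its /dev name, e.g. (65, 97) -> 'sdw1'.

    Assumes 16 minors per disk: minor // 16 selects the disk within the major,
    and minor % 16 is the partition number (0 means the whole disk).
    """
    disk_index = SD_MAJORS.index(major) * 16 + minor // 16
    partition = minor % 16
    # sd names use bijective base-26 letters: a..z, then aa, ab, ...
    letters = ""
    n = disk_index
    while n >= 0:
        letters = chr(ord("a") + n % 26) + letters
        n = n // 26 - 1
    return f"sd{letters}{partition or ''}"

if __name__ == "__main__":
    size = raid10_near_size_kib(488_384_000, 22, 2)
    print(size)                          # 5372224000
    print(round(size / KIB_PER_GIB, 2))  # 5123.35
    print(sd_name(65, 97))               # sdw1 (spare rebuilding)
    print(sd_name(65, 48))               # sdt (disk from the crash log)
```

Under this mapping the disk named in the crash log, sdt, would be 65:48 (and sdt1 would be 65:49); neither appears in the table above.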
<br />
LSI Info:<br />
<br />
mpt2sas version 15.00.00.00 loaded<br />
scsi0 : Fusion MPT SAS Host<br />
alloc irq_desc for 30 on node 0<br />
alloc kstat_irqs on node 0<br />
alloc irq_2_iommu on node 0<br />
mpt2sas 0000:03:00.0: PCI INT A -> GSI 30 (level, low) -> IRQ 30<br />
mpt2sas 0000:03:00.0: setting latency timer to 64<br />
mpt2sas0: 64 BIT PCI BUS DMA ADDRESSING SUPPORTED, total mem (49416756 kB)<br />
alloc irq_desc for 52 on node 0<br />
alloc kstat_irqs on node 0<br />
alloc irq_2_iommu on node 0<br />
mpt2sas 0000:03:00.0: irq 52 for MSI/MSI-X<br />
mpt2sas0-msix0: PCI-MSI-X enabled: IRQ 52<br />
mpt2sas0: iomem(0x00000000fbb3c000), mapped(0xffffc90017168000), size(16384)<br />
mpt2sas0: ioport(0x000000000000c000), size(256)<br />
mpt2sas0: sending diag reset !!<br />
mpt2sas0: diag reset: SUCCESS<br />
mpt2sas0: Allocated physical memory: size(3392 kB)<br />
mpt2sas0: Current Controller Queue Depth(1483), Max Controller Queue Depth(1720)<br />
mpt2sas0: Scatter Gather Elements per IO(128)<br />
mpt2sas0: LSISAS2008: FWVersion(15.00.00.00), ChipRevision(0x03), BiosVersion(07.29.00.00)<br />
mpt2sas0: Protocol=(Initiator), Capabilities=(Raid,TLR,EEDP,Snapshot Buffer,Diag Trace Buffer,Task Set Full,NCQ)<br />
mpt2sas0: sending port enable !!<br />
<br />
<br />
Crash info (from the crash dump kernel log):<br />
<br />
sd 0:0:19:0: [sdt] Unhandled sense code<br />
sd 0:0:19:0: [sdt] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE<br />
sd 0:0:19:0: [sdt] Sense Key : Medium Error [current] <br />
Info fld=0x39e30f68<br />
sd 0:0:19:0: [sdt] Add. Sense: Unrecovered read error<br />
sd 0:0:19:0: [sdt] CDB: Read(10): 28 00 39 e3 0f 40 00 00 68 00<br />
sd 0:0:19:0: [sdt] Unhandled sense code<br />
sd 0:0:19:0: [sdt] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE<br />
sd 0:0:19:0: [sdt] Sense Key : Medium Error [current] <br />
Info fld=0x39e30f68<br />
sd 0:0:19:0: [sdt] Add. Sense: Unrecovered read error<br />
sd 0:0:19:0: [sdt] CDB: Read(10): 28 00 39 e3 0f 68 00 00 08 00<br />
------------[ cut here ]------------<br />
kernel BUG at drivers/scsi/scsi_lib.c:1156!<br />
invalid opcode: 0000 [#1] SMP <br />
last sysfs file: /sys/devices/system/cpu/cpu15/cache/index2/shared_cpu_map<br />
CPU 4 <br />
Modules linked in: nfsd lockd nfs_acl auth_rpcgss exportfs autofs4 sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 raid10 ses enclosure microcode serio_raw i2c_i801 i2c_core sg iTCO_wdt iTCO_vendor_support e1000e ioatdma dca i7core_edac edac_core shpchp ext4 mbcache jbd2 raid1 sd_mod crc_t10dif ahci mpt2sas(U) scsi_transport_sas raid_class dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]<br />
<br />
Pid: 2008, comm: md4_raid10 Not tainted 2.6.32-279.19.1.el6.x86_64 #1 Supermicro X8DTL/X8DTL<br />
RIP: 0010:[<ffffffff8135dbfe>] [<ffffffff8135dbfe>] scsi_setup_fs_cmnd+0x9e/0xe0<br />
RSP: 0018:ffff88062ee27870 EFLAGS: 00010046<br />
RAX: 0000000000000000 RBX: ffff880c14fe6e20 RCX: 0000000000000001<br />
RDX: 0000000000000000 RSI: ffff880c14fe6e20 RDI: ffff88062c649800<br />
RBP: ffff88062ee27880 R08: 0000000000000086 R09: 0000000000000001<br />
R10: 0000000039e30768 R11: 0000000000000000 R12: ffff88062c649800<br />
R13: ffff88062c652838 R14: ffff88062c649800 R15: ffff88062c732800<br />
FS: 0000000000000000(0000) GS:ffff880655400000(0000) knlGS:0000000000000000<br />
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b<br />
CR2: 0000000002d77e68 CR3: 0000000c18027000 CR4: 00000000000006e0<br />
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000<br />
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400<br />
Process md4_raid10 (pid: 2008, threadinfo ffff88062ee26000, task ffff88062bfdaaa0)<br />
Stack:<br />
ffff880c14fe6e20 ffff880c14fe6e20 ffff88062ee27910 ffffffffa0099d17<br />
<d> ffff880c14fe6e20 ffff88062df3c000 ffff88062ee27910 ffffffff8126476f<br />
<d> ffff880600000000 0000000039e30768 0000000000000000 0000000004100031<br />
Call Trace:<br />
<br />
[<ffffffffa0099d17>] sd_prep_fn+0x157/0xf30 [sd_mod]<br />
[<ffffffff8126476f>] ? cfq_dispatch_requests+0x2cf/0xa70<br />
[<ffffffff81261c47>] ? cfq_prio_tree_add+0xc7/0xd0<br />
[<ffffffff8124f527>] blk_peek_request+0xc7/0x210<br />
[<ffffffff8135cd33>] scsi_request_fn+0x63/0x790<br />
[<ffffffff8107caed>] ? del_timer+0x7d/0xe0<br />
[<ffffffff81247271>] ? elv_insert+0xd1/0x1a0<br />
[<ffffffff8124cf02>] __generic_unplug_device+0x32/0x40<br />
[<ffffffff81250088>] __make_request+0x168/0x5a0<br />
[<ffffffff8124e65e>] generic_make_request+0x25e/0x530<br />
[<ffffffff811124c5>] ? mempool_alloc_slab+0x15/0x20<br />
[<ffffffff81112663>] ? mempool_alloc+0x63/0x140<br />
[<ffffffff8124e65e>] ? generic_make_request+0x25e/0x530<br />
[<ffffffff811124c5>] ? mempool_alloc_slab+0x15/0x20<br />
[<ffffffff81112663>] ? mempool_alloc+0x63/0x140<br />
[<ffffffff8124e9bd>] submit_bio+0x8d/0x120<br />
[<ffffffff813e90e6>] sync_page_io+0xb6/0x110<br />
[<ffffffffa01f2de6>] r10_sync_page_io+0x56/0x110 [raid10]<br />
[<ffffffffa01f3216>] fix_read_error+0x376/0x6f0 [raid10]<br />
[<ffffffffa01f4563>] raid10d+0xfd3/0x1130 [raid10]<br />
[<ffffffff8107d4eb>] ? try_to_del_timer_sync+0x7b/0xe0<br />
[<ffffffff8107d572>] ? del_timer_sync+0x22/0x30<br />
[<ffffffff814eaa4a>] ? schedule_timeout+0x19a/0x2e0<br />
[<ffffffff8105fa40>] ? default_wake_function+0x0/0x20<br />
[<ffffffff813e8046>] md_thread+0x116/0x150<br />
[<ffffffff81090990>] ? autoremove_wake_function+0x0/0x40<br />
[<ffffffff813e7f30>] ? md_thread+0x0/0x150<br />
[<ffffffff81090626>] kthread+0x96/0xa0<br />
[<ffffffff8100c0ca>] child_rip+0xa/0x20<br />
[<ffffffff81090590>] ? kthread+0x0/0xa0<br />
[<ffffffff8100c0c0>] ? child_rip+0x0/0x20<br />
Code: 00 e8 17 fe ff ff 5b 41 5c c9 c3 66 90 4c 89 e7 be 20 00 00 00 e8 23 85 ff ff 48 85 c0 48 89 c7 74 38 48 89 83 d8 00 00 00 eb a0 <0f> 0b eb fe 48 8b 00 48 85 c0 0f 84 7a ff ff ff 48 8b 40 48 48 <br />
RIP [<ffffffff8135dbfe>] scsi_setup_fs_cmnd+0x9e/0xe0<br />
RSP <ffff88062ee27870>
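To compare the two failed reads logged just before the oops with the sense data, here is a small Python sketch that decodes their Read(10) CDBs. It assumes the standard READ(10) layout (opcode 0x28, 4-byte big-endian LBA in bytes 2 to 5, 2-byte transfer length in bytes 7 to 8) and 512-byte logical sectors; the helper names are hypothetical.

```python
# Hypothetical helper: decode the READ(10) CDBs printed by sd in the crash log.

def decode_read10(cdb_hex: str) -> tuple[int, int]:
    """Return (lba, transfer_length_in_blocks) for a READ(10) CDB given as hex bytes."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")     # bytes 2..5: logical block address
    blocks = int.from_bytes(cdb[7:9], "big")  # bytes 7..8: transfer length in blocks
    return lba, blocks

def covers(lba: int, blocks: int, target: int) -> bool:
    """True if target falls inside the block range [lba, lba + blocks)."""
    return lba <= target < lba + blocks

BAD_LBA = 0x39E30F68  # "Info fld" reported with the Medium Error sense key
SECTOR_BYTES = 512    # assumption: 512-byte logical sectors

if __name__ == "__main__":
    for cdb in ("28 00 39 e3 0f 40 00 00 68 00", "28 00 39 e3 0f 68 00 00 08 00"):
        lba, blocks = decode_read10(cdb)
        print(hex(lba), blocks, blocks * SECTOR_BYTES, covers(lba, blocks, BAD_LBA))
    # 0x39e30f40 104 53248 True
    # 0x39e30f68 8 4096 True
```

Decoded this way, the first command is a 104-block read starting at 0x39e30f40 that covers the failing LBA, and the second is an 8-block (4 KiB, assuming 512-byte sectors) read starting exactly at that LBA. That second read would be consistent with the single-page re-read issued from fix_read_error via r10_sync_page_io in the call trace.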
↧