Channel: CentOS Bug Tracker - Issues

0006209: RAID10 disk fail triggering kernel BUG at drivers/scsi/scsi_lib.c:1156!

This issue occurred on a large RAID10 array with a failing disk. The machine was in production, so I had to repair it by replacing the disk, but I still have the crash dumps. Instead of dropping the disk from the array, the kernel would crash.

I found a report on the Debian bug list that may describe the same problem. I'm not sure whether the fix has made it into the CentOS kernel.

http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=682233
http://git.kernel.org/?p=linux/kernel/git/jejb/scsi.git;a=commit;h=98dc81b0d6c483a3eb256764ae10f156ccefdbbb

The drives are SATA WD RAID Edition (WDC WD5003ABYX-01WERA1) on an LSI 9211-8i through an LSI SAS2X36 expander.

I was originally running an older LSI firmware and driver; I am now running the latest of both. It still crashes.

RAID info (it is currently rebuilding onto the spare):

/dev/md4:
        Version : 1.1
  Creation Time : Mon Sep 17 11:42:08 2012
     Raid Level : raid10
     Array Size : 5372224000 (5123.35 GiB 5501.16 GB)
  Used Dev Size : 488384000 (465.76 GiB 500.11 GB)
   Raid Devices : 22
  Total Devices : 23
    Persistence : Superblock is persistent

  Intent Bitmap : Internal

    Update Time : Mon Jan 21 13:23:50 2013
          State : active, degraded, recovering
 Active Devices : 21
Working Devices : 23
 Failed Devices : 0
  Spare Devices : 2

         Layout : near=2
     Chunk Size : 512K

 Rebuild Status : 37% complete

           Name : ???.a2hosting.com:4 (local to host ???.a2hosting.com)
           UUID : 248488c8:93b3e4bc:971a6676:3d77fb4d
         Events : 447295

    Number   Major   Minor   RaidDevice   State
       0       8        1        0        active sync        /dev/sda1
       1       8      161        1        active sync        /dev/sdk1
       2       8       17        2        active sync        /dev/sdb1
       3       8      177        3        active sync        /dev/sdl1
       4       8       33        4        active sync        /dev/sdc1
       5       8      193        5        active sync        /dev/sdm1
       6       8       49        6        active sync        /dev/sdd1
       7       8      209        7        active sync        /dev/sdn1
       8       8       65        8        active sync        /dev/sde1
       9       8      225        9        active sync        /dev/sdo1
      10       8       81       10        active sync        /dev/sdf1
      11       8      241       11        active sync        /dev/sdp1
      12       8       97       12        active sync        /dev/sdg1
      13      65        1       13        active sync        /dev/sdq1
      14       8      113       14        active sync        /dev/sdh1
      15      65       17       15        active sync        /dev/sdr1
      16       8      129       16        active sync        /dev/sdi1
      17      65       33       17        active sync        /dev/sds1
      18       8      145       18        active sync        /dev/sdj1
      22      65       97       19        spare rebuilding   /dev/sdw1
      20      65       65       20        active sync        /dev/sdu1
      21      65       81       21        active sync        /dev/sdv1

      23      65      113        -        spare              /dev/sdx1

LSI info:

mpt2sas version 15.00.00.00 loaded
scsi0 : Fusion MPT SAS Host
alloc irq_desc for 30 on node 0
alloc kstat_irqs on node 0
alloc irq_2_iommu on node 0
mpt2sas 0000:03:00.0: PCI INT A -> GSI 30 (level, low) -> IRQ 30
mpt2sas 0000:03:00.0: setting latency timer to 64
mpt2sas0: 64 BIT PCI BUS DMA ADDRESSING SUPPORTED, total mem (49416756 kB)
alloc irq_desc for 52 on node 0
alloc kstat_irqs on node 0
alloc irq_2_iommu on node 0
mpt2sas 0000:03:00.0: irq 52 for MSI/MSI-X
mpt2sas0-msix0: PCI-MSI-X enabled: IRQ 52
mpt2sas0: iomem(0x00000000fbb3c000), mapped(0xffffc90017168000), size(16384)
mpt2sas0: ioport(0x000000000000c000), size(256)
mpt2sas0: sending diag reset !!
mpt2sas0: diag reset: SUCCESS
mpt2sas0: Allocated physical memory: size(3392 kB)
mpt2sas0: Current Controller Queue Depth(1483), Max Controller Queue Depth(1720)
mpt2sas0: Scatter Gather Elements per IO(128)
mpt2sas0: LSISAS2008: FWVersion(15.00.00.00), ChipRevision(0x03), BiosVersion(07.29.00.00)
mpt2sas0: Protocol=(Initiator), Capabilities=(Raid,TLR,EEDP,Snapshot Buffer,Diag Trace Buffer,Task Set Full,NCQ)
mpt2sas0: sending port enable !!

Crash info (from the crashdump kernel log):

sd 0:0:19:0: [sdt] Unhandled sense code
sd 0:0:19:0: [sdt] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
sd 0:0:19:0: [sdt] Sense Key : Medium Error [current]
Info fld=0x39e30f68
sd 0:0:19:0: [sdt] Add. Sense: Unrecovered read error
sd 0:0:19:0: [sdt] CDB: Read(10): 28 00 39 e3 0f 40 00 00 68 00
sd 0:0:19:0: [sdt] Unhandled sense code
sd 0:0:19:0: [sdt] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
sd 0:0:19:0: [sdt] Sense Key : Medium Error [current]
Info fld=0x39e30f68
sd 0:0:19:0: [sdt] Add. Sense: Unrecovered read error
sd 0:0:19:0: [sdt] CDB: Read(10): 28 00 39 e3 0f 68 00 00 08 00
------------[ cut here ]------------
kernel BUG at drivers/scsi/scsi_lib.c:1156!
invalid opcode: 0000 [#1] SMP
last sysfs file: /sys/devices/system/cpu/cpu15/cache/index2/shared_cpu_map
CPU 4
Modules linked in: nfsd lockd nfs_acl auth_rpcgss exportfs autofs4 sunrpc cpufreq_ondemand acpi_cpufreq freq_table mperf ipt_REJECT nf_conntrack_ipv4 nf_defrag_ipv4 iptable_filter ip_tables ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables ipv6 raid10 ses enclosure microcode serio_raw i2c_i801 i2c_core sg iTCO_wdt iTCO_vendor_support e1000e ioatdma dca i7core_edac edac_core shpchp ext4 mbcache jbd2 raid1 sd_mod crc_t10dif ahci mpt2sas(U) scsi_transport_sas raid_class dm_mirror dm_region_hash dm_log dm_mod [last unloaded: scsi_wait_scan]

Pid: 2008, comm: md4_raid10 Not tainted 2.6.32-279.19.1.el6.x86_64 #1 Supermicro X8DTL/X8DTL
RIP: 0010:[<ffffffff8135dbfe>] [<ffffffff8135dbfe>] scsi_setup_fs_cmnd+0x9e/0xe0
RSP: 0018:ffff88062ee27870 EFLAGS: 00010046
RAX: 0000000000000000 RBX: ffff880c14fe6e20 RCX: 0000000000000001
RDX: 0000000000000000 RSI: ffff880c14fe6e20 RDI: ffff88062c649800
RBP: ffff88062ee27880 R08: 0000000000000086 R09: 0000000000000001
R10: 0000000039e30768 R11: 0000000000000000 R12: ffff88062c649800
R13: ffff88062c652838 R14: ffff88062c649800 R15: ffff88062c732800
FS: 0000000000000000(0000) GS:ffff880655400000(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000002d77e68 CR3: 0000000c18027000 CR4: 00000000000006e0
DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
Process md4_raid10 (pid: 2008, threadinfo ffff88062ee26000, task ffff88062bfdaaa0)
Stack:
ffff880c14fe6e20 ffff880c14fe6e20 ffff88062ee27910 ffffffffa0099d17
<d> ffff880c14fe6e20 ffff88062df3c000 ffff88062ee27910 ffffffff8126476f
<d> ffff880600000000 0000000039e30768 0000000000000000 0000000004100031
Call Trace:

[<ffffffffa0099d17>] sd_prep_fn+0x157/0xf30 [sd_mod]
[<ffffffff8126476f>] ? cfq_dispatch_requests+0x2cf/0xa70
[<ffffffff81261c47>] ? cfq_prio_tree_add+0xc7/0xd0
[<ffffffff8124f527>] blk_peek_request+0xc7/0x210
[<ffffffff8135cd33>] scsi_request_fn+0x63/0x790
[<ffffffff8107caed>] ? del_timer+0x7d/0xe0
[<ffffffff81247271>] ? elv_insert+0xd1/0x1a0
[<ffffffff8124cf02>] __generic_unplug_device+0x32/0x40
[<ffffffff81250088>] __make_request+0x168/0x5a0
[<ffffffff8124e65e>] generic_make_request+0x25e/0x530
[<ffffffff811124c5>] ? mempool_alloc_slab+0x15/0x20
[<ffffffff81112663>] ? mempool_alloc+0x63/0x140
[<ffffffff8124e65e>] ? generic_make_request+0x25e/0x530
[<ffffffff811124c5>] ? mempool_alloc_slab+0x15/0x20
[<ffffffff81112663>] ? mempool_alloc+0x63/0x140
[<ffffffff8124e9bd>] submit_bio+0x8d/0x120
[<ffffffff813e90e6>] sync_page_io+0xb6/0x110
[<ffffffffa01f2de6>] r10_sync_page_io+0x56/0x110 [raid10]
[<ffffffffa01f3216>] fix_read_error+0x376/0x6f0 [raid10]
[<ffffffffa01f4563>] raid10d+0xfd3/0x1130 [raid10]
[<ffffffff8107d4eb>] ? try_to_del_timer_sync+0x7b/0xe0
[<ffffffff8107d572>] ? del_timer_sync+0x22/0x30
[<ffffffff814eaa4a>] ? schedule_timeout+0x19a/0x2e0
[<ffffffff8105fa40>] ? default_wake_function+0x0/0x20
[<ffffffff813e8046>] md_thread+0x116/0x150
[<ffffffff81090990>] ? autoremove_wake_function+0x0/0x40
[<ffffffff813e7f30>] ? md_thread+0x0/0x150
[<ffffffff81090626>] kthread+0x96/0xa0
[<ffffffff8100c0ca>] child_rip+0xa/0x20
[<ffffffff81090590>] ? kthread+0x0/0xa0
[<ffffffff8100c0c0>] ? child_rip+0x0/0x20
Code: 00 e8 17 fe ff ff 5b 41 5c c9 c3 66 90 4c 89 e7 be 20 00 00 00 e8 23 85 ff ff 48 85 c0 48 89 c7 74 38 48 89 83 d8 00 00 00 eb a0 <0f> 0b eb fe 48 8b 00 48 85 c0 0f 84 7a ff ff ff 48 8b 40 48 48
RIP [<ffffffff8135dbfe>] scsi_setup_fs_cmnd+0x9e/0xe0
RSP <ffff88062ee27870>
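
For anyone reading the two failed reads in the crash log, here is a minimal sketch of how the Read(10) CDB bytes relate to the "Info fld" value. It assumes the standard READ(10) layout (opcode 0x28, big-endian LBA in bytes 2-5, transfer length in bytes 7-8) and that the Info field holds the LBA of the unreadable sector. The helper names `parse_read10` and `error_offset` are made up for illustration and are not part of any kernel or tool.

```python
def parse_read10(cdb_hex):
    """Decode a READ(10) CDB given as space-separated hex bytes.

    Returns (lba, transfer_length_in_blocks).
    """
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x28:
        raise ValueError("not a 10-byte READ(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")      # bytes 2-5: logical block address
    length = int.from_bytes(cdb[7:9], "big")   # bytes 7-8: number of blocks
    return lba, length


def error_offset(cdb_hex, info_field):
    """Offset (in blocks) of the reported bad LBA inside the request."""
    lba, length = parse_read10(cdb_hex)
    offset = info_field - lba
    if not 0 <= offset < length:
        raise ValueError("Info field lies outside the requested range")
    return offset


if __name__ == "__main__":
    info = 0x39e30f68  # "Info fld" from the sense data in the log
    for cdb in ("28 00 39 e3 0f 40 00 00 68 00",
                "28 00 39 e3 0f 68 00 00 08 00"):
        lba, length = parse_read10(cdb)
        print(hex(lba), length, error_offset(cdb, info))
```

Read this way, the first command asks for 0x68 blocks starting at 0x39e30f40 and fails at offset 0x28, and the second command is a shorter 8-block read that starts exactly at the failing LBA 0x39e30f68.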
