Channel: CentOS Bug Tracker - Issues

0001590: kernel oops: assertion failure at journal:576 (ext3 issue?)

We have a few (3) systems that are crashing with:

    Assertion failure in journal_next_log_block() at fs/jbd/journal.c:576:
    "journal->j_free > 1"

    Kernel BUG at journal:576
    invalid operand: 0000 [1] SMP
    CPU 1
    Modules linked in:
    md5 ipv6 parport_pc lp parport w83627hf eeprom adm1026 hwmon_vid hwmon
    i2c_sensor i2c_isa i2c_amd756 i2c_amd8111 i2c_dev i2c_core nfs lockd
    nfs_acl sunrpc ipt_REJECT ipt_state ip_conntrack iptable_filter
    ip_tables button battery ac ohci_hcd hw_random tg3 floppy dm_snapshot
    dm_zero dm_mirror ext3 jbd dm_mod 3w_9xxx sata_mv libata sd_mod
    scsi_mod
    Pid: 1603, comm: kjournald Not tainted 2.6.9-42.0.3.ELsmp
    RIP: 0010:[<ffffffffa006c18a>]
    <ffffffffa006c18a>{:jbd:journal_next_log_block+76}
    RSP: 0018:0000010476327b88 EFLAGS: 00010212
    RAX: 0000000000000060 RBX: 0000010283163e00 RCX: ffffffff803e1fe8
    RDX: ffffffff803e1fe8 RSI: 0000000000000246 RDI: ffffffff803e1fe0
    RBP: 0000000000000040 R08: ffffffff803e1fe8 R09: 0000010283163e00
    R10: 0000000100000000 R11: ffffffff8011e884 R12: 0000010283163e24
    R13: 0000010476327be0 R14: 0000010283163e00 R15: 000000000000002e
    FS: 0000002a95560b00(0000) GS:ffffffff804e5200(0000)
    knlGS:00000000f7ff36c0
    CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
    CR2: 0000002a9556c000 CR3: 0000000037e42000 CR4: 00000000000006e0
    Process kjournald (pid: 1603, threadinfo 0000010476326000, task
    0000010478d777f0)
    Stack: 0000010453f4afa8 0000010310072240 0000000000000040
    0000010147528be0
    000001044240a880 ffffffffa0067dfe 00000e7c00000000
    00000101c33f2184
    0000000000000000 0000010310b12f50
    Call Trace:<ffffffffa0067dfe>{:jbd:journal_commit_transaction+1834}
    <ffffffff80135756>{autoremove_wake_function+0}
    <ffffffff80135756>{autoremove_wake_function+0}
    <ffffffffa006a914>{:jbd:kjournald+250}
    <ffffffff80135756>{autoremove_wake_function+0}
    <ffffffff80135756>{autoremove_wake_function+0}
    <ffffffffa006a814>{:jbd:commit_timeout+0}
    <ffffffff80110f47>{child_rip+8}
    <ffffffffa006a81a>{:jbd:kjournald+0}
    <ffffffff80110f3f>{child_rip+0}

    Code: 0f 0b bd e2 06 a0 ff ff ff ff 40 02 48 8b ab 18 01 00 00 48
    RIP <ffffffffa006c18a>{:jbd:journal_next_log_block+76} RSP
    <0000010476327b88>
    <0>Kernel panic - not syncing: Oops

(Note: I edited together some lines in the "Modules linked in" section. The rest is cut from the serial console (size 80x24) of the system.)

We are running the CentOS 4.4 kernel. uname -a shows:

    Linux cook05 2.6.9-42.0.3.ELsmp #1 SMP Fri Oct 6 06:28:26 CDT 2006
    x86_64 x86_64 x86_64 GNU/Linux

The disk subsystem on the system that produced this crash is four SATA disks on a 3ware 9550 controller (see the attached dmesg output for more info), with a mix of Western Digital and Seagate drives. It has also crashed with sysrq enabled, and (not surprisingly) the system is totally dead.
We have to power-cycle it to reboot it.

Other systems experiencing the same crash have:

* the non-SMP version of the same kernel, with the software md RAID drivers
* the same kernel, with a MegaRAID RAID card

The same crash has also been seen with an earlier kernel version, 2.6.9-42.ELsmp.

It seems to crash when we expect the system to have high I/O, but we don't have any hard evidence (disk throughput or transaction counts) to support that.

We can try setting up a remote kernel dump if that would be useful and would work.

We get a crash every couple of days on average (sometimes two crashes 30 minutes to 2 hours apart), so we can try applying patches or new kernels if needed and see how the system does.

I have attached selected lines from dmesg to give some additional info about the hardware and configuration of the system. I also have a copy of /proc/kallsyms from the system that I can attach if you wish. Both files are from a post-crash boot that should be identical to the pre-crash boot.

If you require more or different information, just let me know and I will try to obtain it.

Thank you for your help.

--
-- rouilj

John Rouillard
System Administrator
Renesys Corporation
603-643-9300 x 111
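The first bytes of the "Code:" line in the oops above start at the faulting RIP, and they can be decoded by hand. A minimal Python sketch, assuming the 2.6.9-era x86_64 BUG() encoding: a ud2 instruction (0f 0b, the opcode that raises the "invalid operand" trap), followed by an 8-byte little-endian pointer to the file name and a 2-byte little-endian line number. The helper name decode_bug_frame is hypothetical, and the layout is an assumption that has not been checked against the source of this exact kernel build.

```python
import struct

# Bytes copied verbatim from the "Code:" line of the oops. The first byte is
# the instruction at the faulting RIP (journal_next_log_block+76).
CODE = bytes.fromhex(
    "0f 0b bd e2 06 a0 ff ff ff ff 40 02 48 8b ab 18 01 00 00 48"
)


def decode_bug_frame(code: bytes) -> tuple:
    """Split an (assumed) x86_64 BUG() frame into (filename_ptr, line).

    Assumed layout: ud2 opcode (0f 0b), then an 8-byte little-endian
    pointer to the file name string, then a 2-byte little-endian line number.
    """
    if code[:2] != b"\x0f\x0b":
        raise ValueError("bytes do not start with ud2 (0f 0b)")
    # "<QH": little-endian, no padding; unsigned 64-bit then unsigned 16-bit.
    filename_ptr, line = struct.unpack_from("<QH", code, 2)
    return filename_ptr, line


filename_ptr, line = decode_bug_frame(CODE)
print(f"filename pointer: {filename_ptr:#018x}")  # 0xffffffffa006e2bd
print(f"line: {line}")                            # 576
```

With these bytes the line field is 0x0240, i.e. 576, which agrees with "Kernel BUG at journal:576"; the pointer sits in the same 0xffffffffa006.... range as the jbd addresses in the call trace.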

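The assertion that fired, "journal->j_free > 1", guards the count of free blocks in jbd's on-disk circular log. As we understand the 2.6.9 journal_next_log_block(), it takes the block at j_head, advances j_head, decrements j_free, and wraps j_head back to j_first when it reaches j_last. The toy Python model below reproduces only that bookkeeping to show what the assertion checks; ToyJournal and release() are hypothetical names, it is not the kernel code, and it says nothing about why j_free ran out on these systems.

```python
class ToyJournal:
    """Hypothetical user-space model of jbd's circular log bookkeeping.

    Field names mirror those in journal_t for readability; this is a
    simplified illustration, not the kernel structure.
    """

    def __init__(self, first: int, last: int):
        self.j_first = first          # first usable log block
        self.j_last = last            # one past the last usable log block
        self.j_head = first           # next log block to hand out
        self.j_free = last - first    # log blocks not yet handed out

    def next_log_block(self) -> int:
        # Mirrors J_ASSERT(journal->j_free > 1); the kernel BUG()s here instead.
        if not self.j_free > 1:
            raise RuntimeError('Assertion failure: "journal->j_free > 1"')
        blocknr = self.j_head
        self.j_head += 1
        self.j_free -= 1
        if self.j_head == self.j_last:  # wrap around the circular log
            self.j_head = self.j_first
        return blocknr

    def release(self, n: int) -> None:
        # Stand-in for the log tail advancing after a checkpoint frees n blocks.
        self.j_free += n


j = ToyJournal(first=1, last=6)                           # five log blocks
print([j.next_log_block() for _ in range(4)], j.j_free)   # [1, 2, 3, 4] 1
j.release(2)                                               # pretend a checkpoint freed 2
print(j.next_log_block(), j.next_log_block(), j.j_free)   # 5 1 1
```

The model raises where the kernel would BUG(): once only one free block remains, the next request fails. In the report above, that same condition failing under journal_commit_transaction() in kjournald is what produced the oops.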