The affected system is still running:<br />
# uname -a<br />
Linux share8.xxxxxx.xx.xx 2.6.32-431.5.1.el6.x86_64 #1 SMP Wed Feb 12 00:41:43 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux<br />
<br />
Installed packages in question are:<br />
device-mapper-persistent-data-0.2.8-4.el6_5.x86_64<br />
device-mapper-libs-1.02.79-8.el6.x86_64<br />
device-mapper-event-1.02.79-8.el6.x86_64<br />
device-mapper-1.02.79-8.el6.x86_64<br />
device-mapper-event-libs-1.02.79-8.el6.x86_64<br />
lvm2-libs-2.02.100-8.el6.x86_64<br />
lvm2-2.02.100-8.el6.x86_64<br />
<br />
It is a guest on a VMware ESX host.<br />
<br />
The system has 5 PVs (sd[bcdef]) in a VG named "share0".<br />
<br />
We already had several LVs on it and created another one with<br />
(see archive/share0_00036-1607104706.vg)<br />
# lvcreate -l 167 -n ter_analyticlv share0<br />
<br />
The LV was then formatted with mkfs.ext4 and mounted. log/messages recorded this as:<br />
Sep 9 14:00:11 share8 kernel: EXT4-fs (dm-27): mounted filesystem with ordered data mode. Opts: <br />
<br />
<br />
After some hours another LV was created...<br />
(see archive/share0_00038-705821464.vg) using<br />
# lvcreate -l 163 -n euroliblv share0<br />
<br />
The LV was formatted with mkfs.ext4 and mounted. Again, messages said:<br />
Sep 9 17:17:30 share8 kernel: EXT4-fs (dm-28): mounted filesystem with ordered data mode. Opts: <br />
<br />
At the very same time the first problems became visible:<br />
Sep 9 17:17:34 share8 kernel: EXT4-fs error (device dm-27): ext4_mb_generate_buddy: EXT4-fs: group 704: 31712 blocks in bitmap, 21 in gd<br />
<br />
Once we noticed that dm-27 was misbehaving and its filesystem had become corrupted, we tried to figure out what had happened.<br />
<br />
Diffing archive/share0_00036-1607104706.vg and archive/share0_00038-705821464.vg shows that the creation of dm-27 had no visible effect on the LVM metadata.<br />
<br />
Currently the system is in a state where device-mapper knows about dm-27, which is also known as:<br />
# df /dev/mapper/share0-ter_analyticlv<br />
Filesystem 1K-blocks Used Available Use% Mounted on<br />
/dev/mapper/share0-ter_analyticlv<br />
174758768 156585984 17124208 91% /shares/TER_ANALYTIC<br />
<br />
# ls -l /dev/mapper/share0-ter_analyticlv<br />
lrwxrwxrwx 1 root root 8 Sep 9 14:00 /dev/mapper/share0-ter_analyticlv -> ../dm-27<br />
<br />
but LVM doesn't know about the LV at all:<br />
# lvdisplay /dev/mapper/share0-ter_analyticlv<br />
One or more specified logical volume(s) not found.<br />
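<br />
The mismatch above (device-mapper has the device, LVM has no LV for it) can be checked for mechanically. The following is only a minimal sketch: the function names are made up for illustration, and the two input lists would have to be collected separately, for example from the device-mapper and LVM listings in an lvmdump. It relies on the convention that LVM builds the device-mapper name as VG-LV and doubles any hyphen inside a name.<br />

```python
def dm_name(vg, lv):
    """Build the device-mapper name LVM uses for vg/lv.

    Hyphens inside a VG or LV name are doubled, a single hyphen separates them.
    """
    return vg.replace("-", "--") + "-" + lv.replace("-", "--")


def orphaned_dm_devices(dm_names, lvm_lvs, vg):
    """Return dm devices that look like they belong to `vg` but match no LV in LVM.

    dm_names: device-mapper names, e.g. "share0-euroliblv"
    lvm_lvs:  (vg, lv) pairs that LVM reports
    """
    known = {dm_name(v, l) for v, l in lvm_lvs}
    prefix = vg.replace("-", "--") + "-"
    return sorted(
        n for n in dm_names
        # A further hyphen right after the prefix means a different, longer VG name.
        if n.startswith(prefix) and not n.startswith(prefix + "-") and n not in known
    )


if __name__ == "__main__":
    # Names as they appear in this report; the LVM side holds only what lvdisplay finds.
    dm_names = ["share0-ter_analyticlv", "share0-euroliblv"]
    lvm_lvs = [("share0", "euroliblv")]
    print(orphaned_dm_devices(dm_names, lvm_lvs, "share0"))
```

With the names from this report it prints only share0-ter_analyticlv, i.e. the device behind dm-27.<br />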
<br />
The second LV created on that day (dm-28) is:<br />
# df /dev/mapper/share0-euroliblv <br />
Filesystem 1K-blocks Used Available Use% Mounted on<br />
/dev/mapper/share0-euroliblv<br />
41858368 21107760 20493292 51% /shares/eurolib<br />
<br />
# ls -l /dev/mapper/share0-euroliblv<br />
lrwxrwxrwx 1 root root 8 Oct 5 15:42 /dev/mapper/share0-euroliblv -> ../dm-28<br />
<br />
Looking at the output of lvmdump -aclmu we found:<br />
# cat dmsetup_table|egrep '(euro|ter_)'<br />
share0-ter_analyticlv: 0 350224384 linear 8:32 3674212352<br />
share0-euroliblv: 0 83886080 linear 8:32 3674212352<br />
<br />
... same device (/dev/sdc), same start sector<br />
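<br />
To make the overlap explicit, here is a small sketch that reads lines in the "name: start length linear major:minor offset" form shown above and reports pairs of mappings that share sectors on the same device. It is an illustration only: it handles just the linear target, and the function names are hypothetical.<br />

```python
from itertools import combinations


def parse_linear(line):
    """Parse one 'name: start length linear major:minor offset' line."""
    name, rest = line.split(":", 1)
    start, length, target, device, offset = rest.split()
    if target != "linear":
        raise ValueError("only linear targets are handled in this sketch")
    return name.strip(), device, int(offset), int(length)


def find_overlaps(lines):
    """Return (name_a, name_b, device, shared_sectors) for overlapping mappings."""
    segs = [parse_linear(l) for l in lines if l.strip()]
    hits = []
    for (na, da, oa, la), (nb, db, ob, lb) in combinations(segs, 2):
        # Size of the intersection of [oa, oa+la) and [ob, ob+lb), in sectors.
        shared = min(oa + la, ob + lb) - max(oa, ob)
        if da == db and shared > 0:
            hits.append((na, nb, da, shared))
    return hits


if __name__ == "__main__":
    table = [
        "share0-ter_analyticlv: 0 350224384 linear 8:32 3674212352",
        "share0-euroliblv: 0 83886080 linear 8:32 3674212352",
    ]
    for a, b, dev, shared in find_overlaps(table):
        print("%s and %s share %d sectors on %s" % (a, b, shared, dev))
```

For the two lines above it reports 83886080 shared sectors on 8:32, which is the complete euroliblv mapping.<br />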
<br />
But<br />
# pvdisplay -m /dev/sdc<br />
only knows about<br />
Physical extent 1752 to 1791:<br />
Logical volume /dev/share0/euroliblv<br />
Logical extents 0 to 39<br />
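<br />
The sector numbers from the dmsetup table can be translated back into physical extents to see which PEs dm-27 is actually using. This is a rough sketch under two assumptions that are not stated anywhere in the dump: the extent size is inferred by assuming the 350224384-sector mapping is exactly the 167 extents requested with lvcreate -l 167, and the start of the PE area is inferred from the fact that pvdisplay places euroliblv at PE 1752 while its mapping starts at sector 3674212352.<br />

```python
def infer_extent_sectors(length_sectors, extents):
    """Extent size in 512-byte sectors, assuming the mapping covers `extents` whole extents."""
    size, rest = divmod(length_sectors, extents)
    if rest:
        raise ValueError("length is not a whole number of extents")
    return size


def pe_range(offset, length, extent_sectors, pe_start):
    """First and last physical extent covered by a linear mapping."""
    first = (offset - pe_start) // extent_sectors
    count = length // extent_sectors
    return first, first + count - 1


if __name__ == "__main__":
    offset = 3674212352  # start sector of both mappings on 8:32
    ext = infer_extent_sectors(350224384, 167)  # assumption: 167 extents as requested
    pe_start = offset - 1752 * ext  # assumption: this sector is the start of PE 1752
    print("extent size:", ext, "sectors; PE area starts at sector", pe_start)
    print("ter_analyticlv PEs:", pe_range(offset, 350224384, ext, pe_start))
    print("euroliblv PEs:", pe_range(offset, 83886080, ext, pe_start))
```

Under these assumptions the extent size comes out as 2097152 sectors (1 GiB) and the PE area starts at sector 2048, so dm-27 would cover PEs 1752 to 1918 while euroliblv covers 1752 to 1791, matching the pvdisplay output for the latter.<br />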
<br />
<br />
Not surprisingly, both filesystems show (mostly?) the same content after repair.<br />
<br />
# pvdisplay -m /dev/sd[bcdef]<br />
shows all LVs in the system except for the one dm-27 uses. LVM is not aware of the LV at all, but device-mapper happily uses it, destroying the filesystems on both dm-27 and dm-28.<br />
<br />
These lvcreate calls were made by a Perl script calling "system(...)" and checking the return value. We would have noticed if lvcreate had returned an error. It didn't, so the script continued with mkfs.ext4.
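<br />
<br />
Since the exit status alone was not enough here, a script could additionally ask LVM whether it lists the new LV before running mkfs. The sketch below shows the idea in Python; our script is Perl, and the function names and the injectable "run" parameter are made up for illustration. We do not know whether such a check would have caught this case, since we cannot tell at which point LVM lost track of the LV.<br />

```python
import subprocess


def lv_known_to_lvm(vg, lv, run=subprocess.check_output):
    """Return True if `lvs` lists `lv` in `vg`. `run` is injectable for dry runs."""
    out = run(["lvs", "--noheadings", "-o", "lv_name", vg])
    if isinstance(out, bytes):
        out = out.decode()
    return lv in {line.strip() for line in out.splitlines()}


def create_lv_checked(vg, lv, extents, run=subprocess.check_output):
    """Run lvcreate, then refuse to continue unless LVM itself reports the new LV."""
    # check_output raises CalledProcessError on a non-zero exit status.
    run(["lvcreate", "-l", str(extents), "-n", lv, vg])
    if not lv_known_to_lvm(vg, lv, run):
        raise RuntimeError("lvcreate succeeded but LVM does not list %s/%s" % (vg, lv))


if __name__ == "__main__":
    # Dry run with a fake runner that mimics the state described in this report:
    # lvcreate "succeeds", yet lvs only lists euroliblv. No real command is executed.
    def fake_run(cmd):
        return b"" if cmd[0] == "lvcreate" else b"  euroliblv\n"

    try:
        create_lv_checked("share0", "ter_analyticlv", 167, run=fake_run)
    except RuntimeError as exc:
        print(exc)
```

Only after create_lv_checked returns without raising would the script go on to mkfs.ext4.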