We use TMPFS for storing transient scratch data, writing out thousands of files every minute to timestamp-named directories. As a result, we have to clear stale data with a cleanup process that simply removes old, and therefore no longer relevant, files and directories.<br />
<br />
Recently, we encountered an issue where one of these temporary directories could not be removed by our 'rm -rf' cleanup job; even as root, we were unable to move/delete it:<br />
<br />
# rm -rf 1381276560<br />
rm: cannot remove directory `1381276560': Directory not empty<br />
<br />
This was very confusing since, upon inspection, the directory appeared to contain no files:<br />
<br />
# ls -la 1381276560<br />
total 0<br />
drwxrw-r-- 2 user user 60 Oct 8 19:58 .<br />
drwxrw-r-- 5 user user 100 Dec 4 18:10 ..<br />
<br />
We also confirmed, using 'lsof +D', that there were no open file handles pointing to this directory.<br />
<br />
Over the following weeks, this occurred on a number of our servers, which demonstrated that it was not just a random occurrence. On a hunch, we looked at the inode numbers of these unremovable directories, all of which were very close to 2**32:<br />
<br />
4294966751 drwxrw-r-- 2 user user 60 Oct 8 19:58 1381276560<br />
4294957948 drwxrw-r-- 2 user user 60 Oct 9 22:03 1381370460<br />
4294952539 drwxrw-r-- 2 user user 60 Oct 23 12:35 1382545980<br />
4294951887 drwxrw-r-- 2 user user 60 Nov 11 16:46 1384206240<br />
4294947758 drwxrw-r-- 2 user user 60 Nov 13 14:10 1384369680<br />
4294948806 drwxrw-r-- 2 user user 60 Nov 20 18:33 1384990260<br />
4294962748 drwxrw-r-- 2 user user 60 Dec 30 20:04 1388451720<br />
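<br />
To make "very close to 2**32" concrete, here is a minimal sketch that computes how many further inode allocations separate a given inode number from a 32-bit wraparound. The function name allocations_until_wrap is hypothetical and only serves this illustration.<br />
<br />
```c
#include <stdint.h>

/* Hypothetical helper (not from the original post): number of further
 * allocations after `ino` before a 32-bit counter that is incremented
 * by one per inode reaches 0 again, i.e. 2^32 - ino. */
uint64_t allocations_until_wrap(uint32_t ino)
{
    return (UINT64_C(1) << 32) - (uint64_t)ino;
}
```
<br />
For the first directory listed above, allocations_until_wrap(4294966751u) gives 545, so inode 0 would have been handed out only a few hundred allocations after that directory was created.<br />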
<br />
This led us to postulate that a file had been created within each of these directories with inode number 0, which is a reserved value that represents "deleted file not yet removed from disk" -- which could explain why rm, ls et al. don't list it.<br />
<br />
According to <a href="http://stackoverflow.com/questions/4411701/how-are-inode-numbers-generated-in-linux-tmpfs">http://stackoverflow.com/questions/4411701/how-are-inode-numbers-generated-in-linux-tmpfs</a>, "the bulk of the tmpfs code is in mm/shmem.c, but it delegates almost everything to the generic filesystem code in fs/inode.c." The "i_ino" field of the inode struct is set by new_inode(), which simply performs 'inode->i_ino = ++last_ino;'. Since last_ino is a 32-bit unsigned integer, it can overflow and wrap around to 0. On other filesystems, this value is typically overwritten with an unused inode number, but TMPFS does not appear to have any special handling for this.<br />
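<br />
The wraparound can be reproduced in userspace with a few lines of C. This is only a sketch of the arithmetic, not the kernel code: next_ino_naive is a hypothetical name, and the counter is passed in by the caller rather than being a kernel global.<br />
<br />
```c
#include <stdint.h>

/* Userspace illustration only; mirrors the '++last_ino' pattern quoted
 * above. `counter` stands in for the kernel's last_ino (an assumption
 * made for this sketch). Unsigned arithmetic wraps modulo 2^32, so the
 * allocation after UINT32_MAX yields 0. */
uint32_t next_ino_naive(uint32_t *counter)
{
    return ++*counter;
}
```
<br />
Starting from a counter value of UINT32_MAX - 1, successive calls return UINT32_MAX, then 0, then 1.<br />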
<br />
This did suggest, however, that the problem lay in how the directory was listed (that is, in how tools discover the file's name) rather than in the underlying file itself. Reading the directory entries (dentries) for this directory directly with low-level system calls [ e.g. getdents(2) ] should therefore reveal a complete list of inodes and filenames.<br />
<br />
We modified <a href="https://raw.github.com/aidenbell/getdents/master/src/getdents.c">https://raw.github.com/aidenbell/getdents/master/src/getdents.c</a>, which was originally designed as a faster alternative to ls, so that it would only list files with inode number 0:<br />
<br />
- if( d->d_ino != 0 && d_type == DT_REG ) {<br />
- printf("%s\n", (char *)d->d_name );<br />
+ if( d->d_ino == 0 && d_type == DT_REG ) {<br />
+ printf("Inode number %ld: %s\n", d->d_ino, (char *)d->d_name );<br />
<br />
And much to our horror/delight, the mystery filename that neither ls nor rm could locate appeared out of thin air:<br />
<br />
# gcc getdents.c -o getdents<br />
# getdents 1381276560<br />
Inode number 0: 71A800181400<br />
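<br />
For readers who do not want to patch the linked program, the same check can be sketched independently. The following is not the modified getdents.c from above; it is a minimal Linux-only illustration that calls getdents64(2) through syscall(2). The function name list_zero_inode_entries is hypothetical, the struct layout follows the getdents(2) man page, and unlike the diff above it does not filter on DT_REG.<br />
<br />
```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Record layout returned by getdents64(2), as described in the man
 * page. glibc does not export a struct under this name. */
struct linux_dirent64 {
    uint64_t       d_ino;
    int64_t        d_off;
    unsigned short d_reclen;
    unsigned char  d_type;
    char           d_name[];
};

/* Hypothetical helper: print every entry in `path` whose inode number
 * is 0. Returns the number of such entries, or -1 on error. */
int list_zero_inode_entries(const char *path)
{
    _Alignas(8) char buf[32768];
    int found = 0;

    int fd = open(path, O_RDONLY | O_DIRECTORY);
    if (fd < 0)
        return -1;

    for (;;) {
        /* Raw syscall: no library-level filtering of entries. */
        long n = syscall(SYS_getdents64, fd, buf, sizeof buf);
        if (n < 0) {
            close(fd);
            return -1;
        }
        if (n == 0)          /* end of directory */
            break;

        for (long pos = 0; pos < n; ) {
            struct linux_dirent64 *d =
                (struct linux_dirent64 *)(buf + pos);
            if (d->d_ino == 0) {
                printf("Inode number 0: %s\n", d->d_name);
                found++;
            }
            pos += d->d_reclen;   /* advance to the next record */
        }
    }

    close(fd);
    return found;
}
```
<br />
Calling list_zero_inode_entries("1381276560") from a small main() would be the equivalent of the run shown above; on a healthy directory it should print nothing and return 0.<br />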
<br />
This file was completely intact (i.e. contained the correct contents and typical file size for a file in this directory), and could be trivially deleted by name:<br />
<br />
# cat 71A800181400 | wc -c <br />
776<br />
<br />
# rm 71A800181400<br />
rm: remove regular file `71A800181400'? y<br />
<br />
At which point removing its parent directory was no longer an issue (directory block size was restored, etc.), and our problem went away.<br />
<br />
It's possible that this bug has remained unknown because all of the following need to happen for this unlikely situation to occur:<br />
<br />
1) have a server with sufficient uptime to generate ~4.3G files on a device without a reboot; and<br />
2) have the file that would be allocated inode 0 for that device created on the TMPFS partition; and <br />
3) trigger a process which deletes these TMPFS files without knowledge of their name; and finally<br />
4) try to delete the parent directory<br />
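<br />
As a rough feel for condition 1, the sketch below estimates how long a 32-bit counter takes to wrap at a steady allocation rate. The function name days_until_wrap and the example rate of 5000 inodes per minute are assumptions for illustration, not measurements from our servers.<br />
<br />
```c
/* Hypothetical estimate: days of uptime needed for a 32-bit inode
 * counter to wrap, assuming a constant allocation rate. */
double days_until_wrap(double inodes_per_minute)
{
    const double wrap = 4294967296.0;          /* 2^32 */
    const double minutes_per_day = 60.0 * 24.0;
    return wrap / inodes_per_minute / minutes_per_day;
}
```
<br />
At an assumed 5000 inodes per minute, days_until_wrap(5000.0) comes out at roughly 596.5 days, i.e. more than a year and a half without a reboot.<br />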
<br />
Nonetheless, we consider this a bug in TMPFS: there's no reason to hand out a reserved inode number when starting again at 1 would work just fine and avoid this issue entirely.
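<br />
<br />
The "start again at 1" behaviour we have in mind could look roughly like this. It is a userspace sketch under our own assumptions, not a kernel patch; next_ino_skip_zero is a hypothetical name and the counter is supplied by the caller.<br />
<br />
```c
#include <stdint.h>

/* Illustration only: increment a 32-bit counter as before, but never
 * hand out the reserved value 0. When the counter wraps, the next
 * number returned is 1. */
uint32_t next_ino_skip_zero(uint32_t *counter)
{
    uint32_t ino = ++*counter;
    if (ino == 0)            /* wrapped: skip the reserved value */
        ino = ++*counter;
    return ino;
}
```
<br />
With the counter at UINT32_MAX, the next call returns 1 instead of 0. Note that this alone does not guarantee the returned number is still unused by a long-lived file.<br />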