Friday, July 13, 2007

Re: [BLUG] Unix conventions for controlling file access

On Fri, 2007-07-13 at 12:16 -0400, Dave Monnier REN-ISAC wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Mark Krenz wrote:
>
> > It's interesting to note that mysql seems to do this too:
>
> Wow, sure enough.
>
> So I wonder, how does this come into play with reserve blocks, quotas, etc?
> I wonder how much stuff depends on intact metadata.
>

Quotas, block counts, etc. are all handled at the inode level -- not the
directory level. So unlinked-but-open files are just normal files. They
just don't have a name in the filesystem.

You can play around with it by doing something like this in Perl:
-------------------------------------------
#!/usr/bin/perl
use strict;
use warnings;

# create a file and keep the handle open for the whole experiment
open(my $fh, '>', 'foo') or die "can't create foo: $!";
print $fh "hello world!\n";
dumpstat("open filehandle", $fh);
link('foo', 'foo.link') or die "can't link foo: $!";
dumpstat("two links to filehandle", $fh);
unlink('foo') or die "can't unlink foo: $!";
dumpstat("foo is now gone", $fh);
unlink('foo.link') or die "can't unlink foo.link: $!";
dumpstat("foo.link is now also gone", $fh);
# the file still exists and has data in it
close($fh);
# the OS now cleans up, since there are no links and
# the only process with it open has closed it.

# print the inode number and link count behind an open filehandle
sub dumpstat {
    my ($message, $handle) = @_;
    my ($dev, $ino, $mode, $nlink) = stat($handle);
    print $message, "\n";
    print " inode: $ino, nlink: $nlink\n";
}
------------------------------

The filesystem uses the inode as the basis for all file operations. The
names are just syntactic sugar for us mere mortals! There are several
side effects of this design decision:

* Each file (the contents & metadata) is uniquely identifiable by a
pair: (device number, inode number). Running the 'stat' program on the
Perl script above (saved here as inode_fun) shows this information:

File: `inode_fun'
Size: 549 Blocks: 16 IO Block: 4096 regular file
Device: fd00h/64768d Inode: 4718621 Links: 1
Access: (0664/-rw-rw-r--) Uid: (11907/bdwheele) Gid: ( 500/bdwheele)
Access: 2007-07-13 12:34:07.000000000 -0400
Modify: 2007-07-13 12:34:03.000000000 -0400
Change: 2007-07-13 12:34:03.000000000 -0400

The only thing that's in the directory entry is the name 'inode_fun' and
the inode number. The 'Device' part comes from the OS, and everything
else comes from the inode.

* Any number of names can point to a single set of data. Where DOS
complains about "cross-linked files" when running chkdsk, on Unix having
multiple names for a file is a feature.

* Within a single filesystem, mv only changes the name by adding a new
link in the destination and removing the old link in the source. No data
is copied. This is why on early versions of Unix you couldn't mv across
filesystem boundaries (the inode numbers are not unique across devices).
GNU's mv handles this case by making a copy and then removing the
original.

* The modify and change times, which seem redundant, actually measure
different things. The change time (ctime) is when the inode (meta) data
was changed: ownership, permissions, size, etc. Modification time
(mtime) is when the content was changed. Changing the content also
updates the inode, so mtime and ctime are usually the same, but that's
not necessarily true: a chmod or chown updates ctime and leaves mtime
alone.

* That goofy remove-a-file-while-it's-open thing works :)

* If you don't create enough inodes at filesystem creation time, you
can end up unable to create a new file even though there are disk blocks
free. It's not a big deal these days, but it happened to me on floppy
disks a couple of times, where it is tempting to conserve space by not
"wasting" it on inode tables. df -i will give you the inode statistics.
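
If you want to poke at the (device, inode) pair and the mv behavior from
the list above, here is a rough sketch in the same spirit as the earlier
script. The file names (data, data.link, data.moved) and the file_id
helper are made up for illustration, and it assumes the scratch
directory lives on a filesystem that supports hard links:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: return "dev:ino", the pair that identifies the
# file itself rather than any one of its names.
sub file_id {
    my ($path) = @_;
    my ($dev, $ino) = stat($path) or die "can't stat $path: $!";
    return "$dev:$ino";
}

# Work in a scratch directory that is removed when the script exits.
my $dir = tempdir(CLEANUP => 1);
my ($orig, $alias, $moved) = map { "$dir/$_" } qw(data data.link data.moved);

open(my $fh, '>', $orig) or die "can't create $orig: $!";
print $fh "hello world!\n";
close($fh) or die "can't close $orig: $!";

my $id = file_id($orig);

# A second hard link is the same file under another name.
link($orig, $alias) or die "can't link $orig: $!";
my $same_after_link = file_id($alias) eq $id;

# rename() inside one filesystem only rewrites directory entries,
# so the inode (and the data behind it) stays put.
rename($orig, $moved) or die "can't rename $orig: $!";
my $same_after_rename = file_id($moved) eq $id;

print "link shares the inode: ",  ($same_after_link   ? "yes" : "no"), "\n";
print "rename keeps the inode: ", ($same_after_rename ? "yes" : "no"), "\n";
```

Both answers should come back "yes" as long as everything stays on one
filesystem. Across devices rename() fails instead, which is the case mv
has to cover by copying.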

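The mtime/ctime difference from the list above is also easy to see for
yourself. This is only a sketch: the times_of helper is made up, and it
leans on the fact that utime() can backdate mtime but cannot set ctime,
so after the backdating the two times are guaranteed to differ:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: return (mtime, ctime) for a path.
sub times_of {
    my ($path) = @_;
    my @st = stat($path) or die "can't stat $path: $!";
    return @st[9, 10];
}

# Scratch file, removed when the script exits.
my ($fh, $path) = tempfile(UNLINK => 1);
print $fh "hello world!\n";
close($fh) or die "can't close $path: $!";

# Backdate atime and mtime by an hour. The utime() call is itself an
# inode change, so ctime is set to "now" and cannot be backdated.
my $past = time() - 3600;
utime($past, $past, $path) or die "can't utime $path: $!";

# A permission change touches only the inode: ctime moves, mtime doesn't.
chmod(0644, $path) or die "can't chmod $path: $!";

my ($mtime_after, $ctime_after) = times_of($path);
print "mtime untouched by chmod: ",  ($mtime_after == $past       ? "yes" : "no"), "\n";
print "ctime is newer than mtime: ", ($ctime_after > $mtime_after ? "yes" : "no"), "\n";
```

Running 'stat' on a file before and after a chmod shows the same thing
by hand: the Change line moves while the Modify line stays put.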

That's all I can think of off the top of my head. Hopefully it put
everyone into a dreamy Friday afternoon sleep!

Brian

_______________________________________________
BLUG mailing list
BLUG@linuxfan.com
http://mailman.cs.indiana.edu/mailman/listinfo/blug
