Managing "dchunks" ("disk chunks")

These are some notes on the Arusha Project (ARK) notions for managing your "file space". At the moment they are best described as a "sketch"...

We assume "disk management" has already happened: the disks are installed, running, formatted, partitioned, RAIDified, LVMed, etc., and a filesystem has been put onto each resulting "blob of disk space". We are now interested in how those "blobs of space" are divided up, used, and managed. We take no further interest in the "blobs of space" used for base-system stuff (/, /usr, /tmp, /var and so on). We are concerned with what's left: where we put home directories, project space, add-on tools, etc., which is the overwhelming majority of most sites' disk space.

Issues

Some sites we've seen handle "file space management" as follows: (a) give each partition the first random name that occurs to you, adding an /etc/fstab entry accordingly; (b) add a nearly-as-random entry to /etc/exports (or equivalent); (c) ditto for some automount map. So, what's wrong with this picture?
The basic idea

Easy stuff first: /etc/fstab entries. As per our blessed-names doc, on each machine we mount each (virtual) device that carries a filesystem as /._disk<n>, e.g. /._disk1, /._disk2, with corresponding entries in /etc/fstab. (Actually, we might generate /etc/fstab from the ARK information about a host [its XML file]...)

All user access to what's in that disk space will come through automount map entries. These will always look like this:

<thing> <hostname>:/._disk<N>/<mapname>/<thing>

Now, here's the important thing: all automount maps are automagically generated from higher-level information, and each host may see different automount maps.

Key data structure: dchunks ("disk chunks")

In discussing ARK package terminology, we noted that a package is a "logically coherent bundle of bits", but that its bits need not be "co-located".

Our concern in dchunk management is bundles of bits that are, or must be, co-located. Let's call such a bundle a "disk chunk", or dchunk for short (pronounced "duh-chunk"). (Better suggestions more than welcome!) Such a beast can be moved from disk to disk, might be replicated, might be mounted with different options on different hosts, and may have checkable properties (e.g., "all files owned by...").

The data about a specific dchunk lives in its (XML) configuration file. An example might be:
ToDo: this example is OBSOLETE (a version 2 is needed).
<dchunk name="workspace-partain" xml-version="1">
<status>active</status>
<prototypes>
<prototype team="." name="workspace-ALL" />
</prototypes>
<master-hosting>slicker:/._disk2/workspace/partain</master-hosting>
<part-of>slicker-disk2</part-of>
<user-visible-as>/workspace/partain</user-visible-as>
</dchunk>
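To make "automagically generated" a little more concrete, here is a minimal Python sketch that turns dchunk XML files into automount map lines of the form <thing> <hostname>:/._disk<N>/<mapname>/<thing>. It assumes that <master-hosting> always has exactly that shape and that only dchunks whose <status> is "active" get an entry; the functions automount_entry and build_maps are hypothetical, not part of ARK. The embedded sample is the example above, with the path spelled /._disk2 to match the blessed-names convention.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample: the example dchunk, using the /._disk<N> spelling.
DCHUNK_XML = """
<dchunk name="workspace-partain" xml-version="1">
  <status>active</status>
  <prototypes>
    <prototype team="." name="workspace-ALL" />
  </prototypes>
  <master-hosting>slicker:/._disk2/workspace/partain</master-hosting>
  <part-of>slicker-disk2</part-of>
  <user-visible-as>/workspace/partain</user-visible-as>
</dchunk>
"""


def automount_entry(xml_text):
    """Return (mapname, map_line) for an active dchunk, or None otherwise.

    Assumes <master-hosting> looks like <hostname>:/._disk<N>/<mapname>/<thing>.
    """
    root = ET.fromstring(xml_text.strip())
    if root.findtext("status") != "active":
        return None
    hosting = root.findtext("master-hosting")
    host, path = hosting.split(":", 1)
    parts = path.strip("/").split("/")  # e.g. ['._disk2', 'workspace', 'partain']
    if len(parts) != 3 or not parts[0].startswith("._disk"):
        raise ValueError(f"unexpected master-hosting value: {hosting!r}")
    _, mapname, thing = parts
    # Map line format: <thing> <hostname>:/._disk<N>/<mapname>/<thing>
    return mapname, f"{thing} {host}:{path}"


def build_maps(xml_texts):
    """Group the entries of many dchunks into {mapname: sorted map lines}."""
    maps = defaultdict(list)
    for text in xml_texts:
        result = automount_entry(text)
        if result is not None:
            mapname, line = result
            maps[mapname].append(line)
    return {name: sorted(lines) for name, lines in maps.items()}


if __name__ == "__main__":
    for mapname, lines in build_maps([DCHUNK_XML]).items():
        print(f"# map: {mapname}")
        for line in lines:
            print(line)
```

Run as a script, it prints a "workspace" map containing the single line partain slicker:/._disk2/workspace/partain. The sketch deliberately ignores per-host differences (each host may see different maps); modelling that would need additional input, e.g. the host's own XML file.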
Scope of a site's dchunk information

A site will have a bunch of .xml configuration files for its dchunks. From those, we expect to be able to generate all of the following:
System cross-checking with dchunk info

If we combine...
We can also garbage-collect our disk space! Because all the space is systematically named and accounted for, any file or directory that can't be linked back to our XML tables can be ruthlessly deleted. (Most satisfying when you do that. :-)
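A minimal Python sketch of such a garbage-collection pass follows. It assumes that each /._disk<N> holds only <mapname>/<thing> directories, and that the set of known paths has already been pulled out of the <master-hosting> entries for this host (host part stripped). It only reports candidates; the actual deletion is left to a human. The function name unaccounted is hypothetical.

```python
from pathlib import Path


def unaccounted(disk_root, known_paths):
    """Return paths under disk_root that no dchunk accounts for.

    disk_root   -- a /._disk<N> mount point on this host
    known_paths -- absolute <mapname>/<thing> paths taken from <master-hosting>
                   (host part removed), e.g. "/._disk2/workspace/partain"
    Only the <mapname> and <thing> levels are examined: anything inside a
    known dchunk is assumed to belong to that dchunk.
    """
    known = {Path(p) for p in known_paths}
    known_maps = {p.parent for p in known}
    orphans = []
    for mapdir in sorted(Path(disk_root).iterdir()):
        if mapdir not in known_maps or not mapdir.is_dir():
            orphans.append(mapdir)  # a map-level entry that no dchunk claims
            continue
        for thing in sorted(mapdir.iterdir()):
            if thing not in known:
                orphans.append(thing)  # a <thing> with no dchunk behind it
    return orphans
```

For example, unaccounted("/._disk2", {"/._disk2/workspace/partain"}) would list everything on that disk except the workspace map directory and partain's dchunk. Note that filesystem furniture such as lost+found at the top of a disk would also be reported, so a real version would want a small allow-list.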