Managing `dchunks' ("disk chunks")

These are some notes on the Arusha Project (ARK) notions for managing your ``file space''. They can best be described as a "sketch" at the moment...

We assume "disk management" has already happened: the disks are in, running, formatted, partitioned, RAIDified, LVMed, etc., -- and a filesystem has been put onto each resulting "blob of disk space".

We are now interested in how those "blobs of space" are divided up, used, and managed.

We take no further interest in the "blobs of space" used for base-system stuff -- /, /usr, /tmp, /var and so on. We are concerned with what's left: where we put home directories, project space, add-on tools, etc. -- the overwhelming majority of most sites' disk space.

Issues

Some sites we've seen handle "file space management" as follows: (a) give each partition the first random name that occurs to you, adding an /etc/fstab entry accordingly; (b) add a nearly-as-random entry to /etc/exports (or equivalent); (c) ditto for some automount map. So, what's wrong with this picture?

  1. Users (or programs) will start using these now-named chunks of space, and it will be difficult to change the names without causing pain. Or to recover "unused" space, or to move things around on disk, or ... As we discuss elsewhere, how you name file-stuff has wide implications.

  2. Relatedly, it's easy to lose track of what stuff on disk is, or who owns it, or who cares about it, or what it's for...

  3. The entry-in-automount-map approach gives all using hosts the same "view" of the data, with the same mount options, etc. One can imagine wanting finer control (e.g., "student machines get read-only access").

  4. The obvious scheme doesn't do any number of checks that one might imagine -- "everything in each student's home directory must be owned by that student", "no setuid-root binaries", "everything in here should match that tripwire manifest", etc.

  5. Replication and backup issues: One can imagine wanting the automagic replication of chunks of stuff, either for performance reasons (e.g. multiple copies of commonly-used tools on the network) or reliability reasons (rsync user home directories four times a day [between nightly backups]). And: are you sure all of your partitions are in your backup tables?

  6. Mobility issues (laptops): what about chunks of disk space that disconnect and wander away for days at a time? What about sync'ing and backing up that stuff?

The basic idea

Easy stuff first: /etc/fstab entries. As per our blessed-names doc, for each machine, we name each (virtual-)device-with-a-filesystem-on-it as /._disk<n>, e.g. /._disk1, /._disk2, with entries accordingly in /etc/fstab. (Actually, we might generate /etc/fstab from the ARK information about a host [its XML file]...)
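
As a rough illustration of generating those /etc/fstab entries, here is a minimal Python sketch. It assumes a host's filesystems are known as a plain list of device names. The device names, the ext3 filesystem type, the mount options and the dump/fsck fields are all made up for the example; only the /._disk<n> naming comes from the text above.

```python
def fstab_lines(devices, fstype="ext3", options="defaults"):
    """Return one fstab line per device, mounted at /._disk1, /._disk2, ...

    Device names, filesystem type and options are illustrative assumptions;
    only the /._disk<n> mount-point naming is taken from the document.
    """
    lines = []
    for n, device in enumerate(devices, start=1):
        mount_point = f"/._disk{n}"
        # Fields: device, mount point, fstype, options, dump flag, fsck pass.
        lines.append(f"{device}\t{mount_point}\t{fstype}\t{options}\t1\t2")
    return lines


if __name__ == "__main__":
    # Hypothetical devices, for demonstration only.
    for line in fstab_lines(["/dev/sdb1", "/dev/sdc1"]):
        print(line)
```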

All user access to what's in that disk space will come through automount map entries. These will always look like...:

<thing>  <hostname>:/._disk<N>/<mapname>/<thing>

Now, here's the important thing: All automount maps are automagically generated from higher-level information, and each host may see different automount maps.
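
To make "each host may see different automount maps" concrete, here is a small Python sketch that builds the map lines for one client host. The flat tuple representation of a chunk, the function names and the read_only_hosts set are assumptions for illustration, not ARK's actual data model. The optional "-ro" field follows the usual "key [-options] location" layout of automounter maps.

```python
def automount_entry(thing, server, disk_number, mapname, options=""):
    """Format one entry: <thing> [-options] <server>:/._disk<N>/<mapname>/<thing>."""
    location = f"{server}:/._disk{disk_number}/{mapname}/{thing}"
    if options:
        return f"{thing}  -{options}  {location}"
    return f"{thing}  {location}"


def automount_map(chunks, mapname, client_host, read_only_hosts=frozenset()):
    """Build the lines of one automount map as seen by one client host.

    `chunks` is a list of (thing, server, disk_number, mapname) tuples.
    Hosts named in `read_only_hosts` get every entry mounted read-only.
    Both shapes are hypothetical, chosen only to keep the sketch short.
    """
    options = "ro" if client_host in read_only_hosts else ""
    return [
        automount_entry(thing, server, disk, mapname, options)
        for thing, server, disk, chunk_map in sorted(chunks)
        if chunk_map == mapname
    ]


if __name__ == "__main__":
    chunks = [("partain", "slicker", 2, "workspace")]
    print("\n".join(automount_map(chunks, "workspace", "staff1")))
    print("\n".join(automount_map(chunks, "workspace", "lab7", {"lab7"})))
```

The same chunk list yields a plain entry for the (hypothetical) host staff1 and a read-only entry for lab7, which is the "student machines get read-only access" case from the Issues list.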

Key data structure: dchunks ("disk chunks")

In discussing ARK package terminology, we noted that a package is a "logically coherent bundle of bits" but that they need not be "co-located".

Our concern in dchunk management is bundles of bits that are, or must be, co-located. Let's call such a bundle a "disk chunk", or dchunk for short (pronounced ``duh-chunk''). (Better suggestions more than welcome!) Such a beast can be moved from disk to disk, might be replicated, might be mounted with different options (on different hosts), may have checkable properties (e.g., "all files owned by...") ...

The data about a specific dchunk is in its (XML) configuration file. An example might be:

ToDo: OBSOLETE OBSOLETE OBSOLETE (need version 2)

<dchunk name="workspace-partain" xml-version="1">
<status>active</status>
<prototypes>
    <prototype team="." name="workspace-ALL" />
</prototypes>

<master-hosting>slicker:/._disc2/workspace/partain</master-hosting>

<part-of>slicker-disc2</part-of>

<user-visible-as>/workspace/partain</user-visible-as>
</dchunk>
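
For a feel of how such a file might be consumed, here is a minimal Python sketch that reads the (obsolete, version 1) example above with the standard xml.etree.ElementTree module. The read_dchunk function and the dictionary keys are hypothetical names chosen for this sketch; ARK's own loader may look quite different.

```python
import xml.etree.ElementTree as ET

# The version-1 example from the text, reproduced as a string.
EXAMPLE = """\
<dchunk name="workspace-partain" xml-version="1">
<status>active</status>
<prototypes>
    <prototype team="." name="workspace-ALL" />
</prototypes>
<master-hosting>slicker:/._disc2/workspace/partain</master-hosting>
<part-of>slicker-disc2</part-of>
<user-visible-as>/workspace/partain</user-visible-as>
</dchunk>
"""


def read_dchunk(xml_text):
    """Parse a version-1 dchunk description into a plain dictionary."""
    root = ET.fromstring(xml_text)
    # <master-hosting> has the form "<host>:<path>"; split at the first colon.
    host, _, path = root.findtext("master-hosting").partition(":")
    return {
        "name": root.get("name"),
        "status": root.findtext("status"),
        "prototypes": [p.get("name") for p in root.iter("prototype")],
        "host": host,
        "path": path,
        "part_of": root.findtext("part-of"),
        "user_visible_as": root.findtext("user-visible-as"),
    }


if __name__ == "__main__":
    print(read_dchunk(EXAMPLE))
```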

Scope of a site's dchunk information

A site will have a bunch of .xml configuration files for its dchunks. From those, we expect to be able to generate all of the following:

/etc/fstab for each host:
Some dchunks are marked as being filesystems and, from these, we should be able to generate /etc/fstabs.

(Hmmm... I don't know if it's really worth it. I think it probably would be worth an fstab-checker, notably to make sure that everything in fstab is getting backed up correctly...)

/etc/exports (or equiv) for each host [assuming NFS]:
Instructions for NFS-serving hosts.

automount maps for each host [assuming NFS]:
Instructions for NFS-client hosts.

(We should also be able to check that the servers' setups match up sanely with the clients' setups.)

/etc/auto_master for each host [assuming NFS]:
(NB: don't do this yet...)

ditto all of the above, but for Samba...
... or other non-NFS file-sharing mechanism

Backup tables
e.g. disklist for Amanda

We would like to be certain that everything that needs backing up is backed up, correctly. Maybe also: generate random restores-you-should-try for testing purposes.

Replication scripts
If we are replicating some disk chunks (for performance or reliability reasons), then the dchunks system should generate the scripts for us.

Checking code
Possibly automagically run from cron

Documentation about what's where
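
The fstab-checker idea mentioned above (is everything in fstab actually getting backed up?) could be sketched in Python along these lines. It assumes the simple three-field form of an Amanda disklist line (hostname, diskname, dumptype) and the usual whitespace-separated fstab fields. The function names and the sample dumptype are hypothetical.

```python
def fstab_mount_points(fstab_text, prefix="/._disk"):
    """Return the mount points in an fstab that start with `prefix`."""
    points = set()
    for line in fstab_text.splitlines():
        fields = line.split("#", 1)[0].split()  # drop comments, split fields
        if len(fields) >= 2 and fields[1].startswith(prefix):
            points.add(fields[1])
    return points


def disklist_disks(disklist_text, host):
    """Return the disk names listed for `host` in an Amanda-style disklist.

    Assumes the simple three-field form: hostname, diskname, dumptype.
    """
    disks = set()
    for line in disklist_text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 3 and fields[0] == host:
            disks.add(fields[1])
    return disks


def unbacked_mount_points(fstab_text, disklist_text, host):
    """Mount points present in the fstab but absent from the disklist."""
    missing = fstab_mount_points(fstab_text) - disklist_disks(disklist_text, host)
    return sorted(missing)


if __name__ == "__main__":
    fstab = "/dev/sdb1 /._disk1 ext3 defaults 1 2\n/dev/sdc1 /._disk2 ext3 defaults 1 2\n"
    disklist = "slicker /._disk1 user-tar  # dumptype name is made up\n"
    print(unbacked_mount_points(fstab, disklist, "slicker"))
```

A check like this only reports; deciding what to do about an uncovered mount point is left to a human.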
 

System cross-checking with dchunks info

If we combine...
  • dchunks configuration info
  • package configuration info
  • user configuration info
... then one can imagine a whole variety of cross-checks that are possible. E.g. finding files owned by now-deleted accounts.
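
The "files owned by now-deleted accounts" check, for instance, might look roughly like this in Python. The known_uids set stands in for whatever the user configuration info provides; how that set gets built is outside this sketch, and the function name is hypothetical.

```python
import os


def files_with_unknown_owner(top, known_uids):
    """Yield (path, uid) for everything under `top` whose owner is not in known_uids."""
    for dirpath, dirnames, filenames in os.walk(top):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            uid = os.lstat(path).st_uid  # lstat: do not follow symlinks
            if uid not in known_uids:
                yield path, uid


if __name__ == "__main__":
    # Hypothetical usage: report anything under the current directory that is
    # not owned by the invoking user.
    for path, uid in files_with_unknown_owner(".", {os.getuid()}):
        print(uid, path)
```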

We can also garbage-collect our disk space! Because all the space is systematically named and accounted for, any file/dir that can't be linked back to our XML tables can be ruthlessly deleted. (Most satisfying when you do that :-)
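
A cautious first step towards that garbage collection is a report rather than a delete. The Python sketch below lists every <mapname>/<thing> directory under one disk that has no matching known path. The known_paths set is assumed to have been built from the dchunk XML files; that step, and the function name, are hypothetical.

```python
import os


def unaccounted_dirs(disk_root, known_paths):
    """List <disk_root>/<mapname>/<thing> entries that are not in known_paths.

    `known_paths` would come from the dchunk XML files; here it is simply a
    set of absolute paths. Nothing is deleted: the function only reports.
    """
    known = {os.path.normpath(p) for p in known_paths}
    unaccounted = []
    for mapname in sorted(os.listdir(disk_root)):
        map_dir = os.path.join(disk_root, mapname)
        if not os.path.isdir(map_dir):
            continue  # only <mapname> directories are expected at this level
        for thing in sorted(os.listdir(map_dir)):
            path = os.path.normpath(os.path.join(map_dir, thing))
            if path not in known:
                unaccounted.append(path)
    return unaccounted
```

Whether anything on the resulting list is then "ruthlessly deleted" remains a decision for the administrator.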


© The Arusha Project, 2000-2003; team: sidai; c/o partain@users.sourceforge.net; revision 1.8, 2004-05-26.