Motivations for the Arusha Project

The Arusha Project (ARK) provides a framework for collaborative system administration of multi-platform Unix sites with many dozens of machines.

Are you satisfied with your sysadmin?

Set aside your personal Linux box, with which you can do as you like. Think back to when you were a lowly user at a "dozens of machines" Unix site, perhaps in a university department, or perhaps at work... Did you think it was a fantastic system? Did it have all the tools you could wish for? Did it tick along like clockwork? Were the sysadmins god-like figures with magic fingers and brains the size of small planets?

Well, probably not. Odds are you thought the sysadmins were bozos, and that you could do better. Maybe. Maybe not. System administration is a hard (and mostly thankless) task; here we outline the general problems the discipline faces, which we hope to solve.

Sysadmins are mediocre at their jobs!

A modern Unix system is built up from, what?, a few hundred subsystems... a kernel (hardly a simple monolithic beast), sendmail, BIND, various authentication services, Web server(s), hundreds of utilities, RAID goodies, and on and on.

On more than a few of these subsystems, you can make a seriously good living by being a true expert on that one thing alone.

A good sysadmin is probably pretty good at dealing with a handful of the above subsystems, and can make a passable stab at dealing with a bunch of the others. But s/he is bound to be a long way from a true expert across the board.

In this sense, sysadmins -- especially those who work alone -- are unavoidably mediocre at their jobs. The Arusha Project fights this inherent limitation by making fruitful collaboration among sysadmins possible.

The basic problems we are trying to solve

Think about Unix sites with, say, 100 hosts, of more than one flavor (e.g. Linux+Solaris). A good system administrator (sysadmin) "adds value" to the machines-as-delivered-by-the-vendor in all sorts of ways: configuring printers, installing packages other than the vendor-supplied ones, designing/implementing a backup strategy, writing Web pages to explain "where things are", etc.

The Wayward Bus Problem

Our first "problem" is that a huge fraction of our systems' "added value" may reside only in the sysadmin's head, and, if the proverbial bus runs over him/her, then ... oops.

The Wheel Reinvention Problem

The "added value" a good sysadmin provides at Site A probably has much in common with that at Site B, which is painful: without help, you get wheel reinvention on a grand scale. (NB: it won't be identical, because at the heart of sysadmin you have Immovable Local Realities -- e.g. different printers, different "security guidelines", different critical apps, etc.)

The Isolation Problem

The sysadmin software that does get written (usually "scripts") is often of poor quality from a software engineering point of view. It is written by one person, not reviewed, not tested in any formal way, often not configuration-managed, etc., etc.

All too often, the underlying reason for this is that the sysadmin works alone, and has no effective "community of practice".

The Record-Keeping Problem

Sysadmins have lots of information that they should keep track of (in some place other than their heads). Stuff about hosts, users, vendors, potential vendors, maintenance contracts, other contacts, clients, old purchase orders, licenses, serial numbers, spare-parts inventory, helpline numbers and addresses, e-mail about all of the above, etc., etc., etc.

We would really like to manage all of the above in some uniform and consistent way.

The Setup-to-Disaster Lag Problem

When a sysadmin sets up a subsystem -- say, software RAID on a new Linux box -- all the facts, figures, and details are in his/her head. They are on the case.

In the natural run of things, a Big Problem with this subsystem may not arise until many years later. By then, the facts, figures, and details are emphatically not in his/her head, even if his/her head hasn't wandered off to another job.

The Multiple-Change Problem

Obviously, if you have all that info about hosts, users, vendors, etc., you would like to operate on it. But not manually. Making the "same" change manually on 30 hosts is almost certain to leave behind mistakes. Instead, we need "site at a time" ways of doing things -- e.g. a single command to install a package across all required platforms, a single command to check something for all user accounts, etc.
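To make the "site at a time" idea concrete, here is a minimal Python sketch built on assumptions of our own: a site inventory of hosts tagged by platform, and a hypothetical table of per-platform install command templates. The `Host`, `INSTALL_TEMPLATES`, and `plan_site_install` names and the command strings are illustrative only, not part of ARK and not tested recipes. One call produces the per-host commands for the whole site, and it refuses outright if any platform lacks a recipe rather than silently skipping hosts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Host:
    name: str
    platform: str  # e.g. "linux" or "solaris"


# Hypothetical per-platform command templates; a real site would keep
# these in its own configuration, not hard-coded here.
INSTALL_TEMPLATES = {
    "linux": "rpm -i {pkg}.rpm",
    "solaris": "pkgadd -d {pkg}.pkg",
}


def plan_site_install(hosts, pkg):
    """Return one (host name, command) pair per host for installing pkg.

    Raises ValueError if any host's platform has no template, so a
    site-wide change never quietly leaves some machines behind.
    """
    unknown = sorted({h.platform for h in hosts} - set(INSTALL_TEMPLATES))
    if unknown:
        raise ValueError("no install recipe for platform(s): " + ", ".join(unknown))
    return [(h.name, INSTALL_TEMPLATES[h.platform].format(pkg=pkg)) for h in hosts]
```

The sketch only builds a plan; actually running the commands on each host, and recording what happened, is deliberately left out.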

The Sharp-Knives Problem

Sysadmins almost always have to make changes in the context of a live, production system. You can't say, "Everyone off! All systems going to single-user mode so we can upgrade Emacs!"

What's more, sysadmins are often doing potentially catastrophic things. For example, propagating a broken (e.g. zero-length) /etc/passwd will make your whole system quite useless.

Sysadmins do their jobs with sharp knives in a crowded room.
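One cheap defence against the zero-length /etc/passwd scenario is to sanity-check a file before pushing it anywhere. The Python sketch below assumes the standard seven-field, colon-separated passwd format; the function names and the 10% shrink threshold are hypothetical choices, not an ARK mechanism. It rejects an empty file, malformed lines, non-numeric IDs, duplicate users, a missing UID-0 root entry, and a suspicious drop in the number of entries compared with the current file.

```python
def passwd_problems(text: str) -> list[str]:
    """Return problems found in a proposed passwd file; [] means it looks sane."""
    lines = text.splitlines()
    if not any(line.strip() for line in lines):
        return ["file is empty"]
    problems = []
    seen = set()
    root_ok = False
    for n, line in enumerate(lines, start=1):
        # Expected layout: name:password:UID:GID:GECOS:home:shell
        fields = line.split(":")
        if len(fields) != 7:
            problems.append(f"line {n}: expected 7 fields, got {len(fields)}")
            continue
        name, _password, uid, gid = fields[:4]
        if not (uid.isdigit() and gid.isdigit()):
            problems.append(f"line {n}: non-numeric UID or GID")
        if name in seen:
            problems.append(f"line {n}: duplicate user {name!r}")
        seen.add(name)
        if name == "root" and uid == "0":
            root_ok = True
    if not root_ok:
        problems.append("no 'root' entry with UID 0")
    return problems


def safe_to_propagate(current: str, proposed: str, max_loss: float = 0.1) -> list[str]:
    """Check the proposed file, and also refuse if it lost too many entries."""
    problems = passwd_problems(proposed)
    old_n = sum(1 for line in current.splitlines() if line.strip())
    new_n = sum(1 for line in proposed.splitlines() if line.strip())
    if old_n and new_n < old_n * (1 - max_loss):
        problems.append(f"entry count dropped from {old_n} to {new_n}")
    return problems
```

A propagation step would push the file only when `safe_to_propagate` returns an empty list, and otherwise stop and show the problems to a human.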

The Documentation Problem

A site's pile of sysadmin "added value" will quickly become big/complex enough that it will be impenetrable to all but the original author, no matter how systematically and conscientiously recorded -- and you're back to the run-over-by-a-bus problem.

The well-collected pile of sysadmin "added value" needs to be presented in a useful way. Or, more precisely, ways. (It is unlikely that the same presentation will work, for example, for ordinary users and for Arusha collaborators.)

As "collaborative sysadmin" is a core aim of the Arusha Project, finding a way to make one's work presentable to others is a key problem.

Problems we are not trying to solve

We are not trying to solve the how-to-administer-my-Linux-box problem. Lots of people are doing useful work on that.

Another problem we don't plan to solve is the "push a button and a cold, dead machine springs to life" problem; i.e., the auto-configuration of an OS and an initial software set onto a box. (We may well take advantage of others' solutions to this problem, however.)

We are also not trying to solve how to put a GUI on the front of an Arusha-like sysadmin system. (We don't mind if others try, though.)

Deconstructing the project description

To recap: "The Arusha Project provides a framework for collaborative system administration of multi-platform Unix sites with many dozens of machines."

These words are carefully chosen. Here are the details.

"collaborative system administration":
Many sysadmins work in isolation and are, therefore, constantly reinventing the wheel. What's more, their often-never-reviewed "solutions" are frequently of poor quality (having been born in haste/panic :-).

We think some sysadmin equivalent to "open-source" collaborative software development is needed; that's what the Arusha Project is trying to figure out.

"framework":
We want to create a framework in which sysadmins can collaborate on solutions (using existing tools). (If we happen to create some useful tools of our own along the way, well, that's OK, too.)

"Unix sites":
That's what we've got and care about; we don't want to invest initial effort in other platforms (e.g. Windows). We welcome others' work to broaden the project scope -- we'd be happy to drop the "Unix" qualifier as long as we don't have to do it :-)

"multi-platform ... sites":
A common way to increase sysadmin productivity is to limit a site to one kind of machine; "we're a Solaris shop" -- that kind of thing. We are resolutely multi-platform, mostly because of the discipline it provides. (See: "multiplatform-ism")

"many dozens of machines":
We have already said we aren't particularly interested in the one- or two-machine sysadmin problem. At the other end, we don't want to play on the "thousands and thousands of machines" end of the scale. Or the "it must keep working 24x365, and it's $10M/hour if it doesn't" end of the scale.

So we characterize our niche as "many dozens of machines". Lab-sized; workgroup-sized; department-sized; "house of geek with lots of computers"-sized :-) -- we can grow up or down from there...

(If we were marketeers, we would've said "hundreds of machines". But we're not.)


© The Arusha Project, 2000-2003; team: ARK; c/o partain@users.sourceforge.net; revision 1.5, 2004-05-26.