Key ideas of the Arusha Project

These are some notes on the key ideas of the Arusha Project (ARK) which spread across various parts of system administration. (These ideas follow from the project motivations...)

Serious (but fun) sysadmin

The Arusha Project is about serious professional care and feeding of heterogeneous Unix systems of many dozens of machines. "Sysadmin of convenience" is not our thing. The solo sysadmin with a few Linux boxes may or may not benefit from what we're trying to do.

Besides all the usual sysadmin desiderata (reliability, resilience, fantastic uptimes, etc.), we are concerned about things such as: being able to account for what's on every disk on every box; a consistent "user interface" across all systems; smooth migration of packages/users/types-of-kit into or out of the overall system; complete and absolutely accurate documentation about everything (yeeps!); being able to roll back system configurations; having good log info of who did what; etc., etc. At many sites, they get "serious" about sysadmin by piling on the "process" or "methodology" -- but hey!, that looks like soul-destroying bureaucracy to us :-) We reckon that putting together great systems and improving them should be, well..., fun, and something's wrong if it isn't.

User-centered systems

It is all too easy to develop a system that's great for the sysadmins; too bad about those pesky users. (Of course, there are many systems where users aren't really a factor.)

Part of our "user-centeredness" is because we happen to be interested in systems where the users need a rich computing environment -- lots of tools, the opportunity to "try new stuff", the odd piece of weird-and-wonderful hardware, etc -- e.g. some university departments. Bureaucratic solutions of the form "every X will do exactly Y, no exceptions" do not really appeal to us. But a user focus has a much deeper effect: it changes what sysadmin is all about!
Sysadmin is about making effective users, not just effective systems. We therefore consider it within a sysadmin's purview to deal with questions such as "How do I keep from being overwhelmed by e-mail?", "How do I become an expert Emacs user?" "What's the best tool for simple line drawings?" In this context, our notion of sysadmin "added value" is much wider than most people's. The Arusha Project is about collaborating on these wider forms of "added value", too.

Manage for (systematic) change

At the heart of good system administration is being good at change. This is different from the more typical sysadmin focus, namely "just get it to work".

It should be easy/quick to install (or remove) a package/user/whatever. It should be easy/quick for documentation related to that something to come to reflect that change (either automagically or by hand). Etc. "Quick hacks" are easy to get into, but can be hard to get out of. Part of being good at change is anticipating where you might want to go with your systems (and why), and planning/designing accordingly. An easily-changed system, however poor, can become a good system, because you're good at the one thing that will get you there!

Manage for long-lived, entropy-free systems

Left to its own devices, a computing infrastructure degrades over time, if for no other reason than the users' changing requirements.

Many people regard this "bitrot", or "entropy", as inevitable. You put in a system, and it's good. It slowly degrades. You eventually have to throw it out and invest in a new one. We think that's bunk. With the right discipline, we can have systems that remain ``leading-edge'' for decades.

A sysadmin equivalent to open-source software development

The Arusha Project (ARK) is a stab at a sysadmin equivalent to open-source software development. (Well, in part, it is open-source software development.)

System administration is about "adding value" to the base systems provided by vendors.
Some of a team's added value is utterly unique, some is the same as the next guy's. ARK is fundamentally about providing a way to express unique "added value" (systematically) but also being able to use "added value" that others have made available. Systematically-described "added value" is valuable even if it isn't actually "re-used". That's because it is nearly as useful to see complete worked-out examples. If I'm trying to set-up/improve a Web server, nothing is more helpful than a handful of known-to-be-correct, working examples to study. Besides some technology and processes, a "sysadmin equivalent to open-source" will require an amenable "culture". We're not sure what it will be, but we're in favor of it :-) (See our sysadmin history for a little more about sysadmin and open-source.)

Abstraction above existing tools

We are not even slightly interested in reinventing all the tools needed for Unix system administration. Loosely speaking, we want to put a "clean" abstraction layer above existing tools, and add things only when we have to.

Object-oriented view of all sysadmin entities

If you want across-the-board automation of sysadmin activities, you need to be able to manipulate (abstract representations of) all the "entities" that sysadmins deal with: software packages, machines, users, vendors, maintenance agreements, etc., etc.

A crucial idea in ARK is to think of all these things as objects, with attributes and methods (i.e. the usual object-oriented programming gig). We want to have code that can manipulate all of those things in a uniform framework. We want to write code that looks like...
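
    # usersMgr stands for whatever object manages the site's users
    me = usersMgr.lookup("Tommy Kelly")
    top_box = me.mainMachine()
    support = top_box.supportContract()
    # no maintenance agreement covers this user's main machine
    if support is None:
        print('I should get a main machine that is supported')
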
Obviously, there may be many instances of a particular kind
of thing at a site. For example, all the users are
"instances" of a user class. We describe the
attributes/methods of a user in the class, and that then
applies to all instances thereof. (As I say, obvious.)
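
For readers who want to run the lookup snippet above, here is a minimal, self-contained sketch. The classes (`User`, `Machine`, `UsersMgr`), the host name, and the in-memory registry are hypothetical stand-ins invented for illustration; only the method names `lookup`, `mainMachine`, and `supportContract` come from the snippet, and none of this claims to be ARK's actual implementation.

```python
# Hypothetical stand-ins for ARK objects; not ARK's real classes.

class Machine:
    def __init__(self, name, support_contract=None):
        self.name = name
        self._support_contract = support_contract

    def supportContract(self):
        """Return the maintenance contract for this machine, or None."""
        return self._support_contract


class User:
    def __init__(self, full_name, main_machine):
        self.full_name = full_name
        self._main_machine = main_machine

    def mainMachine(self):
        return self._main_machine


class UsersMgr:
    """In-memory registry; every user is an instance of the one User class."""

    def __init__(self):
        self._users = {}

    def add(self, user):
        self._users[user.full_name] = user

    def lookup(self, full_name):
        return self._users[full_name]


def support_warning(user):
    """Return a warning if the user's main machine has no contract, else None."""
    if user.mainMachine().supportContract() is None:
        return "I should get a main machine that is supported"
    return None


usersMgr = UsersMgr()
usersMgr.add(User("Tommy Kelly", Machine("box1")))  # made-up host, no contract

me = usersMgr.lookup("Tommy Kelly")
message = support_warning(me)
if message is not None:
    print(message)
```

The point of the sketch is only that the behaviour is written once, in the class, and every user or machine created from it gets the same methods.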
One can also imagine "attributes/methods" of users that are common to many but not all of them. For example, all the 4th-year students might have home directories on one file server, but the 3rd-years are somewhere else.

A very important ARK goal is that information is specified once. We want to be able to say that user "John Public" is a 4th-year, and then have all the relevant information that applies to all 4th-years automagically appear when we ask about John. However, if John Public is an unusual 4th-year and happens to have a special file server for his home directory, then we must be able to override that attribute just for him (which is otherwise common to all 4th-years).

One way we do this is to distinguish between actual users and prototype users. "John Public" is an actual user, and he derives info from the "4th-year" prototype user. This exact same way of looking at things applies not just to users, but to packages, hosts, contracts, contacts, user roles, etc., etc. (Our object model document explains various aspects of this ``object thinking'', and our configuration language document explains how to write ARK ``objects'' of your own.)

Not just once per site, but once per planet

The mechanisms outlined so far suggest how we avoid information duplication within a site. ARK tries to go further.

What you need to know about users, packages, contracts, machines, etc., etc., at one Unix site is much like what you need to know at another. If I have a Sun Ultra-10 and you have one too, then there's a whole bunch of information that we could "common up" -- info about busses, L1 cache size, disk type, etc. Why should we duplicate each other's work in figuring out what info to record, finding the info, typing it in, and maintaining it? Similarly, if I install GCC 2.95.2 for Solaris 7, and you do the same, the odds are that we did very similar steps and worked around the same problems (if any).
Therefore, fundamental to the Arusha Project is the idea that you can build up the "information"/"description" for your team/site from that of collaborating teams anywhere on the Internet. For example, say you want to install the GNU coreutils-5.2.1 package. It is very likely that you will do this in a way (almost) identical to every other team that installs it. That "standard way" of doing the install should not have to be specified for each team; rather, it should be specified once-for-all under some "core" team, then all teams who wish can say, "if I don't tell you explicitly, try to get the info from that team over there".

"Site at a time" thinking

The Arusha Project (ARK) is concerned with systems made up of (many) "dozens" of machines. At that scale, walking around from box to box, doing/fixing something, is absolutely out.

Even ad-hoc hand-typed for-loops that run some command across multiple machines (e.g. ... for i in ... ; do ssh $i /some/command ... ; done...) are kinda uncool. Instead, we'd prefer commands/object-methods/whatevers that "naturally" work across a whole site.

Automation is everything (GUIs are not)

We're more interested in automating (as much as possible of) sysadmin than we are in presenting pretty screens about same (i.e. a GUI). We're not saying it isn't important, but it's not our thing.

(Slightly contradictory stuff: under The Documentation Problem.)

`Staging' is good

We've commented elsewhere about the sharp knives problem, i.e., a sysadmin can wreak lots of havoc quickly, especially with good automation tools.

One general idea that ameliorates the problem is staging; that is, separating the final "make it go live" step of an activity from all the preparatory steps that come before. By staging the overall activity as two parts (prepare, finish), we can insert an intermediate activity of "check that things are likely to work when we push the Big Red Button".
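
The prepare/check/finish split can be sketched roughly as follows. This is a minimal illustration under stated assumptions: the `StagedChange` class, its method names, the ".staged" suffix, the "non-empty file" check, and the sample file contents are all made up for the example and are not part of ARK.

```python
import os
import tempfile


class StagedChange:
    """Hypothetical two-stage file update: prepare, check, then finish."""

    def __init__(self, live_path):
        self.live_path = live_path
        self.staged_path = live_path + ".staged"

    def prepare(self, new_content):
        # All preparatory work happens away from the live file.
        with open(self.staged_path, "w") as f:
            f.write(new_content)

    def check(self):
        # Intermediate activity: is the staged result likely to work?
        # The (made-up) rule here is "staged file exists and is non-empty".
        return (os.path.exists(self.staged_path)
                and os.path.getsize(self.staged_path) > 0)

    def finish(self):
        # The "Big Red Button": refuse to go live unless the check passes.
        if not self.check():
            raise RuntimeError("staged change failed its check; not going live")
        os.replace(self.staged_path, self.live_path)


workdir = tempfile.mkdtemp()
live = os.path.join(workdir, "hosts.allow")
with open(live, "w") as f:
    f.write("ALL: LOCAL\n")

change = StagedChange(live)
change.prepare("ALL: LOCAL\nsshd: 192.0.2.\n")
with open(live) as f:
    before_finish = f.read()   # live file is untouched after prepare
change.finish()
with open(live) as f:
    after_finish = f.read()    # live file now holds the staged content
```

Because `finish` is the only method that touches the live file, everything before it can be repeated or thrown away without consequence.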
Another variant is to do the "finish" activity on a subset of hosts (or users or maintenance agreements, etc.) So, we might finalize our "tcp wrappers" changes on one or two test hosts, satisfy ourselves that things are OK, then finish the job for all other hosts.

Sometimes, `staging' is also `separation of concerns' (teasing an activity apart into its constituent elements), which is good from a design point of view. If we get better control of the steps/stages of an overall activity, new possibilities emerge. An example is how, in package management, we distinguish between `installing' (creating the needed set of bits), `deploying' (setting up those bits on each host), and `revealing' (the package to the users). By separating them, we can do interesting things at each stage.

A system should have its `source code'

With software, you have the bits you run, and you have the source code -- stuff that human beings crafted, and from which the bits-you-run can be automatically recreated. The source code is a minimal capturing of the information content of the software. The executable bits are merely a useful elaboration of the source.

A computing infrastructure should have its `source code', too. You should be able to point to a pile of human-created bits and say, ``With those, I could recreate my infrastructure exactly.''

Once you have ``source code'' for your infrastructure, you can do all the obvious Good Things that we do with software source. One obvious example is to apply configuration management to our sysadmin "added value" (source code). However you do it, it isn't optional -- all those files that you tweaked in /etc must be under config-management control. We are agnostic about what configuration-management system to use; we are likely to show a CVS bias, just because that's what we have.

In using ARK, a team is likely to draw on the offerings of other teams.
Each team's "stuff" will come from one or more "repositories", which could be a CVS repository, or a .tar.gz file sitting in the corner.

Sysadmin without Magic Constants

Sysadmins write a lot of scripts and such which couldn't possibly be reused at another site. The main reason is that the scripts are littered with "magic constants" -- hardcoded pathnames, local hostnames, email addresses (part of the general category of Immovable Local Realities) -- that don't make any sense at another site.

The core ARK code provides ways to keep all those "magic constants" in team-specific places, so that scripts can be written to be generic.

The Documentation Problem

System administration is about systematically encoding the "value added" to the base systems provided by vendors. But how great is it to have all that carefully developed "added value" if you can't find what you want? A sysadmin team of (say) five good people will, over a few years, generate piles of stuff, some of it very complex, a lot of it fiddly, and with more than a fair share of delicate interactions and dependencies among the various parts.

How on earth do you "document" such a pile of sysadmin added value? It's a really really hard problem, and we don't know of a good solution. It's one of the main pieces of the puzzle that we don't have a handle on yet.

Constraints

Certain objects' methods shouldn't be invoked unless certain constraints are satisfied. For example, some packages require that a special user be set up for their use. The "check-install" method for such a package should not give the all-clear unless the special user exists (a constraint). Conversely, invoking some methods causes constraints to be satisfied.

A simple constraints mechanism exists in ARK; see the documentation (???).
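
As a rough illustration of the special-user example, the sketch below shows one way a "check-install" method could withhold the all-clear until a constraint holds, and how invoking another method can satisfy it. Everything here (the `Site` and `Package` classes, `check_install`, `unmet_constraints`, `required_user`, and the package name "somedaemon") is hypothetical; ARK's own constraints mechanism may look quite different.

```python
class Site:
    """Hypothetical registry of the users that exist at a site."""

    def __init__(self):
        self.users = set()

    def add_user(self, name):
        # Invoking this method causes "user exists" constraints to be satisfied.
        self.users.add(name)


class Package:
    """Hypothetical package object with an optional special-user constraint."""

    def __init__(self, name, required_user=None):
        self.name = name
        self.required_user = required_user

    def unmet_constraints(self, site):
        # Collect a human-readable description of each unsatisfied constraint.
        unmet = []
        if self.required_user is not None and self.required_user not in site.users:
            unmet.append("user '%s' must exist" % self.required_user)
        return unmet

    def check_install(self, site):
        # Give the all-clear only when no constraint is left unsatisfied.
        return not self.unmet_constraints(site)


site = Site()
pkg = Package("somedaemon", required_user="somedaemon")

first_check = pkg.check_install(site)    # special user is missing
problems = pkg.unmet_constraints(site)
site.add_user("somedaemon")              # satisfies the constraint
second_check = pkg.check_install(site)
```

Keeping the list of unmet constraints separate from the yes/no answer means the caller can report why the all-clear was withheld.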