Grokking the ARK code
The all-important ARK ``engine'', which implements the ARK configuration language, is
in the `arkbase' package, provided by team `ARK'.
Understanding foo.bar()
is important in this context...
Anyway, here's a quick overview of the method to the madness
(programmer notes).
- Some easy Python modules first:
- ark/utils.py -- simple utility code
- ark/errors.py -- centralised list of exceptions raised
- Similarly, we've ``borrowed'' some code from others
(thanks, folks!):
- DPyGetOpt.py -- command-line option handling; by Bill Bumgarner
- Some ``lower-level'' Python parts of the machinery:
- The ark/control.py module handles all the
global "program control", including command-line-option
grokking and reading an ``ARK profile''; it does
this for all of the ark<thing>
programs. The DPyGetOpt module (above) does the
heavy lifting for the option parsing.
- ark/event.py creates and reads ArkEventRecords,
which is how we record state about "what happened".
- ark/xmlfile.py and ark/db.py provide
all of the XML-grokking (style: DOM) that sits below
ark/thing.py.
- For each sort of Arusha "thing" (team, package, host,
user, ...), there is ...
- ... a corresponding Python class (ArkTeam, ArkPkg,
ArkHost, ArkUser, ...) in a corresponding
module (ark.<thing>, in ark/<thing>.py).
All of those classes have ArkThing as a base class,
where common-to-all code lives; ArkThing is in ark/thing.py.
(ark/thing.py is easily the most important module.)
- ... a corresponding "manager" class (ArkTeamsMgr,
ArkPkgsMgr, ArkHostsMgr, ...), which handles our
"collection" of the relevant "things". It also lives in
ark/<thing>.py; i.e. the <thing> and its "manager"
are in the same module.
All of these "manager" classes have ArkThingsMgr (in
ark/thing.py) as a base class.
All of these "manager" classes are singletons (there
is only one instance of each class).
- ... a corresponding ark<thing> command-line-use program
which lets you manipulate <things>. The syntax is:
% ark <thing> <subcommand> [options] [thing1 thing2 ...]
So, for example:
% ark team describe ARK # says everything we know about team ARK
% ark package build --pretend coreutils--5.2.1 # build that package
% ark host restart-cron --verbose slimy slicker # restart cron daemons
Wherever you can specify a <thing>, you can alternatively
specify a <proto-thing>. So (using a proto-host "ALL" instead
of specific hostnames), you could do:
% ark host restart-cron ALL # restart *all* cron daemons
- We use '-' as a word separator (e.g. "restart-cron")
rather than '_' (e.g. "restart_cron"), solely to save
trips to the shift key.
- Running ark <thing> <subcommand> <things-spec> really
does invoke method <subcommand> (NB: with each '-' in the
name turned into '_', i.e. s/-/_/g) on a collection of
<thing> objects! How does this work?
Every Ark<Things>Mgr must supply an unpackSpecs
method, which takes a <things-spec> and returns a list
of Ark<Thing>s.
Any proto-things are expanded into their real-thing equivalents.
From here, we hand over to the description of what foo.bar() means/does...
- In Python code, a variable named team, host, pkg,
etc., refers to an Ark<whatever> object.
- Most Ark* objects have an "id", something string-y that
(more-or-less) uniquely identifies them. Given an "id"
and the right "manager", you can do a lookup to get the
Ark* object itself; e.g. team = teams_mgr.lookup(team_id).
So, variables named <something>_id are probably
strings that hold the unique identifier for that
thing; e.g. proto_host, team_id.
- Similarly, the name-to-print-out is usually the idString
attribute.
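To make the thing/manager/subcommand story above a bit more concrete, here's a minimal sketch in modern Python. It is NOT the real ark/thing.py: the dict-backed manager, its `add` method, the `dispatch` helper, and treating "ALL" as the only proto-thing are all hypothetical. Only the names ArkThing, ArkHost, ArkThingsMgr, ArkHostsMgr, idString, lookup and unpackSpecs come from these notes.

```python
class ArkThing:
    """Base class for every ARK "thing" (hypothetical sketch)."""

    def __init__(self, id_string):
        self.idString = id_string  # stable, printable identifier


class ArkHost(ArkThing):
    def restart_cron(self, verbose=False):
        # Stand-in for real work: just report what would happen.
        return "restarting cron on %s" % self.idString


class ArkThingsMgr:
    """Base class for the per-type managers (hypothetical sketch)."""

    _instances = {}  # manager class -> its single instance

    def __new__(cls):
        # At most one instance per manager class: a simple singleton.
        if cls not in ArkThingsMgr._instances:
            inst = super().__new__(cls)
            inst._things = {}  # idString -> thing
            ArkThingsMgr._instances[cls] = inst
        return ArkThingsMgr._instances[cls]

    def add(self, thing):  # hypothetical helper
        self._things[thing.idString] = thing

    def lookup(self, id_string):
        return self._things[id_string]

    def unpackSpecs(self, specs):
        """Turn a <things-spec> into a list of things, expanding protos."""
        result = []
        for spec in specs:
            if spec == "ALL":  # hypothetical proto-thing: every thing
                result.extend(self._things[k] for k in sorted(self._things))
            else:
                result.append(self.lookup(spec))
        return result


class ArkHostsMgr(ArkThingsMgr):
    pass


def dispatch(mgr, subcommand, specs, **options):
    """Hypothetical core of `ark <thing> <subcommand> <things-spec>`."""
    method_name = subcommand.replace("-", "_")  # s/-/_/g
    return [getattr(thing, method_name)(**options)
            for thing in mgr.unpackSpecs(specs)]


# Roughly: % ark host restart-cron --verbose ALL
hosts = ArkHostsMgr()
for name in ("slimy", "slicker"):
    hosts.add(ArkHost(name))
print(dispatch(hosts, "restart-cron", ["ALL"], verbose=True))
# ['restarting cron on slicker', 'restarting cron on slimy']
```

The manager, not the command-line code, knows how to expand a spec, so adding a new kind of <thing> just means writing one more Ark<Thing>/Ark<Things>Mgr pair.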
A site's ARK `database' is held in its collection of ARK
.xml files. The ARK engine reads in these files
(typically near the beginning of a run), creates Python
objects (ArkThing derivatives), and uses those.
The ARK engine does two kinds of caching at
the file level. First, after it reads an .xml
file, it `pickles' the result (Python's term for serializing
objects), and all subsequent reads (ever after) use the
pickle instead. This gives a very big (10x) speedup.
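A minimal sketch of that first, pickle-based cache. The notes don't say where the engine keeps its pickles or how it decides one is still valid, so the `<file>.pickle` naming and the modification-time check below are assumptions, as is the stand-in `parse_xml`, which boils the DOM down to plain (tag, attributes) tuples to keep the pickled data simple.

```python
import os
import pickle
from xml.dom import minidom


def parse_xml(path):
    """Hypothetical stand-in parser: plain, picklable data from the DOM."""
    doc = minidom.parse(path)
    return [
        (node.tagName, dict(node.attributes.items()))
        for node in doc.documentElement.childNodes
        if node.nodeType == node.ELEMENT_NODE
    ]


def read_xml_cached(path):
    """Return parsed data for `path`, using a pickle when it looks fresh."""
    pickle_path = path + ".pickle"  # assumed naming scheme
    try:
        # Assumed validity rule: pickle at least as new as the .xml file.
        if os.path.getmtime(pickle_path) >= os.path.getmtime(path):
            with open(pickle_path, "rb") as f:
                return pickle.load(f)
    except (OSError, pickle.UnpicklingError, EOFError):
        pass  # no usable pickle: fall through and re-parse
    data = parse_xml(path)
    with open(pickle_path, "wb") as f:
        pickle.dump(data, f)
    return data
```

On the second and later calls for an unchanged file, the XML parse is skipped entirely; only `pickle.load` runs.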
Second, within a run, the info from a file is
cached and re-used on second and subsequent references.
This saves going back to disk, even for the pickled parts.
This scheme is well and good for normal ARK engine use. It
is less good when the ARK engine is running for a long
period; for example, a Web application server (such as the
Sidai Webware one...) that runs for months at a time. Do
you really want the application server to show you the ARK
world as it was when the server started? No? Thought not.
For this reason, the internal caching (the 'second' kind,
above) has an expiry mechanism. After 10 minutes (say),
cache entries expire, and we re-read files. In this way,
new and changed ARK info (in .xml files) will be
picked up by long-running programs.
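The second, in-run cache with expiry might look something like this. The `ExpiringFileCache` class and its parameters are hypothetical: `loader` would be something like a pickle-aware file reader, `ttl` defaults to the 600 seconds of the "10 minutes (say)" above, and `clock` is there only to make expiry easy to test.

```python
import time


class ExpiringFileCache:
    """Hypothetical per-run cache whose entries expire after `ttl` seconds."""

    def __init__(self, loader, ttl=600.0, clock=time.monotonic):
        self._loader = loader  # reads and parses one file
        self._ttl = ttl        # seconds an entry stays valid
        self._clock = clock
        self._entries = {}     # path -> (expires_at, data)

    def get(self, path):
        now = self._clock()
        entry = self._entries.get(path)
        if entry is not None and now < entry[0]:
            return entry[1]  # still fresh: no trip to disk at all
        data = self._loader(path)  # missing or expired: re-read the file
        self._entries[path] = (now + self._ttl, data)
        return data
```

Expiry is checked lazily, on lookup: a stale entry is never handed out, but nothing is evicted in the background either.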
Digging deeper... Because assembling an ARK object from its
prototypes is a fair bit of work, the engine caches
the core information about objects that it puts together.
However, because the initial objects that we created may
eventually be discarded (we no longer have confidence that
the info in them is correct; their cached file information
has expired), we have to, in turn, be very
careful about references in one ARK object to
another! That is, if one ARK object needs to store info
about other ARK objects (e.g. its prototypes, or those that
are derived from it), we must not store
object references; instead, we store objects' ids,
which are names that don't/cannot `time out'. In the code,
the id field is always called idString.
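Here's a sketch of the "store ids, not references" rule. `Registry` stands in for one of the ArkThingsMgr singletons and `Thing` for an ArkThing; both, plus the `proto_ids` argument and the `prototypes()` method, are hypothetical. The derived object keeps only its prototypes' idStrings and re-resolves them through the manager on every access, so it always sees the current object, even after the old one has been thrown away and rebuilt.

```python
class Registry:
    """Hypothetical stand-in for a manager: idString -> current object."""

    def __init__(self):
        self._by_id = {}

    def add(self, thing):
        self._by_id[thing.idString] = thing  # replaces any older object

    def lookup(self, id_string):
        return self._by_id[id_string]


class Thing:
    """Hypothetical stand-in for an ArkThing."""

    def __init__(self, id_string, mgr, proto_ids=()):
        self.idString = id_string
        self._mgr = mgr
        # Store prototype *ids*, never the prototype objects themselves:
        # those objects may be discarded when their cached info expires.
        self._proto_ids = list(proto_ids)

    def prototypes(self):
        # Resolve ids to whatever objects are current right now.
        return [self._mgr.lookup(i) for i in self._proto_ids]


mgr = Registry()
mgr.add(Thing("ALL", mgr))
host = Thing("slimy", mgr, proto_ids=["ALL"])
before = host.prototypes()[0]
mgr.add(Thing("ALL", mgr))  # simulate a rebuild after cache expiry
after = host.prototypes()[0]
print(before is after)  # False: host follows the rebuilt prototype
```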
Initially, we had all sorts of caching in the ARK engine
(e.g. caching the list of all things of a particular type),
but this has been switched off (2002.01) and will be added
back in as need demands. (If premature optimization is the
root of all programming evil, then removing it must be the
root of all programming good... No?)
Another very dangerous thing is comparing object
references directly; e.g. host_ctxt == foo_host.
Because an ARK object may be discarded and rebuilt, two
different Python objects can describe the same thing, so
we have to be careful to use a proper method instead,
e.g. host_ctxt.sameAs(foo_host).
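One plausible shape for sameAs, as a sketch only. It assumes (this is a guess, not something these notes say) that two ARK objects denote the same thing exactly when they are of the same class and share an idString; this minimal ArkThing is illustrative, not the real one.

```python
class ArkThing:
    """Minimal illustrative ArkThing; the real class does much more."""

    def __init__(self, id_string):
        self.idString = id_string

    def sameAs(self, other):
        """True if `other` denotes the same ARK thing, even when it is a
        different (e.g. rebuilt) Python object. Assumed rule: same class
        and same idString."""
        return (other is not None
                and type(other) is type(self)
                and other.idString == self.idString)


a = ArkThing("slimy")
b = ArkThing("slimy")  # e.g. rebuilt after a cache expiry
print(a == b)       # False: default == compares object identity
print(a.sameAs(b))  # True
```

Overriding `__eq__` (and `__hash__`) the same way would make `==` safe too; an explicit sameAs has the advantage of making the intent visible at every call site.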