Grokking the ARK code

The `ARK:arkbase' package

The all-important ARK ``engine'', which implements the ARK configuration language, is in the `arkbase' package, provided by team `ARK'.

Understanding foo.bar() is important in this context...

Anyway, here's a quick overview of the method to the madness (programmer notes).

  • Some easy Python modules first:
    • ark/utils.py -- simple utility code
    • ark/errors.py -- centralised list of exceptions raised

  • Similarly, we've ``borrowed'' some code from others (thanks, folks!):

    DPyGetOpt.py -- command-line option handling; by Bill Bumgarner

  • Some ``lower-level'' Python parts of the machinery:
    • ark/control.py -- handles all the global "program control", including command-line-option grokking and reading an ``ARK profile''; it does so on behalf of all the ark<thing> programs. The DPyGetOpt module (above) does the heavy lifting of option parsing.

    • ark/event.py -- creates and reads ArkEventRecords, which are how we record state about "what happened".

    • ark/xmlfile.py and ark/db.py provide all of the XML-grokking (style: DOM) that sits below ark/thing.py

  • For each sort of Arusha "thing" (team, package, host, user, ...), there is ...
    • ... a corresponding Python class (ArkTeam, ArkPkg, ArkHost, ArkUser, ...) in a corresponding module (ark.<thing> in ark/<thing>.py)

      All of those classes have ArkThing (in ark/thing.py) as a base class; that is where the common-to-all code lives.

      (ark/thing.py is easily the most important module.)

    • ... a corresponding "manager" class (ArkTeamsMgr, ArkPkgsMgr, ArkHostsMgr, ...), which handles our "collection" of the relevant "things". It lives in ark/<thing>.py; i.e. the <thing> and its "manager" are in the same module.

      All of these "manager" classes have ArkThingsMgr (in ark/thing.py) as a base class.

      All of these "manager" classes are singletons (there is only one instance of each class).

    • ... a corresponding ark<thing> command-line-use program which lets you manipulate <things>. The syntax is:
      % ark <thing> <subcommand> [options] [thing1 thing2 ...]
      
      So, for example:
          % ark team describe ARK # says everything we know about team ARK
      
          % ark package build --pretend coreutils--5.2.1 # build that package
      
          % ark host restart-cron --verbose slimy slicker # restart cron daemons 
      
      Wherever you can specify a <thing>, you can alternatively specify a <proto-thing>. So (using a proto-host "ALL" instead of specific hostnames), you could do:
          % ark host restart-cron ALL # restart *all* cron daemons 
      

  • We use '-' as a word separator (e.g. "restart-cron") rather than '_' (e.g. "restart_cron"), solely to save trips to the shift key.

  • Running ark <thing> <subcommand> <things-spec> really does invoke method <subcommand> (NB: name is s/-/_/g) on a collection of <thing> objects! How does this work?

    Every Ark<Things>Mgr must supply an unpackSpecs method, which will take a <things-spec> and return a list of Ark<Thing>s.

    Any proto-things are expanded into their real-things equivalent.

    From here, we hand over to the description of what foo.bar() means/does...

  • In Python code, a variable named team, host, pkg, etc., refers to an Ark<whatever> object.

  • Most Ark* objects have an "id", something string-y that (more-or-less) uniquely identifies them. Given an "id" and the right "manager", you can do a lookup to get the Ark* object itself; e.g. team = teams_mgr.lookup(team_id).

    So, variable names called <something>_id are probably strings that correspond to the unique identifier for that thing; e.g. proto_host, team_id.

  • Similarly, the name-to-print-out is usually the idString attribute.
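
To make the dispatch described above concrete, here is a minimal sketch. It is not the real arkbase code: only the names ArkThing, ArkThingsMgr, ArkHost, ArkHostsMgr, unpackSpecs, lookup and idString come from these notes. The add and add_proto helpers, the run_subcommand function and the way the singleton is built are assumptions made purely for illustration.

```python
class ArkThing:
    """Base class for all "things"; common-to-all code lives here."""

    def __init__(self, id_string):
        self.idString = id_string


class ArkThingsMgr:
    """Base class for the per-type "manager" singletons."""

    def __new__(cls):
        # One instance per manager class (assumed singleton mechanism).
        if cls.__dict__.get("_instance") is None:
            cls._instance = super().__new__(cls)
            cls._instance._things = {}   # id -> thing
            cls._instance._protos = {}   # proto id -> list of real ids
        return cls._instance

    def add(self, thing):                        # hypothetical helper
        self._things[thing.idString] = thing

    def add_proto(self, proto_id, member_ids):   # hypothetical helper
        self._protos[proto_id] = list(member_ids)

    def lookup(self, thing_id):
        return self._things[thing_id]

    def unpackSpecs(self, specs):
        """Turn a <things-spec> into a list of things, expanding proto-things."""
        things = []
        for spec in specs:
            for thing_id in self._protos.get(spec, [spec]):
                things.append(self.lookup(thing_id))
        return things


class ArkHost(ArkThing):
    def restart_cron(self):
        return "restarting cron on " + self.idString


class ArkHostsMgr(ArkThingsMgr):
    pass


def run_subcommand(mgr, subcommand, specs):
    """Roughly what `ark host restart-cron ALL` boils down to."""
    method_name = subcommand.replace("-", "_")   # the s/-/_/g step
    return [getattr(thing, method_name)() for thing in mgr.unpackSpecs(specs)]


hosts_mgr = ArkHostsMgr()
for name in ("slimy", "slicker"):
    hosts_mgr.add(ArkHost(name))
hosts_mgr.add_proto("ALL", ["slimy", "slicker"])

results = run_subcommand(hosts_mgr, "restart-cron", ["ALL"])
for line in results:
    print(line)
```

The point of the sketch is only the shape of the flow: the subcommand name is mapped to a method name, the manager expands the spec (proto-things included) into real objects, and the method is invoked on each of them.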

Caching in the ARK core

A site's ARK `database' is held in its collection of ARK .xml files. The ARK engine reads in these files (typically near the beginning of a run), creates Python objects (ArkThing derivatives), and uses those.

The ARK engine does a couple of kinds of caching at the file level. First, after it reads an .xml file, it `pickles' the result (Python's term for serializing objects to disk), and all subsequent reads (ever after) then use the pickled copy. This gives a very big (10x) speedup.

Second, within a run, the info from a file is cached in memory and re-used on second and subsequent references. This saves going back to disk, even for the pickled copy.

This scheme is well and good for normal ARK engine use. It is less good when the ARK engine is running for a long period; for example, a Web application server (such as the Sidai Webware one...) that runs for months at a time. Do you really want the application server to show you the ARK world as it was when the server started? No? Thought not.

For this reason, the internal caching (the 'second' kind, above) has an expiry mechanism. After 10 minutes (say), cache entries expire, and we re-read files. In this way, new and changed ARK info (in .xml files) will be picked up by long-running programs.
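
The two file-level caches and the expiry can be sketched as follows. This is an illustration under stated assumptions, not the engine's actual code: the function names, the ".pickle" file suffix and the tiny parse_xml stand-in are all hypothetical. In particular, the rule that a pickle is only trusted while it is at least as new as its .xml file is an assumption added here so that changed files can be picked up after expiry.

```python
import os
import pickle
import tempfile
import time
import xml.dom.minidom

CACHE_TTL_SECONDS = 600   # "10 minutes (say)"

_memory_cache = {}        # path -> (time loaded, data)


def parse_xml(path):
    """Stand-in for the real XML grokking: root tag name and attributes."""
    root = xml.dom.minidom.parse(path).documentElement
    return {"tag": root.tagName, "attrs": dict(root.attributes.items())}


def read_file(path, now=None):
    """Return the info for `path`, using both kinds of cache."""
    now = time.time() if now is None else now

    # Second kind: in-memory cache, valid until it expires.
    entry = _memory_cache.get(path)
    if entry is not None and now - entry[0] < CACHE_TTL_SECONDS:
        return entry[1]

    # First kind: pickled copy on disk (assumed freshness check by mtime).
    pickled = path + ".pickle"   # hypothetical naming
    if (os.path.exists(pickled)
            and os.path.getmtime(pickled) >= os.path.getmtime(path)):
        with open(pickled, "rb") as f:
            data = pickle.load(f)
    else:
        data = parse_xml(path)
        with open(pickled, "wb") as f:
            pickle.dump(data, f)

    _memory_cache[path] = (now, data)
    return data


with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "host.xml")
    with open(demo_path, "w") as f:
        f.write('<host name="slimy"/>')
    first = read_file(demo_path, now=0)
    again = read_file(demo_path, now=60)   # within 10 minutes: from memory
    print(first, again is first)
```

Passing `now` explicitly is only there to make the expiry easy to exercise; a long-running server would simply call read_file(path) and get fresh data once an entry is older than the expiry time.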

Digging deeper... Because assembling an ARK object from its prototypes is a fair bit of work, the engine caches the core information about objects that it puts together.

However, because the initial objects that we created may eventually be discarded (we no longer have confidence that the info in them is correct; their cached file information has expired), we have to, in turn, be very careful about references in one ARK object to another! That is, if one ARK object needs to store info about other ARK objects (e.g. its prototypes, or those that are derived from it), we must not store object references; instead, we store objects' ids, which are names that don't/cannot `time out'. In the code, the id field is always called idString.
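
A small sketch of why ids are stored rather than references. The classes below are simplified stand-ins: the prototype_ids attribute and the store and prototypes methods are hypothetical names, and only idString and lookup are taken from these notes.

```python
class ArkThing:
    def __init__(self, id_string, prototype_ids=()):
        self.idString = id_string
        # Store ids (plain strings), never references to other ArkThings.
        self.prototype_ids = list(prototype_ids)

    def prototypes(self, mgr):
        # Resolve ids at the moment of use, so we always get current objects.
        return [mgr.lookup(pid) for pid in self.prototype_ids]


class ArkThingsMgr:
    def __init__(self):
        self._things = {}

    def store(self, thing):          # hypothetical helper
        self._things[thing.idString] = thing

    def lookup(self, thing_id):
        return self._things[thing_id]


mgr = ArkThingsMgr()
mgr.store(ArkThing("generic-host"))
slimy = ArkThing("slimy", prototype_ids=["generic-host"])
mgr.store(slimy)

old_proto = slimy.prototypes(mgr)[0]

# Simulate cache expiry: the prototype is rebuilt as a brand-new object.
mgr.store(ArkThing("generic-host"))
new_proto = slimy.prototypes(mgr)[0]

print(new_proto is old_proto)   # the stale object is no longer handed out
```

Had slimy kept a direct reference to the first "generic-host" object, it would still be pointing at the discarded one after the rebuild.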

Initially, we had all sorts of caching in the ARK engine (e.g. caching the list of all things of a particular type), but this has been switched off (2002.01) and will be added back in as need demands. (If premature optimization is the root of all programming evil, then removing it must be the root of all programming good... No?)

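Another very dangerous thing is direct comparison of object references; e.g. host_ctxt == foo_host. Two objects that describe the same thing may not be the same Python object (one may have been re-created after its cache entry expired), so we have to be careful to use a proper method instead, e.g. host_ctxt.sameAs(foo_host).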
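
A minimal illustration of the difference. How sameAs is really implemented is not stated here; comparing idString values (and types) is an assumption made for this sketch.

```python
class ArkHost:
    def __init__(self, id_string):
        self.idString = id_string

    def sameAs(self, other):
        # Assumed implementation: same kind of thing with the same id.
        return type(other) is type(self) and other.idString == self.idString


host_ctxt = ArkHost("slimy")
foo_host = ArkHost("slimy")   # e.g. re-created after a cache expiry

naive = host_ctxt == foo_host        # default == compares object identity
proper = host_ctxt.sameAs(foo_host)  # compares what the objects stand for

print(naive, proper)
```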


© The Arusha Project, 2000-2003; team: ARK; c/o partain@users.sourceforge.net; revision 1.12, 2004-05-26.