The ARK configuration language

This is an early draft of this document. Your help with improving it will be most appreciated.

This document describes the ARK configuration language and how to describe ARK objects with it. It is arguably the central document of the Arusha Project. Structure:

Creating basic ARK configuration files.
(So you can do stuff straight away.)

The ARK cookbook.
A `cookbook' of techniques used in ARK configurations.

Reference manual.
The details.

Field index.
An index that points to commentary on some ARK fields (e.g. <namespace-deploy-map>). Strictly speaking, ARK mandates very few fields, and what you see in real configurations is team-specific... but that's not what you want to hear when you have a question.


Creating basic ARK configuration files

Easily the best way to create an ARK configuration (.xml) file is to copy (and modify?) someone else's. We supply oodles of examples you could start with.

Specifying a real thing at your site

The first common type of .xml file is for a real thing (e.g. package or host) at your site. An example might be:
<package name="make--3.79.1" xml-version="1">
<status>revealed</status>
<prototypes>
  <prototype team="." name="GNU"/>
  <prototype team="." name="ALL"/>
</prototypes>
</package>
Notable features:
  • The <package ... > line is standard.

  • The <status> field is ubiquitous in Sidai-style ARKing; there are status definitions for a host, a package, a user...

  • A list of prototypes. Here's hoping you know about those already...

  • Nothing else! The more you pick up from your prototypes, and the less you have that is site-specific, the happier you will be.

  • Actually, there are a few extra things that often appear in site-specific .xml files; we refer you to the ``cookbook'' section for the details.

Specifying a prototype thing to capture common knowledge

The second common type of .xml file is for a prototype thing (e.g. a host or package) that captures ``intelligence'' common to many things or across many sites.

The XML files supplied by team Sidai are exclusively of this type. (So you have many examples to look at; hint, hint.)

Sidai supplies some very general packages, e.g., GNU, which says how to build/install/deploy/reveal a typical GNU package.

You will typically write more limited .xml files; perhaps something like:

<package name="zlib" xml-version="1" prototype="yes">
<description>
zlib is a fairly standard compression library that other
programs use.
</description>

<configure>
  <constraint><dependency type="essential" name="."
                 on-method="host-linkfarm" /></constraint>
<comment>
**Why** yet another bizarre non-standard configure script???
</comment>
  <param name="PKG_PROTO_HOST">!proxy_for.idString</param>
  <param name="cc">@proxy-host:CC@</param>
  <param name="ldflags">@proxy-host:LDFLAGS@</param>
  <param name="prefix">!sidai_prefix</param>
  <param name="configure_args">--shared</param>

  <code once-per="hosts-supported"><![CDATA[
cd $PKG_BUILD_DIR/$PKG_PROTO_HOST

/bin/rm -f config.cache

prefix="$prefix"   \
CC="$cc"	   \
LDFLAGS="$ldflags" \
./configure $configure_args
]]></code></configure>
</package>
Notable features:
  • This prototype would presumably be used in conjunction with others. It is essentially overriding some standard configure method (e.g. from Sidai's GNU package), because the zlib maintainers have decided to go their own wicked way.

  • The <package ...> line must include prototype="yes".

  • ToDo: more

A team's team.xml file

Your team's team.xml file records some basic information about your team. What follows is an annotated example.

(The other team-ish file you must have somewhere is an ``ARK profile'' (pointed to by the ARK_PROFILE environment variable). Team `sample1' provides some documentation.)

The first part of a team.xml file is straightforward; as you will see here, a team can have prototypes, too. These prototypes only affect team actions (e.g. search for a `describe' method for a team), not actions related to a team's "things". For example, if ARK were looking for an `announce' method for one of this team's packages, it would use the package prototypes, not the prototype info shown here.

<team name="glasli1" xml-version="1">
<description>
The Glasgow SLI machines.
</description>

<prototypes>
  <prototype team="sidai"/>
</prototypes>
A <contacts> field is the sort of thing we expect to have more of. It isn't used anywhere at the moment.
<contacts><list>
  <item>partain@dcs.gla.ac.uk</item>
</list></contacts>
As with packages, a team.xml has a list of ``interesting directories'' (<ark-dirs>). This table is basically a set of macro definitions that are used throughout the ARK world. For example, if you see @team:ark-dirs:OUR@ in an XML file, it will be replaced with the definition you see below (/our).
<ark-dirs><table>
  <entry name="ARK_SRC"> /workspace/partain/ark </entry>
  <entry name="ARK_STATE"> /sys/ark-state </entry>
  <entry name="LOCAL"> /usr/local </entry>
  <entry name="LOCAL_DEPLOY"> /usr/local/.-ark-deploy </entry>
  <entry name="OUR"> /our </entry>
  <entry name="OUR_DEPLOY"> /our/.-ark-deploy </entry>
  <entry name="ROOT_DEPLOY"> /.-ark-deploy </entry>
  <entry name="VENDOR_STUFF"> /d/vendor-stuff </entry>
</table></ark-dirs>
(Note: you may put any pathnames that you like in those definitions; the ones shown are Sidai-esque `blessed' pathnames.)

If we need to force our way to a non-root user to do something ARK-ish, this is the actual user that we use:

<ark-sysadmin-user> partain </ark-sysadmin-user>
When we want to say, e.g., ``writable by the sysadmin group'', this is the actual group that we use:
<ark-sysadmin-group> sliadmin </ark-sysadmin-group>
If we want to force all ARK work to be done as a particular group (very useful for multiple cooperating sysadmins...), then we set the following:
<ark-sysadmin-group-must-be> sliadmin </ark-sysadmin-group-must-be>
(Don't set it if you have no such restriction, or are unsure; it can be added in later.)

Another useful setting when multiple sysadmins are working against the same ARK setup is:

<ark-sysadmin-umask> 002 </ark-sysadmin-umask>
</team>
The last part of a Sidai-style team.xml is a hairy-looking <site-spec> table (we have not shown one here). This table says what kind of ARK `things' are used at a site.

You can do without a <site-spec> table to start with, and add one in later on.


The ARK cookbook

This ``cookbook'' section describes techniques used in ARK .xml configuration files. Techniques range from the commonplace (e.g. passing parameters) to the product-of-a-sick-mind.
Passing <params>
ToDo

Specify root privileges for <code>
ToDo


Reference manual

Notation

Some notation:
  1. `.': the common thing. As a team, the current acting team (your site team). As a host, the host you're running on.

  2. `ALL': by strong convention, the catch-all proto-thing.

  3. The formal way to name a thing is <team>:<thing>. Of course, that's often just .:<thing>, which we often shorten to just <thing>.

Basic structure and definitions

  • Everything is organized by team.

  • All sysadmin entities (a.k.a. things) are handled uniformly.

  • Every thing [entity instance] is of an entity type, e.g. `host', `package', `web site', etc.

  • Every thing has a name; there are no anonymous things.

  • Every thing has:
    1. A name
    2. A list of prototypes (zero or more)
    3. Zero or more fields; the fields are what get built up through the prototyping mechanism.

  • A thing's prototype tree is walked depth-first and left-to-right to determine its prototype path (the tree-walk without any duplicates).

  • A thing's field wibble has the general form (order not important):
    <wibble>
      <constraint> (zero or more)
      </constraint>
    
      <!-- the following, down through version, are NOT IMPLEMENTED yet -->
      <acl>     (zero or more)
      </acl>
      <comment> (zero or one)
      </comment>
      <provenance> (one or more -- generally machine-generated)
      </provenance>
      <quality> (zero or more)
      </quality>
      <topics> (zero or more)
      </topics>
      <validation> (zero or more)
      </validation>
      <version> (zero or one)
      </version>
    
      <param name="foo" type="bar">  (zero or more)
      </param>
      ...
      # a "value" (zero or one), which can be one of:
      <string> ... </string>
      <list><item>...</item> ...</list>
      <table><entry name="foo">...</entry>...</table>
      <code lang="..." privileges="..." redoable="..."  >...</code>
      <doc format="..."  >...</doc>
    </wibble>
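
The prototype-path rule above (walk the prototype tree depth-first and left-to-right, dropping duplicates) can be pictured with a small Python sketch. This is not ARK's own code. It assumes that the thing itself heads its path and that the first visit to a prototype is the one that is kept; the source states neither. The `prototypes` mapping, including the idea that GNU has ALL as a prototype, is made up for illustration.

```python
def prototype_path(thing, prototypes):
    """Return the prototype path of `thing` as a list of names.

    `prototypes` maps a thing's name to its ordered list of prototype names.
    """
    path = []
    seen = set()

    def walk(name):
        if name in seen:
            return  # duplicate: already on the path (assumption: first visit wins)
        seen.add(name)
        path.append(name)
        for proto in prototypes.get(name, []):  # left-to-right
            walk(proto)                         # depth-first

    walk(thing)
    return path


# Hypothetical prototype lists, loosely modelled on the make--3.79.1 example.
prototypes = {
    "make--3.79.1": ["GNU", "ALL"],
    "GNU": ["ALL"],
    "ALL": [],
}
path = prototype_path("make--3.79.1", prototypes)
print(path)
```

Because visited names are remembered, the same sketch also terminates if two prototypes happen to name each other.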
    

Some examples...

  1. The simplest field would be one with a string value (here, `description'):
    <description>A long purple squishy thing.</description>
    

  2. Here is a field (called `compile') which has some params and a code value:
    <compile>
      <param name="MAKE" />
      <param name="prefix" />
      <code redoable="yes">
      $MAKE prefix=$prefix
      </code>
    </compile>
    

  3. Fields can be partial (don't include values). The most common use might be something like (two partial fields)...
     <install-bits>
      <constraint><host-spec name="hppa-hpux"/></constraint>
      <param name="MAKE">/usr/bin/make</param>
     </install-bits>
     
     <install-bits>
      <constraint><host-spec name="solaris"/></constraint>
      <param name="MAKE">/usr/ccs/bin/make</param>
     </install-bits>
    
    ... which sets a parameter MAKE in a system-dependent way.

  4. A typical query we might make of our configuration data is: e.g., ``What is the value of field disk-configuration for the entity of type host named web-server?''

    We gather together (in order) all occurrences of field disk-configuration in the XML files along host web-server's prototype path, and combine them.

    Combining is a non-trivial process, but at least there is only one such process! Some detail given below.
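
As a rough picture of examples 3 and 4, here is a Python sketch that gathers the partial <install-bits> fields, drops those whose host-spec guard fails, and merges the params of the rest. It is only a sketch under loud assumptions: the real combining process is more involved, the dictionary layout is invented, and the rule that the first occurrence on the prototype path wins is a guess made purely for illustration.

```python
def combine_params(occurrences, host_specs):
    """Merge the params of every occurrence whose guard is satisfied.

    occurrences: list of dicts in prototype-path order, each with an optional
                 "host_spec" guard and a "params" dict (hypothetical layout).
    host_specs:  set of host-spec names that the target host matches.
    """
    merged = {}
    for occ in occurrences:
        guard = occ.get("host_spec")
        if guard is not None and guard not in host_specs:
            continue  # guard not satisfied: this partial field is ignored
        for name, value in occ["params"].items():
            # ASSUMPTION: the first occurrence on the path wins.
            merged.setdefault(name, value)
    return merged


# The two partial <install-bits> fields from example 3.
install_bits = [
    {"host_spec": "hppa-hpux", "params": {"MAKE": "/usr/bin/make"}},
    {"host_spec": "solaris", "params": {"MAKE": "/usr/ccs/bin/make"}},
]
solaris_params = combine_params(install_bits, {"solaris"})
hpux_params = combine_params(install_bits, {"hppa-hpux"})
```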

All the pieces of a field

The non-value parts of a field can include:
constraints
Things that must be true in the system for this field to be valid/make-sense. They come in various flavors (e.g. ``essential'' and ignorable).

The main example at the moment is package dependencies.

Constraints are also the ``guards'' of the configuration language. Unless all of a field's constraints are satisfied, the field is ignored.

The main constraint of this type is host-specs.

Constraints are a pretty big topic of their own.

acls
ACLs (Access Control Lists) specify who can do what to this field, notably change it.

Don't have a clue how this might work. NOT IMPLEMENTED YET.

comment
A chance to wax eloquent about the why's and wherefore's of this field. Would expect it to appear in a Web page generated about this entity.

provenances
Comprise a record of where (i.e. what version of what XML files) all the info about a field came from. Expect the provenances to be entirely machine-generated. Would feature in documentation about this entity.

(Implementation is still in early development. Relevant: method recording...)

quality
(Need a better name.) Where you can say things like, ``A horrible hack'', ``Works, but needs to go through code review'', ``To be revisited in May 2002'', etc. The idea is to be able to go through all configuration data and extract the fields that are weedy and/or need attention.

NOT IMPLEMENTED YET.

topics
(From ``topic maps'', an obscure XML thing.) Essentially, index entries that you could use if creating a huge cross-reference.

NOT IMPLEMENTED YET.

validations
Code that can be run to show that the information in this field is correct.

NOT IMPLEMENTED YET.

version
A number; other parts of the prototype tree can insist that they meet up with this version of things...

NOT IMPLEMENTED YET.

params
Well, um, parameters... Field values (<code> and otherwise) can reference param values (details below).

A field value can be one of:

string
Straightforward.

list
A list of <item>s, each of which can be of any type (string, list, table, code, ...).

table
A hash table of <entry>s, each of which has a name and, like a list <item>, can be of any type (string, list, table, code, ...).

code
Code that may be run. Oodles of possible attributes:
lang
The language the code is in: sh, python, or perl. Default: sh.

recordable
If yes, then we record in the ARK state dir that this code has been run. Default: yes. See also: about method recording.

redoable
If yes, then it's safe to re-run the code. Default: no.

interactive
The code may do something that requires interaction with the user. Default: no. BARELY IMPLEMENTED.

dangerous
If yes, means the code could do something Very Nasty to your system, e.g. leave you without a password file. Default: no. NOT IMPLEMENTED YET.

pretendable
If the --pretend (dry run) flag is on, we can still run the code (rather than just display it) and no ``real work'' will take place.

privileges
Code needs to run with the privileges of the user/group specified.

The most fully specified way to write this is privileges="<usr>:<grp>", in which case we run as user <usr> and group <grp>.

The part before or after the colon (:) may be omitted, in which case the default user or group (as appropriate) is used for that part.

The same (defaulting) is true if the user or group is given as '.' (dot).

If the user is defaulted and we are not root, then we use the user specified in <ark-sysadmin-user-must-be> (NB: not often set). If no such thing is specified, then the current user is used.

If the group is defaulted and we are not root, then we use the group specified in <ark-sysadmin-group-must-be> (NB: more commonly used). If no such thing is specified, then the current group is used.

Special cases: privileges="root" really means privileges="root:root", privileges="." really means privileges=".:." (i.e. default everything).

If privileges is not specified at all, default everything.

once-per
The `granularity' of the method code; i.e., how much it needs to be run for it to have been run for the whole site. Choices include: host (must run on every single host), site (just run it once), hosts-supported (roughly, once per platform), or according to a user-specified table. Please see: method granularity and proxy hosts... Default: host.

proxy-hosts
Specifies the host(s) that are allowed to act as proxy host to get a method run. Default: 'ALL'. See also: method granularity and proxy hosts...
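
The privileges defaulting rules described above can be sketched in Python as follows. This is an illustration, not ARK's implementation: the function names are invented, it covers only the case where we are not root, and it rejects a bare name other than root or `.' because the text does not say what such a value means.

```python
def parse_privileges(spec=None):
    """Split a privileges attribute into (user, group); None means 'use the default'."""
    if spec is None or spec == ".":
        return (None, None)          # not specified, or ".": default everything
    if spec == "root":
        return ("root", "root")      # privileges="root" means "root:root"
    if ":" not in spec:
        # ASSUMPTION: other bare names are not defined by the text, so refuse them.
        raise ValueError("unsupported privileges value: %r" % spec)
    user, group = spec.split(":", 1)
    user = None if user in ("", ".") else user
    group = None if group in ("", ".") else group
    return (user, group)


def effective_privileges(spec, current_user, current_group,
                         user_must_be=None, group_must_be=None):
    """Resolve defaults for a non-root caller.

    user_must_be / group_must_be stand for the team's
    <ark-sysadmin-user-must-be> and <ark-sysadmin-group-must-be> settings.
    """
    user, group = parse_privileges(spec)
    if user is None:
        user = user_must_be or current_user
    if group is None:
        group = group_must_be or current_group
    return (user, group)


print(effective_privileges("partain:", "joe", "users", group_must_be="sliadmin"))
```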

doc
A documentation fragment. Attributes:
format
The format the fragment is in: text, html. Default: text. (Other possibilities include: pod, sgml, ???)

eval
Code that, when run, will produce a value of one of the types (string, list, table, code, ...). (An Arusha backtick!)

NOT IMPLEMENTED YET.

ref(erence)
A reference to another entity. Haven't decided about this. NOT IMPLEMENTED YET.

Constraints

An ARK field, partial or not, can have constraints. These constraints must be satisfied/true before the other parts of a field (params, value, etc.) come into play. As such, constraints act as guards over the other field information.

ARK constraints are slightly different from other languages, in that they have a `make it so' property. If a constraint is not satisfied, but can be made so, then we do that, and then consider the constraint satisfied.

There are three variants of constraint in ARK land.

  1. The <host-spec> constraint

    ToDo: finish!

  2. The <dependency> constraint

    A <dependency> constraint encodes a dependency between two ARK methods. For example, an ARK package's compile method probably depends on its assemble-source method having been run successfully.

    While dependency constraints are typically between methods of the same object (as in the example given), they can be on any method of any object. So, for example, a string-valued field could depend on a particular user method:

    <foo>
      <constraint><dependency type="essential" what="user" name="joe"
                              on-method="create" /></constraint>
      <string>
      Hi, Mom!
      </string></foo>
    
    ToDo:finish!

  3. The <general> constraint

    ToDo: finish!
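
The `make it so' property of <dependency> constraints can be pictured with a few lines of Python. This is a sketch under assumptions, not the ARK machinery: the `depends_on` table, the `recorded` set (standing in for method recording) and the `log` list (standing in for actually running code) are all invented for illustration, and the chain of method names is just a plausible package example.

```python
def run_method(method, depends_on, recorded, log):
    """Run `method`, first running any dependency not yet recorded as done."""
    for dep in depends_on.get(method, []):
        if dep not in recorded:
            # Constraint not satisfied, but it can be made so: run the dependency.
            run_method(dep, depends_on, recorded, log)
    log.append(method)    # stands in for running the method's <code>
    recorded.add(method)  # stands in for recording that it has been run


# Hypothetical dependency chain between one package's methods.
depends_on = {
    "compile": ["depend"],
    "depend": ["configure"],
    "configure": ["patch-source"],
    "patch-source": ["assemble-source"],
}
recorded = {"assemble-source", "patch-source"}  # already done earlier
log = []
run_method("compile", depends_on, recorded, log)
print(log)
```

Only the methods that had not been recorded are run, in dependency order, before compile itself.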

The ARK macro language

Setting a <param> often looks like this:
<param name="MAKE"> @host:GNU-MAKE@ </param>
That @...@ stuff is the ARK macro language. It is a convenience thing; strictly speaking, you could do without it (but you really wouldn't want to). The main syntax is:
@<object-type|self|param>:<field-name>[:<index>[:<separator>]]@
Some examples therefore:
@proxy-host:PERL@
Returns the <PERL> string-valued field.

@host:ip-addresses:0@
Returns the string in item 0 of the <ip-addresses> list-valued field.

@host:ip-addresses:*:,@
Returns a comma-separated list of the items in the <ip-addresses> list-valued field.

@host:ark-dirs:FOO@
Returns the string in the entry indexed by `FOO' in the <ark-dirs> table-valued field.

An anomalous form that we also support is @package:indexed:FOO@ (ToDo: say more).

The macro expander simply looks for things it knows how to expand, and keeps going as long as it keeps finding things. (Yes, you can probably make it loop.)

You can get nested macros to work; so, for example:

@host:run-level-dir:@param:RUN_LEVEL@@
The expander will spot that @host:run-level-dir:@ is a bad spec, so it skips it and matches the @param:RUN_LEVEL@ one. Having replaced it, it looks again and finds @host:run-level-dir:3@ (or whatever) and successfully matches it.
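
The expand-until-nothing-changes behavior, including the nested case, can be mimicked with a short Python sketch. It is not the real expander: the regular expression, the `objects` layout, the pass limit, and the choice of a space as the separator when none is given are all assumptions, and the field values are invented.

```python
import re

# @<object-type|self|param>:<field-name>[:<index>[:<separator>]]@
MACRO = re.compile(r"@([A-Za-z][\w-]*):([\w-]+)(?::([^:@]+)(?::([^@]*))?)?@")


def expand(text, objects, max_passes=20):
    """Expand macros repeatedly until nothing changes (or max_passes is hit)."""

    def lookup(match):
        obj, field, index, sep = match.groups()
        try:
            value = objects[obj][field]
            if index is None:
                return value if isinstance(value, str) else match.group(0)
            if isinstance(value, dict):
                return value[index]                       # table entry
            if isinstance(value, list) and index == "*":
                # ASSUMPTION: join with a space when no separator is given.
                return (" " if sep is None else sep).join(value)
            if isinstance(value, list):
                return value[int(index)]                  # list item
            return match.group(0)
        except (KeyError, IndexError, ValueError):
            return match.group(0)  # unknown macro: leave it untouched

    for _ in range(max_passes):  # the pass limit guards against looping macros
        expanded = MACRO.sub(lookup, text)
        if expanded == text:
            break
        text = expanded
    return text


# Hypothetical field values.
objects = {
    "host": {
        "ip-addresses": ["10.0.0.1", "10.0.0.2"],
        "run-level-dir": {"3": "/etc/rc3.d"},
    },
    "param": {"RUN_LEVEL": "3"},
}
print(expand("@host:run-level-dir:@param:RUN_LEVEL@@", objects))
```

On the first pass only @param:RUN_LEVEL@ matches; the second pass then sees @host:run-level-dir:3@ and expands it.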

Method granularity (once-per) and proxy hosts

You will eventually bump into tricky ARK questions of the form, ``How much must this code be run in order to do the whole site?'', and ``What host(s) will it run on?'', or perhaps ``Why did (or didn't) it run on that host?''

More formally, it boils down to two issues:

  • granularity: Some code needs to run just once per site (e.g. unpacking a tarball), and some code needs to run on every single host. And some needs something in between: we probably run a compile once per platform, not once per host.

  • proxy-ness: (to coin a horrible term) If we don't run every method on every host, then obviously some host(s) are acting on behalf of other hosts; i.e. the acting hosts are behaving as proxy hosts.

Here, we lay out the whole story on method granularity and (proxy) host use in ARK.

What are we saying, host-wise, when we invoke the following?

ark package compile ALL
In principle, we are saying to run the compile method for ALL packages on ALL hosts. Site-at-a-time thinking demands this.

(You can limit the set of hosts with the --hosts and --proxy-hosts command-line options; see the details.)

Next come constraints -- they kick up more work to do. Continuing with our compile example above, what if the compile method included...

<constraint><dependency type="essential" name="." on-method="depend" /></constraint>
The ARK machinery will check whether or not the depend method for ALL packages on ALL hosts (modulo --hosts, of course) has been run and, if it has not, will run it before doing the compile method.

Imagine we have 300 hosts, of three flavors (say, Linux, Solaris, and AIX). If you have a big bunch of packages (ALL) and a hefty chain of dependencies (compile->depend->configure->patch-source->assemble-source...), you are looking at a HUGE pile of system crunching!

And much of that crunching is entirely pointless. OK, admittedly, the don't-redo-the-dependencies thing saves lots of work, but that still leaves: Why would you compile on 300 hosts, when 3 hosts (one of each flavor) would do? (Standard ARK disclaimer: but if that's what you want to do, no problem!)

Enter granularity! This is specified by the once-per attribute of a <code>. It can be:

  • once-per="host": (the default) The method code must be run on every host.

  • once-per="site": It must be run once for the whole site.

    (Note that this is really just shorthand for doing once-per="hosts-supported" with a table that only lists the ALL proto-host.)

  • once-per="hosts-supported": It must be run once for each (proto-)host in the list <hosts-supported> minus those in <hosts-not-supported>. This is a special case of...

  • once-per="something-else": It must be run once for each (proto-)host in the list <something-else>. NOT IMPLEMENTED YET.
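
To make the saving concrete, here is a small Python sketch of how many times a method must run under each implemented once-per choice, using the 300-host, three-flavor site from above. The function name, the host names and the proto-host names are invented; the logic simply follows the bullet points.

```python
def run_targets(once_per, all_hosts, hosts_supported=(), hosts_not_supported=()):
    """Return the (proto-)hosts for which the method code must be run once."""
    if once_per == "host":
        return list(all_hosts)   # every single host
    if once_per == "site":
        return ["ALL"]           # once, for the ALL proto-host
    if once_per == "hosts-supported":
        return [h for h in hosts_supported if h not in hosts_not_supported]
    raise ValueError("once-per=%r is not handled in this sketch" % once_per)


# Hypothetical site: 300 real hosts, three supported proto-hosts.
all_hosts = ["host%03d" % n for n in range(300)]
flavors = ["linux", "solaris", "aix"]

for choice in ("host", "site", "hosts-supported"):
    print(choice, len(run_targets(choice, all_hosts, hosts_supported=flavors)))
```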

If we aren't doing the obvious and running a method's code on every single host, then some host(s) is/are plainly acting as proxy hosts for other hosts at the site.

We control this proxying behavior with the proxy-hosts code attribute. The default depends on the once-per code attribute. proxy-hosts can be:

  • none: No proxying, use the real host.

    This is the default if once-per="host".

  • . (dot): Any old host can act as the proxy host. (The implementation is very likely to choose the current run host.)

    This is the default if once-per="site".

  • some-table: Use <some-table> of (proto-host, real-host) pairs to find the real host.

If once-per="hosts-supported", then the default is proxy-hosts="proxy-hosts".

If once-per="something-else", then there is no default for proxy-hosts.

Not all combinations of once-per=... and proxy-hosts=... make sense. Here's the table that says what's OK:

once-per=...              host   site    hosts-supported   other
proxy-hosts=none          OK     no      no                no
proxy-hosts=.             OK     OK      no[1]             OK
proxy-hosts=some-table    OK     OK[2]   OK                OK

[1] This one could be `OK', but we're doing the wimp choice for now.

[2] We look up proto-host .:ALL in <some-table> to find the real host to use.
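
The defaults and the table of allowed combinations can be written down as data, which makes the rules easy to check. The Python below is a sketch with invented names, not part of ARK; any proxy-hosts value other than none or `.' is treated as the name of a table.

```python
# Default proxy-hosts value for each once-per value, as stated in the text.
DEFAULT_PROXY_HOSTS = {
    "host": "none",
    "site": ".",
    "hosts-supported": "proxy-hosts",
}

# The once-per columns for which each kind of proxy-hosts value is OK.
ALLOWED = {
    "none": {"host"},
    ".": {"host", "site", "other"},
    "table": {"host", "site", "hosts-supported", "other"},
}


def resolve_proxy_hosts(once_per, proxy_hosts=None):
    """Return the proxy-hosts value to use; raise ValueError if it is not OK."""
    column = once_per if once_per in DEFAULT_PROXY_HOSTS else "other"
    if proxy_hosts is None:
        if column == "other":
            raise ValueError("no default proxy-hosts for once-per=%r" % once_per)
        proxy_hosts = DEFAULT_PROXY_HOSTS[column]
    row = proxy_hosts if proxy_hosts in ("none", ".") else "table"
    if column not in ALLOWED[row]:
        raise ValueError("proxy-hosts=%r is not allowed with once-per=%r"
                         % (proxy_hosts, once_per))
    return proxy_hosts


print(resolve_proxy_hosts("hosts-supported"))
```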

The --hosts and --proxy-hosts flags

Please see the relevant section about how to run the ark tool.

Method recording

An ARK method has a recordable property (default: yes). If yes, then, when we run the code, we record that we have done so, somewhere in the ARK_STATE (a team field) directory.

ToDo: more


Field index

ToDo


© The Arusha Project, 2000-2003; team: ARK; c/o partain@users.sourceforge.net; revision 1.23, 2004-05-26.