The ARK configuration language
This is an early draft of this document. Your help
with improving it will be most appreciated.
This document describes the ARK configuration
language and how to describe ARK objects with it.
It is arguably the central document of the Arusha Project.
Structure:
- Creating basic ARK configuration files.
- (So you can do stuff straight away.)
- The ARK cookbook.
- A `cookbook' of techniques used in ARK configurations.
- Reference manual.
- The details.
- Field index.
- An index that points to comments on some ARK fields
(e.g. <namespace-deploy-map>). Strictly speaking,
ARK mandates very few fields, and what you see in real configurations is
team-specific... but that's not what you want to hear when
you have a question.
Easily the best way to create an ARK configuration
(.xml) file is to copy (and modify?) someone
else's. We supply oodles of
examples you could start with.
Specifying a real thing at your site
The first common type of .xml file is for a
real thing (e.g. package or host) at your site.
An example might be:
<package name="make--3.79.1" xml-version="1">
<status>revealed</status>
<prototypes>
<prototype team="." name="GNU"/>
<prototype team="." name="ALL"/>
</prototypes>
</package>
Notable features:
- The <package ... > line is standard.
- The <status> field is ubiquitous in
Sidai-style ARKing; see the status
definitions for:
a host,
a package,
a user...
- A list of prototypes. Here's hoping you know
about those already...
- Nothing else! The more you pick up
from your prototypes, and the less you have that is
site-specific, the happier you will be.
- Actually, there are a few extra things that often
appear in site-specific .xml files; we refer you
to the ``cookbook'' section for the details.
Specifying a prototype thing to capture common knowledge
The second common type of .xml file is for a
prototype thing (e.g. a host or package) that
captures ``intelligence'' common to many things or across
many sites.
The XML files supplied by team
Sidai are exclusively of this type. (So you have many
examples to look at; hint, hint.)
Sidai supplies some very general packages, e.g.,
GNU, which says how to build/install/deploy/reveal
a typical GNU package.
You will typically write more limited .xml files;
perhaps something like:
<package name="zlib" xml-version="1" prototype="yes">
<description>
zlib is a fairly standard compression library that other
programs use.
</description>
<configure>
<constraint><dependency type="essential" name="."
on-method="host-linkfarm" /></constraint>
<comment>
**Why** yet another bizarre non-standard configure script???
</comment>
<param name="PKG_PROTO_HOST">!proxy_for.idString</param>
<param name="cc">@proxy-host:CC@</param>
<param name="ldflags">@proxy-host:LDFLAGS@</param>
<param name="prefix">!sidai_prefix</param>
<param name="configure_args">--shared</param>
<code once-per="hosts-supported"><![CDATA[
cd $PKG_BUILD_DIR/$PKG_PROTO_HOST
/bin/rm -f config.cache
prefix="$prefix" \
CC="$cc" \
LDFLAGS="$ldflags" \
./configure $configure_args
]]></code></configure>
</package>
Notable features:
- This prototype would presumably be used in conjunction
with others. It is essentially overriding some
standard configure method (e.g. from Sidai's
GNU package), because the zlib maintainers have
decided to go their own wicked way.
- The <package ...> line must include
prototype="yes".
- ToDo: more
Your team's team.xml file records some basic
information about your team. What follows is an annotated
example.
(The other team-ish file you must have somewhere is an ``ARK
profile'' (pointed to by the ARK_PROFILE
environment variable). Team `sample1' provides some documentation.)
The first part of a team.xml file is
straightforward; as you will see here, a team can have
prototypes, too. These prototypes only affect
team actions (e.g. searching for a `describe' method
for a team), not actions related to a team's ``things''. For
example, if ARK were looking for an `announce' method for
one of this team's packages, it would use the package
prototypes, not the prototype info shown here.
<team name="glasli1" xml-version="1">
<description>
The Glasgow SLI machines.
</description>
<prototypes>
<prototype team="sidai"/>
</prototypes>
A <contacts> field is the sort of thing we
expect to have more of. It isn't used anywhere at the
moment.
<contacts><list>
<item>partain@dcs.gla.ac.uk</item>
</list></contacts>
As with packages, a team.xml
has a list of ``interesting directories'' (<ark-dirs>). This table
is basically a set of macro definitions that are used throughout
the ARK world. For example, if you see @team:ark-dirs:OUR@
in an XML file, it will be replaced with the definition you see
below (/our).
<ark-dirs><table>
<entry name="ARK_SRC"> /workspace/partain/ark </entry>
<entry name="ARK_STATE"> /sys/ark-state </entry>
<entry name="LOCAL"> /usr/local </entry>
<entry name="LOCAL_DEPLOY"> /usr/local/.-ark-deploy </entry>
<entry name="OUR"> /our </entry>
<entry name="OUR_DEPLOY"> /our/.-ark-deploy </entry>
<entry name="ROOT_DEPLOY"> /.-ark-deploy </entry>
<entry name="VENDOR_STUFF"> /d/vendor-stuff </entry>
</table></ark-dirs>
(Note: you may put any pathnames that you like in those
definitions; the ones shown are Sidai-esque `blessed'
pathnames.)
If ARK needs to switch to a non-root user to do something
ARK-ish, this is the actual user that we use:
<ark-sysadmin-user> partain </ark-sysadmin-user>
When we want to say, e.g., ``writable by the sysadmin group'',
this is the actual group that we use:
<ark-sysadmin-group> sliadmin </ark-sysadmin-group>
If we want to force all ARK work to be done as a
particular group (very useful for multiple cooperating sysadmins...),
then we set the following:
<ark-sysadmin-group-must-be> sliadmin </ark-sysadmin-group-must-be>
(Don't set it if you have no such restriction, or are
unsure; it can be added in later.)
Another useful setting when multiple sysadmins are working
against the same ARK setup is:
<ark-sysadmin-umask> 002 </ark-sysadmin-umask>
</team>
The last part of a Sidai-style team.xml is a
hairy-looking <site-spec> table (we have not
shown one here). This table says what kinds of ARK `things' are used at a site.
You can do without a <site-spec>
table to start with, and add one in later on.
This ``cookbook'' section describes techniques used in ARK
.xml configuration files. Techniques range from
the commonplace (e.g. passing parameters) to the
product-of-a-sick-mind.
- Passing <params>
- ToDo
- Specify root privileges for <code>
- ToDo
Some notation:
- `.': the common thing. As a team, the current acting team (your site team).
As a host, the host you're running on.
- `ALL': by strong convention, the catch-all
proto-thing.
- The formal way to name a thing is
<team>:<thing>. Of course, that's
often just .:<thing>, which we usually shorten
to just <thing>.
Basic structure and definitions
- Everything is organized by team.
- All sysadmin entities (a.k.a. things)
are handled uniformly.
- Every thing [entity instance] is of an entity type,
e.g. `host', `package', `web site', etc.
- Every thing has a name; there are
no anonymous things.
- Every thing has:
- A name
- A list of prototypes (zero or more)
- Zero or more fields; the fields
are what get built up through the prototyping mechanism.
- A thing's prototype tree is walked depth-first
and left-to-right to determine its prototype path
(the tree-walk without any duplicates).
- A thing's field wibble has the general form
(order not important):
<wibble>
<constraint> (zero or more)
</constraint>
<!-- the following, down through version, are NOT IMPLEMENTED yet -->
<acl> (zero or more)
</acl>
<comment> (zero or one)
</comment>
<provenance> (one or more -- generally machine-generated)
</provenance>
<quality> (zero or more)
</quality>
<topics> (zero or more)
</topics>
<validation> (zero or more)
</validation>
<version> (zero or one)
</version>
<param name="foo" type="bar"> (zero or more)
</param>
...
<!-- a "value" (zero or one), which can be one of: -->
<string> ... </string>
<list><item>...</item> ...</list>
<table><entry name="foo">...</entry>...</table>
<code lang="..." privileges="..." redoable="..." >...</code>
<doc format="..." >...</doc>
</wibble>
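The depth-first, left-to-right walk that produces a prototype path can be sketched in a few lines of Python. This is an illustration under assumptions, not ARK's implementation: prototypes are given as a plain dict from a thing's name to its ordered prototype names (team qualifiers dropped), the thing itself is taken to head its own path, and ``without any duplicates'' is read as keeping the first occurrence of each name.

```python
from typing import Dict, List, Set


def prototype_path(thing: str, prototypes: Dict[str, List[str]]) -> List[str]:
    """Walk the prototype tree depth-first, left to right, dropping repeats.

    `prototypes` maps a thing's name to its ordered list of prototype names;
    this dict is an assumed stand-in for what ARK reads from .xml files.
    """
    path: List[str] = []
    seen: Set[str] = set()

    def walk(name: str) -> None:
        if name in seen:  # already on the path: keep only the first occurrence
            return
        seen.add(name)
        path.append(name)
        for proto in prototypes.get(name, []):  # left-to-right order
            walk(proto)

    walk(thing)
    return path


# Hypothetical example: make--3.79.1 has prototypes GNU and ALL (as in the
# site example earlier), and GNU is assumed to have ALL as a prototype too.
protos = {"make--3.79.1": ["GNU", "ALL"], "GNU": ["ALL"]}
print(prototype_path("make--3.79.1", protos))  # ['make--3.79.1', 'GNU', 'ALL']
```

Skipping a name that is already on the path gives the same result as walking the full tree and then deleting repeats, and it also keeps the walk from looping if two prototypes ever name each other.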
Some examples...
- The simplest field would be one with a string value (here,
`description'):
<description>A long purple squishy thing.</description>
- Here is a field (called `compile') which has some params
and a code value:
<compile>
<param name="MAKE" />
<param name="prefix" />
<code redoable="yes">
$MAKE prefix=$prefix
</code>
</compile>
- Fields can be partial (that is, have no value).
The most common use might be something like this (two partial fields)...
<install-bits>
<constraint><host-spec name="hppa-hpux"/></constraint>
<param name="MAKE">/usr/bin/make</param>
</install-bits>
<install-bits>
<constraint><host-spec name="solaris"/></constraint>
<param name="MAKE">/usr/ccs/bin/make</param>
</install-bits>
... which sets a parameter MAKE in a
system-dependent way.
- A typical query we might make of our
configuration data is:
``What is the value of field disk-configuration
for the entity of type host named web-server?''
We gather together (in order) all occurrences of field
disk-configuration in the XML files along host
web-server's prototype path, and combine
them.
Combining is a non-trivial process, but at least there is
only one such process! Some details are given below.
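The partial install-bits example relies on constraints acting as guards before any combining happens. Here is a minimal Python sketch of just that guarding step, under stated assumptions: each occurrence of a field is modelled as a hypothetical FieldPiece record, a host is described by the set of host-spec names it matches, and only <host-spec> constraints are modelled. How the surviving pieces are then combined is deliberately left out.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class FieldPiece:
    """One occurrence of a field found along a prototype path (hypothetical model)."""
    source: str                                            # which .xml file it came from
    host_specs: List[str] = field(default_factory=list)    # <host-spec name="..."/> guards
    params: Dict[str, str] = field(default_factory=dict)   # <param name="...">value</param>


def applicable(pieces: List[FieldPiece], host_matches: Set[str]) -> List[FieldPiece]:
    """Keep, in path order, only the pieces whose host-spec guards all hold."""
    return [p for p in pieces if all(spec in host_matches for spec in p.host_specs)]


# The two partial <install-bits> fields from the example above.
install_bits = [
    FieldPiece("install-bits example", ["hppa-hpux"], {"MAKE": "/usr/bin/make"}),
    FieldPiece("install-bits example", ["solaris"], {"MAKE": "/usr/ccs/bin/make"}),
]
# A host assumed to match the "solaris" host-spec keeps only the second piece.
survivors = applicable(install_bits, {"solaris"})
print(survivors[0].params["MAKE"])  # /usr/ccs/bin/make
```

A piece with no constraints at all passes the check trivially, which matches the idea that an unguarded field always applies.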
All the pieces of a field
The non-value parts of a field can include:
- constraints
- Things that must be true in the system for
this field to be valid/make-sense. They come in
various flavors (e.g. ``essential'' and ignorable).
The main example at the moment is package dependencies.
Constraints are also the ``guards'' of the configuration language.
Unless all of a field's constraints are satisfied, the
field is ignored.
The main constraint of this type is host-specs.
Constraints are a pretty big topic of their own.
- acls
- ACLs (Access Control Lists) specify who can do what to
this field, notably change it.
Don't have a clue how this might work. NOT IMPLEMENTED YET.
- comment
- A chance to wax eloquent about the whys and wherefores
of this field. We would expect it to appear in a Web page
generated about this entity.
- provenances
- A record of where (i.e. what version of what
XML files) all the info about a field came from. We expect the
provenances to be entirely machine-generated. They would feature
in documentation about this entity.
(Implementation is still in early development. Relevant: method recording...)
- quality
- (Need a better name.) Where you can say things like,
``A horrible hack'', ``Works, but needs to go through code review'',
``To be revisited in May 2002'', etc. The idea is to be
able to go through all configuration data and extract the
fields that are weedy and/or need attention.
NOT IMPLEMENTED YET.
- topics
- (From ``topic maps'', an obscure XML thing.)
Essentially, index entries that you could use if creating
a huge cross-reference.
NOT IMPLEMENTED YET.
- validations
- Code that can be run to show that the information
in this field is correct.
NOT IMPLEMENTED YET.
- version
- A number; other parts of the prototype tree can insist that
they meet up with this version of things...
NOT IMPLEMENTED YET.
- params
- Well, um, parameters... Field values
(<code> and otherwise) can reference
param values (details below).
A field value can be one of:
- string
- Straightforward.
- list
- A list of <item>s, each of which can be
of any type (string, list, table, code, ...).
- table
- A hash table of <entry>s, each of which
is keyed by its name attribute (e.g. the <ark-dirs> table in team.xml).
- code
- Code that may be run. Oodles of possible attributes:
- lang
- The language the code is in: sh, python, or perl. Default: sh.
- recordable
- If yes, then we record in the ARK state dir that this code has been run. Default: yes.
See also: the section on method recording.
- redoable
- If yes, then it's safe to re-run the code. Default: no.
- interactive
- The code may do something that requires interaction with the user. Default: no.
BARELY IMPLEMENTED.
- dangerous
- If yes, means the code could do something Very Nasty to your system, e.g. leave you without a password file. Default: no.
NOT IMPLEMENTED YET.
- pretendable
- If yes, then even when the --pretend (dry run) flag is on, we can
still run the code (rather than just display it), because no ``real work'' will take place.
- privileges
- Code needs to run with the privileges of the user/group
specified.
The fully specified way to write this is privileges="<usr>:<grp>",
in which case we run as user <usr> and group <grp>.
The part before or after the colon (:) may
be omitted, in which case the default user or group (as
appropriate) is used for that part.
The same (defaulting) is true if the user or group is given
as '.' (dot).
If the user is defaulted and we are not root, then we use the user specified
in <ark-sysadmin-user-must-be> (NB: not
often set). If no such thing is specified, then the
current user is used.
If the group is defaulted and we are not root, then we use the group specified
in <ark-sysadmin-group-must-be> (NB: more commonly
used). If no such thing is specified, then the
current group is used.
Special cases: privileges="root" really means
privileges="root:root", privileges="." really means
privileges=".:." (i.e. default everything).
If privileges is not specified at all, default everything.
- once-per
- The `granularity' of the method code; i.e., how many times
it needs to be run for it to have been run for the whole
site. Choices include: host (must run on every
single host), site (just run it once),
hosts-supported (roughly, once per platform), or
according to a user-specified table. See the section on method granularity and proxy
hosts. Default: host.
- proxy-hosts
- Specifies the host(s) that are allowed to act as
proxy host to get a method run. Default: 'ALL'.
See also: method granularity and proxy
hosts...
- doc
- A documentation fragment. Attributes:
- format
- The format the fragment is in: text, html. Default: text.
(Other possibilities include: pod, sgml, ???)
- eval
- Code that, when run, will produce a value of one of the
types (string, list, table, code, ...). (An Arusha backtick!)
NOT IMPLEMENTED YET.
- ref(erence)
- A reference to another entity. Haven't decided about this.
NOT IMPLEMENTED YET.
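The defaulting rules for the privileges attribute of <code> are easy to misread, so here is a small Python sketch of them. It follows the text above, but two points are assumptions: the text does not say what happens to a defaulted user or group when we are root (the sketch keeps the current one), and it does not define a bare name other than root without a colon (the sketch rejects it). The function and keyword names are hypothetical.

```python
from typing import Optional, Tuple

Pair = Tuple[Optional[str], Optional[str]]


def normalize(spec: Optional[str]) -> Pair:
    """Turn a privileges="..." value into (user, group); None means 'defaulted'."""
    if spec is None or spec == ".":          # unspecified, or "." meaning ".:."
        return (None, None)
    if spec == "root":                        # "root" really means "root:root"
        return ("root", "root")
    if ":" not in spec:                       # not defined by the text; rejected here
        raise ValueError(f"unsupported privileges value: {spec!r}")
    user, group = spec.split(":", 1)
    # An omitted part, or one given as ".", is defaulted.
    return (None if user in ("", ".") else user,
            None if group in ("", ".") else group)


def resolve(spec: Optional[str], *, current_user: str, current_group: str,
            is_root: bool, user_must_be: Optional[str] = None,
            group_must_be: Optional[str] = None) -> Tuple[str, str]:
    """Fill in defaulted parts from the <...-must-be> team fields, else the current ids."""
    user, group = normalize(spec)
    if user is None:
        # Assumption: when running as root, the current user is kept.
        user = user_must_be if (not is_root and user_must_be) else current_user
    if group is None:
        group = group_must_be if (not is_root and group_must_be) else current_group
    return (user, group)


print(resolve(None, current_user="partain", current_group="staff",
              is_root=False, group_must_be="sliadmin"))  # ('partain', 'sliadmin')
```

The example mirrors the glasli1 team: only <ark-sysadmin-group-must-be> is set, so a non-root run keeps its own user but switches to the sliadmin group.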
An ARK field, partial or not, can have constraints.
These constraints must be satisfied/true before the other
parts of a field (params, value, etc.) come into
play. As such, constraints act as guards over the
other field information.
ARK constraints differ slightly from those of other languages,
in that they have a `make it so' property. If a constraint
is not satisfied, but can be made so, then we make it so and
then consider the constraint satisfied.
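The `make it so' rule can be stated as a tiny guard check. In this Python sketch the Constraint protocol, with is_satisfied() and make_so() methods, is a hypothetical interface invented for illustration, and stopping at the first constraint that cannot be made so is a design choice of the sketch rather than something the text specifies.

```python
from typing import Iterable, Protocol


class Constraint(Protocol):
    def is_satisfied(self) -> bool: ...
    def make_so(self) -> bool: ...  # try to satisfy the constraint; True on success


def guard_holds(constraints: Iterable[Constraint]) -> bool:
    """True if every constraint is satisfied, or could be made so."""
    for c in constraints:
        if not c.is_satisfied() and not c.make_so():
            return False  # this one cannot be made so: the field is ignored
    return True


class Flag:
    """Toy constraint: a flag that may or may not be settable."""

    def __init__(self, value: bool, settable: bool) -> None:
        self.value, self.settable = value, settable

    def is_satisfied(self) -> bool:
        return self.value

    def make_so(self) -> bool:
        if self.settable:
            self.value = True
        return self.value


fixable = Flag(False, settable=True)
print(guard_holds([fixable]), fixable.value)        # True True
print(guard_holds([Flag(False, settable=False)]))   # False
```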
There are three variants of constraint in ARK land.
- The <host-spec> constraint
ToDo: finish!
- The <dependency> constraint
A <dependency> constraint encodes a
dependency between two ARK methods. For example, an ARK
package's compile method probably depends on its
assemble-source method having been run
successfully.
While dependency constraints are typically between methods
of the same object (as in the example given), they can be
on any method of any object. So, for example, a
string-valued field could depend on a particular user method:
<foo>
<constraint><dependency type="essential" what="user" name="joe"
on-method="create" /></constraint>
<string>
Hi, Mom!
</string></foo>
ToDo: finish!
- The <general> constraint
ToDo: finish!
Setting a <param> often looks like this:
<param name="MAKE"> @host:GNU-MAKE@ </param>
That @...@ stuff is the ARK macro language.
It is a convenience thing; strictly speaking, you could do
without it (but you really wouldn't want to). The main
syntax is:
@<object-type|self|param>:<field-name>[:<index>[:<separator>]]@
Some examples therefore:
- @proxy-host:PERL@
- Returns the <PERL> string-valued field.
- @host:ip-addresses:0@
- Returns the string in item 0 of the <ip-addresses>
list-valued field.
- @host:ip-addresses:*:,@
- Returns a comma-separated list of the
items in the <ip-addresses> list-valued field.
- @host:ark-dirs:FOO@
- Returns the string in the entry indexed by
`FOO' in the <ark-dirs> table-valued field.
An anomalous form that we also support is @package:indexed:FOO@
(ToDo: say more).
The macro expander simply looks for things it knows how to
expand, and keeps going as long as it keeps finding things.
(Yes, you can probably make it loop.)
You can get nested macros to work; so, for example:
@host:run-level-dir:@param:RUN_LEVEL@@
The expander will spot that @host:run-level-dir:@
is a bad spec, so it skips it and matches the
@param:RUN_LEVEL@ one. Having replaced it, it looks
again and finds @host:run-level-dir:3@ (or
whatever) and successfully matches it.
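To make the expansion rules concrete, here is a small Python sketch of an expander in the spirit described above. It repeatedly finds the next @...@ span it can resolve, treats an unresolvable span as a bad spec and restarts the search from that span's closing @ (which is what lets the nested form work), and stops when nothing more expands. The nested dict used as the lookup context, the default separator of a single space, and the round limit that guards against looping are all assumptions; self, @package:indexed:...@, and ARK's real field lookup are not modelled.

```python
from typing import Dict, Optional

Context = Dict[str, Dict[str, object]]  # object-type -> field-name -> str | list | dict


def lookup(spec: str, ctx: Context) -> Optional[str]:
    """Resolve 'type:field[:index[:separator]]' to a string, or None if it is a bad spec."""
    parts = spec.split(":", 3)            # a separator may itself contain ':'
    if len(parts) < 2 or not parts[0] or not parts[1]:
        return None
    value = ctx.get(parts[0], {}).get(parts[1])
    if len(parts) == 2:                   # e.g. @proxy-host:PERL@, a string-valued field
        return value if isinstance(value, str) else None
    index = parts[2]
    if isinstance(value, dict):           # table: the index is an entry name
        entry = value.get(index)
        return entry if isinstance(entry, str) else None
    if isinstance(value, list):
        if index == "*":                  # all items, joined by the separator
            sep = parts[3] if len(parts) == 4 else " "  # assumed default
            return sep.join(value) if all(isinstance(i, str) for i in value) else None
        if index.isdigit() and int(index) < len(value):
            item = value[int(index)]
            return item if isinstance(item, str) else None
    return None


def expand(text: str, ctx: Context, max_rounds: int = 1000) -> str:
    """Expand macros until none of the remaining @...@ spans can be resolved."""
    for _ in range(max_rounds):
        replaced = False
        start = text.find("@")
        while start != -1:
            end = text.find("@", start + 1)
            if end == -1:
                break                     # a lone '@' is left: nothing more to expand
            value = lookup(text[start + 1:end], ctx)
            if value is not None:
                text = text[:start] + value + text[end + 1:]
                replaced = True
                break                     # rescan the new text from the start
            start = end                   # bad spec: its closing '@' may open a good one
        if not replaced:
            return text
    raise RuntimeError("macro expansion did not settle (a looping macro?)")


ctx: Context = {
    "proxy-host": {"PERL": "/usr/bin/perl"},
    "host": {
        "ip-addresses": ["10.0.0.1", "10.0.0.2"],
        "ark-dirs": {"OUR": "/our"},
        "run-level-dir": {"3": "/etc/rc3.d"},
    },
    "param": {"RUN_LEVEL": "3"},
}
print(expand("@host:ip-addresses:*:,@", ctx))                # 10.0.0.1,10.0.0.2
print(expand("@host:run-level-dir:@param:RUN_LEVEL@@", ctx))  # /etc/rc3.d
```

Specs that cannot be resolved are left in the text untouched, so stray @ signs (in an e-mail address, say) survive expansion.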
You will eventually bump into tricky ARK questions of the
form, ``How many times must this code be run in order to do the
whole site?'', ``What host(s) will it run on?'', or
perhaps ``Why did (or didn't) it run on that host?''
More formally, it boils down to two issues:
- granularity: Some code needs to run just once
per site (e.g. unpacking a tarball), and some code needs to
run on every single host. And some needs something in
between: we probably run a compile once per platform, not
once per host.
- proxy-ness: (to coin a horrible term)
If we don't run every method on every host, then obviously
some host(s) are acting on behalf of other hosts; i.e.
the acting hosts are behaving as proxy hosts.
Here, we lay out the whole story on method granularity and
(proxy) host use in ARK.
What are we saying, host-wise, when we invoke the following?
ark package compile ALL
In principle, we are saying to run the
compile method for ALL packages on ALL
hosts. Site-at-a-time
thinking demands this.
(You can limit the set of hosts with the --hosts
and --proxy-hosts command-line options; see the details.)
Next come constraints -- they
kick up more work to do. Continuing with our
compile example above, what if the compile
method included...
<constraint><dependency type="essential" name="." on-method="depend" /></constraint>
The ARK machinery will check whether or not the
depend method for ALL packages on
ALL hosts (modulo --hosts, of course) has
been run, and (of course) run it, before doing the
compile method.
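The check-then-run behavior just described, together with not redoing work that has already been done, amounts to a depth-first walk over method dependencies. This Python sketch models one package on one host only; the dependency dict, the done set (standing in for ARK's record of methods already run), and the log (standing in for actually executing each <code>) are all assumptions made for illustration.

```python
from typing import Dict, List, Optional, Set


def run_method(method: str, deps: Dict[str, List[str]], done: Set[str],
               log: List[str], _active: Optional[Set[str]] = None) -> None:
    """Run `method` after its essential dependencies, skipping anything already done."""
    active = set() if _active is None else _active
    if method in done:
        return                                  # don't redo recorded work
    if method in active:
        raise ValueError(f"dependency cycle through {method!r}")
    active.add(method)
    for dep in deps.get(method, []):
        run_method(dep, deps, done, log, active)
    active.discard(method)
    log.append(method)                          # stand-in for running the <code>
    done.add(method)                            # stand-in for recording the run


chain = {
    "compile": ["depend"],
    "depend": ["configure"],
    "configure": ["patch-source"],
    "patch-source": ["assemble-source"],
}
done = {"assemble-source"}                      # pretend this was run earlier
log: List[str] = []
run_method("compile", chain, done, log)
print(log)  # ['patch-source', 'configure', 'depend', 'compile']
```

Running the same request a second time does nothing, because every method in the chain is now in done.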
Imagine we have 300 hosts of three flavors (say, Linux,
Solaris, and AIX). If you have a big bunch of packages
(ALL) and a hefty chain of dependencies
(compile->depend->configure->patch-source->assemble-source...),
you are looking at a HUGE pile of system crunching!
And much of that crunching is entirely pointless.
OK, admittedly, the don't-redo-the-dependencies thing saves
lots of work, but that still leaves the question: why would you
compile on 300 hosts, when 3 hosts (one of each
flavor) would do? (Standard ARK disclaimer: but if
that's what you want to do, no problem!)
Enter granularity! This is specified by the once-per
attribute of a <code>, which can be host, site, hosts-supported,
or a user-specified table (see once-per in the reference section).
If we aren't doing the obvious and running a method's code
on every single host, then some host(s) is/are plainly
acting as proxy hosts for other hosts at the site.
We control this proxying behavior with the proxy-hosts
code attribute, whose default depends on the once-per
code attribute. proxy-hosts can be none, . (dot), or the name of a table.
If once-per="hosts-supported", then the default
is proxy-hosts="proxy-hosts".
If once-per has any other value, then there is
no default for proxy-hosts.
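To see how much work granularity saves in the 300-host example, here is a Python sketch that picks which hosts would run a method for the three built-in once-per values. Assumptions: each host is mapped to a platform name, and the stand-in for proxy-host selection is simply ``the alphabetically first suitable host''; ARK's real choice goes through the proxy-hosts attribute, and user-specified tables are not modelled.

```python
from typing import Dict, List


def hosts_to_run(once_per: str, platform_of: Dict[str, str]) -> List[str]:
    """Return the hosts that would run a method under a given once-per value."""
    hosts = sorted(platform_of)
    if once_per == "host":                 # every single host
        return hosts
    if once_per == "site":                 # just once, on one host
        return hosts[:1]
    if once_per == "hosts-supported":      # roughly once per platform
        first_per_platform: Dict[str, str] = {}
        for h in hosts:
            first_per_platform.setdefault(platform_of[h], h)
        return sorted(first_per_platform.values())
    raise ValueError(f"once-per={once_per!r} (a user table) is not modelled here")


# 300 hypothetical hosts, 100 each of Linux, Solaris and AIX.
site = {f"{p}{i:03d}": p for p in ("linux", "solaris", "aix") for i in range(100)}
print(len(hosts_to_run("host", site)))         # 300
print(hosts_to_run("hosts-supported", site))   # ['aix000', 'linux000', 'solaris000']
```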
Not all combinations of once-per=... and
proxy-hosts=... make sense. Here's the table that
says what's OK:
| once-per=... | host | site | hosts-supported | other |
| proxy-hosts=none | OK | no | no | no |
| proxy-hosts=. | OK | OK | no[1] | OK |
| proxy-hosts=some-table | OK | OK[2] | OK | OK |
[1] This one could be `OK', but we're doing the wimp choice
for now.
[2] We look up proto-host .:ALL in
<some-table> to find the real host to use.
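The table of allowed once-per/proxy-hosts combinations can be encoded directly as a lookup. In this Python sketch, any once-per value other than host, site, or hosts-supported falls into the `other' column, and any proxy-hosts value other than none or . is treated as the name of a table; the function name is hypothetical.

```python
from typing import Dict

# Rows: kind of proxy-hosts value; columns: kind of once-per value.
ALLOWED: Dict[str, Dict[str, bool]] = {
    "none":  {"host": True, "site": False, "hosts-supported": False, "other": False},
    ".":     {"host": True, "site": True,  "hosts-supported": False, "other": True},
    "table": {"host": True, "site": True,  "hosts-supported": True,  "other": True},
}


def combination_ok(once_per: str, proxy_hosts: str) -> bool:
    """True if this once-per/proxy-hosts pairing is marked OK in the table."""
    column = once_per if once_per in ("host", "site", "hosts-supported") else "other"
    row = proxy_hosts if proxy_hosts in ("none", ".") else "table"
    return ALLOWED[row][column]


print(combination_ok("hosts-supported", "proxy-hosts"))  # True (a table name)
print(combination_ok("site", "none"))                    # False
```

As a consistency check, the stated default for once-per="hosts-supported" (proxy-hosts="proxy-hosts", a table name) lands in an OK cell.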
Please see the relevant section
about how to run the ark tool.
An ARK method has a recordable property (default:
yes). If yes, then, when we run the code, we
record that we have done so, somewhere in the
ARK_STATE directory (an entry in the team's <ark-dirs> table).
ToDo: more
ToDo