[Sorry if this is a repost; mail problems!]

Hi Guys,

The background for the filesystem-based structure is what has come out of my work for the National Library of Australia, although there is nothing library-specific about it. Basically, there are pre- and post-processes we run on our data, such as converting to and from various formats, analysing, indexing, filtering and so forth. Quite often, because all or most of our tools exist outside of databases, the filesystem is the place to be. (I've written up some more rationale for using filesystems instead of RDBMSs, but that sort of formality is a banality in a forum such as this ... :)

********

The gist of it is three main directories. 'apps' and 'libs' are shared spaces for various tools, applications and reusable libraries, and 'data' is a shared space with a special meaning:

  /apps
  /libs
  /data

The data directory can contain as many subdirectories as you want, with the constraint that each of them is a data-set. For example, a data-set that consists of records of children's literature might look like this:

  /data/childrenslit

Also in the data directory is a configuration file called config.xml, in which you register your apps, libs and data-sets, although only the data-set registration is mandatory:

  <config>
    <!-- Register your apps here if you like ... -->
    <apps />
    <!-- Register your libs here if you like ... -->
    <libs />
    <!-- You must register your data-sets here -->
    <data>
      <data-set id="childrenslit">
        <with-records schema="marc21" read-access="true" write-access="false" update-access="false" />
      </data-set>
    </data>
    <!-- Register your supported schemata here -->
    <schemata>
      <schema id="marc" extension="bin" />
      <schema id="marc21" namespace="http://..." extension="xml" />
      <schema id="mods" namespace="http://..." extension="xml" />
      <schema id="xobis" namespace="http://..." extension="xml" />
      <schema id="rss2" namespace="http://..."
              extension="xml;rss;rdf" />
      <schema id="foap" namespace="http://..." extension="rdf" />
    </schemata>
  </config>

Each data-set's id is the name of the directory it resides in, and you specify which schemas you can expect to find in that structure:

  <with-records schema="marc" read-access="true" write-access="true" />
  <with-records schema="marc21" read-access="true" write-access="false" update-access="false" />

The example above specifies a data-set where we can read and write MARC records but only read MARC 21 XML, a typical MARC to MARC XML conversion. We can also see that for each record we can expect one MARC file and one MARC XML file. If there are discrepancies (say, a new MARC record has been popped into the structure without its MARC XML counterpart), we know which tools to call to do the conversion. Any tool that wants to work on this data-set knows which parts of it it can read, write and update. (I'm sure better concurrency information could be thought of.)

Next is the structure of the data-set directory itself. If we go back to our previous example:

  /childrenslit

The tree-structured dirindex is required from this directory down, three levels deep, and each record is named in the format [id].[schema].[extension]. An example for a MARC XML record with id '676732a':

  /childrenslit/a/a2/a23/676732a.marc21.xml

Notice that the directory levels are built from the id read in reverse order ('676732a' reversed is 'a237676', giving 'a', 'a2' and 'a23'). Most ids carry their uniqueness at the right-hand end, so this is simply a pragmatic choice that spreads records, and traffic, more evenly across directories. There is no requirement to create the dirindex for all possible combinations, only for those that will be filled with actual records. If a directory later becomes empty, it can also be deleted. (There is also the possibility of creating a file index definition file, but I'll hold off on that until it's needed; traversing the file structure is more a configuration issue than a technical one.)

How it is supposed to work
--------------------------------

I can have different apps that look at the same data-set to do various things to it.
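Whatever an app does to a data-set, it first has to find the records through config.xml and the dirindex described above. The Python below is a minimal sketch of that step, not part of any existing tool: the function names (load_config, dirindex, record_path, has_access) are hypothetical, it assumes a missing *-access attribute means "false", and where a schema lists several extensions (like rss2) it simply uses the first one.

```python
import os
import xml.etree.ElementTree as ET

# Hypothetical minimal config.xml combining the two with-records lines above;
# the elided namespace URIs are omitted because this sketch never reads them.
CONFIG = """
<config>
  <apps />
  <libs />
  <data>
    <data-set id="childrenslit">
      <with-records schema="marc" read-access="true" write-access="true" />
      <with-records schema="marc21" read-access="true" write-access="false"
                    update-access="false" />
    </data-set>
  </data>
  <schemata>
    <schema id="marc" extension="bin" />
    <schema id="marc21" extension="xml" />
    <schema id="rss2" extension="xml;rss;rdf" />
  </schemata>
</config>
"""


def load_config(text):
    """Return (data_sets, extensions) parsed from a config.xml string.

    data_sets maps data-set id -> {schema id -> {access name -> bool}};
    extensions maps schema id -> list of file extensions.
    """
    root = ET.fromstring(text)
    extensions = {
        s.get("id"): s.get("extension", "").split(";")
        for s in root.find("schemata")
    }
    data_sets = {}
    for ds in root.find("data"):
        schemas = {}
        for wr in ds.findall("with-records"):
            # Assumption: a missing *-access attribute means "false".
            schemas[wr.get("schema")] = {
                access: wr.get(access + "-access") == "true"
                for access in ("read", "write", "update")
            }
        data_sets[ds.get("id")] = schemas
    return data_sets, extensions


def dirindex(record_id, depth=3):
    """Reverse the id and use its first 1..depth characters as directories."""
    rev = record_id[::-1]
    return [rev[:n] for n in range(1, depth + 1)]


def record_path(root_dir, data_set, record_id, schema, extensions):
    """Build <root>/<data-set>/<dirindex>/<id>.<schema>.<extension>.

    Assumption: when a schema lists several extensions, use the first one.
    """
    ext = extensions[schema][0]
    return os.path.join(root_dir, data_set, *dirindex(record_id),
                        f"{record_id}.{schema}.{ext}")


def has_access(data_sets, data_set, schema, access):
    """True if config.xml grants `access` ('read', 'write' or 'update')."""
    return data_sets.get(data_set, {}).get(schema, {}).get(access, False)
```

For example, record_path("/data", "childrenslit", "676732a", "marc21", extensions) gives /data/childrenslit/a/a2/a23/676732a.marc21.xml on a POSIX system, matching the layout above, while has_access refuses writes to the MARC 21 records of that data-set.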
For example, here are two apps that use the same data-set:

  <application id="my_app">
    <description>Converts MARC to MARC XML</description>
    <uses data-set="childrenslit">
      <reactor schema-idref="marc" test="new;update" />
    </uses>
  </application>

  <application id="my_other_app">
    <description>Indexes MARC XML</description>
    <uses data-set="childrenslit">
      <reactor schema-idref="marc21" test="new;update;delete" />
    </uses>
  </application>

Both applications know from the config.xml file who does what, and we can write reactors that respond to records being added, updated or deleted in a given schema.

The idea from here is that we can simply exchange apps and libs and point them at whichever data-set we want them to work with. If an app supports a few good schemas, it would be a simple matter of plonking it in, updating config.xml, and running it. This way I could share with you, for example, the XPF framework for creating a Topic Maps website from any data-set (MARC XML, Topic Maps, RDF, RSS, DocBook and a few other schemata), the Phonto tool (an automatic ontology extraction and analysis tool) and a host of XSLT libraries for various conversions. Imagine next the creation of lexical parsers, AI tools, data-set to SemWeb bridges and so forth, all easily shared. I have some data-sets, you have others, and sharing them is often restricted, so let's share tools that read the same structures, on the path to world domination.

Anyway, that's the basic idea, in no way exhaustive. Any thoughts?

--
"Ultimately, all things are known because you want to believe you know."
 - Frank Herbert
__ http://shelter.nu/ __________________________________________________