LinuXML

Contents

Introduction | Storyboards | How it Works | Downloads | How to contribute to the project | The "Vision" | Architecture for LinuXML | Specifications and DTDs | Discussions on Issues


Welcome to the LinuXML project.

This project is devoted to changing the UNIX de facto standard for inter-process communication and storage from line-based ASCII records to XML. Linux is the reference platform.




Home


1. Introduction

This project originated from my daily frustration with xterm. It annoyed me that xterm didn't allow me to click on the ASCII words and symbols with my mouse and get things to happen beyond simple text cut and paste. (My original whinge on this subject was posted on Usenet back in 1994: "Do HyperTerminals Exist?".) I always find getting "sort" to work on arbitrary record formats a chore, and I can never remember the file formats of all the /etc/conf files.

Now Linux has come to us all, and XML is now appearing. These two provide us the means to improve our computing environments: GNU Linux gives us access to the source of the UNIX commands, and XML gives us a standard syntax for data.



2. Storyboards

2.1 "XMLterm"

Here is a simulation of the improved XMLterm: (Hit your browser refresh button to restart the animation.)



3. How it Works

The idea is simple. Instead of a program like "ls" outputting its data in ASCII records, it outputs the directory listing in XML. This allows all the downstream programs, like "sort" and "xterm" to actually understand what the data is and do more useful things with it. Here's an "ls" output:

[birch@redhat birch]$ ls -l --xml /
<?XML version="1.0" ?>
<!DOCTYPE UNIX>
<COLLECTION>total 51
<DIRECTORY NAME="bin" MODE="755" USER="root" GROUP="root" SIZE="2048" MODIFIED="922062673l" >drwxr-xr-x 2 root root 2048 Mar 22 1999 bin</DIRECTORY>
<DIRECTORY NAME="boot" MODE="755" USER="root" GROUP="root" SIZE="1024" MODIFIED="917089762l" >drwxr-xr-x 2 root root 1024 Jan 23 22:09 boot</DIRECTORY>
<FILE NAME="bootsect.lnx" MODE="644" USER="root" GROUP="root" SIZE="512" MODIFIED="922025029l" >-rw-r--r-- 1 root root 512 Mar 22 1999 bootsect.lnx</FILE>
.......
</COLLECTION>

You can see that each listing is delimited by XML tags such as <DIRECTORY> or <FILE>.

Here's another "ls" output with a more specific set of XML tags:

[birch@redhat birch]$ ls -l --xml /
<?XML version="1.0" ?>
<!DOCTYPE UNIX>
<COLLECTION>total 51
<DIRECTORY SIZE="2048" ><PERMS>drwxr-xr-x</PERMS> 2 <USER>root</USER> <GROUP>root</GROUP> <SIZE>2048</SIZE> <MODIFIED UTIME="922062673l">Mar 22 1999</MODIFIED> <NAME>bin</NAME></DIRECTORY>
<DIRECTORY SIZE="2048" ><PERMS>drwxr-xr-x</PERMS> 2 <USER>root</USER> <GROUP>root</GROUP> <SIZE>2048</SIZE> <MODIFIED UTIME="922062673l">Mar 22 1999</MODIFIED> <NAME>etc</NAME></DIRECTORY>
<FILE SIZE="2048" ><PERMS>drwxr-xr-x</PERMS> 2 <USER>root</USER> <GROUP>root</GROUP> <SIZE>2048</SIZE> <MODIFIED UTIME="922062673l">Mar 22 1999</MODIFIED> <NAME>foo.txt</NAME></FILE>
.......
</COLLECTION>

These tags delimit each component of the listing. (This is the format currently favoured by people on the mailing list.)
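As a rough illustration of what a downstream program gains from this markup, here is a minimal Python sketch that reads a listing in the tag-per-field style. The sample data is adapted from the listings above (the declaration, DOCTYPE and the trailing "l" on the timestamps are left out), and the function names are made up for this example; nothing here is part of the LinuXML code.

```python
import xml.etree.ElementTree as ET

# Sample listing in the tag-per-field style (adapted from the output above).
LISTING = """<COLLECTION>total 51
<DIRECTORY><PERMS>drwxr-xr-x</PERMS> 2 <USER>root</USER> <GROUP>root</GROUP> <SIZE>2048</SIZE> <MODIFIED UTIME="922062673">Mar 22 1999</MODIFIED> <NAME>bin</NAME></DIRECTORY>
<FILE><PERMS>-rw-r--r--</PERMS> 1 <USER>root</USER> <GROUP>root</GROUP> <SIZE>512</SIZE> <MODIFIED UTIME="922025029">Mar 22 1999</MODIFIED> <NAME>bootsect.lnx</NAME></FILE>
</COLLECTION>"""


def read_entries(xml_text):
    """Return (kind, name, size) for each child element of the root."""
    root = ET.fromstring(xml_text)
    return [(e.tag, e.findtext("NAME"), int(e.findtext("SIZE"))) for e in root]


def plain_text(xml_text):
    """Drop the markup, leaving the text a classic terminal would show."""
    return "".join(ET.fromstring(xml_text).itertext())


print(read_entries(LISTING))
print(plain_text(LISTING))
```

The second function shows why keeping the display text inside the elements helps backward compatibility: stripping the tags gives back the familiar 'ls -l' lines.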

An XMLterm can be programmed so that when you click on a directory such as "/tmp", it tells the shell to "cd /tmp".

Also "XMLsort" can sort on the size of a file directly, since the XML output of "ls" explicitly identifies the size by its tag, i.e. instead of

ls -l / | sort -n +4
you have:
ls -l --xml | XMLsort -size
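No XMLsort exists yet, so the following Python sketch is only a guess at its core: it reorders the records of a stream by the numeric value of a named child tag. The function name and the cut-down sample stream are assumptions for illustration.

```python
import xml.etree.ElementTree as ET


def xml_sort(xml_text, field):
    """Reorder the root's children by the integer value of a child tag."""
    root = ET.fromstring(xml_text)
    # Records lacking the field sort first, as if their value were 0.
    root[:] = sorted(root, key=lambda e: int(e.findtext(field, default="0")))
    return ET.tostring(root, encoding="unicode")


SAMPLE = (
    "<COLLECTION>"
    "<DIRECTORY><SIZE>2048</SIZE><NAME>bin</NAME></DIRECTORY>"
    "<FILE><SIZE>512</SIZE><NAME>bootsect.lnx</NAME></FILE>"
    "<DIRECTORY><SIZE>1024</SIZE><NAME>boot</NAME></DIRECTORY>"
    "</COLLECTION>"
)

print(xml_sort(SAMPLE, "SIZE"))
```

Unlike 'sort -n +4', the field is selected by name, so the filter keeps working if 'ls' adds or reorders columns.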

The rest of this web site is all about achieving these goals.



4. Downloads

This section of the site is where you can download LinuXML programs and files. We are starting with prototypes for initial play and usability testing; this section will list and point to downloadable LinuXML software and docs.

4.1 'ls' source code

The source for an XML version of 'ls' is available here: lsxml.tgz.

'ls' does not produce a depth-first output, rather it queues up directories to scan (breadth-first?). This probably means that we'll need to write a variant of 'ls' whose output mimics the hierarchical file structure. GUI viewers such as "XML spy" with a tree-display will work better with that format.
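A variant whose output mimics the file hierarchy could be prototyped outside 'ls' altogether. The Python sketch below walks a directory depth-first and nests DIRECTORY and FILE elements accordingly. The NAME and PATH attributes follow the examples on this page, but the function itself is hypothetical and it ignores permission errors and symbolic links to directories.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def tree_to_xml(path):
    """Build a DIRECTORY element whose children mirror the tree under path."""
    node = ET.Element("DIRECTORY", NAME=os.path.basename(path) or path, PATH=path)
    for entry in sorted(os.scandir(path), key=lambda e: e.name):
        if entry.is_dir(follow_symlinks=False):
            # Depth-first: finish this subdirectory before the next sibling.
            node.append(tree_to_xml(entry.path))
        else:
            ET.SubElement(node, "FILE", NAME=entry.name, PATH=entry.path)
    return node


# Small throwaway tree so the example runs anywhere.
with tempfile.TemporaryDirectory() as top:
    os.makedirs(os.path.join(top, "etc", "default"))
    open(os.path.join(top, "etc", "passwd"), "w").close()
    demo = tree_to_xml(top)
    print(ET.tostring(demo, encoding="unicode"))
```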

4.2 Example Files

Here are some example XML files:

4.3 A simple script for ls

paul@argo.demon.co.uk hacked up an ls shell script to provoke some thought. Here's what it outputs:

$ ./xls -l /etc/d*

<?xml version="1.0"?>
<!DOCTYPE FILELIST SYSTEM "ls.dtd">
<?xml-stylesheet href="ls.xsl" type="text/xsl"?>
<FILELIST>
<directory name="/etc/default" mode="0755" links="2" size="1024"/>
<directory name="/etc/dhcpc" mode="0755" links="2" size="1024"/>
<file name="/etc/dosemu.conf" mode="0644" links="1" size="6753"/>
<file name="/etc/dosemu.users" mode="0644" links="1" size="1170"/>
<file name="/etc/drums.o3.rpmorig" mode="0644" links="1" size="7680"/>
<file name="/etc/drums.sb.rpmorig" mode="0644" links="1" size="6656"/>
<file name="/etc/dumpdates" mode="0664" links="1" size="0"/>
</FILELIST>

4.4 XML grep

I have been using the LTXML suite http://www.ltg.ed.ac.uk/software/xml/ which has a 'sggrep' (an XML grep program) amongst other things.

4.5 Vaughn's Terminal Emulator Candidate for XML?

Vaughn: I have decided to go ahead and make a release of gsh. The current version is now 0.0.12 and is available at

http://personal.atl.bellsouth.net/atl/v/c/vcato/gsh/

I have had a couple of other people try it, so maybe it will really work this time.



5. How to contribute to the project

This is an Open Source project, and will be covered by the GNU license.

All forms of contribution will be welcomed. Currently the most useful contributions will be in the form of feedback on the vision and architecture. More concrete help in the forms of DTDs and source will be welcomed with open arms. Thinking and opinion on the issues list would also be helpful.


Click to join the LinuXML mailing list. The mailing list is the forum for project discussions.

Or email birchb@ozemail.com.au

5.1 Want to volunteer for some XSL coding?

Do you know XSL? If so would you consider creating an XSL for the example file:

http://geocities.datacellar.net/ResearchTriangle/Forum/6751/exa2.txt

The goal is to have a cool display of the XML file in Microsoft IE5 and Netscape browsers.

5.2 Marketing LinuXML

Currently people are signing up to the mailing list at a rate of roughly one per week. I have no idea where people hear about the project, since there is only an old link from Linux Weekly News. A volunteer is needed to spread the word more widely in the Linux & UNIX community. Any takers?

5.3 To Do List

Here's a high-level to do list. These activities are semi-ordered, however which get done first depends on the whim of the contributors.

5.3.1 Complete the DTD (Bill)

The first-cut DTD for Linux objects needs to be written and published.

5.3.2 Re-code 'ls' to use the new DTD style. (Bill)

'ls' needs to be changed to use more tags and fewer attributes.

5.3.3 Code a prototype XMLterm (Your name here)

A first-cut XML terminal is needed to prototype and validate the designs and DTD.

5.3.4 Code XML versions of Linux commands (Your name here)

XML enabled versions of common Linux programs need to be coded. A lifetime's work! New XML filter programs are needed to replace 'grep' etc.

5.3.5 Code a full-on XML terminal and shell (Your name here)

After the prototyping, a full-blown XML terminal emulator will be coded as the main UI workhorse for everyday use. This may also include a complete OO shell environment.



6. The "Vision"

I'm not a big fan of this word, it sounds pretentious but it seems to fit here.

The new 'vision' for the project consists of combining the idea of 'object streams' and direct manipulation in the user interface. Rolling all the smarts into the user interface is an option, but we need to keep the magic of the "|" pipe alive. The original concept is still valid, but a more structured approach to getting there is needed. Remember this is 'vision' material so some of it will be beyond our immediate reach. There are some key parts which I believe we need.

6.1 A Standard for Defining Data

We need to be able to describe the format of objects in object-streams. The format of the data should itself be standard. Why have different output data for every program? XML appears to provide us a solution.

Meta-data is also a requirement. Format specs have to be machine readable for programs that need it. An obvious candidate meta-data format is XML's DTD, although it does lack detail on atomic data types.

6.2 An Object Model for Common UNIX Objects

Objects like files in UNIX are very standard. It makes sense to create common object models for all standard UNIX objects. (I heard that a researcher is building an OO model of UNIX, but I have not found it on the web yet. Is it called Obex? Let me know.)

A standard XML format for standard UNIX objects would be the way forward. This way it won't hurt much when people hard code these into their programs. No-one should be penalised for hard-coding expectations about the XML syntax of standard objects. Also, having a standard for common things should allow a lot of code re-use.

Applications that don't use standard UNIX objects must also be able to use the same object and meta-data stream formats.

6.3 A Standard Way to Load Methods

It would be very cool indeed if programs like "sort" were able to dynamically load methods for objects they are working on. This would also apply to user-interface programs, which would need to dynamically discover the capabilities of an object class, and load the relevant methods. Java is a very good candidate for this requirement.

6.4 Backward Compatibility

LinuXML needs to be fully backward compatible with UNIX & Linux of yore. For example, new versions of "ls" would offer the XML output via a new command-line switch, and new versions of "xterm" must still work with straight ASCII. Also the roles of producers like "ls", filters like "sort" and terminals should remain unchanged.



7. Architecture for LinuXML

The de-facto standard for UNIX records is the line-based record possibly with tab delimited fields. LinuXML software uses the XML tag as the standard for delimiting records and fields. This standard format can be used anywhere that ASCII data formats are used currently, including:

Thanks to the stdio and the pipe, programs in Linux can be subdivided into 'source programs', 'filters' and 'terminal' programs. Examples of 'source programs' are 'ls', 'yes', 'ps'. Filters include 'sed', 'awk', 'grep' and many others. 'xterm' is a great example of a terminal program. The goal of LinuXML is to provide XML versions of all these types of programs. Source programs would output in a standard DTD for standard Linux OS objects like processes. Filters would be expected to work with any arbitrary XML stream, and terminals and shells would require understanding of the standard DTD to allow them to interpret the meaning of the data they are receiving.

7.1 Why XML?

Why XML? Many formats could be used. (I had originally planned to use the syntax of LISP since it has a rich yet simple syntax.) Here is a list of reasons why we are going with XML:

7.2 Object Streams for Linux

An Object Stream is a sequence of objects which have been converted from their in-memory format to some serial format. We use XML to delimit and serialise objects. This is an extrapolation of the age-old UNIX tradition of chaining filters together using pipes. Instead of chaining together programs such as sort | uniq | wc which work on LF delimited lines, we will be chaining together XML-aware filters.

Since XML can hold hierarchies of data, we expect that XML filters will be capable of much more complex tasks than the traditional versions. Also XML tags are more explicit about where the fields are in the stream, which means we won't have to program parsers in our filters as we do currently.
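To make the chaining idea concrete, here is a small Python sketch of two XML-aware stages, loosely analogous to 'grep' and 'wc -l'. The stage names and the sample stream are invented for illustration; in a real pipeline each stage would be a separate program reading XML on stdin and writing XML on stdout.

```python
import xml.etree.ElementTree as ET


def xml_grep(root, tag, predicate):
    """Filter stage: keep only records whose <tag> text satisfies predicate."""
    out = ET.Element(root.tag)
    out.extend([e for e in root if predicate(e.findtext(tag, default=""))])
    return out


def xml_count(root):
    """Terminal stage, comparable to 'wc -l': count the records in the stream."""
    return len(root)


stream = ET.fromstring(
    "<COLLECTION>"
    "<FILE><USER>root</USER><NAME>passwd</NAME></FILE>"
    "<FILE><USER>birch</USER><NAME>notes.txt</NAME></FILE>"
    "<FILE><USER>root</USER><NAME>fstab</NAME></FILE>"
    "</COLLECTION>"
)

owned_by_root = xml_grep(stream, "USER", lambda user: user == "root")
print(xml_count(owned_by_root))
```

Neither stage contains a parser for the record layout; the field is found by its tag name.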

7.3 Model View and Object Streams

We don't actually transmit the Linux objects themselves, rather we transmit 'Views' of the objects. (This concept is derived from the Model-View-Controller paradigm.)

An object View refers to the original object via pointers, or unique object IDs. These allow the receiver of the view in the object stream to re-establish contact with the original object. For example a URL is a kind of object ID. The HTML tag <A HREF="http://jupiter/foo.html">Foo File</A> is a 'View' of the foo.html file stored on the machine "jupiter".

Thus in LinuXML, we require all Linux objects to have some kind of unique ID such as a full path name. For example <FILE PATH="/tmp/foo.dat"> would work.

Objects in a stream which do not have a unique pointer to a corresponding 'Model' object could be regarded as transient objects which only exist in the transmission stream.
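The relationship between a View and its Model can be sketched in a few lines of Python. Here the receiver follows the PATH attribute back to the live file, and treats an element without PATH as transient. The function name is hypothetical, and a temporary file stands in for the Model object.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def resolve(view_xml):
    """Follow a view's PATH back to the live object; None means transient."""
    path = ET.fromstring(view_xml).get("PATH")
    if path is None:
        return None
    return os.stat(path)


with tempfile.NamedTemporaryFile() as model:
    model.write(b"hello")
    model.flush()
    view = f'<FILE PATH="{model.name}">{os.path.basename(model.name)}</FILE>'
    current = resolve(view)
    print(current.st_size)

print(resolve("<FILE>scratch</FILE>"))
```

The view carries only display text and an ID; anything else about the file, such as its current size, is fetched from the Model when needed.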

The diagram below illustrates the LinuXML approach to object streams:

7.4 Making Everything Click

On the user interface the goal is to make everything 'clickable' or 'inspectable'. This has been done before; a great example can be seen in the Allegro Common Lisp IDE, in which every symbol on the screen is selectable with the mouse. The IDE knows everything there is to know about the symbols, and similar things are available in many other IDEs. On the web too, the use of hyper-text links is of course fundamental.

Currently in Linux the only use of the mouse is in X-windows apps where widgets and dialogues have been coded. The terminal emulator is still the predominant UI for Linux users, but strangely only has rudimentary clickability. We are working towards the fully clickable terminal emulator for Linux as one part of this project. Some approaches are presented below.

7.5 Terminal Emulator Designs

This section describes some alternative ways of implementing hyper-terminals. These are under review. Note that in all cases the LinuXML DTD is the same. I hope that in future people will try some or all of these different approaches.

7.5.1 Fat Java Client

7.5.2 Fat xterm

7.5.3 Thin xterm with a new shell

This option places the user interface as being 'dumb'. It will know when objects have been clicked, and will convert user clicks into shell commands. eg xterm will transmit a message to a shell instructing it that the user had clicked on a directory.

$ click /home

The shell will then deduce that a "cd" is appropriate and execute that operation. This places all the OO smarts in the shell (knowledge of object classes and methods). This allows early prototyping with existing shell languages.
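The shell side of this design could start as little more than a lookup table. The Python sketch below maps a clicked element to a command line; the table contents (in particular 'less' for files) and the function name are assumptions made for this example, not project decisions.

```python
import shlex
import xml.etree.ElementTree as ET

# Hypothetical table: element type -> default action when clicked.
DEFAULT_ACTIONS = {"DIRECTORY": "cd", "FILE": "less"}


def command_for_click(element_xml):
    """Translate a clicked element into a shell command line, or None."""
    element = ET.fromstring(element_xml)
    action = DEFAULT_ACTIONS.get(element.tag)
    path = element.get("PATH")
    if action is None or path is None:
        return None  # unknown object class, or no object ID to act on
    return f"{action} {shlex.quote(path)}"


print(command_for_click('<DIRECTORY PATH="/home"/>'))
print(command_for_click('<DIRECTORY PATH="/tmp/my dir"/>'))
```

Quoting the path matters because the object ID comes from the data stream, not from something the user typed.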

7.5.4 Integrated terminal emulator and shell

This option bundles all the functionality into one X-windows executable. This is the model preferred by many 'file manager' programs today.

7.6 XML for Configuration & Data Files.

XML is ideal for structured config files. Combining a standard syntax with parsing libraries and XML structure editors will make system admin more pleasant. There are already many 'folding tree' style XML editors that could be applied to an XML format config file. Common-or-garden XML parsing libraries are available to ease the burden of reading in XML formats.
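As an example of how little code a stock parsing library needs, the Python sketch below reads a made-up XML rendering of a hosts table. The HOSTS, HOST, NAME and ALIAS tags are invented for this illustration; the project has not defined any config file format.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML version of a few hosts entries.
CONFIG = """<HOSTS>
  <HOST ADDRESS="127.0.0.1"><NAME>localhost</NAME></HOST>
  <HOST ADDRESS="192.168.1.10"><NAME>jupiter</NAME><ALIAS>jup</ALIAS></HOST>
</HOSTS>"""


def load_hosts(xml_text):
    """Map every host name and alias to its address."""
    table = {}
    for host in ET.fromstring(xml_text).iter("HOST"):
        for label in host:  # the NAME and ALIAS children
            table[label.text] = host.get("ADDRESS")
    return table


print(load_hosts(CONFIG))
```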



8. Specifications and DTDs

This section is/will be a repository of standard UNIX specifications and DTDs for XML object streams. This section contains or references detailed specifications of interfaces (down to the bit).

Firstly, what is a specification? For us a specification is a document for human readers that unambiguously and definitively describes the format of an object stream. Since we are using XML, we shall use the syntax of Document Type Definitions (DTDs) as the basis for specifications. Note that DTDs themselves are not rigorous enough for our purposes. Therefore text annotation in the specifications will be added to dis-ambiguate the descriptions. In future, we may also use other meta-data syntaxes to update the specifications (eg XML-Data).

8.1 DTD Design Principles

This section collects design principles for LinuXML DTDs.

8.1.1 Mandatory Object IDs

The objects' identification will have to be mandatory. If it isn't, the XML format will only serve to allow pretty presentation, and the user would have to figure out which object was being addressed :-( . WITH the ID info, though, great things can be done.

8.1.2 Use of Tags vs Attributes

After looking at emails we thought that a better syntactic approach to the DTD would be to use nested tags instead of attributes. (See exa2.txt below).

The advantages:

* Fits better with some parsers and transformers.

* Allows re-use (more like an OO model), i.e. <USER> is a tag that can be re-used, whereas a USER= attribute needs to be defined in every element.

* Looks better!

Disadvantages:

* Is more verbose.

Using attributes wherever possible is a good thing. Not only are they validatable, but they help you adhere to good object modeling. In the case of , the UID, and UNAME are clearly attributes. If this particular element were to gain children over time, like their subordinates, then they could be added as child elements.

8.1.3 Optional Tags

The DTD will allow most elements to be optional so that the volume of data output can be controlled by command-line switches etc.

8.1.4 Tags for Users, Attributes for Machines

This is a general principle we have discussed and have adopted:

In general, if a fragment of information needs to be displayed to a user then there shall be a tag that can be used to mark it in the data stream. If there is a datum that needs to be in the stream for technical reasons but is unlikely to be displayed to a user (eg epoch time) then it shall be an attribute.
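The principle can be seen in a single element. This Python sketch builds a MODIFIED element in the style of the 'ls' examples: the human-readable date is the element text, and the epoch time rides along as the UTIME attribute for programs. The helper name is an assumption, and the date is formatted in UTC purely to keep the example deterministic.

```python
import time
import xml.etree.ElementTree as ET


def modified_element(epoch_seconds):
    """Build <MODIFIED UTIME="...">display text</MODIFIED> for a timestamp."""
    element = ET.Element("MODIFIED", UTIME=str(epoch_seconds))
    element.text = time.strftime("%b %d %Y", time.gmtime(epoch_seconds))
    return element


mod = modified_element(922062673)
print(ET.tostring(mod, encoding="unicode"))

# A filter sorts on the attribute; a terminal shows only the text.
print(int(mod.get("UTIME")), mod.text)
```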

8.1.5 Upper Case

As much as I hate it we're going with upper case. Some XML code only seems to accept upper case.

8.2 Experimental Specifications & DTDs

The specifications in this section are experimental and may change without notice or be removed.

8.2.1 <!DOCTYPE UNIX>

This first specification allows almost any UNIX object to appear at almost any place in an object stream. This allows programs like 'ls' to re-arrange underlying operating system hierarchies (e.g. tree flattening) as required by the users. The specification assumes that programs output a kind of text 'data soup' in which morsels of parsable data can be found. We use XML tags to delimit the parsable data, and ignore (for processing) the free text.

Furthermore, we place few if any restrictions on the placement of particular elements; they may appear anywhere. There is no attempt made to mimic the internal file system or other UNIX aggregation hierarchies.

By contrast, the syntax of the tags and their attributes is rigorous and unambiguous. This allows us to code filters and UI programs that can handle the data morsels. Since the elements can appear out of context, they each must have a proper object identification for the UNIX object they refer to. Thus a <FILE> element needs to have the full pathname to the file as an attribute. Without this, downstream programs and users are unable to identify the file.
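A filter written against this 'data soup' model simply looks for the elements it understands and skips the rest. The Python sketch below does that for a made-up report; the UNIX root element and the sample text are assumptions for illustration, since the DTD is not yet complete.

```python
import xml.etree.ElementTree as ET

# Free text with parsable morsels scattered through it (hypothetical sample).
SOUP = """<UNIX>Backup report for Tuesday.
Skipped <FILE PATH="/tmp/foo.dat">foo.dat</FILE> because it was locked;
everything under <DIRECTORY PATH="/home">/home</DIRECTORY> was copied,
including <FILE PATH="/home/birch/notes.txt">notes.txt</FILE>.
</UNIX>"""


def find_objects(xml_text, tag):
    """Return the PATH of every <tag> element, wherever it sits in the stream."""
    return [e.get("PATH") for e in ET.fromstring(xml_text).iter(tag)]


print(find_objects(SOUP, "FILE"))
```

Because each element carries its full path, the filter needs no knowledge of the surrounding text or of where in the stream the element appeared.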

<!-- Not complete. Refer to the UNIX DTD file (unix.dtd) for the full specification. -->

8.3 Ratified Specifications & DTDs

The specifications in this section have been ratified by the team and by public consensus. These will form the basis of the LinuXML design.



9. Discussions on Issues

This section captures some of the issues and discussions from the mailing list.

9.1 Embedding the Presentation

David Suarez de Lis: I am with this new DTD, but, like Paul, dislike the inclusion of the representation there... actually a

100users

sysuser

kind of tree (or grove) may be easier to parse and to play with for later representations...

9.1.1 Backward Compatibility

Bill: We MUST have backward compatibility in presentation. Well, look at Linux: it mimics an ancient OS from the 70s almost down to the last detail. Why? Because the programming public like and want their familiar commands. Linux would not be where it is today if all the commands and files were different. So in LinuXML we are striving to ALLOW backward compatibility. The beauty of the XML format is that it allows us to migrate towards better solutions!

> I've modified the GNU "ls" to allow 'ls -l --xml'. I attach a sample output below. I plan to complete the 'ls' changes RSN. This will include the other modes of 'ls', columns etc etc. After that I'll document the associated specification & DTD. (The ls command has non-linear stuff in it which affects the design. This is all about backward compatibility...)

9.1.2 David Suarez de Lis

I see... is that the reason for the contained texts? If not, I wouldn't go the container way and would stick to empty tags...

If representation is a problem, there'd be an adequate stylesheet somewhere to tell the program how to display ls -l --xml... a flat ASCII string wouldn't work for a normal xterm, which would output the whole XML document...

9.1.3 Bill

Yes, 'ls' is 'ls'. Another ls-like program will be coded to give a raw XML type output. We keep 'ls' as is except we mark up the output. That way when it gets to the XML-ready terminal emulator it will look the same as always.

9.2 Compression/Optimisation

David Suarez de Lis: The only concern I have with this is for operations on big filesystems... an 'ls -lR --xml > ls-lR.xml' could take ages and loads of resources... maybe the level of verbosity can be reduced by a unix simplification algorithm(tm) :) consisting of having as many 2 or 3 letter tags as possible... that would certainly make much shorter docs...

another concern is the size of objects travelling the system if we are going for the CORBA model... big objects need time and resources that can be better used for other things (for that matter, a simple 'cp prog /target/dir/; cd /target/dir/')

paul666@mailandnews.com: No, I disagree with this. This is an optimisation exercise and can be done with compression or by transferring the in-memory XML tree if it is not for textual consumption. Gnome's libxml already does compression I believe.

There is also the possibility of providing hints to the XML source application not to output certain bits of info, instead of outputting them and not being used. Take gcc for example. Suppose we want to ignore all warnings from gcc (which has of course been modified to output XML), perhaps we could call gcc with gcc --eat WARNING to tell it to swallow all WARNING elements?

Bill: I tend to agree with Paul about the abbreviations, volume is not so much a problem nowadays. Remember that classic UNIX was designed to work over 9600 Baud TTY lines (which it does fine). I think we should assume Moore's Law will help us out.

And yes, the source programs can be asked not to dump everything via command switches. 'ls -l' does this to some extent, you can control what you get. So therefore the DTD will allow most elements to be optional.

William Adams: As far as the volume of text is concerned, I would tend to agree that it's not something to worry about at this particular point. Your DTD should be as clear and expressive as possible. The XML generated can always be transformed into a more terse form using XSL. So if size is an issue, like transmitting over an expensive satellite line, then you transform to a more terse form on each end.

In addition, the XML can be represented in a binary form which can make it even smaller. This binary form might look like ASN.1 or something more emergent.
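Paul's 'gcc --eat WARNING' suggestion amounts to the source program skipping element types it has been told nobody wants. A minimal Python sketch of such a source follows; the COMPILE, WARNING and ERROR tags, the LINE attribute and the function name are all invented here, and gcc has no such option.

```python
import xml.etree.ElementTree as ET

# Hypothetical diagnostics: (element type, line number, message).
DIAGNOSTICS = [
    ("WARNING", 12, "unused variable"),
    ("ERROR", 40, "missing semicolon"),
    ("WARNING", 57, "implicit declaration"),
]


def emit(diagnostics, eat=()):
    """Serialise diagnostics as XML, omitting any element type listed in eat."""
    root = ET.Element("COMPILE")
    for kind, line, message in diagnostics:
        if kind in eat:
            continue  # swallowed at the source, never reaches the stream
        ET.SubElement(root, kind, LINE=str(line)).text = message
    return ET.tostring(root, encoding="unicode")


print(emit(DIAGNOSTICS))
print(emit(DIAGNOSTICS, eat={"WARNING"}))
```

This is also why optional elements in the DTD matter: a stream with all WARNING elements removed must still be a valid stream.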

9.3 Editing Raw XML in XML-aware Terminal Emulators?

There is a problem with Terminal Emulators in raw mode. What happens? Also, what happens if we want to edit (or even cat) an XML file?

9.3.1 Vaughn Cato

I have been thinking about how XML fits into all this. One of my dilemmas has been how to handle the difference between seeing the XML source and seeing the interpreted XML. If xmlfile is an XML source file, and you type

cat xmlfile

You will see the interpreted output, but what if you want to see the actual source? Maybe the solution is just a toggle somewhere to switch between plain and interpreted mode.

9.3.2 Bill

I see these options:

1. Exploit unused escape codes in the VT100 escape code set to signal start and end of XML

2. Assume input is always going to be good XML and use this as default mode.

3. Have a "View Source" option like on a browser.

4. Provide an 'xml-quote' program that wraps entire files in #PCDATA declarations, so the user gets to see the source. Philosophically this is the exact same issue that we have today with VT100 escape codes. If you cat a file with escape codes in it, most terminal emulators go bonkers. You need to run these through 'od -c' or similar to 'quote' the file.

Of these options I hate 1. but like 2.3.4
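Option 4 could be prototyped in a few lines. The Python sketch below takes a slightly different route from the wrapping Bill describes: it escapes the markup characters, so that an XML-interpreting terminal would show the source text instead of acting on it. The function name is hypothetical.

```python
from xml.sax.saxutils import escape, unescape


def xml_quote(source_text):
    """Escape &, < and > so the text is displayed, not interpreted."""
    return escape(source_text)


original = '<FILE PATH="/tmp/foo.dat">foo.dat</FILE>'
quoted = xml_quote(original)
print(quoted)

# The quoting is reversible, so nothing is lost on the way through.
print(unescape(quoted) == original)
```

Escaping was chosen over wrapping the file in a single quoted section because it has no terminator that the file's own content could accidentally contain.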

9.3.3 Vaughn

This doesn't sound like a bad option to me, but I could be convinced otherwise.

If you are using a text editor, the control characters are printed in some special way so they wouldn't be interpreted as an XML introducer. If you used grep, grep would have to recognize that it is looking at XML output and output the XML introducer in appropriate places. It would still be the case that if you cat the output of a command, you would get the XML code, but if you used less, you wouldn't since less prints the ESC character specially.

> 2. Assume input is always going to be good XML and use this as default mode.

I think we have to take the same approach that browsers do. Anything that looks like XML but is not understood would be ignored. It seems like in order for XML to be useful it would need to be the default.

> 3. Have a "View Source" option like on a browser.

I think this is good also.

> 4. Provide an 'xml-quote' program that wraps entire files in #PCDATA declarations, so the user gets to see the source. Philosophically this is the exact same issue that we have today with VT100 escape codes. If you cat a file with escape codes in it, most terminal emulators go bonkers. You need to run these through 'od -c' or similar to 'quote' the file.

This wouldn't help if you were editing the file with a terminal based text editor (i.e. vi) from within the LinuXML terminal.

> Of these options I hate 1. but like 2.3.4

I guess I'm going more with 1,2,3.

9.3.4 Bill

Editors could only edit the raw XML (unless they were XML structure editors). They normally have direct access to the raw files so there should be no problem unless the terminal emulator is in 'interpret-XML' mode. Clearly some sequence of events needs to switch off XML interpretation in the terminal emulator. This could be done by:

1. Start a separate window for the edit. (xterm -e vi foo.xml;)

2. alias vi '(echo ESC sequence to switch off XML mode;

vi foo.xml;

echo ESC sequence to switch on XML mode;)'

I don't know the details but I understand that devices are put into raw mode by programs like vi, and they use special escape sequences. Perhaps a combination of these can be used to control whether the terminal goes into XML mode? If the device is in raw mode, this could also imply no XML interpretation?

9.3.5 Vaughn

I suppose it would be sufficient that if the terminal emulator encountered any escape sequences that it switched off XML interpretation until the command completed. Any program like vi would certainly send some sort of escape sequence when it started, while programs that output XML wouldn't.

9.4 Java, Gnome or KDE?

Which would be the best to use for enhanced XML-enabled user interfaces?

9.4.1 paul666@mailandnews.com

I just looked at gnome-terminal. It already supports a couple of different types of drag and drop and it looks pretty clean. I suppose there are several places where the XML interpretation could occur, in libc, in the shell, or in the xterm. I think for a prototype, it would be easiest to do it in the xterm, or more precisely gnome-terminal. Probably more precisely in the zvt widget which is the terminal widget itself.

I've tried KDE as a windowing system, but it eats so much memory that I switched after a couple of weeks. The KDE widget set is QT and requires that all code is written in C++ I believe. It also has a licence that is not universally liked. Gnome on the other hand, is fully GNU, uses the GTK (gimp toolkit) and allows applications to be written in anything. I use windowmaker as a window manager and it is also the official GNU window manager and has GNOME hooks. I prefer C to C++. Gnome also supports drag and drop, a CORBA ORB, and a ton of other features. It is at version 1.0 which I think they're hoping to complete so redhat can put out redhat 6.0 with glibc2.1, linux 2.2 and Gnome 1.0. Redhat are paying for a lot of the cost of Gnome development.

XML parser for Gnome

Documentation is available on-line at http://rufus.w3.org/veillard/XML/xml.html

A mailing-list has been set-up, to subscribe:

echo "subscribe xml" | mail majordomo@rufus.w3.org

The list archive is at:

http://rufus.w3.org/veillard/XML/messages/

http://www.openxml.org/

"March 5, 1999 -- OpenXML.org announces general availability of OpenXML, an open source, pure Java, commercial-grade, fully featured framework for XML-based applications."

Not that I think this is really a job for Java...

http://www.sgmltools.org of course.

http://www.alphaWorks.ibm.com/aw.nsf/home/current - more java stuff mainly

http://www.ltg.ed.ac.uk/software/xml/ - nice set of tools, but perhaps with an unusable licence as the software cannot be redistributed.

I think for production, libxml is nice and small, produces an in memory walkable representation of the XML.

9.5 Namespaces?

What should we do about them? anyone?

9.6 Problems with XML!

XML ain't perfect:

9.6.1 XML or SGML?

Paul Tyson writes: I think you are on to something with this project. I don't think it should be built on XML. The family of ISO SGML standards (including HyTime and DSSSL) have better-developed concepts and more precise constructs for doing the kinds of things you want to do. Here are a few reasons why I believe this:

1. It would be very handy to use the markup minimization features allowed by SGML, which are proscribed by XML. I guess I really mean "markup elimination" by use of the DATATAG and OMITTAG features. This could be used to process standard configuration files, for instance, without changing their format.

2. The addressing and linking mechanisms in HyTime will never be equaled by XLink/XPointer. These HyTime concepts could be fully exploited to allow users to associate whatever information they want to in a variety of different ways to suit their needs, completely independent of the file system structure.

3. The "grove" abstract model of structured information provided by HyTime could be used for linking, exchanging, and transforming disparate resources in a standard manner.

4. DSSSL provides an elegant, simple language for querying, transforming, and applying style characteristics to structured documents (or more precisely, to the groves built therefrom).

5. HyTime includes "activity policy" constructs that I believe would readily map to UNIX file permissions.

The sum of these features (and others) would create an interface in which: 1) the filesystem fades into the background, and what the user sees and works with is more like a collection of meaningful entities; and 2) most if not all operations on data start with SGML parsing, and involve grove and node processing rather than character and line processing.

9.6.2 Cees Wesseling

I think XML-schemata are better suited for this:

http://www.w3.org/TR/NOTE-xml-schema-req

It is far more flexible and extendible, while still being a simple XML application, and can represent atomic data types.

Although still sketchy in this note, I know a lot of people are working on it. Some web searching might be useful.

I do have some remarks on Schemata from a presentation given by Henry Thompson (http://www.ltg.ed.ac.uk/~ht/). Alas only in PowerPoint, but if you want it...

cheers,

9.7 User-defined XML Tags not in the DTD?

How would terminal emulators display/cope with non-DTD tags?



Bill Birch, Tuesday, June 08, 1999
