MiniGML - About a 1986 DOS Program and the Roots of HTML

29. December 2001

My wife and I used the Christmas holiday to refurnish our flat, and while moving my paperwork from the old cabinets to the new ones, I discovered some old program listings - stuff I wrote in my early days with IBM, in PL/I and Turbo Pascal. One of these programs is MiniGML, and while looking at the yellowed code printout and the user manual, I was amazed to see that I had written a kind of "browser" long before the World Wide Web was invented. I have scanned the MiniGML user manual and reformatted it in HTML, maintaining the look of the 1980s.

The MiniGML user manual illustrates what was "state of the art" in the mid-1980s, midway between the invention of SGML and Tim Berners-Lee's early WWW proposal, and it made me think about the history of SGML and HTML.

Goldfarb's Invention and IBM's DCF/GML Product

Charles F. Goldfarb, an attorney practicing in Boston, joined IBM in 1967 and started to work on a document markup language project in 1969. At that time, the Graphic Communications Association (GCA) started to promote a new idea - the separation of content from format - and this had a strong impact on Goldfarb's work. Later on, together with some fellow researchers, he invented GML as a means to describe the structure and content of a document. IBM decided that GML had significant product potential, and implemented GML on top of the DCF (Document Composition Facility) Script language.

DCF Script was a text formatting language that used inline commands like .ce to center a piece of text or .sp 3 to add three lines of vertical space. GML tags (yes, they were called tags) were implemented as collections of Script commands, similar to macros. A GML tag like :ol. contained all the Script commands necessary to format an ordered list. While DCF Script commands were used to describe the physical formatting of the printed page, GML tags were used to describe the logical structure of a document. GML relates to DCF Script in the same way HTML relates to CSS: GML and HTML describe the logical structure, while DCF Script and CSS describe the physical representation.
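The two layers can be sketched in a few lines of Python. This is only an illustration, not how DCF was actually built: the heading tag :h1., its expansion into Script commands, the function names and the page width are all assumptions made for the example. Only the meaning of .ce and .sp is taken from the description above.

```python
# Hypothetical macro table: each GML tag expands to a list of Script commands.
# The expansion is invented for illustration; it is not IBM's real definition.
GML_MACROS = {
    ":h1.": [".sp 2", ".ce"],  # heading: add vertical space, then center the title
}


def expand_gml(lines):
    """Logical layer: replace each known GML tag with its Script commands."""
    out = []
    for line in lines:
        for tag, commands in GML_MACROS.items():
            if line.startswith(tag):
                out.extend(commands)
                out.append(line[len(tag):])  # text that followed the tag
                break
        else:
            out.append(line)  # no tag matched: pass the line through
    return out


def format_script(lines, width=40):
    """Physical layer: interpret .sp N (N blank lines) and .ce (center next line)."""
    out = []
    center_next = False
    for line in lines:
        if line.startswith(".sp"):
            parts = line.split()
            # Assumption: a bare .sp means one line of space.
            count = int(parts[1]) if len(parts) > 1 else 1
            out.extend([""] * count)
        elif line.strip() == ".ce":
            center_next = True
        elif center_next:
            out.append(line.center(width).rstrip())
            center_next = False
        else:
            out.append(line)
    return out


source = [":h1.MiniGML", "A small text formatter."]
script = expand_gml(source)             # tags become Script commands
page = format_script(script, width=21)  # Script commands become page layout
print("\n".join(page))
```

The author of the document only ever writes the first form (:h1.MiniGML). Changing the look of every heading means changing one entry in the macro table, which is the same division of labour that HTML and CSS have today.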

Somewhere between 1975 and 1978 (I believe), IBM made GML available as an extension of its mainframe-based DCF product. DCF and GML were implemented as "text compilers" - they took a file with Script commands and GML tags as input, processed it in batch mode, and sent the formatted output to a printing device. DCF/GML later became IBM BookMaster.

IBM used DCF/GML heavily for all internal and external documentation. While the sales folks and the secretaries continued to use DWS machines or 8100-based text processing, the engineers were using DCF/GML. Focusing on the content (what do I say) instead of the format (how do I say it) became natural to us quite quickly.

Goldfarb later contributed to the development of SGML as an ISO standard. He has published many articles on this subject, and he has also made available his view of the history of SGML.

DCF/GML and the IBM Personal Computer

In 1981, IBM announced the Personal Computer and started to use it internally. Many of us in IBM enjoyed the PC and used it as a host terminal and as a personal productivity device. However, we felt that the clumsy text programs available for DOS (EasyWriter, DisplayWrite, WordStar and so on) were a step backward from the content-centric approach we had with GML. In 1985, I started to develop MiniGML. It was a spare-time project; I had no sponsorship from IBM, and I did all of the development work at home, but I relied on IBM's internal bulletin board system (remember RSCS?) to get feedback and advice from senior IBM researchers and developers all over the world.

While I was using PL/I for my host-based development work, I decided to go with Turbo Pascal for MiniGML. My PC had just one 360KB floppy drive and 256KB memory. Turbo Pascal was known for its small footprint and high speed. The MiniGML executable was some 30KB in size! It was implemented as a DOS filter, i.e. it used input and output redirection (piping). MiniGML implemented a subset of the DCF Script commands and GML tags, but the subset was a fairly complete implementation of the "general document" definition of GML.
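For readers who never worked with DOS filters, here is a minimal sketch of the pattern in Python. It is not the MiniGML code (which was Turbo Pascal); the function name is made up, and the only command it understands is .sp N, which it expands into N blank lines. In-memory streams stand in for the redirected input and output so that the example can be run as it is.

```python
import io


def run_filter(instream, outstream):
    """Read text line by line, expand `.sp N` into N blank lines, copy the rest."""
    for line in instream:
        text = line.rstrip("\n")
        if text.startswith(".sp "):
            outstream.write("\n" * int(text.split()[1]))
        else:
            outstream.write(text + "\n")


# A real filter would be called with sys.stdin and sys.stdout here;
# StringIO objects are used so the sketch runs without redirection.
demo_in = io.StringIO("Title\n.sp 2\nBody\n")
demo_out = io.StringIO()
run_filter(demo_in, demo_out)
print(demo_out.getvalue(), end="")
```

Because the program only knows about an input stream and an output stream, the shell decides where the text comes from and where it goes: a file, the printer, or another program in a pipe.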

MiniGML never became a true IBM product, but it was used by hundreds of IBM engineers and researchers who were used to the mainframe-based GML and wanted to have something similar on the PC. There were other GML implementations for PCs, both within IBM and commercially (but not from IBM), but we GMLers failed to explain the basic concept - the separation of structure from format - to the growing PC community. In 1988, I left IBM, and I no longer maintained MiniGML. In my new jobs, I had to deal with commercial PC text applications like Lotus Manuscript and IBM DisplayWrite. While I always felt that these products were clumsy, I forgot about the beauty of GML until the early 1990s.

DCF/GML, CERN and HTML

In 1991, Tim Berners-Lee, a CERN employee, published his first proposal for a hypertext system he called "World Wide Web". Berners-Lee described a tag language that looked surprisingly similar to IBM's DCF/GML:

DCF/GML                           HTML
:ol.                              <ol>
:li.Ordered list item 1           <li>Ordered list item 1
:ul.                              <ul>
:li.Nested unordered list item    <li>Nested unordered list item
:li.Nested unordered list item    <li>Nested unordered list item
:eul.                             </ul>
:li.Ordered list item 2           <li>Ordered list item 2
:eol.                             </ol>
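The correspondence in the comparison above is regular enough to be written down as a lookup table. The following Python sketch covers only the five list tags shown there; the function name is made up, and real GML has many more tags and attributes that would not translate this mechanically.

```python
# Tag pairs taken directly from the DCF/GML and HTML comparison.
GML_TO_HTML = {
    ":ol.": "<ol>",
    ":eol.": "</ol>",
    ":ul.": "<ul>",
    ":eul.": "</ul>",
    ":li.": "<li>",
}


def gml_to_html(lines):
    """Translate a leading GML list tag on each line to its HTML counterpart."""
    out = []
    for line in lines:
        for gml, html in GML_TO_HTML.items():
            if line.startswith(gml):
                out.append(html + line[len(gml):])
                break
        else:
            out.append(line)  # unknown or untagged line: leave unchanged
    return out


gml_source = [
    ":ol.",
    ":li.Ordered list item 1",
    ":ul.",
    ":li.Nested unordered list item",
    ":li.Nested unordered list item",
    ":eul.",
    ":li.Ordered list item 2",
    ":eol.",
]
html_lines = gml_to_html(gml_source)
print("\n".join(html_lines))
```

The translation needs no knowledge of fonts, indentation or page size, because neither notation says anything about them. Both only state that something is a list and where its items begin.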

At that time, CERN was using DCF/GML for many major documentation tasks. Tim Berners-Lee was a hypertext guru and a DCF/GML user, so the similarities between GML and HTML are not just accidental.

Conclusions

History repeats itself... Ironically, HTML started as a content tagging language, but was perverted into a formatting language as it became increasingly popular. Dave Raggett's 1993 HTML+ specification, for example, did not contain a <font> tag. But as the Web became popular (and commercial), presentation became more important. In the years between 1995 and 1999, the original concept of separating structure from format was nearly lost. As support for CSS grows in today's browsers, the concept is coming back to life, and Goldfarb's vision is finally coming true, 30 years after he noted it in 1971:

The principle of separating document description from application function makes it possible to describe the attributes common to all documents of the same type. ... [The] availability of such 'type descriptions' could add new function to the text processing system. Programs could supply markup for an incomplete document, or interactively prompt a user in the entry of a document by displaying the markup. A generalized markup language then, would permit full information about a document to be preserved, regardless of the way the document is used or represented.