What is DocBook?

It’s an XML-based NON-PRESENTATION format for specifying document content. The DocBook standard specifies tags for chapters, tables of contents, examples, procedures, and a whole lot more. It's a big toolbox of XML tags with a pre-defined structure and layout (i.e., there are rules behind the madness). It's aimed at writing books and articles, mainly for technical purposes (specifically anything about computer code).

What does non-presentation mean?

Non-presentation means there's no formatting. Think of Word or OpenOffice Writer. You can't just type words, you have to worry about how it's going to look. Put a bullet point here, indent, boldface, italics, heading1, etc etc. So while you're trying to write content you have to stop and move something or indent it or decide which font to use. It's annoying.
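
To make that concrete, here is a small sketch of what non-presentation markup looks like. The section title and the steps are made up for illustration; the tags themselves (section, procedure, step, note) are standard DocBook. Nothing in it says anything about fonts, indents, or bullets:

```xml
<section>
 <title>Resetting the Widget</title>
 <procedure>
  <!-- Each step says WHAT it is, not how it should look -->
  <step><para>Hold the power button for five seconds.</para></step>
  <step><para>Release it when the light blinks.</para></step>
 </procedure>
 <note>
  <para>Whether this note gets a box, an icon, or italics is the stylesheet's problem.</para>
 </note>
</section>
```

How a procedure or a note ends up looking on paper is decided later, in one place, by the stylesheet.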

It's not that annoying

Well, it depends. If you're a writer you have to worry about all of that formatting mess anyhow, so you have to pay attention to it. There are tools in Word (and, I'm sure, OpenOffice) that let you define styles and whatnot, and that makes your job easier. Bottom line: if you're involved in every step of the process, worrying about format isn't that bad. However, if you're a technical type like me, formatting is the bane of your existence. It has nothing to do with the content of what you're writing, and I'm not paid to be a Word expert. Peer reviews turn into one big complaint about formatting, and that wastes time. DocBook fixes that. The formatting is controlled at a single point which everyone uses. Formatting complaints? One person is probably assigned to keep track of the style sheets that control formatting, so talk to them. Talk to me about the content.

That's not really a panacea

You're right: docbook alone is good but not astounding. You'll still have style issues like should this be an ordered list or an unordered, what tags are we using for this, etc etc. It will just push the majority of the minor work to one person instead of spreading it around. Good but not amazing.
What IS amazing is that DocBook can save us even more work. If you assume that one DocBook file = one document then you only see the gains listed above. But DocBook can be modularized. If you have content that is repeated across several different documents, it can be put in its own XML file and included in each document that uses it. Repeated content comes up a lot and wastes time, especially when the information changes often. Those changes have to get to the people that care about them, and this is one way to do it.
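
As a rough sketch of what that looks like, here is a book that pulls a shared chunk in with XInclude. The file name shared/safety_notice.xml and the titles are hypothetical, and this assumes your toolchain has XInclude processing turned on (not every processor does it by default):

```xml
<?xml version="1.0"?>
<book>
 <title>Widget User Guide</title>
 <chapter>
  <title>Before You Start</title>
  <!-- Hypothetical shared file; it holds one complete DocBook element
       (a section, for example) that several books reuse -->
  <xi:include xmlns:xi="http://www.w3.org/2001/XInclude"
              href="shared/safety_notice.xml"/>
 </chapter>
</book>
```

Change the shared file once and every book that includes it picks up the change the next time it is built. One catch: the DocBook 4.x DTD does not know the xi:include element, so if you validate against the DTD, do it after the includes have been resolved.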

Let's get even MORE technical. XML is text, which makes finding differences between versions easy. That means that it lends itself towards version control. Version control can remove the mess of figuring out which version of the file you have hanging around the shared drive is the newest. And it can also be integrated with a publishing method, ie, put it on a webserver. Your XML files can include content directly from the web server, and if version control is used, it will always be the latest version. It will always be correct.
Malfunctioning Eddie is crazy. He has this crazy idea that you could also import the revision history from the version control system and automatically include it in your document. Malfunctioning Eddie is crazy. It's not like DocBook has built-in revision tags or anything.
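
For reference, this is roughly what those built-in revision tags look like. The revision numbers, dates, initials, and remarks below are placeholders, not real history; a script reading the version control log would be the thing that fills them in:

```xml
<bookinfo>
 <revhistory>
  <!-- One revision element per entry in the (hypothetical) version control log -->
  <revision>
   <revnumber>1.1</revnumber>
   <date>2008-02-01</date>
   <authorinitials>AB</authorinitials>
   <revremark>Placeholder: corrected the installation steps.</revremark>
  </revision>
  <revision>
   <revnumber>1.0</revnumber>
   <date>2008-01-15</date>
   <authorinitials>AB</authorinitials>
   <revremark>Placeholder: first release.</revremark>
  </revision>
 </revhistory>
</bookinfo>
```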

Another benefit is that XML can store any data in any form that can be defined by tags with attributes. That's a lot. Nearly infinite. What's better is that XSL can transform those tags into any other XML or HTML tags (and then from there to PDF, etc). So if you have a list of requirements, you can turn it into paragraphs of text or a table with columns and then include those in different documents. You can select only certain types of records, only certain information and transform it any way you want. Write the requirements ONCE and include it in ten different documents ten different ways.
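
Here is a minimal sketch of that idea as an XSLT 1.0 stylesheet. The input format (a requirements element holding requirement elements with id and type attributes) is something I made up for illustration, as is the type parameter; only the XSLT instructions and the DocBook output tags are standard:

```xml
<?xml version="1.0"?>
<!-- Hypothetical input, requirements.xml:
     <requirements>
       <requirement id="REQ-1" type="safety">The unit shall shut down when overheated.</requirement>
       <requirement id="REQ-2" type="cosmetic">The unit shall be blue.</requirement>
     </requirements>
-->
<xsl:stylesheet version="1.0"
                xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 <xsl:output method="xml" indent="yes"/>

 <!-- Which kind of requirement to pull out; can be overridden at run time -->
 <xsl:param name="type" select="'safety'"/>

 <!-- Turn the matching requirements into a DocBook variablelist -->
 <xsl:template match="/requirements">
  <variablelist>
   <title>Requirements</title>
   <xsl:for-each select="requirement[@type = $type]">
    <varlistentry>
     <term><xsl:value-of select="@id"/></term>
     <listitem>
      <para><xsl:value-of select="."/></para>
     </listitem>
    </varlistentry>
   </xsl:for-each>
  </variablelist>
 </xsl:template>
</xsl:stylesheet>
```

A second stylesheet over the same input could emit a table or plain paragraphs instead, which is the "write once, include ten different ways" part.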

Want me to go further? Ok, pipe dream time. We have all kinds of systems for tracking data. We have databases, Word documents, text files, web links, you name it. Take DOORS, for instance. It stores requirements in a database. Good. Central point for all data. Say you edit that database and change something. You want those changes included everywhere they're pertinent. With a bit of server-side scripting magic, every time you change something in the DOORS database you can have it create a new XML document with the information automatically. Then people can include that information in their documents.
You can do this for ANY type of data, whether it's in a text file or a database. Perl, Python, PHP: they can all be used to retrieve information from a database and distill it into an XML file. You could even create a web-based interface to edit the data in the database, and then automatically create the new XML file after everything was validated. If you want to get SUPER crazy you can create an approval process to verify the changes made... at this point I'm insane.

The possibilities of an integrated, versioned and controlled data pipeline are nearly infinite.

Ok it is pretty cool


Yeah.

Toolchain


You need these files if you want to regenerate what I've done

Wait, where did DocMan go?


To those of you who have been keeping up with this page (who ARE you, anyway?): you may have noticed that I've sacked DocMan for eDe. There is a reason for this. DocMan provided a graphical interface and PDF output, which attracted me. However, it produces NO ERROR OUTPUT! It would merely crash if something wasn't perfect (something as simple as a misspelled file name in an import statement). Also, it was not scriptable, it was rather finicky, and it had no help. I'm not sure that it's still actively supported.

I had found eDe before but didn't like it because it produced odd-looking PDF output and had what I thought at the time was a ridiculous directory structure. Let me describe: say that you have a DocBook file called, oh I dunno, example.xml. You would expect that with eDe's built-in batch files you could say 'docbook_pdf example.xml' and get a PDF in the same directory. Not true. By default eDe has a single directory, C:\docbook\repository, where you have to put all of your DocBook files, each in its own subdirectory. So the example.xml file had to go to C:\docbook\repository\example\example.xml, and the output would go to C:\docbook\output\example\pdf\example.pdf. I didn't like that.

However, if you muck around with the batch files a bit you can make it more sensible. By removing a couple of lines you can allow DocBook files to be converted from any directory. And since eDe automatically adds itself to the PATH, it's easy to run from the command line. So my file can go in <anywhere>\example\example.xml and the output will go to <anywhere>\example\pdf\example.pdf. I like that better. Plus, with the NppExec plugin I can run the whole process from Notepad++ with no Alt-Tabbing to DocMan, and I have set it up to open the PDF automatically when it's done. I can also see the errors in the console window, so no more guessing!

eDe is also a lot more organized than DocMan was. It has custom style XSL files for each type of document (book, article, etc). And within these files they have COMMENTS! Yes! Comments AND allowed options for each parameter! My God it makes SENSE! That does make things a lot easier. Plus they have custom files where you can put your own options. It's very good and hopefully it will make things easier.

Installation Steps



  1. Install Notepad++
  2. Unzip the External Libraries into Notepad++'s Program Files directory
  3. Unzip the XML Plugin into the Notepad++ Plugin directory
  4. Unzip the NppExec Plugin into the Notepad++ Plugin directory
  5. Install eDe
  6. You have to edit the eDe batch files to make things easier on yourself. They are located in C:\docbook\bat. The first one is docbook_configuration.bat. In it there are two lines: set docbook_document_repository and set docbook_document_output. Type 'rem' before each of the lines:
 rem set docbook_document_repository = C:\docbook\repository
 rem set docbook_document_output = C:\docbook\output
  7. Add the path for your PDF viewer to the set docbook_pdf_viewer line like so:
 set docbook_pdf_viewer="C:\Program Files\Adobe\Acrobat 7.0\Reader\Reader\acrord32.exe"
  8. I like to view any errors in the conversion process and have Adobe automatically open the new PDF after it's created. To do that, add the following lines to the docbook_pdf.bat file right before the :end label:
 echo eDE: %1.pdf created
 
 rem NEW LINES START HERE
 pause
 .\pdf\%1.pdf
 rem NEW LINES END HERE
 
 :end
  9. Now you can add your parameter and attribute definitions to the proper stylesheet. For eDe this is located in C:\docbook\stylesheet\custom_book_fo.xsl (of course, we're making a book in FO/PDF, so change the stylesheet if you're interested in something else). The current set of parameters and options that I use is here:
 <!--Custom titlepage-->
 <xsl:import href="titlepage_custom_fo_table.xsl" />
 
 <!--These fix the problem where the titlepage was not centered-->
 <xsl:param name="title.margin.left">0pt</xsl:param>
 <xsl:param name="double.sided">0</xsl:param>
 
 <xsl:param name="admon.graphics" select="1"/>
 <xsl:param name="admon.graphics.extension" select="'.gif'"/>
 <xsl:param name="fop.extensions" select="1"/>
 <xsl:param name="admon.graphics.path">resources/</xsl:param>
 
 <xsl:param name="callout.graphics.path" select="'resources/'"/>
 <xsl:param name="callout.graphics.extension" select="'.gif'"/>
 <xsl:param name="callout.graphics.number.limit" select="'15'"/>
 
 <xsl:param name="insert.xref.page.number" select="0"/>
 
 <xsl:param name="paper.type" select="'A4'"/>
 
 <xsl:param name="use.extensions" select="'1'"/>
 <xsl:param name="chapter.autolabel" select="'0'"/>
 <xsl:param name="section.autolabel" select="'1'"/>
 <xsl:param name="segmentedlist.as.table" select="1"/>
 <xsl:param name="toc.section.depth">5</xsl:param>
 
 
 
 <xsl:attribute-set name="section.level1.properties">
 <xsl:attribute name="break-before">page</xsl:attribute>
 </xsl:attribute-set>
 
  10. We're not done yet! Copy the titlepage_custom_fo_table.xsl file into the C:\docbook\stylesheet directory. I think in the future I'll just include this from somewhere.
  11. Still not done. Now we have to add the custom execution command to Notepad++. Open Notepad++ and go to Plugins->NppExec->Execute and paste this code:
 npp_run docbook_pdf $(NAME_PART) .\pdf
And save it as 'Create PDF from DocBook' or whatever you like. Then cancel out.
  12. That's it, there is no 12.

Example

  1. Open Notepad++ and copy the example below into a new file:
<?xml version="1.0"?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.2//EN"
"http://www.oasis-open.org/docbook/xml/4.2/docbookx.dtd" >
 
 
<book><title>Mary and the Lamb</title><subtitle>A Sordid Affair</subtitle>
 <bookinfo>
 <author>
 <firstname>Stephen</firstname>
 <surname>Friederichs</surname>
 </author>
 </bookinfo>
 <chapter><title>The Tale of Mary</title>
 <para>
 There are not many stories more frightening than the story of Mary and her lamb.
 Many people ask: how is such a thing possible? Where did Mary and the lamb go wrong?
 To answer these questions we must take an in-depth look at Mary herself.
 </para>
 <para>
 Mary - what does the name imply? A virgin perhaps? Not likely.
 Before her role in the lamb scandal Mary was arrested three times for
 prostitution. She's had seedy underworld connections her entire life.
 Her uncle was none other than three-time world champion mafioso Homer Euclid.
 Her father sold wine to the Deigos, her mother made synthetic gin, and her
 sister was a Gamma Phi Beta. My God, how the money rolled in!
 </para>
 </chapter>
 <chapter><title>The Story of the Lamb</title>
 <para>
 Taking a look at the lamb we see that it is an innocent in this whole affair.
 Barely six months old and newly out of graduate school with a degree in
 psychology, the lamb was optimistic and trusting. After meeting Mary and
 getting caught up in the glamor of her life, the lamb was too well entrenched to
 go anywhere else.
 </para>
 <para>
 The lamb's descent was slow and gradual, but with Mary at the helm, certainly assured.
 The drugs, sex, drinking, shearing and partying seemed like they would go on forever.
 With the money steadily flowing nothing was interrupting the party. It seemed
 that the lamb had found its ticket to the good life.
 </para>
 </chapter>
 <chapter><title>The Incident</title>
 <para>
 We're all fully aware of the incident. It pains me to even think about it. Oh God, I can't!
 *sobs*
 </para>
 
 </chapter>
</book>
 
 
 
 
  2. Save the file as ACA_TDD.xml
  3. Go to Plugins->NppExec->Execute and run the 'Create PDF from DocBook' command that you set up earlier. It will make the PDF and display it.

Next Steps/Links

Here's a good How-To page. Lots of great explanations: DocBook How-To
This page shows you how to start up on windows: Windows How-To
Amazing freakin' tutorial website for all things XML: http://xmlzoo.net/
This is a blog from someone who used DocBook, with some detail on how they did it: http://docbook.theblog.ca/
DocBook reference: http://www.docbook.org/tdg/en/html/docbook.html

Docbook Customization

Parameters

DocBook has several options (also called parameters) built in that affect the look of the resulting PDF. They are stored in an XSL file underneath the DocMan folder. Below are the list of parameters, the location of the file, and the settings I have added.
List of parameters for FO transform of Docbook
http://docbook.sourceforge.net/release/xsl/current/doc/fo/
Or to just look at the file on your local computer: DocMan/docbook/docbook-xsl/fo/param.xsl
The big-bad guide to docbook XSL customization is here: http://www.sagehill.net/docbookxsl/
The file on the local hard drive where you can add these options:
C:\Program Files\DocMan\docbook\docbook-xsl\fo\fo.xsl
Alternately you can use the db-xsl-focfg program: http://sourceforge.net/project/showfiles.php?group_id=119513&package_id=130191&release_id=611149
There are several different versions on that page, you will want the FO version. I assume it will generate an XSL file in the end that you can import in fo.xsl, but
Options I prefer to set to emulate engineering document styles:
In fo.xsl:
 <!--Custom titlepage-->
 <xsl:import href="titlepage_custom_fo_table.xsl" />
 
 <!--These fix the problem where the titlepage was not centered-->
 <xsl:param name="title.margin.left">0pt</xsl:param>
 <xsl:param name="double.sided">0</xsl:param>
 
 <xsl:param name="admon.graphics" select="1"/>
 <xsl:param name="admon.graphics.extension" select="'.gif'"/>
 <xsl:param name="fop.extensions" select="1"/>
 <xsl:param name="admon.graphics.path">resources/</xsl:param>
 
 <xsl:param name="callout.graphics.path" select="'resources/'"/>
 <xsl:param name="callout.graphics.extension" select="'.gif'"/>
 <xsl:param name="callout.graphics.number.limit" select="'15'"/>
 
 <xsl:param name="insert.xref.page.number" select="0"/>
 
 <xsl:param name="paper.type" select="'A4'"/>
 
 <xsl:param name="use.extensions" select="'1'"/>
 <xsl:param name="chapter.autolabel" select="'0'"/>
 <xsl:param name="section.autolabel" select="'1'"/>
 <xsl:param name="segmentedlist.as.table" select="1"/>
 <xsl:param name="toc.section.depth">5</xsl:param>
 
 
 
 <xsl:attribute-set name="section.level1.properties">
 <xsl:attribute name="break-before">page</xsl:attribute>
 </xsl:attribute-set>
 
 
In my titlepage_custom.xsl:
 <!--This fixes the issue where text in the titlepage was being wrapped around the line with a hyphen-->
 <xsl:attribute-set name="book.titlepage.recto.style">
 <xsl:attribute name="hyphenate">false</xsl:attribute>
 </xsl:attribute-set>

Other links:

http://www.dpawson.co.uk/docbook/styling/fo.html
http://www.dpawson.co.uk/docbook/styling/custom.html
http://www.sagehill.net/docbookxsl/CustomizingPart.html

Attribute Sets

In the options code above you can see something called an attribute-set. Attribute sets are named collections of formatting attributes that are saved for wide use. Above, the section.level1.properties attribute set includes several attributes, and the one I change inserts a page break before every major section. There are TONS of these attribute sets, and some of them just link directly to other attribute sets which are stored in entirely different files and are usually hidden. Luckily, if you can find the name of the attribute set you want to work with and you know what attributes you need to set, you can just throw them in the fo.xsl file under DocMan/xsl and they will be added to that attribute set. However, knowing the attributes you want to set is not always easy, and finding out which attribute set you want to work with is only slightly easier. Good luck with this one!
Attribute Sets Guide in the DocBook XSL Guide - http://www.sagehill.net/docbookxsl/AttributeSets.html
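
As a small illustration of adding to an attribute set, the sketch below changes how top-level section titles look. section.title.level1.properties is one of the attribute sets defined in the stock FO param.xsl; the font size and color are arbitrary values I picked for the example, not settings I actually use:

```xml
<!-- Goes in your customization file (fo.xsl here), after the imports -->
<xsl:attribute-set name="section.title.level1.properties">
 <!-- Arbitrary example values; only these two attributes are overridden,
      everything else in the set keeps its stock value -->
 <xsl:attribute name="font-size">18pt</xsl:attribute>
 <xsl:attribute name="color">#333333</xsl:attribute>
</xsl:attribute-set>
```

Because attribute sets with the same name are merged, you only list the attributes you want to change and leave the stock files alone.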

eDe makes it easier to handle all of this configuration than DocMan did. Several important files:
  • C:\docbook\stylesheet\e-novative_book.xsl - Several standard options for font, column count, alignment, etc
  • C:\docbook\stylesheet\e-novative_book_fo.xsl - FO specific options for books
  • C:\docbook\xsl\fo\pagesetup.xsl - Margins and such, but DON'T CHANGE IT! Find what you want then copy and paste to your custom.xsl file
  • C:\docbook\xsl\fo\param.xsl - Many parameters, but again, DON'T CHANGE IT! Find what you want then copy and paste to your custom.xsl file

Custom Titlepages


There are two parts to creating a custom titlepage. First, you can specify what elements (such as corpauthor, revision history, etc) should appear on the page, and then you specify their formatting, or how they should look.

Custom Titlepage Elements/Order of Elements


You can add or remove elements that are in the *info block (i.e., bookinfo, articleinfo) from the titlepage through this process:

First you need a template. Below is an example of a custom titlepage format template file for PDF output:
 <t:templates xmlns:t="http://nwalsh.com/docbook/xsl/template/1.0">
 <t:titlepage t:element="article" t:wrapper="fo:block" class="titlepage">
 <t:titlepage-content t:side="recto">
 <title/>
 <subtitle/>
 <corpauthor/>
 <authorgroup/>
 <author/>
 <othercredit/>
 <releaseinfo/>
 <copyright/>
 <legalnotice/>
 <pubdate/>
 <revision/>
 <revhistory/>
 <abstract/>
 </t:titlepage-content>
 
 <t:titlepage-content t:side="verso">
 </t:titlepage-content>
 
 <t:titlepage-separator>
 </t:titlepage-separator>
 
 <t:titlepage-before t:side="recto">
 </t:titlepage-before>
 
 <t:titlepage-before t:side="verso">
 </t:titlepage-before>
 </t:titlepage>
 
 </t:templates>
 
 
This file determines what information goes on the titlepage.
There are several parts to a titlepage:
  • before-recto
  • recto
  • before-verso
  • verso
  • side something?
  • titlepage-separator - content placed between the recto and verso sides

recto is the front, verso is what's on the back of it. This titlepage layout assumes that you'll be printing on the front and back of pages.

Important parts of this file:
  • t:templates/t:base-stylesheet - this file is imported into the resulting titlepage stylesheet before all of the custom options - REMOVE THIS! Leaving it in caused DocMan to fail. If you include the generated titlepage XSL file as described below then this process works just fine
  • t:titlepage/t:element - this attribute tells us what type of document (article, book, etc) this titlepage is for
  • t:titlepage/t:wrapper - this attribute defines the default object that information is 'wrapped' in.

Make sure you have the t: before the attributes in the t:titlepage and t:templates, otherwise NOTHING WORKS when you transform the template file.

Next, once you have that nice template file, you transform it with the titlepage.xsl file in: C:\Program Files\DocMan\docbook\docbook-xsl\template\titlepage.xsl

Titlepage Element Format


You can change the way that an element on the titlepage looks and behaves. For instance, you can make the <corpauthor> tag add an image such as a corporate logo using the instructions on this page: http://www.sagehill.net/docbookxsl/TitlePagePrint.html#TitlepageElementTemplates

Here's the code they use to work the magic:
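 <!-- Assumes the fo: prefix is declared on xsl:stylesheet and that
      my.corporate.logo is defined as an xsl:param elsewhere -->
 <xsl:template match="corpauthor" mode="book.titlepage.recto.mode">
 <fo:external-graphic>
 <!-- XSLT 1.0 has no select attribute on xsl:attribute; use xsl:value-of -->
 <xsl:attribute name="src"><xsl:value-of select="$my.corporate.logo"/></xsl:attribute>
 </fo:external-graphic>
 <fo:inline color="blue">
 <xsl:apply-templates mode="titlepage.mode"/>
 </fo:inline>
 </xsl:template>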
As per this webpage: http://www.dpawson.co.uk/docbook/styling/titlepage.html

Using the New Titlepage


To use the new formatting you've created you have to add custom titlepage information to the fo.xsl file (located in DocMan/xsl/fo.xsl) after the initial includes:
 <?xml version='1.0'?>
 <xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 xmlns:fo="http://www.w3.org/1999/XSL/Format"
 xmlns:doc="http://nwalsh.com/xsl/documentation/1.0"
 exclude-result-prefixes="doc"
 version='1.0'>
 
 <!-- Profiling does not work -->
 <xsl:import href="http://docbook.sourceforge.net/release/xsl/current/fo/docbook.xsl"/>
 <xsl:import href="focode.xsl"/>
 
 <!-- INSERT RIGHT HERE!!! -->
 
 <xsl:param name="admon.graphics" select="1"/>
 <xsl:param name="admon.graphics.extension" select="'.gif'"/>
 <xsl:param name="fop.extensions" select="1"/>
 <xsl:param name="admon.graphics.path">resources/</xsl:param>
 
 <xsl:param name="callout.graphics.path" select="'resources/'"/>
 <xsl:param name="callout.graphics.extension" select="'.gif'"/>
 <xsl:param name="callout.graphics.number.limit" select="'15'"/>
 
 <xsl:param name="insert.xref.page.number" select="1"/>
 
 <xsl:param name="paper.type" select="'A4'"/>
 
 <xsl:param name="use.extensions" select="'1'"/>
 <xsl:param name="chapter.autolabel" select="'0'"/>
 <xsl:param name="section.autolabel" select="'1'"/>
 <xsl:param name="segmentedlist.as.table" select="1"></xsl:param>
 
 <xsl:attribute-set name="section.level1.properties">
 <xsl:attribute name="break-before">page</xsl:attribute>
 </xsl:attribute-set>
 
 
 <!-- ==================================================================== -->
 
 </xsl:stylesheet>
 
Basically, anything that you include at that point will override what was imported earlier. The first import (docbook.xsl) also pulls in all of the other XSL sheets for the FO code. They are also available locally under DocMan/docbook/docbook-xsl/fo
But DON'T CHANGE THEM! We want the baseline DocBook XSL to stay intact; we can just add things on top with includes from the fo.xsl file and replace the existing code. No need to break anything.

Changing the Title Page Layout


This is where it gets complicated, but unfortunately this is the most useful thing you can do with the title page. The most you can do with the XSL generation method described above is to add elements to the title page and change how they look, but NOT change WHERE they are placed on the titlepage. They're almost always just centered and printed right down the middle of the title page. Some things are bolded, made larger, etc, but they're all placed in the same place. To change the placement you need to forgo the above process and write the titlepage.xsl file directly. Here is a link that describes part of the process: http://www.sagehill.net/docbookxsl/TitlePagePrint.html#TitlepageTableLayout

If you were doing an HTML output file then you would lay out the page with <div> elements and such. For a PDF you have to use FO blocks, which I was not familiar with. Here is a site that details FO markup: http://www.w3schools.com/xslfo/xslfo_blocks.asp

I have had all kinds of trouble getting a custom layout to even pass the conversion process. Most of the time DocMan just closed unexpectedly. After much trying I've found a section of FO markup that does not crash DocMan and renders pretty much what it should:

 <xsl:template name="book.titlepage.recto">
 
 <fo:block text-align="center" xmlns:fo="http://www.w3.org/1999/XSL/Format">
 <fo:table border-width="0.1pt" border-style="solid">
 <fo:table-column column-width="50mm"/>
 <fo:table-column column-width="50mm"/>
 <fo:table-column column-width="50mm"/>
 <fo:table-body>
 <fo:table-row >
 <fo:table-cell border-width="0.1pt" border-style="solid">
 <fo:block>
 ORIGINATOR :
 <xsl:apply-templates mode="book.titlepage.recto.auto.mode" select="bookinfo/author"/>
 </fo:block>
 </fo:table-cell>
 </fo:table-row>
 </fo:table-body>
 </fo:table>
 </fo:block>
 
 </xsl:template>
The complete file can be found here: titlepage_custom_fo_table_working.xsl

Things that need to be updated for engineering use:

The title page - needs to match the format used elsewhere
Need new root tag to be section or cancel some of the behavior of chapter
New tags:
  • Next Assy
  • Used On
  • Application?
  • Checked by
  • Contract No.
  • Drawing No.
  • Approvals of:
      • Engineer
      • Project Engineer
      • System Engineer
      • Quality
      • Component Engineer
  • Cage Code
  • Scale

Header and footer with specific info/format
Automatic acronym table generator with xref link to acronym definition
Automatic referenced document table generator with olink to document
Links that may prove useful:
http://abbeyworkshop.com/howto/xslt/xslt-fop/index.html
http://www.dpawson.co.uk/docbook/styling/titlepage.html