Doc Infusion: 2014

Wednesday 30 July 2014

Generating PDFs with dblatex on Fedora

Recently, I experimented with converting DocBook to PDF using dblatex and this post summarizes my experience of installing and running dblatex on Fedora 20. But first I want to consider the motivation for doing this: why should anyone be interested in dblatex in the first place?

If you write your documentation in DocBook and you want to produce PDF output of a professional standard, there are a limited number of options available to you. For example:

FO processors (open source or commercial)
HTML-to-PDF converters (open source or commercial)
DocBook-to-LaTeX converters (open source)

FO Processors

In the FO (Formatting Objects) approach, DocBook is converted to an intermediate format, an XML FO file, which is then converted to PDF. This approach was intended to be the canonical tool chain for publishing DocBook as PDF (and is fully supported by the standard DocBook XSL toolkit). Unfortunately, the FO approach seems to have run into difficulties in recent years. The FO processor is a complex tool (for example, see Apache FOP), requiring expert knowledge to fix bugs, add new features, or even just to tweak an FO template. So much so that people are starting to turn away from FO and look for alternatives.

HTML-to-PDF Converters

The HTML-to-PDF converters offer a much simpler approach. These are available both in open source (for example, wkhtmltopdf) and commercial (for example, Prince XML) varieties. In this case, you convert DocBook to html-single format and then use the HTML-to-PDF converter to produce the final PDF output. The output that you get looks a bit like what you would get if you printed an HTML page from a browser. In fact, this is no coincidence, because the wkhtmltopdf tool is based on WebKit, which is the engine that the Safari browser uses to print HTML pages. This approach is simple, but offers limited options for fine-tuning the page layout.

DocBook-to-LaTeX

Finally, there is the DocBook-to-LaTeX approach. LaTeX occupies something of a publishing niche, since it is mainly used for publishing in academia. But it has some notable strengths:

LaTeX has strong page layout capabilities. The requisite options are not necessarily easy to tweak, but they are there if you need them.
LaTeX has strong localisation support, through the Babel package. In particular, it includes support for right-to-left languages, such as Hebrew and Arabic, in the ArabTeX package.
LaTeX has a long history and a large user base, which suggests that this tool is not going to disappear anytime soon.

And it is also noteworthy (although not really relevant to documentation in the software industry) that LaTeX is superlatively good at typesetting mathematical equations. Really, nothing else comes close.

Install and Run dblatex

It is straightforward enough to install and run dblatex, but there are a few gotchas to watch out for. If you are lucky enough to have a Fedora system, installing dblatex is easy with yum:

sudo yum install dblatex

Now, if you have a straightforward DocBook book file, MyBook.xml, you can convert it to PDF with this simple command:

dblatex MyBook.xml

The output you get is a PDF file, MyBook.pdf (the intermediate files are automatically deleted at the end of the processing). However, converting my sample book (the Camel Development Guide) was a bit more complicated, because our documents use olink elements for cross-referencing between books. If you are familiar with the DocBook XSL toolkit, you will know that this means you need to generate link database files and a site layout file (site.xml). Assuming you already have these files (they can be generated using DocBook XSL), you need to specify the following properties to the dblatex command: target.database.document, which should be set to the absolute location of the site.xml file; and current.docid, which is the value of your book element's xml:id attribute. So, for the Camel Development Guide, I needed to enter the following dblatex command:

dblatex -P target.database.document=`pwd`/site.xml -P current.docid=CamelDev camel_dev/camel_dev.xml

(Note the use of `pwd` to get the absolute path to site.xml. This should not really be necessary; it's a minor bug in the dblatex tool.) So, what happened after running this command? Well, dblatex converted the 449-page document without a hitch, which is a good start. On the other hand, what you get initially is the 'standard' LaTeX document look-and-feel (if you have ever seen the output of a typical LaTeX document, you'll know what I mean). So, the next question that presents itself is: how can I customize the layout and styling so that it has a more professional look-and-feel? I don't know the answer to that question yet, but it might well be the subject of a later blog post.
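If you run the olink-enabled conversion often, the command above can be assembled in a small script instead of being typed by hand. The following is only a sketch: the helper name build_dblatex_command is hypothetical, and the only dblatex features it relies on are the -P options already shown in this post. It resolves site.xml to an absolute path in Python, so the `pwd` workaround is not needed in the shell.

```python
import os
import shlex


def build_dblatex_command(book_xml, site_xml, docid):
    """Return the dblatex argument list for a book that uses olinks.

    Hypothetical helper: it only reproduces the two -P parameters
    used in the post (target.database.document and current.docid).
    """
    # Resolve site.xml to an absolute path, which is what `pwd` did
    # in the shell version of the command.
    site_abs = os.path.abspath(site_xml)
    return [
        "dblatex",
        "-P", f"target.database.document={site_abs}",
        "-P", f"current.docid={docid}",
        book_xml,
    ]


cmd = build_dblatex_command("camel_dev/camel_dev.xml", "site.xml", "CamelDev")
# Print the command in a form that can be pasted into a shell.
print(shlex.join(cmd))
# To actually run the conversion, pass the list to subprocess.run(cmd, check=True).
```

Building the command as a list (rather than a single string) avoids quoting problems if the path to site.xml contains spaces.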

Friday 6 June 2014

GitCCMS Workflow for Updating Content Specs

In Pressgang, a content spec is an outline (like a table of contents) that assembles topics into a complete document. You can think of it as being analogous to a DITA map. As mentioned in a previous post, the GitCCMS utility can generate the content spec for you, if you enter the following command:

$ gitccms cs

This puts the generated content spec file into the location cspec/<BookName>/contentspec.spec. You can then upload the new content spec using the standard Pressgang utility, csprocessor, as follows:

$ cd cspec/<BookName>
$ csprocessor create contentspec.cspec
...
Content Specification ID: 22781
Revision: 664539
...

If the content specification is successfully created, the new content spec ID is logged to the console (in this case, 22781).
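If you want to capture the new ID without copying it from the console by hand, something like the following sketch could work. It assumes that the csprocessor output contains a line of exactly the form shown in the excerpt above ("Content Specification ID: <number>"); the real output may differ between versions, and the function name parse_content_spec_id is hypothetical.

```python
import re


def parse_content_spec_id(console_output):
    """Return the content spec ID found in the console output, or None.

    Assumes a line of the form 'Content Specification ID: 22781',
    as in the excerpt shown in this post.
    """
    match = re.search(
        r"^Content Specification ID:\s*(\d+)\s*$",
        console_output,
        re.MULTILINE,
    )
    return int(match.group(1)) if match else None


# Sample text copied from the excerpt above (elided lines kept as '...').
sample = """...
Content Specification ID: 22781
Revision: 664539
..."""

print(parse_content_spec_id(sample))
```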

But what if you make some changes to the structure of the book in your git repository and you need to update the content spec? How do you go about that using GitCCMS? First of all, you need to store the new content spec ID in your .git-ccms configuration file. Using your favourite text editor, open the .git-ccms file and add a cspec attribute to the relevant book element. For example, if you had just created the content spec for the book, esb_migration.xml, edit the configuration file as follows:

<context>
    <books>
        ...
        <book file="esb_migration/esb_migration.xml" cspec="22781"/>
        ...

And don't forget to commit that change:

git add .git-ccms
git commit -m "Added cspec ID for esb_migration book"
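The edit to .git-ccms described above can also be scripted. The sketch below uses Python's standard xml.etree.ElementTree module to add the cspec attribute to a matching book element; the function name set_cspec_id and the sample configuration are hypothetical, and this is not part of GitCCMS itself. Note that ElementTree does not preserve the XML declaration, comments, or the original formatting exactly, so for a small file like this, editing by hand is just as reasonable.

```python
import xml.etree.ElementTree as ET


def set_cspec_id(config_xml, book_file, cspec_id):
    """Return config_xml with cspec=<cspec_id> set on the matching book.

    Hypothetical helper: matches the book element by its 'file'
    attribute and raises ValueError if no such book is listed.
    """
    root = ET.fromstring(config_xml)
    for book in root.iter("book"):
        if book.get("file") == book_file:
            book.set("cspec", str(cspec_id))
            return ET.tostring(root, encoding="unicode")
    raise ValueError(f"no book element for {book_file!r}")


# Minimal stand-in for a .git-ccms file (illustrative only).
sample = """<context>
    <books>
        <book file="esb_migration/esb_migration.xml"/>
    </books>
</context>"""

updated = set_cspec_id(sample, "esb_migration/esb_migration.xml", 22781)
print(updated)
```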

Now you are ready to regenerate the content spec using GitCCMS, as follows:

$ gitccms cs

If you take a peek at the regenerated content spec file, cspec/<BookName>/contentspec.spec, you can see that the ID is now included:

ID = 22781
# Book_Info.xml content
Title = Migration Guide
Subtitle = Migrating to Red Hat JBoss Fuse 6.1
...
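Before pushing, you might want to check programmatically that the ID really did make it into the regenerated file. The following sketch reads the "Key = Value" lines at the top of a content spec and skips comment lines that start with #. It is based only on the excerpt shown above, not on the full content spec syntax, and the function name parse_spec_metadata is hypothetical.

```python
def parse_spec_metadata(spec_text):
    """Collect 'Key = Value' lines from a content spec, skipping comments.

    Only handles the simple metadata lines seen in the excerpt above;
    it does not attempt to parse chapters or topics.
    """
    metadata = {}
    for line in spec_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        key, sep, value = stripped.partition("=")
        if sep:
            metadata[key.strip()] = value.strip()
    return metadata


# Sample text copied from the excerpt above.
sample = """ID = 22781
# Book_Info.xml content
Title = Migration Guide
Subtitle = Migrating to Red Hat JBoss Fuse 6.1
"""

meta = parse_spec_metadata(sample)
print(meta.get("ID"))
```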

You can now upload this revised content spec to Pressgang using the csprocessor push command, as follows:

$ cd cspec/<BookName>
$ csprocessor push contentspec.cspec
...
Content Specification ID: 22781
Revision: 664546
...

Wednesday 21 May 2014

Integrate Git with Pressgang CCMS using GitCCMS

As explained in the previous post, we recently developed a Python script called GitCCMS, which functions as a bridge between the git revision control system and the Pressgang CCMS server (Red Hat's internal content management system). At the moment, the script is alpha quality only and is a one-way bridge: you can push content from git to Pressgang CCMS, but there is no support (yet) for pulling content back.

If you want to try it out, the GitCCMS script is available online at GitHub:

https://github.com/fbolton/GitCCMS

The script is useful only if you have access to a Pressgang CCMS server.

Features

GitCCMS has the following features:

Supports DocBook 5 and (almost) DocBook 4.
If the source doc is DocBook 5, it gets converted to DocBook 4 format on the fly and uploaded as DocBook 4 (this is to optimize compatibility with existing content on Pressgang, which is mostly in DocBook 4 format).
Uploads topics and images.
Can generate content specs automatically.
Supports conditional text.
Supports XML entities.
Supports olinks (if you create a link between books, the link will be converted to pure text before it is uploaded to Pressgang, so it does not rely on Pressgang to interpret the olinks).

Configuring GitCCMS

Assuming that you have a repository, MyGitRepo, the first thing you need to do is to create a configuration file for GitCCMS. The configuration file must be called .git-ccms and must be stored in the root directory of the repository, MyGitRepo. Here is an example of a .git-ccms file:

<?xml version="1.0" encoding="UTF-8"?>
<context>
    <books>
        <book file="BookA/BookA.xml"/>
        <book file="BookB/BookB.xml"/>
    </books>
    <entities file="Library.ent"/>
    <ignoredirs>
        <dir name="camel"/>
        <dir name="amq"/>
    </ignoredirs>
    <topicelements>
        <element tag="simplesect"/>
        <element tag="section"/>
        <element tag="info"/>
    </topicelements>
    <profiles>
        <profile name="default">
            <project name="Fuse">
                <category name="Assigned Writer">
                    <tag name="johndoe"/>
                    <tag name="janedoe"/>
                </category>
                <category name="Product">
                    <tag name="JBoss Fuse"/>
                    <tag name="JBoss A-MQ"/>
                </category>
                <category name="Release">
                    <tag name="6.1"/>
                </category>
            </project>
            <conditions>
              <condition match="jbossfuse"/>
              <condition match="comment"/>
            </conditions>
        </profile>
    </profiles>
</context>

Note the following points about this GitCCMS configuration:

The books element is used to list all of the book files in your git repository.
The entities element specifies a file containing the definitions of all the XML entities appearing in your books.
The ignoredirs element is useful if you are using git submodules. The files under the specified directories will not be considered when it comes to creating or updating topics and images (but they are considered for the purpose of compiling tables of cross-references).
The topicelements element is used to specify which DocBook element types are mapped to topics in the Pressgang CCMS. It is probably best to set this element exactly as shown above. It is not really very flexible at the moment.
The conditions element is used to specify the conditions that are enabled when you are using conditional text (this is a standard DocBook feature).
The project/category/tag elements are used to specify what tags are assigned to newly created topics.
Some of the configuration settings are specified inside a profile element. In the future, this should allow you to switch configurations easily. But at the moment, only one profile should be used and it must be called default.
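To make the structure of the configuration file more concrete, here is a sketch that reads a .git-ccms-style document with Python's standard xml.etree.ElementTree module and pulls out the settings described in the points above. This is not how GitCCMS itself reads the file (I have not reproduced its code here); the function name summarize_config and the trimmed-down sample are hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed-down version of the example configuration shown above.
SAMPLE = """<context>
    <books>
        <book file="BookA/BookA.xml"/>
        <book file="BookB/BookB.xml"/>
    </books>
    <ignoredirs>
        <dir name="camel"/>
    </ignoredirs>
    <topicelements>
        <element tag="simplesect"/>
        <element tag="section"/>
    </topicelements>
    <profiles>
        <profile name="default">
            <conditions>
                <condition match="jbossfuse"/>
            </conditions>
        </profile>
    </profiles>
</context>"""


def summarize_config(config_xml):
    """Return the main settings of a .git-ccms-style document as a dict."""
    root = ET.fromstring(config_xml)
    # Only the profile named 'default' is considered, as in the post.
    profile = root.find("./profiles/profile[@name='default']")
    conditions = []
    if profile is not None:
        conditions = [
            c.get("match") for c in profile.findall("./conditions/condition")
        ]
    return {
        "books": [b.get("file") for b in root.findall("./books/book")],
        "ignoredirs": [d.get("name") for d in root.findall("./ignoredirs/dir")],
        "topicelements": [
            e.get("tag") for e in root.findall("./topicelements/element")
        ],
        "conditions": conditions,
    }


summary = summarize_config(SAMPLE)
for key, values in summary.items():
    print(key, values)
```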

Using GitCCMS

Assuming that you have configured GitCCMS as described above, you are now ready to start using GitCCMS. If you have not done so already, you can add the gitccms script to your PATH. For example, on a Linux or UNIX platform:

export PATH=<GitCCMSInstall>/bin:$PATH

You can push your DocBook source from the git repository up to the Pressgang server by entering the following command (which must be executed from the top-level directory of your git repository):

gitccms push

By default, this command pushes content to Red Hat's development server (used for testing only). Alternatively, you can specify the Pressgang hostname explicitly using the --host option. If the script completes successfully, you should see that a new hidden file, .git-ccms-commit, is created. This file records the SHA of the last commit that was synchronized with Pressgang. GitCCMS uses this SHA to figure out what has changed between synchronizations.
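To get a feel for what "figuring out what has changed" could mean in practice, the sketch below builds a git command that lists the files changed since the recorded commit. It rests on two assumptions that are not confirmed here: that .git-ccms-commit contains nothing but the 40-character SHA, and that a plain git diff --name-only between that SHA and HEAD is a fair approximation. GitCCMS's own logic may well be different, and the function name diff_command is hypothetical.

```python
import re


def diff_command(commit_file_text):
    """Return a git command listing files changed since the recorded SHA.

    Assumes the text of .git-ccms-commit is just a full 40-character
    SHA (an assumption, not a documented format).
    """
    sha = commit_file_text.strip()
    if not re.fullmatch(r"[0-9a-f]{40}", sha):
        raise ValueError(f"not a full commit SHA: {sha!r}")
    return ["git", "diff", "--name-only", sha, "HEAD"]


# Placeholder SHA for illustration only; it does not refer to a real commit.
example_sha = "0123456789abcdef0123456789abcdef01234567"
print(" ".join(diff_command(example_sha + "\n")))
# In a real repository, the text would come from
# pathlib.Path(".git-ccms-commit").read_text(), and the list could be
# passed to subprocess.run(..., capture_output=True, text=True).
```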

Generating content specs

After uploading your documentation to the Pressgang server, you can generate content specs for all of your books by entering the following command:

gitccms cs

You can find the generated specs under the cspec/ directory. You can upload the content specs to Pressgang CCMS using the standard Pressgang csprocessor client (not part of GitCCMS).

Monday 19 May 2014

Migrating to Pressgang CCMS

When FuseSource was originally acquired by Red Hat, back in September 2012, we brought with us half a dozen git repos containing our documentation source in DocBook 5 format. We soon discovered that Red Hat had quite a different documentation infrastructure and tool chain from the homegrown tools we used at FuseSource. In particular, a Content Management System (CMS) called Pressgang CCMS (a topic-based CMS) was shaping up to be the future of documentation in Red Hat.

Initially, migrating to Pressgang CCMS posed some challenges that we were not in a position to solve right away. So, we put the migration off for a while. Now that we have released JBoss Fuse 6.1, we have the time to tackle it at last. Since we first looked at Pressgang CCMS a year and a half ago, a lot of things have changed. In particular, it is now much easier to import documentation into Pressgang using its built-in import tool. Support for DocBook 5, XML entities, and conditional text has also been added, which makes migration a lot easier.

But we found it hard to face saying 'goodbye' to git revision control. Git offers us a number of features not available in Pressgang CCMS. Conversely, Pressgang CCMS also provides features not available in git.

Advantages of using Git

Here are some of the advantages that git offers us when managing documentation:

Branching
Ability to detect and merge conflicting changes made by different writers to the same topics.
Commits [that is, the capability to save a set of related changes in multiple topics in one atomic operation]
Simplified workflow for integrating and merging upstream Community docs, when those docs are in DocBook format.
Cherry-picking bug fixes across branches
Ability to grep/sed/find/apply any script to the entire doc set in a single operation, when needed.
Ability to commit changes when working offline.

Advantages of using Pressgang

On the other hand, Pressgang has some pretty nice features too:

Ability to share topics across all Red Hat products
Querying for metadata on topic re-use
Automatic book builds and continuous book building
Integration with Zanata translation software and localization processes
Centralized quality checks [spelling, link-checking, compliance with style guide]

Having it all

The solution to this dilemma, evidently, is to have your cake and eat it too. What if we had a tool that created a bridge between git and Pressgang CCMS, so that our documentation source could be moved freely back and forth between them? We have been working on such a tool over the past few months: a Python script called GitCCMS. That will be the subject of the next blog post.