Tuesday, 30 April 2013

Documentation and Version Control

In this post I'm going to take a look at the version control requirements for storing and archiving documentation. It's worth considering what those requirements are, because they are not identical to the version control requirements for developing application code. Documentation requires many, but not all, of the features offered by classic developer-oriented revision control systems. On the other hand, many commercial Content Management Systems (CMS) do not offer the kind of flexibility that is required to maintain a professional documentation repository.

Here are the features I consider essential for a documentation-friendly revision control system:
  • Resolving collisions
  • Atomic commits
  • Revert/undo commits
  • Diffs between commits
  • Branching
  • Sub-projects
And here are some features I consider nice-to-have:
  • Merging branches
  • Cherry picking
And, finally, a non-requirement:
  • Tagging

Resolving Collisions

If there is more than one person on your docs team, it is reasonable to suppose that, sooner or later, you are going to collaborate on a document. In this case, it would be supremely annoying if you both happened to submit changes to the same file at the same time, and the documentation system simply overwrote one of the versions. The documentation system must therefore have some means of detecting and resolving such collisions. This is especially important if writers want to work offline (and they usually do), because collisions are then more likely to occur.

Revision control systems usually tackle this problem in one of two ways: either by locking the file (so that one writer gets exclusive rights to update the file for as long as the lock is held) or by merging (where the revision control system requires you to merge the changes previously made by other users before you can upload your own changes).
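To make the merging approach concrete, here is a rough sketch of how it plays out in Git. Everything in it is made up for illustration (the two writers, the book.txt file, the scratch directories): one writer's push is refused because the shared repository has moved on, and it only goes through after the other writer's change has been merged.

```shell
set -eu

# Scratch area with a shared repository and two writers' clones (all names hypothetical).
work="$(mktemp -d)"

# Alice starts the book, then publishes it as the shared repository.
git init -q "$work/alice"
cd "$work/alice"
git config user.name "Alice"
git config user.email "alice@example.com"
printf 'Chapter 1\nChapter 2\nChapter 3\nChapter 4\nChapter 5\n' > book.txt
git add book.txt
git commit -q -m "Start the book"
git clone -q --bare "$work/alice" "$work/central.git"
git remote add origin "$work/central.git"

# Bob clones the shared repository.
git clone -q "$work/central.git" "$work/bob"
cd "$work/bob"
git config user.name "Bob"
git config user.email "bob@example.com"

# Alice revises the first line and pushes.
cd "$work/alice"
printf 'Chapter 1 (revised)\nChapter 2\nChapter 3\nChapter 4\nChapter 5\n' > book.txt
git commit -q -a -m "Revise chapter 1"
git push -q origin HEAD

# Bob, unaware of Alice's change, revises the last line and tries to push.
cd "$work/bob"
printf 'Chapter 1\nChapter 2\nChapter 3\nChapter 4\nChapter 5 (revised)\n' > book.txt
git commit -q -a -m "Revise chapter 5"
if git push -q origin HEAD 2>/dev/null; then
  push_rejected=no
else
  push_rejected=yes
fi
echo "First push rejected: $push_rejected"

# Bob merges Alice's change first; after that, his push is accepted.
git pull -q --no-rebase --no-edit
git push -q origin HEAD
cat book.txt
```

Because the two edits touch different parts of the file, the merge needs no manual intervention. If both writers had changed the same line, Git would stop at the pull and ask Bob to resolve the conflict by hand.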

Atomic Commits

Atomic commit means that you can submit multiple changes to multiple files in a single commit step. This has several advantages:
  • The commit log is much less cluttered, because you can group several changes into a single commit entry. 
  • It helps to keep the docs in a consistent state. For example, if you change the ID of a link target, this might break one or more cross-references and it might take multiple subsequent changes to multiple files to fix all of the broken links. If you can roll all of these changes into a single commit, you can ensure that the cross-references remain unbroken, both before and after the commit.
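As a small illustration of the link-target case, the following sketch uses Git in a throwaway repository. The file names and IDs are invented for the example. The renamed ID and the updated cross-reference go into the repository as one commit, so no intermediate state with a broken link is ever recorded.

```shell
set -eu

# Throwaway repository; file names and IDs are hypothetical.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Docs Writer"
git config user.email "writer@example.com"

# A link target and a cross-reference that points at it.
echo '<section id="install">Installing</section>' > install.xml
echo '<xref linkend="install"/>' > overview.xml
git add install.xml overview.xml
git commit -q -m "Add install and overview topics"

# Rename the ID and fix the cross-reference, then commit both files in one step.
echo '<section id="installation">Installing</section>' > install.xml
echo '<xref linkend="installation"/>' > overview.xml
git add install.xml overview.xml
git commit -q -m "Rename install ID and update cross-reference"

# List the files recorded under that single commit.
git --no-pager show --name-only --format=%s HEAD
```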

Revert/Undo Commits

From time to time we all make mistakes, so the capability to undo a commit is a welcome feature. Strictly speaking, Git does not allow you to undo a commit, but it enables you to commit the inverse of a previous commit, which amounts to the same thing.
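In Git, committing the inverse is what the git revert command does. Here is a minimal sketch in a throwaway repository (the notes.txt file and its contents are made up for the example):

```shell
set -eu

# Throwaway repository; the file and its contents are hypothetical.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Docs Writer"
git config user.email "writer@example.com"

echo "Version 1.0 release notes" > notes.txt
git add notes.txt
git commit -q -m "Add release notes"

# The mistake.
echo "A paragraph that should not have been added" >> notes.txt
git commit -q -a -m "Add paragraph"

# Commit the inverse of the most recent commit.
git revert --no-edit HEAD

# The history keeps all three commits; the file is back to its earlier content.
git --no-pager log --oneline
cat notes.txt
```

Note that the mistaken commit stays in the log. The revert is simply a third commit that cancels it out.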

Diffs between Commits

Diffs between commits are every bit as useful for technical writers as they are for developers. They enable you to keep track of changes made by your collaborators, and they enable you to keep track of your own changes. In fact, the ability to make diffs between commits is one of the major reasons for keeping a version history in the first place.
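For example, in Git you can compare any two commits with git diff. The sketch below uses a throwaway repository and an invented guide.txt file, and compares the latest commit with the one before it:

```shell
set -eu

# Throwaway repository; the file and its contents are hypothetical.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Docs Writer"
git config user.email "writer@example.com"

echo "Step 1: unpack the archive" > guide.txt
git add guide.txt
git commit -q -m "Add step 1"

echo "Step 2: run the installer" >> guide.txt
git commit -q -a -m "Add step 2"

# Show what changed in guide.txt between the previous commit and the latest one.
git --no-pager diff HEAD~1 HEAD -- guide.txt
```

The added line shows up in the output prefixed with a plus sign.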

Branching

There are various ways you can put branches to good use in a revision control system. The most important application of branching, from a documentation perspective, is for tracking past versions of the documentation.

In a documentation repository, it is natural to create a separate branch for each release. So, for example, you might have branches for versions 1.0, 1.1, 1.2, 1.3, 2.0, and 2.1 of your documentation. Any time you need to go back, say, to fix an error or to add a release note to an earlier version, all you need to do is check out the relevant branch, make the updates, and re-publish the docs from that branch. Moreover, fixes or enhancements made to an earlier version can sometimes also be applied to the current version (or vice versa), and it is particularly nice if you have the option of cherry-picking the updates between branches.

Branching is a basic requirement if you intend to do any maintenance at all of older documentation versions (and given that your customers are likely to have support agreements for the older products, it seems pretty much inevitable).
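Here is a rough sketch of that workflow in Git. The docs-1.3 branch name and the files are assumptions made for the example, not a naming scheme I am recommending:

```shell
set -eu

# Throwaway repository; branch and file names are hypothetical.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Docs Writer"
git config user.email "writer@example.com"

# Publish the 1.3 docs and create a branch for that release.
echo "Release notes for 1.3" > release-notes.txt
git add release-notes.txt
git commit -q -m "Docs for the 1.3 release"
git branch docs-1.3

# Work carries on towards the next release on the original branch.
echo "Draft chapter for 2.0" > new-chapter.txt
git add new-chapter.txt
git commit -q -m "Start 2.0 chapter"

# Later: go back and add a release note to the 1.3 docs.
git checkout -q docs-1.3
echo "1.3.1: corrected the install path" >> release-notes.txt
git commit -q -a -m "Add 1.3.1 release note"

# The working tree now holds only the 1.3 docs, ready to re-publish.
ls
```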


Sub-Projects

In a complex product, it is likely that you will need to use sub-projects at some point (that is, a mechanism that enables you to combine several repositories into a single repository). This can become necessary if a product consists of multiple sub-products, so that the corresponding library is created by composing several sub-libraries.

The mechanisms you can use to implement sub-projects include svn:externals definitions in SVN and submodules in Git.

Although Git is, in most respects, a wonderful revision control system, its implementation of submodules does suffer from a couple of drawbacks:

  • You cannot mount an arbitrary sub-directory of the external sub-project in the parent project (as you can in SVN), only the root directory.
  • Whenever you update the parent project (for example, by doing a git pull), Git does not automatically update the submodules. This is probably the correct policy for working with application code, where you need to be conservative about updating dependencies, in case you break the code. But in the case of documentation, you normally want the submodules to point at the latest commit in the referenced branch. It is a nuisance to have to constantly update the submodules manually and then check those updates into the parent project.

The fundamental reason why sub-projects are needed is that sub-products naturally evolve at different rates, and you normally need to pick specific versions of the sub-products to assemble a complex product. Using a sub-project mechanism enables you to mix and match sub-product versions at will. (You might think it is also possible to lump all of the sub-products into a single repository, but this has the serious limitation that you can only work with a single combination of product versions. If you also need to release another product that uses a different combination of sub-product versions, this approach becomes completely unworkable.)


Merging Branches

I hesitated before putting merging branches into the nice-to-have category. You might prefer to categorise it as must-have, and I won't argue with you. But if you don't have a merge capability, I think you can mostly work around it, in the context of documentation. The most important use of branches in documentation is for tracking different versions of a library, and these kinds of branches would normally never need to be merged.

The fact that merging branches is not an absolute necessity does not mean that you can do without a merge capability altogether. You certainly need to be able to merge commits in order to resolve conflicts between different users working on the same branch.

Cherry-Picking

Cherry-picking is the ability to apply the same commit to more than one branch. In Git, for example, the procedure is incredibly easy. You just make the changes to one branch; commit them; then check out another branch and apply the commit to this branch as well (in my Git UI, I can right-click on a commit and select Cherry Pick to apply it to the currently checked out branch).
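If you prefer the command line to a UI, the equivalent is git cherry-pick. The following sketch assumes two hypothetical release branches, docs-1.3 and docs-2.0, in a throwaway repository, and applies a fix made on the older branch to the newer one:

```shell
set -eu

# Throwaway repository; branch and file names are hypothetical.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Docs Writer"
git config user.email "writer@example.com"

echo "Install guide" > install.txt
git add install.txt
git commit -q -m "Add install guide"
git branch docs-1.3
git branch docs-2.0

# The 2.0 docs move on independently.
git checkout -q docs-2.0
echo "What's new in 2.0" > whats-new.txt
git add whats-new.txt
git commit -q -m "Add what's new topic"

# Make a fix on the 1.3 branch and remember its commit ID.
git checkout -q docs-1.3
echo "Note: run the installer as administrator" > install-note.txt
git add install-note.txt
git commit -q -m "Add installer note"
fix="$(git rev-parse HEAD)"

# Apply the same commit to the 2.0 branch.
git checkout -q docs-2.0
git cherry-pick "$fix"
git --no-pager log --oneline -n 2
```

The cherry-picked commit gets a new ID on the docs-2.0 branch, but it carries the same change and the same log message.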

Tagging

Contrary to what you might think, tagging is not really necessary in a documentation repository.

For years, my team-mates and I dutifully tagged the repository every time a documentation library was released. In the early years, we did it in SVN and now we are doing it in Git. But recently I realised that we never used any of these revision tags, not even once.

This is because, in practice, you are only ever interested in accessing the tip of a branch, not the tagged commit. For example, if I create a branch for a 1.3 release, the tip of this branch will always have the version of the docs that I want to use for re-publishing the 1.3 docs. If I correct some errors in the docs, update the release notes with some patch information, and so on, this will always be available at the tip of the branch. The tag that might have been created the first time the library was released is of absolutely no interest: it references an earlier commit in the branch, which is now out of date.

Wednesday, 22 August 2012

The Security Token Service

With the release of Fuse ESB Enterprise 7.0.1, the Web Services Security Guide (for Apache CXF) has been expanded to cover the Security Token Service (STS).

A full implementation of the STS was recently added to the Apache CXF codebase and this implementation has a highly modular and customisable architecture, as you can see from the following architecture overview:



For example, the token Issue operation can be customised by plugging in a SAMLTokenProvider or an SCTProvider (secure conversation token provider); and the token Validate operation can be customised by plugging in one of the token validators, SAMLTokenValidator, UsernameTokenValidator, X509TokenValidator, or SCTTokenValidator.

The STS implementation has a number of special features, including:

  • Support for embedding Claims data in issued tokens.
  • Support for the AppliesTo policy (which enables you to centralise token issuing requirements).
  • Support for security realms.

These are all described in the new doc, in The Security Token Services chapter.

Tuesday, 17 July 2012

New FAB Videos

Recently, I have worked on producing a couple of videos that explain Fuse Application Bundles (FABs). A FAB is basically a new way of deploying applications into an OSGi container that can make your life a whole lot easier. This technology has been developed by my engineering colleagues at FuseSource and is open sourced on GitHub.

If you have ever built and deployed OSGi bundles using Maven, you might have experienced the frustration of adding a whole lot of package dependencies into the Maven bundle plugin. You have already specified all of your dependencies as Maven dependencies, and here you are doing it all over again! Is it really necessary? Well, if you are using FABs, it's not. The key idea of FABs is to leverage the existing Maven dependency metadata and use that metadata to figure out the requisite OSGi package dependencies.

The first video explains this basic concept and also explains the difference between shared and non-shared dependencies in a FAB project:



As we started to use FABs in practical applications, it soon became clear how important it is to distinguish between dependencies already provided by the container and other artifacts. Recently, our engineering team has done a lot of work to make FABs smarter, so that they can recognise provided dependencies automatically.

The second video shows a practical example of how to prepare a Maven project for FAB deployment and explains the importance of setting the dependency's <scope> tag correctly:

Wednesday, 23 November 2011

Scalate confexport tool


The Scalate template engine is a handy tool for creating small Web sites, and we use it for some of the Web sites at the Fuse Forge. A somewhat obscure, but useful, feature of Scalate is the confexport tool, which you can use to pull down the contents of an entire Confluence Wiki space through a remote SOAP interface. For example, if you want to pull down the entire Apache Camel site from Apache's Confluence Wiki, simply execute the following command in Scalate:

scalate> confexport --user user --password pass https://cwiki.apache.org/confluence CAMEL

Note that this command assumes that you are using the latest (not yet released) snapshot of Scalate, which has been refactored to use Confluence's SOAP port instead of the XMLRPC port.

Since most of our documentation is written in DocBook format, it would be nice if there were a way to cross-reference pages in a Scalate Web site from inside a DocBook book. Recently, I added an enhancement to the confexport tool that lets you do just that. You just add the --target-db switch to confexport when you are downloading a Wiki space, for example:

scalate> confexport --user user --password pass --target-db https://cwiki.apache.org/confluence CAMEL

Now, when confexport has finished downloading the CAMEL space, it will also create a target.db file for you. If you are familiar with DocBook XSL, you will recognise that this is the standard form of a link database that DocBook uses to create cross-references between books. After including this target.db file in your DocBook library's site.xml file, you will be able to insert cross-references using the standard DocBook olink element.

If you have a nice XML editor like OxygenXML, you can then insert links to the Scalate site with the help of the UI. For example, here is a screenshot of the 'Insert OLink' dialog in OxygenXML:


Wednesday, 13 July 2011

Maven Offline Repository

Interesting new topics often get buried in the bulk of the FuseSource doc library. Every so often, I'll be blogging about these topics, so you can get an idea of what's new in the FuseSource library.

A recent example is the description of how to create your own custom Maven offline repository for Apache ServiceMix. If you've ever used Maven, you could hardly fail to notice the way it downloads a ton of dependencies from the Internet before it starts to build your project. But what if you don't want Maven to download anything from the Internet? What is the alternative?

The solution is to create a customized Maven offline repository, which you can then distribute with your Apache ServiceMix deployment. Although it's easy enough to identify your own application bundles, keeping track of the dependencies is not so easy. You typically have hundreds of transitive dependencies and identifying them by hand would be a nightmare. After all, that's what Maven is for.

This is where the features-maven-plugin comes in. This slightly obscure Maven plug-in was originally developed as a utility for Apache developers, for generating distributions of Apache Karaf and Apache ServiceMix. For example, at FuseSource we use it to generate Fuse ESB, our own distribution of Apache ServiceMix. It turns out that the features-maven-plugin plug-in has a goal, add-features-to-repo, that is ideally suited to creating a custom offline repository. Now, the plug-in does have one trivial limitation: the bundles that you want to include in the offline repository must be packaged in a Karaf 'feature'. This is easily taken care of (for example, see Creating a Feature).

To see how to use the features-maven-plugin, let's take a concrete example of an application, deployed into Fuse ESB 4.3.1, that uses the camel-jms feature and the camel-quartz feature. Let's generate a custom Maven repository containing just those two features and all of their dependencies.

First of all, create a Maven pom.xml file with the following contents:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">

    <modelVersion>4.0.0</modelVersion>

    <groupId>org.acme.offline-repo</groupId>
    <artifactId>features-offline</artifactId>
    <version>1.0.0</version>
    <name>Generate offline features repository</name>
    <packaging>pom</packaging>

    <repositories>
      <!-- In general, might need to add the repos
           listed in org.ops4j.pax.url.mvn.repositories property -->
      <!--
    http://repo1.maven.org/maven2, \
    http://repo.fusesource.com/maven2, \
    http://repo.fusesource.com/maven2-snapshot@snapshots@noreleases, \
    http://repo.fusesource.com/nexus/content/repositories/releases, \
    http://repo.fusesource.com/nexus/content/repositories/snapshots@snapshots@noreleases, \
    http://repository.apache.org/content/groups/snapshots-group@snapshots@noreleases, \
    http://repository.ops4j.org/maven2, \
    http://svn.apache.org/repos/asf/servicemix/m2-repo, \
    http://repository.springsource.com/maven/bundles/release, \
    http://repository.springsource.com/maven/bundles/external
      -->
    <repository>
      <id>esb.system.repo</id>
      <name>Fuse ESB internal system repo</name>
      <url>file:///E:/Programs/FUSE/apache-servicemix-4.3.1-fuse-00-00/system</url>
      <snapshots>
        <enabled>false</enabled>
      </snapshots>
      <releases>
        <enabled>true</enabled>
      </releases>
    </repository>
    <repository>
      <id>repo1.maven.org</id>
      <name>Maven central</name>
      <url>http://repo1.maven.org/maven2</url>
      <snapshots>
        <enabled>false</enabled>
      </snapshots>
      <releases>
        <enabled>true</enabled>
      </releases>
    </repository>
    <repository>
      <id>repo.fusesource.com</id>
      <name>FuseSource repo</name>
      <url>http://repo.fusesource.com/maven2</url>
      <snapshots>
        <enabled>false</enabled>
      </snapshots>
      <releases>
        <enabled>true</enabled>
      </releases>
    </repository>
    <repository>
      <id>repo.fusesource.com.snapshot</id>
      <name>FuseSource snapshot repo</name>
      <url>http://repo.fusesource.com/maven2-snapshot</url>
      <snapshots>
        <enabled>true</enabled>
      </snapshots>
      <releases>
        <enabled>false</enabled>
      </releases>
    </repository>
    <repository>
      <id>repo.fusesource.com.nexus</id>
      <name>FuseSource Nexus repo</name>
      <url>http://repo.fusesource.com/nexus/content/repositories/releases</url>
      <snapshots>
        <enabled>false</enabled>
      </snapshots>
      <releases>
        <enabled>true</enabled>
      </releases>
    </repository>
    <repository>
      <id>repo.fusesource.com.nexus.snapshot</id>
      <name>FuseSource Nexus snapshot repo</name>
      <url>http://repo.fusesource.com/nexus/content/repositories/snapshots</url>
      <snapshots>
        <enabled>true</enabled>
      </snapshots>
      <releases>
        <enabled>false</enabled>
      </releases>
    </repository>
    <repository>
      <id>repository.apache.org.snapshot</id>
      <name>Apache snapshot repo</name>
      <url>http://repository.apache.org/content/groups/snapshots-group</url>
      <snapshots>
        <enabled>true</enabled>
      </snapshots>
      <releases>
        <enabled>false</enabled>
      </releases>
    </repository>
    <repository>
      <id>repository.ops4j.org</id>
      <name>OPS4J repo</name>
      <url>http://repository.ops4j.org/maven2</url>
      <snapshots>
        <enabled>false</enabled>
      </snapshots>
      <releases>
        <enabled>true</enabled>
      </releases>
    </repository>
    <repository>
      <id>svn.apache.org</id>
      <name>Apache SVN repos</name>
      <url>http://svn.apache.org/repos/asf/servicemix/m2-repo</url>
      <snapshots>
        <enabled>false</enabled>
      </snapshots>
      <releases>
        <enabled>true</enabled>
      </releases>
    </repository>
    <repository>
      <id>repository.springsource.com</id>
      <name>Spring repo</name>
      <url>http://repository.springsource.com/maven/bundles/release</url>
      <snapshots>
        <enabled>false</enabled>
      </snapshots>
      <releases>
        <enabled>true</enabled>
      </releases>
    </repository>
    <repository>
      <id>repository.springsource.com.external</id>
      <name>Spring external repo</name>
      <url>http://repository.springsource.com/maven/bundles/external</url>
      <snapshots>
        <enabled>false</enabled>
      </snapshots>
      <releases>
        <enabled>true</enabled>
      </releases>
    </repository>
    </repositories>

    <build>
        <plugins>
          <plugin>
            <groupId>org.apache.karaf.tooling</groupId>
            <artifactId>features-maven-plugin</artifactId>
            <version>2.2.1</version>

            <executions>
              <execution>
                <id>add-features-to-repo</id>
                <phase>generate-resources</phase>
                <goals>
                  <goal>add-features-to-repo</goal>
                </goals>
                <configuration>
                  <descriptors>
                    <!-- List taken from featuresRepositories in etc/org.apache.karaf.features.cfg -->
                    <descriptor>mvn:org.apache.karaf/apache-karaf/2.1.3-fuse-00-00/xml/features</descriptor>
                    <descriptor>mvn:org.apache.servicemix.nmr/apache-servicemix-nmr/1.4.0-fuse-00-00/xml/features</descriptor>
                    <descriptor>mvn:org.apache.servicemix/apache-servicemix/4.3.1-fuse-00-00/xml/features</descriptor>
                    <descriptor>mvn:org.apache.camel.karaf/apache-camel/2.6.0-fuse-00-00/xml/features</descriptor>
                    <descriptor>mvn:org.apache.servicemix/ode-jbi-karaf/1.3.4/xml/features</descriptor>
                    <descriptor>mvn:org.apache.activemq/activemq-karaf/5.4.2-fuse-01-00/xml/features</descriptor>
                  </descriptors>
                  <features>
                    <feature>camel-jms</feature>
                    <feature>camel-quartz</feature>
                  </features>
                  <repository>target/features-repo</repository>
                </configuration>
              </execution>
            </executions>
          </plugin>
        </plugins>
    </build>
    
</project>

There's quite a lot of boilerplate in this POM file. Here is a summary of what you need to include:
  • A reference to the ServiceMix system repository, which is a local repository that contains all of the artifacts shipped with Apache ServiceMix. You really do need to include this, because it might contain artifacts not available from the other repositories.
  • A list of all the repositories, local and remote, that might contain artifacts needed by your features. Better to err on the generous side here. In particular, you should list all of the remote repositories used by ServiceMix itself. You can find this list in the etc/org.ops4j.pax.url.mvn.cfg configuration file, in the org.ops4j.pax.url.mvn.repositories property setting. Note that entries in that property can carry suffixes such as @snapshots@noreleases; in the POM, leave the suffix off the URL and express it through the snapshots and releases elements instead.
  • In the configuration/descriptors element of the features-maven-plugin configuration, add the list of relevant feature repositories. At a minimum, you will need to add all of the features repositories used by ServiceMix. You can find this list in the etc/org.apache.karaf.features.cfg file, in the featuresRepositories property setting.
  • Finally, in the configuration/features element of the features-maven-plugin, you need to list the features for which you are generating the offline repository.

Now you are ready to generate the custom offline repo by entering the following Maven command:

mvn generate-resources

If this builds successfully, you should find the generated custom repository under the target/features-repo directory of the Maven project.

For a more detailed description of how to set up the POM correctly, see Generating a Custom Offline Repository.