Chapter 5. Customizing DocBook
For some applications, DocBook “out of the box” may not be exactly what you need. Perhaps you need additional inline elements or perhaps you want to remove elements that you never want your authors to use. By design, DocBook makes this sort of customization easy.
It is even easier to customize DocBook 5.0 than it was to customize earlier releases. This is because DocBook 5.0 uses RELAX NG to express its schema. RELAX NG provides better support for modifications than DTDs, and the DocBook schema takes full advantage of that support.
This chapter describes the organization of the RELAX NG schema for DocBook and how to make your own customization layer. It contains methods and examples for adding, removing, and modifying elements and attributes, and conventions for naming and versioning DocBook customizations. It assumes some familiarity with RELAX NG. If you are unfamiliar with RELAX NG, you can find a tutorial introduction in the RELAX NG Tutorial [RNG-Intro].
You can use customization layers to extend DocBook or subset it. Creating a schema that is a strict subset of DocBook means that all of your instances are still completely valid DocBook instances, which may be important to your tools and stylesheets, and to other people with whom you share documents. An extension adds new structures, or changes the schema in a way that is not compatible with DocBook. Extensions can be very useful, but might have a great impact on your environment.
Customization layers can be as small as restricting an attribute value or as large as adding an entirely different hierarchy on top of the inline elements.
5.1. Should You Do This?
Changing a schema can have a wide-ranging impact on the tools and stylesheets that you use. It can have an impact on your authors and on your legacy documents. This is especially true if you make an extension. If you rely on your support staff to install and maintain your authoring and publishing tools, check with them before you invest a lot of time modifying the schema. There may be additional issues that are outside your immediate control. Proceed with caution.
That said, DocBook is designed to be easy to modify. This chapter assumes that you are comfortable with XML and RELAX NG grammar syntax, but the examples presented should be a good springboard to learning the syntax if it’s not already familiar to you.
5.2. If You Change DocBook, It’s Not DocBook Anymore!
The license agreement under which DocBook is distributed gives you complete freedom to change, modify, reuse, and generally hack the schema in any way you want, except that you must not call your alterations “DocBook.”
5.2.1. Namespace and Version
Starting with DocBook V5.0, DocBook is identified by its namespace, http://docbook.org/ns/docbook. The particular version of DocBook to which an element conforms is identified by its version attribute. If the element does not specify a version, the version of the closest ancestor DocBook element that does specify a version is assumed. The version attribute is required on the root DocBook element.
On the book element, for example, these appear as the namespace declaration (xmlns) and the version attribute.
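As a small illustration (the titles and paragraph text are placeholders, not taken from the original), a root element carrying both might look like this:

```xml
<book xmlns="http://docbook.org/ns/docbook" version="5.0">
  <title>An Example Book</title>
  <chapter>
    <title>An Example Chapter</title>
    <!-- No version attribute here: the chapter and its descendants
         take version 5.0 from the closest ancestor that specifies one. -->
    <para>Some content.</para>
  </chapter>
</book>
```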
If you change the DocBook schema, the namespace remains the same, but you must provide an alternate version identifier for the schema and the modules you changed. The version attribute identifies the version of DocBook the alternate is based on, specifies what type of variant it is, and names the variant and any additional modules. While the format for the version string is not part of the normative specification, the DocBook Technical Committee recommends the following format:
base_version-(subset|extension|variant) (name[-version])+
For example, version 1.0 of Acme Corporation’s extension of DocBook V5.0 could be identified as “5.0-extension acme-1.0”.
If your schema is a proper subset, use the subset keyword in the version. If your schema extends the markup model, use the extension keyword. If you’d rather not characterize your variant specifically as a subset or an extension, use the variant keyword.
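To make the recommended format concrete, here is a sketch of a document that declares itself as an instance of the Acme extension mentioned above. The article content is a placeholder; only the version string follows the recommendation.

```xml
<!-- Hypothetical instance of Acme Corporation's extension of DocBook V5.0.
     The namespace is unchanged; only the version string differs. -->
<article xmlns="http://docbook.org/ns/docbook"
         version="5.0-extension acme-1.0">
  <title>An Example Article</title>
  <para>Some content.</para>
</article>
```

A strict subset would use a string of the same shape with the subset keyword, for example “5.0-subset acme-1.0”.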
5.2.2. Public Identifiers
Although not directly supported by RELAX NG, in some cases it may still be valuable to identify a DocBook V5.0 customization layer with a public identifier. A public identifier for DocBook V5.0 is:
-//OASIS//DTD DocBook V5.0//EN
If you make any changes to the structure of the schema, you must change the public identifier. You should change both the owner identifier and the description. Formal public identifiers for the base DocBook modules would have the following syntax:
-//OASIS//text-class DocBook description Vversion//EN
Your public identifiers should use the following syntax:
-//Owner-ID//text-class DocBook Vversion-Based (Subset|Extension|Variant) Description-and-version//lang
For example:
-//O'Reilly//DTD DocBook V5.0-Based Subset V1.1//EN
If your schema is a proper subset, use the Subset keyword in the description. If your schema extends the markup model, use the Extension keyword. If you’d rather not characterize your variant specifically as a subset or an extension, use the Variant keyword.
5.3. Customization Layers
A RELAX NG grammar is a collection of patterns. These patterns can be stored in a single file or in a collection of files that import each other. Patterns can augment each other in a variety of ways. A complete grammar is the union of the specified patterns.
For convenience, the DocBook grammar is distributed in a single file.
5.3.1. RELAX NG Syntax
There are two standard syntaxes for RELAX NG, an XML syntax and a “compact” text syntax. The two forms have the same expressive power; it is possible to transform between them with no loss of information.
Many users find the relative terseness of the compact syntax makes it a convenient form for reading and writing RELAX NG. We will use compact syntax in our examples.
5.3.2. DocBook Schema Structure
The DocBook RELAX NG schema is highly modular, using named patterns extensively. Every element, attribute, attribute list, and enumeration has its own named pattern. In addition, there are named patterns for logical combinations of elements and attributes. These named patterns provide “hooks” into the schema that allow you to do a wide range of customization by simply redefining one or more of the named patterns.
The names of the patterns used in a RELAX NG grammar can be defined in any way the schema designer chooses. To make it easier to navigate, the DocBook RELAX NG grammar employs the following naming conventions:
db.*.attlist: Defines the list of attributes associated with an element. For example, db.emphasis.attlist is the pattern that matches all of the attributes of the emphasis element.
db.*.attribute: Defines a single attribute. For example, db.conformance.attribute is the pattern that matches the conformance attribute on all of the elements where it occurs.
db.*.attributes: Defines a collection of attributes. For example, db.effectivity.attributes is all of the effectivity attributes (arch, audience, etc.).
db.*.blocks: Defines a list (a choice) of a set of related block elements. For example, db.list.blocks is a pattern that matches any of the list elements.
db.*.contentmodel: Defines a fragment of a content model shared by several elements.
db.*.enumeration: Defines an enumeration, usually one used in an attribute value. For example, db.revisionflag.enumeration is a pattern that matches the list of values that can be used as the value of a revisionflag attribute.
db.*.info: Defines the info element for a particular element. For example, db.example.info is the pattern that matches info on the example element. Almost all of the info elements are the same, but they are described with distinct patterns so that customizers can change them individually.
db.*.inlines: Defines a list (a choice) of a set of related inline elements. For example, db.link.inlines is a pattern that matches any of the linking-related elements.
db.*.role.attribute: Defines the role attribute for a particular element. For example, db.emphasis.role.attribute is the pattern that matches role on the emphasis element. All of the role attributes are the same, but they are described with distinct patterns so that customizers can change them selectively.
db.*: Defines a particular DocBook element. For example, db.title is the pattern that matches the title element. RELAX NG allows multiple patterns to match the same element, so sometimes these patterns come in flavors, for example, db.indexterm.singular, db.indexterm.startofrange, and db.indexterm.endofrange. Each of these patterns matches indexterm with varying attributes.
These are conventions, not hard and fast rules. There are patterns that don’t follow these conventions.
5.3.3. The General Structure of Customization Layers
Creating a customized schema is similar to creating a customization layer for XSL: the schema customization layer is a new RELAX NG schema that defines your changes and includes the standard DocBook schema. You then validate against the customization layer instead of the standard schema. Customization layers vary in complexity, but most share the same general structure.
In the most common case, you probably want to include all of DocBook, but you want to make some small changes. These customization layers tend to look like this:
namespace db = "http://docbook.org/ns/docbook"
# perhaps other namespace declarations

include "docbook.rnc"

# new patterns and augmented patterns

Start by including the base DocBook schema with the include statement. After it, you can add new patterns or augment existing patterns.
If you want to completely replace a pattern (e.g., to remove or completely change an element), the template is a little different.
namespace db = "http://docbook.org/ns/docbook"
# perhaps other namespace declarations

include "docbook.rnc" {
  # redefinitions of DocBook patterns
}

# new patterns and augmented patterns

You can redefine patterns in the body of the include statement. These patterns completely replace any that appear in the included schema. As before, patterns outside the include statement can augment existing patterns (even redefined ones).
There are other possibilities as well; these examples are illustrative, not exhaustive.
5.4. Writing, Testing, and Using a Customization Layer
The procedure for writing, testing, and using a customization layer is always about the same. In this section, we’ll go through the process in some detail. The rest of the sections in this chapter describe a range of useful customization layers.
5.4.1. Deciding What to Change
If you’re considering writing a customization layer, there must be something that you want to change. Perhaps you want to add an element or attribute, remove one, or change some other aspect of the schema.
Adding an element, particularly an inline element, is one possibility. For example, if you’re writing about cryptography, you might want to add a “cleartext” element. The next section describes how to create a customization layer to do this.
5.4.2. Deciding How to Change a Customization Layer
Figuring out what to change may be the hardest part of the process. For the cleartext example, there are several patterns that you could possibly change. The choice will depend on the exact focus of your document. Here are several candidates, all of which look plausible: technical inlines, programming inlines, and domain inlines. Let’s suppose you chose the domain inlines.
As shown in Example 5.1, “Adding cleartext with a customization layer”, your customization would import the DocBook schema, extend the domain inlines, and then provide a pattern that matches the new element.
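A minimal sketch of such a layer follows. It is not necessarily the original listing: the content model chosen for cleartext (plain text plus the common attributes and an optional role) is an assumption, and the pattern names db.cleartext and db.common.attributes should be checked against your copy of docbook.rnc.

```rnc
namespace db = "http://docbook.org/ns/docbook"

# Include the unmodified DocBook schema.
include "docbook.rnc"

# Augment the domain inlines so that cleartext is allowed
# wherever the other domain inlines are allowed.
db.domain.inlines |= db.cleartext

# New pattern for the new element (content model is an assumption).
db.cleartext =
  element db:cleartext {
    db.common.attributes,
    attribute role { text }?,
    text
  }
```

Because this adds an element that standard DocBook does not have, the result is an extension, and documents using it would carry a version such as “5.0-extension acme-1.0”.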
5.4.3. Using Your Customization Layer
Using a customization layer is simple. Just put the customization into a file (for example, mycustomization.rnc) and then refer to that file instead of the DocBook schema when your tools offer the option.
5.4.4. Testing Your Work
Schemas, by their nature, contain many complex, interrelated patterns. Whenever you make a change to the schema, it’s always wise to use a validator to check your work.
Start by validating a document that’s plain, vanilla DocBook, one that you know is valid according to the DocBook standard schema. This will help you identify any errors that you’ve introduced to the schema. Once you are confident the schema is correct, begin testing with instances that you expect (and don’t expect) to be valid against it.
The following sections contain examples for several common customizations.
5.5. Removing Elements
DocBook has a large number of elements. In some authoring environments, it may be useful or necessary to remove unneeded elements.
5.5.1. Removing msgset
The msgset element is a favorite target. It has a complex internal structure designed for describing interrelated error messages, especially on systems that may exhibit messages from several different components. Many technical documents can do without it, and removing it leaves one less complexity to explain to your authors.
Example 5.2, “Removing msgset” shows a customization layer that removes the msgset element.
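A minimal sketch of such a layer, assuming the element is defined by a pattern named db.msgset (following the naming conventions described earlier):

```rnc
namespace db = "http://docbook.org/ns/docbook"

include "docbook.rnc" {
  # Redefine the pattern so that it can never match.
  # Every content model that refers to db.msgset loses that choice.
  db.msgset = notAllowed
}
```

The redefinition goes inside the body of the include statement because it replaces the original pattern rather than augmenting it.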
The complexity of msgset is really in its msgentry children. DocBook V4.5 introduced a simple alternative, simplemsgentry.
Example 5.3, “Removing msgentry” demonstrates how you could allow msgset but only support the simpler alternative.
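A sketch of that approach, again assuming the conventional pattern name db.msgentry:

```rnc
namespace db = "http://docbook.org/ns/docbook"

include "docbook.rnc" {
  # msgset itself stays; only the complex child is removed,
  # which leaves simplemsgentry as the remaining alternative.
  db.msgentry = notAllowed
}
```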
Closer examination of the msgentry content model will reveal that it contains a number of descendants. It isn’t necessary, but it wouldn’t be wrong, to define their patterns as notAllowed as well.
5.5.2. Removing Computer Inlines
DocBook contains a large number of computer inlines. The DocBook inlines define a domain-specific vocabulary. If you’re working in another domain, many of them may be unnecessary.
They’re defined in a set of patterns that ultimately roll up to the db.domain.inlines pattern. If you make that pattern notAllowed, you’ll remove them all in one fell swoop. Example 5.4, “Removing computer inlines” is a customization that does this.
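A minimal sketch of such a customization, using the db.domain.inlines pattern named above:

```rnc
namespace db = "http://docbook.org/ns/docbook"

include "docbook.rnc" {
  # Removes every inline that is reachable only through
  # the domain inlines choice.
  db.domain.inlines = notAllowed
}
```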
If you want to be more selective, you might consider making one or more of the following notAllowed instead:
db.error.inlines: errors and error messages
db.gui.inlines: GUI elements
db.keyboard.inlines: key and keyboard elements
db.markup.inlines: markup elements
db.math.inlines: mathematical expressions
db.os.inlines: operating system inlines
db.programming.inlines: programming-related inlines
Caution
Be aware that a customization layer that removed this many technical inlines would also remove some larger technical structures or make them unusable.
5.5.3. Removing Synopsis Elements
Another possibility is removing the complex synopsis elements. The customization layer in Example 5.5, “Removing cmdsynopsis and funcsynopsis” removes cmdsynopsis and funcsynopsis.
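A sketch of such a layer, assuming the conventional pattern names db.cmdsynopsis and db.funcsynopsis:

```rnc
namespace db = "http://docbook.org/ns/docbook"

include "docbook.rnc" {
  # Several patterns can be redefined in the same include body.
  db.cmdsynopsis = notAllowed
  db.funcsynopsis = notAllowed
}
```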
5.5.4. Removing Sectioning Elements
Perhaps you want to restrict your authors to only three levels of sectioning. To do that, you could remove the sect4 and sect5 elements, as shown in Example 5.6, “Removing the sect4 and sect5 elements”.
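A sketch of that customization, assuming the conventional pattern names db.sect4 and db.sect5:

```rnc
namespace db = "http://docbook.org/ns/docbook"

include "docbook.rnc" {
  # With sect4 gone, sect3 becomes the deepest numbered section.
  # sect5 can only occur inside sect4, so removing it as well
  # is mostly a matter of tidiness.
  db.sect4 = notAllowed
  db.sect5 = notAllowed
}
```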
This technique works if your authors are using numbered sections, which you could require them to do by removing the section element. But suppose instead that you want to allow recursive sections while limiting them to three levels.
One way to do this would be to define new section2 and section3 patterns, as shown in Example 5.7, “Limiting recursive sections to three levels”.
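The idea can be sketched as follows. This is a simplified illustration, not the original listing: the content models leave out pieces that the standard section allows (such as simplesect and navigational components), and the names db.section.attlist, db.section.info, and db.all.blocks are assumed to exist in your docbook.rnc. The patterns db.section2 and db.section3 are new.

```rnc
namespace db = "http://docbook.org/ns/docbook"

include "docbook.rnc" {
  # Level 1: a section may contain level-2 sections.
  db.section =
    element db:section {
      db.section.attlist,
      db.section.info,
      ((db.all.blocks+, db.section2*) | db.section2+)
    }
}

# Level 2: same element name, but only level-3 sections inside.
db.section2 =
  element db:section {
    db.section.attlist,
    db.section.info,
    ((db.all.blocks+, db.section3*) | db.section3+)
  }

# Level 3: no nested sections at all.
db.section3 =
  element db:section {
    db.section.attlist,
    db.section.info,
    db.all.blocks+
  }
```

All three patterns match an element named section; the nesting limit comes entirely from which pattern each content model refers to.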
Another solution, assuming your validation environment supports Schematron, is simply to add a new rule, as shown in Example 5.8, “Limiting recursive sections to three levels using Schematron”.
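A sketch of the Schematron approach, written as an annotation in the compact syntax. The Schematron namespace URI shown is the Schematron 1.5 one; that choice, the pattern name, and the message text are assumptions, so adjust them to the Schematron flavor your validator expects.

```rnc
namespace db = "http://docbook.org/ns/docbook"
namespace s = "http://www.ascc.net/xml/schematron"

include "docbook.rnc"

# Bind the db prefix for use in the Schematron XPath expressions.
s:ns [ prefix = "db" uri = "http://docbook.org/ns/docbook" ]

s:pattern [
  name = "Limit section depth"
  s:rule [
    context = "db:section"
    s:assert [
      # A third-level section has two section ancestors;
      # anything deeper has three or more and fails.
      test = "count(ancestor::db:section) < 3"
      "Sections may be nested at most three levels deep."
    ]
  ]
]
```

The RELAX NG grammar itself is unchanged here, so the limit is enforced only by tools that also evaluate the embedded Schematron rules.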
In this example, we’ve put the Schematron pattern inline in the RELAX NG grammar. If your validation strategy requires Schematron rules to be in a separate document, it may be more convenient to simply create them separately.
5.5.5. Removing Admonitions from Table Entries
Sometimes what you want to do is not as simple as entirely removing an element. Instead, you may want to remove it only from some contexts. The way to do this is to redefine the patterns used to calculate the elements allowed in those contexts.
Standard DocBook allows any inline element or any block element to appear in a table cell. You might decide that it’s unreasonable to allow admonitions (note, caution, warning, etc.) to appear in a table cell. In order to remove them, you must change what is allowed in the entry element, as shown in Example 5.9, “Removing admonitions from tables”.
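A simplified sketch of this kind of change follows. The pattern db.tabentry.blocks is a hypothetical name introduced here, the selection of block groups it allows is only illustrative, and the names db.entry.attlist, db.all.inlines, db.para.blocks, db.verbatim.blocks, db.formal.blocks, and db.informal.blocks are assumed to match your docbook.rnc.

```rnc
namespace db = "http://docbook.org/ns/docbook"

include "docbook.rnc" {
  # Replace the "any block" choice in entry with a narrower one.
  db.entry =
    element db:entry {
      db.entry.attlist,
      (db.all.inlines* | db.tabentry.blocks*)
    }
}

# Hypothetical pattern: the blocks still allowed in a table cell.
# Admonitions are simply not listed.
db.tabentry.blocks =
  db.para.blocks
  | db.list.blocks
  | db.verbatim.blocks
  | db.formal.blocks
  | db.informal.blocks
```

This covers only the entry element; if your documents also use HTML-style tables, the patterns for their cells would need the same treatment.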
The extent to which any particular change is easy or hard depends in part on how many patterns need to be changed. The DocBook Technical Committee is generally open to the idea of adding more patterns if it improves the readability of customization layers. If you think some refactoring would make your job easier, feel free to ask.
5.6. Removing Attributes
Just as there may be more elements than you need, there may be more attributes.
Suppose your processing system doesn’t support “continued” lists. You want to remove the continuation attribute from the orderedlist element. There are two ways you could accomplish this. One way would be to redefine the db.orderedlist.continuation.attribute pattern as notAllowed; the other would be to redefine the db.orderedlist.attlist pattern so that it does not include the continuation attribute. Either will accomplish the goal. Example 5.10, “Removing continuation from orderedlist” uses the first method.
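A sketch of the first method. It assumes that the attribute list refers to the attribute pattern optionally, so that making the pattern notAllowed removes the attribute without making orderedlist itself unmatchable.

```rnc
namespace db = "http://docbook.org/ns/docbook"

include "docbook.rnc" {
  # The attribute can no longer match, so it cannot appear
  # on orderedlist.
  db.orderedlist.continuation.attribute = notAllowed
}
```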
5.6.1. Subsetting the Common Attributes
DocBook defines a set of “common attributes,” which appear on every element. Depending on how you process your documents, removing some of them can both simplify the authoring task and improve processing speed.
Some obvious candidates are:
- Effectivity attributes (arch, os, condition, ...): If you’re not using all of the effectivity attributes in your documents, you can get rid of up to seven attributes in one fell swoop.
- xml:lang: If you’re not producing multilingual documents, you can remove xml:lang.
- remap: The remap attribute is designed to hold the name of a semantically equivalent construct from a previous markup scheme (e.g., a Microsoft Word–style template name, if you’re converting from Word). If you’re authoring from scratch, or not preserving previous constructs with remap, you can get rid of it.
- xreflabel: If your processing system isn’t using xreflabel, it’s a candidate as well.
The customization layer in Example 5.11, “Removing common attributes” reduces the common attributes to just xml:id, version, and xml:lang.
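A sketch of such a layer follows. The pattern names (db.common.attributes, db.common.idreq.attributes, db.xml.id.attribute, db.version.attribute, db.xml.lang.attribute) are assumptions about how docbook.rnc names these pieces; verify them against your copy before relying on this.

```rnc
namespace db = "http://docbook.org/ns/docbook"

include "docbook.rnc" {
  # Common attributes for most elements: all three optional.
  db.common.attributes =
    db.xml.id.attribute?
    & db.version.attribute?
    & db.xml.lang.attribute?

  # Variant used by elements on which xml:id is required.
  db.common.idreq.attributes =
    db.xml.id.attribute
    & db.version.attribute?
    & db.xml.lang.attribute?
}
```

Attributes that are defined outside these two patterns, such as element-specific role attributes or linking attributes, are not affected by this change.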
5.7. Adding Elements
Adding a new inline or block element is generally a straightforward matter of creating a pattern for the new element and using |= to add it to the right pattern, as we did in Example 5.1, “Adding cleartext with a customization layer”. But if your new element is more intimately related to the existing structure of the document, it may require more surgery.
Example 5.12, “Adding a sect6 element” extends DocBook by adding a sect6 element.
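A simplified sketch of such an extension follows. The content models are deliberately reduced (the standard sect5 allows more than is shown here), and db.sect5.attlist, db.sect5.info, db.all.blocks, and db.common.attributes are assumed pattern names; db.sect6 is new.

```rnc
namespace db = "http://docbook.org/ns/docbook"

include "docbook.rnc" {
  # sect5 redefined so that it can contain sect6 children.
  db.sect5 =
    element db:sect5 {
      db.sect5.attlist,
      db.sect5.info,
      ((db.all.blocks+, db.sect6*) | db.sect6+)
    }
}

# The new element: a title followed by block content.
db.sect6 =
  element db:sect6 {
    db.common.attributes,
    db.title,
    db.all.blocks+
  }
```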
Here we’ve redefined sect5 to include sect6 and provided a pattern for sect6.
5.8. Adding Attributes
The simplest way to add an attribute to a single element is to add it to the attlist pattern for that element. Example 5.13, “Adding born and died attributes” adds the optional attributes born and died to the attribute list for author. The db.author.attlist pattern is redefined to interleave the two new optional attributes with the existing attributes on the list.
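A sketch of this customization. The use of xsd:date as the datatype for both attributes is an assumption; any datatype, or plain text, would work the same way.

```rnc
namespace db = "http://docbook.org/ns/docbook"

include "docbook.rnc"

# &= combines this definition with the existing one by interleave,
# so the new attributes join the attributes already on the list.
db.author.attlist &=
  attribute born { xsd:date }?
  & attribute died { xsd:date }?
```

Since the existing pattern is augmented rather than replaced, the definition sits outside the body of the include statement.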
5.9. Other Modifications
5.9.1. Changing the Contents of the role Attribute
The role attribute, found on almost all of the elements in DocBook, is a text attribute that can be used to subclass an element. In some applications, it may be useful to modify the definition of role so that authors must choose one of a specific set of possible values.
In Example 5.14, “Changing role on procedure”, the role attribute on the procedure element is constrained to the value required or optional.
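A sketch of that constraint, assuming the attribute is defined by a pattern named db.procedure.role.attribute (following the db.*.role.attribute convention):

```rnc
namespace db = "http://docbook.org/ns/docbook"

include "docbook.rnc" {
  # Replace the free-text role with a two-value enumeration.
  db.procedure.role.attribute =
    attribute role { "required" | "optional" }
}
```

Because every document that satisfies this constraint is still valid against standard DocBook, this layer is a subset rather than an extension.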
5.9.2. Adding a Value to an Enumerated Attribute
Example 5.15, “Adding a value to an enumeration” adds the value “large” to the db.spacing.enumeration pattern. Any attribute that is defined using db.spacing.enumeration will now have large as a legal value. Note that while it is easy to add a value to an enumeration, to remove a value from an enumeration you need to redefine the entire enumeration, minus the values you don’t need.
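A sketch of adding the value, using the db.spacing.enumeration pattern named above:

```rnc
namespace db = "http://docbook.org/ns/docbook"

include "docbook.rnc"

# |= combines this definition with the existing one by choice,
# so "large" becomes one more permitted value.
db.spacing.enumeration |= "large"
```

Removing a value works differently: the pattern would be redefined inside the body of the include statement, listing only the values you want to keep.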