This chapter explains in concrete, practical terms how to make DocBook documents. It's an overview of all the kinds of markup that are possible in DocBook documents. It explains how to create several kinds of DocBook documents: books, sets of books, chapters, articles, and reference manual entries. The idea is to give you enough basic information to actually start writing. The information here is intentionally skeletal; you can find “the details” in the reference section of this book.
Before we can examine DocBook markup, we have to take a look at what an SGML or XML system requires.
SGML requires that your document have a specific prologue. The following sections describe the features of the prologue.
SGML documents begin with an optional SGML Declaration. The declaration can precede the document instance, but generally it is stored in a separate file that is associated with the DTD. The SGML Declaration is a grab bag of SGML defaults. DocBook includes an SGML Declaration that is appropriate for most DocBook documents, so we won't go into a lot of detail here about the SGML Declaration.
In brief, the SGML Declaration describes, among other things, what characters are markup delimiters (the default is angle brackets), what characters can compose tag and attribute names (usually the alphabetical and numeric characters plus the dash and the period), what characters can legally occur within your document, how long SGML “names” and “numbers” can be, what sort of minimizations (abbreviation of markup) are allowed, and so on. Changing the SGML Declaration is rarely necessary, and because many tools only partially support changes to the declaration, changing it is best avoided, if possible.
Wayne Wohler has written an excellent tutorial on the SGML Declaration; if you're interested in more details, see http://www.oasis-open.org/cover/wlw11.html.
All SGML documents must begin with a document type declaration. This identifies the DTD that will be used by the document and what the root element of the document will be. A typical doctype declaration for a DocBook document looks like this:
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook V3.1//EN">
This declaration indicates that the root element, which is the first element in the hierarchical structure of the document, will be <book> and that the DTD used will be the one identified by the public identifier -//OASIS//DTD DocBook V3.1//EN. See the section called “Public Identifiers” later in this chapter.
It's also possible to provide additional declarations in a document by placing them in the document type declaration:
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook V3.1//EN" [
<!ENTITY nwalsh "Norman Walsh">
<!ENTITY chap1 SYSTEM "chap1.sgm">
<!ENTITY chap2 SYSTEM "chap2.sgm">
]>
These declarations form what is known as the internal subset. The declarations stored in the file referenced by the public or system identifier in the DOCTYPE declaration are called the external subset, which is technically optional. It is legal to put the DTD in the internal subset and to have no external subset, but for a DTD as large as DocBook that wouldn't make much sense.
Although comments and processing instructions may occur between the document type declaration and the root element, the root element usually immediately follows the document type declaration:
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook V3.1//EN" [
<!ENTITY nwalsh "Norman Walsh">
<!ENTITY chap1 SYSTEM "chap1.sgm">
<!ENTITY chap2 SYSTEM "chap2.sgm">
]>
<book>
&chap1;
&chap2;
</book>
You cannot place the root element of the document in an external entity.
If you are entering SGML using a text editor such as Emacs or vi, there are a few things to keep in mind.[7] Using a structured text editor designed for SGML hides most of these issues.
DocBook element and attribute names are not case-sensitive. There's no difference between <Para> and <pArA>. Entity names are case-sensitive, however.
If you are interested in future XML compatibility, input all element and attribute names strictly in lowercase.
If attribute values contain spaces or punctuation characters, you must quote them. You are not required to quote attribute values if they consist of a single word or number, although it is not wrong to do so.
When quoting attribute values, you can use either a straight single quote ('), or a straight double quote ("). Don't use the “curly” quotes (“ and ”) in your editing tool.
If you are interested in future XML compatibility, always quote all attribute values.
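As a small illustration of these quoting rules, here is a sketch of markup written in the XML-compatible style, with lowercase names and every attribute value quoted. The list content and the URL are placeholders that we made up for the example.

```xml
<!-- A single-word value could go unquoted in SGML, but quoting it is never wrong. -->
<orderedlist numeration="arabic">
  <listitem>
    <!-- This value contains punctuation, so quotes are required even in SGML. -->
    <para>See <ulink url="http://www.example.com/docs/">the documentation</ulink>.</para>
  </listitem>
</orderedlist>
```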
Several forms of markup minimization are allowed, including empty tags. Instead of typing the entire end tag for an element, you can simply type </>. For example:
<para> This is <emphasis>important</>: never stick the tines of a fork in an electrical outlet. </para>
You can use this technique for any and every tag, but it will make your documents very hard to understand and difficult to debug if you introduce errors. It is best to use this technique only for inline elements containing a short string of text.
Empty start tags are also possible, but may be even more confusing. For the record, if you encounter an empty start tag, the SGML parser uses the element that ended last:
<para> This is <emphasis>important</emphasis>. So is <>this</emphasis>. </para>
Both “important” and “this” are emphasized.
If you are interested in future XML compatibility, don't use any of these tricks.
The null end tag (NET) minimization feature allows constructions like this:
<para> This is <emphasis/important/: never stick the tines of a fork in an electrical outlet. </para>
If, instead of ending a start tag with >, you end it with a slash, then the next occurrence of a slash ends the element.
If you are interested in future XML compatibility, don't use NET minimization either.
If you are willing to modify both the declaration and the DTD, even more dramatic minimizations are possible, including completely omitted tags and “shortcut” markup.
Although we've made a point of reminding you about which of these minimization features are not valid in XML, that's not really a sufficient reason to avoid using them. (The fact that many of the minimization features can lead to confusing, difficult-to-author documents might be.)
If you want to convert one of these documents to XML at some point in the future, you can run it through a program like sgmlnorm, which will remove all the minimizations and insert the correct, verbose markup. The sgmlnorm program is part of the SP and Jade distributions, which are on the CD-ROM.
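To make the effect of normalization concrete, here is a hand-written sketch of the fully tagged equivalents of the minimized paragraphs shown earlier in this section. It is not captured sgmlnorm output; the real program may differ in details such as the case of element names and the handling of whitespace.

```xml
<!-- Equivalent of both the empty end tag and the NET examples. -->
<para>
This is <emphasis>important</emphasis>: never stick the tines of a fork
in an electrical outlet.
</para>

<!-- Equivalent of the empty start tag example. -->
<para>
This is <emphasis>important</emphasis>. So is <emphasis>this</emphasis>.
</para>
```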
In order to create DocBook documents in XML, you'll need an XML version of DocBook. We've included one on the CD, but it hasn't been officially adopted by the OASIS DocBook Technical Committee yet. If you're interested in the technical details, Appendix B, DocBook and XML, describes the specific differences between SGML and XML versions of DocBook.
XML, like SGML, requires a specific prologue in your document. The following sections describe the features of the XML prologue.
XML documents should begin with an XML declaration. Unlike the SGML declaration, which is a grab bag of features, the XML declaration identifies a few simple aspects of the document:
<?xml version="1.0" standalone="no"?>
Identifying the version of XML ensures that future changes to the XML specification will not alter the semantics of this document. The standalone declaration simply makes explicit the fact that this document cannot “stand alone,” and that it relies on an external DTD. The complete details of the XML declaration are described in the XML specification.
Strictly speaking, XML documents don't require a DTD. Realistically, DocBook XML documents will have one.
The document type declaration identifies the DTD that will be used by the document and what the root element of the document will be. A typical doctype declaration for a DocBook document looks like this:
<?xml version='1.0'?>
<!DOCTYPE book PUBLIC "-//Norman Walsh//DTD DocBk XML V3.1.4//EN"
          "http://nwalsh.com/docbook/xml/3.1.4/db3xml.dtd">
This declaration indicates that the root element will be <book> and that the DTD used will be the one identified by the public identifier -//Norman Walsh//DTD DocBk XML V3.1.4//EN. External declarations in XML must include a system identifier (the public identifier is optional). In this example, the DTD is stored on a web server.
System identifiers in XML must be URIs. Many systems may accept filenames and interpret them locally as file: URLs, but it's always correct to fully qualify them.
It's also possible to provide additional declarations in a document by placing them in the document type declaration:
<?xml version='1.0'?>
<!DOCTYPE book PUBLIC "-//Norman Walsh//DTD DocBk XML V3.1.4//EN"
          "http://nwalsh.com/docbook/xml/3.1.4/db3xml.dtd" [
<!ENTITY nwalsh "Norman Walsh">
<!ENTITY chap1 SYSTEM "chap1.sgm">
<!ENTITY chap2 SYSTEM "chap2.sgm">
]>
These declarations form what is known as the internal subset. The declarations stored in the file referenced by the public or system identifier in the DOCTYPE declaration are called the external subset, which is technically optional. It is legal to put the DTD in the internal subset and to have no external subset, but for a DTD as large as DocBook, that would make very little sense.
Although comments and processing instructions may occur between the document type declaration and the root element, the root element usually immediately follows the document type declaration:
<?xml version='1.0'?>
<!DOCTYPE book PUBLIC "-//Norman Walsh//DTD DocBk XML V3.1.4//EN"
          "http://nwalsh.com/docbook/xml/3.1.4/db3xml.dtd" [
<!ENTITY nwalsh "Norman Walsh">
<!ENTITY chap1 SYSTEM "chap1.sgm">
<!ENTITY chap2 SYSTEM "chap2.sgm">
]>
<book>...</book>
The important point is that the root element must be physically present immediately after the document type declaration. You cannot place the root element of the document in an external entity.
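Putting the pieces of the prologue together, a complete (if tiny) XML document might look like the following sketch. It reuses the declarations shown above; the title and paragraph text are invented for the example, and we assume the DTD permits this minimal book structure.

```xml
<?xml version='1.0'?>
<!DOCTYPE book PUBLIC "-//Norman Walsh//DTD DocBk XML V3.1.4//EN"
          "http://nwalsh.com/docbook/xml/3.1.4/db3xml.dtd" [
<!ENTITY nwalsh "Norman Walsh">
]>
<!-- Comments and processing instructions are allowed here. -->
<book>
  <title>My First Book</title>
  <chapter id="ch1">
    <title>My First Chapter</title>
    <!-- The entity reference expands to the text declared in the internal subset. -->
    <para>My first paragraph, written by &nwalsh;.</para>
  </chapter>
</book>
```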
If you are entering XML using a text editor such as Emacs or vi, there are a few things to keep in mind. Using a structured text editor designed for XML hides most of these issues.
In XML, all markup is case-sensitive. In the XML version of DocBook, you must always type all element, attribute, and entity names in lowercase.
You are required to quote all attribute values in XML.
When quoting attribute values, you can use either a straight single quote ('), or a straight double quote ("). Don't use the “curly” quotes (“ and ”) in your editing tool.
Empty elements in XML are marked with a distinctive syntax: <xref/>.
Processing instructions in XML begin and end with a question mark: <?pitarget data?>.
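In context, those two pieces of syntax might appear as in the following sketch. The linkend value assumes that some element in the same document has the ID ch1, and pitarget and data are placeholders for a real target name and its data.

```xml
<para>
  <!-- xref has no content, so it is written with a trailing slash. -->
  See <xref linkend="ch1"/> for details.
  <!-- A processing instruction both begins and ends with a question mark. -->
  <?pitarget data?>
</para>
```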
XML was designed to be served, received, and processed over the Web. Two of its most important design principles are ease of implementation and interoperability with both SGML and HTML.
The markup minimization features in SGML make documents more difficult to process and make it harder to write a parser that interprets them; these minimization features also run counter to the XML design principles named above. As a result, XML does not support them.
Luckily, a good authoring environment can offer all of the features of markup minimization without interfering with the interoperability of documents. And because XML tools are easier to write, it's likely that good, inexpensive XML authoring environments will be available eventually.
Conceptually, almost everything in this book applies equally to SGML and XML. But because DocBook V3.1 is an SGML DTD, we naturally tend to use SGML conventions in our writing. If you're primarily interested in XML, there are just a few small details to keep in mind.
XML is case-sensitive, while the SGML version of DocBook is not. In this book, we've chosen to present the element names using mixed case (Book, indexterm, XRef, and so on), but in the DocBook XML DTD, all element, attribute, and entity names are strictly lowercase.
Empty element start tags in XML are marked with a distinctive syntax: <xref/>. In SGML, the trailing slash is not present, so some of our examples need slight revisions to be valid XML elements.
Processing instructions in XML begin and end with a question mark: <?pitarget data?>. In SGML, the trailing question mark is not present, so some of our examples need slight revisions to be valid XML.
Generally we use public identifiers in examples, but whenever system identifiers are used, don't forget that XML system identifiers must be Uniform Resource Identifiers (URIs), whereas SGML system identifiers are usually simple filenames.
For a more detailed discussion of DocBook and XML, see Appendix B, DocBook and XML.
When a DTD or other external file is referenced from a document, the reference can be specified in three ways: using a public identifier, a system identifier, or both. In XML, the system identifier is generally required and the public identifier is optional. In SGML, neither is required, but at least one must be present.[8]
A public identifier is a globally unique, abstract name, such as the following, which is the official public identifier for DocBook V3.1:
-//OASIS//DTD DocBook V3.1//EN
The introduction of XML has added some small complications to system identifiers. In SGML, a system identifier generally points to a single, local version of a file using local system conventions. In XML, it must be a Uniform Resource Identifier (URI). The most common form of URI today is the Uniform Resource Locator (URL), which is familiar to anyone who browses the Web. URLs are a lot like SGML system identifiers, because they generally point to a single version of a file on a particular machine. In the future, Uniform Resource Names (URNs), another form of URI, will allow XML system identifiers to have the abstract characteristics of public identifiers.
The following filename is an example of an SGML system identifier:
/usr/local/sgml/docbook/3.1/docbook.dtd
An equivalent XML system identifier might be:
file:///usr/local/sgml/docbook/3.1/docbook.dtd
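Because the public identifier is optional in XML, a document can refer to its DTD with a system identifier alone. The following sketch shows what that might look like; the file: path is a hypothetical location for a locally installed copy of the XML DTD, not a path that any distribution is known to use.

```xml
<?xml version='1.0'?>
<!-- The path below is a placeholder for wherever the XML DTD is installed. -->
<!DOCTYPE book SYSTEM "file:///usr/local/sgml/docbook/xml/db3xml.dtd">
<book>
  <title>My First Book</title>
  <chapter>
    <title>My First Chapter</title>
    <para>My first paragraph.</para>
  </chapter>
</book>
```

A document written this way works only on systems where the DTD lives at exactly that location, which is the portability problem that public identifiers are meant to solve.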
The advantage of using the public identifier is that it makes your documents more portable. For any system on which DocBook is installed, the public identifier will resolve to the appropriate local version of the DTD (if public identifiers can be resolved at all).
Public identifiers have two disadvantages:
Because XML does not require public identifiers but does require system identifiers, XML tools that are still being developed may not provide adequate support for public identifiers. To work with these tools, you must use system identifiers.
Public identifiers aren't magical. They're simply a method of indirection. For them to work, there must be a resolution mechanism for public identifiers. Luckily, several years ago, SGML Open (now OASIS) described a standard mechanism for mapping public identifiers to system identifiers using catalog files.
See OASIS Technical Resolution 9401:1997 (Amendment 2 to TR 9401).
An important characteristic of public identifiers is that they are globally unique. Referring to a document with a public identifier should mean that the identifier will resolve to the same actual document on any system even though the location of that document on each system may vary. As a rule, you should never reuse public identifiers, and a published revision should have a new public identifier. Not following these rules defeats one purpose of the public identifier.
A public identifier can be any string of upper- and lowercase letters, digits, white space (including line breaks), and any of the following symbols: “'”, “(”, “)”, “+”, “,”, “-”, “.”, “/”, “:”, “=”, and “?”.
Most public identifiers conform to the ISO 8879 standard that defines formal public identifiers. Formal public identifiers, frequently referred to as FPIs, have a prescribed format that can ensure uniqueness:[9]
prefix//owner-identifier//text-class text-description//language//display-version
Here are descriptions of the identifiers in this string:
prefix
The prefix is either a “+” or a “-”. Registered public identifiers begin with “+”; unregistered identifiers begin with “-”. (ISO standards sometimes use a third form beginning with ISO and the standard number, but this form is only available to ISO.)
The purpose of registration is to guarantee a unique owner-identifier. There are few authorities with the power to issue registered public identifiers, so in practice unregistered identifiers are more common.
The Graphic Communications Association (GCA) can assign registered public identifiers. It does this by issuing the applicant a unique string and declaring the format of the owner identifier. For example, the Davenport Group was issued the string “A00002” and could have published DocBook using an FPI of the following form:
+//ISO/IEC 9070/RA::A00002//...
Another way to use a registered public identifier is to use the format reserved for internet domain names. For example, O'Reilly can issue documents using an FPI of the following form:
+//IDN oreilly.com//...
As of DocBook V3.1, the OASIS Technical Committee responsible for DocBook has elected to use the unregistered owner identifier, OASIS; thus, its prefix is -.
-//OASIS//...
owner-identifier
Identifies the person or organization that owns the identifier. Registration guarantees a unique owner identifier. Short of registration, some effort should be made to ensure that the owner identifier is globally unique. A company name, for example, is a reasonable choice as are Internet domain names. It's also not uncommon to see the names of individuals used as the owner-identifier, although clearly this may introduce collisions over time.
The owner-identifier for DocBook V3.1 is OASIS. Earlier versions used the owner-identifier Davenport.
text-class
The text class identifies the kind of document that is associated with this public identifier. Common text classes are:
DOCUMENT
An SGML or XML document.
DTD
A DTD or part of a DTD.
ELEMENTS
A collection of element declarations.
ENTITIES
A collection of entity declarations.
NONSGML
Data that is not in SGML or XML.
DocBook is a DTD, thus its text class is DTD.
text-description
This field provides a description of the document. The text description is free-form, but cannot include the string //.
The text description of DocBook is DocBook V3.1.
In the uncommon case of unavailable public texts (FPIs for proprietary DTDs, for example), there are a few other options available (technically in front of or in place of the text description), but they're rarely used. [10]
language
Indicates the language in which the document is written. It is recommended that the ISO standard two-letter language codes be used if possible.
DocBook is an English-language DTD, thus its language is EN.
display-version
This field, which is not frequently used, distinguishes between public texts that are the same except for the display device or system to which they apply.
For example, the FPI for the ISO Latin 1 character set is:
-//ISO 8879-1986//ENTITIES Added Latin 1//EN
A reasonable FPI for an XML version of this character set is:
-//ISO 8879-1986//ENTITIES Added Latin 1//EN//XML
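To show how the fields fit together, here is a sketch of an unregistered FPI invented for a set of shared entity declarations, used in the internal subset of an XML document. The owner identifier Example Corp, the description Shared Text V1.0, the entity name shared, and the filename shared.ent are all hypothetical.

```xml
<?xml version='1.0'?>
<!DOCTYPE book PUBLIC "-//Norman Walsh//DTD DocBk XML V3.1.4//EN"
          "http://nwalsh.com/docbook/xml/3.1.4/db3xml.dtd" [
<!-- Hypothetical FPI: prefix "-" (unregistered), owner "Example Corp",
     text class ENTITIES, description "Shared Text V1.0", language EN. -->
<!ENTITY % shared PUBLIC "-//Example Corp//ENTITIES Shared Text V1.0//EN"
                         "shared.ent">
%shared;
]>
<book>...</book>
```

Note that the XML form of the declaration still carries a system identifier after the public identifier, as XML requires.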
System identifiers are usually filenames on the local system. In SGML, there's no constraint on what they can be. Anything that your SGML processing system recognizes is allowed. In XML, system identifiers must be URIs (Uniform Resource Identifiers).
The use of URIs as system identifiers introduces the possibility that a system identifier can be a URN. This allows the system identifier to offer the same global uniqueness as the public identifier. It seems likely that XML system identifiers will eventually move in this direction.
Catalog files are the standard mechanism for resolving public identifiers into system identifiers. Some resolution mechanism is necessary because DocBook refers to its component modules with public identifiers, and those must be mapped to actual files on the system before any piece of software can actually load them.
The catalog file format was defined in 1994 by SGML Open (now OASIS). The formal specification is contained in OASIS Technical Resolution 9401:1997.
Informally, a catalog is a text file that contains a number of keyword/value pairs. The most frequently used keywords are PUBLIC, SYSTEM, SGMLDECL, DTDDECL, CATALOG, OVERRIDE, DELEGATE, and DOCTYPE.
PUBLIC
The PUBLIC keyword maps public identifiers to system identifiers:
PUBLIC "-//OASIS//DTD DocBook V3.1//EN" "docbook/3.1/docbook.dtd"
SYSTEM
The SYSTEM keyword maps system identifiers to system identifiers:
SYSTEM "http://nwalsh.com/docbook/xml/1.3/db3xml.dtd" "docbook/xml/1.3/db3xml.dtd"
SGMLDECL
The SGMLDECL keyword identifies the system identifier of the SGML Declaration that should be used:
SGMLDECL "docbook/3.1/docbook.dcl"
DTDDECL
Like SGMLDECL, DTDDECL identifies the SGML Declaration that should be used. DTDDECL associates a declaration with a particular public identifier for a DTD:
DTDDECL "-//OASIS//DTD DocBook V3.1//EN" "docbook/3.1/docbook.dcl"
Unfortunately, DTDDECL is not supported by the free tools that are available. Its practical benefit can usually be achieved, albeit in a slightly cumbersome way, with multiple catalog files.
CATALOG
The CATALOG keyword allows one catalog to include the content of another. This can make maintenance somewhat easier and allows a system to directly use the catalog files included in DTD distributions. For example, the DocBook distribution includes a catalog file. Rather than copying each of the declarations in that catalog into your system catalog, you can simply include the contents of the DocBook catalog:
CATALOG "docbook/3.1/catalog"
OVERRIDE
The OVERRIDE keyword indicates whether or not public identifiers override system identifiers. If a given declaration includes both a system identifier and a public identifier, most systems attempt to process the document referenced by the system identifier, and consequently ignore the public identifier. Specifying OVERRIDE YES in the catalog informs the processing system that resolution should be attempted first with the public identifier.
DELEGATE
The DELEGATE keyword allows you to specify that some set of public identifiers should be resolved by another catalog. Unlike the CATALOG keyword, which loads the referenced catalog, DELEGATE does nothing until an attempt is made to resolve a public identifier.
The DELEGATE entry specifies a partial public identifier and an alternate catalog:
DELEGATE "-//OASIS" "/usr/sgml/oasis/catalog"
Partial public identifiers are simply initial substring matches. Given the preceding entry, if an attempt is made to match any public identifier that begins with the string -//OASIS, the alternate catalog /usr/sgml/oasis/catalog will be used instead of the current catalog.
DOCTYPE
The DOCTYPE keyword allows you to specify a default system identifier. If an SGML document begins with a DOCTYPE declaration that specifies neither a public identifier nor a system identifier (or is missing a DOCTYPE declaration altogether), the DOCTYPE catalog entry may provide a default:
DOCTYPE BOOK n:/share/sgml/docbook/3.1/docbook.dtd
A small fragment of an actual catalog file is shown in Example 2.1, “A Sample Catalog”.
Example 2.1. A Sample Catalog
-- Comments are delimited by pairs of double-hyphens, as in SGML and XML comments. --
OVERRIDE YES
SGMLDECL "n:/share/sgml/docbook/3.1/docbook.dcl"
DOCTYPE BOOK n:/share/sgml/docbook/3.1/docbook.dtd
PUBLIC "-//OASIS//DTD DocBook V3.1//EN" n:/share/sgml/docbook/3.1/docbook.dtd
SYSTEM "http://nwalsh.com/docbook/xml/1.3/db3xml.dtd" n:/share/sgml/Norman_Walsh/db3xml/db3xml.dtd
OVERRIDE YES specifies that public identifiers should be used in favor of system identifiers, if both are present.
SGMLDECL makes the DocBook declaration the default declaration for this catalog.
DOCTYPE: given an explicit (or implied) SGML <!DOCTYPE BOOK SYSTEM>, use n:/share/sgml/docbook/3.1/docbook.dtd as the default system identifier. Note that this can only apply to SGML documents, because a DOCTYPE declaration of that form is not valid in XML.
PUBLIC maps the OASIS public identifier to the local copy of the DocBook V3.1 DTD.
SYSTEM maps a system identifier for the XML version of DocBook to a local version.
A few notes:
It's not uncommon to have several catalog files. See the section called “Locating catalog files”, below.
As with attribute values on elements, you can quote the public identifier and system identifier with either single or double quotes.
White space in the catalog file is generally irrelevant. You can use spaces, tabs, or new lines between keywords and their arguments.
When a relative system identifier is used, it is considered to be relative to the location of the catalog file, not the document being processed.
Catalog files go a long way towards making documents more portable by introducing a level of indirection. A problem still remains, however: how does a processor locate the appropriate catalog file(s)? OASIS outlines a complete interchange packaging scheme, but for most applications the answer is simply that the processor looks for a file called catalog or CATALOG.
Some applications allow you to specify a list of directories that should be examined for catalog files. Other tools allow you to specify the actual files.
Note that even if a list of directories or catalog files is provided, applications may still load catalog files that occur in directories in which other documents are found. For example, SP and Jade always load the catalog file that occurs in the directory in which a DTD or document resides, even if that directory is not on the catalog file list.
The rest of this chapter describes how you can break documents into logical chunks, such as books, chapters, sections, and so on. Before we begin, and while the subject of the internal subset is fresh in your mind, let's take a quick look at how to break documents into separate physical chunks.
Actually, we've already told you how to do it. If you recall, in the preceding sections we had declarations of the form:
<!ENTITY name SYSTEM "filename">
If you refer to the entity name in your document after this declaration, the system will insert the contents of the file filename into your document at that point. So, if you've got a book that consists of three chapters and two appendixes, you might create a file called book.sgm, which looks like this:
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook V3.1//EN" [
<!ENTITY chap1 SYSTEM "chap1.sgm">
<!ENTITY chap2 SYSTEM "chap2.sgm">
<!ENTITY chap3 SYSTEM "chap3.sgm">
<!ENTITY appa SYSTEM "appa.sgm">
<!ENTITY appb SYSTEM "appb.sgm">
]>
<book><title>My First Book</title>
&chap1;
&chap2;
&chap3;
&appa;
&appb;
</book>
You can then write the chapters and appendixes conveniently in separate files. Note that these files do not and must not have document type declarations.
For example, Chapter 1 might begin like this:
<chapter id="ch1"><title>My First Chapter</title>
<para>My first paragraph.</para>
...
But it should not begin with its own document type declaration:
<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook V3.1//EN">
<chapter id="ch1"><title>My First Chapter</title>
<para>My first paragraph.</para>
...
DocBook elements can be divided broadly into these categories:
Sets
Books
Divisions, which divide books into parts
Components, which divide books or divisions into chapters
Sections, which subdivide components
Meta-information elements
Block elements
Inline elements
In the rest of this section, we'll describe briefly the elements that make up these categories. This section is designed to give you an overview. It is not an exhaustive list of every element in DocBook.
For more information about any specific element and the elements that it may contain, consult the reference page for the element in question.
A Set contains two or more Books. It's the hierarchical top of DocBook. You use the Set tag, for example, for a series of books on a single subject that you want to access and maintain as a single unit, such as the manuals for an airplane engine or the documentation for a programming language.
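As a rough sketch of that structure, here is a Set holding two Books, written in the lowercase, XML-compatible style. All of the titles are invented for the example, and each book contains only a single skeletal chapter.

```xml
<set>
  <title>Airplane Engine Manuals</title>
  <book>
    <title>Operator's Guide</title>
    <chapter>
      <title>Starting the Engine</title>
      <para>...</para>
    </chapter>
  </book>
  <book>
    <title>Maintenance Manual</title>
    <chapter>
      <title>Scheduled Inspections</title>
      <para>...</para>
    </chapter>
  </book>
</set>
```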
A Book is probably the most common top-level element in a document. The DocBook definition of a book is very loose and general. Given the variety of books authored with DocBook and the number of different conventions for book organization used in countries around the world, attempting to impose a strict ordering of elements would make the content model extremely complex, so DocBook gives you free rein. It's very reasonable to use a local customization layer to impose a stricter ordering for your applications.
Books consist of a mixture of the following elements:
Dedication pages almost always occur at the front of a book.
There are a few component-level elements designed for navigation: ToC, for Tables of Contents; LoT, for Lists of Titles (for lists of figures, tables, examples, and so on); and Index, for indexes.
Divisions are the first hierarchical level below Book. They contain Parts and References. Parts, in turn, contain components. References contain RefEntrys. These are discussed more thoroughly in the section called “Making a Reference Page”.
Books can contain components directly and are not required to contain divisions.
Components
These are the chapter-like elements of a Book.
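A skeletal Book that uses several of these pieces might look like the following sketch: a dedication, a division (Part) holding two chapters, and an appendix. The titles and text are placeholders, and we assume the ordering shown is one the DTD accepts.

```xml
<book>
  <title>My First Book</title>
  <dedication>
    <para>For my readers.</para>
  </dedication>
  <!-- A division: the Part contains the chapter-like components. -->
  <part>
    <title>Getting Started</title>
    <chapter>
      <title>My First Chapter</title>
      <para>...</para>
    </chapter>
    <chapter>
      <title>My Second Chapter</title>
      <para>...</para>
    </chapter>
  </part>
  <appendix>
    <title>Reference Tables</title>
    <para>...</para>
  </appendix>
</book>
```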
Components are the chapter-like elements of a Book or Part: Preface, Chapter, Appendix, Glossary, and Bibliography. Articles can also occur at the component level. We describe Articles in more detail in the section called “Making an Article”. Components generally contain block elements and/or sections, and some can contain navigational components and RefEntrys.
There are several flavors of sectioning elements in DocBook:
Sect1…Sect5 elements
The Sect1…Sect5 elements are the most common sectioning elements. They can occur in most component-level elements. These numbered section elements must be properly nested (Sect2s can only occur inside Sect1s, Sect3s can only occur inside Sect2s, and so on). There are five levels of numbered sections.
Section element
The Section element, introduced in DocBook V3.1, is an alternative to numbered sections. Sections are recursive, meaning that you can nest them to any depth desired.
SimpleSect element
In addition to numbered sections, there's the SimpleSect element. It is a terminal section that can occur at any level, but it cannot have any other sectioning element nested within it.
BridgeHead
A BridgeHead provides a section title without any containing section.
RefSect1…RefSect3 elements
These elements, which occur only in RefEntrys, are analogous to the numbered section elements in components. There are only three levels of numbered section elements in a RefEntry.
GlossDiv, BiblioDiv, and IndexDiv
Glossarys, Bibliographys, and Indexes can be broken into top-level divisions, but not sections. Unlike sections, these elements do not nest.
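The difference between numbered and recursive sections is easiest to see side by side. The two chapter fragments below are a sketch with invented titles: the first nests a Sect2 inside a Sect1 and includes a BridgeHead, and the second uses the recursive Section element instead.

```xml
<chapter>
  <title>Numbered Sections</title>
  <para>Introductory text.</para>
  <sect1>
    <title>A First-Level Section</title>
    <para>...</para>
    <!-- A heading with no containing section of its own. -->
    <bridgehead>A Free-Floating Heading</bridgehead>
    <para>...</para>
    <!-- A sect2 may appear only inside a sect1. -->
    <sect2>
      <title>A Second-Level Section</title>
      <para>...</para>
    </sect2>
  </sect1>
</chapter>

<chapter>
  <title>Recursive Sections</title>
  <section>
    <title>Outer Section</title>
    <para>...</para>
    <!-- The same element nests inside itself to any depth. -->
    <section>
      <title>Inner Section</title>
      <para>...</para>
    </section>
  </section>
</chapter>
```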
All of the elements at the section level and above include a wrapper for meta-information about the content. See, for example, BookInfo.
The meta-information wrapper is designed to contain bibliographic information about the content (Author, Title, Publisher, and so on) as well as other meta-information such as revision histories, keyword sets, and index terms.
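For example, a BookInfo wrapper carrying a title, an author, a publisher, a keyword set, and a short revision history might be sketched as follows. The publisher name, keywords, revision number, and date are invented for the example, and the reference pages remain the authority on exactly what each wrapper may contain.

```xml
<book>
  <bookinfo>
    <title>My First Book</title>
    <author>
      <firstname>Norman</firstname>
      <surname>Walsh</surname>
    </author>
    <publisher>
      <publishername>Example Press</publishername>
    </publisher>
    <keywordset>
      <keyword>DocBook</keyword>
      <keyword>SGML</keyword>
    </keywordset>
    <revhistory>
      <revision>
        <revnumber>0.1</revnumber>
        <date>1 Jan 2000</date>
        <revremark>First draft.</revremark>
      </revision>
    </revhistory>
  </bookinfo>
  <chapter>
    <title>My First Chapter</title>
    <para>My first paragraph.</para>
  </chapter>
</book>
```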
The block elements occur immediately below the component and sectioning elements. These are the (roughly) paragraph-level elements in DocBook. They can be divided into a number of categories: lists, admonitions, line-specific environments, synopses of several sorts, tables, figures, examples, and a dozen or more miscellaneous elements.
There are seven list elements in DocBook:
CalloutList
A list of CallOut
s and their descriptions.
CallOut
s are
marks, frequently numbered and typically on a graphic or verbatim environment,
that are described in a CalloutList
, outside the element
in which they occur.
GlossList
ItemizedList
An unordered (bulleted) list. There are attributes to control the marks used.
OrderedList
A numbered list. There are attributes to control the type of enumeration.
SegmentedList
A repeating set of named items. For example, a list of states and their capitals might be represented as a SegmentedList.
SimpleList
An unadorned list of items. SimpleLists can be inline or arranged in columns.
VariableList
A list of terms and definitions or descriptions. (This list of list types is a VariableList.)
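To make two of these list types concrete, here is a minimal sketch: a SegmentedList for the states-and-capitals case mentioned above, and a VariableList of the kind used for this very list of list types. The particular states and the entry text are only illustrative.

```xml
<segmentedlist><title>State Capitals</title>
<segtitle>State</segtitle>
<segtitle>Capital</segtitle>
<seglistitem><seg>Alaska</seg><seg>Juneau</seg></seglistitem>
<seglistitem><seg>Ohio</seg><seg>Columbus</seg></seglistitem>
</segmentedlist>

<variablelist>
<varlistentry><term>ItemizedList</term>
  <listitem><para>An unordered (bulleted) list.</para></listitem>
</varlistentry>
<varlistentry><term>OrderedList</term>
  <listitem><para>A numbered list.</para></listitem>
</varlistentry>
</variablelist>
```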
There are five types of admonitions in DocBook: Caution, Important, Note, Tip, and Warning.
All of the admonitions have the same structure: an optional Title followed by paragraph-level elements. The DocBook DTD does not impose any specific semantics on the individual admonitions. For example, DocBook does not mandate that Warnings be reserved for cases where bodily harm can result.
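Because every admonition shares the same structure, one sketch covers them all. The fragment below shows a Warning with the optional Title and a Note without one; the wording is invented for illustration.

```xml
<warning><title>Back Up Your Files First</title>
<para>The following procedure overwrites existing data.</para>
</warning>

<note>
<para>The title is optional; a processing application will
typically supply a default heading such as "Note".</para>
</note>
```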
Line-specific environments preserve whitespace and line breaks in the source text. DocBook does not provide the equivalent of HTML's BR tag, so there's no way to interject a line break into normal running text.
Address
The Address element is intended for postal addresses. In addition to being line-specific, Address contains additional elements suitable for marking up names and addresses.
LiteralLayout
A LiteralLayout does not have any semantic association beyond the preservation of whitespace and line breaks. In particular, while ProgramListing and Screen are frequently presented in a fixed-width font, a change of fonts is not necessarily implied by LiteralLayout.
ProgramListing
A ProgramListing is a verbatim environment, usually presented in Courier or some other fixed-width font, for program sources, code fragments, and similar listings.
Screen
A Screen is a verbatim or literal environment for text screen-captures, other fragments of an ASCII display, and similar things. Screen is also a frequent catch-all for any verbatim text.
ScreenShot
ScreenShot is actually a wrapper for a Graphic intended for screen shots of a GUI, for example.
Synopsis
A Synopsis is a verbatim environment for command and function synopses.
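A short sketch may help show how three of these environments look in practice. The address, the C fragment, and the shell session below are all made up for illustration; note that the content starts immediately after each start tag, because line breaks inside these elements are significant.

```xml
<address><street>123 Example Street</street>
<city>Anytown</city>, <state>MA</state> <postcode>01234</postcode>
<country>USA</country></address>

<programlisting>int main(void) {
    return 0;
}</programlisting>

<screen><prompt>$</prompt> <userinput>ls -l</userinput>
 ... </screen>
```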
Examples, Figures, and Tables are common block-level elements: Example, InformalExample, Figure, InformalFigure, Table, and InformalTable.
The distinction between formal and informal elements is that formal elements have titles while informal ones do not. The InformalFigure element was introduced in DocBook V3.1. In prior versions of DocBook, you could only achieve the effect of an informal figure by placing its content, unwrapped, at the location where the informal figure was desired.
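The formal/informal distinction is easiest to see side by side. In this sketch the same kind of content is wrapped first in a formal Example, which carries a Title, and then in an InformalExample, which does not. The title text is invented.

```xml
<example><title>A Formal Example</title>
<programlisting> ... </programlisting>
</example>

<informalexample>
<programlisting> ... </programlisting>
</informalexample>
```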
There are three paragraph elements: Para, SimPara (simple paragraphs may not contain other block-level elements), and FormalPara (formal paragraphs have titles).
There are two block-equation elements, Equation and InformalEquation (for inline equations, use InlineEquation).
Informal equations don't have titles. For reasons of backward-compatibility, Equations are not required to have titles. However, it may be more difficult for some stylesheet languages to properly enumerate Equations if they lack titles.
Graphics occur most frequently in Figures and ScreenShots, but they can also occur without a wrapper. DocBook considers a Graphic a block element, even if it appears to occur inline. For graphics that you want to be represented inline, use InlineGraphic.
DocBook V3.1 introduced a new element to contain graphics and other media types: MediaObject and its inline cousin, InlineMediaObject. These elements may contain video, audio, image, and text data. A single media object can contain several alternative forms from which the presentation system can select the most appropriate object.
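The idea of alternative forms can be sketched as follows: one MediaObject offers the same picture in two image formats plus a text fallback, and the presentation system picks whichever it can use. The file names are hypothetical, and the empty elements are written in XML syntax; in SGML the trailing slash is omitted.

```xml
<mediaobject>
  <!-- Alternative 1: a print-quality image (hypothetical file) -->
  <imageobject>
    <imagedata fileref="figures/tiger.eps" format="EPS"/>
  </imageobject>
  <!-- Alternative 2: a bitmap for online presentation -->
  <imageobject>
    <imagedata fileref="figures/tiger.gif" format="GIF"/>
  </imageobject>
  <!-- Fallback when no image can be displayed -->
  <textobject>
    <phrase>A tiger lying in tall grass</phrase>
  </textobject>
</mediaobject>
```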
DocBook V3.1 introduced the QandASet element, which is suitable for FAQs (Frequently Asked Questions) and other similar collections of Questions and Answers.
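A minimal sketch of a QandASet with a single entry might look like this; the question and answer text are invented for illustration.

```xml
<qandaset>
<qandaentry>
  <question>
    <para>Can sections be nested to any depth?</para>
  </question>
  <answer>
    <para>Yes, if you use the recursive Section element.</para>
  </answer>
</qandaentry>
</qandaset>
```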
The following block elements are also available:
BlockQuote
A block quotation. Block quotations may have Attributions.
CmdSynopsis
An environment for marking up all the parameters and options of a command.
Epigraph
A short introduction, typically a quotation, at the beginning of a document. Epigraphs may have Attributions.
FuncSynopsis
An environment for marking up the return value and arguments of a function.
Highlights
A summary of the main points discussed in a book component (chapter, section, and so on).
MsgSet
Procedure
A procedure. Procedures contain Steps, which may contain SubSteps.
Sidebar
A sidebar.
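Of the elements above, Procedure has the most structure, so a small sketch may be useful. Here a Procedure contains two Steps, the first of which is broken down further with SubSteps. The task itself is invented for illustration.

```xml
<procedure><title>Validating a Document</title>
<step>
  <para>Prepare the document.</para>
  <substeps>
    <step><para>Add the document type declaration.</para></step>
    <step><para>Save the file.</para></step>
  </substeps>
</step>
<step>
  <para>Run your parser on the file.</para>
</step>
</procedure>
```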
Users of DocBook are provided with a surfeit of inline elements. Inline elements are used to mark up running text. In published documents, inline elements often cause a font change or other small change, but they do not cause line or paragraph breaks.
In practice, writers generally settle on the tagging of inline elements that suits their time and subject matter. This may be a large number of elements or only a handful. What is important is that you choose to mark up not every possible item, but only those for which distinctive tagging will be useful in the production of the finished document for the readers who will search through it.
The following comprehensive list may be a useful tool for the process of narrowing down the elements that you will choose to mark up; it is not intended to overwhelm you by its sheer length. For convenience, we've divided the inlines into several subcategories.
The classification used here is not meant to be authoritative, only helpful in providing a feel for the nature of the inlines. Several elements appear in more than one category, and arguments could be made to support the placement of additional elements in other categories or entirely new categories.
These inlines identify things that commonly occur in general writing:
Abbrev
Acronym
An often pronounceable word made from the initial (or selected) letters of a name or phrase.
Emphasis
Footnote
A footnote. The location of the Footnote element identifies the location of the first reference to the footnote. Additional references to the same footnote can be inserted with FootnoteRef.
Phrase
Quote
Trademark
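As an illustration of the Footnote and FootnoteRef pairing described in the list above, the following sketch defines a footnote at its first point of reference and then refers to it again later. The ID value and the text are hypothetical, and the empty FootnoteRef is written in XML syntax.

```xml
<para>
SGML requires a prologue<footnote id="fn.prologue"><para>See the
discussion of the document type declaration.</para></footnote>
before the document instance. XML has similar
requirements<footnoteref linkend="fn.prologue"/>.
</para>
```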
The cross reference inlines identify both explicit cross references, such as Link, and implicit cross references like GlossTerm. You can make most of the implicit references explicit with a LinkEnd attribute.
These inlines are used to mark up text for special presentation:
ForeignPhrase
A word or phrase in a language other than the primary language of the document.
WordAsWord
A word meant specifically as a word and not representing anything else.
ComputerOutput
Literal
Markup
A string of formatting markup in text that is to be represented literally.
Prompt
A character or string indicating the start of an input field in a computer display.
Replaceable
SGMLTag
UserInput
DocBook does not define a complete set of elements for representing equations. No one has ever pressed the DocBook maintainers to add this functionality, and the prevailing opinion is that incorporating MathML using a mechanism like namespaces is probably the best long-term solution.
DocBook V4.5 added a mathphrase element to support simple, textual mathematics that doesn't require extensive markup.
InlineEquation
mathphrase
A mathematical phrase, an expression that can be represented with ordinary text and a small amount of markup.
Subscript
Superscript
A superscript (as in x², the mathematical notation for x multiplied by itself).
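For simple cases, these elements combine as in the sketch below, which marks up x squared inline. It assumes a DocBook V4.5 (or later) DTD, since mathphrase is not available in earlier versions.

```xml
<para>
The area of a square with side
<inlineequation><mathphrase>x</mathphrase></inlineequation> is
<inlineequation>
  <mathphrase>x<superscript>2</superscript></mathphrase>
</inlineequation>.
</para>
```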
These elements describe aspects of a user interface:
Accel
GUIButton
GUIIcon
GUILabel
The text of a label in a GUI.
GUIMenu
GUIMenuItem
GUISubmenu
KeyCap
KeyCode
The internal, frequently numeric, identifier for a key on a keyboard.
KeyCombo
KeySym
MenuChoice
MouseButton
Shortcut
A key combination for an action that is also accessible through a menu.
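To give a feel for how the user interface inlines work together, here is a sketch of a sentence describing a menu selection with a keyboard shortcut. The menu names and keys are only an example.

```xml
<para>
Choose
<menuchoice>
  <shortcut>
    <keycombo><keycap>Ctrl</keycap><keycap>S</keycap></keycombo>
  </shortcut>
  <guimenu>File</guimenu>
  <guimenuitem>Save</guimenuitem>
</menuchoice>,
then click <guibutton>OK</guibutton>.
</para>
```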
Many of the technical inlines in DocBook are related to programming.
Action
ClassName
The name of a class, in the object-oriented programming sense.
Constant
ErrorCode
ErrorName
ErrorType
Function
The name of a function or subroutine, as in a programming language.
Interface
InterfaceDefinition
Literal
MsgText
Parameter
Property
A unit of data associated with some part of a computer system.
Replaceable
ReturnValue
StructField
StructName
The name of a structure (in the programming language sense).
Symbol
Token
Type
VarName
These inlines identify parts of an operating system, or an operating environment:
Application
Command
The name of an executable program or other software command.
EnVar
Filename
MediaLabel
A name that identifies the physical medium on which some information resides.
MsgText
Option
Parameter
Prompt
A character or string indicating the start of an input field in a computer display.
SystemItem
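A single sentence of running text can use several of the operating system inlines at once, as this sketch suggests. The command, option, and path shown are ordinary UNIX examples chosen for illustration.

```xml
<para>
Run <command>ls</command> with the <option>-l</option> option on
<filename>/etc</filename>, or on any
<replaceable>directory</replaceable> you like. The shell finds the
program by searching the <envar>PATH</envar> environment variable.
</para>
```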
There are also a number of general-purpose technical inlines.
Application
Database
Email
Filename
Hardware
InlineGraphic
An object containing or pointing to graphical data that will be rendered inline.
Literal
MediaLabel
A name that identifies the physical medium on which some information resides.
Option
Optional
Replaceable
Symbol
Token
Type
A typical Book, in English at least, consists of some meta-information in a BookInfo (Title, Author, Copyright, and so on), one or more Prefaces, several Chapters, and perhaps a few Appendixes. A Book may also contain Bibliographys, Glossarys, Indexes and a Colophon.
Example 2.2, “A Typical Book” shows the structure of a typical book. Additional content is required where the ellipses occur.
Example 2.2. A Typical Book
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook V3.1//EN">
<book>
<bookinfo>
  <title>My First Book</title>
  <author><firstname>Jane</firstname><surname>Doe</surname></author>
  <copyright><year>1998</year><holder>Jane Doe</holder></copyright>
</bookinfo>
<preface><title>Foreword</title> ... </preface>
<chapter> ... </chapter>
<chapter> ... </chapter>
<chapter> ... </chapter>
<appendix> ... </appendix>
<appendix> ... </appendix>
<index> ... </index>
</book>
Chapters, Prefaces, and Appendixes all have a similar structure. They consist of a Title, possibly some additional meta-information, and any number of block-level elements followed by any number of top-level sections. Each section may in turn contain any number of block-level elements followed by any number from the next section level, as shown in Example 2.3, “A Typical Chapter”.
Example 2.3. A Typical Chapter
<!DOCTYPE chapter PUBLIC "-//OASIS//DTD DocBook V3.1//EN">
<chapter><title>My Chapter</title>
<para> ... </para>
<sect1><title>First Section</title>
<para> ... </para>
<example> ... </example>
</sect1>
</chapter>
For documents smaller than a book, such as journal articles, white papers, or technical notes, Article is frequently the most logical starting point. The body of an Article is essentially the same as the body of a Chapter or any other component-level element, as shown in Example 2.4, “A Typical Article”. Articles may include Appendixes, Bibliographys, Indexes and Glossarys.
Example 2.4. A Typical Article
<!DOCTYPE article PUBLIC "-//OASIS//DTD DocBook V3.1//EN">
<article>
<artheader>
  <title>My Article</title>
  <author><honorific>Dr</honorific><firstname>Emilio</firstname>
          <surname>Lizardo</surname></author>
</artheader>
<para> ... </para>
<sect1><title>On the Possibility of Going Home</title>
<para> ... </para>
</sect1>
<bibliography> ... </bibliography>
</article>
The reference page or manual page in DocBook was inspired by, and in fact designed to reproduce, the common UNIX “manpage” concept. (We use the word "page" loosely here to mean a document of variable length containing reference material on a specific topic.) DocBook is rich in markup tailored for such documents, which often vary greatly in content, however well-structured they may be. To reflect both the structure and the variability of such texts, DocBook specifies that reference pages have a strict sequence of parts, even though several of them are actually optional.
Of the following sequence of elements that may appear in a RefEntry, only two are obligatory: RefNameDiv and RefSect1.
DocInfo
The DocInfo element contains meta-information about the reference page (which should not be confused with RefMeta, which it precedes). It marks up information about the author of the document, or the product to which it pertains, or the document's revision history, or other such information.
RefMeta
RefMeta contains a title for the reference page (which may be inferred if the RefMeta element is not present) and an indication of the volume number in which this reference page occurs. The ManVolNum is a very UNIX-centric concept. In traditional UNIX documentation, the subject of a reference page is typically identified by name and volume number; this allows you to distinguish between the uname command, “uname(1)”, in volume 1 of the documentation and the uname function, “uname(3)”, in volume 3.
Additional information of this sort, such as conformance or vendor information specific to the particular environment you are working in, may be stored in RefMiscInfo.
RefNameDiv
The first obligatory element is RefNameDiv, which is a wrapper for information about whatever you're documenting, rather than the document itself. It can begin with a RefDescriptor if several items are being documented as a group and the group has a name. The RefNameDiv must contain at least one RefName, that is, the name of whatever you're documenting, and a single short statement that sums up the use or function of the item(s) at a glance: their RefPurpose. Also available is the RefClass, intended to detail the operating system configurations that the software element in question supports.
If no RefEntryTitle is given in the RefMeta, the title of the reference page is the RefDescriptor, if present, or the first RefName.
RefSynopsisDiv
A RefSynopsisDiv is intended to provide a quick synopsis of the topic covered by the reference page. For commands, this is generally a syntax summary of the command, and for functions, the function prototype, but other options are possible. A Title is allowed, but not required, presumably because the application that processes reference pages will generate the appropriate title if it is not given. In traditional UNIX documentation, its title is always “Synopsis”.
RefSect1…RefSect3
Within RefEntrys, there are only three levels of sectioning elements: RefSect1, RefSect2, and RefSect3.
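For a command rather than a function, the RefSynopsisDiv would typically hold a CmdSynopsis. The following sketch shows what that might look like for the uname command mentioned earlier; the options listed are an illustrative subset, not a complete synopsis.

```xml
<refsynopsisdiv>
<cmdsynopsis>
  <command>uname</command>
  <!-- choice="opt" marks an argument as optional -->
  <arg choice="opt">-a</arg>
  <arg choice="opt">-s</arg>
  <arg choice="opt">-r</arg>
</cmdsynopsis>
</refsynopsisdiv>
```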
Example 2.5, “A Sample Reference Page” shows the beginning of a RefEntry that illustrates one possible reference page:
Example 2.5. A Sample Reference Page
<refentry id="printf">
<refmeta>
<refentrytitle>printf</refentrytitle>
<manvolnum>3S</manvolnum>
</refmeta>
<refnamediv>
<refname>printf</refname>
<refname>fprintf</refname>
<refname>sprintf</refname>
<refpurpose>print formatted output</refpurpose>
</refnamediv>
<refsynopsisdiv>
<funcsynopsis>
<funcsynopsisinfo>
#include &lt;stdio.h&gt;
</funcsynopsisinfo>
<funcprototype>
  <funcdef>int <function>printf</function></funcdef>
  <paramdef>const char *<parameter>format</parameter></paramdef>
  <paramdef>...</paramdef>
</funcprototype>
<funcprototype>
  <funcdef>int <function>fprintf</function></funcdef>
  <paramdef>FILE *<parameter>strm</parameter></paramdef>
  <paramdef>const char *<parameter>format</parameter></paramdef>
  <paramdef>...</paramdef>
</funcprototype>
<funcprototype>
  <funcdef>int <function>sprintf</function></funcdef>
  <paramdef>char *<parameter>s</parameter></paramdef>
  <paramdef>const char *<parameter>format</parameter></paramdef>
  <paramdef>...</paramdef>
</funcprototype>
</funcsynopsis>
</refsynopsisdiv>
<refsect1><title>Description</title>
<para>
<indexterm><primary>functions</primary>
  <secondary>printf</secondary></indexterm>
<indexterm><primary>printing function</primary></indexterm>
<function>printf</function> places output on the standard
output stream stdout.
…
</para>
</refsect1>
</refentry>
DocBook contains markup for the usual variety of front- and backmatter necessary for books and articles: indexes, glossaries, bibliographies, and tables of contents. In many cases, these components are generated automatically, at least in part, from your document by an external processor, but you can create them by hand, and in either case, store them in DocBook.
Some forms of backmatter, like indexes and glossaries, usually require additional markup in the document to make generation by an application possible. Bibliographies are usually composed by hand like the rest of your text, unless you are automatically selecting bibliographic entries out of some larger database. Our principal concern here is to acquaint you with the kind of markup you need to include in your documents if you want to construct these components.
Frontmatter, like the table of contents, is almost always generated automatically from the text of a document by the processing application. If you need information about how to mark up a table of contents in DocBook, please consult the reference page for ToC.
In some highly-structured documents such as reference manuals, you can automate the whole process of generating an index successfully without altering or adding to the original source. You can design a processing application to select the information and compile it into an adequate index. But this is rare.
In most cases—and even in the case of some reference manuals—a useful index still requires human intervention to mark occurrences of words or concepts that will appear in the text of the index.
DocBook distinguishes two kinds of index markers: those that are singular and result in a single page entry in the index itself, and those that are multiple and refer to a range of pages.
You put a singular index marker where the subject it refers to actually occurs in your text:
<para>
The tiger<indexterm>
<primary>Big Cats</primary>
<secondary>Tigers</secondary></indexterm>
is a very large cat indeed.
</para>
This index term has two levels, primary and secondary. They correspond to an increasing amount of indented text in the resultant index. DocBook allows for three levels of index terms, with the third labeled tertiary.
There are two ways that you can index a range of text. The first is to put index marks at both the beginning and end of the discussion. The mark at the beginning asserts that it is the start of a range, and the mark at the end refers back to the beginning. In this way, the processing application can determine what range of text is indexed. Here's the previous tiger example recast as starting and ending index terms:
<para>
The tiger<indexterm id="tiger-desc" class="startofrange">
<primary>Big Cats</primary>
<secondary>Tigers</secondary></indexterm>
is a very large cat indeed…
</para>
⋮
<para>
So much for tigers<indexterm startref="tiger-desc" class="endofrange">.
Let's talk about leopards.
</para>
Note that the mark at the start of the range identifies itself as the start of a range with the Class attribute, and provides an ID. The mark at the end of the range points back to the start.
Another way to mark up a range of text is to specify that the entire content of an element, such as a chapter or section, is the complete range. In this case, all you need is for the index term to point to the ID of the element that contains the content in question. The Zone attribute of indexterm provides this functionality.
One of the interesting features of this method is that the actual index marks do not have to occur anywhere near the text being indexed. It is possible to collect all of them together, for example, in one file, although it is equally valid to place an index marker near the element it indexes.
Suppose the discussion of tigers in your document comprises a whole text object (like a Sect1 or a Chapter) with an ID value of tiger-desc. You can put the following tag anywhere in your document to index that range of text:
<indexterm zone="tiger-desc">
<primary>Big Cats</primary>
<secondary>Tigers</secondary></indexterm>
DocBook also contains markup for index hits that point to other index hits of the same type, such as "See Cats, big" or "See also Lions". See the reference pages for See and SeeAlso.
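As a rough sketch of how such pointers might be marked up, the first index term below redirects the reader from one heading to another with See, and the second adds a "see also" pointer to an ordinary two-level term. The headings follow the tiger example and are otherwise invented.

```xml
<!-- Produces an entry like: Felines, See Big Cats -->
<indexterm><primary>Felines</primary>
  <see>Big Cats</see></indexterm>

<!-- Produces an entry under Big Cats, Tigers with: See also Lions -->
<indexterm><primary>Big Cats</primary>
  <secondary>Tigers</secondary>
  <seealso>Lions</seealso></indexterm>
```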
After you have added the appropriate markup to your document, an external application can use this information to build an index. The resulting index must have information about the page numbers on which the concepts appear. It's usually the document formatter that builds the index. In this case, it may never be instantiated in DocBook.
However, there are applications that can produce an index marked up in DocBook. The following example includes some one- and two-level IndexEntry elements (which correspond to the primary and secondary levels in the indexterms themselves) that begin with the letter D:
<!DOCTYPE index PUBLIC "-//OASIS//DTD DocBook V3.1//EN">
<index><title>Index</title>
<indexdiv><title>D</title>
<indexentry>
  <primaryie>database (bibliographic), 253, 255</primaryie>
  <secondaryie>structure, 255</secondaryie>
  <secondaryie>tools, 259</secondaryie>
</indexentry>
<indexentry>
  <primaryie>dates (language specific), 179</primaryie>
</indexentry>
<indexentry>
  <primaryie>DC fonts, <emphasis>172</emphasis>, 177</primaryie>
  <secondaryie>Math fonts, 177</secondaryie>
</indexentry>
</indexdiv>
</index>
Glossarys, like Bibliographys, are often constructed by hand. However, some applications are capable of building a skeletal glossary from glossary term markup in the document. If all of your terms are defined in some glossary database, it may even be possible to construct the complete glossary automatically.
To enable automatic glossary generation, or simply automatic linking from glossary terms in the text to glossary entries, you must add markup to your documents. In the text, you mark up a term for compilation later with the inline GlossTerm tag. This tag can have a LinkEnd attribute whose value is the ID of the actual entry in the glossary.[11]
For instance, if you have this markup in your document:
<glossterm linkend="xml">Extensible Markup Language</glossterm> is a new standard…
your glossary might look like this:
<!DOCTYPE glossary PUBLIC "-//OASIS//DTD DocBook V3.1//EN">
<glossary><title>Example Glossary</title>
⋮
<glossdiv><title>E</title>
<glossentry id="xml"><glossterm>Extensible Markup Language</glossterm>
  <acronym>XML</acronym>
  <glossdef>
    <para>Some reasonable definition here.</para>
    <glossseealso otherterm="sgml">
  </glossdef>
</glossentry>
</glossdiv>
</glossary>
Note that the GlossTerm tag reappears in the glossary to mark up the term and distinguish it from its definition within the GlossEntry. The LinkEnd of the GlossTerm in the text refers to the ID of the GlossEntry in the Glossary itself. You can use the link between source and glossary to create a link in the online form of your document, as we have done with the online form of the glossary in this book.
There are two ways to set up a bibliography in DocBook: you can have the data raw or cooked. Here's an example of a raw bibliographical item, wrapped in the BiblioEntry element:
<biblioentry xreflabel="Kites75">
  <authorgroup>
    <author><firstname>Andrea</firstname><surname>Bahadur</surname></author>
    <author><firstname>Mark</firstname><surname>Shwarek</surname></author>
  </authorgroup>
  <copyright><year>1974</year><year>1975</year>
     <holder>Product Development International Holding N. V.</holder>
     </copyright>
  <isbn>0-88459-021-6</isbn>
  <publisher>
    <publishername>Plenary Publications International, Inc.</publishername>
  </publisher>
  <title>Kites</title>
  <subtitle>Ancient Craft to Modern Sport</subtitle>
  <pagenums>988-999</pagenums>
  <seriesinfo>
    <title>The Family Creative Workshop</title>
    <seriesvolnums>1-22</seriesvolnums>
    <editor>
      <firstname>Allen</firstname>
      <othername role="middle">Davenport</othername>
      <surname>Bragdon</surname>
      <contrib>Editor in Chief</contrib>
    </editor>
  </seriesinfo>
</biblioentry>
The “raw” data in a BiblioEntry is comprehensive to a fault: there are enough fields to suit a host of different bibliographical styles, and that is the point. An abundance of data requires processing applications to select, punctuate, order, and format the bibliographical data, and it is unlikely that all the information provided will actually be output.
All the “cooked” data in a BiblioMixed entry in a bibliography, on the other hand, is intended to be presented to the reader in the form and sequence in which it is provided. It even includes punctuation between the fields of data:
<bibliomixed>
  <bibliomset relation="article">
    <surname>Walsh</surname>, <firstname>Norman</firstname>.
    <title role="article">Introduction to Cascading Style Sheets</title>.
  </bibliomset>
  <bibliomset relation="journal">
    <title>The World Wide Web Journal</title>
    <volumenum>2</volumenum><issuenum>1</issuenum>.
    <publishername>O'Reilly &amp; Associates, Inc.</publishername> and
    <corpname>The World Wide Web Consortium</corpname>.
    <pubdate>Winter, 1996</pubdate></bibliomset>.
</bibliomixed>
Clearly, these two ways of marking up bibliographical entries are suited to different circumstances. You should use one or the other for your bibliography, not both. Strictly speaking, mingling the raw and the cooked may be “kosher” as far as the DTD is concerned, but it will almost certainly cause problems for most processing applications.
[7] Many of these things are influenced by the SGML declaration in use. For the purpose of this discussion, we assume you are using the standard DocBook declaration.
[8] This is not absolutely true. SGML allows for the possibility that the reference could be implied by the application, but this is very rarely the case.
[9] Essentially, it can ensure that two different owners won't accidentally tread on each other. Nothing can prevent a given owner from reusing public identifiers, except maybe common sense.
[11] Some sophisticated formatters might even be able to establish the link simply by examining the content of the terms and the glossary. In that case, the author is not required to make explicit links.