Advanced Features That Are Simple To Use

Introduction

The preceding chapter made reference to a number of advanced features that are described in this chapter. These features are not advanced in the sense of "difficult", but in the sense of "more powerful." They are intended to make the framework easier and more natural to use.

There are several important advanced features:

  1. Changing ownership of children
  2. Handling element IDs
  3. Generating efficient object-access methods using NMTOKEN declarations
  4. Forwarding attributes using NMTOKENS

An understanding of all these features is necessary to use the framework well, and this chapter covers them in detail. Several other features, such as the handling of the REQUIRED, IMPLIED and FIXED keywords, are covered as well.

Entity handling is a genuinely advanced topic, but it is required to deal with binary data and is useful for a variety of other tasks. It is dealt with separately in the next chapter.

Ownership of Children

As mentioned in the previous chapter, XMLElements by default own their children. This lets the framework manage the application tree in a natural and efficient manner. When an element is deleted, its entire sub-tree is deleted with it. This is not always what we would like to happen -- in particular, there may be cases when we would like to preserve part of the element's content for use somewhere else in the tree. In this case, the best way to preserve the sub-tree is to use the swapChild() method to exchange the head of the sub-tree of interest with a null pointer, as follows:

	int nIndexToSave = 5;
	XMLElement *pSubTreeToSave = 0;
	pElement->swapChild(nIndexToSave, pSubTreeToSave);
After the swap, the element holds a null pointer in that position, so when the element is deleted, delete is called on the null pointer, which is always safe. The saved sub-tree can then be put to use somewhere else, probably also using swapChild().

If you want to preserve the entire contents of the element, all you have to do is call setOwnsChildren(false) on it, and this will leave all of the children in place. A memory leak checker would be a good thing to have around if you start doing this very often.
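
The two approaches above can be sketched with a small stand-in class. The Element class below is hypothetical: it only imitates the ownership rules described here and is not the NPF's XMLElement. The names swapChild() and setOwnsChildren() come from the text, but their exact signatures, and the helper detachAndDestroy(), are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical stand-in for XMLElement: only the ownership rules are modelled.
class Element {
public:
    Element() : m_bOwnsChildren(true) {}
    ~Element() {
        if (m_bOwnsChildren)
            for (Element* pChild : m_children) delete pChild; // delete on null is safe
    }
    Element(const Element&) = delete;
    Element& operator=(const Element&) = delete;

    void addChild(Element* pChild) { m_children.push_back(pChild); }
    std::size_t childCount() const { return m_children.size(); }
    Element* child(std::size_t n) const { return m_children[n]; }

    // Exchange the child at nIndex with the caller's pointer (assumed signature).
    void swapChild(std::size_t nIndex, Element*& pOther) {
        std::swap(m_children.at(nIndex), pOther);
    }
    void setOwnsChildren(bool bOwns) { m_bOwnsChildren = bOwns; }

private:
    std::vector<Element*> m_children;
    bool m_bOwnsChildren;
};

// Detach one sub-tree from pParent, destroy pParent, and return the survivor.
Element* detachAndDestroy(Element* pParent, std::size_t nIndexToSave) {
    Element* pSubTreeToSave = nullptr;
    pParent->swapChild(nIndexToSave, pSubTreeToSave); // slot now holds null
    delete pParent;                                   // remaining children are deleted
    return pSubTreeToSave;                            // the caller owns this now
}
```

In this sketch the saved sub-tree survives the destruction of its former parent, and responsibility for deleting it passes to whoever holds the returned pointer.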

Note that children are distinct from members: by default all objects own their pointer members, and will delete them when their destructor is called. If this is not the desired behaviour, simply set the pointers to null in the user destructor preamble block to prevent their deletion. Calling setOwnsChildren(false) does not have any effect on the handling of pointer members.

ID and IDREF Attributes

XML documents are trees, not graphs. They cannot contain loops, and this means that they should not contain the same pointer twice. Figure 4.1 shows what happens when a tree contains the same pointer in more than one location.

===========================================================
Figure 4.1 -- Making a Tree a Graph
===========================================================

                   Root
                    |
        -------------------------
        |       |               |
        |     Child2          Child4
        |       |
     Child1-----|

===========================================================
In this case, the first child of the root is also the first child of the second child of the root. Such an element does not know who its parent is, and notions like in-order traversal of the tree are ill-defined.

I must emphasize that although structures like this will not occur unless you explicitly create them, the NPF does not prevent them from occurring. If you do explicitly create loops, however, bad things will happen.

There is a real problem here, though, which is that conceptually we often want to have "the same thing" appearing in multiple places in a document/application, and it is not satisfactory to have multiple copies of something around when what we really want is multiple references to it. Fortunately, XML has a mechanism for handling just this situation: ID and IDREF attributes.

An element may have at most one attribute with type ID. An example of a declaration with an ID attribute is shown in Figure 4.2. In this case, the name of the attribute does not affect its C++ type, because ID and IDREF attributes are both mapped to strings, but I have given it an "str" prefix for consistency.

===========================================================
Figure 4.2 -- An Attribute with Type ID
===========================================================

<!ELEMENT person EMPTY>
<!ATTLIST person
strID ID #REQUIRED>
	
===========================================================
Having an ID attribute modifies the generated code in several ways. The constructor and addAttribute() methods are modified so that when an element with an ID is created, the ID is registered in a static map in the base class. This makes it easy to rapidly find an element with a given ID. The set method for the attribute also deals with registration, first unregistering the old name before registering the new one. Note that an empty string is an illegal value for an ID. In general, IDs should be human-readable names, which is why they are strings rather than numbers. IDs must also conform to XML's notion of a valid name, which means they must start with a letter (or an underscore) and basically contain only letters, numbers, hyphens, underscores and periods. It is up to the developer to ensure that these restrictions are met.
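
Since checking IDs is left to the developer, a small validation helper may be worth having. The function below is a minimal sketch, not part of the NPF: isValidID() is a hypothetical name, and the check is an ASCII-only approximation of the rules summarized above. It also accepts a leading underscore, which XML permits, and it ignores the wider range of characters that the full XML name rules allow.

```cpp
#include <cassert>
#include <cctype>
#include <string>

// Hypothetical helper, not part of the NPF: ASCII-only approximation
// of the XML name rules as summarized in the text.
bool isValidID(const std::string& strID) {
    if (strID.empty()) return false;                 // empty IDs are illegal
    const unsigned char chFirst = static_cast<unsigned char>(strID[0]);
    if (!std::isalpha(chFirst) && chFirst != '_') return false;
    for (std::string::size_type i = 1; i < strID.size(); ++i) {
        const unsigned char ch = static_cast<unsigned char>(strID[i]);
        if (!std::isalnum(ch) && ch != '-' && ch != '_' && ch != '.')
            return false;                            // spaces and other punctuation
    }
    return true;
}
```

A check like this could be called before passing a string to the generated set method for the ID attribute.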

Elements with IDs also have a getID() method generated. This over-rides the method in the base class, which returns an empty string, and allows the framework to access the ID regardless of what you call it. You can use either getID() or the generated get method to get the ID -- it does not matter which.

ID handling is the reason behind the dummy constructor that gets generated for every element as well. The element factory uses these dummy constructors as a way of preventing registration of elements with IDs during startup. This is necessary because the factory uses a static map to store a pointer to each derived class it knows about, and there is no way to tell this static map to wait until the static map used by the registration mechanism is available before registering the new elements. There are many other ways of getting this result, but using a dummy constructor was the fastest and easiest to implement.

IDREFs are identical to IDs in their handling rules: an IDREF is just a reference to an ID. An element can have any number of attributes of type IDREF, and the generated code for each of them allows simple access to the underlying element. For each IDREF attribute, the framework generates a method to get the element pointed to by that IDREF value. Because there is no way of knowing what type that element may be, it has to be returned as a pointer to an XMLElement and cast to the correct type. An example is shown in Figure 4.3.

===========================================================
Figure 4.3 -- Code Generated for IDREF Attribute
===========================================================
<!ELEMENT client EMPTY>
<!ATTLIST client
strServer IDREF #REQUIRED>
	
XMLElement* getServerElement() 
	{return XMLElement::getElement(m_strServer);}
===========================================================
There are also the usual set and get methods generated for IDREF attributes.
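
To show how the pieces fit together, here is a minimal sketch of the registration and lookup cycle, including the cast that the caller has to perform. All of the classes are hypothetical stand-ins written for this example; only the names getElement(), getID() and getServerElement() are taken from the text, and the real generated code may differ.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for the base class: a static ID-to-element map.
class Element {
public:
    virtual ~Element() {}
    static Element* getElement(const std::string& strID) {
        std::map<std::string, Element*>::const_iterator it = registry().find(strID);
        return it == registry().end() ? nullptr : it->second;
    }
protected:
    static void registerID(const std::string& strID, Element* p) { registry()[strID] = p; }
    static void unregisterID(const std::string& strID) { registry().erase(strID); }
private:
    // A function-local static is used here simply to keep the sketch short.
    static std::map<std::string, Element*>& registry() {
        static std::map<std::string, Element*> s_map;
        return s_map;
    }
};

// Element with an ID attribute: registers itself under its ID.
class Server : public Element {
public:
    explicit Server(const std::string& strID) : m_strID(strID) { registerID(m_strID, this); }
    ~Server() { unregisterID(m_strID); }
    // Unregister the old name before registering the new one.
    void setID(const std::string& strID) {
        unregisterID(m_strID);
        m_strID = strID;
        registerID(m_strID, this);
    }
    const std::string& getID() const { return m_strID; }
private:
    std::string m_strID;
};

// Element with an IDREF attribute: stores only the ID string.
class Client : public Element {
public:
    explicit Client(const std::string& strServer) : m_strServer(strServer) {}
    Element* getServerElement() { return Element::getElement(m_strServer); }
private:
    std::string m_strServer;
};

// The caller casts the generic pointer to the type it expects.
Server* findServer(Client& client) {
    return dynamic_cast<Server*>(client.getServerElement());
}
```

Using dynamic_cast means that a reference to an element of the wrong type, or to an ID that is no longer registered, yields a null pointer rather than undefined behaviour, so the result should be checked before use.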

Using this mechanism, it is possible to have your documents as deeply intertwined as the Gordian Knot itself, all the while retaining a basic underlying tree structure that is simple to manage and maintain.

NMTOKEN Declarations for Type-Safe Child Access

Perhaps the most powerful advanced feature is the use of NMTOKEN declarations to generate type-safe access methods for specific children. This mechanism is really more powerful than it needs to be, but ensures that developers are not significantly inconvenienced by the framework's generic typing.

The problem is this: most classes have a pretty well defined structure. They contain a known number of known types, which may as well be put in a known order. The XML declaration for such a class might look like:

	<!ELEMENT person (address, job, payrollData)>
In this case, the full machinery of generic access is not needed -- there will never be a tree-walker designed to run over this class. It just isn't very interesting. What we would like is a means of mapping the contained classes to ordinary member variables, while still retaining the full power of XML serialization, forward compatibility and so on.

The way to do this is provided by the NMTOKEN attribute type in XML. A NMTOKEN (pronounced "name token") is a collection of name characters. It is just like a name except that it does not have to start with a letter -- it can start with a number or other allowed name character as well. Why and how such a token type ever got defined is one of the mysteries of SGML that we needn't go into here. What matters is that it provides a convenient syntax for declaring "mapped attributes" -- that is, attributes that are in fact mapped to specific children of the element.

For the person element declared above, the attribute list might look like:

	<!ATTLIST person
	strID ID #REQUIRED
	Address NMTOKEN "0"
	Job NMTOKEN "1"
	Payroll NMTOKEN "2">
The syntax for NMTOKEN declarations is straightforward: the base name of the variable is given as the attribute name, and the zero-based position of occurrence in the content model is given in the default value string. More complex declarations are possible: suppose we know that the job class has the content model:
	<!ELEMENT job (location, position, level)>
We could generate code to get at its "level" member directly from the person class by modifying the person attribute list to look like:
	<!ATTLIST person
	strID ID #REQUIRED
	Address NMTOKEN "0"
	Job NMTOKEN "1"
	Payroll NMTOKEN "2"
	Level NMTOKEN "1.2">
This is not something that is recommended, but in keeping with the NPF philosophy of giving developers the power they need, it is possible.
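
One way to picture what a value such as "1.2" means is as a path of zero-based child indices followed down from the element. The sketch below resolves such a path over a hypothetical Node type. The text does not show how the framework itself implements the lookup, so resolveMappedChild() and its error handling are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in node; this sketch does not manage ownership.
struct Node {
    std::string name;
    std::vector<Node*> children;
};

// Follow a dotted, zero-based path such as "1.2" down from pRoot.
// Returns null if a component is not a short decimal number or is out of range.
Node* resolveMappedChild(Node* pRoot, const std::string& strPath) {
    std::istringstream in(strPath);
    std::string strIndex;
    Node* pCurrent = pRoot;
    while (pCurrent && std::getline(in, strIndex, '.')) {
        if (strIndex.empty() || strIndex.size() > 9 ||
            strIndex.find_first_not_of("0123456789") != std::string::npos)
            return nullptr;
        const std::size_t nIndex = std::stoul(strIndex);
        if (nIndex >= pCurrent->children.size()) return nullptr;
        pCurrent = pCurrent->children[nIndex];   // descend one level
    }
    return pCurrent;
}
```

For the person element above, the path "1" would select the job child and "1.2" would select the level child inside it.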

An example of the code generated from NMTOKEN declarations is shown in Figure 4.4. Besides the accessor method, several other generated methods need to be understood.

	
===========================================================
Figure 4.4 -- Partial Code Generated for NMTOKEN Attribute
===========================================================
<!ELEMENT person (address, job, payrollData)>
<!ATTLIST person
Job NMTOKEN "1">

class person : virtual public XMLElement
{
public:
...
  job* getJob() 
  {
    if (m_bDirtyCache) remapAll();
    return m_pJob;
  }
...
protected:
  job*        m_pJob;
};
===========================================================
The generated person class now contains a member variable that is a pointer to type job, and has a name given by the NMTOKEN attribute name (Job) with "m_p" prepended for consistency with the NPF naming conventions. An accessor method to get this pointer is included in the public interface of the class. This accessor first checks to see if the pointer value cached in the member variable is dirty -- that is, have there been any changes to the structure of the element's content that would invalidate the member variable, so it no longer points to the second child element? If the cached value is dirty, then the remapAll() method is called -- this generated method, an example of which is shown in Figure 4.5, calls getMappedChild() to search the children and remaps each of the mapped member variables to its corresponding child pointer. This ensures that access to mapped members is efficient -- if the content of their element has not changed, they are almost as efficient to access as an ordinary member variable, with only a single boolean test as added overhead. The accessor is also type-safe, so we don't have to get involved with dynamic casting by hand, which at least reduces the amount of typing.
===========================================================
Figure 4.5 -- Example of remapAll() Method
===========================================================
void person::remapAll()
{
        m_pJob = dynamic_cast<job*>(getMappedChild("1"));
        assert(m_pJob != 0);

        m_bDirtyCache = false;
}
===========================================================
There is one more convenience method that comes into play when NMTOKEN attributes are used to map children to member variables: the init() method. The problem is that to map children to variables, the children must exist. They may be added using addChild, of course, but for simple classes it is convenient to have a method that generates the children for us. This is what XMLElement::init() does. It simply walks over the list of allowed children in the order they appear in the DTD (which is the same as the order they are added to the allowed children list in the generated constructor) and adds one of each to the children. If an element already has some children, they are assumed to be of the correct type and only the children following them in the list are added. If the children are of the wrong type (that is, if they are out of order) the assertion in remapAll() will fail.

The init() method is called automatically by the framework as required, and should not have to be called by the user.
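
The behaviour of init() described above can be sketched as follows. The Parent and Child types are hypothetical stand-ins that identify children by name only, and the allowed-children list is passed in by hand rather than built by a generated constructor. This illustrates the rule "keep what is there, then append the rest in DTD order"; it is not the framework's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in: children are identified by element name only.
struct Child {
    explicit Child(const std::string& strName) : name(strName) {}
    std::string name;
};

class Parent {
public:
    explicit Parent(const std::vector<std::string>& allowed) : m_allowed(allowed) {}

    void addChild(const std::string& strName) {
        m_children.push_back(std::unique_ptr<Child>(new Child(strName)));
    }

    // Existing children are assumed to match the start of the allowed list;
    // only the remaining entries are created, in DTD order.
    void init() {
        for (std::size_t i = m_children.size(); i < m_allowed.size(); ++i)
            addChild(m_allowed[i]);
    }

    std::size_t childCount() const { return m_children.size(); }
    const std::string& childName(std::size_t n) const { return m_children[n]->name; }

private:
    std::vector<std::string> m_allowed;
    std::vector<std::unique_ptr<Child> > m_children;
};
```

Note that this sketch, like the behaviour described in the text, never verifies the type of the children that already exist; a mismatch would only surface later, when the mapped members are resolved.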

Forwarding Attributes with NMTOKENS

Suppose you want to use the NPF with a pre-existing class library, and want to wrap some of the classes in that library with NPF classes to take advantage of the serialization framework. For example, the VTK 3D visualization library contains a camera class that has a position, a point that it is aimed toward, and an "up" vector that defines the camera orientation. It would be convenient to be able to create an NPF camera element that represented these things so that the VTK camera state could be serialized along with the rest of the application.

The NMTOKENS attribute type is used to signal this kind of mapping to the code-generator. In XML, a NMTOKENS value consists of one or more NMTOKEN values separated by white space, but in this case we are over-loading it with a different meaning. An example of a NMTOKENS attribute declaration is:

 <!ELEMENT test EMPTY>
 <!ATTLIST test
  fX NMTOKENS "0">
and the code generated for this looks like:
        float getX(){
          //##NPF_GET_fX##
          return 0;
          //##NPF_GET_fX##
        };
        void setX(float fX){
          //##NPF_SET_fX##

          //##NPF_SET_fX##
        };
In this case, the bodies of the accessor methods are wrapped with user comment blocks, so they can be filled in as required to communicate with the non-NPF class that is being wrapped. While less than ideal for some purposes, this does allow NPF serialization code to be generated for any class at all, and thus considerably extends the utility of the framework.
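
As an illustration of filling in the user blocks, the sketch below forwards one float attribute to a wrapped object. LegacyCamera and CameraElement are hypothetical classes invented for this example; they do not use the VTK API, and the wrapper is written by hand rather than generated. The setter name and its marker are assumed to mirror those of the getter.

```cpp
#include <cassert>

// Hypothetical pre-existing class that knows nothing about the NPF.
class LegacyCamera {
public:
    LegacyCamera() : m_x(0.0f) {}
    float positionX() const { return m_x; }
    void setPositionX(float x) { m_x = x; }
private:
    float m_x;
};

// Sketch of a wrapper element with the user blocks filled in by hand.
class CameraElement {
public:
    explicit CameraElement(LegacyCamera& camera) : m_camera(camera) {}

    float getX() {
        //##NPF_GET_fX##
        return m_camera.positionX();   // forward the read to the wrapped object
        //##NPF_GET_fX##
    }
    void setX(float fX) {
        //##NPF_SET_fX##
        m_camera.setPositionX(fX);     // forward the write to the wrapped object
        //##NPF_SET_fX##
    }
private:
    LegacyCamera& m_camera;            // the wrapper does not own the camera
};
```

Because the value lives only in the wrapped object, the serialized attribute always reflects the current state of that object rather than a cached copy.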

Note that unlike ordinary pointer members, NMTOKENS pointer members are not shared across serialization. That is, like pointers in containers, if two objects have NMTOKENS pointer members that are shared between them, after serialization they will have unshared pointers that point to the same value at different locations. For this reason, NMTOKENS pointer members should not generally be used.

REQUIRED, IMPLIED and FIXED Attributes

XML allows attributes to be declared as required, implied or fixed. A required attribute must exist for the document to parse, an implied attribute is ignored or otherwise defaulted if it is not given, and a fixed attribute can only have the value given in the DTD -- it is like a C++ "const" value, and the framework generates const members to represent fixed attributes.

Required and implied attributes are treated identically unless they are pointers. A required attribute that is held as a pointer must be initialized before the object is serialized.

Default and Allowed Values

If a default value is given for an attribute, it is used as an initializer for that attribute in the constructor. If a list of allowed values is given, an enum is created for it, and the member variable is given the enum type. If the variable has a numeric type the enumerated values are made to approximate the given values as closely as possible, although they are given the names of the numbers in this case.

The values in the allowed value list must not be quoted, and may not contain spaces. The default value must be quoted, and may not contain spaces. This is a feature of the current XML draft specification.
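
The mapping from defaults, fixed values and allowed-value lists to C++ can be pictured with a hand-written class. The attribute list in the comment and the class itself are assumptions made for illustration; the text does not show the generator's actual output for these cases, so the member and accessor names here are guesses that follow the naming conventions used elsewhere in this chapter.

```cpp
#include <cassert>
#include <string>

// Assumed declaration (hypothetical, for illustration only):
//   <!ATTLIST shape
//     nSides   CDATA            "4"
//     strUnits CDATA            #FIXED "mm"
//     Colour   (red|green|blue) "red">
class shape {
public:
    enum Colour { red, green, blue };            // one enumerator per allowed value

    shape() : m_nSides(4),                       // default value used as initializer
              m_strUnits("mm"),                  // fixed value, set exactly once
              m_Colour(red) {}

    int getSides() const { return m_nSides; }
    void setSides(int nSides) { m_nSides = nSides; }
    const std::string& getUnits() const { return m_strUnits; }  // no setter: fixed
    Colour getColour() const { return m_Colour; }
    void setColour(Colour colour) { m_Colour = colour; }

private:
    int m_nSides;
    const std::string m_strUnits;                // const member for the FIXED attribute
    Colour m_Colour;
};
```

The const member makes an attempt to change the fixed value a compile-time error, which matches the comparison with C++ "const" made above.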

Private, Protected and Virtual Bases

The element naming syntax supports inheritance using a special naming convention to identify the type of inheritance. The most common case, public non-virtual inheritance, is supported by just giving the name of the base class separated by a dot from the element in question:

<!ELEMENT base.derived EMPTY>
which generates a class that looks like:
class derived : public base { ... };
Protected and private inheritance are described by two and three dots respectively, and virtual inheritance follows the same pattern but uses commas in place of dots. The syntax is shown in Table 1.
Table 1: Syntax for Describing Inheritance in Element Names
==========================================================================
Access     Virtual?  Syntax           Code
--------------------------------------------------------------------------
public     no        base.derived     class derived : public base
protected  no        base..derived    class derived : protected base
private    no        base...derived   class derived : private base
public     yes       base,derived     class derived : virtual public base
protected  yes       base,,derived    class derived : virtual protected base
private    yes       base,,,derived   class derived : virtual private base
==========================================================================
The DTD generated by the system to use for parsing serialized state contains modified declarations for classes with bases, in which the base class names are stripped out. Note that the common base class XMLNode is automatically added by the system, and it is an error to add it explicitly.
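
The naming convention in Table 1 is regular enough to be decoded mechanically. The sketch below splits an element name into base, derived name, access level and virtual flag, and rebuilds the class header shown in the table. InheritanceSpec, parseElementName() and classHeader() are hypothetical names; the sketch handles a single base only and says nothing about how the real code-generator parses names.

```cpp
#include <cassert>
#include <string>

// Hypothetical result type for one decoded element name.
struct InheritanceSpec {
    std::string base;
    std::string derived;   // also the name that would remain once the base is stripped
    std::string access;    // "public", "protected" or "private"
    bool isVirtual;
    bool valid;            // false if the name does not follow the convention
};

InheritanceSpec parseElementName(const std::string& strName) {
    InheritanceSpec spec = {"", "", "", false, false};
    const std::string::size_type nFirst = strName.find_first_of(".,");
    if (nFirst == std::string::npos || nFirst == 0) return spec;   // no base given
    const char chSep = strName[nFirst];
    const std::string::size_type nEnd = strName.find_first_not_of(chSep, nFirst);
    if (nEnd == std::string::npos) return spec;                    // no derived name
    const std::string::size_type nCount = nEnd - nFirst;           // 1, 2 or 3 separators
    if (nCount > 3) return spec;
    static const char* const kAccess[] = {"public", "protected", "private"};
    spec.base = strName.substr(0, nFirst);
    spec.derived = strName.substr(nEnd);
    spec.access = kAccess[nCount - 1];
    spec.isVirtual = (chSep == ',');                               // commas mean virtual
    spec.valid = true;
    return spec;
}

// Rebuild the class header in the form used in Table 1.
std::string classHeader(const InheritanceSpec& spec) {
    return "class " + spec.derived + " : " +
           (spec.isVirtual ? "virtual " : "") + spec.access + " " + spec.base;
}
```

For example, "base,,derived" decodes to a virtual protected base, and the derived field holds the plain name "derived" that the modified DTD would use.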

Modified DTD

The modified DTD has been mentioned several times. This section briefly reviews the changes that get made to the DTD. The first point is that the name of the modified DTD is the same as the original, but with the letters XG prepended. This name must not be changed, as it is the one used by the framework when validating output is generated.

The main modifications to the DTD are the changes related to container classes, in which the container class is given the correct content model and the leaf nodes for the containers are generated. Other changes include the stripping out of NMTOKEN attributes from the attribute lists, as they are not "real" attributes in the XML sense when used by the framework. The final area of change is in the naming of classes with explicitly specified bases: as mentioned above, the names of the base classes are stripped out, as each derived class serializes itself using its name alone.

DTD Structure

The next chapter is all about entities, but there is a special kind of entity that does not fit in with the rest: this is the parameter entity, which is a way of breaking a DTD up into reusable fragments. Parameter entity handling in the NPF is sharply limited, but still provides enough power to allow for a high level of design reuse.

A parameter entity is declared as follows:

<!ENTITY % entityName SYSTEM "filename">
The details of entity declarations are discussed in the next chapter: for now all that matters is to understand that the parameter entity is a mapping between the entity name and the contents of the file, and that the "%" sign, which MUST be separated from the name by a space in the declaration, indicates this is a parameter entity rather than one of the other kinds described below. When the text "%entityName;" appears in the DTD, it is replaced by the contents of the named file IF the file contains a complete declaration. Otherwise, because of a feature of the SP parser, the framework does not process things properly. Therefore, in the NPF, only parameter entities that consist of complete declarations are supported. So a parameter entity file is a natural place to put a single declaration, and then that parameter entity represents a single class. Or if there is a small group of classes that work together in a particular design pattern, then they may be put in a single parameter entity file. Thus, parameter entities provide a simple way of breaking your specification up into reusable pieces.

Summary

This chapter has continued the introduction of the NPF in more detail, with a wide range of powerful features that can be used to represent more complex programming structures. The next chapter will cover the only genuinely difficult aspect of the NPF: entities.