Enlightenment: Narrative Programming by Tom Radcliffe

The Narrative Programming Framework

Having laid out the problems that NP is supposed to solve, and introduced the NP idea, it is now time to see how some of this plays out in the real world; to make clearer what I am talking about by means of a simple example.

NP is about application structure, not about functionality. My coding philosophy is summed up by Rule 122 from Grady Booch's book on managing the OO project: Never write code unless you absolutely, positively must. For most interesting functionality -- be it numerical algorithms, image processing, spell checking, data visualization, querying databases, you-name-it -- there are very good libraries around, and many of them are either free or very cheap. No one who wants to make money in software will do so by spending much time in any area where such code exists, because your competition will use it even if you don't, and so their costs will be lower, their bug-counts much lower, and their development cycle potentially much shorter. The NP view of the ideal application is one in which 90% of the code is written by someone else, and most of the remaining 10% is generated using the Narrative Programming Framework (NPF). The importance of the last few percent is not to be under-rated however -- some custom code is always going to be part of any significant application.

The real trick in using other people's code is to integrate disparate design philosophies into a single, well-integrated whole. If your GUI is structured around Document/View and your visualization library around Model/View/Controller, what are you to do? As described in the previous chapter, the NPF should help you deal with these problems by providing a solid, intuitive structure that is flexible enough to conform to both types of requirement without breaking down.

This chapter introduces the "what" of the NPF, in the sense that the NPF base classes and the DTD conventions for the code generator are the focus. The next chapter introduces the "how" of the NPF, in the sense of describing the Narrative Programming process: how to use NP and the NPF to rapidly create robust, extensible, well-structured applications.

What the NPF Is

The Narrative Programming Framework (NPF) is a small collection of base classes and a code generating application that works with them. The code generator -- xml2cpp -- reads XML DTDs that use a few simple conventions to convey C++ type information, and generates classes that are derived from the NPF base classes. These derived classes have lots of built-in functionality, both inherited from the base classes and due to the generated code. The xml2cpp application also generates a slightly modified version of the original DTD that can be used to parse documents that are produced by the generated classes.

A Simple Example

My first example is going to be the NP equivalent of a "hello world" program. Its purpose is just to introduce the basics of the NP DTD conventions and the base classes. The examples in this chapter are all of this kind: they are chosen to illustrate features of NP and the NPF, not to make a convincing case for why you would use it.

Consider the DTD shown in Figure 3.1. It describes a simple address book with telephone numbers. A person element has name and age attributes, and must contain an address and a double ended queue of phone numbers (deque, the short-hand term for double ended queue, rhymes with "cheque"). The address element has an empty content model, and attributes to hold the various parts of the address. The phone number deque appears to be able to hold a single element of type "long", which is not defined in the DTD, but we shall see that this interpretation is not correct.

==================================================
Figure 3.1 -- Simple DTD for NPF Code Generation
==================================================

<!ELEMENT person (address, dequePhoneNumber)>
<!ATTLIST person
strFirstName CDATA ""
strMiddleInitial CDATA ""
strLastName CDATA #REQUIRED
nAge CDATA "-1">

<!ELEMENT address EMPTY>
<!ATTLIST address
strCity CDATA #REQUIRED
strStreet CDATA #REQUIRED
nBuilding CDATA #IMPLIED
nApartment CDATA #IMPLIED
strPostalCode CDATA #REQUIRED
strCountry CDATA "Canada">

<!ELEMENT dequePhoneNumber (long)>

==================================================

When we process this simple DTD with the xml2cpp application, it results in over 500 lines of C++, not counting comments and blank lines. That is about 100 lines of C++ for each declaration. A C++ class is created by the code generator for each element declaration in the DTD, as well as some helper code to initialize some of the framework's data. The application also generates a modified version of the DTD that makes a bit more sense in purely XML terms than some of the stuff in the original one.

Generated Code: Declaration

For now, let's look at the generated code. Figure 3.2 shows the header file for the "person" class. As can be seen, the header has the usual sort of defines to protect it against multiple inclusion, includes for the standard library string class and the NPFElement base class, and then a comment block that looks like a place has been reserved for the user to modify the generated code with her own includes. This appearance is absolutely accurate. At well-defined places throughout the code there are pairs of identical comments of the form "##NPF_USER_..." that indicate places where the user -- that is, the developer -- can add to the generated code. Code that is put between these comment pairs is preserved across re-generation of the code, so when you change the DTD your modifications are not wiped out.
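How user code might be carried across a regeneration can be sketched as follows. This is only an illustration under stated assumptions, not xml2cpp's actual implementation: the function names extractUserBlocks and mergeUserBlocks are hypothetical, and each marker is assumed to sit alone on its own line, starting in the first column, and always to appear as a matched pair.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper: true for a marker line such as "//##NPF_USER_INCLUDE##".
static bool isUserMarker(const std::string& line)
{
    const std::string prefix = "//##NPF_USER_";
    return line.compare(0, prefix.size(), prefix) == 0;
}

// Collect the text between each pair of identical marker lines in the
// previously generated (and hand-edited) file, keyed by the marker itself.
std::map<std::string, std::string> extractUserBlocks(const std::string& oldSource)
{
    std::map<std::string, std::string> blocks;
    std::istringstream in(oldSource);
    std::string line, openMarker, body;
    while (std::getline(in, line)) {
        if (openMarker.empty()) {
            if (isUserMarker(line)) { openMarker = line; body.clear(); }
        } else if (line == openMarker) {
            blocks[openMarker] = body;   // closing twin found: save the block
            openMarker.clear();
        } else {
            body += line + "\n";
        }
    }
    return blocks;
}

// Copy freshly generated code, substituting the saved text for whatever
// sits between each marker pair. Blocks with no saved text are kept as-is.
std::string mergeUserBlocks(const std::string& freshSource,
                            const std::map<std::string, std::string>& blocks)
{
    std::istringstream in(freshSource);
    std::ostringstream out;
    std::string line, openMarker;
    bool haveSaved = false;
    while (std::getline(in, line)) {
        if (openMarker.empty()) {
            out << line << "\n";
            if (isUserMarker(line)) {
                openMarker = line;
                std::map<std::string, std::string>::const_iterator it = blocks.find(line);
                haveSaved = (it != blocks.end());
                if (haveSaved) out << it->second;
            }
        } else if (line == openMarker) {
            out << line << "\n";
            openMarker.clear();
        } else if (!haveSaved) {
            out << line << "\n";
        }
    }
    assert(openMarker.empty());   // the sketch assumes every marker has its twin
    return out.str();
}
```

Keying the saved text on the marker line itself is what makes the identical-pair convention convenient: the same string opens the block, closes it, and identifies it in the regenerated file.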

==================================================
Figure 3.2 -- Header File Generated by xml2cpp for Class: person
==================================================

#ifndef PERSON_H
#define PERSON_H

#include	<string>

#include	"npf_element.h"

//##NPF_USER_INCLUDE##

//##NPF_USER_INCLUDE##

class person : public virtual NPFElement
{
  public:

//##NPF_USER_PUBLIC_DECL##

//##NPF_USER_PUBLIC_DECL##

	person(NPFElement* pParent = 0);
	person(bool bDum)
		{if (bDum) m_strName = "person";}

	~person();

	void  addAttribute(const string &strName, const string &strValue);

	void toXML(class ostream& outStream);

	string getNotationDecls(set<NPFBinary*> &setBinaryEntity);

	NPFElement* Clone() {return new person();}

	int& getAge() {return m_nAge;}
	void  setAge(int nAge) {m_nAge = nAge;}

	string& getFirstName() {return m_strFirstName;}
	void  setFirstName(string strFirstName) {m_strFirstName = strFirstName;}

	string& getLastName() {return m_strLastName;}
	void  setLastName(string strLastName) {m_strLastName = strLastName;}

	string& getMiddleInitial() {return m_strMiddleInitial;}
	void  setMiddleInitial(string strMiddleInitial) {m_strMiddleInitial = strMiddleInitial;}

  protected:
//##NPF_USER_PROTECTED_DECL##

//##NPF_USER_PROTECTED_DECL##

	void remapAll();

	int	m_nAge;
	string	m_strFirstName;
	string	m_strLastName;
	string	m_strMiddleInitial;
  private:
//##NPF_USER_PRIVATE_DECL##

//##NPF_USER_PRIVATE_DECL##

};

#endif

==================================================

The person class itself is derived from the NPFElement base class, as are all classes that represent XML elements. The inheritance is public and also virtual, in case you really need to derive a single element class from several other element classes. The syntax for generating classes with other bases will be discussed below.

The public interface of the class includes a comment pair that the user can put additional declarations between. This is where all user additions to the public interface of the class must go. After this comes the generated public interface, which consists of a number of specialized methods that are discussed in detail below, and get/set methods for each of the attributes of the class.

The specialized methods consist first of a pair of constructors, one of which takes a parent NPFElement pointer as an argument, the other a boolean. It is apparent that the second constructor is a dummy, and it should never be used by client code. It exists so that the framework can avoid some problems with the order of initialization of static maps for classes that contain ID attributes -- this will be explained in more detail later, don't worry about it for now.

After the constructors is the destructor declaration, which hardly requires comment, and after the destructor the addAttribute() method. This method is used during serialization, when an object is being read from a stream. Recall that the document is saved as XML, and one of XML's limitations is that all attributes are stored as quoted strings, so a floating point value might be saved as an attribute like: fPi="3.14". As part of the reading process, the framework parses the incoming XML, and the XML parser determines the name and value of each attribute and calls addAttribute() on the currently active element to add them to that element's attribute list. For the derived classes this involves a process of type conversion so that the string value of the attribute gets converted to the appropriate base type.

The next method is toXML(), which is responsible for generating well-formed XML for this class. It is most often called by the framework, but under some circumstances may be called by user code. The getNotationDecls() method is required for generated classes that contain certain types of binary data, and will be discussed in more detail where binary data is covered.

The final specialized method is Clone(), which just returns a new element of the same type. I could have called this method New(), but prefer to emphasize the correct meaning of biological cloning: it is a means of creating an object that is identical in type, but different in being, and in an only weakly initialized state. That is, a child.

The protected part of the class declaration contains a comment block for user declarations, a declaration for the "remapAll()" method that will be ignored at this point, and then a declaration for each attribute in the element's attribute list. Note the type of the attributes: attributes whose names begin with "str" are given string type. The attribute whose name begins with "n" is given integer type. This is the most important convention that must be used to generate code from DTDs: attribute names encode their types using a simple form of "Hungarian Notation." The full convention is shown in Table ??? in the appendix. Table 3.1 shows the most commonly used types.

==================================================
Table 3.1 -- Hungarian Notation for Attribute Names
==================================================
	Prefix		C++ Type	Modifiers
--------------------------------------------------	
	  n		 int		p, g
	  nl		 long		p, g
	  f		 float		p, g
	  fl		 double		p, g
	  b		 bool		p, g
	  str		 string		g
==================================================

The prefix is distinguished from the attribute name by capitalization: the prefix must be in lower case, and the first letter of the name must be in upper case. Thus, "NName", "Nname" and "nname" are all invalid: only "nName" is allowed. The "modifiers" come before the prefix: p means pointer, g means static (a class global.) Pointers to strings are supported but should rarely be used.
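The naming rule can be made concrete with a small decoder. This is a sketch of the convention as described in Table 3.1, not code taken from xml2cpp: the names AttributeType and decodeAttributeName are hypothetical, and it assumes that the "p" and "g" modifiers may be combined in any order, which the text above does not state.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>

// Result of decoding an attribute name such as "pnCount" or "strLastName".
struct AttributeType
{
    std::string cppType;   // base C++ type from Table 3.1
    bool isPointer;        // "p" modifier present
    bool isStatic;         // "g" modifier present
    std::string name;      // remainder, starting at the upper-case letter
};

AttributeType decodeAttributeName(const std::string& attr)
{
    // Modifiers plus prefix form the leading run of lower-case letters.
    std::size_t split = 0;
    while (split < attr.size() &&
           std::islower(static_cast<unsigned char>(attr[split])))
        ++split;

    // There must be a prefix, and the name must start with an upper-case letter.
    if (split == 0 || split == attr.size() ||
        !std::isupper(static_cast<unsigned char>(attr[split])))
        throw std::invalid_argument("bad attribute name: " + attr);

    AttributeType result = {"", false, false, attr.substr(split)};

    // No type prefix in Table 3.1 starts with 'p' or 'g', so leading
    // modifier letters can be peeled off without ambiguity.
    std::size_t pos = 0;
    while (pos < split && (attr[pos] == 'p' || attr[pos] == 'g')) {
        if (attr[pos] == 'p') result.isPointer = true;
        else                  result.isStatic = true;
        ++pos;
    }

    const std::string prefix = attr.substr(pos, split - pos);
    if      (prefix == "n")   result.cppType = "int";
    else if (prefix == "nl")  result.cppType = "long";
    else if (prefix == "f")   result.cppType = "float";
    else if (prefix == "fl")  result.cppType = "double";
    else if (prefix == "b")   result.cppType = "bool";
    else if (prefix == "str") result.cppType = "std::string";
    else throw std::invalid_argument("unknown type prefix in: " + attr);

    return result;
}
```

With this decoder, "nAge" yields an int named "Age", while "NName", "Nname" and "nname" are all rejected, matching the capitalization rule given above.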

Standard library strings are treated as base types in the NPF. They are the only class type that is supported in attributes. Other classes must be contained as part of the content model of the element -- that is, as part of the XML document tree contained by the element.

The declaration of the private interface of the class is empty except for the comment pair to contain any user additions. One could argue with the decision to generate the bulk of the class members in the protected rather than the private interface, and my defense is just that the purpose of NP is to support rapid development of well-structured applications, not to fulfill every desirable feature of an OO framework. Private members would impose too many restrictions on re-use to be suitable, and the XML DTD does not give a lot of scope for expressing private versus protected membership.

Generated Code: Implementation

The implementation generated for the class "person" is shown in Figure 3.3. There is a user code block for added include and other pre-processor directives. Constants and other definitions required by the implementation can be put in this block as well. I have opted to keep the number of user include blocks as small as possible, and named each one after its generic or most common function, rather than try to decompose the generated files into a bunch of thin, semantically meaningful slices. The reason for this is the NP philosophy of not trying to impose too strict a model on developers -- different developers think about file structure in very different ways, so by keeping the solutions generic I am trying not to impose my own ways of doing things on others.

Note that the code generator included with this book may have slightly different user blocks defined than are described here, as this is one of those things that can change due to user requests. If users turn out to really, really need an include block that comes before the standard includes, for instance, one will be added. My feeling thus far has been that unless something really unfortunate has happened -- like a library developer who has decided to redefine a bunch of stuff in the standard includes in her own include files -- it should not matter that user includes are forced to come after the standard includes.

==================================================
Figure 3.3 -- Implementation File Generated by xml2cpp for Class: person
==================================================

#include "person.h"

#include <strstream.h>
#include <stdlib.h>

//##NPF_USER_INCLUDE##

//##NPF_USER_INCLUDE##


person::person(NPFElement* pParent) : NPFElement(pParent), m_nAge(-1)
{ 
//##NPF_USER_CONSTRUCTOR_PREAMBLE##

//##NPF_USER_CONSTRUCTOR_PREAMBLE##

	m_strName = "person";

	m_dequeAllowedChildren.push_back("address");
	m_dequeAllowedChildren.push_back("dequePhoneNumber");

//##NPF_USER_CONSTRUCTOR_BODY##

//##NPF_USER_CONSTRUCTOR_BODY##

}

person::~person()
{
//##NPF_USER_DESTRUCTOR_PREAMBLE##

//##NPF_USER_DESTRUCTOR_PREAMBLE##

//##NPF_USER_DESTRUCTOR_BODY##

//##NPF_USER_DESTRUCTOR_BODY##

}

void person::addAttribute(const string &strName, const string &strValue)
{
	if (strName == "nAge")
	{
	  m_nAge = atoi(strValue.c_str());
	}
	else if (strName == "strFirstName")
	{
	  m_strFirstName = RefToVal(strValue.c_str());
	}
	else if (strName == "strLastName")
	{
	  m_strLastName = RefToVal(strValue.c_str());
	}
	else if (strName == "strMiddleInitial")
	{
	  m_strMiddleInitial = RefToVal(strValue.c_str());
	}
	else
	{
	  NPFElement::addAttribute(strName,strValue);
	}
}

void person::toXML(ostream& ssXML)
{
	ssXML << "<person ";
	ssXML << "nAge=\"" << m_nAge << "\" ";
	ssXML << "strFirstName=\"" << ValToRef(m_strFirstName) << "\" ";
	ssXML << "strLastName=\"" << ValToRef(m_strLastName) << "\" ";
	ssXML << "strMiddleInitial=\"" << ValToRef(m_strMiddleInitial) << "\" ";
	NPFElement::attListXML(ssXML);
	ssXML << ">";
	getChildContentXML(ssXML);
	ssXML << "</person>";

}

void person::remapAll()
{

	m_bDirtyCache = false;
}

string person::getNotationDecls(set<NPFBinary*> &setBinaryEntity)
{
	string strDecls;


	return strDecls;
}

//##NPF_USER_OTHER_METHODS##

//##NPF_USER_OTHER_METHODS##
==================================================

The constructor implementation initializes the various values defined in the DTD -- note that the "age" member is set to -1, as specified by the default value in the DTD. There are also user code blocks before and after the generated body of the constructor, as this is one case where you may want to do things both before and after the standard initialization in some of the more complex cases. The base class constructor does some handling of the parent and root pointers for this object -- details of the base class are introduced in the next couple of sections. The body of the constructor sets the element type name and adds the names of the permitted content to the allowed children queue. These are used to create "pre-populated" elements as will be described in the sections on base class functionality.

The destructor in this case does nothing, although if there were pointer members they would be deleted between the user preamble and body blocks. By default, all generated classes own their pointer members, so if you set a pointer member to some memory that is owned by another object, it is very important that you reset it to null before the destructor is called, or in the user destructor preamble.

The addAttribute() method is one of the core methods for supporting XML serialization. If the description of the next couple of paragraphs is hard to follow you may want to come back to it after you've read the sections on serialization that follow. Although this method could also be used to set the values of various members, it would be silly to do so in most cases, as the set methods are more efficient. An important difference between the addAttribute() method and the set methods can be seen in the way string objects are handled. The set methods simply assign the class member to the value passed in. The addAttribute() method passes the argument string through the RefToVal() method first. This is done to handle special characters during serialization.

Special characters are handled in XML by using "entities". Although too complex to cover in detail here, special characters are given names that can be included in the text as "entity references." For instance, the special characters "<", ">", "&", """ and "'" are given the names "lt", "gt", "amp", "quot" and "apos" in the XML spec, and every XML processor is required to recognize them. An entity reference is an entity name bounded by an ampersand and a semi-colon, so ">" becomes "&gt;" in the XML character stream if it occurs in an attribute value or element content.

For example, if you set the country name in an address entry to be "Trinidad & Tobago" via the set call, when the object is serialized the name will be written out as "Trinidad &amp; Tobago". When the serialized data are read back in, the framework calls addAttribute() to add the attribute value, and the RefToVal() call will process the "&amp;" back into just "&". If you try to set the country name to "Trinidad & Tobago" via the addAttribute() call, something bad will happen, as the entity processor will see the "&" and look for a semi-colon following it, and will fail. So in almost all cases you are much better off just using the set methods, and not worrying too much about the XML entity mechanism -- the framework should take care of it for you where it matters.
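The round trip described above can be illustrated with a pair of small functions. These are hypothetical stand-ins named valToRef and refToVal, written only to show the idea; they are not the framework's ValToRef() and RefToVal(), and they handle only the five predefined entities. Reporting a malformed reference by throwing std::runtime_error is likewise an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>

// Replace each special character with its predefined entity reference.
std::string valToRef(const std::string& value)
{
    std::string out;
    for (std::size_t i = 0; i < value.size(); ++i) {
        switch (value[i]) {
            case '<':  out += "&lt;";   break;
            case '>':  out += "&gt;";   break;
            case '&':  out += "&amp;";  break;
            case '"':  out += "&quot;"; break;
            case '\'': out += "&apos;"; break;
            default:   out += value[i]; break;
        }
    }
    return out;
}

// Expand predefined entity references back into plain characters.
// A '&' with no closing ';', or an unrecognized name, is treated as an error.
std::string refToVal(const std::string& text)
{
    std::string out;
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (text[i] != '&') { out += text[i]; continue; }

        const std::size_t semi = text.find(';', i);
        if (semi == std::string::npos)
            throw std::runtime_error("'&' without closing ';'");

        const std::string name = text.substr(i + 1, semi - i - 1);
        if      (name == "lt")   out += '<';
        else if (name == "gt")   out += '>';
        else if (name == "amp")  out += '&';
        else if (name == "quot") out += '"';
        else if (name == "apos") out += '\'';
        else throw std::runtime_error("unknown entity: " + name);

        i = semi;   // skip past the reference; the loop then advances
    }
    return out;
}
```

Passing an already-plain string such as "Trinidad & Tobago" to refToVal() fails in this sketch, which is the same class of problem the text warns about when addAttribute() is used in place of a set method.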

One final feature of the addAttribute() method is worth noting: if the name of an attribute is not recognized, it is passed up to the base class for handling. In this case the name and the value are stored as strings, and are put back where they belong in the XML character stream when the object is serialized again. It is by this mechanism that the NPF supports "forward compatible" files, so that Version 1 of an application can read files from Version 2, do what it can with them, and write them back out again without losing any of the Version 2 specific data. The same mechanism allows all applications built with the framework to share files, so your word-processor can read your CAD files to do things like get meta-data from them.
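A stripped-down model of this pass-through behavior is sketched below. The classes ElementSketch and PersonV1, and the attribute "strNickname" standing in for Version 2 data, are all hypothetical; the real NPFElement does considerably more, and entity handling is left out here to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Simplified stand-in for the base class: any attribute that reaches it
// is kept as a name/value string pair and can be written back out.
class ElementSketch
{
  public:
    virtual ~ElementSketch() {}

    virtual void addAttribute(const std::string& name, const std::string& value)
    {
        m_unknown.push_back(std::make_pair(name, value));
    }

  protected:
    // Write the stored, unrecognized attributes in their original order.
    void attListXML(std::ostream& out) const
    {
        for (std::size_t i = 0; i < m_unknown.size(); ++i)
            out << m_unknown[i].first << "=\"" << m_unknown[i].second << "\" ";
    }

  private:
    std::vector<std::pair<std::string, std::string> > m_unknown;
};

// "Version 1" of a generated class: it only knows about nAge.
class PersonV1 : public ElementSketch
{
  public:
    PersonV1() : m_nAge(-1) {}

    void addAttribute(const std::string& name, const std::string& value)
    {
        if (name == "nAge")
            m_nAge = std::atoi(value.c_str());            // typed member
        else
            ElementSketch::addAttribute(name, value);     // not ours: pass it up
    }

    std::string toXML() const
    {
        std::ostringstream out;
        out << "<person nAge=\"" << m_nAge << "\" ";
        attListXML(out);                                  // unknown attributes survive
        out << "/>";
        return out.str();
    }

    int getAge() const { return m_nAge; }

  private:
    int m_nAge;
};
```

An attribute the class has never heard of is neither converted nor dropped; it simply rides along as text and reappears on output.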

The toXML() method is the other core serialization method, although it should almost never be called directly. As one might guess, it is responsible for converting the object to a well-formed XML sub-tree that gets written to the ostream object that gets passed to the method as an argument. All toXML() does is write out the element name and attributes, call the base class to get any attributes that were not recognized, and then call the base class getChildContentXML() method to walk over the content of the element and generate XML for sub-elements as well. Note that for string-type members the ValToRef() method is called to convert special characters into entity references, ensuring valid XML gets generated.

The last two methods in the generated code can be safely ignored for now, and the final user code block is the place where the implementation for all added methods to the class should go.

Summary of Example

That completes my introduction to the generated code. There are similar classes generated for the other declarations, although the deque class has quite a few differences. These will be discussed after introducing the serialization process and the NPFElement base class functionality.

Also, after serialization and base class functionality are discussed, the modified DTD generated by the framework will be described.

XML Serialization

Serialization is probably the most powerful feature of code generated by the NPF. This section introduces the basic logic of the serialization process. A more detailed look at it will have to wait until after binary data handling has been described.

The interaction diagrams for serialization are shown in Figures 3.4 and 3.5. The main functions are the base class methods writeXML() and readXML() -- these are the only methods that client code should use routinely to deal with serialization. Although the other methods are exposed in the public interface of the class in case they are needed, almost all cases can and should be handled using these methods.

The signature for writeXML() is:

void writeXML(class ostream& outStream, bool bDecompose = false, bool bValidating = false);

It is called with an open output stream as the first argument. If the decompose flag is true then the XML document will have any binary entities it contains written to separate files. This process will be described in more detail in the sections on binary entities. If the validating flag is true then the document that gets generated will include a reference to the DTD, and be subject to validation against that DTD. In the default case the document is well-formed but not validating, so it can still be processed by an XML parser but the exact content model of each element cannot be checked against the DTD. For routine serialization, non-validating output is recommended, but for testing purposes validation is required.

==================================================
Figure 3.4 -- Interaction Diagram for writeXML()
==================================================

This figure left intentionally blank

==================================================

writeXML() is the only method that the client of the framework should call to serialize documents. It does a lot of XML-specific work. In particular, it generates a prolog for the document that defines all of the entities and notation declarations that are used by it. A "notation" is an XML declaration for a binary data type, and will be discussed in more detail where binary data are described.

writeXML() does not return a value, but does throw exceptions if an error occurs on output.

Note that writeXML() can be called on any element whatsoever. XML does not define a "document" element -- any element declared in the DTD can be the top level element in the document. For design purposes it will often be the case that one element will be designated the "root" of all documents, but even in that case there will be times, for testing or other reasons, when you want to serialize from a starting point other than the root.

The signature of readXML() is:

	static NPFElement* readXML(class istream& inStream, 
				   bool bClearReadMap=false);

Unlike writeXML() this is a static method. The reason for this is that we don't want to have to have an element already in order to read one. We want the read element to be created by the framework to represent the top level of the document.

readXML() takes an open input stream as its first argument. The second argument is a flag that tells if the read map should be cleared. Binary entities are put into a map upon being read so they can be reused if necessary. If they need to be re-read rather than just re-used, this flag should be set to true. This will be covered in more detail in the chapter on binary data, which as you will gather by this time is a fairly complex topic. The complexity is worthwhile, though, as there are not many interesting applications that only deal with text data.

==================================================
Figure 3.5 -- Interaction Diagram for readXML()
==================================================

This figure left intentionally blank

==================================================

Internally, the readXML() method creates an XML parser that is used to parse the incoming XML document and ensure that it conforms to the XML spec. The parser used in the framework is non-validating. There are several reasons for this: validating the document every time adds unnecessary overhead, and it should be fairly hard for users of the application to create invalid documents if testing has been done properly. Also, while XML is a very powerful tool, it should not be a straitjacket, and I don't want users or developers to have to jump through a large number of complicated hoops just to avoid some XML restriction that we might be better off without.

Some people will take exception to this cavalier attitude toward the XML standard, but my reason for it is simply the observation that when you give developers or users tools that are too restrictive, they don't use them, and for good reason. The NPF aims to maintain a reasonable middle ground between the chaos that often overcomes application architectures, and the rigidity that sometimes threatens to bring development to a halt in overly restrictive environments.

As the parser runs over the serialized document, it calls a factory class to build new elements of the correct type as they are encountered. The addAttribute() method is used to add attributes to these elements. If an element of unknown type is encountered, the base class itself is used, and, as described above, if an unknown attribute is found it is handled by the base class as well.

NPFElement Basics

Underlying everything we have done up to this point is the NPFElement class. It defines default behaviors that allow the generated code to have only minor specializations to deal with strongly typed attributes. In this section only the simplest aspects of NPFElement are discussed.

As emphasized above, the purpose of the NPF is not to restrict developers unduly. In keeping with this philosophy, a great deal of stuff has been put into the public interface of the class that arguably should have been restricted to the protected interface. While I am sympathetic to this argument, I am aware that on the one hand "protected is public" because a sufficiently desperate developer is capable of simply creating a derived class with a public interface that exposes all the protected features of the base class, and that on the other hand there will be cases where conscientious developers have to derive classes simply because they need a bit of the interface that I have protected. Classes should not be created for such cavalier reasons.

Structurally, NPFElement is a tree class -- every NPFElement object contains a list of that element's children, as well as its textual content if it has any. Many of its public methods deal with adding, accessing and deleting children. As with any generic container class, the choice has to be made between type safety and implementation simplicity. Because the mapping between XML and C++ was complex enough to begin with, the choice was made to keep the NPFElement tree as simple as possible, so the element access and storage is all homogeneous: everything that is stored in the tree is an NPFElement. The generated code could have instantiated templates for each derived type that would have allowed type-safe access to children, but this would have increased the complexity of the code and substantially decreased its maintainability. Experience with using the standard library has shown that it is difficult to debug classes whose names are several hundred characters of template instantiation, and for complex content models the generated code could very easily become incomprehensible.

In terms of accessing elements in the tree, this means that a certain amount of casting has to go on. Using dynamic_cast<> consistently, and checking for null pointers after its use, will ensure a reasonable level of type safety at runtime. For simpler content models, which should be the norm, a mechanism exists that allows you to map contained elements to type-safe accessor methods, and it is strongly recommended that this be done wherever convenient, as it will produce much more robust code. This mechanism is discussed in the chapter on advanced features.

Several of the most important methods of NPFElement have already been introduced, and the entire interface is described in the appendix. Here we will look at a few methods for dealing with element content and children. Every XML element can contain both content (text) and children (other elements.) This "mixed content" model is supported by the NPFElement class by having separate accessor methods for content and children. Content blocks are indexed so that they precede child blocks: content block N comes immediately before child N. Consider an element with a structure like:

	<parent>
		content block 0
		<child0 />
		content block 1
		<child1 />
		<child2 />
		content block 3
		<child3 />
		content block 4
	</parent>

All indices are zero-offset. Even empty content blocks, such as content block 2 in the example, have entries in the content list. The number of content blocks is always one greater than the number of children. The method getChildNumber() can therefore be used to get the number of content blocks as well, by just adding one to the return value. Content blocks are accessed by the getContent(int nIndex) method, which returns a non-const reference to the content block in question. By this means content blocks can be modified. In general, mixed content is a bad idea, and I strongly encourage developers to design DTDs that have no mixed content. An element should contain either text or children -- not both.
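The indexing rule described above (one more content block than children, with block N sitting just before child N) can be sketched with a pair of parallel containers. The class MixedContentSketch and the helpers getContentNumber() and getChildName() are hypothetical names used for illustration; child elements are represented by plain strings, and vectors are used for brevity rather than to mirror the framework's internal storage.

```cpp
#include <cassert>
#include <string>
#include <vector>

class MixedContentSketch
{
  public:
    // Content block 0 exists even before any child has been added.
    MixedContentSketch() : m_content(1) {}

    // Adding a child also opens the (initially empty) content block after it.
    void addChild(const std::string& childName)
    {
        m_children.push_back(childName);
        m_content.push_back(std::string());
        assert(m_content.size() == m_children.size() + 1);
    }

    int getChildNumber() const { return static_cast<int>(m_children.size()); }

    // Always one more content block than there are children.
    int getContentNumber() const { return getChildNumber() + 1; }

    // Non-const reference, so callers can modify the text in place.
    // at() throws std::out_of_range for an invalid index.
    std::string& getContent(int nIndex) { return m_content.at(nIndex); }

    const std::string& getChildName(int nIndex) const { return m_children.at(nIndex); }

  private:
    std::vector<std::string> m_children;  // stand-in for child elements
    std::vector<std::string> m_content;   // content block i precedes child i
};
```

Because an empty block is created alongside every child, a content index never has to be recomputed when text is later added between two existing children.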

There are a number of methods for dealing with children. Children are held internally in a standard library list, so sequential access is much the best way of getting at them. A getChild(int nIndex) method is provided, but it should rarely be used as it requires an O(N) traversal of the list. The methods for dealing with children are summarized in Table 3.2.

==================================================================
Table 3.2 -- NPFElement Methods for Children
==================================================================
  Add a child to this element.  Returns a pointer to the child
  virtual NPFElement* addChild(const string &strName);
------------------------------------------------------------------
  Insert a child before the given index -- the given index becomes
  the index of the child, and insertChild(getChildNumber()) is
  equivalent to addChild().  As with addChild, a pointer to the
  new child is returned.
  NPFElement* insertChild(int nIndex, const string &strName);
------------------------------------------------------------------
  Get a given child of this element -- can be used at any time
  NPFElement* getChild(int nIndex);
------------------------------------------------------------------
  Get the first child of this element -- resets iterator used by
  getNextChild
  NPFElement* getFirstChild();
------------------------------------------------------------------
  Get the next child of this element -- getFirstChild must be called
  before this method is called for the first time.
  NPFElement* getNextChild();
------------------------------------------------------------------
  Get the last child of this element -- does not affect iterator
  NPFElement* getLastChild();
------------------------------------------------------------------
  Get the number of children of this element
  int getChildNumber();
------------------------------------------------------------------
  Remove a child by index -- does not delete the child
  NPFElement* removeChild(int nIndex);
------------------------------------------------------------------
  Swap the given element with the one at the given index, returning
  the one at the given index.  The replaced pointer is not deleted,
  just removed from this parent.  This is the only way to insert a
  child pointer directly into an element.  IT SHOULD BE RARELY USED.
  NPFElement* swapChild(int nIndex, NPFElement* pElement);
==================================================================

The addChild() method is used by the serialization code to add a child to the element. Children are added at the end of the list. In the generated code, the type-safe generated methods should be used wherever possible. For instance, suppose we have a content model that allows children of type "address" and type "dequePhoneNumber". The framework will generate two methods:

	address* addAddress();
	dequePhoneNumber* addDequePhoneNumber();

These are convenience methods that call addChild() with the return value wrapped in the appropriate cast. When addChild() is called, the type-name of the element is given, and addChild() calls the element factory to construct the desired object. If the name given is not found in the factory, a generic NPFElement is returned. The typical syntax is:

	person *pPerson = dynamic_cast<person*>(addChild("person"));
	assert(pPerson != 0);

It is very important that the return value of the dynamic_cast is checked. The convenience methods do this for you, and they also hide the rather awkward and verbose casting syntax.
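A convenience method of this kind might look roughly like the following. This is a minimal sketch under stated assumptions, not the generated code: the Element base class stands in for NPFElement, the hard-wired name test stands in for the element factory, and the code is written against C++11.

```cpp
#include <cassert>
#include <list>
#include <string>

// Minimal stand-in for NPFElement, assumed for illustration only.
class Element
{
  public:
        explicit Element(const std::string &strName) : m_strName(strName) {}
        virtual ~Element()
        {
                for (Element *pChild : m_lstChildren) delete pChild;
        }

        // Defined below, once the derived types are complete.
        virtual Element* addChild(const std::string &strName);

        const std::string& getName() const {return m_strName;}

  protected:
        std::string m_strName;
        std::list<Element*> m_lstChildren;
};

class address : public Element
{
  public:
        address() : Element("address") {}
};

class person : public Element
{
  public:
        person() : Element("person") {}

        // Type-safe convenience method: hides the cast and checks it.
        address* addAddress()
        {
                address *pAddress = dynamic_cast<address*>(addChild("address"));
                assert(pAddress != 0);
                return pAddress;
        }
};

// Stand-in for the element factory: unknown names yield a generic Element.
Element* Element::addChild(const std::string &strName)
{
        Element *pChild = 0;
        if (strName == "address") pChild = new address();
        else pChild = new Element(strName);
        m_lstChildren.push_back(pChild);
        return pChild;
}
```

The caller simply writes pPerson->addAddress() and gets back a correctly typed pointer, with the cast and its check kept in one place.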

Similar convenience methods are generated to wrap the insertChild() method. insertChild() creates a new element at the index indicated, which means it gets inserted before the old element of that index. Everything gets shifted along in the list.
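The index semantics are the same as those of insertion into a standard library list. The helper below is hypothetical and works on a plain list of names rather than on elements; it is only meant to show how the old occupant of the index gets shifted along.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <string>

// Hypothetical helper showing the index semantics described for
// insertChild(): the new entry takes nIndex and later entries shift along.
void insertAt(std::list<std::string> &lstNames, int nIndex,
              const std::string &strName)
{
        std::list<std::string>::iterator it = lstNames.begin();
        std::advance(it, nIndex);        // O(N) walk, as for getChild()
        lstNames.insert(it, strName);    // inserts before the old occupant
}
```

Inserting at an index equal to the current size of the list appends to the end, which mirrors the note in Table 3.2 that insertChild(getChildNumber()) is equivalent to addChild().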

There are a number of get methods for children. These are all generic, although as described in the chapter on advanced features, there are simple ways of telling the code generator to produce quite complex object-specific get methods. getChild() returns a child by index, and should be avoided wherever possible, because it requires iteration through the list.

It is better to group element accesses so that we deliberately iterate through the list, or to use the object-specific get methods described under advanced features. Iteration is performed by calling getFirstChild() followed by successive calls to getNextChild(). It is a very bad idea to insert or delete elements during the course of iteration. If you need to do this, collect information about where the insertions or deletions must occur during the iteration, and then perform the operations afterwards.
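The collect-then-modify pattern can be sketched as follows. The Element class is a hypothetical stand-in that mimics the method names in Table 3.2; returning a null pointer from getNextChild() at the end of the list is an assumption, as is the removeChildrenNamed() helper. The code is written against C++11.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <string>
#include <vector>

// Hypothetical stand-in that mimics the iteration methods in Table 3.2.
class Element
{
  public:
        explicit Element(const std::string &strName) : m_strName(strName) {}
        ~Element() {for (Element *p : m_lstChildren) delete p;}

        Element* addChild(const std::string &strName)
        {
                m_lstChildren.push_back(new Element(strName));
                return m_lstChildren.back();
        }
        Element* getFirstChild()           // resets the internal iterator
        {
                m_itNext = m_lstChildren.begin();
                return getNextChild();
        }
        Element* getNextChild()            // assumed to return 0 at the end
        {
                if (m_itNext == m_lstChildren.end()) return 0;
                return *m_itNext++;
        }
        Element* removeChild(int nIndex)   // removes but does not delete
        {
                std::list<Element*>::iterator it = m_lstChildren.begin();
                std::advance(it, nIndex);
                Element *pChild = *it;
                m_lstChildren.erase(it);
                return pChild;
        }
        int getChildNumber() const {return static_cast<int>(m_lstChildren.size());}
        const std::string& getName() const {return m_strName;}

  private:
        std::string m_strName;
        std::list<Element*> m_lstChildren;
        std::list<Element*>::iterator m_itNext;
};

// Remove every child with the given name. Pass one only records indices;
// pass two removes from the highest index down, so that earlier removals
// never shift the indices still waiting to be processed.
int removeChildrenNamed(Element &parent, const std::string &strName)
{
        std::vector<int> vecDoomed;
        int nIndex = 0;
        for (Element *p = parent.getFirstChild(); p != 0; p = parent.getNextChild())
        {
                if (p->getName() == strName) vecDoomed.push_back(nIndex);
                ++nIndex;
        }
        for (std::vector<int>::reverse_iterator it = vecDoomed.rbegin();
             it != vecDoomed.rend(); ++it)
        {
                delete parent.removeChild(*it);   // caller owns the removed child
        }
        return static_cast<int>(vecDoomed.size());
}
```

Because the list is never touched while the first loop is running, the internal iterator stays valid throughout the iteration.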

getLastChild() uses the list's back() method to return the last element in the list, and can be called at any time without risk. getChildNumber() unsurprisingly returns the number of children.

removeChild() removes a child from the list but does not delete it. A pointer to the removed element is returned and the caller must take responsibility for it. Ordinarily NPFElements own their children, but there is a means of turning this behavior off when required -- it is described under advanced features.

The addChild() and insertChild() methods take element names as their arguments. There is a reason for this: trees cannot contain loops, and if it were possible to pass pointers to elements directly to the tree it would be very easy to create loops and duplicate sub-trees, both of which violate the XML specification and make serialization nightmarish. XML provides a means of referring to elements from other elements that is well supported by the framework, and can be used in place of pointers to elements wherever it is required. It is described under advanced features.

There are cases, however, where re-structuring of the tree is required, and in keeping with the NP philosophy of making it possible for developers to do dangerous things when they have to, a swapChild() method has been provided that pulls an element pointer out of the tree and replaces it with the pointer that was passed in. It should be very rarely used -- almost any case where it would be used can be fixed up more sensibly by using the XML ID/IDREF mechanism described under advanced features, but if you really need to mess with the tree by hand, a mechanism is provided for you to do so.
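The ownership hand-off involved in swapChild() can be sketched like this. The Element class is again a hypothetical stand-in written against C++11; only the swapChild() signature follows Table 3.2, and any bookkeeping the framework may do beyond exchanging the pointers is left out.

```cpp
#include <cassert>
#include <iterator>
#include <list>
#include <string>

// Hypothetical stand-in; only the swapChild() signature follows Table 3.2.
class Element
{
  public:
        explicit Element(const std::string &strName) : m_strName(strName) {}
        ~Element() {for (Element *p : m_lstChildren) delete p;}

        Element* addChild(const std::string &strName)
        {
                m_lstChildren.push_back(new Element(strName));
                return m_lstChildren.back();
        }

        // Put pElement at nIndex and hand back the previous occupant.
        // The returned element is no longer owned by this parent.
        Element* swapChild(int nIndex, Element *pElement)
        {
                std::list<Element*>::iterator it = m_lstChildren.begin();
                std::advance(it, nIndex);
                Element *pOld = *it;
                *it = pElement;
                return pOld;
        }

        Element* getLastChild() {return m_lstChildren.back();}
        const std::string& getName() const {return m_strName;}

  private:
        std::string m_strName;
        std::list<Element*> m_lstChildren;
};
```

The caller becomes responsible for the returned pointer, exactly as with removeChild(), and must take care never to pass in an element that is already part of the tree.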

Container Classes

The addChild() method in NPFElement is declared as virtual. Why this should be so is not immediately obvious, as there is no addChild() method in the generated code. The reason has to do with support for container classes.

Container classes appear frequently in well-designed code. For complex classes, NPFElement itself is a container, and it should be used as such. It won't be the best container for all purposes, but it is intended to be sufficient for most purposes. The standard library list was used rather than the deque or vector to facilitate adding and removing children, at the price of fast random access. This judgment was made based on a belief about how the framework would be used, and with the awareness that in the cases where fast random access was an issue the framework can be made to generate fast accessor methods, as described under advanced features.

But what about containers for simpler types, like numeric values and strings? The NPFElement class is not at all well suited for these, and it would be nice if we could find a way of representing standard library containers as XML without losing any of the programmatic efficiency that those containers give us. This is what the container class syntax does. At the time of this writing, only two container types are supported, deques and maps, but there is no barrier in principle to adding the full suite, and I expect that this will happen before this book goes to press. For now, I will focus on these two because they involve most of the interesting functionality.

First, consider the last declaration in the example DTD:

	<!ELEMENT dequePhoneNumber (long)>

The leading "deque" in the name is significant -- it identifies this element as a standard library container. The type it contains, unsurprisingly, is a long integer. The container elements will work for any base types, and also standard library strings. For more complex types, ordinary elements should be declared. One slightly odd bit of syntax for container classes is that a container holding pointers is declared as:

	<!ELEMENT dequePhoneNumber (long)*>

The "*" in the content model in this case is intended to stand for the dereference operator. This is a bit different from attribute declarations, where a leading "p" is used to indicate pointer types.

The code generated by the declaration is shown in Figure 3.6. It is quite a bit different from that generated for ordinary elements. One noticeable difference is that there are no user blocks -- these container classes are intended for use "as-is."

==================================================
Figure 3.6 -- Code Generated for deque<long> element
==================================================

#ifndef DEQUE_PHONE_NUMBER_H
#define DEQUE_PHONE_NUMBER_H

#include "deque_element.h"


typedef DequeElement<long> DequeElementdequePhoneNumber;

class dequePhoneNumber : public DequeElementdequePhoneNumber
{
  public:

        dequePhoneNumber(NPFElement* pParent = 0) 
		: DequeElementdequePhoneNumber(pParent)
        {
                m_strName = "dequePhoneNumber";
                m_strType = "long";
        }

        dequePhoneNumber(bool bDum)
                {if (bDum) m_strName = "dequePhoneNumber";}

        NPFElement* Clone() {return new dequePhoneNumber();}

  protected:
  private:
};

#endif

==================================================

The other major difference is that the class is derived from an instantiation of the DequeElement template class, rather than NPFElement. DequeElement derives from NPFElement and contains a standard library deque that is its reason for being. It provides the important method:

      template<class T> class DequeElement : public NPFElement
      {
        public:
              ...
              deque<T>& getDeque();
              ...
      };

which returns a reference to the underlying deque, thus allowing easy programmatic access to the container while still maintaining the NPFElement structure. DequeElement also overrides NPFElement's addChild() method, allowing it to treat things contained by the deque as children. The XML generated for a DequeElement is also special -- a typical example for a deque of long values might look like:

	<dequePhoneNumber Type="long" Size="4">
		<V K="5551212"/>
		<V K="5552214"/>
		<V K="5552322"/>
		<V K="5554234"/>
	</dequePhoneNumber>

Now, there is no "V" element declared in the DTD, so to parse things like this it is necessary to generate a modified DTD, as will be described below. The content model for dequePhoneNumber is obviously not elements of type "long", either, so the modified DTD has to include a new declaration for dequePhoneNumber as well.
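To make the idea concrete, here is a minimal sketch of a container element that exposes its deque and writes it out in the V/K form. It is not the framework's DequeElement: the stub NPFElement base, the constructor arguments and the writeXML() method are all assumptions made for illustration, and only getDeque() and the element and attribute names come from the text.

```cpp
#include <cassert>
#include <deque>
#include <sstream>
#include <string>

// Stub base class, assumed for illustration; the real NPFElement is far richer.
class NPFElement
{
  public:
        virtual ~NPFElement() {}
};

// Sketch of a container element. getDeque() follows the text; the
// constructor arguments and writeXML() are assumptions for this example.
template<class T> class DequeElement : public NPFElement
{
  public:
        DequeElement(const std::string &strName, const std::string &strType)
                : m_strName(strName), m_strType(strType) {}

        // Direct access to the container for ordinary standard library code.
        std::deque<T>& getDeque() {return m_deque;}

        // Writes one V element per entry, with the value in the K attribute.
        std::string writeXML() const
        {
                std::ostringstream out;
                out << "<" << m_strName << " Type=\"" << m_strType
                    << "\" Size=\"" << m_deque.size() << "\">\n";
                for (typename std::deque<T>::const_iterator it = m_deque.begin();
                     it != m_deque.end(); ++it)
                {
                        out << "\t<V K=\"" << *it << "\"/>\n";
                }
                out << "</" << m_strName << ">\n";   // must match the opening tag
                return out.str();
        }

  private:
        std::string m_strName;
        std::string m_strType;
        std::deque<T> m_deque;
};
```

Client code fills the container through getDeque() with the usual push_back() calls, and the element takes care of the XML representation.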

There are three other types of container available: maps between values, maps between values and pointers, and deques of pointers. Pointer semantics are a bit different from value semantics, and the containers that deal with pointers reflect this. One important aspect that pointer containers have in common with ordinary elements is the question of whether or not they own their children. By default, all NPFElements and pointer containers do own their children, so when an element is deleted its sub-tree is deleted as well. There is a method in NPFElement:

void setOwnChildren(bool bOwnChildren = true);

that can be used to change the ownership policy so that children are not deleted.
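The effect of the ownership flag can be sketched with a stand-in class that counts live objects. Everything except the setOwnChildren() signature is hypothetical here, including the global counter and the argument-free addChild(); the code is written against C++11.

```cpp
#include <cassert>
#include <list>

// Counts live elements so the effect of each ownership policy is observable.
static int g_nLive = 0;

// Hypothetical stand-in; only setOwnChildren() is named in the text.
class Element
{
  public:
        Element() : m_bOwnChildren(true) {++g_nLive;}   // owning by default
        ~Element()
        {
                if (m_bOwnChildren)
                {
                        for (Element *p : m_lstChildren) delete p;
                }
                --g_nLive;
        }

        void setOwnChildren(bool bOwnChildren = true) {m_bOwnChildren = bOwnChildren;}

        Element* addChild()
        {
                m_lstChildren.push_back(new Element());
                return m_lstChildren.back();
        }

  private:
        bool m_bOwnChildren;
        std::list<Element*> m_lstChildren;
};
```

With ownership turned off, whoever holds the child pointers must delete them; otherwise they leak when the parent goes away.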

Unlike ordinary pointer members, pointers in containers cannot be shared on serialization. That is, for ordinary pointer members the address as well as the value is written into the attribute so that when things are read back in shared pointers will continue to be shared. There are ways to mock this up on a class-by-class basis for pointers to containers, but for reasons of efficiency (always suspect, but in this case I think justified) it seemed overkill to do this.

Modified DTD

The DTD that is generated for our simple example is shown in Figure 3.7. It is similar to the original, but has several additions and modifications.

==================================================
Figure 3.7 -- Modified DTD Generated from Simple Example
==================================================

<!ELEMENT address EMPTY >
<!ATTLIST address
nApartment CDATA  #IMPLIED 
nBuilding CDATA  #IMPLIED 
strCity CDATA  #REQUIRED 
strCountry CDATA  "Canada"
strPostalCode CDATA  #REQUIRED 
strStreet CDATA  #REQUIRED >

<!ELEMENT dequePhoneNumber (V)*>
<!ATTLIST dequePhoneNumber
Type CDATA #FIXED "long"
Size CDATA #REQUIRED>

<!ELEMENT person (address , dequePhoneNumber) >
<!ATTLIST person
nAge CDATA  "-1"
strFirstName CDATA  
strLastName CDATA  #REQUIRED 
strMiddleInitial CDATA  >

<!ELEMENT V EMPTY>
<!ATTLIST V
K CDATA #REQUIRED
V CDATA #IMPLIED>

==================================================

The address and person elements are unchanged, although their ordering is different -- ordering within a DTD is irrelevant.

The important changes are that the content model and attributes of dequePhoneNumber are quite different, and the V-for-value element declaration has been added so the XML for deques will parse. The V element also has a V attribute which is used for maps. The K attribute is the key in a map and the V attribute is the value. For deques, the K attribute is the value.
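For a map, each entry would then carry both attributes. The helper below is a hypothetical sketch of how the entries of a map from strings to long values might be written; the function name is an assumption, the enclosing map element is deliberately left out because its form is not shown in this chapter, and no escaping of special characters is attempted.

```cpp
#include <cassert>
#include <map>
#include <sstream>
#include <string>

// Hypothetical helper: emits one V element per map entry, with the key in
// the K attribute and the value in the V attribute.
std::string writeMapEntries(const std::map<std::string, long> &mapValues)
{
        std::ostringstream out;
        for (std::map<std::string, long>::const_iterator it = mapValues.begin();
             it != mapValues.end(); ++it)
        {
                out << "<V K=\"" << it->first << "\" V=\"" << it->second << "\"/>\n";
        }
        return out.str();
}
```

Entries come out in key order, since that is how a standard library map iterates.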

Summary

This chapter has covered the basic features of the NPF. There are a lot of them. The framework is intended to generate code that does quite a bit of useful work as well as embody a strong, intuitive representation of the application-as-document metaphor.

Much of this machinery can be used with no clear understanding of how it works, but my preference is to describe it in as much detail as possible without actually going over the code line by line. There are many features that have been passed over lightly in this chapter, and in the next several chapters they will be introduced in more detail. Although they are referred to as "advanced features", don't let that put you off -- they are mostly focused on making the framework easier to use, as well as more powerful.