artima - Article Discussion - Flat View

Article Discussion

An Introduction to XML Data Binding in C++

Summary: XML processing has become a common task that many C++ application developers have to deal with. Using low-level XML access APIs such as DOM and SAX is tedious and error-prone, especially for large XML vocabularies. XML Data Binding is a new alternative which automates much of the task by presenting the information stored in XML as a statically-typed, vocabulary-specific object model. This article introduces XML Data Binding and shows how it can simplify XML processing in C++.

21 posts on 2 pages.

Most recent reply: February 11, 2010 9:43 PM by Brian

Boris

Posts: 6 / Nickname: boris / Registered: May 10, 2006 8:23 PM

Re: An Introduction to XML Data Binding in C++

May 8, 2007 0:48 AM

Pointers and ownership are independent of each other. XML is a hierarchical format, i.e. child nodes have meaning only in reference to (in the context of) a parent node (except for the document node). Therefore it's quite 'natural' to let the parent nodes own their child nodes (the parent as a 'container' for the children). Of course, parent nodes may give access to their child nodes via pointers or iterators. But the lifetime of the child nodes can, and IMO should, be bound to the lifetime of the parents.

I don't understand what the original issue was then. The document tree is dynamically-allocated and returned as a pointer (wrapped into auto_ptr). Sub-nodes are owned by the tree and are returned as references.
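A minimal sketch of the ownership model described above: the tree root is allocated dynamically and handed to the caller, while sub-nodes stay owned by their parent and are exposed by reference. The `Order` and `Item` classes and the `make_order` function are hypothetical names for illustration, not the output of any real data binding tool. The sketch uses `std::unique_ptr` in place of `std::auto_ptr`, which was deprecated in C++11 and removed in C++17.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical node types for illustration; not the classes any real
// data binding tool generates.
class Item {
public:
  explicit Item(std::string name) : name_(std::move(name)) {}
  const std::string& name() const { return name_; }

private:
  std::string name_;
};

class Order {
public:
  // The parent owns its children: they live inside the container and are
  // destroyed together with the Order.
  void add_item(std::string name) { items_.emplace_back(std::move(name)); }

  // Access is granted by reference; ownership is not transferred.
  const std::vector<Item>& items() const { return items_; }

private:
  std::vector<Item> items_;
};

// The root of the tree is allocated dynamically and handed to the caller,
// who becomes its sole owner.
std::unique_ptr<Order> make_order() {
  auto order = std::make_unique<Order>();
  order->add_item("pen");
  order->add_item("paper");
  return order;
}

// Usage: the caller holds the tree; a child is reached through a reference
// that stays valid only while the tree is alive and unmodified.
std::size_t count_items() {
  std::unique_ptr<Order> order = make_order();
  const Item& first = order->items().front();
  assert(first.name() == "pen");
  return order->items().size();
}
```

When the `std::unique_ptr` goes out of scope, the `Order` and all of its `Item` children are destroyed together, so a child never outlives its parent.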

James

Posts: 128 / Nickname: watson / Registered: September 7, 2005 3:37 AM

Re: An Introduction to XML Data Binding in C++

May 8, 2007 8:05 AM

> I guess you can define 'brittle code' any way you like
> but what I mean by brittle is that insignificant changes
> cause the code to break.
>
> On the other hand, in the data binding approach, the
> client code that breaks as a result of a change will be
> flagged by the C++ compiler thanks to static typing. In
> case of DOM or your XPath-based manual mapping approach,
> with every change to your XML vocabulary you are left
> wondering (or guessing) whether the change was
> insignificant or the code is now silently broken.

This is true to some degree. We used thorough testing, specifically automated regression scripts, to make sure this did not happen. The reality is that static typing alone is not nearly enough to guarantee correctness, so regression testing is needed anyway.

Generally speaking, if you are working with standard schemata, they don't change. You might get a new version of a schema that may or may not require coding changes, coding changes that require remapping, or just mapping changes, but the standard schemata should be fairly stable. The problem was that we had a canonical schema that would have to change in addition to the code and mapping changes. The approach I am describing removes this canonical schema completely and instead uses the class as the canonical structure.

> To make it clearer what I am talking about. We'd have
> say 6 different standardized schemata for a purchase order
> that we had to support with new ones added over time. In
> order to avoid generating 6 sets of classes and writing
> thousands of lines of Java to place those orders, we
> created a canonical schema for a purchase order. Then we
> took this and generated classes from it. Then we had about
> 1000 or so lines of code to put that canonical data into
> stable business Objects. Then we had another 1000 or so
> lines of code to write the data from the usable java
> Object back into the JAXB Object.
>
> There is a much cleaner way to implement this in XSD (I
> don't know about JAXB). The idea is to define a base type
> for all purchase orders in XML Schema. This type can be
> empty or it can contain some common elements/attributes.
> Then you define your purchase orders as extensions of this
> base type. When compiling the schema to C++, you customize
> the base class by adding virtual functions that will
> constitute the interface to all the purchase orders. Then
> you customize the concrete purchase orders by implementing
> those virtual functions. The application code manipulates
> all purchase orders via the customized base class. This
> approach is also a lot more efficient than XPath-based
> remapping.
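The quoted suggestion (a common base type whose virtual functions form the interface to every purchase order) can be sketched in plain C++ roughly as follows. This is hand-written illustration code, not the output of XSD or any other schema compiler; `PurchaseOrderBase`, `FlatOrder`, `ItemizedOrder` and `grand_total` are all hypothetical names.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the customized base class: the virtual functions
// are the interface the application code programs against.
class PurchaseOrderBase {
public:
  virtual ~PurchaseOrderBase() = default;
  virtual std::string buyer() const = 0;
  virtual double total() const = 0;
};

// One vocabulary: buyer and total are stored directly.
class FlatOrder : public PurchaseOrderBase {
public:
  FlatOrder(std::string buyer, double total)
      : buyer_(std::move(buyer)), total_(total) {}
  std::string buyer() const override { return buyer_; }
  double total() const override { return total_; }

private:
  std::string buyer_;
  double total_;
};

// Another vocabulary with a different layout: the total is derived from
// individual order lines.
class ItemizedOrder : public PurchaseOrderBase {
public:
  struct Line {
    int quantity;
    double unit_price;
  };

  ItemizedOrder(std::string party_name, std::vector<Line> lines)
      : party_name_(std::move(party_name)), lines_(std::move(lines)) {}
  std::string buyer() const override { return party_name_; }
  double total() const override {
    double sum = 0.0;
    for (const Line& line : lines_) sum += line.quantity * line.unit_price;
    return sum;
  }

private:
  std::string party_name_;
  std::vector<Line> lines_;
};

using OrderList = std::vector<std::unique_ptr<PurchaseOrderBase>>;

// Application logic sees only the base class, whatever the concrete format.
double grand_total(const OrderList& orders) {
  double sum = 0.0;
  for (const auto& order : orders) sum += order->total();
  return sum;
}

// Usage: two differently structured orders processed through one interface.
double demo_grand_total() {
  OrderList orders;
  orders.push_back(std::make_unique<FlatOrder>("Acme", 10.0));
  orders.push_back(std::make_unique<ItemizedOrder>(
      "Initech", std::vector<ItemizedOrder::Line>{{2, 12.5}}));
  assert(orders[1]->buyer() == "Initech");
  return grand_total(orders);
}
```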

I'm not sure I understand what you are describing here, but the situation I mean is that you have many different external purchase order schemata. You have a half-dozen to a dozen RosettaNet layouts, you have a number of web service layouts, some custom layouts for important customers, and then you might even have non-XML formats such as a number of EDI formats. All the data elements across these formats map into a single true set of valid data elements for a purchase order. The different layouts can be dramatically different. For example, an element that is a child in one could be a parent in another. The names of the elements and their paths are pretty much guaranteed to be different. Mapping the data from all these different formats into a canonical form with XPath is trivial. Trying to do this with JAXB-generated classes is not feasible without a lot of code. Perhaps XSD can map wildly different formats into the same class structures, but I imagine that would require some sort of mapping syntax, which is not unlike the approach I am describing. Also, you can use similar techniques to map EDI formats or any other formats to the same classes and keep order processing logic from being duplicated across the system.
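A rough sketch of the mapping idea described above, with one canonical class filled from differently shaped inputs through a per-format table of paths. To stay self-contained it does not use a real XPath engine: the `Document` type is a hypothetical stand-in that maps path strings to text, and `PurchaseOrder`, `Mapping` and `load_order` are likewise invented for illustration.

```cpp
#include <cassert>
#include <map>
#include <string>

// The class acts as the canonical structure; there is no canonical schema.
struct PurchaseOrder {
  std::string buyer;
  std::string part_number;
};

// Stand-in for a parsed document: path -> text. A real system would
// evaluate XPath expressions against the XML instead (assumption).
using Document = std::map<std::string, std::string>;

// One mapping per external format: where each canonical field lives.
struct Mapping {
  std::string buyer_path;
  std::string part_number_path;
};

// Returns the text at the given path, or an empty string if it is absent.
std::string value_at(const Document& doc, const std::string& path) {
  const auto it = doc.find(path);
  return it == doc.end() ? std::string() : it->second;
}

// The same loading code serves every format; only the mapping differs.
PurchaseOrder load_order(const Document& doc, const Mapping& mapping) {
  assert(!mapping.buyer_path.empty() && !mapping.part_number_path.empty());
  PurchaseOrder order;
  order.buyer = value_at(doc, mapping.buyer_path);
  order.part_number = value_at(doc, mapping.part_number_path);
  return order;
}
```

Supporting a new layout then means adding one more `Mapping` value rather than another set of generated classes.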

Ray

Posts: 2 / Nickname: lisch / Registered: May 7, 2007 3:35 AM

Re: An Introduction to XML Data Binding in C++

May 9, 2007 5:51 AM

Okay, "simple" was the wrong word.

Some situations require pointers. Some don't. If I were implementing the code manually, I would use pointers only where they were necessary. On the other hand, I don't mind that Code Synthesis uses a uniform mechanism for all child elements. I don't expect a tool to generate code that is identical to the code I would write manually.

Substitution groups and xsi:type require polymorphism, and therefore pointers (or references). Optional elements map cleanly to pointers. Some schemas describe cyclic data, and child elements must be pointers.
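The three cases above can be sketched as follows, assuming hand-written types rather than generated ones. `Shape`, `Circle`, `Square`, `Drawing`, `Category` and `describe` are hypothetical names chosen for this illustration.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Polymorphic content (substitution groups, xsi:type): the container can
// only hold the children through pointers to the base type.
struct Shape {
  virtual ~Shape() = default;
  virtual std::string kind() const = 0;
};
struct Circle : Shape {
  std::string kind() const override { return "circle"; }
};
struct Square : Shape {
  std::string kind() const override { return "square"; }
};

struct Drawing {
  std::unique_ptr<std::string> title;          // optional: null means absent
  std::vector<std::unique_ptr<Shape>> shapes;  // polymorphic children
};

// Recursive content: a Category cannot contain a Category by value, so the
// nested element has to be held through a pointer.
struct Category {
  std::string name;
  std::unique_ptr<Category> subcategory;
};

// Builds a one-line description, handling the absent-title case explicitly.
std::string describe(const Drawing& drawing) {
  std::string out = drawing.title ? *drawing.title : std::string("(untitled)");
  for (const auto& shape : drawing.shapes) {
    assert(shape != nullptr);
    out += " " + shape->kind();
  }
  return out;
}
```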

John

Posts: 2 / Nickname: jtorjo / Registered: September 24, 2007 0:18 AM

Re: An Introduction to XML Data Binding in C++

September 24, 2007 10:05 PM

template <typename gender_ret_t>
class gender_t: public xml_schema::parser<gender_ret_t>
{
public:
  // Parser hooks.
  //
  virtual void pre ();
  virtual void _characters (const string&);
  virtual gender_ret_t post ();

private:
  ...
};

Why does gender_t need to know about this? It would seem more natural to have different classes which actually hold the logic for filtering which data you need. The code would be way more flexible, and simpler to read...

--
http://John.Torjo.com -- C++ expert
... call me only if you want things done right

Pete

Posts: 1 / Nickname: peteco / Registered: November 28, 2008 8:23 PM

Re: An Introduction to XML Data Binding in C++

November 29, 2008 2:43 AM

FYI - We have a similar C++ XML data binding product called Codalogic LMX. You can find out more at http://codalogic.com/lmx/ .

Brian

Posts: 1 / Nickname: aberle / Registered: February 11, 2010 3:05 PM

Re: An Introduction to XML Data Binding in C++

February 11, 2010 9:43 PM

This is an age-old problem. I was the team lead of the development team with the most critical path dependencies on the largest software project in the world during 1999 and 2000, and this very issue was the focus of my work during that time. I am convinced that the wheel was invented by multiple engineers who were unaware that others had already invented it. The same is true of XML Data Binding in C++. I invented it too, and I've been perfecting it for over 10 years on various projects. I have a solution that addresses the issues noted here and some additional issues that repeatedly arise:

1. XML Updates. This is the ability to re-apply a subset of XML into an existing object model. In many cases the XML is bound to indexed objects and we cannot afford to re-index for each update.

2. COM and CORBA interface management. Just as XML Data Binding can be automated through object-oriented practices, so can the instances of interface objects that provide that data to the application layer.

3. State Tracking. The application often needs to distinguish between an empty value <String></String> vs. a missing value - both create an empty string. This provides the validation along with Data Binding.
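The state-tracking point (item 3) can be illustrated with `std::optional` from C++17, which keeps "present but empty" distinct from "missing". This is only a sketch of the idea and not how the project linked below implements it; `element_text` and `describe_state` are hypothetical helpers, and the substring search is a deliberately naive stand-in for a real XML parser.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Returns the text between <tag> and </tag>, or std::nullopt when the
// element is not present. Naive substring search; attributes, nesting and
// self-closing tags are not handled.
std::optional<std::string> element_text(const std::string& xml,
                                        const std::string& tag) {
  assert(!tag.empty());
  const std::string open = "<" + tag + ">";
  const std::string close = "</" + tag + ">";
  const auto begin = xml.find(open);
  if (begin == std::string::npos) return std::nullopt;
  const auto start = begin + open.size();
  const auto end = xml.find(close, start);
  if (end == std::string::npos) return std::nullopt;
  return xml.substr(start, end - start);
}

// Names the three states an application may need to tell apart.
std::string describe_state(const std::optional<std::string>& value) {
  if (!value) return "missing";
  if (value->empty()) return "empty";
  return "value";
}
```

With a plain `std::string` member, the first two states would both end up as an empty string.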

The source code uses the least restrictive license, less restrictive than the GPL. The project is supported and managed from here:

http://www.codeproject.com/KB/XML/XMLFoundation.aspx

Now that it's the year 2010, I believe that nobody else will attempt to reinvent the wheel because there are a few to choose from. IMHO - this wheel is the most polished and well rounded implementation available.

Enjoy.
