
The Problem with XML

March 16th, 2009 Richard Luck

We’ve been working hard on a new connector type for Bluyah: XML. On the surface, this seems like an easy nut to crack: we’re already parsing RSS and Atom feeds, both of which are XML-based. But the deeper we dig, the more complex it becomes.

The problem lies in the XML format itself. XML can be anything its author wants it to be. It’s self-describing, sure, but it isn’t predefined. In other words, nothing in the structure or definition of the document tells you that, in the following snippet, the ‘row’ elements hold the rows of data to be reported on and their sub-elements hold the ‘column’ data:

<report>
    <head>
        <title>Fremont Pizza Parlors</title>
        <records_found>2</records_found>
        <metadata>
            <parlor length="32" source="name">Parlor</parlor>
            <street length="64" source="street">Address</street>
            <city length="64" source="city" />
            <state length="2" source="st" />
            <zip length="9" source="zip_code" />
            <phone length="10" source="telephone" />
        </metadata>
    </head>
    <data>
        <row>
            <parlor>Piecora's New York Pizza</parlor>
            <street>1401 E Madison St</street>
            <city>Seattle</city>
            <state>WA</state>
            <zip>98102</zip>
            <phone>(206) 322-9411</phone>
        </row>
        <row>
            <parlor>Pagliacci Pizza</parlor>
            <street>426 Broadway E</street>
            <city>Seattle</city>
            <state>WA</state>
            <zip>98102</zip>
            <phone>(206) 726-1717</phone>
        </row>
    </data>
</report> 

As you can see from the sample above, the data we want to loop over for reporting is contained within the ‘data’ element, and the rows are aptly named ‘row’. But what if they were named ‘places_i_like’? And what if they sat alongside the ‘head’ element instead of inside ‘data’? How could an application know this without human intervention?

Well … we think we have a way to ‘discover’ the likely candidates, and we will introduce that feature in 1.0.2 (see the Product Roadmap for details on several upcoming features).

A minimal amount of human intervention will still be required, but we believe we can drastically reduce the technical hurdles, enough that the tool will be easy for non-technical users to understand.
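To make the idea of ‘discovering’ candidates concrete, here is a rough sketch of one plausible heuristic in Python. It is an illustration only, not the discovery feature planned for 1.0.2, which this post doesn’t describe. The assumption is that a row is any element that appears at least twice under the same parent and has sub-elements of its own; candidates are scored by repetition count times the number of column names shared by every repeat. The function names `rank_row_candidates` and `rows_to_records` are hypothetical, and the sample deliberately uses the ‘places_i_like’ variant sitting beside ‘head’.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical sample: rows are named 'places_i_like' and sit beside 'head'.
SAMPLE = """<report>
  <head>
    <title>Fremont Pizza Parlors</title>
    <metadata>
      <parlor length="32" source="name">Parlor</parlor>
      <city length="64" source="city" />
    </metadata>
  </head>
  <places_i_like>
    <parlor>Piecora's New York Pizza</parlor>
    <city>Seattle</city>
  </places_i_like>
  <places_i_like>
    <parlor>Pagliacci Pizza</parlor>
    <city>Seattle</city>
  </places_i_like>
</report>"""


def rank_row_candidates(root):
    """Hypothetical heuristic (not Bluyah's algorithm): return
    (score, parent_element, row_tag) tuples, best first."""
    candidates = []
    for parent in root.iter():  # iter() includes root itself
        counts = Counter(child.tag for child in parent)
        for tag, n in counts.items():
            if n < 2:
                continue  # a single element is not a repeating "row"
            rows = [child for child in parent if child.tag == tag]
            if not all(len(row) > 0 for row in rows):
                continue  # rows need sub-elements to act as columns
            # Column names present in every repeated element.
            shared = set.intersection(*({col.tag for col in row} for row in rows))
            candidates.append((n * len(shared), parent, tag))
    # Sort on the score only; Element objects are not orderable.
    return sorted(candidates, key=lambda c: c[0], reverse=True)


def rows_to_records(parent, row_tag):
    """Flatten each direct child named row_tag into a {column: text} dict."""
    return [{col.tag: (col.text or "").strip() for col in row}
            for row in parent.findall(row_tag)]


if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    score, parent, tag = rank_row_candidates(root)[0]
    print(parent.tag, tag, score)  # report places_i_like 4
    for record in rows_to_records(parent, tag):
        print(record)
```

In this sketch the top-ranked candidate is what a user would be asked to confirm, which is where the small amount of human intervention comes in. The ‘metadata’ block is skipped because none of its children repeat. Real-world feeds (namespaces, nested repeats, optional columns) would need a more careful scoring rule.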

In addition, around release 1.1.0 we will publish the specification for Bluyah’s Basic Reporting Syndication (“BRS”) format, which is based on Bluyah’s own XML export format (see this XML export for an example). At that point, the Bluyah application will also let you create Connectors to BRS-formatted data sources.

Keep your eyes on this space for more details as they become available.


