Apps  Contact  Seminars 

Denormalization – Database and XML documents

by Amrinder Arora
April 18th, 2008

Denormalizing is a process in which hierarchical data is flattened, and data elements common to the records are copied across those records.

Denormalizing a database
The term “Denormalization” is usually used in the context of a database. For example, consider a database consisting of “Project” and “Phase” tables. Let us assume that there exist two projects, each with 4 phases.

In a normalized structure, Phase table has an FK into the Project table, and the information for project (such as the project name) is stored in the project table (the parent table), and is referred from the Phase table (the child table). It kinda looks like this:

Project table Phase table
ID Name
1 New OS
2 New Browser

 

Project ID Phase ID Phase Name
1 1.a Market Research
1 1.b Design
2 2.1 Design
….Other phases…

 

In a denormalized, or a flat structure, both tables can be morphed into one table, say ProjectPhase, and that has information about both Projects and Phases. In such a structure, the project name would be copied at each record corresponding to each phase of the project.

Project ID Project Name Phase ID Phase Name
1 New OS 1.a Market Research
1 New OS 1.b Design
2 New Browser 2.1 Design
….Other phases…


Denormalizing an XML document using XSLT

Ok, now that we have a lengthy discussion on denormalization behind us, let us look at XML denormalization. XML denormalization works, the same way, and hierarchy XML structure can be denormalized into a “flatter” XML document.

Consider this hierarchical (normalized) XML file:

<AllProjects>
<project project_id=”2″>
<name>Brand New Browser</name>
<phase id=”2.A”>
<desc>Firefox Cleanup</desc>
<task priority=”1″><resource>M</resource></task>
<task priority=”2″><resource>M</resource></task>
</phase>
<phase id=”2.B”>
<desc>Rebranding</desc>
<task priority=”1″><resource>B007</resource></task>
<task priority=”2″><resource>SC</resource></task>
</phase>
</project>
………..
</AllProjects>

Consider this corresponding denormalized XML file:

<AllTasks>
<task>
<projectid>2</projectid>
<projectname>Brand New Browser</projectname>
<phaseid>2.A</phaseid>
<phasedesc>Firefox Cleanup</phasedesc>
<priority>1</priority>
<resource>M</resource>
</task>
<task>
<projectid>2</projectid>
<projectname>Brand New Browser</projectname>
<phaseid>2.A</phaseid>
<phasedesc>Firefox Cleanup</phasedesc>
<priority>2</priority>
<resource>M</resource>
</task>

<AllTasks>

This denormalization can be done very easily using an XSLT. The following XSLT conveys some concepts used in this:

<?xml version = “1.0” encoding = “UTF-8″?>
<xsl:transform xmlns:xsl = “http://www.w3.org/1999/XSL/Transform” version = “1.0”>
<xsl:template match = “AllProjects”>
<AllTasks>
<xsl:apply-templates select = “project/phase/task”/>
</AllTasks>
</xsl:template>
<xsl:template match = “task”>
<xsl:element name = “task”>
<projectid><xsl:value-of select=”../../@project_id”/></projectid>
<projectname><xsl:value-of select=”../../name”/></projectname>
<phaseid><xsl:value-of select=”../@id”/></phaseid>
<phasedesc><xsl:value-of select=”../desc”/></phasedesc>
<priority><xsl:value-of select=”@priority”/></priority>
<resource><xsl:value-of select=”resource”/></resource>
</xsl:element>
</xsl:template>
</xsl:transform>

Enter Comments