Monday, May 9, 2016

MISMO Myth Busted: We can use MISMO as a design for a DB

MISMO Myth Busted: We can use MISMO as a design for a DB

Last time I wrote “An XSLT filter can be written to read the MISMO RM in XSD (XML Schema Definition) form and produce the DDL(Data Definition Language, part of SQL[Structured Query Language] to create all the tables with synthetic keys. (Do not expect MISMO to publish this XSLT or DDL). I got thinking how hard could it be?

DDL Zip is a link to Google Doc location that has the XSLT set that created MS SQL Server DDL

  •     MakeDDLPart1.xslt is an XSLT 2.0 file that reads the MISMO 3.4 combined schema and writes MS SQL DDL to produce a table for each data carrying container element in MISMO.
  •          MakeTablesetPart1.sql is an SQL script produced by MakeDDLPart1.xslt that creates a new database with a table column for each MISMO data point. Each table has an identity column named IDNum that is the primary key, and a column ParentIDNum use to foreign key to its parent table.
  •          MakeDDLPart2.xslt is an XSLT 2.0 file that reads the MISMO 3.4 combined schema and writes MS SQL DDL to produce a table for each non-data carrying container element in MISMO.
  •          MakeTablesetPart2.sql produced by MakeDDLPart2.xslt creates a table for “Container” element that has other container elements as its children. Each table has an identity column named IDNum that is the primary key, and a column ParentIDNum use to foreign key to its parent table.
  •          MakeDDLPart3.xslt is an XSLT 2.0 file that reads the MISMO 3.4 combined schema and writes MS SQL DDL to produce a foreign key between the parent and child elements of the MISMO Model
  •      MakeTablesetPart3.sql produced by MakeDDLPart3.xslt creates a foreign key from each child container table to its parent table.

I am always impressed with how much you can get done in a few lines of XSLT (not as much and you could with a line of APL[1] but that was long long ago). The total length of file MakeDDLPart1.xslt is 317 lines.  There is a block of 195 lines the spit out the test to drop and create the database if needed. And 76 lines are used to pick which SQL data type to use for each MISMO data type.

The resulting DDL files have been tested on SQL 2014 DB. 

As I have written last week the MISMO reference model is a tree.  A tree structure is a proper subset of the Relational Model implemented by modern DB systems.  MISMO is a tree because XML is a tree and XML was selected to be the format for delivering data from one trading partner to another.  In other words the XML represents data in motion.  We are building a DB and therefore it is for data at rest.

Because the Reference Model is a tree the relational model will also be a tree. In the XML there are only two kinds of relationships one:one and one:many. This makes the foreign key creation easy. Every child has exactly one parent. But a parent may have many children.

Let’s work through an example on the BORROWER concept.  We have created automatically a table named BORROWER_DETAIL. In the XML BORROWER_DETAIL is a child of BORROWER.  We need a table named BORROWER that is the parent of BORROWER_DETAIL. There is a 1:1 relationship between BORROWER and BORROWER_DETAIL. So BORROWER_DETAIL needs a non-Null column named ParentIDNum [bigint] that is foreign keyed to the BORROWER table IDNum primary key.

A borrower may have many aliases. The table ALIASES also has a foreign key from its ParentIDNum to the primary key of the BORROWER table IDNum primary key. The table ALIAS has a foreign key making a 1:many relationship, back to ALIASES.

Some would say that the ALIASES table serves no purpose in the data base, and they would be right.  One could just have easily that made a foreign key from the ALIAS to BORROWER. 
However, the advantages of following the MISMO RM in constructing the DB are:
  •       Table and key generation is an algorithm applied to the data of the reference model.
  •          Shredding XML into columns then also become algorithmic.
  •          All the information needed to produce MISMO XML is contained in the DB model

There are a couple of wrinkles handled in the XSLT scripts

EXTENSION data

Every container element in MISMO has an Extension element.  We do not need extension tables.  When receiving MISMO XML and shredding it into columns SQL Server has a mechanism for saving anything received that is not in the schema. We will use that mechanism for extensions.

Table and key names.

In most cases there is a complex type with a name equal to the element name it defined.  In some cases we want more than one element to have the same structure.  For example the collateral asset SUBJECT_PROPERTY is the same structure as for other real estate that the borrower owns and the same for comparable properties used in the appraisal.

Elements or Attributes

Most data points are represented as elements in XML.  We handle the few cases that are attributes.

R References 
  1. [1[ https://en.wikipedia.org/wiki/APL_(programming_language)

[]
[]

1 comment:

  1. Great article! Works like a charm to get the DB setup. How do you recommend importing the actual data into these generated tables?

    ReplyDelete