MISMO Myth Busted: We can use MISMO as a design for a DB
Last time I wrote “An XSLT filter can be written to read the
MISMO RM in XSD (XML Schema Definition) form and produce the DDL(Data
Definition Language, part of SQL[Structured Query Language] to create all the
tables with synthetic keys. (Do not expect MISMO to publish this XSLT or DDL).
I got thinking how hard could it be?
DDL Zip is a link to Google Doc location that has the XSLT set that created MS SQL Server DDL
- MakeDDLPart1.xslt is an XSLT 2.0 file that reads the MISMO 3.4 combined schema and writes MS SQL DDL to produce a table for each data carrying container element in MISMO.
- MakeTablesetPart1.sql is an SQL script produced by MakeDDLPart1.xslt that creates a new database with a table column for each MISMO data point. Each table has an identity column named IDNum that is the primary key, and a column ParentIDNum use to foreign key to its parent table.
- MakeDDLPart2.xslt is an XSLT 2.0 file that reads the MISMO 3.4 combined schema and writes MS SQL DDL to produce a table for each non-data carrying container element in MISMO.
- MakeTablesetPart2.sql produced by MakeDDLPart2.xslt creates a table for “Container” element that has other container elements as its children. Each table has an identity column named IDNum that is the primary key, and a column ParentIDNum use to foreign key to its parent table.
- MakeDDLPart3.xslt is an XSLT 2.0 file that reads the MISMO 3.4 combined schema and writes MS SQL DDL to produce a foreign key between the parent and child elements of the MISMO Model
- MakeTablesetPart3.sql produced by MakeDDLPart3.xslt creates a foreign key from each child container table to its parent table.
I am always impressed with how much you can get done in a
few lines of XSLT (not as much and you could with a line of APL[1] but that was
long long ago). The total length of file MakeDDLPart1.xslt is 317 lines. There is a block of 195 lines the spit out
the test to drop and create the database if needed. And 76 lines are used to
pick which SQL data type to use for each MISMO data type.
The resulting DDL files have been tested on SQL 2014
DB.
As I have written last week the MISMO reference model is a
tree. A tree structure is a proper
subset of the Relational Model implemented by modern DB systems. MISMO is a tree because XML is a tree and XML
was selected to be the format for delivering data from one trading partner to
another. In other words the XML
represents data in motion. We are
building a DB and therefore it is for data at rest.
Because the Reference Model is a tree the relational model
will also be a tree. In the XML there are only two kinds of relationships
one:one and one:many. This makes the foreign key creation easy. Every child has
exactly one parent. But a parent may have many children.
Let’s work through an example on the BORROWER concept. We have created automatically a table named BORROWER_DETAIL. In the XML BORROWER_DETAIL
is a child of BORROWER. We need a table named BORROWER that is
the parent of BORROWER_DETAIL.
There is a 1:1 relationship between BORROWER and BORROWER_DETAIL.
So BORROWER_DETAIL
needs a non-Null column named ParentIDNum [bigint] that is foreign keyed to the
BORROWER table IDNum
primary key.
A borrower may have many aliases. The table
ALIASES also has a foreign key from its ParentIDNum to the primary key of the
BORROWER table IDNum primary key. The table ALIAS has a foreign key making a
1:many relationship, back to ALIASES.
Some would say that the ALIASES table serves no
purpose in the data base, and they would be right. One could just have easily that made a
foreign key from the ALIAS to BORROWER.
However, the advantages of following the MISMO
RM in constructing the DB are:
- Table and key generation is an algorithm applied to the data of the reference model.
- Shredding XML into columns then also become algorithmic.
- All the information needed to produce MISMO XML is contained in the DB model
There are a couple of wrinkles handled in the
XSLT scripts
EXTENSION data
Every container element in MISMO has an
Extension element. We do not need
extension tables. When receiving MISMO
XML and shredding it into columns SQL Server has a mechanism for saving
anything received that is not in the schema. We will use that mechanism for
extensions.
Table and key names.
In most cases there is a complex type with a
name equal to the element name it defined.
In some cases we want more than one element to have the same
structure. For example the collateral
asset SUBJECT_PROPERTY is the same structure as for other real estate that the
borrower owns and the same for comparable properties used in the appraisal.
Elements or Attributes
Most data points are represented as elements in
XML. We handle the few cases that are
attributes.
R References
[]
[]
Great article! Works like a charm to get the DB setup. How do you recommend importing the actual data into these generated tables?
ReplyDelete