MISMO Myth Buster: We cannot use MISMO Because…
Introduction
“We cannot use MISMO because we are modern people and prefer Jason. XML is so year 2000.”
“We cannot use MISMO because we are people rooted in proven traditions of COBOL copybooks and DASD.”
“We cannot use MISMO because we are people who know and love database systems”
Over the next few blogs we will cover each of these variants of the myth in detail. Before we do the database reason we need some background
Background
MISMO publishes the Reference model (RM) as an XSD file
because in that form it is directly usable by the most MISMO constituents. It
could just as easily be published in UML or the proprietary model format of
some modeling platform. We surveyed the largest organizations in the industry
and found no consensus on a modeling platform. Not all modeling platforms
accept UML. Most small organization had
only passed familiarity with UML if any.
So MISMO published its model as and XML Schema.
What is an RM? Here are the things that a reference model should do:
- Define data points giving them names, definitions and data types.
- Gather together data points that support a similar business concept using normalization rules.
- Detail the cardinality of the sets of data points ( Optional, Required, May Occur many times)
- Detail the relationship between business concepts.
The MISMO RM does all of this.
One thing that make the XML model a little less general than
say a relational model is that an XML model is a tree. Therefore the MISMO RM could be implemented
in any implementation platform that supports a tree structure. A tree structure is so traditional you can
build it in COBOL copy books on DASD. A tree structure is a proper subset of
the relational model so it can be implemented in an SQL relational database.
Jason can express tree structures.
The Database case
Suppose you we going to create a brand new database based on
the MISMO RM. For now we will accept that you want to do that and leave to a
later blog the topic of choosing to use the DB to store the XML or choosing to
shred the XML data points into columns of tables. The support for XML columns
in Oracle and MS SQL server and others is very good and faster than you
probably expect. But for now we will discuss the shredding approach.
Defining the logical level of a database implementation
involves these steps.
- Define data points giving them names, definitions and data types.
- In SQL this means writing DDL (Data Definition Language)
- Gather together data points that support a similar business concept using normalization rules.
- The DDL defines tables and the columns with their data types.
- All of the MISMO data types have SQL equivalent data types
- The MISMO RM is normalized. The XML container elements that contain data points can be viewed as tables.
- Relational model normalization requires that each row of a table have a primary key.
- If you must have natural keys create a composite key from the natural key for the table. This may take some figuring out.
- If you are OK with synthetic keys add an auto-increment column.
- An XSLT filter can be written to read the MISMO RM in XSD form and produce the DDL to create all the tables with synthetic keys. (Do not expect MISMO to publish this XSLT or DDL)
- Detail the cardinality of the sets of data points ( Optional, Required, May Occur many times)
- In the relational model foreign keys connect table.
- The MISMO RM XML Tree uses 1:1 and 1:many relationships by containment. These are expressed as contained elements that contain other containers.
- Build the foreign key DDL by following the XML container relationships.
- An XSLT can be written to read the MISMO RM XSD representation and produce the DDL.
- Detail the relationship between business concepts.
- As above the XML tree defined the relationships between tables.
Implementation specific Roadblocks and Refinements
Column Name Length
Depending on your choice of SQL database there may be some
additional work to do. MS SQL can
support the 80 character maximum column names in MISMO. Oracle has a smaller
limit. In the Oracle case you will need a two column file (XML or CSV does not
matter) where one column is the MISMO data point name and the other the Oracle
Column name you want. The XSLT will use that as a reference when making the
Oracle DDL.
Reused containers
The MISMO RM uses some container repeatedly in several
contexts. For example the ADDRESS container is used in 329 contexts in version
3.4 for example here are eight.
//DEAL/ASSETS/ASSET/ASSET_HOLDER/ADDRESS
//DEAL/ASSETS/ASSET/OWNED_PROPERTY/PROPERTY/ADDRESS
//DEAL/ASSETS/ASSET/OWNED_PROPERTY/PROPERTY/HOMEOWNERS_ASSOCIATIONS/HOMEOWNERS_ASSOCIATION/ADDRESS
//DEAL/ASSETS/ASSET/OWNED_PROPERTY/PROPERTY/PROPERTY_UNITS/PROPERTY_UNIT/ADDRESS
//DEAL/COLLATERALS/COLLATERAL/PLEDGED_ASSET/ASSET_HOLDER/ADDRESS
//DEAL/COLLATERALS/COLLATERAL/PLEDGED_ASSET/OWNED_PROPERTY/PROPERTY/ADDRESS
//DEAL/COLLATERALS/COLLATERAL/PLEDGED_ASSET/OWNED_PROPERTY/PROPERTY/HOMEOWNERS_ASSOCIATIONS/HOMEOWNERS_ASSOCIATION/ADDRESS
//DEAL/COLLATERALS/COLLATERAL/PLEDGED_ASSET/OWNED_PROPERTY/PROPERTY/PROPERTY_UNITS/PROPERTY_UNIT/ADDRESS
In a good database design you would not want 329 ADDRESS
tables. You would want one with 329
different foreign keyed tables pointing to rows in it. The DDL would need to be
further manipulated to accomplish this.
It is unclear whether this step could be automated. Probably.
Conclusion
The MISMO Reference model can be implemented in a relational
database of your choice.
Next time we will dive deeper into making a DB from the MISMO model.
No comments:
Post a Comment