Friday, October 5, 2018

Blockchain and MISMO v3.x Document

Compare and contrast

Introduction


There is a lot of hype about Blockchain. As a result, it is difficult to grasp how much of it is “poetic license” and how much is real. One way to understand Blockchain is to look at how it is constructed.

When Blockchain is used for a cryptocurrency, it carries a collection of features that do not apply when Blockchain holds a time-ordered immutable registry, as would be used to manage non-currency assets such as a mortgage collateral ownership registry, a default process registry, or a loan construction registry.
We do not need:
  • Voting algorithms – Our registries will be shared with trusted clients.
  • Solving complex cryptographic puzzles – Cryptocurrency transaction rates are slow mainly because the network deliberately delays adding links to the chain, so that a double spend attempt is detected before it is accepted. The delay comes from requiring the hash value of each new block to meet certain conditions, which takes many attempts to satisfy.
  • The complexities of running a distributed database – Our in-house registries may need to reside on a few parallel platforms but that is a lot easier to solve than managing tens of thousands of copies.

Without these features what does Blockchain become? A linked list where the hash value of the previous link is in the block being constructed.
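
The stripped-down structure described above can be sketched in a few lines of Python. This is a minimal illustration, not a production design: the block layout, the function names, and the use of JSON and SHA-256 are all assumptions made for the example.

```python
import hashlib
import json


def block_hash(block: dict) -> str:
    """Return the SHA-256 hex digest of a block's canonical JSON form."""
    encoded = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def append_block(chain: list, data: str) -> None:
    """Append a block that records the hash of the block before it."""
    previous_hash = block_hash(chain[-1]) if chain else "0" * 64
    chain.append({"index": len(chain), "data": data, "previous_hash": previous_hash})


def chain_is_valid(chain: list) -> bool:
    """Check that every block still points at the true hash of its predecessor."""
    return all(
        chain[i]["previous_hash"] == block_hash(chain[i - 1])
        for i in range(1, len(chain))
    )


registry = []
append_block(registry, "collateral ownership recorded")
append_block(registry, "ownership transferred")
append_block(registry, "default process opened")

print(chain_is_valid(registry))  # True

# Changing an earlier block breaks the link stored in the block after it.
registry[1]["data"] = "ownership transferred to someone else"
print(chain_is_valid(registry))  # False
```

Note that the last block has no successor to vouch for it, so in this sketch a change to the newest block goes unnoticed until another block is appended or the block is signed.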

Blockchain is not new. It is the oldest technology on the planet. In fact right now inside your cells DNA strands are being used to create the various proteins needed for your biochemistry to work. DNA is a linked list of protein construction instructions with error correcting hash values that the RNA replication process uses to make sure the construction is correct. 

Blockchain, at its core, is not at all new. Therefore, it should not be a surprise that this core idea can be found elsewhere in technology. One place is the MISMO document structure.

The Basics

Linked List

The linked list is one of the basic data structures in software engineering. Every developer who has taken a data structures class has done a homework assignment with linked lists. Linked lists can be implemented in machine language and in every programming abstraction layer and language that sits inside the computer. On some file systems, a file on disk is a linked list of disk blocks. Memory allocation in an operating system often uses linked lists.



Hash Function


Hash functions have been around almost as long as linked lists. The idea of a hash function is to have a simple algorithm that takes an arbitrary sequence of bits and returns a value of known length. Different algorithms have different properties. For example, the Soundex function converts a last name into a short code so that similar-sounding names receive the same or similar codes.


For cryptographic hashing we want just the opposite behavior: two input strings that differ by only one bit should produce hash values that are arbitrarily different. It is easy to understand why. If you knew Alice’s Soundex code, you could repeatedly guess last names until you found one with the same Soundex value. A cryptographic hash function must be irreversible in practice: given a hash value, it should be infeasible to find an input that produces it. The Soundex function fails that test.
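
The one-bit property can be observed directly. The sketch below assumes SHA-256 from the Python standard library as the cryptographic hash and uses a made-up input string; it flips a single input bit and counts how many of the 256 output bits change, which is typically about half.

```python
import hashlib


def sha256_bits(data: bytes) -> int:
    """Return the SHA-256 digest of data as a 256-bit integer."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


original = b"Alice Smith"
# Flip the lowest bit of the last byte: "h" (0x68) becomes "i" (0x69).
altered = original[:-1] + bytes([original[-1] ^ 0x01])

# XOR the two digests and count the 1 bits to see how many positions differ.
differing_bits = bin(sha256_bits(original) ^ sha256_bits(altered)).count("1")

print(hashlib.sha256(original).hexdigest())
print(hashlib.sha256(altered).hexdigest())
print(f"{differing_bits} of 256 digest bits differ")
```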

Many times a hash value is published with a file. Alice downloads a file containing an installer for some new software. The publisher of the software also publishes the hash value. Alice recalculates the hash after downloading. If the values match, Alice knows that her copy matches the original file bit for bit.
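
Alice's check might look like the following sketch. The function name and the stand-in installer bytes are hypothetical, and SHA-256 is assumed as the published hash type; a real check would read the downloaded file from disk.

```python
import hashlib
import hmac


def matches_published_hash(content: bytes, published_hex: str) -> bool:
    """Recompute the SHA-256 of content and compare it with the published value."""
    actual_hex = hashlib.sha256(content).hexdigest()
    return hmac.compare_digest(actual_hex, published_hex.strip().lower())


# Stand-in for the installer bytes; a real check would read the downloaded file.
installer = b"pretend this is the installer"
published = hashlib.sha256(installer).hexdigest()  # what the publisher would post

print(matches_published_hash(installer, published))            # True
print(matches_published_hash(installer + b"\x00", published))  # False
```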

Signing the Hash

You can use encryption to authenticate the source of a message. For example, Alice wants Bob to know that a PDF file came from her. She sends Bob the file and the hash of the file. However, this time she encrypts the hash with her private key and sends that to Bob. Bob then performs these steps:
  • Decrypts the hash with Alice’s public key.
  • Generates the hash value of the file.
If the hash values match then Bob now knows these things:
  • Bob’s copy of the file matches bit for bit with the file Alice sent. If it did not, the hash would not match.
  • The file came from Alice. Only a value encrypted with Alice’s private key decrypts correctly with her public key. Since the decrypted hash value matched the hash of the document, Bob knows the hash came from Alice.
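
The exchange between Alice and Bob can be sketched with textbook RSA using only the Python standard library. Everything here is a toy chosen for illustration: the key is built from two well-known Mersenne primes and is far too small for real use, there is no padding scheme, and the function names are hypothetical. A real system would rely on a vetted cryptography library.

```python
import hashlib

# Toy key pair for illustration only: the modulus is far too small for real
# use and no padding scheme is applied.
P = 2**89 - 1
Q = 2**127 - 1
N = P * Q                              # public modulus
E = 65537                              # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))      # private exponent (needs Python 3.8+)


def file_hash(content: bytes) -> int:
    """SHA-256 of the content, reduced so it fits under the toy modulus."""
    return int.from_bytes(hashlib.sha256(content).digest(), "big") % N


def sign(content: bytes) -> int:
    """Alice: transform the hash with her private key."""
    return pow(file_hash(content), D, N)


def verify(content: bytes, signature: int) -> bool:
    """Bob: recover the hash with Alice's public key and compare with his own."""
    return pow(signature, E, N) == file_hash(content)


pdf = b"%PDF-1.7 pretend document"
signature = sign(pdf)

print(verify(pdf, signature))               # True
print(verify(pdf + b" edited", signature))  # False
```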

The Chain and the Onion

Now we will combine these tools to solve specific problems. First we will construct a simple block chain where Alice agrees to sell Bob a file and Bob pays Alice for the file. Then we will tackle a use case about documents.

The Chain

In Block 1 Alice places the URL of the file she is selling to Bob along with the hash of the file. Both are encrypted first with her private key, then with Bob’s public key. No one except Bob and Alice can access the file because the URL is encrypted. Only Bob can remove the outer layer of encryption, using his private key. He then removes the inner layer with Alice’s public key, which proves to Bob that the block came from Alice. The hash value proves the file is bit for bit the same one Alice sent.

Block 2 is the transaction where Bob pays Alice. The data of the transaction could be a cryptocurrency or wire transfer information. All of it is encrypted with Bob’s private key, then with Alice’s public key.


In a real system that used Blockchain, all the hashing, encrypting and decrypting would be handled by software rather than done manually.

Now that Bob has the file, a document perhaps, how does he know it is authentic? He knows it has not changed since it left Alice.


What happened to it before it came to Alice’s care? For this we need to build a document structure with an internal audit trail.

The Onion

Most documents in the real estate finance space pass through many hands (systems) before they are preserved for posterity. A Blockchain is a good place to save it for posterity but the document itself can contain its own history. Here is a typical flow:

  • A document preparation company, or a group in a large lender, maintains the requirements for each kind of document in each jurisdiction. They do not want the document changed along the way, other than filling in form fields.
    • It is expensive to audit those documents post-closing.
    • It can be even more expensive to miss a change.
  • The lender fills in some form fields and sends it on to the closing/escrow agent.
    • The lender does not want any changes to the facts they entered.
  • The closing/escrow agent needs to add certain data without disturbing what the lender entered.
    • Under “Know Before You Owe” (aka TRID) the lender is responsible for the document. Lenders should sign off on last minute changes.
    • Post-closing audits of document changes can be expensive.
    • Missing a change can be even more expensive.
  • Many documents must be legally signed by the borrowing customer.
    • In the interest of better efficiency many lenders are embracing electronic signatures.
  • On some documents the signature needs to be notarized.
    • More and more states are allowing electronic notarization.
  • Some documents must be recorded in the property records system of the local jurisdiction.
    • The identification information about the recordation needs to be placed in the document itself.
Each step along the way must be done without disturbing the document information collected so far. Each organization along the way faces huge risk if the data they added is changed further down the pipe. Open edit documents do not meet the needs of the trust profile we must have. PDF forms allow one party to enter data and sign a document but lack this assembly line feature. What to do?

UETA (Uniform Electronic Transactions Act)

UETA is the model state law that gives electronic transactions the same legal status as wet ink signed transactions. Nearly every state has adopted it or a similar law that allows enforceability of state level transactions, and the federal ESIGN Act (Electronic Signatures in Global and National Commerce Act) plays the same role at the national level. The Federal and most state laws define a safe harbor set of criteria that an electronic transaction must meet in order to obtain legal protections. One of those provisions is the ability to audit the document.

MISMO

In order to meet the safe harbor conditions of UETA, MISMO and Fannie Mae created the SMARTDOC®™ technology (Securable, Manageable, Archive-able, Retrievable, Transferable DOCument). Any document in MISMO 3.x has these capabilities:
  • An Audit Trail of changes to the document (What changed, by whom, when)
  • The data presented on the document available to system processing
  • The document rendering views (PDF, HTML or SVG) as the document evolves
  • The signatures of the signatories
  • A set of system signatures

A future article will cover the other parts of SMARTDOC®™; for now we look at the system signatures collection.
Each signature uses the now familiar pattern:
  • A URL to the material being signed
    • A URL (Uniform Resource Locator) has a form that can point to places inside a document
  • A cryptographic hash value of that content
  • The encrypted value of the hash value

Example Onion

When a subset of a document is hashed and the hash value is then encrypted, we call that a “tamper evident seal”. Any single bit change within the scope of the hashing will cause the validation to fail, aka breaks the seal.

  • A document preparation company, or a group in a large lender, creates their document forms in MISMO 3.x format with a built-in system signature that wraps all the text and none of the data fields. This is their tamper evident seal.
    • It is inexpensive to audit those documents post-closing by testing the tamper evident seal.
  • The lender fills in some form fields and sends it on to the closing/escrow agent
    • The lender does not want any changes to the facts they entered so they add a system signature using their private key. The scope of their tamper evident seal includes the hash and signature of the doc prep system seal.
    • It is inexpensive to audit the lender added data post-closing by validating their tamper evident seal.
  • The closing/escrow agent needs to add certain data without disturbing what the lender entered.
    • The lender’s system signature proves they signed off on the data provided to the closing/escrow agent.
    • The closing/escrow agent’s system adds a tamper evident seal for the scope of changes they added. It should be outside the scope of the lender signature.
    • A post-closing audit of the document is a simple, inexpensive validation of the tamper evident seal.
  • Many documents must be legally signed by the borrowing customer.
    • In the interest of better efficiency most lenders are embracing electronic signatures.
    • A tamper evident seal that includes the borrower signatures is added after signing. The seal is created using the private key of the closing agent.
  • Electronic Notarization can be added and sealed
  • Some documents must be recorded in the property records system of the local jurisdiction.
    • The recording system or the system that receives the recording identifiers adds a tamper evident seal and includes in its scope the location of the identifiers and the hash of the previous seal.
  • The identification information about the recordation needs to be placed in the document itself.
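
The layering above can be sketched in Python. This is a simplified model, not the MISMO 3.x signature format: the party names, keys, and field values are hypothetical, and a keyed HMAC stands in for hashing followed by private-key encryption so that the example runs on the standard library alone. The point it shows is that each seal's scope includes the seal that came before it.

```python
import hashlib
import hmac

# Hypothetical per-party secrets standing in for each system's private key.
KEYS = {
    "doc_prep": b"doc-prep-secret",
    "lender": b"lender-secret",
    "closing_agent": b"closing-agent-secret",
}


def make_seal(party: str, content: str, previous_seal: str) -> str:
    """Seal this party's content together with the seal that came before it."""
    scope = (previous_seal + "\n" + content).encode("utf-8")
    return hmac.new(KEYS[party], scope, hashlib.sha256).hexdigest()


def add_layer(document: list, party: str, content: str) -> None:
    """Append a layer whose seal wraps the new content and the prior seal."""
    previous_seal = document[-1]["seal"] if document else ""
    seal = make_seal(party, content, previous_seal)
    document.append({"party": party, "content": content, "seal": seal})


def first_broken_seal(document: list):
    """Return the party whose seal fails first, or None if every seal holds."""
    previous_seal = ""
    for layer in document:
        expected = make_seal(layer["party"], layer["content"], previous_seal)
        if not hmac.compare_digest(expected, layer["seal"]):
            return layer["party"]
        previous_seal = layer["seal"]
    return None


document = []
add_layer(document, "doc_prep", "Promissory note form text")
add_layer(document, "lender", "LoanAmount=250000")
add_layer(document, "closing_agent", "ClosingDate=2018-10-05")

print(first_broken_seal(document))  # None

# A change to the lender's data after sealing is caught at the lender's seal.
document[1]["content"] = "LoanAmount=350000"
print(first_broken_seal(document))  # lender
```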

Conclusion

We have seen that where an immutable serial record is needed, a simple linked list that includes the hash of the previous link in its scope, together with signatures of the hashes, can be used. These same tools, linking, hashing and encryption, can be combined into many designs that provide immutability or tamper evidence to promote the security of electronic records.

Thursday, May 11, 2017

Extensions, using UCD as an example

UCD Extensions

Introduction

UCD (Uniform Closing Dataset) is one of the big Real Estate Finance changes for 2017-2018. In it Fannie Mae [1] and Freddie Mac [2] (The GSEs) will require delivery of the data set for all loans to be underwritten and guaranteed by them. This is a big deal because it represents about 60% of the current loan volume coming from Mortgage Banks and non-banks alike.

The basic design for UCD is the MISMO 3.3 release [3]. Unfortunately, timing prevented the GSEs from giving MISMO all the data they wanted to use in UCD in time for the 3.3 publication. As a solution the GSEs use extensions in the http://www.datamodelextension.org namespace. The specifications request the use of the “gse” abbreviation for http://www.datamodelextension.org.

Validating Extension Data

Example files provided by the GSEs will validate with respect to the MISMO 3.3 release because the content of the extension element need only be well-formed XML to pass the tests. MISMO as delivered does not validate inside other people’s namespace-based extensions.

The GSEs have not published and probably will not publish a schema that will validate the content of the “gse” extensions. In this blog we attempt to construct an unofficial schema for that purpose. It is intended to give interested parties a leg up on how to do this. We do not intend to keep it current with GSE changes. It is released AS IS to promote understanding.

There are four approaches to building extensions that can be used to validate XML files that contain the extensions:

1. Best Practice – After version 3.5 use the “redefines” method to add restrictions to the unique definitions of OTHER under each extension point. This is the best practice because you can add your extensions without editing the MISMO delivered files. (Actually you need to be using a catalog to achieve no editing.)
2. Acceptable practice – Before version 3.5 edit the release files to create unique definitions of OTHER then use the redefine method. It is acceptable because it limits which of the files that belong to MISMO (their copyright) you are editing.
3. OK practice – Edit the file that contains the Complex Type extensions (e.g. file MISMOComplexTypeExtensionsB299 in version 3.3) to add in the bridge to your namespace. Like number 2 above, it limits what you edit.
4. Poor practice – Edit anything you want. This is poor because it makes almost none of your work reusable when the time comes to upgrade to a later version. I see people do this who want to start with the “Combined” schemas MISMO publishes. I would suggest using one of the other methods and then creating a Combined schema of your own with a tool that “flattens” a schema. [5]

As part of this blog I have created the files needed to use the Acceptable Practice mentioned above.

Acceptable practice

In order to keep the UCD file names unique, I copied and renamed the MISMO files, then edited them. Here is a list of the changes made to the MISMO published 3.3 files. These files are found at [4].

| MISMO Name | UCD name | Description | Change |
| --- | --- | --- | --- |
| MISMO_3.3.0_B299 | MISMO_3.3.0_B299_UCD | Root schema | 1) Change xs:include to use the UCD files. Add xs:import to “gse” definitions. |
| MISMOComplexTypeExtensionsB299 | MISMOComplexTypeExtensionsB299_UCD | Defines the OTHER element for each extension point | 1) Change xs:include to use the UCD files. 2) Change OTHER definition to have unique definition. 3) Add xs:include for the MISMOExtensionDetails_UCD file. |
| MISMOComplexTypesB299 | MISMOComplexTypesB299_UCD | Defines container elements | 1) Change xs:include to use the UCD files. |
| MISMODataTypesB299 | MISMODataTypesB299_UCD | Defines data point data types | 1) Change xs:include to use the UCD files. |
| MISMOEnumeratedTypesB299 | MISMOEnumeratedTypesB299_UCD | Defines enumerated types | 1) Change xs:include to use the UCD files. |
| | MISMOExtensionDetails_UCD | Holds the new unique OTHER definitions. | |
| | gseUCDBridge.xsd | The MISMO namespace file that imports the GSE namespace material and redefines the OTHER definitions as needed. | |
| | gseUCD.xsd | Definitions of the gse namespace. | |

GSE Extensions

Overview

The GSEs have requested that the namespace http://www.datamodelextension.org be used with the abbreviation “gse”. (See unique ID 0.054 in the spreadsheet appendix_i_uniform_closing_dataset.xslx, tab “appendix_i_uniform_closing_dataset”. We will call the spreadsheet “the specifications”.)

If you are using an OASIS catalog [6], it will look up the schema file needed to support a namespace. If you are developing in .NET, your application may want a custom “resolver” to accomplish the same thing. [7]

Since we will be developing and testing XSD files and XML example files outside of a custom application we will include the file location of the gse extension schema in the MISMO/UCD schemas using an xs:import expression. We will also be including the file location of the MISMO_3.3.0_B299_UCD.xsd file in the XML root element.

It is considered a security risk to allow the source of the XML you receive to specify what schema to use. OASIS catalog implementations and .NET resolver implementations are set up to ignore those values. The XML editors I use are set up to use those values. When the “as is” files provided with this article are run through your system, the system SHOULD NOT use the schema locations embedded in them.
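
The idea of ignoring the sender's schema hint can be sketched in Python. This is only an illustration of the lookup, not an OASIS catalog implementation: the catalog dictionary, the file paths, and the function name are hypothetical, and the MISMO namespace URI shown is an assumption that should be checked against your release files. The Python standard library does not validate against XSD, so the sketch stops at choosing the schema file.

```python
import xml.etree.ElementTree as ET

# Hypothetical local catalog: namespace URI -> schema file we trust.
# The MISMO namespace URI below is an assumption; confirm it in your release.
LOCAL_CATALOG = {
    "http://www.mismo.org/residential/2009/schemas": "schemas/MISMO_3.3.0_B299_UCD.xsd",
    "http://www.datamodelextension.org": "schemas/gseUCD.xsd",
}


def trusted_schema_for(xml_text: str) -> str:
    """Choose the schema from the root namespace, never from the sender's hint."""
    root = ET.fromstring(xml_text)
    # Any xsi:schemaLocation attribute on the root is deliberately not consulted.
    namespace = root.tag[1:].split("}")[0] if root.tag.startswith("{") else ""
    if namespace not in LOCAL_CATALOG:
        raise ValueError(f"no trusted schema for namespace {namespace!r}")
    return LOCAL_CATALOG[namespace]


incoming = (
    '<MESSAGE xmlns="http://www.mismo.org/residential/2009/schemas" '
    'xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
    'xsi:schemaLocation="http://www.mismo.org/residential/2009/schemas '
    'http://attacker.example/evil.xsd"/>'
)
print(trusted_schema_for(incoming))  # schemas/MISMO_3.3.0_B299_UCD.xsd
```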

Attributes

The GSEs have used an attribute “gse:DisplayLabelText” on several elements. See Unique ID 10.354 in the specifications where it is defined for use on the “LiabilityType” element in LIABILITY_DETAIL.

The MISMO schema set as published allows attributes from any non-MISMO namespace to be used in the XML. Therefore, nothing specifically needs to be in our gseUCD.xsd to support it. However, it is best practice to always define everything in your namespace, so we include a global attribute to define it. The specification wants the maximum length of “gse:DisplayLabelText” to be 150 characters. We create a simple type to define that restricted length and use it in the attribute definition.
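
The same 150 character limit can also be checked outside the schema. The sketch below is a hypothetical helper, not part of the files released with this article; it scans an XML instance for the gse:DisplayLabelText attribute and reports elements whose value is too long. The sample fragment is made up for the example.

```python
import xml.etree.ElementTree as ET

GSE_NS = "http://www.datamodelextension.org"
MAX_LABEL_LENGTH = 150  # limit stated in the specifications


def overlong_display_labels(xml_text: str) -> list:
    """Return the tags of elements whose gse:DisplayLabelText exceeds the limit."""
    attribute = f"{{{GSE_NS}}}DisplayLabelText"
    root = ET.fromstring(xml_text)
    return [
        element.tag
        for element in root.iter()
        if len(element.get(attribute, "")) > MAX_LABEL_LENGTH
    ]


long_label = "x" * 151  # one character over the limit
sample = (
    f'<LIABILITY_DETAIL xmlns:gse="{GSE_NS}">'
    '<LiabilityType gse:DisplayLabelText="Credit card">Revolving</LiabilityType>'
    f'<LiabilityType gse:DisplayLabelText="{long_label}">Other</LiabilityType>'
    '</LIABILITY_DETAIL>'
)
print(overlong_display_labels(sample))  # ['LiabilityType']
```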

Elements

There are two parts to adding elements.
1. Change the definition of the EXTENSION/OTHER element to identify the GSE namespace element child of OTHER to be included (aka the bridge).
2. Define the GSE namespace element in its file.

The Bridge

In section Validating Extension Data we decided to use the “Acceptable Practice” approach. In it we redefine the unique OTHER definitions.

Schema Sample 1 The Bridge shows how to build the bridge:

Line 1 imports the GSE namespace definition.

Line 2 begins the redefinition of the content of the new MISMOExtensionDetails_UCD.xsd file, which we created to match the technique from version 3.5.

Line 3 begins the redefinition of the LIABILITY_DETAIL_OTHER_BASE complex type.

Line 4 declares that the content of LIABILITY_DETAIL_OTHER_BASE is complex, not simple (i.e. a single fact).

Line 5 declares that we are building this new definition of LIABILITY_DETAIL_OTHER_BASE by restricting it. The name of the complex type in line 3 MUST match the name of the complex type being restricted.

Line 6 declares that, like the original, the content of OTHER is a sequence; a sequence of just one item in this case.

Line 7 is a reference to the IntegratedDisclosureSectionType element defined in the file gseUCD.xsd.

Line 8 ends the sequence.

Line 9 ends the restriction.

Line 10 ends the complex content.

Line 11 ends the complex type.

Line 12 ends the redefine.

Schema Sample 1 The Bridge

1   <xsd:import namespace="http://www.datamodelextension.org" schemaLocation="gseUCD.xsd"/>
2   <xsd:redefine schemaLocation="MISMOExtensionDetails_UCD.xsd">
3     <xsd:complexType name="LIABILITY_DETAIL_OTHER_BASE">
4       <xsd:complexContent>
5         <xsd:restriction base="LIABILITY_DETAIL_OTHER_BASE">
6           <xsd:sequence>
7             <xsd:element ref="gse:IntegratedDisclosureSectionType"/>
8           </xsd:sequence>
9         </xsd:restriction>
10      </xsd:complexContent>
11    </xsd:complexType>
12  </xsd:redefine>

When we add additional GSE data points the pattern will be the same as in Schema Sample 1 The Bridge. The additional complex type redefinitions will go between lines 11 and 12.

If you have selected any other approach from Validating Extension Data the pattern will be similar. For the “OK practice” approach I would move the LIABILITY_DETAIL_EXTENSION complex type to the top of the file. Edit the LIABILITY_DETAIL_EXTENSION complex type shown in Schema Sample 2 Before to look like Schema Sample 3 After.

Schema Sample 2 Before

<xsd:complexType name="LIABILITY_DETAIL_EXTENSION">
  <xsd:sequence>
    <xsd:element name="MISMO" type="MISMO_BASE" minOccurs="0"/>
    <xsd:element name="OTHER" type="OTHER_BASE" minOccurs="0"/>
  </xsd:sequence>
</xsd:complexType>

Schema Sample 3 After

<xsd:complexType name="LIABILITY_DETAIL_EXTENSION">
  <xsd:sequence>
    <xsd:element name="MISMO" type="MISMO_BASE" minOccurs="0"/>
    <!-- The type attribute is dropped: an element cannot have both a type attribute and an inline complexType. -->
    <xsd:element name="OTHER" minOccurs="0">
      <xsd:complexType>
        <xsd:sequence>
          <xsd:element ref="gse:IntegratedDisclosureSectionType"/>
        </xsd:sequence>
      </xsd:complexType>
    </xsd:element>
  </xsd:sequence>
</xsd:complexType>

Enumerated Types
When a data point in the GSE namespace has an enumeration of valid values, the gseUCD.xsd file should contain those definitions.

<xsd:simpleType name="LateChargeEnum">
  <xsd:restriction base="xsd:string">
    <xsd:enumeration value=""/>
    <xsd:enumeration value="FlatDollarAmount"/>
    <xsd:enumeration value="NoLateCharges"/>
    <xsd:enumeration value="PercentageOfDelinquentInterest"/>
    <xsd:enumeration value="PercentageOfNetPayment"/>
    <xsd:enumeration value="PercentageOfPrincipalBalance"/>
    <xsd:enumeration value="PercentageOfTotalPayment"/>
    <xsd:enumeration value="PercentOfPrincipalAndInterest"/>
  </xsd:restriction>
</xsd:simpleType>

References