Monday 29 October 2012

Schema-less or Schema-more?

If you are an advocate of schema-less design, this post isn't for you. Someone will come along later and tidy up your mess, not to worry. If however you are the one trying to assemble a fragmented patchwork of conflicting information by unravelling a granny-knot of select distincts and full outer joins, fan traps, chasm traps, and traps that defy description, this might just help.

The rules for laying out a data model are not complicated. I'm not really sure why they aren't more widely followed: they enable the rapid acquisition of so much insight that you will quickly feel like you have an unfair advantage over others who do not approach data modelling or analysis this way.

The basic idea is that cardinality flows down the page, and process flows across it.

The rules are below. They are pretty dry, so you'll probably want to flick back to them. In my next post, I'll illustrate how they work and what you can learn from this visual language (the answer is: A LOT!)
  1. Detail flows down - 1:n parent-child relationships go vertically STRAIGHT down, with the parent at the top. If you find they are not straight, make the entities wider!
  2. Process flows left to right - 1:1 relationships or 1:n relationships that result from a state transition or high-level process are horizontally STRAIGHT across, with the parent to the left. If they start zig-zagging, make the entities taller!
  3. One subject area per page - As a rule of thumb that's between about 5 and 35 entities. If you need to make lines cross a lot, you are either looking at duplicated relationships, or multiple subject areas on a single diagram.
  4. Organise the entities into tiers - The tiers go down the page. I use Coad's UML entity types (Description, Party/Place/Thing, Role, Moment), expressed in more human-friendly language (Reference data, Master data, Agreements or Roles, Transactions).
  5. Represent Subtypes with the Barker notation
    • Use nested boxes for strongly-typed subtypes (when subtypes fall into different entities). 
    • Don't USUALLY represent weakly-typed subtypes on the diagram (i.e. when subtypes are distinguished by a SUBTYPE attribute rather than by an explicitly declared subtype).
    • Strongly typed subtypes are mutually exclusive (i.e. an entity can't be two things at once).
    • Make the subtypes complete (a subtype of "Other" draws attention to an area of incomplete understanding).
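
Rules 1 to 3 are mechanical enough to check in code. What follows is a rough Python sketch, not taken from any real modelling tool. It assumes each entity is a rectangle on the page and each relationship is tagged as either "detail" or "process". The names Box, Relationship and check_layout, the coordinates, and the Customer/Order example entities are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """A hypothetical entity rectangle on the page. y grows down the page."""
    left: float
    top: float
    right: float
    bottom: float


@dataclass(frozen=True)
class Relationship:
    parent: str
    child: str
    kind: str  # "detail" (1:n parent-child) or "process" (1:1 or state transition)


def spans_overlap(a_lo, a_hi, b_lo, b_hi):
    """True if the intervals [a_lo, a_hi] and [b_lo, b_hi] share a point."""
    return max(a_lo, b_lo) <= min(a_hi, b_hi)


def check_layout(boxes, relationships):
    """Return a list of rule violations; an empty list means none were found."""
    problems = []
    # Rule 3: the 5 to 35 range is only the rule of thumb from the text.
    if not 5 <= len(boxes) <= 35:
        problems.append(f"rule 3: {len(boxes)} entities on one page")
    for rel in relationships:
        p, c = boxes[rel.parent], boxes[rel.child]
        if rel.kind == "detail":
            # Rule 1: parent above child, joined by a straight vertical line.
            if p.bottom > c.top:
                problems.append(f"rule 1: {rel.parent} is not above {rel.child}")
            elif not spans_overlap(p.left, p.right, c.left, c.right):
                problems.append(
                    f"rule 1: {rel.parent} to {rel.child} is not straight, widen an entity")
        elif rel.kind == "process":
            # Rule 2: parent left of child, joined by a straight horizontal line.
            if p.right > c.left:
                problems.append(f"rule 2: {rel.parent} is not left of {rel.child}")
            elif not spans_overlap(p.top, p.bottom, c.top, c.bottom):
                problems.append(
                    f"rule 2: {rel.parent} to {rel.child} is not straight, make an entity taller")
        else:
            raise ValueError(f"unknown relationship kind: {rel.kind}")
    return problems


# Hypothetical example page: Box(left, top, right, bottom).
page = {
    "Product": Box(0, 0, 10, 5),
    "Customer": Box(15, 0, 25, 5),
    "Order": Box(15, 10, 25, 15),
    "Invoice": Box(30, 10, 40, 15),
    "Order Line": Box(0, 20, 25, 25),  # wide enough to sit under both parents
}
rels = [
    Relationship("Customer", "Order", "detail"),
    Relationship("Order", "Order Line", "detail"),
    Relationship("Product", "Order Line", "detail"),
    Relationship("Order", "Invoice", "process"),
]

# Same page, but with Order Line drawn too narrow to reach under Order.
cramped = {**page, "Order Line": Box(0, 20, 10, 25)}

for problem in check_layout(cramped, rels):
    print(problem)
```

In the cramped version the Order Line box is too narrow to sit under both of its parents, and the fix is exactly the one rule 1 suggests: make the entity wider. The sketch ignores tiers, subtypes and crossing lines, so treat it as a way of seeing the rules rather than a full layout checker.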