cohesion institute
Data Warehouse
Architecture Best Practices
December 5, 2005
Speaker: R. Michael Pickering
President, Cohesion Systems Consulting Inc.
cohesion institute
Agenda




Introductions
Business Intelligence Background
Architecture Best Practices
Questions & Answers
October 3, 2015
DW Architecture Best Practices
2
cohesion institute
Data Warehouse
Architecture Best Practices
Introductions
cohesion institute
Presenter Biography

R. Michael Pickering
President and Chief Architect,
Cohesion Systems Consulting Inc.




over 8 years DW experience


previously, Managing Consultant, BI&W, Oracle
Consulting (Canada)
before that, Red Brick Systems, Inc.
Manulife Reinsurance, Bell Canada, USDA, Kraft
Foods, LCBO, Telecom Argentina, Nortel Networks,
Procter & Gamble, Bayer, Syncrude, OMoHLTC…
Mr. Pickering has had DW articles published
in The Handbook of Data Management
October 3, 2015
DW Architecture Best Practices
4
cohesion institute
Cohesion Systems Consulting

Provides DW and BI services,
specializing in:





Architecture & Implementation Consulting
Project Management
Databases, Appliances & Emerging
Technology
Training & Mentoring
Since inception in 2000, clients have
included Enbridge, CIBC, The Bank of
New York, Loyalty Management Group,
Canada Post Borderfree, Katz Group
October 3, 2015
DW Architecture Best Practices
5
cohesion institute
Audience Survey

By a show of hands, please indicate your
experience with:







normalization
dimensional modeling
operational data store
data consolidation
Extract Transform Load (ETL)
metadata architecture
DW appliances
October 3, 2015
DW Architecture Best Practices
6
cohesion institute
Data Warehouse
Architecture Best Practices
Business Intelligence
Background
cohesion institute
What is Business Intelligence?


A Data Warehouse is usually one
component of an overall business
intelligence solution
IT people may be tempted to think in
terms of products and technologies
BUT...
October 3, 2015
DW Architecture Best Practices
8
cohesion institute
Overarching Goal


October 3, 2015
The overarching goal of
business intelligence is to
provide the information
necessary to MANAGE a
business
This means providing
information in support of
management decision making,
which is why BI is also called
“Decision Support”
DW Architecture Best Practices
9
cohesion institute
BI is about “Data Abstraction”
Stages
(4)


wisdom
knowledge
information
data
audience for a data warehouse typically
considers higher slices of data abstraction
pyramid
lowest level of pyramid is too detailed &
unwieldy
October 3, 2015
DW Architecture Best Practices
10
cohesion institute
It’s Not Technology

Business Intelligence is about delivering
business value

provide tangible benefit by answering important
questions that can help the business to achieve its
strategic focus

Improving profitability



Reducing cost



Who are our five most profitable clients?
What are our least profitable products?
Who are our lowest cost suppliers?
Which materials incur highest spoilage costs?
Improving customer satisfaction

What factors may lead to lost customers?
October 3, 2015
DW Architecture Best Practices
11
cohesion institute
Business of BI


In some cases, legislation such as SarbanesOxley or Basel II makes some kind of BI
fundamental to doing business
Many leading companies use BI to achieve
competitive advantage

E.g. Walmart, Dell, Amazon.com, Kraft, American
Express, etc…
October 3, 2015
DW Architecture Best Practices
12
cohesion institute
Data Warehouse Architecture

architecture is about delivering
an elegant solution that meets
the solution requirements


this means really understanding the
problem
DW architecture is part art, part
science
October 3, 2015
DW Architecture Best Practices
13
cohesion institute
Good Architecture

‘It’s not easy to describe a good
design, but I’ll know it when I
see it’
October 3, 2015
DW Architecture Best Practices
14
cohesion institute
BI Architecture Requirements




must recognize change as a
constant
take incremental development
approach
existing applications must
continue to work
need to allow more data and
new types of data to be added
October 3, 2015
DW Architecture Best Practices
15
cohesion institute
End User Acceptance

understandability



understandability is in the eyes
of the beholder
want to hide the complexity
try to make it:


October 3, 2015
intuitive, obvious
visible, memorable
DW Architecture Best Practices
16
cohesion institute
End User Acceptance

performance



October 3, 2015
don’t want to interrupt
the thinking process
provide one click,
instantaneous access
warehouse must be
available, “production”
system
DW Architecture Best Practices
17
cohesion institute
Data Warehouse
Architecture Best Practices
Architecture
Best Practices
cohesion institute
High Level Architecture

remember the different “worlds”






on-line transaction processing (OLTP)
business intelligence systems (BIS)
users are different
data content is different
data structures are different
architecture & methodology must be
different
October 3, 2015
DW Architecture Best Practices
19
cohesion institute
Two Different Worlds

On-Line Transaction Processing

Entity Relational Data Model





created in 1960’s to address performance
issues with relational database
implementations
normalized to most efficiently get data in
divides the data into many discrete entities
many relationships between these entities
this approach was documented by C.J. Date in
An Introduction to Database Systems
October 3, 2015
DW Architecture Best Practices
20
cohesion institute
Two Different Worlds

Business Intelligence Systems

Dimensional Data Model




also called star schema
designed to easily get information out
fewer relationships than ERD, the only
table with multiple joins connecting to
other tables is the central table
developed in 1960’s by data service
providers, formalized by Ralph Kimball in
The Data Warehouse Toolkit
October 3, 2015
DW Architecture Best Practices
21
cohesion institute
Entity Relation Disadvantages





all tables look the same
people can’t visualize/remember
diagrams
software can’t navigate as schema
becomes too complex
business processes mixed together
many artificial keys created
October 3, 2015
DW Architecture Best Practices
22
cohesion institute
Dimensional Model
Advantages





simplicity
humans can navigate and remember
software can navigate
deterministically
business process explicitly separated
(Data Mart)
not so many keys (keys = # of
attendant tables)
October 3, 2015
DW Architecture Best Practices
23
cohesion institute
Best Practice #1

Use a data model that is optimized
for information retrieval



dimensional model
denormalized
hybrid approach
October 3, 2015
DW Architecture Best Practices
24
cohesion institute
Data Acquisition Processes

Extract Transform Load (ETL)



the process of unloading or copying data
from the source systems, transforming it
into the format and data model required in
the BI environment, and loading it to the DW
also, a software development tool for
building ETL processes (an ETL tool)
many production DWs use COBOL or other
general-purpose programming languages to
implement ETL
October 3, 2015
DW Architecture Best Practices
25
cohesion institute
Data Quality Assurance

data cleansing



the process of validating and enriching
the data as it is published to the DW
also, a software development tool for
building data cleansing processes (a data
cleansing tool)
many production DWs have only very
rudimentary data quality assurance
processes
October 3, 2015
DW Architecture Best Practices
26
cohesion institute
Data Acquisition & Cleansing

getting data loaded efficiently and
correctly is critical to the success of
your DW


implementation of data acquisition &
cleansing processes represents from 50
to 80% of effort on typical DW projects
inaccurate data content can be ‘the kiss
of death’ for user acceptance
October 3, 2015
DW Architecture Best Practices
27
cohesion institute
Best Practice #2

Carefully design the data
acquisition and cleansing processes
for your DW



Ensure the data is processed
efficiently and accurately
Consider acquiring ETL and Data
Cleansing tools
Use them well!
October 3, 2015
DW Architecture Best Practices
28
cohesion institute
Data Model


Already discussed the benefits of a
dimensional model
No matter whether dimensional
modeling or any other design
approach is used, the data model
must be documented
October 3, 2015
DW Architecture Best Practices
29
cohesion institute
Documenting the Data Model

The best practice is to use some kind of data
modeling tool







CA ERwin
Sybase PowerDesigner
Oracle Designer
IBM Rational Rose
Etc.
Different tools support different modeling notations,
but they are more or less equivalent anyway
Most tools allow sharing of their metadata with an
ETL tool
October 3, 2015
DW Architecture Best Practices
30
cohesion institute
Data Model Standards



data model standards appropriate for the
environment and tools chosen in your data
warehouse should be adopted
considerations should be given to data access
tool(s) and integration with overall enterprise
standards
standards must be documented and enforced
within the DW team


someone must ‘own’ the data model
to ensure a quality data model, all changes
should be reviewed thru some formal process
October 3, 2015
DW Architecture Best Practices
31
cohesion institute
Data Model Metadata




Business definitions should be recorded
for every field (unless they are technical
fields only)
Domain of data should be recorded
Sample values should be included
As more metadata is populated into the
modeling tool it becomes increasingly
important to be able to share this data
across ETL and Data Access tools
October 3, 2015
DW Architecture Best Practices
32
cohesion institute
Metadata Architecture


The strategy for sharing data model
and other metadata should be
formalized and documented
Metadata management tools should
be considered & the overall
metadata architecture should be
carefully planned
October 3, 2015
DW Architecture Best Practices
33
cohesion institute
Best Practice #3

Design a metadata architecture
that allows sharing of metadata
between components of your DW

consider metadata standards such as
OMG’s Common Warehouse
Metamodel (CWM)
October 3, 2015
DW Architecture Best Practices
34
cohesion institute
Alternative Architecture
Approaches




Bill Inmon: “Corporate Information
Factory”
Hub and Spoke philosophy
“JBOC” – just a bunch of cubes
Let it evolve naturally
October 3, 2015
DW Architecture Best Practices
35
cohesion institute
What We Want
(Architectural Principal)



In most cases, business and IT
agree that the data warehouse
should provide a ‘single version of
the truth’
Any approach that can result in
disparate data marts or cubes is
undesireable
This is known as data silos or…
October 3, 2015
DW Architecture Best Practices
36
cohesion institute
Enterprise DW Architecture


how to design an enterprise data
warehouse and ensure a ‘single
version of the truth’?
according to Kimball:



start with an overall data architecture
phase
use “Data Warehouse Bus” design to
integrate multiple data marts
use incremental approach by building one
data mart at a time
October 3, 2015
DW Architecture Best Practices
37
cohesion institute
Data Warehouse Bus
Architecture

named for the bus in a computer




standard interface that allows you to plug
in cdrom, disk drive, etc.
these peripherals work together smoothly
provides framework for data marts to
fit together
allows separate data marts to be
implemented by different groups, even
at different times
October 3, 2015
DW Architecture Best Practices
38
cohesion institute
Data Mart Definition

data mart is a complete subset of the
overall data warehouse



a single business process OR
a group of related business processes
think of a data mart as a collection of
related fact tables sharing conformed
dimensions, aka a ‘fact constellation’
October 3, 2015
DW Architecture Best Practices
39
cohesion institute
Designing The DW Bus


determine which dimensions will be
shared across multiple data marts
conform the shared dimensions



produce a master suite of shared
dimensions
determine which facts will be shared
across data marts
conform the facts

standardize the definitions of facts
October 3, 2015
DW Architecture Best Practices
40
cohesion institute
Dimension Granularity

conformed dimensions will usually
be granular



makes it easy to integrate with
various base level fact tables
easy to extend fact table by adding
new facts
no need to drop or reload fact tables,
and no keys have to be changed
October 3, 2015
DW Architecture Best Practices
41
cohesion institute
Conforming Dimensions



by adhering to standards, the separate
data marts can be plugged together
 e.g. customer, product, time
they can even share data usefully, for
example in a drill across report
ensures reports or queries from different
data marts share the same context
October 3, 2015
DW Architecture Best Practices
42
cohesion institute
Conforming Dimensions (cont’d)

accomplish this by adding any dimension
attribute(s) needed in any data mart(s)
to the standard dimension definition


attributes not needed everywhere can always be
ignored
typically harder to determine how to load
conformed dimensions than to design
them initially



need a single integrated ETL process
what is the SOR for each attribute?
how do we deal with attributes for which there is
more than one possible SOR?
October 3, 2015
DW Architecture Best Practices
43
cohesion institute
Conforming Facts


in an enterprise, some metrics may
not have the same generally
accepted definition across all
business units
conforming facts is generally a
bigger design challenge than
conforming dimensions

why?
October 3, 2015
DW Architecture Best Practices
44
cohesion institute
Conforming Facts - Benefits



ensures the constituent data marts can as
clearly as possible represent fact data
expressed on the same basis using consistent
definitions
ensures reports or queries from different data
marts share consistent content
success of an Enterprise DW hinges on
successfully conformed facts


any perceived inconsistencies in fact definitions across
data marts will generally be considered to be a DW
bug or data problem by users
if users don’t have full confidence in data quality they
may stop using the DW
October 3, 2015
DW Architecture Best Practices
45
cohesion institute
Data Consolidation


a current trend in BI/DW is ‘data
consolidation’
from a software vendor
perspective, it is tempting to
simplify this:

‘we can keep all the tables for all your
disparate applications in one physical
database’
October 3, 2015
DW Architecture Best Practices
46
cohesion institute
Data Integration


To truly achieve ‘a single version of
the truth’, must do more than simply
consolidating application databases
Must integrate data models and
establish common terms of reference
October 3, 2015
DW Architecture Best Practices
47
cohesion institute
Best Practice #4

Take an approach that consolidates
data into ‘a single version of the
truth’

Data Warehouse Bus


conformed dimensions & facts
OR?
October 3, 2015
DW Architecture Best Practices
48
cohesion institute
Operational Data Store (ODS)




a single point of integration for disparate
operational systems
contains integrated data at the most
detailed level (transactional)
may be loaded in ‘near real time’ or
periodically
can be used for centralized operational
reporting
October 3, 2015
DW Architecture Best Practices
49
cohesion institute
Role of an ODS in DW
Architecture


In the case where an ODS is a
necessary component of the overall
DW, it should be carefully integrated
into the overall architecture
Can also be used for:



Staging area
Master/reference data management
Etc…
October 3, 2015
DW Architecture Best Practices
50
cohesion institute
ODS Data Model

Not clear if any design approach for
an ODS data model has emerged
as a best practice




normalized
dimensional
denormalized/hybrid
any suggestions?
October 3, 2015
DW Architecture Best Practices
51
cohesion institute
Best Practice #5

Consider implementing an ODS only
when information retrieval requirements
are near the bottom of the data
abstraction pyramid and/or when there
are multiple operational sources that
need to be accessed



Must ensure that the data model is
integrated, not just consolidated
May consider 3NF data model
Avoid at all costs a ‘data dumping ground’
October 3, 2015
DW Architecture Best Practices
52
cohesion institute
Capacity Planning



DW workloads are typically very
demanding, especially for I/O capacity
Successful implementations tend to grow
very quickly, both in number of users
and data volume
Rules of thumb do exist for sizing the
hardware platform to provide adequate
initial performance

typically based on estimated ‘raw’ data size
of proposed database e.g. 100-150 Gb per
modern CPU
October 3, 2015
DW Architecture Best Practices
53
cohesion institute
SMP Server Scale Up



Scaling performance within a single SMP
server is referred to as ‘scale up’
Database benchmarks suggest Windows
scalability is near that of Linux
IBM claims near-linear scalability for Linux
(on commodity hardware) up to about 4
processors


Probably not cost effective to scale up Linux
much beyond 4 processors
IBM claims near-linear scalability for AIX
on POWER5 up to about 8 processors
October 3, 2015
DW Architecture Best Practices
54
cohesion institute
Scale Out



There is an increasing trend in IT to ‘scale out’
processing capacity by deploying many small,
commodity servers rather than a single large
SMP system
This strategy tends to work well for relatively
simple applications such as network or web
servers
For very complex workloads such as a data
warehouse, this strategy is much more difficult
to effectively implement

Especially so for the database server itself
October 3, 2015
DW Architecture Best Practices
55
cohesion institute
Scale Up vs. Scale Out


To obtain the total number of processors
required for the estimated DW workload,
must plan either to scale up or scale out
Both options are viable but, all other
things being equal, scaling up is less
disruptive to end users and requires less
work to implement


scaling up can offer lower hardware
investment, if practical
however, network bandwidth or latency
issues can limit effectiveness of parallelism
October 3, 2015
DW Architecture Best Practices
56
cohesion institute
Best Practice #6


Create a capacity plan for your BI
application & monitor it carefully
Consider future additional performance
demands




Establish standard performance benchmark
queries and regularly run them
Implement capacity monitoring tools
Build scalability into your architecture
May need to allow for scaling both up and
out!
October 3, 2015
DW Architecture Best Practices
57
cohesion institute
Open Source Affordability

Another emerging trend in IT generally is to
utilize Open Source software running on
commodity hardware



this is expected to offer lower total cost of ownership
certainly, GNU/Linux and other Open Source initiatives
do provide very good functionality and quality for
minimal cost
This trend also applies to BI & DW:


most traditional rdbms’s are now supported on Linux
however, open source rdbms’s lag behind on providing
good performance for DW queries
October 3, 2015
DW Architecture Best Practices
58
cohesion institute
DW Appliances


DW appliances, consisting of
packaged solutions providing all
required software and hardware, are
beginning to offer very promising
price/performance
production experience is limited so
far, so this is not yet a ‘best practice’
October 3, 2015
DW Architecture Best Practices
59
cohesion institute
Data Warehouse
Architecture Best Practices
Q&A
cohesion institute
cohesion systems consulting inc
the modern art
of data abstraction
Descargar

Data Warehouse Architecture Best Practices