Understanding The Data Warehouse Lifecycle

CONTENTS
Overview
The Data Warehouse Lifecycle Model Today
Conclusions

Marc Demarest
July 2006
WhereScape Software
marc@wherescape.com
Copyright © 2000-2007 by WhereScape Software. All rights reserved.
www.wherescape.com

Abstract

Despite warnings made by W.H. Inmon and others at the outset of the data warehousing movement in the early 1990s, data warehousing practice for at least the past decade has been predicated on two assumptions. The first is that, once in production, data warehouses and data marts were essentially static from a design perspective. The second is that data warehouse change management practices were fundamentally no different from those of other kinds of production systems.

The pace of business change, combined with the ongoing search for competitive advantage through better decision-making in a climate characterized by commodity transactional systems and (increasingly) commodity decision support infrastructure, shows how much an organization's understanding of, and control over, the entire data warehousing lifecycle matters. That understanding and control can mean the difference between competitive differentiation on the one hand, and millions of dollars of sunk cost in brittle, dead-end data warehousing infrastructure on the other.

Copyright © 2000-2007 by WhereScape Software. All rights reserved. This document may be distributed in its entirety, with all elements retained in their original form, without permission, but may not be excerpted without the explicit written permission of WhereScape Software.

WhereScape® is a registered trademark of WhereScape Software. WhereScape24, WhereScape RED, WhereScape Administrator, RED Repository, Live Prototyping, Live Metadata, Rapid Deployment, Closed Loop Enhancement, Pragmatic Data Warehousing and Pragmatic Data Warehousing Methodology are trademarks of WhereScape USA, Inc. All other trade and service marks are property of their respective holders.
Overview

In Building The Data Warehouse, published in 1991, W.H. Inmon made the following observation:

"The classical system development lifecycle (SDLC) does not work in the world of the DSS analyst. The SDLC assumes that requirements are known at the start of the design (or at least can be discovered). However, in the world of the DSS analyst, requirements are usually the last thing to be discovered in the DSS development lifecycle" (p. 23).

At that time, Inmon advocated a data-driven approach to designing data warehouses. He made two points. First, data warehouse analysts frequently understood their requirements, and the data available to them, only after they had the opportunity to perform various kinds of analysis on that data. Second, the traditional waterfall-oriented models of software development (particularly those enforced by high-end computer-aided software engineering, or CASE, tools) were unlikely to produce workable data warehousing environments.

One of the earliest responses to the data-driven nature of decision support systems, and to this day the most effective, was the dimensional schema design methodology pioneered by Ralph Kimball and others. Dimensional modeling sought to engage the business user at the level of business vocabulary and business process. The goal was to design inherently legible star schemas based on the key nominative elements of those business vocabularies and processes. Populating those schemas was then largely a technical matter of matching available data elements in transactional source systems to the designed schemas. Where data elements were not available natively in the systems of record, they were created or synthesized.
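The following is a minimal sketch of a star schema and its population, using Python's built-in sqlite3 module. It is a generic illustration and not a design taken from this paper. The sales subject area, the table and column names (dim_date, dim_product, fact_sales), and the sample source rows are all hypothetical.

The sketch shows one fact table that references two dimension tables. It also shows a load step that maps fields from a transactional source onto that design. Along the way, the load step synthesizes elements that the source does not carry natively: a surrogate product key, calendar attributes such as quarter, and a revenue measure.

```python
import sqlite3
from datetime import date

# Hypothetical rows from a transactional (OLTP) source system:
# (order_id, order_date, product_code, product_name, quantity, unit_price)
SOURCE_ORDERS = [
    (1001, "2006-01-15", "P-10", "Widget", 3, 2.50),
    (1002, "2006-02-03", "P-20", "Gadget", 1, 10.00),
    (1003, "2006-04-20", "P-10", "Widget", 4, 2.50),
]

# A minimal star schema: one fact table surrounded by two dimension tables.
SCHEMA = """
CREATE TABLE dim_date (
    date_key  INTEGER PRIMARY KEY,  -- smart key, e.g. 20060115
    full_date TEXT NOT NULL,
    year      INTEGER NOT NULL,
    quarter   INTEGER NOT NULL
);
CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY AUTOINCREMENT,  -- surrogate key
    product_code TEXT NOT NULL UNIQUE,               -- natural key from the source
    product_name TEXT NOT NULL
);
CREATE TABLE fact_sales (
    order_id    INTEGER NOT NULL,  -- degenerate dimension
    date_key    INTEGER NOT NULL REFERENCES dim_date(date_key),
    product_key INTEGER NOT NULL REFERENCES dim_product(product_key),
    quantity    INTEGER NOT NULL,
    revenue     REAL NOT NULL
);
"""


def load(conn: sqlite3.Connection, orders) -> None:
    """Map source rows onto the star schema, synthesizing missing elements."""
    for order_id, order_date, code, name, qty, unit_price in orders:
        d = date.fromisoformat(order_date)
        date_key = d.year * 10000 + d.month * 100 + d.day
        # Calendar attributes such as quarter are derived, not stored in the source.
        conn.execute(
            "INSERT OR IGNORE INTO dim_date VALUES (?, ?, ?, ?)",
            (date_key, order_date, d.year, (d.month - 1) // 3 + 1),
        )
        # Add each product once; later rows reuse its surrogate key.
        conn.execute(
            "INSERT OR IGNORE INTO dim_product (product_code, product_name) VALUES (?, ?)",
            (code, name),
        )
        (product_key,) = conn.execute(
            "SELECT product_key FROM dim_product WHERE product_code = ?", (code,)
        ).fetchone()
        # The revenue measure is synthesized from quantity and unit price.
        conn.execute(
            "INSERT INTO fact_sales VALUES (?, ?, ?, ?, ?)",
            (order_id, date_key, product_key, qty, qty * unit_price),
        )
    conn.commit()


def revenue_by_quarter(conn: sqlite3.Connection):
    """An analytical question in business terms: revenue per quarter per product."""
    return conn.execute(
        """
        SELECT d.year, d.quarter, p.product_name, SUM(f.revenue)
        FROM fact_sales AS f
        JOIN dim_date AS d ON d.date_key = f.date_key
        JOIN dim_product AS p ON p.product_key = f.product_key
        GROUP BY d.year, d.quarter, p.product_name
        ORDER BY d.year, d.quarter, p.product_name
        """
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    load(conn, SOURCE_ORDERS)
    for row in revenue_by_quarter(conn):
        print(row)
```

Run as a script, the sketch prints three rows: (2006, 1, 'Gadget', 10.0), (2006, 1, 'Widget', 7.5) and (2006, 2, 'Widget', 10.0).

The analytical query is phrased in terms of business concepts (quarter, product), which is the point of the dimensional design. The source-system details, by contrast, stay confined to the load step.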
The fundamental notion behind dimensional modeling was, we believe, this. It might not be possible to gather data requirements from a community of business analysts. It was, however, possible to gather analytical requirements from a community of business analysts. Available and/or synthesizable data in the organization could then be mapped to those analytical requirements, as embodied in a dimensional modeler's star schema designs.

By the end of the 1990s, however, dimensional modeling practitioners found that dimensional modeling exercises depended, ultimately, on the ability of designers to prototype their dimensional designs quickly. Designers also had to expose business analysts to those designs, populated with actual data, before the designs were put into production.

Dimensional designers discovered that this rapid-prototype-and-iterate cycle was necessary because, in support of Inmon's original point, a business analyst's understanding of her decision-making needs and capabilities often crystallized only when she saw what she had asked for during an initial requirements gathering process.

By the end of the 1990s, the pattern of behavior that led the dimensional modeling community to recognize the need for rapid-prototype-and-iterate cycles was quite widely reported, and it was cross-cultural:

- Asked in open-ended fashion to describe their information needs, business analysts frequently responded with one of two generic positions: ‘What data is available?’ and ‘I need everything we have.’ Good designers recognized these as fundamentally the same answer.
- Asked to review and approve a populated dimensional model based on their stated analytical needs and business models, business analysts frequently responded with variants of ‘Yes, this is what I asked for,’ and ‘Now that I see it, I’d like to make some changes.’ These responses were followed by requests for design modifications, often fundamental ones, or for entirely new kinds of schema.

At roughly the same time, practitioners were discovering the need for rapid prototype-and-iterate cycles, and the need for tools to support that process. Meanwhile, teams operating and managing production data warehouses and marts were discovering a number of additional problems with then-conventional notions of how to build data warehouses and data marts:

- Prototyped data warehouses and data marts often could not be moved into production without either significant modification to the technology infrastructure used to build the prototype or wholesale rehosting. The reason was that the tools and technologies used to manage production data warehouses and marts differed from those used to build the prototypes. These included, but were not limited to, extraction, transformation and load (ETL), scheduling and monitoring tools.[1]

- Data warehouses and data marts were often extremely expensive to manage in production. It was not uncommon for a significantly sized production data warehouse to require 5-7 full-time equivalents on an ongoing basis to keep it stable, in production and available. This high second-order operating cost was most often attributable to three factors: the brittleness of the production warehouse's technology infrastructure, high rates of ETL failure, and the inability of data warehouse operators to recover from ETL failures gracefully and quickly.

- Once a typical data warehouse or data mart was in production and stable, the technology, processes and procedures wrapped around it were very complex. The technology delivering data into the target data warehouse and data mart schemas was also carefully balanced, often down to the point release of each revision level of each software product contributing to the infrastructure. As a result, change of any sort was impossible to implement. This applied particularly to the changes to the core data warehouse or data mart schema needed to reflect changes in the business environment, and therefore in the business analysts' needs.

By the end of the 1990s, the notion that data warehouses and data marts had a lifecycle, and that this lifecycle involved a return to design at the schema level, was thus well established among practitioners.

Yet today, a Google search on the phrase "data warehouse lifecycle" reveals relatively few content-rich sites. Data warehouse lifecycle models are still often presented as waterfall or thread models that end with deployment. Deployment, however, is where real-world data warehousing, in terms of ongoing business benefit, actually begins.

[1] In WhereScape's estimation, a substantial number of the specific project failures reported in the late 1990s were attributable to the inability to take prototyped data warehouses or data marts into production in a timely fashion. At that time, analysts held data warehouse/data mart project failure rates to be as high as 70%.