<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Bob Lambert &#187; Data Management</title>
	<atom:link href="http://robertlambert.net/tag/data-management/feed/" rel="self" type="application/rss+xml" />
	<link>http://robertlambert.net</link>
	<description>on business-aligned information technology</description>
	<lastBuildDate>Sat, 24 Jul 2010 20:26:02 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.0</generator>
		<item>
		<title>SQL Saturday #30, Richmond Virginia, April 10, 2010</title>
		<link>http://robertlambert.net/2010/04/sql-saturday-30-richmond-virginia-april-10-2010/</link>
		<comments>http://robertlambert.net/2010/04/sql-saturday-30-richmond-virginia-april-10-2010/#comments</comments>
		<pubDate>Fri, 09 Apr 2010 16:44:06 +0000</pubDate>
		<dc:creator>Bob</dc:creator>
				<category><![CDATA[IT]]></category>
		<category><![CDATA[Business Analysis]]></category>
		<category><![CDATA[Data Management]]></category>
		<category><![CDATA[Data Modeling]]></category>
		<category><![CDATA[Database Design]]></category>
		<category><![CDATA[Requirements]]></category>

		<guid isPermaLink="false">http://robertlambert.net/?p=900</guid>
		<description><![CDATA[Thanks to all who attended my presentations at SQL Saturday on April 10.  Here are the materials from my two presentations: - The Business End of Data Modeling (2.5m powerpoint presentation) - Normalize Metadata For Data Integration Analysis (5.5m full version, zip including presentation and code samples) - Normalize Metadata For Data Integration Analysis (small) [...]]]></description>
			<content:encoded><![CDATA[<p>Thanks to all who attended my presentations at SQL Saturday on April 10.  Here are the materials from my two presentations:</p>
<p>- <a href="http://robertlambert.net/wp-content/uploads/2010/04/BusinessEndOfDataModeling20100410.pps">The Business End of Data Modeling</a> (2.5 MB PowerPoint presentation)</p>
<p>- <a href="http://robertlambert.net/wp-content/uploads/2010/04/NormalizeMetadataForDataIntegrationAnalysis.zip">Normalize Metadata For Data Integration Analysis</a> (5.5 MB full version, zip including presentation and code samples)</p>
<p>- <a href="http://robertlambert.net/wp-content/uploads/2010/04/NormalizeMetadataForDataIntegrationAnalysissmall.zip">Normalize Metadata For Data Integration Analysis (small)</a> (2 MB reduced-size version, graphics removed from the PowerPoint file)</p>
<p>Here are some quick notes for those looking to run the Metadata prototype:</p>
<p>The prototype metadata database includes the SQL Server 2008 data definition language (DDL) and data manipulation language (DML) needed to create the database and populate it with tables and columns from Microsoft’s AdventureWorksDW sample database. It also includes a sample requirements spreadsheet and source-to-target map, and SSIS jobs to load the spreadsheets into the corresponding metadata tables. These define fictional requirements and mappings to populate the AdventureWorksDW FACTInternetSales table from tables in the AdventureWorks sample database.</p>
<p>AdventureWorks and AdventureWorksDW are available here: <a title="AdventureWorks DB and DW downloads" href="http://msftdbprodsamples.codeplex.com/Wikipage" target="_blank">http://msftdbprodsamples.codeplex.com/Wikipage</a> (accessed 4/14/2010)</p>
]]></content:encoded>
			<wfw:commentRss>http://robertlambert.net/2010/04/sql-saturday-30-richmond-virginia-april-10-2010/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>On DW federation, whac-a-mole, and integrating business data</title>
		<link>http://robertlambert.net/2010/01/on-dw-federation-whac-a-mole-integrating-business-data/</link>
		<comments>http://robertlambert.net/2010/01/on-dw-federation-whac-a-mole-integrating-business-data/#comments</comments>
		<pubDate>Sat, 02 Jan 2010 17:39:10 +0000</pubDate>
		<dc:creator>Bob</dc:creator>
				<category><![CDATA[Data Management]]></category>
		<category><![CDATA[IT]]></category>
		<category><![CDATA[Alignment]]></category>
		<category><![CDATA[Architecture]]></category>
		<category><![CDATA[CapTech]]></category>
		<category><![CDATA[Data Quality]]></category>

		<guid isPermaLink="false">http://robertlambert.net/?p=691</guid>
		<description><![CDATA[Information Management recently sent around their pick of best IM blog articles of 2009.  Among them was Forrester’s James Kobielus’s reaction to Bill Inmon’s “incineration of a straw man concept that he refers to as ‘virtual data warehousing (DW).’”  According to Mr. Inmon, virtual data warehousing reminds him of the carnival game called whac-a-mole.  He [...]]]></description>
			<content:encoded><![CDATA[<p><a title="Information Management" href="http://www.information-management.com/" target="_blank">Information Management</a> recently sent around their pick of best IM blog articles of 2009.  Among them was <a title="Inmon’s Vitriolic Slap At “Virtual Data Warehousing” Does Not Withstand Scrutiny - James Kobelius" href="http://www.information-management.com/blogs/inmon_kobielus_virtual_data_warehousing_challenge-10015212-1.html?ET=informationmgmt:e1275:1038858a:&amp;st=email" target="_blank">Forrester’s James Kobielus’s reaction</a> to Bill Inmon’s “incineration of a straw man concept that he refers to as ‘virtual data warehousing (DW).’”</p>
<p><a title="The Elusive Virtual Data Warehouse - Bill Inmon" href="http://www.b-eye-network.com/view/9956/" target="_blank">According to Mr. Inmon</a>, virtual data warehousing reminds him of the carnival game called <a title="Whac-a-mole at wikipedia" href="http://en.wikipedia.org/wiki/Whac-A-Mole" target="_blank">whac-a-mole</a>.  He says “just when you think this incredibly inane idea has died and just when someone  has delivered what should have been a deathly blow, out it pops again from  another hole.” There’s only a very informal definition of virtual DW in Mr. Inmon’s post (remember, he says he’s whacked this mole before), but, as I interpret it, he’s talking about a system built after a decision to avoid all the expense of building a data warehouse by just having a query engine that pulls the data from wherever it lives. Mr. Inmon argues that a query accessing diverse databases would leave data integration to the user, and there’s no guarantee that two users would integrate data the same way.  He cites the risk of inefficient virtual database queries and, on the assumption that the query is trolling operations-focused databases, says that source data would be “tuned” to operational rather than informational specifications for history retention and completeness.</p>
<p>Mr. Inmon’s ideas drew quick reaction from Mr. Kobielus and <a title=" Time to Rexamine the &quot;Virtual&quot; Data Warehouse" href="http://www.b-eye-network.com/blogs/raden/archives/federated_data_warehouse/" target="_blank">Neil  Raden</a>.  Each in his own measured way stresses that integration can be compatible with distributed architectures, and that there is a DW solution architected for efficiency that includes effective data integration from diverse sources: the Federated Data Warehouse.</p>
<p>Experience and emerging tools reinforce their point.  According to a colleague at CapTech, for smaller organizations &#8220;you can deal with this issue using a BI tool with a metadata layer that has joins predefined: the data integration is done by the BI metadata modeler.&#8221;  Another CapTech&#8217;er cites mashups as a potential quick-and-dirty approach.  Check out &#8220;7 Mashups Every Company Needs&#8221; <a title="7 Mashups Every Company Needs" href="http://www.jackbe.com/mashups/7mashups.php" target="_blank">here</a>.</p>
<p>A well-architected federated warehouse certainly can integrate and deliver data, maintain history, and enable a “single version of the truth”, perhaps in a more timely manner than a “traditional” DW architecture.  On this question the devil is in the specifics of the situation.  It is difficult to argue one way or another outside the context of a real project in a real organization.</p>
<p>However, even though it certainly has a technical side, data integration is  first a <em>business </em>activity.  Sometimes when we apply terms like  “semantic rationalization” to software components, we in IT start believing you  can actually build a machine that does the things you need to do to rationalize  data semantics, like figure out the corporate definition of a customer.  Of  course all we can do in IT is to build the empty shell.  The real work happens  when business people from departments whose data is being integrated sit down  and decide how they are going to define “staff member”, “customer”, and so on.   Only business professionals can say, for example, whether they want to include  contractors in staffing reports or whether the term “customer” includes  homebuyers under contract but not yet closed.</p>
<p>Integration tools that support data warehouses, whether centralized or federated, are only as good as the business consensus behind them. The consensus behind integrated data is arguably more rewarding to the business than the tools because with consensus on critical objects and events come non-IT-specific improvements like reduction of repetitive and conflicting business processes, reduced communication breakdown due to terminology disconnects, and more.</p>
<p>To me the beauty of the Inmon DW model is that it provides a mechanism that  can assist an organization in evolving toward improved <a title=" Guage your Data Warehousing Maturity" href="http://www.information-management.com/issues/20041101/1012391-1.html" target="_blank">information  maturity</a>.  Organizations achieve some benefit by simply integrating data  into a single data warehouse.  However, the data warehouse also makes source  data quality problems obvious and blatantly reveals differences in data meaning  from one operational source to another.  So the warehouse delivers some benefit  early and also shows how much better it would be if data were integrated.  It  therefore becomes a tool for identifying, assessing, prioritizing, and  motivating correction of data deficiencies.</p>
<p>For organizations not so far along on the maturity curve, the additional  complexity of the federated warehouse tends to obscure this data quality  feedback loop. Federation based on drawing from operational sources integrates  data from a set of different databases built toward different architectural  goals.  On the other hand, the logical data model for the enterprise warehouse  is the enterprise data model, and its architectural objective is to integrate  enterprise data to provide a single source of truth.  Therefore, the enterprise  data warehouse provides an architectural focal point for integration.  It  isolates responsibility for improving data integration crisply at either the  source or the warehouse, and — within the framework of solid information  management strategy, management, and facilitation — motivates diverse business  players to work toward consensus definition of enterprise data.</p>
<p>Federation, or virtual data warehousing if you will, can be the best strategy  for the mature organization that has already integrated business data to a  consistent enterprise view.  For the rest of us, the single centralized  warehouse with its unambiguous architectural goals and borders seems the  shortest distance to achieving the business benefits of data integration.</p>
]]></content:encoded>
			<wfw:commentRss>http://robertlambert.net/2010/01/on-dw-federation-whac-a-mole-integrating-business-data/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Coming soon: data like money</title>
		<link>http://robertlambert.net/2009/05/coming-soon-data-like-money/</link>
		<comments>http://robertlambert.net/2009/05/coming-soon-data-like-money/#comments</comments>
		<pubDate>Sat, 23 May 2009 14:43:54 +0000</pubDate>
		<dc:creator>Bob</dc:creator>
				<category><![CDATA[Data Management]]></category>
		<category><![CDATA[IT]]></category>
		<category><![CDATA[Alignment]]></category>
		<category><![CDATA[CapTech]]></category>
		<category><![CDATA[Data Quality]]></category>
		<category><![CDATA[Strategy]]></category>

		<guid isPermaLink="false">http://robertlambert.net/?p=455</guid>
		<description><![CDATA[It is a commonplace to say we should manage data like a resource. But when you think about it, data is an asset but not a resource.  Data isn&#8217;t a thing like real estate, employees, or customers, but rather it represents all of those things.  In data-geek-speak, data is a meta-resource that holds information about [...]]]></description>
			<content:encoded><![CDATA[<p>It is a commonplace to say we should manage data like a resource. But when you think about it, data is an asset but not a resource.  Data isn&#8217;t a thing like real estate, employees, or customers, but rather it represents all of those things.  In data-geek-speak, data is a meta-resource that holds information about resources.  That makes data a lot like money.</p>
<p>In his book <em>Money Mischief </em>Milton Friedman made the point that money has no intrinsic value: &#8220;<a title="Money, Value, and Monetary History" name="text-2" href="http://www.friesian.com/money.htm" target="_blank">The value of money is the value people</a><a title="Money, Value, and Monetary History" href="http://www.friesian.com/money.htm" target="_blank"> <em>attribute</em> to what they <em>want</em> to exchange, no more, no less.</a>&#8221; Likewise, data has no value in itself.  Its value is derived from people&#8217;s desire to know about the things the data describes, and how reliably and accurately it describes those things.  So an organization&#8217;s data, like its money, is not a resource in itself.  It is an asset that represents the resources that an organization manages and controls.  It follows then that data management should look a lot like money management.</p>
<p>A cornerstone of our economic stability is consensus that organizations must manage money well and make their internal money management visible to investors, regulators, and independent standards groups.  We&#8217;ve evolved a standard for money management where a department represented by a C-level executive administers formal accounting, budgeting, planning, and financial reporting.  The organization evaluates every manager&#8217;s compliance with money management policies, and independent auditors evaluate the organization&#8217;s soundness in terms of its money management.  Accounting professionals meet rigorous, generally respected certification standards.</p>
<p>Overall, our volume of online purchases and use of FDA-approved drugs, for example, attest to our general confidence in current data management practices.  But still, data  professionals know that it could be a lot better.  Scarcely a week goes by without another scandal involving lost customer data, and consider these snafus:</p>
<ul>
<li><a title=" 	 Katrina data management snafus compound chaos" href="http://searchcio.techtarget.com/news/article/0,289142,sid182_gci1132934,00.html" target="_blank">This article</a> cites multiple non-compliant databases as a significant contributor to the chaos in reuniting families in the wake of the Katrina disaster</li>
<li>&#8220;The Mars <em>Climate Orbiter</em>, a key part of NASA&#8217;s program to explore the planet Mars, vanished in September 1999 after rockets were fired to bring it into orbit of the planet. An investigative board later discovered that NASA engineers failed to convert English measures of rocket thrusts to newtons, a metric system measuring rocket force, and that was the root cause of the loss of the spacecraft. The orbiter smashed into the planet instead of reaching a safe orbit.&#8221; (<a title=" Data quality management: Problems and horror stories" href="http://searchdatamanagement.techtarget.com/generic/0,295582,sid91_gci1251808,00.html" target="_blank">cited here</a>)</li>
<li>One Fortune 1000 services company carried separate customer records in each of its operating units resulting in a number of anomalies visible to the customers.  For example, the same customer would receive separate invoices with different terms for each of the services purchased from the company.</li>
</ul>
<p>In parallel with the emergence of these types of issues, regulators and industry associations have set data management standards for many industries and practice areas.  Food and consumer product safety rests on a regulatory foundation of correctly recording and managing results of inspections.  The International Air Transport Association sets <a title="IATA Safety Data Management" href="http://www.iata.org/whatwedo/safety_security/safety/safety_data.htm" target="_blank">standards for safety data collection and management</a>.  Likewise, the US Food and Drug Administration and other governing bodies set clinical safety data management and reporting standards.</p>
<p>It is just a matter of time before the many separate externally imposed data management guidelines congeal into a set of general best practices that apply across the organization.  Then investors, regulators, and standards groups will hold organizations responsible for effective data management in the same way they are held to account for effectively managing money. An internal department represented by a C-level executive will administer formal data management standards and procedures.  The organization will evaluate every manager&#8217;s compliance with data management policies, independent auditors will evaluate the organization&#8217;s soundness in terms of the quality of its data management, and data management professionals will be held to rigorous, generally respected certification standards.</p>
<p>Farfetched? Maybe.  But it isn&#8217;t farfetched to think that as a society we&#8217;ll begin to recognize what data professionals have known for a long time: that the quality of an organization&#8217;s products, its care and protection of its customers, workforce, and resources, its stewardship of the environment, and even its financial health depend to a significant degree on sound data management practices.</p>
<p>&#8212;</p>
<p>Here are some resources on data management:</p>
<p><a title="DAMA" href="http://dama.org/i4a/pages/index.cfm?pageid=1">DAMA, the organization for data management</a>.</p>
<p><a title="Data Management at Wikipedia" href="http://en.wikipedia.org/wiki/Data_management" target="_blank">The Wikipedia page</a> quotes this definition: &#8220;Data management is the development, execution and supervision of plans, policies, programs and practices that control, protect, deliver and enhance the value of data and information assets.&#8221;</p>
<p><a title="Data Stewardship Strategy: 6 Keys to Success" href="http://www.information-management.com/issues/2007_58/data_stewardship_tips-10015252-1.html" target="_blank">Data Stewardship Strategy: 6 Keys to Success</a> by Jill Dyché: &#8220;As executives increasingly agree that data is a corporate asset, they are also funding data governance and data quality efforts more willingly. But &#8230; entrenched organizational behaviors are much more difficult to shift. Many companies have introduced the role of data steward before fully defining the role. In these cases, the beleaguered data stewards are doomed before they even begin.&#8221;</p>
<p><a name="&amp;lid=data_quality-articles-pos1" href="http://www.information-management.com/channels/jump.html?portal=data_quality&amp;id=10015411">Leverage Data Quality to Build an Effective Enterprise Architecture</a> by Mark Amspoker.  &#8220;It might be time to rethink the notion that effective information architecture development will solve the data quality problem.&#8221;</p>
<p><a title="Guidelines for Responsible Data Management in Scientific Research" href="http://ori.hhs.gov/education/products/clinicaltools/data.pdf" target="_blank">Guidelines for Responsible Data Management in Scientific Research</a> from the Office of Research Integrity, US Department of Health and Human Services.   &#8220;Data management is one of the essential areas of responsible conduct of research, as outlined by the Office of Research Integrity. This educational course will educate new investigators about conducting responsible data management in scientific research.&#8221;</p>
]]></content:encoded>
			<wfw:commentRss>http://robertlambert.net/2009/05/coming-soon-data-like-money/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Study data early to improve application alignment</title>
		<link>http://robertlambert.net/2009/05/study-data-early-to-improve-application-alignment/</link>
		<comments>http://robertlambert.net/2009/05/study-data-early-to-improve-application-alignment/#comments</comments>
		<pubDate>Mon, 11 May 2009 21:26:52 +0000</pubDate>
		<dc:creator>Bob</dc:creator>
				<category><![CDATA[Data Management]]></category>
		<category><![CDATA[IT]]></category>
		<category><![CDATA[Project Management]]></category>
		<category><![CDATA[Alignment]]></category>
		<category><![CDATA[Business Analysis]]></category>
		<category><![CDATA[Requirements]]></category>

		<guid isPermaLink="false">http://robertlambert.net/?p=447</guid>
		<description><![CDATA[A recurring theme in the literature on IT over the years has been frequent failure of IT projects.  Most studies lay the bulk of the blame on requirements (examples here and here).  One way to improve accuracy and fit-to-purpose of requirements, and thereby promote project success, is to include data analysis as well as process [...]]]></description>
			<content:encoded><![CDATA[<p>A recurring theme in the literature on IT over the years has been frequent failure of IT projects.  Most studies lay the bulk of the blame on requirements (examples <a title="Failed IT Projects (The Human Factor) by Sheila Wilson" href="http://faculty.ed.umuc.edu/%7Emeinkej/inss690/wilson.htm" target="_blank">here</a> and <a title="Unilog seven deadly sins press release" href="http://www.4-developers.com/docs/7-deadly-sins-press-release.pdf" target="_blank">here</a>).  One way to improve accuracy and fit-to-purpose of requirements, and thereby promote project success, is to include data analysis as well as process analysis in the requirements plan.</p>
<p>I’ve cited <a title="DQ, he isn’t so dumb he just needs glasses" href="../2009/05/dq-he-isnt-so-dumb-he-just-needs-glasses/" target="_blank">here</a> the need to start data interface analysis early to avoid budget and schedule blow-ups when, as a result of not thinking early about interface complexity, data integration work turns out to be bigger and nastier than anticipated.</p>
<p>Early data study also helps business analysts elicit more detailed and accurate business requirements.  Say a mid-level football (soccer) team in the UK is looking to recruit a couple of strikers who can reliably punch home goals for the club.  The obvious data they seek is (1) the number of goals scored per game by each prospect, and (2) how much time each prospect has spent on the bench due to injury over his career.  At the same time, this club is building a strategic recruiting system to support growth into the higher echelons of English football.  A process-oriented requirements strategy (like the one described <a title="A pretty good requirements analysis checklist" href="../2009/02/requirements-analysis-plan/" target="_blank">here</a>) asks the team&#8217;s recruiters what they need in order to get good people into the club, and often emerges with a list of statements about what the system will do (“The system shall provide an interface enabling entry of the following player statistics” or “The system shall provide a report ranking players by the following criteria:…”).</p>
<p>It isn&#8217;t necessarily wrong to start with process analysis, especially when backed up with formal techniques like use cases, data flow diagramming, or others, but adding data analysis early gives the analyst far more insight into the real business needs.  Without interviewing anyone a data analyst can know that there are many goals in a game of soccer (OK, to some not nearly enough, but that’s another story), and that the attributes of a game include location, weather conditions, date and time, whether it&#8217;s regular season or playoff, and more.  Attributes of a goal include the time during the game; whether it was scored with the left foot, right foot, or head; whether it came from a set play or in the run of play; whether it came from the left or right side of the field; and much more.</p>
<p>The analyst who knows the data and understands its structure can probe with questions like whether a player tends to score at the end of games, or would it be useful to find one striker who tends to score from the left side of the field and another who scores from the right?  By understanding the data an analyst can understand the business problem more deeply, build better rapport with business people  by asking more informed questions, and cross the business/IT communications gap to define the right requirements so that the right system gets built.</p>
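<p>As a rough illustration of the kind of structure such an analyst might carry into an interview, here is a minimal Python sketch.  The class name, field names, sample records, and the 75-minute cutoff are all hypothetical, chosen only to mirror the goal attributes described above; none of them comes from a real recruiting system.</p>

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Goal:
    """One goal, with a few of the attributes an analyst can list up front."""
    scorer: str
    minute: int      # time during the game
    body_part: str   # "left foot", "right foot", or "head"
    set_play: bool   # True for a set play, False for the run of play
    side: str        # "left" or "right" side of the field


def late_goal_share(goals, scorer, cutoff_minute=75):
    """Fraction of a player's goals scored at or after cutoff_minute."""
    own = [g for g in goals if g.scorer == scorer]
    if not own:
        return 0.0
    late = [g for g in own if g.minute >= cutoff_minute]
    return len(late) / len(own)


def goals_by_side(goals, scorer):
    """Count a player's goals by the side of the field they came from."""
    return Counter(g.side for g in goals if g.scorer == scorer)


# Made-up records for two imaginary prospects.
SAMPLE_GOALS = [
    Goal("Player A", 12, "left foot", False, "left"),
    Goal("Player A", 80, "head", True, "left"),
    Goal("Player A", 88, "left foot", False, "left"),
    Goal("Player B", 30, "right foot", False, "right"),
]

if __name__ == "__main__":
    print(late_goal_share(SAMPLE_GOALS, "Player A"))
    print(goals_by_side(SAMPLE_GOALS, "Player A"))
```

<p>Each function corresponds to one of the probing questions: does this striker tend to score late, and from which side of the field?  Neither question can be asked of the system unless the minute and the side are captured for every goal in the first place.</p>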
<p>It may be just the organizations I’ve been exposed to, but in my experience data analysis isn&#8217;t typically part of the requirements effort.  Supporting this point, the <a title="&quot;Business analysis&quot; From Wikipedia, the free encyclopedia" href="http://en.wikipedia.org/wiki/Business_analysis" target="_blank">Wikipedia page on business analysis</a> entirely omits data analysis, apparently favoring a process-only approach.  On the other hand, object-based techniques offer a balanced approach, studying both data and process by representing things like goals, games, and players as objects with their own attributes and behaviors.  In addition, the <a title="IIBA" href="http://www.theiiba.org/am/" target="_blank">International Institute of Business Analysis (IIBA)</a> includes data-oriented along with process-oriented techniques in its Business Analysis Body of Knowledge (BABOK).</p>
<p>As process/data balance early in the application lifecycle becomes more widespread, analysts should generate more insightful requirements and, other things being equal, the success rate of IT application projects should improve.</p>
]]></content:encoded>
			<wfw:commentRss>http://robertlambert.net/2009/05/study-data-early-to-improve-application-alignment/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>DQ, he isn&#8217;t so dumb he just needs glasses</title>
		<link>http://robertlambert.net/2009/05/dq-he-isnt-so-dumb-he-just-needs-glasses/</link>
		<comments>http://robertlambert.net/2009/05/dq-he-isnt-so-dumb-he-just-needs-glasses/#comments</comments>
		<pubDate>Sun, 03 May 2009 20:58:48 +0000</pubDate>
		<dc:creator>Bob</dc:creator>
				<category><![CDATA[Data Management]]></category>
		<category><![CDATA[IT]]></category>
		<category><![CDATA[Project Management]]></category>
		<category><![CDATA[Alignment]]></category>
		<category><![CDATA[Business Case]]></category>
		<category><![CDATA[CapTech]]></category>
		<category><![CDATA[Data Modeling]]></category>
		<category><![CDATA[Data Quality]]></category>
		<category><![CDATA[Requirements]]></category>
		<category><![CDATA[Strategy]]></category>

		<guid isPermaLink="false">http://robertlambert.net/?p=418</guid>
		<description><![CDATA[In a recent very thoughtful post on data quality, Paul Erb plays out an analogy comparing data users with Don Quixote and data quality professionals with Sancho Panza, then reverses the analogy to cleverly coin the &#8220;Sancho Panza&#8221; test of data quality professionals.  He encourages data quality professionals promoting the critical role of data quality [...]]]></description>
			<content:encoded><![CDATA[<p>In a recent very thoughtful <a title="I Don't Know Much About Data, but I Know What I Like" href="http://www.typepad.com/services/trackback/6a00d83454da7a69e201156e43e2a4970c" target="_blank">post on data quality</a>, Paul Erb plays out an analogy comparing data users with Don Quixote and data quality professionals with Sancho Panza, then reverses the analogy to cleverly coin the &#8220;Sancho Panza&#8221; test of data quality professionals.  He encourages data quality professionals promoting the critical role of data quality to apply a <em>what would Sancho say </em>test to ensure that they are aligned with the needs and interests of data consumers.</p>
<p>Here&#8217;s Paul&#8217;s description of the Sancho Panza test:</p>
<p style="padding-left: 30px;"><em>Think of Don Quixote [DQ] as the data-quality specialist or even the data management specialist or software vendor, bringing to the world his specialist&#8217;s perspective and vocabulary and enthusiasm, influenced by the books he&#8217;s read, visioning everyday business practices, with his value added, as goldmines for the organization.  Meanwhile Sancho Panza represents the person who does a practical job every day, who knows what works around here and what doesn&#8217;t.</em></p>
<p style="padding-left: 30px;"><em>I advocate to Data Quality (let&#8217;s call it DQ) consultants that they listen to this Sancho Panza, and consider themselves as Don Quixote.  Sancho doesn&#8217;t know much about data, but he knows what he likes&#8230; He&#8217;s open to listening, but slow to change, and he&#8217;ll tell you what he thinks.</em></p>
<p>Paul&#8217;s article reminded me that as a child I thought the problem with Don Quixote was that he tilted at windmills and attempted to ambush acting troupes because of his bad eyesight.  Of course this is not the case, but to me it provides a relevant perspective on data quality in many organizations.</p>
<p>Here&#8217;s the problem I&#8217;ve seen play out on a number of IT application projects:</p>
<ol>
<li>A high level business study recommends replacement or improvement of a current application.</li>
<li>The organization approves the project described in a business case citing benefits named in the business study and costs detailed for infrastructure, package software, and application development, but data-related costs are glossed over or left out entirely.</li>
<li>The project begins with a requirements phase that collects hundreds of imperative statements (&#8220;The system shall&#8230;&#8221;)  from business people who will use the system.</li>
<li>Late in the requirements phase, the team finds that data integration work in system interfaces will be more complex than expected.  A common example: the project requires changes to a feeder application with no documentation and no in-house support expertise.</li>
<li>Project leadership goes back to the sponsor seeking more money.</li>
</ol>
<p>In these situations the business case was incorrect because it did not account for all of the costs of data integration.  I&#8217;ve seen projects weather steps four and five well, but often discovery of previously unseen data complexity starts a disruptive chain of events.  (Sadly for the project manager, such situations are often seen as a failure of project management and corrected accordingly, but that&#8217;s a topic for another post.)</p>
<p>In my view the root cause of unforeseen data complexity on projects is the lack of a data constituency in current IT. It is only recently that the success of companies like Google and Amazon has motivated emergence of data as a key business resource in the collective consciousness. Famous success stories notwithstanding (<a title="Show Me the Money: A DM/BI Business Value Primer" href="http://www.google.com/url?sa=t&amp;source=web&amp;ct=res&amp;cd=4&amp;url=http%3A%2F%2Fwww.information-management.com%2Fspecialreports%2F2009_133%2Fbi_data_management_business_value-10015103-1.html&amp;ei=d_j9SaV_kfgwpJTlxwQ&amp;usg=AFQjCNE695M1rfsa2Ex7jvl4eA-_W9S75A" target="_blank">see this link</a>), there are relatively few senior IT managers with data quality backgrounds.  Instead, many rose through the ranks of the infrastructure, application development, or business (process) analysis groups.</p>
<p>It will be a while before, for example, a Mobil CIO&#8217;s prior jobs include defining a metadata repository or eliminating multipurpose data, but in the meantime here&#8217;s what we can do:  <a title="Big project coming up? Learn to two-step." href="http://robertlambert.net/2009/03/big-project-two-step/" target="_blank">add a business case to the application lifecycle as the last step in requirements</a>.  Stop the project when the real costs are known, recalculate the cost/benefit, and ask the sponsors if the project should continue.  Give Sancho (in this case the project team) a chance to speak to the reality of the situation, and hand to Don Quixote (project sponsors) the eyeglasses of in-depth visibility into real costs. If the decision is to move ahead with the project, then all share the same vision and the sponsors have endorsed the actual project, not the fuzzy image from earlier on that might have been a windmill.</p>
]]></content:encoded>
			<wfw:commentRss>http://robertlambert.net/2009/05/dq-he-isnt-so-dumb-he-just-needs-glasses/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>No business value in nulls</title>
		<link>http://robertlambert.net/2009/04/no-business-value-in-nulls/</link>
		<comments>http://robertlambert.net/2009/04/no-business-value-in-nulls/#comments</comments>
		<pubDate>Sun, 05 Apr 2009 22:10:43 +0000</pubDate>
		<dc:creator>Bob</dc:creator>
				<category><![CDATA[Analysis]]></category>
		<category><![CDATA[Data Management]]></category>
		<category><![CDATA[IT]]></category>
		<category><![CDATA[Business Analysis]]></category>
		<category><![CDATA[Data Modeling]]></category>
		<category><![CDATA[Database Design]]></category>

		<guid isPermaLink="false">http://robertlambert.net/?p=345</guid>
		<description><![CDATA[It seems I&#8217;m frequently in conversations about using null to represent a business value.  To paraphrase, say there are credit and cash customers, and there&#8217;s a suggestion to set &#8220;Customer_Type&#8221; to &#8220;C&#8221; for credit and null for cash.  To data and database professionals this is obviously a bad idea, but it&#8217;s not obvious from a [...]]]></description>
			<content:encoded><![CDATA[<p>It seems I&#8217;m frequently in conversations about using null to represent a business value.  To paraphrase, say there are credit and cash customers, and there&#8217;s a suggestion to set &#8220;Customer_Type&#8221; to &#8220;C&#8221; for credit and null for cash.  To data and database professionals this is obviously a bad idea, but it&#8217;s not obvious from a business point of view.</p>
<p>In a database null means that there is literally no value, or the value is indeterminate.  Null is not the same as zero or blank.  When a database operation involves nulls the result can be difficult to predict for someone not practiced in SQL.  In many cases the answer is null.  For example, 1+0=1 but 1+null=null.  In plain English, what you&#8217;re asking the DBMS to do in the latter case is to add 1 to [I don't know what], and of course 1+[I don't know what] equals [I don't know what].</p>
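<p>The arithmetic can be checked with a minimal sketch. It uses Python&#8217;s built-in sqlite3 module as a stand-in for a full DBMS, which is an assumption made for convenience; other SQL databases should treat null in arithmetic the same way.</p>

```python
import sqlite3

# In-memory SQLite database; no tables are needed for plain arithmetic.
conn = sqlite3.connect(":memory:")

# SQL NULL comes back to Python as None.
one_plus_zero, one_plus_null = conn.execute("SELECT 1 + 0, 1 + NULL").fetchone()

print(one_plus_zero)  # 1
print(one_plus_null)  # None
conn.close()
```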
<p>So, if you use null to represent a business value then you might not get the results you&#8217;re looking for when you try to get business answers out of your database.  For example, say &#8220;C&#8221; represents credit customers and null represents cash customers, and you have 2 cash customers and 1 credit customer.   In SQL Server if you use a Count function on that column to tally all of your cash customers the answer isn&#8217;t 2, it is 0, because Count skips nulls.</p>
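<p>Here is a rough sketch of the counting problem. It runs on SQLite through Python&#8217;s sqlite3 module rather than SQL Server, and the table, column, and customer names are hypothetical. In SQLite the two obvious tallies come back as 0 instead of 2; exact behavior in another DBMS can depend on its settings and on how the query is written.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customer (name TEXT, customer_type TEXT)")
conn.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    [("Ann", None), ("Ben", None), ("Cal", "C")],  # two cash (null), one credit
)

def scalar(sql):
    """Run a query that returns a single value and return that value."""
    return conn.execute(sql).fetchone()[0]

# Comparing with "= NULL" is never true, so no rows qualify.
naive_equals = scalar("SELECT COUNT(*) FROM customer WHERE customer_type = NULL")

# COUNT(column) skips nulls, so counting the column itself finds nothing either.
count_column = scalar(
    "SELECT COUNT(customer_type) FROM customer WHERE customer_type IS NULL"
)

# Only an explicit IS NULL test combined with COUNT(*) gives the expected tally.
explicit_is_null = scalar(
    "SELECT COUNT(*) FROM customer WHERE customer_type IS NULL"
)

print(naive_equals, count_column, explicit_is_null)  # 0 0 2
```

<p>The business question only gets the right answer when the person writing the query knows that null is standing in for &#8220;cash&#8221; and tests for it explicitly.</p>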
<p>That&#8217;s one example of why it&#8217;s not a good idea to try to represent a business fact with a null value.  It doesn&#8217;t make business sense and in this case the DBMS, correctly, won&#8217;t make sense of it for you.</p>
<p>To be clear, whether or not a given database column permits null values is an entirely different question, best left to database designers.  For example, a database table might record which patient occupies which hospital bed.  It may be reasonable and correct to assign a null patient ID if the bed is currently available.  However, there are alternative methods of representing this situation, and the database designer should be free to choose the right alternative taking into account the specifics of the application under development.</p>
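<p>As an illustration only, here are two ways a designer might represent the hospital bed case, again sketched with Python&#8217;s sqlite3 module. The table and column names are hypothetical, and neither option is offered as the right one for any particular application.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Option A: one row per bed, patient_id is null while the bed is free.
    CREATE TABLE bed_a (bed_id INTEGER PRIMARY KEY, patient_id INTEGER);
    INSERT INTO bed_a VALUES (1, 501), (2, NULL), (3, 502);

    -- Option B: beds and occupancy are separate; a free bed simply has
    -- no occupancy row, so no column ever needs to hold null.
    CREATE TABLE bed_b (bed_id INTEGER PRIMARY KEY);
    CREATE TABLE occupancy (
        bed_id INTEGER PRIMARY KEY REFERENCES bed_b (bed_id),
        patient_id INTEGER NOT NULL
    );
    INSERT INTO bed_b VALUES (1), (2), (3);
    INSERT INTO occupancy VALUES (1, 501), (3, 502);
""")

# Free beds under option A: rows whose patient_id is null.
free_a = [r[0] for r in conn.execute(
    "SELECT bed_id FROM bed_a WHERE patient_id IS NULL ORDER BY bed_id")]

# Free beds under option B: beds with no matching occupancy row.
free_b = [r[0] for r in conn.execute(
    "SELECT bed_id FROM bed_b "
    "WHERE bed_id NOT IN (SELECT bed_id FROM occupancy) ORDER BY bed_id")]

print(free_a, free_b)  # [2] [2]
```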
]]></content:encoded>
			<wfw:commentRss>http://robertlambert.net/2009/04/no-business-value-in-nulls/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Someone&#8217;s integrating your data</title>
		<link>http://robertlambert.net/2009/03/someones-integrating-your-data/</link>
		<comments>http://robertlambert.net/2009/03/someones-integrating-your-data/#comments</comments>
		<pubDate>Thu, 12 Mar 2009 23:16:54 +0000</pubDate>
		<dc:creator>Bob</dc:creator>
				<category><![CDATA[Data Management]]></category>
		<category><![CDATA[IT]]></category>
		<category><![CDATA[Architecture]]></category>
		<category><![CDATA[Database Design]]></category>
		<category><![CDATA[Project Management]]></category>

		<guid isPermaLink="false">http://robertlambert.net/?p=223</guid>
		<description><![CDATA[Here&#8217;s a little-recognized fact about data integration: if you run a business or any sizable chunk of one, someone is integrating your data. In my professional life I have on occasion suggested data integration efforts.  Sometimes my suggestions have been accepted and sometimes not.  As an IT professional I understand that different managers have different [...]]]></description>
			<content:encoded><![CDATA[<p>Here&#8217;s a little-recognized fact about data integration: if you run a business or any sizable chunk of one, someone is integrating your data.</p>
<p>In my professional life I have on occasion suggested data integration efforts.  Sometimes my suggestions have been accepted and sometimes not.  As an IT professional I understand that different managers have different priorities, and in a given business situation other things are sometimes more important than, for example, having a single, consistent source for all customer records, or making sure production data matches financial data.</p>
<p>But as a customer?  That&#8217;s different.</p>
<p>A couple of years ago I bought a laptop from a company renowned for quality and customer service.  For the first weeks the computer was all it was cracked up to be, but then it cracked up.  The screen developed a mysterious flicker.  After a few diagnostic conversations they replaced the main logic board.  The problem recurred a few months later, and this time the company traded the lemon for a new computer.</p>
<p>All was well, but here we encounter the first data integration problem: they said <em>I</em> needed to call a different number to have my three-year service agreement transferred, which I did.</p>
<p>Months later I called service for a minor problem, and they had no record of the service agreement for the computer.  My warranty was still connected to the lemon.  After about an hour on the phone this company&#8217;s outstanding support staff came up with a more than satisfactory solution.</p>
<p>Even so, this company&#8217;s service records weren&#8217;t integrated with its warranty records.  In this case data integration happened because of my insistence and the service staff&#8217;s creativity.  The cost?  Considering only my last encounter, three service professionals were tied up for about an hour, and I&#8217;ll think twice before I buy again from this company.</p>
<p>It seems the choice is either pay now to integrate so all applications work from consistent data or pay later by having staff, customers, and suppliers do it on a case-by-case basis.</p>
]]></content:encoded>
			<wfw:commentRss>http://robertlambert.net/2009/03/someones-integrating-your-data/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Beware the devils in the details of data integration</title>
		<link>http://robertlambert.net/2009/03/data-integration-devil-in-details/</link>
		<comments>http://robertlambert.net/2009/03/data-integration-devil-in-details/#comments</comments>
		<pubDate>Sun, 01 Mar 2009 14:26:50 +0000</pubDate>
		<dc:creator>Bob</dc:creator>
				<category><![CDATA[Data Management]]></category>
		<category><![CDATA[IT]]></category>
		<category><![CDATA[Project Management]]></category>
		<category><![CDATA[Data Modeling]]></category>
		<category><![CDATA[Database Design]]></category>

		<guid isPermaLink="false">http://robertlambert.net/?p=189</guid>
		<description><![CDATA[Much of today’s IT application development – custom or off-the-shelf – involves integrating data from legacy systems, third-party software products and external data sources such as demographics or mail lists.  More often than not, data integration is unexpectedly complex, either due to data quality issues or the nature of the data integration itself. Here [...]]]></description>
			<content:encoded><![CDATA[<div class="wp-caption alignright" style="width: 259px"><a href="http://www.information-management.com/infodirect/20021004/5854-1.html"><img title="Information Management" src="http://www.information-management.com/media/ui/informationmgmt_logo.gif" alt="Excerpt from Illusions, Allusions – Let’s Get Real about Database Design, InfoManagement Direct, October 4, 2002" width="249" height="73" /></a><p class="wp-caption-text">Excerpt from &quot;Illusions, Allusions – Let’s Get Real about Database Design&quot;, October 4, 2002</p></div>
<p>Much of today’s IT application development – custom or off-the-shelf – involves integrating data from legacy systems, third-party software products and external data sources such as demographics or mail lists.  More often than not, data integration is unexpectedly complex, either due to data quality issues or the nature of the data integration itself.</p>
<p>Here are some typical examples:</p>
<ul>
<li>One ERP package uses the same table for both Sales Quotes and Sales Orders. Columns that mean one thing for Quotes mean something quite different for Orders. One team extracting data from this ERP package continually mixed up, for example, Date Received on the Quote with Date Prepared for the Order. The designer who blindly copies data from input systems can propagate these issues. In this case, the correct solution is to extract the two documents into separate tables in the destination system, making each column describe either a quote or an order, not both.</li>
<li>Marketing databases often store data purchased from several third parties on the same set of customers. These sources usually include overlapping columns with different values. For the same customer, different sources might store different values for the person’s address, credit scores or even name. It is sometimes important to preserve all of the columns from all of the sources and to maintain the information on where the data came from as well as what its value was. This can result in a messy database design, where columns again carry dual meaning: their value and their source.</li>
<li>Codes from legacy databases tend to evolve into complex forms, embedding more and more information into a single field. This is perhaps a natural reaction to the slow evolution of the system relative to changes in business, as users shoehorn information into the system that it was not designed to store. For instance, in a legacy system a one-character code might classify customers by &#8220;customer category,&#8221; with values 1 for small business, 2 for mid-size, and 3 for Fortune 5000. Users might add codes 4, 5 and 6 for corresponding values for aerospace customers, then 7 for federal government, and so on. The database designer must know the data well to extract each embedded concept into a different destination column.</li>
</ul>
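<p>To make the last example concrete, here is a minimal Python sketch of pulling the two concepts embedded in the legacy &#8220;customer category&#8221; code into separate destination values. The mapping follows the codes described above; the &#8220;general&#8221; sector label, the class name, and the function name are assumptions added for illustration.</p>

```python
from typing import NamedTuple, Optional

class CustomerClass(NamedTuple):
    size: Optional[str]  # None where the legacy code carries no size
    sector: str

# Hypothetical decoding table built from the example codes in the text.
# "general" is an assumed label for customers with no special sector.
LEGACY_CATEGORY = {
    "1": CustomerClass("small business", "general"),
    "2": CustomerClass("mid-size", "general"),
    "3": CustomerClass("Fortune 5000", "general"),
    "4": CustomerClass("small business", "aerospace"),
    "5": CustomerClass("mid-size", "aerospace"),
    "6": CustomerClass("Fortune 5000", "aerospace"),
    "7": CustomerClass(None, "federal government"),
}

def decode_category(code: str) -> CustomerClass:
    """Split one legacy code into separate size and sector values."""
    try:
        return LEGACY_CATEGORY[code.strip()]
    except KeyError:
        # Fail loudly so an undocumented code is noticed, not silently loaded.
        raise ValueError(f"unmapped legacy customer category: {code!r}") from None

print(decode_category("5"))
```

<p>Raising an error on an unmapped code is a deliberate choice in this sketch: users keep inventing new codes, and the integration should surface them instead of guessing.</p>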
<p>When data integration is part of a project, expect complexity and leave room in interface development estimates for devils in the details of source system analysis and integration design.</p>
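<p>Returning to the first example in the list, the shared Quotes and Orders table, the split into separate destination tables might look something like the sketch below. The source layout is invented for illustration: doc_type, doc_no, and doc_date are hypothetical column names, not those of any particular ERP package.</p>

```python
# Hypothetical rows from a shared source table: doc_type tells Quotes ("Q")
# from Orders ("O"), and doc_date changes meaning with the document type.
source_rows = [
    {"doc_no": 1001, "doc_type": "Q", "doc_date": "2009-02-03"},
    {"doc_no": 1002, "doc_type": "O", "doc_date": "2009-02-10"},
    {"doc_no": 1003, "doc_type": "Q", "doc_date": "2009-02-17"},
]

def split_documents(rows):
    """Route each source row to a quote or order record with an unambiguous date column."""
    quotes, orders = [], []
    for row in rows:
        if row["doc_type"] == "Q":
            quotes.append({"quote_no": row["doc_no"], "date_received": row["doc_date"]})
        elif row["doc_type"] == "O":
            orders.append({"order_no": row["doc_no"], "date_prepared": row["doc_date"]})
        else:
            raise ValueError(f"unknown document type: {row['doc_type']!r}")
    return quotes, orders

quotes, orders = split_documents(source_rows)
print(len(quotes), len(orders))  # 2 1
```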
]]></content:encoded>
			<wfw:commentRss>http://robertlambert.net/2009/03/data-integration-devil-in-details/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
