Thursday, September 11, 2008

Tool mentality

The tool mentality is the tendency to focus on the tool as opposed to the task. While knowing how to use and implement a tool is important, knowing how to find and use the right data is the most important skill a data management professional can have. A competent data management professional should be able to come quickly up to speed with an 'insert tab A into slot B' automation tool, but a person who is good at using some given 'insert tab A into slot B' tool may not understand the data. In data profiling, for instance, a successful program critically requires that the team understands what the data is, where it comes from and what it means, who will use it and who has used it, how it is processed and why it is important. In my experience, tools tend to drive a tool-mentality culture that ignores these requirements, while a more heavily manual environment tends to create a culture that focuses on these needs. Profiling tools still have their place: they reduce setup time, provide invaluable training to your more junior data profiling personnel and/or allow more expensive personnel to do more, produce clues to which more in-depth ad-hoc queries are needed to better profile the data, and support a documentation structure. Some tools allow me to easily automate and manage data rules (many of the tools needed to meet these requirements are an integral part of a modern DBMS) and/or provide the monitoring needed for production profiling (many of these tools are bundled with the DBMS).
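To make the idea of profiling output as "clues" concrete, here is a minimal sketch of a column profile in Python. The function name `profile_column`, the sample rows and the choice of statistics are my own assumptions for illustration; they do not come from any particular profiling tool.

```python
from collections import Counter


def profile_column(rows, column):
    """Return basic profile statistics for one column of a list of dict rows."""
    values = [row.get(column) for row in rows]
    # Treat None and empty or whitespace-only strings as missing values.
    present = [v for v in values if v is not None and str(v).strip() != ""]
    counts = Counter(present)
    return {
        "column": column,
        "row_count": len(values),
        "null_count": len(values) - len(present),
        "distinct_count": len(counts),
        "most_common": counts.most_common(3),
    }


# Hypothetical sample data, invented for this sketch.
sample_rows = [
    {"customer_id": 1, "state": "NY"},
    {"customer_id": 2, "state": "ny"},
    {"customer_id": 3, "state": ""},
    {"customer_id": 4, "state": "CA"},
    {"customer_id": 5, "state": None},
    {"customer_id": 6, "state": "NY"},
]

state_profile = profile_column(sample_rows, "state")
print(state_profile)
```

The numbers alone do not say whether "NY" and "ny" are the same value or whether a blank state is acceptable. Answering that still takes someone who knows where the data comes from and who uses it.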

Tuesday, July 29, 2008

Impact of Unstructured Data

What are some of the things that unstructured data impacts? The list is continually growing, but the data must be transformed from unstructured to structured data. There are tools on the market (either ones you will need to buy or ones you might already have) that will support this requirement. In an earlier post, I talked about prototyping, and this transformation would be a good candidate for it. Data integration needs will change in that unstructured data will need to be transformed into structured data, and data discovery techniques will need to change. Is traditional ETL the best route, or would extract, load and then transform work better? Would an evolutionary change that puts another letter in the traditional DI techniques be enough, or would a revolutionary change be needed? Is there a better way that has not yet been implemented? How would this affect the change data capture you have implemented? How does the way you do data modeling change? Currently, most data already arrives at the data warehouse in structured form, so most traditional methods work. Since you probably want the same level of professional management for your unstructured data as for your structured data, and the same level of service to the end user, you probably want to be able to put the unstructured data into the data warehouse. How do you do that in a way that supports both access as a part of your overall BI program and whole-document access? Does this affect what your data warehouse team needs to know about handling unstructured data? Think about what new training these people need.
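As a rough illustration of the unstructured-to-structured transformation, the sketch below pulls a few fields out of a free-text note with regular expressions. The patterns, the field names and the sample note are all hypothetical; real documents would need far more robust parsing, and a commercial or DBMS-bundled tool may well do this better.

```python
import re

# Hypothetical patterns, chosen only for this sketch.
DATE_PATTERN = re.compile(r"\b(\d{4}-\d{2}-\d{2})\b")
AMOUNT_PATTERN = re.compile(r"\$(\d+(?:\.\d{2})?)")
ORDER_PATTERN = re.compile(r"\border\s+#?(\d+)\b", re.IGNORECASE)


def extract_record(text):
    """Turn one free-text note into a structured record (missing fields are None)."""
    date = DATE_PATTERN.search(text)
    amount = AMOUNT_PATTERN.search(text)
    order = ORDER_PATTERN.search(text)
    return {
        "order_id": int(order.group(1)) if order else None,
        "date": date.group(1) if date else None,
        "amount": float(amount.group(1)) if amount else None,
        # Keep the original text so the whole document stays accessible.
        "raw_text": text,
    }


note = "Customer called on 2008-07-29 about order #1042; refunded $19.99."
record = extract_record(note)
print(record)
```

Keeping `raw_text` next to the extracted fields is one possible way to serve both needs raised above: the structured columns feed the BI program, while the original document remains available in full.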

Sunday, July 27, 2008

Thoughts about Unstructured Data

Most BI/DW environments are supported by a robust technology stack for structured data but are not well suited to supporting semi-structured or unstructured data. Does this mean that existing investments will be replaced? I don't think so, since modeling such data into more structured formats can often be automated and the process is well known. Many data warehouses are built on database systems that offer XML support in the database, the capability to index unstructured data, built-ins for simple parsing and publicly available tools to support more complex requirements. All of these provide the ability for semi-structured and unstructured data to be stored in the current data warehouse and thus managed in the same way that the structured data is managed. There are several end-user tools that also meet many of these requirements, either as dedicated BI tools or as office tools that can save unstructured documents in semi-structured forms. This data can now be cleansed, moved, backed up, searched and otherwise managed as structured data in the data warehouse.
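A small sketch of what "simple parsing" of semi-structured data can look like, using only the Python standard library: an XML document is parsed and loaded into a relational table, where it can then be searched like any other structured data. The XML layout, the `ticket` table and the in-memory SQLite database are assumptions made for the example; a real warehouse would use its own DBMS and its own XML features.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical semi-structured input, invented for this sketch.
XML_DOC = """
<tickets>
  <ticket id="1"><customer>Acme</customer><body>Invoice total is wrong.</body></ticket>
  <ticket id="2"><customer>Globex</customer><body>Cannot log in to the portal.</body></ticket>
</tickets>
"""


def load_tickets(conn, xml_text):
    """Parse ticket elements and insert them as rows; return the number loaded."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ticket "
        "(id INTEGER PRIMARY KEY, customer TEXT, body TEXT)"
    )
    root = ET.fromstring(xml_text)
    rows = [
        (int(t.get("id")), t.findtext("customer"), t.findtext("body"))
        for t in root.iter("ticket")
    ]
    conn.executemany(
        "INSERT INTO ticket (id, customer, body) VALUES (?, ?, ?)", rows
    )
    conn.commit()
    return len(rows)


conn = sqlite3.connect(":memory:")
loaded = load_tickets(conn, XML_DOC)

# Once loaded, the text can be searched with ordinary SQL.
matches = conn.execute(
    "SELECT id, customer FROM ticket WHERE body LIKE ?", ("%portal%",)
).fetchall()
print(loaded, matches)
```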

Thursday, July 24, 2008

More on Data Governance (or as the cool kids call it, DG)

How do you define data governance? According to TDWI, DG is some form of controls for data and its usage. What kind of controls? Controls can tighten access to data to comply with certain security standards, keep your clients from fleeing or create market opportunities by truthfully showing potential clients that their data is secure. You can also use data governance controls to enhance your data integration efforts by either improving access to needed data or standardizing the structure of that data.

What are the critical attributes of data governance? First, the report talks about the "four Ps": People, Policies and Procedures enable the Process. Second, DG must be coordinated with other forms of governance. What are some of the most important? Third, the data itself is not really what is governed here; rather, it is how the data is accessed and managed. Fourth, a DG initiative intersects with many different business initiatives, and it is often a critical service. I think a good DG program may be incubated in many data-driven business initiatives. Fifth, DG also touches many data management practices, so any automation you need might be found in the tools you already use for many of your data management practices. Last, DG is a balancing act among many competing priorities in your business. All this means that a good data governance program is cross-functional, so shouldn't the data governance board be staffed by cross-functional personnel and led by people who have both the political and inherent power to drive the program? Are there other ways to get a strong enough mandate?

Why do you need a DG program? Is compliance important? Do fines and jail terms scare you? How about clients abandoning you because of bad data management? Could you lose revenue? High-quality, auditable data decreases costs and improves the quality of initiatives that consume the data. I've found that a DG program can even ferret out broken, obsolete or otherwise useless business processes. In one case, certain data-entry codes were known only to one person who was nearing retirement. All organizations are subject to the possibility of mergers, acquisitions or reorganizations. DG reduces the risks associated with these events, speeds up the time it takes to complete such a project and reduces other costs.

What are some of the other benefits of data governance? What are your barriers? What have you automated and how did you do it?

Sunday, July 20, 2008

Data Governance

What was the starting point for your data governance? Was it compliance, a BI initiative, a CRM implementation or something else? Data governance efforts need to be prioritized; while compliance is often the driver, data quality and integration efforts are more common. Remember, any data governance effort will eventually involve almost all parts of an organization. Many new initiatives will be enabled while others will be postponed or eliminated, broken business processes will be identified and cross-functional communications will be improved. What organizational structures best support a DG effort? I've usually emphasized that a DG effort should include people who can enforce policies across the business as well as those who understand the process of data management.

Saturday, July 19, 2008

Prototyping

Recently, I read an article by Jeff Reagan about data warehouse prototyping (he calls it virtual prototyping). I've seen similar concepts proposed to support data mining and data quality programs. I've also come across some tools that might enhance the experience of the end user in a prototyped environment. The greatest selling point of this type of prototyping is that you can use it to prove business value without the heavy investment in time and money that often occurs when a BI program is implemented. Another great feature is the ability to share the proposed results of a specific BI project with end users in an interactive way, which not only would excite them about the business initiatives that the BI/DW program will support but also creates a potential for end-user involvement that helps ensure the success of your BI program. It may even allow for new or improved requirements. These were some of the advantages presented. Can you think of any more? Do you think this could solve certain issues with your BI program? What problems could this create?