Sunday, July 27, 2008

Thoughts about Unstructured Data

Most BI/DW environments are supported by a robust technology stack for structured data, but that stack is not well suited to semi-structured or unstructured data. Does this mean that existing investments will be replaced? I don't think so, since modeling such data into more structured formats can often be automated and the process is well known. Many data warehouses are built on database systems that support XML in the database, can index unstructured data, include built-in functions for simple parsing, and can be extended with publicly available tools for more complex requirements. Together, these capabilities allow semi-structured and unstructured data to be stored in the current data warehouse and managed in the same way as the structured data. Several end-user tools also meet many of these requirements, either as dedicated BI tools or as office tools that can save unstructured documents in semi-structured forms. Once it is in the data warehouse, this data can be cleansed, moved, backed up, searched and otherwise managed just like structured data.
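As a rough illustration of the automated modeling step, here is a minimal sketch in Python. It assumes a small XML document of customer comments and uses an in-memory SQLite database as a stand-in for the warehouse database. The element names, attribute names, table name and function names are all hypothetical, invented only for this example; a real warehouse would more likely use its own XML and text-indexing features.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical semi-structured input; element and attribute names are invented.
SAMPLE_XML = """
<feedback>
  <comment customer="1001" date="2008-07-01">Delivery was late.</comment>
  <comment customer="1002" date="2008-07-03">Great support team.</comment>
</feedback>
""".strip()


def extract_rows(xml_text):
    """Flatten each <comment> element into a (customer_id, date, text) tuple."""
    root = ET.fromstring(xml_text)
    return [
        (int(c.get("customer")), c.get("date"), (c.text or "").strip())
        for c in root.iter("comment")
    ]


def load_rows(conn, rows):
    """Store the flattened rows in an ordinary relational table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customer_comment "
        "(customer_id INTEGER, comment_date TEXT, comment_text TEXT)"
    )
    conn.executemany("INSERT INTO customer_comment VALUES (?, ?, ?)", rows)
    conn.commit()


def search_comments(conn, term):
    """Search the free text with plain SQL, like any other column."""
    cur = conn.execute(
        "SELECT customer_id, comment_text FROM customer_comment "
        "WHERE comment_text LIKE ? ORDER BY customer_id",
        (f"%{term}%",),
    )
    return cur.fetchall()


# SQLite stands in for the warehouse database in this sketch.
conn = sqlite3.connect(":memory:")
load_rows(conn, extract_rows(SAMPLE_XML))
matches = search_comments(conn, "late")
```

Once the comments sit in a regular table, the free text can be joined to customer dimensions, backed up and searched with the same tooling as the rest of the warehouse, which is the point of the argument above.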
