Big Data and the Three Little Pigs
I’ve recently been involved in a project that advised clients on how to manage their enterprise data assets. The discussion invariably came back to one question: how do you stay agile and responsive to business demands for analytics while maintaining the integrity and reliability of the data? There is also the problem of the soaring ETL cost of provisioning this data.
Big Data looms large in this discussion. We need to manage this beast while continuing to support traditional, structured enterprise relational data. The answer lies in matching the level of data integrity and reliability to the intended use of the data. We often hear of an approach where data is categorised into value types: Gold, Silver or Bronze. I actually prefer the three little pigs model of STRAW (or HAY), STICKS and BRICKS. It better reflects the nature of housing data in environments with differing levels of protection (integrity and reliability), matched to the effort (cost) of delivery.
Financial, external reporting and regulatory compliance data needs to be built in a house made of BRICKS. It needs to sit on a solid foundation, with data reconcilable to the original source. As is often the case with financial data, it is also used for multiple functions across the enterprise: decisions on product and service pricing, sales performance and analysis, and the all-important staff and executive incentive reward calculations.
Data housed in STICKS is sturdy and has a well-thought-out design and structure. It is application-specific and is not designed for enterprise usage or cross-functional sharing. Many marketing data marts would fall into this category, with purpose-specific data transformation. The data transformation, reconciliation and lineage requirements are just enough for the specific business purpose and nothing more. By nature, a house made of STICKS is cheaper and faster to build than a house made of BRICKS.
And finally, the house made of HAY. It’s essentially a pile of data that is transformed, grouped and summarised on the day it is needed. There is little structure, and the data is probably still in its original source format. You can shape it any way you want. The solution design is often not easily replicated or scaled up, but it is good for answering a non-specific, non-recurring business question. As such, there is little requirement for reconciliation or replication.
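To make the idea of matching integrity to intended use a little more concrete, here is a minimal sketch of a decision rule in Python. The attribute names (`regulatory_or_financial`, `shared_across_functions`, `recurring`) and the rule order are hypothetical. They are distilled from the three descriptions above and are not a prescribed governance method. The rule picks the cheapest house that still gives the data enough protection.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    HAY = "HAY"        # one-off, non-recurring analysis; little reconciliation
    STICKS = "STICKS"  # application-specific, fit for one business purpose
    BRICKS = "BRICKS"  # enterprise-wide, reconcilable to the original source


@dataclass
class DataUse:
    # Hypothetical attributes describing how the data will be used.
    regulatory_or_financial: bool  # financial, external reporting or compliance
    shared_across_functions: bool  # reused by several business areas
    recurring: bool                # needed again on a regular basis


def choose_tier(use: DataUse) -> Tier:
    """Return the cheapest tier that still gives enough protection."""
    if use.regulatory_or_financial or use.shared_across_functions:
        return Tier.BRICKS
    if use.recurring:
        return Tier.STICKS
    return Tier.HAY


if __name__ == "__main__":
    # A one-off question from a data lab
    print(choose_tier(DataUse(False, False, False)).value)
    # A recurring, purpose-specific marketing data mart
    print(choose_tier(DataUse(False, False, True)).value)
    # Regulatory reporting
    print(choose_tier(DataUse(True, True, True)).value)
```

The order of the checks is the design choice that matters. Any regulatory or cross-functional use forces BRICKS, however infrequently the data is needed, because the cost of failure there is high.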
The analogy is fairly elementary. Where it comes into its own is in applying Data Governance principles. Business areas want a fast, cheap answer if they can get away with it, while IT wants a solution it can stand on to deliver a robust and reliable service. Herein lies the inherent conflict in most Business and IT discussions.
The HAY solution is attractive because it is generally quick and cheap, and the cost of failure is very low. Data labs and discovery environments are thriving because of these new economics of realising value from data (including Big Data), especially in a self-service environment. It is common for the business to prefer “quick” and “cheap” solutions.
The big shock comes when IT returns with the cost of converting the HAY solution into a STICKS or BRICKS environment. “Why can’t they use the solution we designed in 2 weeks?” Well, they can. But you can’t build bricks on a foundation of hay, and even sticks on hay don’t work. Nor are there significant IT savings from reusing a design done in HAY to build STICKS and BRICKS solutions. The main value the HAY work carries over is the certainty that the information produced will be useful.