

Explaining Big Data Storage with the Story of the Three Little Pigs

 

The Three Little Pigs

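Big data deployments need to fit the specific application scenario. In practice, how an enterprise stores and processes big data can be illustrated with the story of the three little pigs, who built their houses of straw, sticks, and bricks. The story gives a vivid picture of the different levels of protection (integrity and reliability) in a data storage environment, each matched to the cost of delivering the service.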

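Financial, external reporting, and regulatory compliance data needs to be stored and processed in a BRICKS environment. This data needs a reliable hardware infrastructure and must stay consistent with its original source. It is common for financial data to be used by several functions in the enterprise, covering product and service pricing decisions, sales performance and analysis, and the all-important staff and executive incentive reward calculations.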
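One way to picture what staying consistent with the original source could mean in practice is a control-total check: count the rows and sum an amount column on both sides, then compare. The sketch below is a minimal illustration in Python, not a method described in the article. The function names and the "amount" field are hypothetical.

```python
from decimal import Decimal
from typing import Iterable, Mapping, Tuple


def control_totals(rows: Iterable[Mapping[str, object]],
                   amount_field: str) -> Tuple[int, Decimal]:
    """Return (row count, sum of amount_field) for a set of records."""
    count = 0
    total = Decimal("0")
    for row in rows:
        count += 1
        # Go through str() so floats do not carry binary rounding noise.
        total += Decimal(str(row[amount_field]))
    return count, total


def reconciles(source_rows: Iterable[Mapping[str, object]],
               stored_rows: Iterable[Mapping[str, object]],
               amount_field: str = "amount") -> bool:
    """True when the stored copy matches the source on count and total."""
    return (control_totals(source_rows, amount_field)
            == control_totals(stored_rows, amount_field))
```

A check like this only catches missing rows and changed totals. A real BRICKS environment would likely need stricter, record-level reconciliation.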

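A well-designed STICK environment keeps stored data sturdy and durable. It is application-specific and is not designed for enterprise-wide use or cross-functional data sharing. Data of this type can be dedicated to purpose-specific data transformation, and many marketing data marts fall into this category. The data transformation, reconciliation, and lineage capabilities need only be sufficient for the specific business purpose. Compared with the BRICKS house above, a STICK house is by nature cheaper and faster to build.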

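Finally, the HAY house. HAY essentially means transforming, grouping, and summarising data on the specific day it needs to be used. The data may remain in its original source format and needs almost no structure. Users can reshape it however they like. Although a HAY design cannot easily be replicated or scaled up, it suits non-specific, non-recurring business questions. It places little demand on data reconciliation and replication.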
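The three tiers can be sketched as a small decision rule. The Python below is only an illustration of the distinctions drawn above. The DatasetProfile fields and the order of the checks are assumptions made for this example, not criteria taken from the article.

```python
from dataclasses import dataclass
from enum import Enum


class Tier(Enum):
    BRICKS = "bricks"  # consistent with source, used across the enterprise
    STICKS = "sticks"  # application-specific, purpose-built transforms
    HAY = "hay"        # ad hoc, shaped on the day it is needed


@dataclass(frozen=True)
class DatasetProfile:
    # Hypothetical attributes chosen for illustration only.
    name: str
    regulatory_or_financial: bool
    shared_across_functions: bool
    recurring_use: bool


def suggest_tier(profile: DatasetProfile) -> Tier:
    """Suggest a storage tier from a dataset's intended use."""
    # Compliance data and cross-functional data need the strongest house.
    if profile.regulatory_or_financial or profile.shared_across_functions:
        return Tier.BRICKS
    # Recurring, single-application use justifies a designed structure.
    if profile.recurring_use:
        return Tier.STICKS
    # One-off questions can stay in the cheapest environment.
    return Tier.HAY
```

For example, a one-off campaign extract with all three flags set to False would land in Tier.HAY, while a general ledger feed would land in Tier.BRICKS.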

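The three little pigs analogy is fairly intuitive, but specific solutions should follow Data Governance principles. Business areas want a fast, low-cost solution if they can get away with it, while IT needs a dependable solution on which to deliver a robust and reliable service. This is the inherent conflict in most discussions between the business and IT.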

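The HAY solution attracts a lot of attention because it is quick to deploy, cheap, and has a low cost of failure. Data labs and discovery environments are growing quickly because, under this new economics, users recognise the value of data (including big data), especially in self-service environments. It is therefore no surprise that business areas choose fast, low-cost solutions.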

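But when a HAY solution has to be upgraded to a STICK or BRICKS environment, the cost quoted by IT comes as a big shock. "Why can't they use the solution we designed in two weeks?" They can. But building BRICKS, or even STICKS, on top of a HAY foundation does not work. Reusing a HAY design to deploy STICK and BRICKS solutions would waste a large share of the IT budget.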

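The main challenge is having data governance policies and processes that identify how important the data is. When a creative solution designed in a HAY environment has to be migrated to a more stable environment, the people who decide how the data is managed (HAY, STICKS, or BRICKS) need a full understanding of how critical the data is downstream.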
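The gap between a HAY prototype and a sturdier environment can be made explicit by listing the controls each tier is expected to have and subtracting what already exists. The mapping below is a hypothetical example loosely based on the requirements named in this article (transformation, lineage, reconciliation, replication). The control names and their assignment to tiers are assumptions, not a standard.

```python
from typing import Iterable, List

# Hypothetical controls per tier, for illustration only.
REQUIRED_CONTROLS = {
    "hay": frozenset(),
    "sticks": frozenset({"documented_transforms", "lineage"}),
    "bricks": frozenset({
        "documented_transforms",
        "lineage",
        "source_reconciliation",
        "replication",
    }),
}


def promotion_gap(current_controls: Iterable[str], target_tier: str) -> List[str]:
    """List the controls still missing before moving to target_tier."""
    missing = REQUIRED_CONTROLS[target_tier] - set(current_controls)
    return sorted(missing)
```

Called with an empty list of current controls and the target "bricks", the function returns all four controls, which is one way to show the business why the quoted migration cost is so much higher than the two-week prototype.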

English original:

Big Data and the Three Little Pigs

I’ve recently been involved in a project that advised clients on how to manage their enterprise data assets. This invariably revolved around the issue of how to remain agile and responsive to business demands for analytics while maintaining integrity and reliability of the data. There is also the issue of soaring ETL cost for provisioning this data.

Big Data looms large in this discussion, recognising the need to be able to manage this beast and at the same time continue to support the need for traditional enterprise relational structured data. The answer is around matching the level of data integrity and reliability with the intended use of the data. We often hear an approach where data is categorised into value types – Gold, Silver or Bronze. I actually prefer the three little pigs model of STRAW, STICKS, and BRICKS. This more closely reflects the nature of housing data in an environment with differing levels of protection (integrity and reliability), matched to effort (cost) of delivery.

Financial, external reporting and regulatory compliance data needs to be built in a house made of BRICKS. It needs to be on a solid foundation, with data reconcilable to the original source. And as is often the case for financial data, it is used for multiple functions in the enterprise covering decisions on product and service pricing, sales performance and analysis and the all-important staff/executive incentive reward calculations.

Data housed in STICK is sturdy and has a well-thought-out design and structure. It is application-specific and is not designed for enterprise usage and cross-functional sharing. A lot of marketing data marts would generally fall into this category with purpose-specific data transformation. The data transform, reconciliation and lineage requirements would just be enough for the specific business purpose and nothing more. By nature, a house made of STICKS is cheaper and faster to build than a house made of bricks.

And finally, the house made of HAY. It’s essentially a pile of data that is transformed, grouped and summarised on the specific day that it needs to be used. There is little structure, and the data is probably still in its original source format. You can shape it in any way you want. The solution design is often not easily replicated or scaled up, but is good for answering a non-specific and non-recurring business issue. As such there is little requirement for reconciliation and replication.

The analogy is fairly elementary. Where it comes into its own is in applying it to Data Governance principles. Business areas would want a fast and cheap answer if they can get away with it – IT would want a solution that they can stand on to deliver a robust and reliable service. Herein lies the inherent conflict in most Business and IT discussions.

The HAY solution is attractive because it’s generally quick and cheap and the cost of failure is very low. Data labs and discovery environments are thriving because of this new economics of value recognition from data (including Big Data), especially in a self-service environment. It is common for the business to prefer “quick” and “cheap” solutions.

The big shock comes when IT comes back with the cost of converting the HAY solution into a STICK or BRICK environment. “Why can’t they use the solution we designed in 2 weeks?” Well, they can. But building bricks on top of a hay foundation . . . even sticks on hay doesn’t work. And there are few IT savings in leveraging a design done in HAY in order to build STICK and BRICK solutions. The main value is the certainty of the usefulness of the information that will be produced.

End.

When reposting, please credit 36大数据 (36dsj.com): 36大数据 » Explaining Big Data Storage with the Story of the Three Little Pigs
