The Cabin on the Lake: Data repositories for Business Analytics 

“Big Data” continues to compete with “Blockchain”, “Crypto” and “Cloud” for the top spot in the current technological craze, and not without reason. Big Data Analytics helps businesses thrive and grow by condensing a constant influx of data into a digestible, visual representation. A sometimes-overlooked aspect of Big Data is the proper repository needed to contain it.

Nowadays, most people are familiar with what a database is, at least on a surface level. Less well known are the database’s two younger brothers: the Data Warehouse and the Data Lake. Taken at face value, these two technologies are the same thing: a data repository for Big Data to be used for analytics. What differentiates them is their structure and their purpose.

Let’s begin with Data Warehouses. Fathered by Bill Inmon in the 1970s and later modified by Ralph Kimball in 1996, a Warehouse is a database not unlike a relational one, but it differs in the structure of its schema. Often when building Data Warehouses, developers make use of the “dimensional modelling” methodology to define the schema.

In dimensional modelling, we have two different types of tables: fact and dimension. A fact table, as the name implies, contains the facts of the data. You generally store your measures in the fact table: the numeric values you want to aggregate, such as sales amount and quantity. The fact table is then linked to a multitude of dimension tables. These dimensions contain descriptive, non-aggregatable information. You can have a dimension about your customers, your suppliers, your products, geographical/time data, etc. Your general dimensional model structure will have a single fact table linked to all relevant dimension tables. We call this a “star schema” because, as you can guess, the schema happens to look like a star when visualized. Keep in mind this is a very broad definition of dimensional models. In a more nuanced explanation, we would touch upon different types of facts, dimensions, and schemas.
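To make the star schema a little more concrete, here is a minimal sketch using Python’s built-in sqlite3 module. Everything in it is an assumption made for illustration: the table names (fact_sales, dim_customer, dim_product), the columns and the sample rows are hypothetical, and a real warehouse would run on a dedicated platform rather than an in-memory database.

```python
import sqlite3


def build_star_schema(conn):
    """Create one fact table surrounded by two dimension tables (hypothetical names)."""
    conn.executescript("""
        CREATE TABLE dim_customer (
            customer_id INTEGER PRIMARY KEY,
            name        TEXT,
            country     TEXT
        );
        CREATE TABLE dim_product (
            product_id INTEGER PRIMARY KEY,
            name       TEXT,
            category   TEXT
        );
        -- The fact table holds the measures plus a key into each dimension.
        CREATE TABLE fact_sales (
            sale_id     INTEGER PRIMARY KEY,
            customer_id INTEGER REFERENCES dim_customer(customer_id),
            product_id  INTEGER REFERENCES dim_product(product_id),
            quantity    INTEGER,
            amount      REAL
        );
    """)
    # Made-up sample rows, purely for illustration.
    conn.executemany(
        "INSERT INTO dim_customer VALUES (?, ?, ?)",
        [(1, "Alice", "Malta"), (2, "Bob", "Italy")],
    )
    conn.executemany(
        "INSERT INTO dim_product VALUES (?, ?, ?)",
        [(1, "Laptop", "Hardware"), (2, "Licence", "Software")],
    )
    conn.executemany(
        "INSERT INTO fact_sales VALUES (?, ?, ?, ?, ?)",
        [(1, 1, 1, 2, 2000.0), (2, 1, 2, 5, 500.0), (3, 2, 1, 1, 1000.0)],
    )


def sales_by_country(conn):
    """Aggregate the fact table's measures, grouped by a dimension attribute."""
    return conn.execute("""
        SELECT c.country, SUM(f.quantity), SUM(f.amount)
        FROM fact_sales AS f
        JOIN dim_customer AS c ON c.customer_id = f.customer_id
        GROUP BY c.country
        ORDER BY c.country
    """).fetchall()


conn = sqlite3.connect(":memory:")
build_star_schema(conn)
for country, quantity, amount in sales_by_country(conn):
    print(country, quantity, amount)
```

The query shows the typical pattern: the measures come from the fact table, while the attribute we group by comes from a dimension.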

With all that said, what do we use Data Warehouses for? Generally, for very specific objectives related to analysing archived data. We consume a lot of data every day, but not every piece of data is relevant to driving our business. When we want to analyse a specific thread of data, we build a warehouse based on what we want to draw from that data. Building your warehouse will often require selecting the relevant data sources, extracting the data from them, and transforming it so that it’s unified, readable, and non-redundant before loading it into the warehouse. We can then use Business Intelligence tools to read our data and create visual dashboards for easier report generation and well-informed decision making.
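The extract, transform and load steps described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the two source systems (crm_rows and webshop_rows), their field names and the records themselves are invented, and the “warehouse” is just a list standing in for a real table.

```python
# Hypothetical raw records from two source systems with different layouts.
crm_rows = [
    {"CustomerName": " Alice ", "Country": "malta"},
    {"CustomerName": "Bob", "Country": "Italy"},
]
webshop_rows = [
    {"name": "alice", "country_code": "Malta"},
    {"name": "Carol", "country_code": "France"},
]


def extract():
    """Extract: pull (name, country) pairs out of each source's own layout."""
    for row in crm_rows:
        yield row["CustomerName"], row["Country"]
    for row in webshop_rows:
        yield row["name"], row["country_code"]


def transform(records):
    """Transform: unify formatting and drop redundant (duplicate) records."""
    seen = set()
    cleaned = []
    for name, country in records:
        key = (name.strip().title(), country.strip().title())
        if key not in seen:
            seen.add(key)
            cleaned.append({"name": key[0], "country": key[1]})
    return cleaned


def load(rows, warehouse):
    """Load: append the cleaned rows to the target (a list, in this sketch)."""
    warehouse.extend(rows)
    return warehouse


warehouse_customers = load(transform(extract()), [])
for customer in warehouse_customers:
    print(customer)
```

Note how “ Alice ” from the first source and “alice” from the second end up as a single customer once the formatting has been unified.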

Conversely, what is a Data Lake? Conceptually, it’s like a Warehouse: a data repository for Big Data. But that’s where the similarities end. Imagine the structure and rigidity of a Warehouse and throw it out the window. A Lake contains large volumes of raw data without any transformations. You also don’t have to be picky with your data. If it exists, it goes into the Lake. Compared to the Warehouse, it sounds very archaic, doesn’t it? It doesn’t end there, however. When building a Lake, you don’t tend to have a specific purpose for it in mind. You build it in the hope that the data becomes useful later.
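As a rough sketch of the “store it raw, decide later” idea, the snippet below lands payloads in a folder exactly as they arrive and only applies structure when somebody reads them. The folder, the source name web_events, the helper functions and the sample payloads are all hypothetical; real Lakes usually sit on distributed or cloud object storage rather than a local temporary directory.

```python
import json
import tempfile
from pathlib import Path


def land_raw(lake_dir, source, payloads):
    """Append payloads exactly as received, one per line, with no transformation."""
    path = Path(lake_dir) / f"{source}.jsonl"
    with path.open("a", encoding="utf-8") as handle:
        for payload in payloads:
            handle.write(payload + "\n")
    return path


def read_with_schema(path, fields):
    """Apply structure only at read time: keep the requested fields, skip unparseable lines."""
    rows = []
    with path.open(encoding="utf-8") as handle:
        for line in handle:
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                continue  # the raw line stays in the lake; this reader just ignores it
            if isinstance(record, dict):
                rows.append({field: record.get(field) for field in fields})
    return rows


lake = tempfile.mkdtemp()  # stand-in for the lake's storage location
raw_events = [
    '{"user": "alice", "clicks": 3}',
    '{"user": "bob", "device": "mobile"}',
    "not json at all",
]
events_path = land_raw(lake, "web_events", raw_events)
rows = read_with_schema(events_path, ["user", "clicks"])
print(rows)
```

Nothing is rejected on the way in; the decision about which fields matter is deferred until a use for the data turns up.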

You might be thinking that compared to the Warehouse, the Lake doesn’t sound as appealing, and it might be the case that your business isn’t in need of a Lake currently. However, the Lake has strengths of its own. Because the data is kept in its raw format, Data Lakes make good platforms for data scientists to employ Machine Learning algorithms (programs that can “learn” by being trained) to make fast and accurate predictions. Data can also be stored in the Lake in real time. If you’re part of an industry that needs to make on-the-spot decisions based on current data, you might benefit greatly from Lakes.

More and more data is being generated every day, leading to more potential for growth in your analysis and decision making. However, it’s easy to sink under the vast volume of data you might have, and it doesn’t take much to mismanage your data. Make sure to be as informed as you can possibly be before taking a large step towards a more successful future.

At iMovo, we work with best-in-class technologies to enable organisations to mine and visualise their information in a meaningful way that helps them to take the right decisions at the right time. As experts in the field of Business Intelligence and Big Data Analytics, we can help your company implement and sustain a product and guide you on how BI and Big Data can complement your existing environment. For more information contact us on [email protected]