Data Lakes Vs Data Warehouses

FESSEX Consulting can help you accelerate your “Big Data Infrastructure” or build it from the ground up, including your data collection, Data Lake, and cleansed/results layer. We will build your Data Pipeline from A through Z and, once the pipeline is in place, work with you on what comes next: making your data actionable.

As part of building your entire (or partial) Data Pipeline, we will ensure that you remain compliant with privacy legislation around the world (California CCPA and other US states, EU GDPR, China PIPL, Canada, India, Japan, UAE). Furthermore, we are obsessed with security, so our approach to any infrastructure work is security first.

One misconception that we often have to address is that Data Lakes and Data Warehouses are interchangeable. They are not. The following discussion clarifies the differences between them and the benefits of using Data Lakes over Data Warehouses.

Contact us today & Get a FREE Discovery Call! Start Here

Using Data Lakes and Experimental Data Science to Accelerate Answering Questions

Traditionally, companies have organized their data in data warehouses. Decisions had to be made about what data to collect and organize and what data to ignore and lose, potentially forever. This approach presents a problem: because NOT ALL DATA is kept, not all questions can be answered expediently, in particular market-driven and strategic questions.

Let’s look at a potential use case:

Data Warehouse

Est. time to conclusion: 47 Weeks

The organization’s CEO or CFO has a particular question; for the sake of this use case, let’s assume the data needed to answer it was not collected.

The following timeline will play out in order to answer the question:

Translate the question into a requirement: 1 to 4 weeks
Figure out what data needs to be collected: 2 weeks
Figure out where the data is coming from in the system: 2 to 4 weeks
Define an implementation plan including technical details: 2 to 4 weeks
Make the structural changes to the data warehouse: 4 to 8 weeks
Execute the plan, including regression testing and bug fixes on the system: 8 to 16 weeks
Collect data: 12 to 20 weeks, depending on the nature of the question and the need for a statistically significant data set
Extract the data and start doing data exploration: 4 to 8 weeks
Perform data analysis: 2 to 4 weeks, assuming the organization has the capabilities
Develop models: 6 weeks (minimum), assuming the organization has the capabilities
First draft of an answer: 4 weeks

It will take a minimum of 47 weeks to answer a question where market timing is paramount.
After 47 weeks, the question is likely no longer relevant.
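
The 47-week figure is the sum of the shortest estimate for every step in the timeline above. The following is a minimal sketch of that arithmetic; the variable and function names are our own, and the model development step is treated as open-ended because only a minimum is given for it.

```python
# Step durations in weeks as (name, minimum, maximum), copied from the timeline above.
# None marks an upper bound the timeline leaves open ("6 weeks (minimum)").
WAREHOUSE_STEPS = [
    ("Translate the question into a requirement", 1, 4),
    ("Figure out what data needs to be collected", 2, 2),
    ("Figure out where the data is coming from", 2, 4),
    ("Define an implementation plan", 2, 4),
    ("Make structural changes to the data warehouse", 4, 8),
    ("Execute the plan", 8, 16),
    ("Collect data", 12, 20),
    ("Extract the data and start data exploration", 4, 8),
    ("Perform data analysis", 2, 4),
    ("Develop models", 6, None),
    ("First draft of an answer", 4, 4),
]


def minimum_weeks(steps):
    """Best case: every step finishes at its shortest estimate."""
    return sum(low for _, low, _ in steps)


def slow_case_weeks(steps):
    """Every step at its longest estimate; open-ended steps count at their minimum."""
    return sum(low if high is None else high for _, low, high in steps)


print(minimum_weeks(WAREHOUSE_STEPS))    # prints 47
print(slow_case_weeks(WAREHOUSE_STEPS))  # prints 80
```

Because the slow case counts model development at its minimum, 80 weeks is itself a floor for the slow end of the range rather than a true upper bound.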

Data Lake

Est. time to conclusion: 11 Weeks

Before we look at what it takes to answer the question using modern methodologies, let’s understand what a data lake is:

What is a Data Lake?

A Data Lake is a data store with two components: a raw layer and a cleansed (or curated) layer. The raw layer contains all data generated by a system, whether it will be used or not. Storage is inexpensive, and the risk of not keeping data far outweighs the storage cost. The cleansed or curated layer is the subset of data that is extracted from the raw layer because it is needed “right now”. Visualization tools use this layer to create dashboards and produce reports.

The advantage of keeping all data in the raw layer is that when needed the data is there to be curated. Moreover, there is no need to “improve” the data collection infrastructure.
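
To make the two layers concrete, here is a minimal sketch in plain Python. It is an illustration only, not a description of any particular data lake product: the sample events, the field names, and the curate helper are all hypothetical.

```python
import json

# Raw layer: every event is kept as-is, whether or not it is needed today.
# These records and their field names are hypothetical sample data.
raw_layer = [
    '{"user": "a1", "event": "purchase", "amount": 30.0, "device": "mobile", "referrer": "ad"}',
    '{"user": "b2", "event": "purchase", "amount": 12.5, "device": "desktop", "referrer": "search"}',
    '{"user": "a1", "event": "page_view", "device": "mobile", "referrer": "ad"}',
]


def curate(raw_records, event_type, fields):
    """Build a curated view: keep one event type and only the requested fields."""
    curated = []
    for line in raw_records:
        record = json.loads(line)
        if record.get("event") == event_type:
            curated.append({name: record.get(name) for name in fields})
    return curated


# Today's dashboard only needs purchase amounts per user.
sales_view = curate(raw_layer, "purchase", ["user", "amount"])

# A new question (which referrer drives purchases?) needs no new collection work:
# the data is already in the raw layer and only has to be curated.
referrer_view = curate(raw_layer, "purchase", ["referrer", "amount"])

print(sales_view)
print(referrer_view)
```

The second call is the point of the design: the referrer field was never part of the first curated view, yet it is available the moment someone asks for it.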

Let’s look at the same use case using a data lake and experimental data science:

Translate the question into a requirement: 1 to 4 weeks
Figure out what data needs to be extracted from the data lake: 1 to 2 weeks
Extract the data into the cleansed layer and start doing data exploration: 2 to 4 weeks
Perform data analysis: 2 to 4 weeks
Reuse models or write new ones: 3 to 6 weeks, normally based on existing models
First draft of an answer: 2 weeks

Answering the same question this way takes as little as 11 weeks.
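
As with the warehouse timeline, the 11-week figure is the sum of the shortest estimates. The sketch below adds up both ends of the data lake ranges and compares the result with the 47-week warehouse minimum quoted earlier; the variable names are our own.

```python
# (name, minimum, maximum) weeks per step, copied from the data lake timeline above.
LAKE_STEPS = [
    ("Translate the question into a requirement", 1, 4),
    ("Figure out what data to extract from the data lake", 1, 2),
    ("Extract into the cleansed layer and explore", 2, 4),
    ("Perform data analysis", 2, 4),
    ("Reuse models or write new ones", 3, 6),
    ("First draft of an answer", 2, 2),
]

# Minimum quoted in the text for the data warehouse path.
WAREHOUSE_MIN_WEEKS = 47

lake_min = sum(low for _, low, _ in LAKE_STEPS)
lake_max = sum(high for _, _, high in LAKE_STEPS)
weeks_saved = WAREHOUSE_MIN_WEEKS - lake_min

print(f"Data lake: {lake_min} to {lake_max} weeks")
print(f"Best case saves {weeks_saved} weeks against the warehouse minimum")
```

Under these estimates, even the slowest data lake path (22 weeks) finishes well ahead of the fastest warehouse path (47 weeks).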

A few areas to note:

No need to wait for data to be collected because it is always fully collected
Data exploration and analysis are two tasks that are constantly going on under this model; adding new and/or more data and refocusing is also a constant
Modifying existing models or even creating new models is accelerated because they are constantly needed for data exploration
Tools are already in place and in constant use

FESSEX Consulting can help you move to a more effective methodology to manage and use your data to generate actionable insights and dramatically improve your operations.
