
FESSEX Consulting can help you accelerate your “Big Data Infrastructure” or build it from the ground up, including data collection, the Data Lake, and the cleansed/results layer. From A through Z, we will build your Data Pipeline. Once the pipeline is in place, we can work with you on what comes next: making your data actionable.

As part of building your entire (or partial) Data Pipeline, we will ensure that you remain compliant with privacy legislation around the world (California CCPA and other US states, EU GDPR, China PIPL, Canada, India, Japan, UAE). Furthermore, we are obsessed with security, so our approach to any infrastructure work is security first.

One misconception that we regularly have to address is that Data Lakes and Data Warehouses are interchangeable. They are not. The following discussion clarifies the differences between them and the benefits of using Data Lakes over Data Warehouses.

Contact us today & Get a FREE Discovery Call! Start Here

Using Data Lakes and Experimental Data Science to Accelerate Answering Questions

Traditionally, companies have organized their data in data warehouses. Decisions had to be made about what data to collect and organize and what data to ignore and lose, potentially forever. This approach presents a problem: because NOT ALL DATA is kept, not all questions can be answered expediently, in particular market-driven and strategic questions.

Let’s look at a potential use case:

Data Warehouse

Est. time to conclusion: 47 Weeks

The organization’s CEO or CFO has a particular question; for the sake of this use case, let’s assume the data needed to answer it was not collected.

The following timeline will play out in order to answer the question:

  1. Translate the question into a requirement: 1 to 4 weeks
  2. Figure out what data needs to be collected: 2 weeks
  3. Figure out where the data is coming from in the system: 2 to 4 weeks
  4. Define an implementation plan, including technical details: 2 to 4 weeks
  5. Make the structural changes to the data warehouse: 4 to 8 weeks
  6. Execute the plan, including regression and bug fixes on the system: 8 to 16 weeks
  7. Collect data: 12 to 20 weeks, depending on the nature of the question and the need for a statistically significant data set
  8. Extract the data and start doing data exploration: 4 to 8 weeks
  9. Perform data analysis: 2 to 4 weeks, assuming the organization has the capabilities
  10. Develop models: 6 weeks (minimum), assuming the organization has the capabilities
  11. First draft of an answer: 4 weeks

It will take a minimum of 47 weeks to answer a question where market timing is paramount.
After 47 weeks, the question is no longer relevant.
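
The 47-week figure is the sum of the lower bounds of the eleven steps above. A minimal sketch of that arithmetic follows. The names are illustrative, the steps are assumed to run strictly one after another, and the upper bound treats the open-ended "6 weeks (minimum)" modeling step as exactly 6 weeks, which is an assumption.

```python
# (step, min_weeks, max_weeks) taken from the data warehouse timeline above.
# Assumption: a step with a single figure uses it for both bounds, including
# "Develop models", whose 6 weeks is stated only as a minimum.
WAREHOUSE_STEPS = [
    ("Translate the question into a requirement", 1, 4),
    ("Figure out what data needs to be collected", 2, 2),
    ("Figure out where the data is coming from", 2, 4),
    ("Define an implementation plan", 2, 4),
    ("Make structural changes to the data warehouse", 4, 8),
    ("Execute the plan, including regression and bug fixes", 8, 16),
    ("Collect data", 12, 20),
    ("Extract the data and start data exploration", 4, 8),
    ("Perform data analysis", 2, 4),
    ("Develop models", 6, 6),
    ("First draft of an answer", 4, 4),
]


def total_weeks(steps):
    """Return (best_case, worst_case), assuming steps run strictly in sequence."""
    best = sum(low for _, low, _ in steps)
    worst = sum(high for _, _, high in steps)
    return best, worst


best, worst = total_weeks(WAREHOUSE_STEPS)
print(f"Data warehouse: {best} to {worst} weeks")
```

Under these assumptions the best case is 47 weeks and the worst case is at least 80, so the minimum quoted above is the optimistic end of the range.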

Data Lake

Est. time to conclusion: 11 Weeks

Before we look at what it takes to answer the question using modern methodologies, let’s understand what a data lake is:

What is a Data Lake?

Data Lakes are data stores with two components: a raw layer and a cleansed (or curated) layer. The raw layer contains all data generated by a system, whether it will be used or not. Storage is inexpensive, and the risk of not keeping data far outweighs the storage cost. The cleansed or curated layer is the subset of data extracted from the raw layer that is needed “right now”. Visualization tools use this layer to create dashboards and produce reports.

The advantage of keeping all data in the raw layer is that, when needed, the data is there to be curated. Moreover, there is no need to “improve” the data collection infrastructure.
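
The relationship between the two layers can be sketched in a few lines. This is a toy in-memory illustration, not a production design: the event fields, the `curate` function, and the sample records are all hypothetical.

```python
# Hypothetical raw layer: every event is kept exactly as generated,
# whether or not anyone needs it today.
raw_layer = [
    {"event": "purchase", "user": "u1", "amount": "19.99", "coupon": "SPRING"},
    {"event": "page_view", "user": "u2", "url": "/pricing"},
    {"event": "purchase", "user": "u2", "amount": "5.00", "coupon": None},
]


def curate(raw_events, event_type, fields):
    """Build a curated subset: one event type, only the fields needed now."""
    return [
        {name: event.get(name) for name in fields}
        for event in raw_events
        if event.get("event") == event_type
    ]


# Today's dashboard only needs purchase amounts per user.
curated = curate(raw_layer, "purchase", ["user", "amount"])

# A new question arrives later (for example, coupon usage). The raw layer
# already holds the data, so it is re-curated without changing collection.
coupon_view = curate(raw_layer, "purchase", ["user", "coupon"])

print(curated)
print(coupon_view)
```

The raw layer is never modified. Each new question only produces a new curated view over data that was already kept.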

Let’s look at the same use case using a data lake and experimental data science:

  1. Translate the question into a requirement: 1 to 4 weeks
  2. Figure out what data needs to be extracted from the data lake: 1 to 2 weeks
  3. Extract the data into the cleansed layer and start doing data exploration: 2 to 4 weeks
  4. Perform data analysis: 2 to 4 weeks
  5. Reuse models or write new ones: 3 to 6 weeks, normally based on existing models
  6. First draft of an answer: 2 weeks

Answering the same question takes as little as 11 weeks.
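
As with the warehouse case, 11 weeks is the sum of the lower bounds of the six steps above. A short sketch of the arithmetic, with illustrative names and the same assumption that the steps run one after another:

```python
# (step, min_weeks, max_weeks) taken from the data lake timeline above.
LAKE_STEPS = [
    ("Translate the question into a requirement", 1, 4),
    ("Figure out what data to extract from the data lake", 1, 2),
    ("Extract into the cleansed layer and start exploration", 2, 4),
    ("Perform data analysis", 2, 4),
    ("Reuse models or write new ones", 3, 6),
    ("First draft of an answer", 2, 2),
]

# Minimum for the data warehouse path, as stated earlier in the text.
WAREHOUSE_MIN_WEEKS = 47

lake_min = sum(low for _, low, _ in LAKE_STEPS)
lake_max = sum(high for _, _, high in LAKE_STEPS)
weeks_saved = WAREHOUSE_MIN_WEEKS - lake_min

print(f"Data lake: {lake_min} to {lake_max} weeks")
print(f"Best case saves {weeks_saved} weeks versus the warehouse minimum")
```

Under these assumptions the data lake path takes 11 to 22 weeks, so even its worst case is shorter than the 47-week warehouse minimum.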

A few areas to note:

  • No need to wait for data to be collected, because all data is already collected
  • Data exploration and analysis are ongoing tasks under this model, as are adding new or additional data and refocusing
  • Modifying existing models or even creating new models is accelerated because they are constantly needed for data exploration
  • Tools are already in place and in constant use

FESSEX Consulting can help you move to a more effective methodology to manage and use your data to generate actionable insights and dramatically improve your operations.
