Data Warehouses and Data Lakes

In this assignment you will grapple with the question of when to use a data warehouse and when to use a data lake. By doing this assignment you will sharpen your understanding of the considerations involved in making these decisions.

This assignment supports the following objectives:

Identify the defining characteristics of a data warehouse
Describe the difference between a data warehouse and a data lake
Identify situations in which a data lake should be used over a data warehouse and vice versa

For each of these scenarios, decide whether you will use a data warehouse or a data lake to store and manage the data. Justify your answer in terms of the variety, volume, and velocity of the data, as well as the kinds of applications that need to be supported in these scenarios.

Scenario 1
A car rental company has over 600,000 cars. Each car has about 30 sensors, which the company wants to monitor at 15-minute intervals. The company wants to collect that data for multiple purposes. First, they would like to understand how each car is being driven during the rental period (How far? At what speeds? How much time is spent idling? And so on.) in order to possibly adjust their business model as well as the type of routine maintenance the cars receive. Second, they would like to know where the cars are located (in order to possibly alert emergency crews, should the need arise). Third, they would like to determine the current driving condition of each car (that is, the probability that it will need non-routine maintenance work in the immediate future) in order to possibly substitute another car for one with a high probability of needing such work. You are tasked with designing a system that allows the company to store the data and to analyze and report on it.
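
When reasoning about the volume and velocity of data in this scenario, it can help to turn the stated figures into rough numbers. The following is a minimal sketch, assuming exactly 600,000 cars and 30 sensors per car (the scenario says "over" and "about") and one reading per sensor per 15-minute interval. The function names are illustrative and not part of the assignment.

```python
# Figures taken from Scenario 1; treated as exact for a rough estimate.
CARS = 600_000
SENSORS_PER_CAR = 30      # the scenario says "about 30"
INTERVAL_MINUTES = 15     # one reading per sensor per interval (assumption)


def readings_per_day(cars, sensors_per_car, interval_minutes):
    """Total sensor readings produced by the whole fleet in 24 hours."""
    intervals_per_day = (24 * 60) // interval_minutes
    return cars * sensors_per_car * intervals_per_day


def average_readings_per_second(cars, sensors_per_car, interval_minutes):
    """Average ingest rate if readings were spread evenly over time."""
    return cars * sensors_per_car / (interval_minutes * 60)


per_day = readings_per_day(CARS, SENSORS_PER_CAR, INTERVAL_MINUTES)
per_second = average_readings_per_second(CARS, SENSORS_PER_CAR, INTERVAL_MINUTES)

print(f"{per_day:,} readings per day")
print(f"{per_second:,.0f} readings per second on average")
```

Under these assumptions the fleet produces 1,728,000,000 readings per day, or about 20,000 per second on average. The real rate would be burstier if all cars report at the same moment, and the figure says nothing about the variety of the readings (locations, speeds, engine diagnostics), which you still need to weigh separately.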

Scenario 2
A utility (electricity) company collects usage data for its 5 million customers for the lifetime of their subscriptions. The company would like to provide a service that lets customers monitor their hourly, daily, weekly, and monthly usage. You are tasked with designing a system that allows the company to collect and store the data and to present it to customers on an ad hoc basis. (Please keep in mind that in this scenario there is no requirement that this data source be used to make any determinations about repair, maintenance, or electricity outages.)
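
To picture the kind of application this scenario describes, it may help to see what an ad hoc usage query could look like. The sketch below is one possible illustration, not a prescribed design: it assumes one reading per customer per hour stored in a hypothetical table named `usage_hourly` with columns `customer_id`, `reading_ts`, and `kwh`, and it uses an in-memory SQLite database with made-up sample rows purely for demonstration.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a few made-up hourly readings."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE usage_hourly (
            customer_id INTEGER NOT NULL,
            reading_ts  TEXT    NOT NULL,  -- 'YYYY-MM-DD HH:MM:SS'
            kwh         REAL    NOT NULL,
            PRIMARY KEY (customer_id, reading_ts)
        )
        """
    )
    sample_rows = [
        (1, "2024-01-01 00:00:00", 0.5),
        (1, "2024-01-01 01:00:00", 0.25),
        (1, "2024-01-02 00:00:00", 1.0),
        (2, "2024-01-01 00:00:00", 2.0),
    ]
    conn.executemany("INSERT INTO usage_hourly VALUES (?, ?, ?)", sample_rows)
    return conn


def daily_usage(conn, customer_id):
    """Roll one customer's hourly readings up into daily totals."""
    cursor = conn.execute(
        """
        SELECT date(reading_ts) AS day, SUM(kwh) AS total_kwh
        FROM usage_hourly
        WHERE customer_id = ?
        GROUP BY day
        ORDER BY day
        """,
        (customer_id,),
    )
    return cursor.fetchall()


conn = build_demo_db()
for day, total_kwh in daily_usage(conn, 1):
    print(day, total_kwh)
```

Weekly and monthly views would follow the same pattern with a different grouping expression. Whether a layout like this suits the scenario is part of what your justification should address.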

The assignment is worth 6 points total: 3 points for each scenario (1 point for making the correct determination, 2 points for the quality of the justification provided).

Your answers will be assessed on the considerations you bring to bear in justifying your decisions, how clearly you identify the types of data that are needed in these scenarios, and how much detail you give about the applications built on this data.