Organizations use data warehouses to liberate their information assets: information that is otherwise trapped across a myriad of disparate operational systems. They use data warehousing architectures to better understand their customers, improve marketing campaigns, enhance customer service, mitigate risk, and so on.
Controversy is the operative word when it comes to selecting a data warehousing architecture. At times, it almost sounds like a religious debate as professionals argue about how to build, manage, and use data warehouses and data marts.
So, with all due respect, here is my version of the truth. Having designed a few multi-terabyte, multi-billion-row data warehouses in my day, I’m passionate about this.
This is a warehouse (this picture is worth a thousand words). More specifically, it is a Walmart warehouse located in the middle of nowhere on Interstate 15 between Las Vegas and Salt Lake City.
Warehouses are about strategic distribution. They are engineered to support three primary functions: (1) a receiving function; (2) a staging function; and (3) a distribution function. Ideally, warehouses are strategically located, i.e., built in areas where expansion is economical and convenient, close to efficient distribution channels (think highways or railways). Warehouses are designed to support ever-changing inventory requirements (e.g., from pet rocks to tandem bicycles). Their inventory is organized to maximize efficiency at scale (e.g., pallets and forklifts). And they are appropriately secured (e.g., protected by a fenced perimeter and a guardhouse that controls the arrival and departure of product).
Marts (picture this), on the other hand, are located and engineered to serve users. They are conveniently located and readily accessible (e.g., on-site parking). Content is highly predictable: consumers know which marts have what product. Inventory is organized in the manner best suited to the products offered and to customer expectations. This is why Starbucks, Kroger and Payless shoe stores all have unique and highly specific inventory models (picture this). Product is often presented in a manner designed specifically to drive consumption, and is frequently arranged to guide consumers toward higher-margin products. Marts are secured according to the value of their content, which is why pharmacies are secured differently from 7-Elevens.
Structure governs function. Production facilities, warehouses, and marts have different purposes. Therefore, each will structure inventory appropriately.
Question: When is a data warehouse not a warehouse?
Answer: When consumers are found running around the warehouse looking for size 10 shoes.
In my view, many data warehouses are really marketing-oriented data marts because they are engineered solely to serve a specific user mission (not strategic distribution). This is not to say these systems aren’t valuable. I just would not call them warehouses.
Let me suggest building a warehouse only when a large number of distribution end points (marts) are envisioned in the future. Otherwise, a more efficient use of resources is to build a few specific marts without building a warehouse at all. Walmart would not have built that warehouse in the middle of nowhere without anticipating a myriad of storefronts (marts).
My “Two Cents” Technical Note: Star schemas don’t belong in the warehouse, but are well suited for certain types of data marts. And, if you are looking for a near real-time data warehouse, start by thinking about OLTP-like schemas (e.g., 3NF). The schemas used by data warehouses and data marts are critical to achieving scalability and sustainability.
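To make the schema distinction concrete, here is a minimal sketch using Python's built-in sqlite3 module. Everything in it is hypothetical and chosen only for illustration: the table names (customer, product, orders, order_line, dim_customer, dim_product, fact_sales), the columns, and the sample rows are my assumptions, not a reference design. The warehouse layer is kept in a normalized, OLTP-like (3NF-style) form, and a small star-schema mart is then derived from it for one specific user mission.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Warehouse layer (hypothetical): normalized, 3NF-style tables.
# Each fact is stored once, which keeps loading and updating simple.
cur.executescript("""
CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT, region TEXT);
CREATE TABLE product  (product_id  INTEGER PRIMARY KEY, name TEXT, category TEXT);
CREATE TABLE orders   (order_id    INTEGER PRIMARY KEY,
                       customer_id INTEGER REFERENCES customer(customer_id),
                       order_date  TEXT);
CREATE TABLE order_line (order_id   INTEGER REFERENCES orders(order_id),
                         product_id INTEGER REFERENCES product(product_id),
                         quantity   INTEGER,
                         unit_price REAL,
                         PRIMARY KEY (order_id, product_id));
""")

# Sample rows, invented for this sketch.
cur.executemany("INSERT INTO customer VALUES (?, ?, ?)",
                [(1, "Acme", "West"), (2, "Globex", "East")])
cur.executemany("INSERT INTO product VALUES (?, ?, ?)",
                [(10, "Pet rock", "Novelty"), (20, "Tandem bicycle", "Sporting")])
cur.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                [(100, 1, "2024-01-05"), (101, 2, "2024-01-06")])
cur.executemany("INSERT INTO order_line VALUES (?, ?, ?, ?)",
                [(100, 10, 3, 5.0), (100, 20, 1, 400.0), (101, 10, 2, 5.0)])

# Mart layer (hypothetical): a star schema built FROM the warehouse.
# One fact table surrounded by dimension tables, shaped for one kind of question.
cur.executescript("""
CREATE TABLE dim_customer AS SELECT customer_id, name, region FROM customer;
CREATE TABLE dim_product  AS SELECT product_id, name, category FROM product;
CREATE TABLE fact_sales AS
    SELECT o.order_date,
           o.customer_id,
           l.product_id,
           l.quantity,
           l.quantity * l.unit_price AS revenue
    FROM orders o
    JOIN order_line l ON l.order_id = o.order_id;
""")

# A typical mart query: one join from the fact table to a dimension.
revenue_by_region = dict(cur.execute("""
    SELECT c.region, SUM(f.revenue)
    FROM fact_sales f
    JOIN dim_customer c ON c.customer_id = f.customer_id
    GROUP BY c.region
""").fetchall())
```

In this sketch the normalized tables play the role of the warehouse (receive and stage once, distribute to many), while the star schema is the storefront: it is cheap to rebuild from the warehouse and is shaped for one audience's questions, such as revenue by region.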