SQL and Data Warehousing

by Vickram H 2012-07-31 10:11:00

For several years, the effort to make SQL a viable technology for OLTP applications
shifted the focus away from the original relational database strengths of query processing
and decision making. Performance benchmarks and competition among the major DBMS brands focused on simple transactions like adding a new order to the database or
determining a customer's account balance. Because of the power of the relational database model, the databases that companiesused to handle daily business operations
could also be used to analyze the growing amounts of data that were being accumulated.
A frequent theme of conferences and trade show speeches for IS managers was that a corporation's accumulated data (stored in SQL databases, of course) should be treated as a valuable "asset" and used to help improve the quality of business decision-making.
Although relational databases could, in theory, easily perform both OLTP and decision-making applications, there were some verysignificant practical problems. OLTP
workloads consisted of many short database transactions, and the response time for
users was very important. In contrast, decision-support queries could involve sequential scans of large database tables to answer questions like "What is the average order size
by sales region?" or "How do inventory trends compare with the same time a year ago?"
These queries could take minutes or hours. If a business analyst tried to run one of these queries during a time when business transaction volumes reached their peak, it could
cause serious degradation in OLTP performance. Another problem was that the data to answer useful questions about business trends was often spread across many different
databases, typically involving different DBMS vendors and different computer platforms.
The desire to take advantage of accumulated business data, and the practical
performance problems it caused for OLTP applications, led to a new database trend
called "data warehousing." The idea of the data warehouse is shown in Figure 3-6.
Business data is extracted from OLTP systems, reformatted and validated as necessary,
and then placed into a separate database that is dedicated to decision-making queries
(the "warehouse"). The data extraction and transformation can be scheduled for off-hours
batch processing. Ideally, only new or changed data can be extracted, minimizing the
amount of data to be processed in the monthly, weekly, or daily warehouse "refresh"
cycle. With this scheme, the time-consuming business analysis queries use the data
warehouse, not the OLTP database, as their source of data.
SQL-based relational databases were a clear choice for the warehouse data store
because of their flexible query processing. A series of new companies was formed to
build the data extraction, transformation,and database query tools needed by the data
warehouse model. In addition, DBMS vendors started to focus on the kinds of database
queries that customers tended to run in the data warehouse. These queries tended to be
large and complex—such as analyzing tens or hundreds of millions of individual cash-register receipts to look for product purchase patterns. They often involved time-series
data—for example, analyzing product sales or market share data over time. They also
tended to involve statistical summaries of data—total sales, average order volume,
percent growth, and so on—rather than the individual data items themselves.

Tagged in:

770
like
0
dislike
0
mail
flag

You must LOGIN to add comments