Data Warehouse Architecture
Data warehouse architecture covers the structure of the warehouse, its features, and its functions. There are three main features of data warehouse architecture:
- data architecture
- infrastructure
- technical area
Data architecture revolves around business processes. Every business has common processes such as inventory, billing, payment, and shipping, and the data architecture must model them because they depend on one another: without buyers, there can be no orders; without consumer demand, there can be no billing or shipping; and without paying customers, there can be no paid employees.
The next feature of data warehouse architecture is infrastructure. If you run only a single application, your infrastructure need not be as robust as that of someone who builds a complex data network. However, if you intend to use your infrastructure for complex data over long periods of time, you may need to research your requirements or run tests to confirm that the data system performs well. Infrastructure matters because data warehouses are created not only to store pertinent data but also to distribute it to users.
The third feature, the technical area, is where the data is converted, cleansed (if need be), and processed before it is output. This involves staging and integration. The data staging process has five steps:
- Extract
- Transform
- Load
- Security
- Job Control
In extraction, the needed data is separated from unwanted data and combined. In transformation, the data is converted into the required form, integrated with other operations in the system, and joined to additional data. In loading, the data is prepared to go online for user access.
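The extract, transform, and load steps described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a production pipeline: the record fields (`order_id`, `amount`, `region`), the function names, and the in-memory list standing in for a warehouse table are all hypothetical.

```python
# Hypothetical raw records from a source system; field names are illustrative.
RAW_ORDERS = [
    {"order_id": "1", "amount": "19.99", "region": " east "},
    {"order_id": "2", "amount": "", "region": "west"},  # missing amount: unwanted
    {"order_id": "3", "amount": "5.00", "region": "EAST"},
]


def extract(rows):
    """Keep only the rows that carry the fields the warehouse needs."""
    return [r for r in rows if r.get("order_id") and r.get("amount")]


def transform(rows):
    """Convert each field into the form the warehouse expects."""
    return [
        {
            "order_id": int(r["order_id"]),
            "amount_cents": round(float(r["amount"]) * 100),
            "region": r["region"].strip().lower(),
        }
        for r in rows
    ]


def load(rows, warehouse_table):
    """Append the cleaned rows to the target table (a list in this sketch)."""
    warehouse_table.extend(rows)
    return len(rows)


def run_etl(raw_rows):
    """Run the three steps in order and return the loaded table."""
    table = []
    load(transform(extract(raw_rows)), table)
    return table
```

Calling `run_etl(RAW_ORDERS)` drops the record with the missing amount and returns the two remaining records in cleaned form. Keeping each step as its own function makes it possible to test or replace one step without touching the others.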
Security is where the necessary measures are established to prevent fraud and hacking. One good security measure is restricted access: only administrators are granted access to certain files. Data encryption policies also help prevent fraud and hacking on your network. Last but not least comes job control, which schedules when staging jobs run (job scheduling), logs each run so that times and schedules can be tracked, and grants select individuals data access in case of an emergency. These data warehouse concepts are important to know.
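Job control can be illustrated with a small runner that executes staging jobs in a fixed order and logs each outcome. This is only a sketch: the function name `run_jobs`, the job names, and the stop-on-first-failure policy are assumptions chosen for the example, and a real warehouse would normally rely on a dedicated scheduler.

```python
import logging

logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(levelname)s %(message)s",
)
log = logging.getLogger("job_control")


def run_jobs(jobs):
    """Run (name, callable) pairs in order and record the outcome of each.

    Stops at the first failure so later jobs never run on incomplete data.
    """
    results = {}
    for name, job in jobs:
        log.info("starting %s", name)
        try:
            job()
        except Exception:
            # log.exception records the traceback along with the message.
            log.exception("%s failed; skipping remaining jobs", name)
            results[name] = "failed"
            break
        results[name] = "ok"
        log.info("finished %s", name)
    return results
```

Each log line carries a timestamp, so the log doubles as a record of when each job started and finished.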
Businesses drafting a data warehouse architecture often adopt the Zachman model. Named after John Zachman, it was developed while Zachman worked at IBM in the 1980s. The model is usually drawn as a 6 x 6 matrix: the columns pose six communication questions (who, what, when, where, how, and why), and the six rows give successive levels of transformation, from contextual and conceptual through logical, physical, and detailed to the functioning enterprise. Businesses that do not need this much detail could adopt a simpler model.
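The 6 x 6 matrix can be represented as a simple lookup keyed by row and column. The sketch below is illustrative only: the row labels follow the commonly cited form of the framework, and the function name `empty_grid` and the sample entry are hypothetical.

```python
from itertools import product

# Column questions, as listed in the text.
QUESTIONS = ["who", "what", "when", "where", "how", "why"]

# Row perspectives, top to bottom, as commonly cited for the framework.
PERSPECTIVES = [
    "contextual",
    "conceptual",
    "logical",
    "physical",
    "detailed",
    "functioning enterprise",
]


def empty_grid():
    """Return a dict keyed by (perspective, question) with no artifact assigned."""
    return {cell: None for cell in product(PERSPECTIVES, QUESTIONS)}


grid = empty_grid()
# Hypothetical entry: the logical-level answer to "what" is a data model.
grid[("logical", "what")] = "logical data model"
```

A grid like this makes gaps visible: any cell still set to `None` is a question that has not yet been answered at that level.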
A data warehouse architecture diagram is a detailed drawing of the business processes of a company. There are five main data warehouse architecture types (and thus five basic diagrams): independent data marts, data mart bus architecture, hub-and-spoke, centralized data warehouse, and federated architecture.
The independent data mart architecture consists of a few basic units:
- source systems
- staging area
- independent data marts
The source systems, where the data originates, feed the staging area, where the information is collected, sorted, combined, and then released in small units to users in specific applications.
Data mart bus architecture consists of a few basic units: source systems, staging area, dimensionalized data marts, and end-user access and applications. In data mart bus architecture, the dimensionalized processes are basic business processes such as orders, billing, and payment. The information for all these processes must be sorted, the unnecessary data discarded, and the newly separated data sent to the user.
The hub-and-spoke model contains many of the units above but adds a relational warehouse and dependent data marts. The centralized data warehouse is similar to the hub-and-spoke model, minus the dependent data marts. The federated architecture is one of the most basic of the five architecture types, with only three units: (1) existing data warehouses, data marts, and legacy systems, (2) logical integration of common data elements, and (3) end-user access and applications. Nevertheless, its shared user field is global, so its information is globally accessible. Of these five types of data warehouse design, only three are used frequently: (1) data mart bus architecture, (2) hub-and-spoke, and (3) centralized data warehouse.
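The relationship between the relational warehouse and its dependent data marts in the hub-and-spoke model can be sketched with Python's built-in `sqlite3` module. Everything specific here is an assumption made for illustration: the table `warehouse_orders`, its columns, the `mart_` naming scheme, and the helper `build_dependent_mart`. The point is only that a dependent mart is filled from the central warehouse rather than directly from the source systems.

```python
import sqlite3

# The hub: one central relational warehouse table (in memory for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE warehouse_orders (order_id INTEGER, department TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO warehouse_orders VALUES (?, ?, ?)",
    [(1, "billing", 120.0), (2, "shipping", 35.5), (3, "billing", 80.0)],
)


def build_dependent_mart(conn, department):
    """Create a spoke: a mart table populated only from the central warehouse."""
    # Table names cannot be bound as SQL parameters, so validate before formatting.
    if not department.isidentifier():
        raise ValueError(f"invalid department name: {department!r}")
    mart = f"mart_{department}"
    conn.execute(f"DROP TABLE IF EXISTS {mart}")
    conn.execute(f"CREATE TABLE {mart} (order_id INTEGER, amount REAL)")
    conn.execute(
        f"INSERT INTO {mart} "
        "SELECT order_id, amount FROM warehouse_orders WHERE department = ?",
        (department,),
    )
    return mart
```

In this sketch, dropping the call to `build_dependent_mart` and querying `warehouse_orders` directly corresponds to the centralized data warehouse, which has the hub but no dependent marts.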
What are some vital data warehouse interview questions? Typical questions come from areas such as business intelligence (BI) and cover physical queries, flat aggregation, contemporary business intelligence tools, business content, and loading data after installation.
Data mining architecture describes how a data mining system draws and sorts data from its data sources. There are four types, distinguished by how closely the mining system is coupled to the database or data warehouse: (1) No coupling, where all the information is stored and processed in flat files. (2) Loose coupling, where the mining system relies on the database or warehouse for some data gathering and sorting. (3) Semi-tight coupling, where the database or warehouse also provides sorting, indexing, aggregation, and some small computations of statistical functions. (4) Tight coupling, where the mining system is fully integrated with the database or warehouse, so information is processed smoothly and uniformly.
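The difference between the coupling types is easiest to see by computing the same aggregate in two ways. The sketch below contrasts the no-coupling case, where the application parses a flat file and aggregates on its own, with a case where the database performs the grouping and summing, which is the kind of work semi-tight coupling delegates to it. The sample data, the `sales` table, and all function names are assumptions made for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical flat-file contents; a real system would read this from disk.
FLAT_FILE = "region,amount\neast,10\nwest,5\neast,7\n"


def total_by_region_flat(text):
    """No coupling: the mining code parses a flat file and aggregates by itself."""
    totals = {}
    for row in csv.DictReader(io.StringIO(text)):
        totals[row["region"]] = totals.get(row["region"], 0.0) + float(row["amount"])
    return totals


def make_db(text):
    """Load the same records into an in-memory database table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
    rows = [(r["region"], float(r["amount"])) for r in csv.DictReader(io.StringIO(text))]
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    return conn


def total_by_region_db(conn):
    """Closer coupling: the database does the grouping and aggregation."""
    cur = conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    )
    return dict(cur.fetchall())
```

Both functions return the same totals for the same data. What changes is where the work happens, which matters as data volumes grow and the database's indexing and aggregation become harder to match in application code.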
The structure of a data warehouse architecture affects how widely it is used: the more complex systems (hub-and-spoke and the centralized data warehouse) see more use because they can handle numerous functions and applications. Businesses are complex, and thriving businesses become even more complex. Data systems that are not built to handle the complex information needs of a business (such as the no-coupling data mining structure) will not last very long.