Data Partitioning is the formal process of determining which data subjects, data occurrence groups, and data characteristics are needed at each data site. It is an orderly process for allocating data to data sites that is done within the same common data architecture.
Data partitioning is also the process of logically and/or physically dividing data into segments that are easier to maintain or access. Current relational database management systems (RDBMS) provide this kind of distribution functionality. Partitioning data helps both performance and utility processing.
Data partitioning can be of great help in managing a highly available relational data warehouse efficiently and effectively. It can also be a complex process, however: several factors affect partitioning strategies as well as design, implementation, and management considerations in a data warehousing environment.
A data warehouse powered by a relational database management system can provide a comprehensive source of data and an infrastructure for building Business Intelligence (BI) solutions. Typically, implementing a relational data warehouse involves creating and managing dimension tables and fact tables. A dimension table is usually smaller than a fact table, and it provides details about the attributes used to describe or explain business facts. Some examples of a dimension include item, store, and time. A fact table, on the other hand, records business events, such as item sales information for all the stores. All fact tables need to be updated periodically with the most recently collected data from the various data sources.
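The relationship between dimension and fact tables can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module and an in-memory database purely for convenience; the table names (dim_item, dim_store, fact_sales), the column names, and the sample rows are all hypothetical and are not taken from any particular warehouse.

```python
import sqlite3

# In-memory database; every table, column, and row below is illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_item  (item_id  INTEGER PRIMARY KEY, item_name TEXT);
CREATE TABLE dim_store (store_id INTEGER PRIMARY KEY, city      TEXT);
-- The fact table records one business event (a sale) per row and
-- points at the dimensions that describe it.
CREATE TABLE fact_sales (
    item_id   INTEGER REFERENCES dim_item(item_id),
    store_id  INTEGER REFERENCES dim_store(store_id),
    sale_date TEXT,
    amount    REAL
);
""")

conn.executemany("INSERT INTO dim_item VALUES (?, ?)",
                 [(1, "pen"), (2, "notebook")])
conn.executemany("INSERT INTO dim_store VALUES (?, ?)",
                 [(10, "Lyon"), (20, "Oslo")])
conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)", [
    (1, 10, "2024-01-05", 3.0),
    (2, 10, "2024-01-06", 12.5),
    (1, 20, "2024-02-01", 4.5),
])

# Facts are aggregated; the dimension supplies the descriptive attribute.
sales_by_item = conn.execute("""
    SELECT i.item_name, SUM(f.amount)
    FROM fact_sales AS f
    JOIN dim_item AS i ON i.item_id = f.item_id
    GROUP BY i.item_name
    ORDER BY i.item_name
""").fetchall()
print(sales_by_item)
```

In this sketch the fact table is the one that grows with every load, which is why it is the usual candidate for partitioning.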
Because data warehouses must handle high volumes of regularly updated data, careful long-term planning is beneficial. Factors to consider in long-term planning of a data warehouse include data volume, data loading window, index maintenance window, workload characteristics, data aging strategy, archive and backup strategy, and hardware characteristics.
There are two approaches to implementing a relational data warehouse: the monolithic approach and the partitioned approach. The monolithic approach may involve huge fact tables, which can be difficult to manage.
There are many benefits to implementing a relational data warehouse using the data partitioning approach. The single biggest benefit is easy yet efficient maintenance. As an organization grows, so does the data in its database, and it becomes essential to keep critical data highly available while fitting maintenance into a small window. Data partitioning can meet the need for a small database maintenance window even in a very large business organization. The difficulties of supporting large tables can be addressed by having the database decompose them into smaller partitions, which are easier to manage. Data partitioning also results in faster data loading, easier monitoring of aging data, and more efficient data retrieval.
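The maintenance benefit can be sketched in a few lines of Python. This is a toy in-memory model, not how any database engine stores partitions: the class name PartitionedFactTable, the "YYYY-MM" period keys, and the retention count are assumptions made for illustration. The point it shows is that loading new data and aging out old data each touch one partition only.

```python
class PartitionedFactTable:
    """Toy model: fact rows grouped into partitions keyed by period."""

    def __init__(self, retained_partitions):
        # How many periods to keep online (hypothetical retention policy).
        self.retained_partitions = retained_partitions
        self.partitions = {}  # period, e.g. "2024-01" -> list of (item, amount)

    def load(self, period, rows):
        """Append rows to one partition, then age out the oldest partitions.

        Returns the list of periods that were dropped.
        """
        self.partitions.setdefault(period, []).extend(rows)
        aged_out = []
        while len(self.partitions) > self.retained_partitions:
            oldest = min(self.partitions)  # "YYYY-MM" keys sort chronologically
            del self.partitions[oldest]    # drops a whole partition at once
            aged_out.append(oldest)
        return aged_out

    def total(self, period):
        """Sum amounts for one period by reading only that partition."""
        return sum(amount for _, amount in self.partitions.get(period, []))


sales = PartitionedFactTable(retained_partitions=2)
sales.load("2024-01", [("item-1", 10.0), ("item-2", 5.0)])
sales.load("2024-02", [("item-1", 7.5)])
dropped = sales.load("2024-03", [("item-3", 2.0)])
print(dropped, sorted(sales.partitions), sales.total("2024-02"))
```

Aging out "2024-01" here removes a single dictionary entry instead of scanning and deleting rows across the whole table, which is the same idea behind dropping or switching out a partition in a real warehouse.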
Data partitioning in a relational data warehouse can be implemented by partitioning objects such as base tables, clustered and non-clustered indexes, and indexed views. Range partitions are table partitions defined by a customizable range of data. The end user or database administrator defines a partition function with boundary values, a partition scheme that maps partitions to file groups, and the tables that are mapped to the partition scheme.
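How boundary values and a partition scheme fit together can be sketched as follows. This is a conceptual model in Python rather than any vendor's syntax: the function make_partition_function, the flag boundary_on_right, the boundary dates, and the file group names are all hypothetical. It assumes that N boundary values produce N + 1 partitions and that the flag decides which side a boundary value itself falls on.

```python
from bisect import bisect_left, bisect_right
from datetime import date


def make_partition_function(boundaries, boundary_on_right=True):
    """Return a function mapping a key to a 1-based partition number.

    With boundary_on_right=True a boundary value belongs to the partition
    on its right; otherwise it belongs to the partition on its left.
    """
    ordered = sorted(boundaries)
    locate = bisect_right if boundary_on_right else bisect_left

    def partition_number(key):
        return locate(ordered, key) + 1

    return partition_number


# Three hypothetical boundary values give four partitions.
boundaries = [date(2023, 1, 1), date(2024, 1, 1), date(2025, 1, 1)]
partition_of = make_partition_function(boundaries)

# Partition scheme: partition number -> file group (names are illustrative).
partition_scheme = {1: "fg_archive", 2: "fg_2023", 3: "fg_2024", 4: "fg_current"}


def file_group_for(sale_date):
    """Resolve the file group that would hold a row with this date."""
    return partition_scheme[partition_of(sale_date)]


print(file_group_for(date(2022, 6, 1)))
print(file_group_for(date(2023, 1, 1)))
print(file_group_for(date(2025, 3, 9)))
```

A table mapped to this scheme would route each row through the partition function on its partitioning column, so a query restricted to one date range only needs to read the file groups for the matching partitions.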
Data partitioning can be implemented in many ways. Implementation methods vary depending on the database software vendor or developer, and so does the management of the partitioned data. Regardless of the software used, however, separating data into partitions continues to benefit data warehouses, which have become a standard requirement for large companies that want to operate efficiently.