Automatic data partitioning is the process of breaking down large chunks of data and metadata at a specific data site into partitions according to the request specification of the client.
Data sites contain large amounts of varied data, which can serve as a statistical basis for identifying many business trends. Because the data at a data site can grow very quickly, the demand for network traffic also increases, so software with partitioning capability should be used to manage the data warehouse. Many data-handling applications also offer advanced functions such as traffic shaping and policing so that sufficient bandwidth can be maintained.
Relational database management systems (RDBMS) are well suited to managing data sites. Such a system follows the relational model introduced by E. F. Codd, in which data is stored in tables and the relationships among the data are also stored in tables. This is in contrast to flat files, where all data is stored in one contiguous area.
Since RDBMS data is not stored in one contiguous area but is instead broken down into tables, it is easy to partition the data, whether manually or automatically, for sharing and distribution.
The biggest advantage of data partitioning is that large tables and indexes can be divided into smaller parts. As a result, system performance can improve greatly, contention is reduced, and data availability and distribution are increased. Automatic data partitioning also makes the database administrator's job much easier, especially for labor-intensive tasks such as making backups, loading data, recovering data, and processing queries.
Data partitioning is commonly done either by splitting selected elements or by creating smaller, separate databases, each containing the basic components such as tables, indexes, and transaction logs.
Horizontal partitioning is a technique in which different rows are placed into different tables. For example, customers with zip codes less than 25000 are placed in a table called CustomerEast, while those with zip codes of 25000 or greater are placed in a table called CustomerWest. To present a complete list of records, the database uses a view that combines both tables with a union.
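The zip code example can be sketched with Python's built-in sqlite3 module. This is only an illustration under stated assumptions: the east table is named CustomerEast here, the view name AllCustomers and the helper insert_customer are hypothetical, and a zip code of exactly 25000 is assigned to the west table.

```python
import sqlite3

# In-memory database used only for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CustomerEast (
    id INTEGER PRIMARY KEY, name TEXT, zip INTEGER CHECK (zip < 25000)
);
CREATE TABLE CustomerWest (
    id INTEGER PRIMARY KEY, name TEXT, zip INTEGER CHECK (zip >= 25000)
);
-- The view presents both partitions as one logical table.
CREATE VIEW AllCustomers AS
    SELECT id, name, zip FROM CustomerEast
    UNION ALL
    SELECT id, name, zip FROM CustomerWest;
""")


def insert_customer(conn, cust_id, name, zip_code):
    """Route a row to the partition that matches its zip code (hypothetical helper)."""
    table = "CustomerEast" if zip_code < 25000 else "CustomerWest"
    conn.execute(
        f"INSERT INTO {table} (id, name, zip) VALUES (?, ?, ?)",
        (cust_id, name, zip_code),
    )


insert_customer(conn, 1, "Ada", 10001)
insert_customer(conn, 2, "Bo", 94105)

# Reading through the view returns rows from both partitions.
all_rows = conn.execute(
    "SELECT id, name, zip FROM AllCustomers ORDER BY id"
).fetchall()
```

The CHECK constraints keep each table limited to its own zip code range, so a row cannot be inserted into the wrong partition by mistake.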
Vertical partitioning is another technique, in which a table is created with fewer columns and additional separate tables store the remaining columns. Usually, the resulting tables are placed on different physical storage.
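A minimal sketch of vertical partitioning, again using sqlite3. The table names CustomerCore and CustomerDetail and their columns are assumptions made for illustration. For simplicity both tables live in the same in-memory database rather than on separate physical storage.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Narrow table with the columns that are read most often.
CREATE TABLE CustomerCore (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
-- Separate table holding the remaining columns, keyed by the same id.
CREATE TABLE CustomerDetail (
    id INTEGER PRIMARY KEY REFERENCES CustomerCore(id),
    notes TEXT,
    photo BLOB
);
""")

conn.execute("INSERT INTO CustomerCore (id, name) VALUES (1, 'Ada')")
conn.execute(
    "INSERT INTO CustomerDetail (id, notes, photo) VALUES (1, 'prefers email', NULL)"
)

# Common queries touch only the narrow table.
core_row = conn.execute(
    "SELECT id, name FROM CustomerCore WHERE id = 1"
).fetchone()

# The full record is rebuilt with a join on the shared key when needed.
full_row = conn.execute("""
    SELECT c.id, c.name, d.notes
    FROM CustomerCore AS c
    JOIN CustomerDetail AS d ON d.id = c.id
    WHERE c.id = 1
""").fetchone()
```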
Data partitioning is used in distributed database management systems, which are software systems that allow the management of a distributed database. A distributed database is a collection of multiple databases that are logically interrelated and distributed over many computers in a network. This allows certain clients to view only the data they specify, while other users see all the data as a single, unpartitioned whole.
Most of today’s popular relational database management systems offer different criteria for partitioning data. What they have in common is that they take a partition key and assign a partition based on some criterion.
Some of the partitioning methods used as criteria include range partitioning, list partitioning, hash partitioning and composite partitioning.
In range partitioning, the database system selects a partition if the partitioning key falls within a given range. For example, a partition could include all the rows where a zip code column has values between 60000 and 69999.
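As a rough sketch, range selection can be expressed as a lookup over sorted lower bounds. Only the 60000 to 69999 range comes from the example above. The other two partitions, all partition names, and the function range_partition are hypothetical.

```python
from bisect import bisect_right

# Hypothetical layout: each partition starts at its lower bound and ends
# just before the next bound. Only the 60000-69999 range is from the text.
LOWER_BOUNDS = [0, 60000, 70000]
PARTITION_NAMES = ["p_below_60000", "p_60000_69999", "p_70000_up"]


def range_partition(zip_code: int) -> str:
    """Return the name of the partition whose range contains zip_code."""
    if zip_code < LOWER_BOUNDS[0]:
        raise ValueError(f"zip code {zip_code} is below every partition range")
    # bisect_right counts the bounds that are <= zip_code; the last of
    # those bounds is the start of the matching partition.
    index = bisect_right(LOWER_BOUNDS, zip_code) - 1
    return PARTITION_NAMES[index]
```

For instance, range_partition(60614) selects p_60000_69999, while range_partition(70000) falls into the next partition.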
List partitioning is a method in which a partition is assigned a specific list of values, such as all the countries in Southeast Asia. A row belongs to that partition if its partitioning key matches one of the listed values.
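List partitioning amounts to a lookup from key value to partition. In the sketch below, the Southeast Asia partition follows the example above, while the second partition, the fallback partition p_other, and all names are assumptions added for illustration.

```python
# Each partition owns an explicit list of key values.
# The Nordic partition and all partition names are hypothetical.
PARTITION_VALUES = {
    "p_southeast_asia": {
        "Brunei", "Cambodia", "Indonesia", "Laos", "Malaysia", "Myanmar",
        "Philippines", "Singapore", "Thailand", "Timor-Leste", "Vietnam",
    },
    "p_nordic": {"Denmark", "Finland", "Iceland", "Norway", "Sweden"},
}

# Invert the mapping once so each lookup is a single dictionary access.
VALUE_TO_PARTITION = {
    value: name
    for name, values in PARTITION_VALUES.items()
    for value in values
}


def list_partition(country: str, default: str = "p_other") -> str:
    """Return the partition that lists this country, or a fallback partition."""
    return VALUE_TO_PARTITION.get(country, default)
```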
Hash partitioning selects a partition using the value returned by a hash function. For instance, if there are four partitions, the function could return a value from 0 to 3.
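A minimal sketch of hash partitioning, assuming four partitions so that results fall between 0 and 3. CRC32 is used here only because it is stable across runs (Python's built-in hash() is randomized per process for strings). Real database systems use their own hash functions, and the function name hash_partition is hypothetical.

```python
import zlib

NUM_PARTITIONS = 4  # assumed partition count, giving results 0 to 3


def hash_partition(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a partitioning key to a partition number in [0, num_partitions)."""
    # CRC32 returns an unsigned 32-bit integer; the modulo folds it into
    # the available partition numbers.
    return zlib.crc32(key.encode("utf-8")) % num_partitions
```

The same key always maps to the same partition, and distinct keys tend to spread evenly across partitions without any range or list having to be maintained.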
Composite partitioning uses a combination of the partitioning methods mentioned above.
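One possible combination, shown here as a sketch, is to partition by range first and then split each range by hash. The choice of range-then-hash, the partition names, the sub-partition count, and the function composite_partition are all assumptions made for illustration.

```python
import zlib


def composite_partition(zip_code: int, customer_id: str,
                        num_subpartitions: int = 4) -> tuple:
    """Return (range partition name, hash sub-partition number) for a row."""
    # Step 1: range partitioning on the zip code (hypothetical ranges).
    if 60000 <= zip_code <= 69999:
        range_part = "p_60000_69999"
    else:
        range_part = "p_other"
    # Step 2: hash partitioning on the customer id within that range.
    sub_part = zlib.crc32(customer_id.encode("utf-8")) % num_subpartitions
    return (range_part, sub_part)
```

A row is then stored in the sub-partition identified by both parts of the result, so rows in one zip code range are spread over several smaller units.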