B-Tree is an indexing technique most commonly used in databases and file systems where pointers to data are placed in a balance tree structure so that all references to any data can be accessed in an equal time frame. It is also a tree data structure which keeps data sorted so that searching, inserting and deleting can be done in logarithmic amortized time.
Compared to a real tree, the B-Tree has roots at the top and leaves and nodes at the bottom.
The B-Tree belongs to a group of techniques in computer science known as self-balancing search trees which attempts to automatically keep the number of levels of nodes under the root small at all times. It is the most preferred way to implement sets, associative arrays and other data structures that are used in computer programming languages, relational database management systems and low level data manipulations.
In the B-Tree, the records are stored in the leaves. This is the location where there is nothing more beyond it. The order of the tree determines the maximum number of children per node. Depth refers to the number of required disk access. A B-Tree can have up to millions and billions of records although it is not all the time that leaves necessarily contain a record but more than half certainly do.
When decision points on the tree, which are called nodes, are on the hard disk instead of on random access memory (RAM), B-Tree is the preferred technique as hard disks could work a thousand times slower compared to RAM because processes on hard disks requires mechanical parts. On RAM, processes are done purely in electronic media.
The nodes in a B-Tree can have a variable number of child nodes within a range pre-defined by the system. When a data is inserted or removed from a node, the number of child nodes also changes but the pre-defined ranged should be maintained so internal nodes may either be split or joined.
B-trees do not need frequent re-balancing as both the upper and lower bounds on the number of child nodes are typically fixed. As an example, a 2-3 B-Tree implementation has internal nodes that can only have 2 or 3 child nodes.
To keep the B-tree well balanced, all leaf nodes are required to be of the same depth. The depth only increases very slowly and infrequently.
Searching in a B-Tree structure starts from the root and traversed from top to bottom. Insertion is done by looking for a node where a new leaf or element should be. If there is still room or the maximum legal number of elements is not exceeded, insertion takes place. Otherwise, the leaf node splits into another tow nodes. Deletion in a B-Tree has two strategies. The first involves locating the item to be deleted and immediately doing the action then restructuring the tree. The second involves doing a traversing down the tree and laying out the restructure before deleting.
In a file system, a file may contain any number of B-Trees and each B-Tree must have a unique name composed of any string of characters. Each B-Tree names is saved in the file an item containing the number of the rood node of the B-Tree. Searching, inserting and deleting through the B-Tree starts from the root node.
The B-Tree was created by Rudolf Bayer and Ed McCreight. The B in the B-Tree has ambiguous meaning. Some say it stands for Bayer while others say it stands for Balanced and still there are other who consider it to stand for Boeing where both men were working for Boeing Scientific Research Labs at the time they created the B-tree.