Data Thesaurus

A data thesaurus deals with understanding patterns, trends, and relationships in historical data, and with providing visual information to the decision maker. It helps to identify common business terms and data names, and it is useful for locating data in a metadata warehouse.

A data thesaurus consists of several kinds of metadata. Metadata is any data that describes other data. The literal meaning of thesaurus, according to dictionary.com, is "an index to information stored in a computer, consisting of a comprehensive list of subjects concerning which information may be retrieved by using the proper key terms."

The data thesaurus, as part of the whole data warehouse system, is implemented in line with the business rules and the enterprise data architecture. Its terms are all business terms, because they are chosen and assigned as subject keywords for data warehouse queries and their results.
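To make the idea of "locating data" concrete, here is a minimal sketch of a data thesaurus used as a keyword index over warehouse metadata. It assumes each business term maps to a list of fully qualified data names; the class and function names (`ThesaurusEntry`, `build_index`, `locate`) and all table and column names are hypothetical, chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ThesaurusEntry:
    """One business term and the data names (metadata) it points to."""
    business_term: str
    data_names: List[str] = field(default_factory=list)


def build_index(entries: List[ThesaurusEntry]) -> Dict[str, ThesaurusEntry]:
    """Index entries by lower-cased business term for case-insensitive lookup."""
    return {entry.business_term.lower(): entry for entry in entries}


def locate(index: Dict[str, ThesaurusEntry], keyword: str) -> List[str]:
    """Return the data names registered for a business keyword, or [] if unknown."""
    entry = index.get(keyword.strip().lower())
    return list(entry.data_names) if entry is not None else []


# Hypothetical entries: the table and column names are invented for illustration.
entries = [
    ThesaurusEntry("Customer", ["crm.cust_master.cust_id", "dw.dim_customer.customer_key"]),
    ThesaurusEntry("Revenue", ["dw.fact_sales.net_amount"]),
]
index = build_index(entries)
print(locate(index, "revenue"))  # ['dw.fact_sales.net_amount']
print(locate(index, "Profit"))   # []
```

A query tool could run the user's business keyword through `locate` before building SQL, so that users search by business vocabulary rather than by physical data names.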

Basically, a data thesaurus contains preferred terms, non-preferred terms, specifiers, and indicators. Preferred terms are the words that should be used to represent a given concept, even though many other words may seem to fit.

For example, in a medical data thesaurus, "infant" may be the preferred term rather than "baby", even though the two words are commonly used interchangeably in the real world.

Non-preferred terms are the counterpart of preferred terms, and they have their own considerations. When two or more words can express the same concept, the data thesaurus designates one of them as the preferred term and lists the others as non-preferred terms.

Non-preferred terms can be synonyms, abbreviations, or alternative spellings, and their use is discouraged. In most data thesaurus implementations they are easy to recognize because they are written in italics. Non-preferred terms exist simply to make sure that the correct preferred term is used.
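A minimal way to model preferred and non-preferred terms is a set of preferred terms plus a lookup table that points every non-preferred term to its preferred term. The sketch below assumes case-insensitive matching; the `Vocabulary` class and every term except the infant/baby pair are hypothetical examples, not taken from any real thesaurus.

```python
from typing import Dict, Optional, Set


class Vocabulary:
    """Preferred terms plus pointers from non-preferred terms to them."""

    def __init__(self) -> None:
        self.preferred: Set[str] = set()
        self.use: Dict[str, str] = {}  # non-preferred term -> preferred term

    def add_preferred(self, term: str) -> None:
        self.preferred.add(term.lower())

    def add_non_preferred(self, term: str, preferred: str) -> None:
        """Register a synonym, abbreviation, or spelling variant of a preferred term."""
        target = preferred.lower()
        if target not in self.preferred:
            raise ValueError(f"{preferred!r} is not a preferred term")
        self.use[term.lower()] = target

    def resolve(self, term: str) -> Optional[str]:
        """Return the preferred term for any entry term, or None if it is unknown."""
        key = term.lower()
        if key in self.preferred:
            return key
        return self.use.get(key)


vocab = Vocabulary()
for t in ("infant", "pediatrics", "electrocardiogram"):
    vocab.add_preferred(t)
vocab.add_non_preferred("baby", "infant")             # synonym
vocab.add_non_preferred("paediatrics", "pediatrics")  # alternative spelling
vocab.add_non_preferred("ECG", "electrocardiogram")   # abbreviation

print(vocab.resolve("Baby"))     # infant
print(vocab.resolve("ECG"))      # electrocardiogram
print(vocab.resolve("toddler"))  # None
```

Refusing to register a non-preferred term whose target is not already preferred keeps every pointer valid, which is the point of non-preferred terms: they always lead back to a usable preferred term.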

Specifiers are used when two or more words are needed to express a concept, for example "chief executive officer". The data thesaurus cross-references the specifier against a combination of preferred terms so that the system knows how to represent the group of words. In many data thesauri, specifiers are also written in italics, but they are typically followed by a + sign, e.g. chief executive officer+.
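The cross-reference from a specifier to a combination of preferred terms can be sketched as a dictionary from the multi-word phrase to a tuple of preferred terms, with a helper that applies the trailing + display convention. The decomposition of "chief executive officer" into three preferred terms is a hypothetical assumption, as are the names `SPECIFIERS`, `expand_specifier`, and `display`; a real thesaurus defines its own combinations.

```python
from typing import Optional, Tuple

# Hypothetical preferred terms and specifier decomposition.
PREFERRED = {"chief", "executive", "officer"}
SPECIFIERS = {
    "chief executive officer": ("chief", "executive", "officer"),
}


def expand_specifier(phrase: str) -> Optional[Tuple[str, ...]]:
    """Return the combination of preferred terms for a multi-word concept, or None."""
    combo = SPECIFIERS.get(phrase.lower())
    if combo is None:
        return None
    missing = [term for term in combo if term not in PREFERRED]
    if missing:
        raise ValueError(f"specifier {phrase!r} refers to non-preferred terms: {missing}")
    return combo


def display(phrase: str) -> str:
    """Mark specifiers with a trailing '+', following the display convention."""
    return phrase + "+" if phrase.lower() in SPECIFIERS else phrase


print(display("chief executive officer"))           # chief executive officer+
print(expand_specifier("Chief Executive Officer"))  # ('chief', 'executive', 'officer')
print(display("officer"))                           # officer
```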

Indicators are similar to non-preferred terms, but they point to a selection of possible preferred terms for cases where no exact match can be found between the concept and a single preferred term.
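The difference between a non-preferred term and an indicator shows up in what a lookup returns: one preferred term versus a short list of candidates. The sketch below assumes a hypothetical indicator "minor" that points to three age-group terms; the vocabulary and the `candidates` function are illustrative only.

```python
from typing import List

# Hypothetical vocabulary: preferred terms, one non-preferred pointer, one indicator.
PREFERRED = {"infant", "child", "adolescent"}
USE = {"baby": "infant"}
INDICATORS = {"minor": ("infant", "child", "adolescent")}


def candidates(term: str) -> List[str]:
    """Return preferred term(s) for an entry term.

    A preferred or non-preferred term yields exactly one preferred term; an
    indicator yields several candidates for the user to choose from; an
    unknown term yields an empty list.
    """
    key = term.lower()
    if key in PREFERRED:
        return [key]
    if key in USE:
        return [USE[key]]
    return list(INDICATORS.get(key, ()))


print(candidates("baby"))    # ['infant']
print(candidates("minor"))   # ['infant', 'child', 'adolescent']
print(candidates("senior"))  # []
```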

ISO 2788 sets out the Guidelines for the establishment and development of monolingual thesauri. This standard defines all aspects of a thesaurus, including:

  • Scope and field of application
  • References
  • Definitions
  • Abbreviations and symbols
  • Vocabulary control
  • Indexing terms (General, Forms of terms, Choice of singular or plural forms, Homographs or polysemes, Choice of terms, Scope notes and definitions)
  • Compound terms (General, Terms that should be retained as compounds, Terms that should be syntactically factored, Order of words in compound terms)
  • Basic relationships in a thesaurus (General, The equivalence relationship, The hierarchical relationship, The associative relationship)
  • Display of terms and their relationships (General, Alphabetical display, Systematic display, Graphic display)
  • Management aspects of thesaurus construction (Methods of compilation, Recording of terms, Term verification, Specificity, Admission and deletion of terms, The use of automatic data processing equipment, Form and contents of a thesaurus, Other editorial matters)
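The three basic relationships listed above (equivalence, hierarchical, associative) are conventionally written with the abbreviations USE/UF, BT/NT, and RT, and each one implies a reciprocal link. The sketch below records both directions automatically and prints an alphabetical-display style entry. The class name, the sample terms, and the display order are assumptions made for illustration, not a reproduction of the standard's layout rules.

```python
from collections import defaultdict
from typing import List

# Each relationship and its reciprocal: equivalence (USE/UF),
# hierarchical (BT/NT), and associative (RT, which is symmetric).
RECIPROCAL = {"USE": "UF", "UF": "USE", "BT": "NT", "NT": "BT", "RT": "RT"}
DISPLAY_ORDER = ("USE", "UF", "BT", "NT", "RT")  # assumed order for this sketch


class RelationalThesaurus:
    def __init__(self) -> None:
        # term -> relationship -> set of related terms
        self.links = defaultdict(lambda: defaultdict(set))

    def relate(self, term: str, rel: str, other: str) -> None:
        """Record term --rel--> other and the reciprocal link from other back to term."""
        if rel not in RECIPROCAL:
            raise ValueError(f"unknown relationship {rel!r}")
        self.links[term][rel].add(other)
        self.links[other][RECIPROCAL[rel]].add(term)

    def related(self, term: str, rel: str) -> List[str]:
        """Terms linked to `term` by `rel`, sorted alphabetically (does not add entries)."""
        return sorted(self.links.get(term, {}).get(rel, ()))

    def entry(self, term: str) -> str:
        """Alphabetical-display entry: the term followed by its indented relationships."""
        lines = [term]
        for rel in DISPLAY_ORDER:
            lines.extend(f"  {rel} {other}" for other in self.related(term, rel))
        return "\n".join(lines)


th = RelationalThesaurus()
th.relate("baby", "USE", "infant")       # equivalence
th.relate("infant", "BT", "age groups")  # hierarchical
th.relate("infant", "RT", "pediatrics")  # associative
print(th.entry("infant"))
# infant
#   UF baby
#   BT age groups
#   RT pediatrics
```

Storing reciprocals at insertion time means "age groups" automatically lists "infant" as a narrower term, so the hierarchy can be browsed in either direction without a separate consistency check.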

Editorial Team at Geekinterview is a team of HR and Career Advice members led by Chandra Vennapoosa.
