Data exploration is a common process in data warehouses, which are characterized by large volumes of data coming from disparate systems. Because these systems can store data in different formats and draw it from different sources, once the data converges in a central data warehouse it can be very difficult to get the relevant data needed for statistical reporting and for spotting trends and patterns.
Data exploration helps a data consumer focus an information search on the pertinent aspects of the relevant data before true analysis can begin. In large data sets, data is often not gathered or controlled in a focused manner. Even in smaller data sets, data gathered without a rigid and specific technique can end up disorganized and split into a myriad of subsets.
In most cases, narrowing an information search without a set of techniques can cause problems, because one may lose important perspectives on the relevant data among the myriad sets of unrelated data.
There are generally two methodologies for getting relevant data from huge data sources or sets: manual and automatic. The automatic techniques are more commonly known as data mining and the manual ones as data exploration. Although they are categorized this way, the terms are not rigorously defined in IT practice.
Data mining, along with its near relative, data prospecting, is used in a wide variety of ways, and many consider it a much-abused term in everyday usage. Some people treat it as synonymous with data analysis, although many believe the two are technically different.
Data mining is a methodology commonly used on very large data sets. In fact, it is used on entire databases running a data warehouse. A common definition of data mining is "the nontrivial extraction of implicit, previously unknown, and potentially useful information from data" or "the science of extracting useful information from large data sets or databases". Although data mining is guided by a human being who specifies some parameters, an automated algorithm carries out the search itself.
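As a rough illustration of a human setting a parameter while an algorithm does the searching, here is a minimal sketch in Python. It assumes a toy frequent-pair count as a stand-in for a data mining algorithm; the function name frequent_pairs, the min_support parameter, and the sample baskets are all hypothetical and are not taken from any particular tool.

```python
from collections import Counter
from itertools import combinations


def frequent_pairs(transactions, min_support):
    """Return item pairs that occur together in at least min_support transactions.

    Hypothetical helper: the human chooses min_support, the loop does the search.
    """
    counts = Counter()
    for items in transactions:
        # Sort and de-duplicate so each pair is counted once per transaction.
        for pair in combinations(sorted(set(items)), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}


# Made-up sample data, for illustration only.
baskets = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "jam"],
    ["bread", "milk", "jam"],
]

result = frequent_pairs(baskets, min_support=2)
print(result)
```

The person running the search only decides how often a pair must appear to be interesting; every comparison is made by the code, which is what lets the same approach scale to far larger data sets.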
On the other hand, data exploration is a methodology that uses manual techniques so that a data user can find a way through large volumes of data and bring the important and relevant data into focus for analysis. The methodology can apply to data of any type or size but, because of its manual nature, many opt to use data exploration for smaller data sets.
While data mining may have the advantage of a faster search because of its automatic nature, there are also advantages to the data exploration methodology. One of the major ones is that its manual approach leaves the user unhindered in exploring particular aspects of the desired data. In contrast, the automated method is limited by its design, because the automating algorithms are fixed and rigid.
Put simply, data exploration is a good choice when the data needed has very specific and uncommon parameters. It may take longer, but with proper organization, such as constantly keeping records about the exploration, recording one’s thoughts and ideas along the way, and organizing the search results, the complex undertaking may yield better rewards.
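One way to keep such records is to log every narrowing step alongside a note and the size of the result. The following is a minimal sketch in Python, not a prescribed method: the ExplorationLog class, its record and summary methods, and the sample order rows are all hypothetical names and data invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ExplorationLog:
    """Running record of one manual exploration session (hypothetical helper)."""
    entries: list = field(default_factory=list)

    def record(self, step, note, result_count):
        # Keep what was done, why, and how much data was left afterwards.
        self.entries.append(
            {"step": step, "note": note, "result_count": result_count}
        )

    def summary(self):
        # One numbered line per recorded step, in the order it was taken.
        return [
            f"{i}. {e['step']} -> {e['result_count']} rows ({e['note']})"
            for i, e in enumerate(self.entries, 1)
        ]


# Made-up sample rows, for illustration only.
orders = [
    {"region": "north", "channel": "web", "amount": 120},
    {"region": "north", "channel": "store", "amount": 40},
    {"region": "south", "channel": "web", "amount": 300},
    {"region": "north", "channel": "web", "amount": 15},
]

log = ExplorationLog()

# Each manual narrowing step is written down together with the thought behind it.
north = [row for row in orders if row["region"] == "north"]
log.record("filter region == north", "south looks like a separate market", len(north))

north_web = [row for row in north if row["channel"] == "web"]
log.record("filter channel == web", "store orders are few; revisit later", len(north_web))

for line in log.summary():
    print(line)
```

Because each step carries its own note and row count, the search can be retraced or branched later without relying on memory.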