
Data mining or Data Search - What is the difference and why will it make the difference in the future?

Data Search or Data Mining?
Today most of us search for data on the internet with a search engine such as www.google.com or www.yahoo.com. It is not rare to get millions of hits for a single search; at best we get only a few tens of thousands of hits for the keyword we searched for. How can we extract any knowledge from all these hits? And how do we judge the quality of the hits we get? This is where we differentiate between data search and data mining. Data mining goes beyond search: it tries to find relationships, investigates unstructured or untraditional data, and transforms it into knowledge instead of just another information structure.
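To make the distinction concrete, here is a minimal sketch in Python. The four-document corpus and both function names are invented for illustration, and counting co-occurring word pairs is only one very simple stand-in for "finding relationships", not a full mining method.

```python
from collections import Counter
from itertools import combinations

# Hypothetical mini-corpus, invented purely for illustration.
DOCS = [
    "data mining finds patterns in large databases",
    "search engines return millions of hits",
    "data mining turns information into knowledge",
    "search engines match keywords in web pages",
]


def keyword_search(docs, keyword):
    """Data search: return every document that contains the keyword."""
    return [doc for doc in docs if keyword in doc.split()]


def co_occurring_pairs(docs, min_count=2):
    """A tiny mining step: find word pairs that appear together
    in at least `min_count` documents."""
    counts = Counter()
    for doc in docs:
        words = sorted(set(doc.split()))
        counts.update(combinations(words, 2))
    return {pair: n for pair, n in counts.items() if n >= min_count}


hits = keyword_search(DOCS, "mining")   # a list of matching documents
pairs = co_occurring_pairs(DOCS)        # relationships between words

print(len(hits), "hits for 'mining'")
for pair, n in sorted(pairs.items()):
    print(pair, "appear together in", n, "documents")
```

The search only hands back the documents that match; the second function reports something no single hit states, namely which terms tend to travel together across the collection.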

Data Mining
Data mining is a huge field of interest for many different groups of people and organizations. There is plenty of literature, lectures and software available to assist us in our data mining. Despite this, there are many unsolved applied Computer Science research problems in areas such as Artificial Intelligence (AI), Computer Architecture & Engineering (ARC), Database Management Systems (DBMS), Graphics and Human-Computer Interaction (GHCI), Operating Systems & Networking (OSNT), Programming Systems (PS), Scientific Computing (SCI), Security (SEC) and finally Theory (THY).
Data Mining (DM), also called Knowledge Discovery in Databases (KDD), has received more attention as our bases of data have become larger and more complex. More and more data is found within text, just like the text you are reading now. Not all data can be searched in the traditional manner; photos are one example. You can rarely find what you need from headers, web content keywords and the like. You have to dig much deeper to get what you are really looking for. Knowledge Discovery in Databases, or Data Mining, consists of various elements. It includes data discovery, cleaning and preparation. Visualization is a key component (and can be very problematic). It often involves a search for patterns, correlations and so on, as well as automated and objective classification.
It also includes data modeling and testing of the models, which in turn depends a lot on the type of data, the study domain (science, commerce, and so on), the nature of the problem and so forth. In general, data mining algorithms are computational embodiments of statistics.
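As a small illustration of the cleaning and correlation steps described above, the sketch below drops incomplete records and then computes a Pearson correlation coefficient. The records and the function names are hypothetical, and Pearson's r is just one of many statistics a real mining tool might use.

```python
from math import sqrt

# Hypothetical raw records (x, y); None marks a missing measurement.
RAW = [(1, 2.0), (2, None), (3, 6.1), (4, 7.9), (5, 10.2)]


def clean(records):
    """Cleaning step: keep only records where both values are present."""
    return [(x, y) for x, y in records if x is not None and y is not None]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant data")
    return cov / sqrt(var_x * var_y)


records = clean(RAW)
xs = [x for x, _ in records]
ys = [y for _, y in records]
r = pearson(xs, ys)
print(f"{len(records)} clean records, correlation r = {r:.3f}")
```

A value of r close to 1 or -1 suggests a strong linear relationship worth modeling and testing further; a value near 0 suggests there is none to find with this particular statistic.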

The Industry
In recent years an industry of keyword experts has developed. These experts claim to give you the best hit rate on your web page by using the right keywords on your web site. This may be true, but is that all it takes to be successful on the web? Is this what will distinguish you from the rest as an individual or as a company? I am not so sure. I think this is overstated marketing hype. It should not be disregarded, but the focus has to be somewhere else to make you stand out in the personal or business marketplace.
Apart from the internet industry and its keen interest in keywords, we have to look into data retrieval and advanced search beyond the internet search engines. First we have to see what kind of research is being done in this area.

Data Mining Research - Background
Do we have the capacity to perform real data mining and, most important of all, do we have any research arenas for this type of activity? We also need to understand who the main contributors of information on the internet are, and how objective and genuine this information is. This will assist you in transforming the information into knowledge.

Data Mining Research - Commercial Side
Yes, we have several private companies. Clear Forest works on the commercial ROI of text mining, Entopia, Inc. does social network analysis and mining, and Inxight is one of the leading providers of enterprise software solutions for information discovery from unstructured data. Then we have Array Biopharma, which uses visualization and data mining to decipher chemical and biological data.

Data Mining Research - Academic Side
Several universities around the world treat Data Mining as a specific topic in their curriculum. However, few go really deep into this topic, and therefore much of the research we see today is funded by private commercial interests.
There is, however, one academic institution that we can see treats Data Mining as a branch of Computer Science, and that is the California Institute of Technology (Caltech) in the USA. Professor S. G. Djorgovski teaches both introductory and more advanced material on this subject, and he clearly sees the need for education and research in it.

Summary
Do we understand the ramifications of our databases and how much we can retrieve from them? My answer is still no!
Neither as individuals nor as companies do we understand the value of all the data we have, or how little knowledge we extract from it in our daily use. I have touched on social network sites on the web and their sparse use of knowledge extraction through their social network analysis tools, as well as parts of industry, such as pharmaceutical companies and patent organizations, that use Knowledge Discovery in Databases or Data Mining in their daily work. However, given that the internet is still immature, and its users are immature too, we struggle with the fact that we do not utilize the power we have at hand. As individuals we could use it to empower our intelligence (some in the industry speak of Intelligence Amplifiers). As organizations or companies we could use it to create a business advantage, to create new ideas, or to become real innovators within our field of interest. As marketing tools, too, the real data miners are bound to be the winners.

Afterword
If you would like to explore this subject further, here are some key people and resources to start from. Mind you, this is not an extensive reference library; it is only intended to get you into the subject and to show how much data mining involves and how many angles there are to it.
Gregory Piatetsky-Shapiro's KDNuggets
Andy Pryke's "The Data Mine"
ACM's Special Interest Group on Knowledge Discovery in Databases and their newsletter, "Explorations"
Andrew Moore's statistics and data mining tutorials
The Classification Society of North America (CSNA)
Weka package
David Dowe's mixture modeling
Fionn Murtagh's multivariate data analysis software
StatLib at CMU
StatCodes at PSU

By: Stig Kristoffersen

This entry was posted on Wednesday, June 18, 2008 at 2:37 AM.