text mining

by gowtham 2010-02-11 17:04:30

Text mining has been on the radar screen of corporate users since the mid-eighties. Technical limitations and the overall complexity of applying data mining have been hurdles to text mining, and very few organizations have surmounted them. The early proponents of text mining were surveillance agencies such as the Central Intelligence Agency (CIA) and Britain's MI6.

Security agencies have a government mandate to intercept data traffic and evaluate it for items of interest. This includes electronically intercepting international fax transmissions and mining them for patterns of interest.

Having started covertly, text mining is now coming out into the open. Some of the reasons are:

* Storage cost reduction - data are more likely to be kept on an electronic medium even after being declared inactive.
* Data volume increase - data volumes are growing exponentially as data transmission costs fall and Internet usage increases.
* Fraud detection and analysis - organizations have compelling reasons to detect and address fraud. The federal government has enacted new laws to curtail corporate fraud (HIPAA, GLBA, Sarbanes-Oxley), and SEC and NASDAQ compliance rules add further requirements.
* Competitive advantage - text mining is used to better understand the large volumes of data in an organization. For example, customers contact a company through the call center and by email, and the call center reps' notes are archived along with the emails. Text mining can then be used to discover clusters of interesting patterns in customer interactions, such as "Are Ford Explorers more likely to roll over when used with Firestone tires?" or "Are errors more likely to occur on the Windows XP platform?"
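The idea of discovering clusters in customer interactions can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not how any particular product works: the notes, the stop-word list, the 0.2 similarity threshold and the function names are all hypothetical. It uses a simple single-pass ("leader") clustering over cosine similarity of bag-of-words vectors; real systems would typically use TF-IDF weighting and an established algorithm such as k-means.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; a real system would use a much fuller one.
STOP_WORDS = {"the", "a", "an", "on", "with", "after", "and", "was", "is", "my", "when", "to", "of"}


def tokenize(text):
    """Lowercase the text, split on non-letters and drop stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if w not in STOP_WORDS]


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def leader_cluster(notes, threshold=0.2):
    """Single-pass clustering: each note joins the existing cluster whose
    first note (its 'leader') is most similar, if that similarity is at
    least `threshold`; otherwise the note starts a new cluster.
    Returns a list of clusters, each a list of note indices."""
    clusters = []  # list of (leader_vector, [member indices])
    for i, note in enumerate(notes):
        vec = Counter(tokenize(note))
        best, best_sim = None, 0.0
        for cluster in clusters:
            sim = cosine(vec, cluster[0])
            if sim > best_sim:
                best, best_sim = cluster, sim
        if best is not None and best_sim >= threshold:
            best[1].append(i)
        else:
            clusters.append((vec, [i]))
    return [members for _, members in clusters]


# Hypothetical call-center notes.
notes = [
    "Explorer rolled over on Firestone tires",
    "Customer reports Firestone tire tread separation on Explorer",
    "Error installing update on Windows XP",
    "Windows XP crash with error during install",
]
print(leader_cluster(notes))  # [[0, 1], [2, 3]]
```

The two vehicle complaints group together and the two Windows XP errors group together; a human analyst would then inspect each cluster to judge whether the pattern is actually interesting.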

Text data is so-called unstructured data: the data are stored freely in flat files (e.g. Microsoft Word documents) and are not classified.

Structured data are found in well-designed data warehouses. The meaning of the data is well known, usually through a metadata description, so the analysis can be performed directly on the data.

Unstructured data has to jump through an additional hoop before it can be meaningfully analyzed: information must first be extracted from the raw text.
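To make that extraction step concrete, here is a minimal sketch that pulls a few fields out of free-text notes with regular expressions, producing flat records that can then be analyzed like structured data. The field names, the small vocabularies of vehicles, tire brands and platforms, and the patterns are hypothetical assumptions for illustration; production systems usually rely on more robust entity-extraction techniques.

```python
import re
from collections import Counter

# Hypothetical vocabularies: the fields we want to pull out of raw text.
PATTERNS = {
    "vehicle": re.compile(r"\b(explorer|ranger|f-150)\b", re.IGNORECASE),
    "tire_brand": re.compile(r"\b(firestone|goodyear|michelin)\b", re.IGNORECASE),
    "platform": re.compile(r"\bwindows\s+(xp|vista|7)\b", re.IGNORECASE),
    "rollover": re.compile(r"\broll(?:ed)?[\s-]?over\b", re.IGNORECASE),
}


def extract_record(note):
    """Turn one free-text note into a flat record (dict) of extracted fields.
    Text fields are lowercased or None if absent; 'rollover' is a boolean flag."""
    record = {}
    for field, pattern in PATTERNS.items():
        match = pattern.search(note)
        if field == "rollover":
            record[field] = match is not None
        else:
            record[field] = match.group(0).lower() if match else None
    return record


# Hypothetical raw notes.
notes = [
    "Explorer rolled over on Firestone tires",
    "Ranger with Goodyear tires, no issues",
    "Explorer on Firestone tires had a roll-over",
    "Error installing update on Windows XP",
]
records = [extract_record(n) for n in notes]

# Once structured, the records can be queried directly,
# e.g. counting rollover reports by tire brand.
rollovers_by_brand = Counter(r["tire_brand"] for r in records if r["rollover"])
print(rollovers_by_brand)  # Counter({'firestone': 2})
```

Hand-written patterns like these are easy to read but brittle (misspellings, synonyms and negations slip through), which is part of why extraction is the harder half of text mining.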
