Data Mining
by Albert Fitzgerald
Editor's Note: Originally published in the San Diego Daily Transcript, April 2000
Miners looking for precious metals face a daunting challenge that tests their problem-solving ability. Buried deep inside a mountain is gold. Obviously, miners don't reduce an entire mountain to rubble with dynamite. Instead, they probe key areas of the mountain, searching systematically and scientifically for those lucrative deposits.
Companies looking to strike bottom-line "paydirt" face the same challenge. All too often, companies are sitting on a mountain of data, in the form of a large database, with no efficient or effective way to extract "nuggets" of valuable information. That's where data mining comes in. Data mining is the automated analysis of large data sets to reveal hidden trends or patterns that might otherwise go undiscovered. Combined with your expertise, this newly uncovered information can help your company make better business decisions.
Data mining is the marriage of advanced analytical techniques and today's high-speed computing power. Together, analysis and technology can reveal useful patterns that would be virtually impossible to discover otherwise.
Data mining has helped companies make important business projections in just about every industry. It has been used to predict the probability of default on consumer loan applications, reduce fabrication flaws in semiconductor chips, and forecast audience share for television programs.
Organizations that mine their data save money by making their processes more efficient and make money by finding new sources of revenue.
However, data mining is most effective when you have prepared yourself, and your data, for the task. Four elements are essential to successful data mining: establishing objectives, research approach and design; data collection; data preparation; and data analysis.
A manufacturer recently hired us to perform data mining on a database of resellers of one of their computer/networking product lines. A step-by-step review of this recent project will serve as a real-world illustration of data mining in action.
Establishing Objectives, Research Approach and Design
Although each step of the research process is important, identifying and defining the marketing research problem is the most important step. The research team needs to have a clear understanding of the problem before the study objectives can be defined. What does your company plan to do with those information "nuggets" once they're extracted? Our client's objectives were to understand and profile these resellers, help develop a strategy for contacting key resellers, and maximize the effectiveness of a sales campaign to these resellers.
Once you've established your objectives, the next step is to develop a research approach and design. Which analytical models, research questions and data collection methods do you plan to use? Are you trying to identify your top 10% of customers? Are you trying to find out which customers are most satisfied?
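Understanding the types of data you need to capture is essential. Your data might be interval (ratings on a scale from 1 to 10, where the distances between points are treated as equal), ordinal (values that can be ranked but whose spacing is not meaningful), or nominal (categories with no order, such as region or business type). Interval data is typically preferable because it supports the widest range of analytical techniques: you can compute means and standard deviations, not just medians (ordinal) or counts and modes (nominal). That makes the data mining process more effective.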
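As a rough illustration of why the level of measurement matters, the Python sketch below returns only the summary statistics that are meaningful for each type: the mode for nominal data, the mode plus a median for ordinal data, and the median plus the mean for interval data. The function name `summarize`, the level labels and the sample values are all hypothetical.

```python
from statistics import mean, median, median_low, mode


def summarize(values, level):
    """Return only the summary statistics that make sense for the measurement level."""
    if level == "nominal":
        # Categories have no order, so only the most frequent value is meaningful.
        return {"mode": mode(values)}
    if level == "ordinal":
        # Ranks can be ordered; median_low always returns an observed rank
        # instead of averaging two ranks whose spacing is undefined.
        return {"mode": mode(values), "median": median_low(values)}
    if level == "interval":
        # Equal spacing between scale points makes the arithmetic mean meaningful.
        return {"median": median(values), "mean": mean(values)}
    raise ValueError(f"unknown measurement level: {level!r}")


# Hypothetical sample data, not from the client project.
regions = ["West", "East", "West", "South"]  # nominal
size_rank = [1, 3, 2, 3, 2, 2]               # ordinal: 1 = small, 2 = medium, 3 = large
satisfaction = [7, 9, 6, 8, 10]              # interval: rating on a 1-10 scale

print(summarize(regions, "nominal"))    # {'mode': 'West'}
print(summarize(size_rank, "ordinal"))  # {'mode': 2, 'median': 2}
print(summarize(satisfaction, "interval"))
```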
Data Collection Process
Now it's time to look at data collection. Where do you get your data? Any and all sources can be useful to the data mining process. Internal data sources such as web site hits, prospect lists, custom surveys, and old customer data records are all valuable. External data sources such as purchased lists or panel lists are also relevant. Our client's database consisted of reseller names and addresses they had captured through several sources, such as leads, inquiries, past sales and trade shows. There were over 34,000 records of resellers.
Once you've established what data you already possess, the cleaning and formatting of that data begins.
Data Preparation
Before raw data can be subjected to statistical analysis, it must be converted into a form suitable for analysis. Data preparation includes editing, coding, data cleaning (consistency checks and the treatment of missing responses) and statistical adjustment of the data (weighting and variable respecification). Check your database thoroughly, and be sure to eliminate duplicate records and cases. Sorting by fields such as phone number, company name or address places redundant records next to each other, where they are easy to spot. Our client's 34,000-plus reseller records were all thought to be unique.
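The sort-and-compare approach can be sketched in Python. The record layout (`company`, `phone` and `source` fields), the normalization rules, and the choice of phone number plus company name as the match key are assumptions for illustration; real lists often need looser, fuzzy matching to catch misspellings.

```python
import re
from itertools import groupby


def normalize_phone(phone):
    """Keep digits only, so '(619) 555-0100' and '619.555.0100' compare equal."""
    return re.sub(r"\D", "", phone)


def normalize_company(name):
    """Lowercase, drop punctuation and collapse whitespace in a company name."""
    return " ".join(re.sub(r"[^\w\s]", "", name).lower().split())


def match_key(record):
    # Hypothetical match rule: same phone digits and same normalized name.
    return (normalize_phone(record["phone"]), normalize_company(record["company"]))


def dedupe(records):
    """Sort on a normalized key so duplicates sit side by side, then keep the
    first record in each run of identical keys."""
    ordered = sorted(records, key=match_key)
    return [next(group) for _, group in groupby(ordered, key=match_key)]


# Hypothetical records gathered from different sources.
records = [
    {"company": "Acme Networks, Inc.", "phone": "(619) 555-0100", "source": "trade show"},
    {"company": "ACME NETWORKS INC", "phone": "619.555.0100", "source": "inquiry"},
    {"company": "Bay Systems", "phone": "858-555-0199", "source": "past sales"},
]

unique = dedupe(records)
print(len(unique))  # 2
```

Because Python's sort is stable, the record kept from each duplicate group is the one that appeared first in the original list.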
Data cleaning revealed a very different database. Eliminating duplicate records reduced the list to 11,700 truly unique records, a reduction of more than 65%.
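The reduction figure is simple arithmetic: records before minus records after, divided by records before. A small Python check using the rounded counts:

```python
def percent_reduction(before, after):
    """Percentage of records removed when a list shrinks from `before` to `after`."""
    return 100 * (before - after) / before


# Rounded counts: 34,000 raw records, 11,700 unique ones.
print(f"{percent_reduction(34_000, 11_700):.1f}%")  # 65.6%
```

Since the raw database held slightly more than 34,000 records, the true reduction is slightly higher, consistent with the "more than 65%" figure.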
Once the unique list was established, the data was formatted and all non-resellers were removed from the database. A database that began with more than 34,000 entries was pared down to an accurate list of 8,750 qualified resellers.
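How non-resellers were identified is not described here. One plausible approach, sketched under the assumption that each record carries a hypothetical `business_type` field, is to keep only records whose type falls in an agreed list of reseller categories:

```python
# Hypothetical category labels; a real project would agree these with the client.
RESELLER_TYPES = {"reseller", "value-added reseller"}


def qualified_resellers(records):
    """Keep records whose business type is one of the reseller categories.

    Records with a missing or unrecognized type are dropped, which is a
    conservative choice: they could instead be set aside for manual review.
    """
    return [
        r for r in records
        if r.get("business_type", "").strip().lower() in RESELLER_TYPES
    ]


# Hypothetical de-duplicated records.
unique_records = [
    {"company": "Acme Networks", "business_type": "Reseller"},
    {"company": "Bay Systems", "business_type": "Value-Added Reseller"},
    {"company": "City Library", "business_type": "End user"},
    {"company": "Delta Corp"},  # no business type recorded
]

qualified = qualified_resellers(unique_records)
print([r["company"] for r in qualified])  # ['Acme Networks', 'Bay Systems']
```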
Data Analysis
Now that the data has been cleaned and formatted, it's time to analyze it and address the study objectives. A number of statistical techniques can be employed, and they can be classified as univariate (examining one variable at a time) or multivariate (examining several variables together). In some cases, basic statistics may be enough: frequencies (histograms), means and medians can often tell you a lot. Data reduction, segmentation and modeling techniques may also help.
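For basic univariate statistics, Python's standard library is enough. This sketch computes the mean, the median and a frequency table (printed as a text histogram) for a hypothetical list of unit sales per reseller; the bin width of 100 units is an arbitrary choice.

```python
from collections import Counter
from statistics import mean, median


def describe(units_sold, bin_width=100):
    """Univariate summary: mean, median and counts per bin of `bin_width` units."""
    bins = Counter((u // bin_width) * bin_width for u in units_sold)
    return {
        "mean": mean(units_sold),
        "median": median(units_sold),
        "frequencies": dict(sorted(bins.items())),
    }


# Hypothetical unit sales for ten resellers (illustrative, not client data).
BIN_WIDTH = 100
units = [0, 0, 2, 5, 8, 12, 20, 35, 400, 900]
summary = describe(units, BIN_WIDTH)
print(f"mean={summary['mean']}, median={summary['median']}")  # mean=138.2, median=10.0
for lower, count in summary["frequencies"].items():
    print(f"{lower:>4}-{lower + BIN_WIDTH - 1:<4} {'#' * count}")
```

With skewed data like this, a mean far above the median is a quick signal that a handful of large values dominate the total.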
When we applied statistical and factor analysis techniques to our client's data, the data mining finally struck "paydirt." The analysis revealed that of the 8,750 qualified resellers in our client's database, fewer than 1,000 accounted for 95% of all unit sales. These critical nuggets of information, extracted from a mountain of data, led our client to reject a planned mass mail campaign that would have been expensive and likely unsuccessful.
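The exact calculation behind the 95% figure is not detailed. A concentration finding of this kind is commonly checked with a cumulative-share (Pareto) calculation, which is a different technique from factor analysis: rank resellers by unit sales and count how many of the top sellers it takes to reach a given share of the total. A sketch with hypothetical sales figures:

```python
def top_share_count(sales, percent=95):
    """Smallest number of top sellers whose combined sales reach `percent`% of the total.

    Integer arithmetic (100 * running >= percent * total) avoids floating-point
    rounding right at the threshold.
    """
    ranked = sorted(sales, reverse=True)
    total = sum(ranked)
    running = 0
    for count, units in enumerate(ranked, start=1):
        running += units
        if 100 * running >= percent * total:
            return count
    return len(ranked)  # only reached for an empty list (given percent <= 100)


# Hypothetical unit sales for ten resellers (illustrative, not client data).
sales = [10, 500, 3, 50, 300, 2, 100, 20, 5, 10]
n = top_share_count(sales)
print(f"{n} of {len(sales)} resellers account for 95% of unit sales")  # 4 of 10
```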
The trends and patterns revealed by data mining led our client to implement a targeted marketing strategy focused on their top 1,000 resellers. Without those insights, our client would likely not have executed such a highly focused and effective marketing strategy.
There's no question that data mining is a powerful research tool that can help your company make important business decisions. So the next time you find yourself sitting atop a mountain of data with no clue where to start digging for gold, don't reach for the dynamite. Just grab a pickax and give data mining a try. You just might strike it rich.