Contributed
articles
Discovering matrix association from biological
databases
G. B. Singh
(available in PDF
and Postscript
formats)
ABSTRACT: Biological databases have continued
their exponential growth over the last decade, and data mining holds considerable
promise for knowledge discovery in these databases. Discovery of the elements
of locus control from genetic sequences is a significant problem as these
elements are responsible for gene expression and viability of an organism.
Matrix Attachment Regions (MARs) are one such type of element, and their
detection has been hampered by limited knowledge of their structure.
A discovery approach utilizing statistical estimation of "interestingness"
has been implemented in the MARWiz software described in this paper. The
strategy described is generally applicable to detecting other classes
of signals in time-series or DNA sequence data.
A note on "Beyond Market Baskets: Generalizing
association rules to correlations"
K.M.Ahmed, N.M.El-Makky,
Y.Taha
(available in PDF
and Postscript
formats)
ABSTRACT: In their paper "Beyond Market Baskets", S.
Brin, R. Motwani and C. Silverstein discussed measuring the significance of
(generalized) association rules via the support and the chi-squared test
for correlation. They provided some illustrative examples and pointed out that
the chi-squared test needs to be augmented by a measure of interest, which
they also suggested. This paper presents a further elaboration and
extension of their discussion. As suggested by Brin et al., the chi-squared
test succeeds in measuring the cell dependencies in a 2x2 contingency table.
However, it can be misleading for larger contingency tables. We
give some illustrative examples based on those presented in their paper.
We also propose a more appropriate reliability measure for association
rules.
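For readers unfamiliar with the test mentioned in this abstract, the following is a minimal sketch of the chi-squared statistic for a contingency table, together with a per-cell ratio of observed to expected counts in the spirit of the interest measure. The function name `chi_squared_and_interest` and the example counts are hypothetical and are not taken from either paper; the ratio shown is an assumption about how such an interest measure is commonly defined.

```python
def chi_squared_and_interest(table):
    """Return the chi-squared statistic and the observed/expected ratio per cell.

    `table` is a list of rows of observed counts (hypothetical input format).
    Expected counts assume independence of the row and column variables.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    stat = 0.0
    interest = []
    for i, row in enumerate(table):
        interest_row = []
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
            # Ratio > 1: cell occurs more often than independence predicts.
            interest_row.append(observed / expected)
        interest.append(interest_row)
    return stat, interest


# Hypothetical 2x2 table: rows = item A present/absent, columns = item B present/absent.
counts = [[20, 5], [70, 5]]
stat, interest = chi_squared_and_interest(counts)
print(round(stat, 4))            # compare against 3.841 (95% level, 1 degree of freedom)
print(round(interest[0][0], 3))  # ratio for the cell "A and B both present"
```

With one degree of freedom, a statistic below 3.841 does not reject independence at the 95% level. For tables larger than 2x2, a single statistic no longer says which cells drive the dependence, which is why a per-cell measure is useful alongside it.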
Reports
from the KDD-99 Conference
KDD-99: The Fifth ACM SIGKDD International Conference
on Knowledge Discovery and Data Mining
S. Chaudhuri, D. Madigan,
U. Fayyad
(available in PDF
and Postscript
formats)
ABSTRACT: KDD-99 was the fifth conference
in the KDD series, attracting over 200 high-quality submissions and almost
600 attendees. Here we describe some of the highlights of the technical
program.
Keywords: KDD Conference overview, ACM SIGKDD.
Data Snooping, Dredging and Fishing: The Dark
Side of Data Mining
D. Jensen
(available in PDF
and Postscript
formats)
ABSTRACT: This article briefly describes
a panel discussion at SIGKDD99.
Keywords: Overfitting, SIGKDD99, Panels
Integrating Data Mining into Vertical Solutions
R. Kohavi, M.
Sahami
(available in PDF
and Postscript
formats)
ABSTRACT: At KDD-99, the panel on Integrating
Data Mining into Vertical Solutions addressed a series of questions regarding
future trends in industrial applications. Panelists
were chosen to represent different viewpoints from a variety of industry
segments, including data providers (Jim Bozik), horizontal and vertical
tool providers (Ken Ono and Steve Belcher respectively), and data mining
consultants (Rob Gerritsen and Dorian Pyle). Questions
presented to the panelists included: should data mining companies
sell solutions or tools, who are the users of data mining, will data mining
functionality be integrated into databases, do models need to be interpretable,
what is the future of horizontal and vertical tool providers, and will
industry-standard APIs be adopted?
Knowledge Discovery in Databases: Ten years
after
G. Piatetsky-Shapiro
(available in PDF
and Postscript
formats)
ABSTRACT: In this paper, we describe the
past 10 years of KDD and outline predictions for the next 10 years.
Keywords: Knowledge Discovery in Databases, Data Mining, KDD,
History.
Knowledge Discovery in Databases: A discussion
on the last 10 and next 10 years
R. Quinlan
(available in PDF
and Postscript
formats)
ABSTRACT: This paper presents the author's
impressions of the panel with the above title held at KDD-99.
KDD-99 Classifier learning contest
Overview: C. Elkan
(available in PDF
and Postscript
formats)
ABSTRACT: This paper presents a summary
of the results of the classifier learning track of the KDD cup competition
held at KDD-99.
First winner report:
B. Pfahringer
(available in PDF
and Postscript
formats)
ABSTRACT: The first place winners of
the classifier learning contest describe their method in this report.
Second winner report:
I. Levin
(available in PDF
and Postscript
formats)
ABSTRACT: Kernel Miner is a new data-mining tool
based on building the optimal decision forest. The tool won second place
in the KDD'99 Classifier Learning Contest, August 1999. We describe
Kernel Miner's approach and the method used for solving the contest task. The
results obtained are analyzed and explained.
Keywords: Data Mining competition, decision trees,
optimal decision forest, classification, prediction.
Third winner report:
M. Vladimir, V. Alexei,
S. Ivan
(available in PDF
and Postscript
formats)
ABSTRACT: The MP13 method is best summarized as
recognition based on voting decision trees using "pipes" in potential space.
Keywords: Voting; Decision Tree; Potential Space
KDD-99 Knowledge discovery
contest
Overview: C. Elkan
(available in PDF
and Postscript
formats)
ABSTRACT: This paper presents a summary of the
results of the knowledge discovery track of the KDD cup competition
held at KDD-99.
Co-winner 1: J.
Georges, A. H. Milley
(available in PDF
and Postscript
formats)
ABSTRACT: In this paper, we expand on the 1998
KDD cup competition findings: exploratory data analysis reveals unusual
data anomalies; a two-stage prediction model yields superior results to
those obtained in the 1998 competition; we use a decision tree to better
understand the model (the decision boundary); and we apply a confidence
interval to establish a range upon which we can reasonably judge model
performance.
Keywords: Two-stage prediction, neural network,
decision tree, model performance.
Co-winner 2: S.
Rosset and A. Inger
(available in PDF
and Postscript
formats)
ABSTRACT: This report describes the results of
our knowledge discovery and modeling on the data of the 1997 donation campaign
of an American charitable organization.
Honorary mention:
P. Sebastiani, M. Ramoni, and A. Crea
(available in PDF
and Postscript
formats)
ABSTRACT: This report describes a complete Knowledge
Discovery session using Bayesware Discoverer, a program for the induction
of Bayesian networks from incomplete data. We build two causal models
to help an American Charitable Organization understand the characteristics
of respondents to direct mail fund raising campaigns. The first model
is a Bayesian network induced from the database of 96,376 Lapsed donors
to the June '97 renewal mailing. The network describes the dependency of
the probability of response to the renewal mail on a subset of the variables
in the database. The second model is a Bayesian network representing the
dependency of the dollar amount of the gift on the variables in the same
reduced database. This model is induced from the 5% of cases in the database
corresponding to the respondents to the renewal campaign. The two models
are used for both predicting the expected gift of a donor and understanding
the characteristics of donors. These two uses can help the charitable
organization to maximize the profit.
Keywords: Bayesian Networks, Customer Profiling,
Missing Data
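As a rough illustration of how a response model and a gift-amount model can be combined into an expected gift, here is a minimal sketch. The abstract does not say how the two models are combined; the product form below is the standard expected-value assumption, and the function names, probabilities, amounts and mailing cost are all hypothetical.

```python
def expected_gift(p_response, gift_if_response):
    """Expected gift = P(donor responds) * expected amount given a response.

    Assumption: the first model supplies p_response and the second model
    supplies gift_if_response; this combination rule is not from the report.
    """
    return p_response * gift_if_response


def should_mail(p_response, gift_if_response, mailing_cost):
    """Mail a donor only if the expected gift exceeds the (hypothetical) cost."""
    return expected_gift(p_response, gift_if_response) > mailing_cost


# Hypothetical donors: (response probability, predicted gift in dollars).
donors = [(0.05, 15.0), (0.02, 10.0)]
for p, amount in donors:
    print(expected_gift(p, amount), should_mail(p, amount, mailing_cost=0.5))
```

Under this assumption, a donor with a low response probability can still be worth mailing if the predicted gift is large enough, which is why two separate models are informative.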
Other conference
reports
Interface
'99: A Data Mining Overview
A.
Goodman
(available in PDF
and Postscript
formats)
Discovering geographic knowledge
in data rich environments: a report on a specialist meeting
H.J.
Miller and J. Han
(available in PDF
and Postscript
formats)
ABSTRACT: On 18-20 March 1999, a Specialist Meeting
on "Discovering geographic knowledge in data-rich environments" was convened
under the auspices of the Varenius Project of the National Center for Geographic
Information and Analysis (NCGIA). This workshop brought together a diverse
group of researchers and practitioners with interests in developing and
applying new techniques for exploring large and diverse geographic datasets. The
interaction prior to, during and after the three-day workshop resulted
in the identification of research priorities and directions for continued
development of "geographic knowledge discovery" (GKD) theory and techniques.
Keywords: Geographic data mining, spatio-temporal
data mining, geographic information systems, geographic research.
WebKDD-99: Workshop on Web Usage
Analysis and User Profiling
Brij Masand, Dr. Myra Spiliopoulou
(available in PDF
and Postscript
formats)
ABSTRACT: The WEBKDD'99 workshop on "Web Usage
Analysis and User Profiling" took place on Aug. 15, 1999 under the auspices
of the SIGKDD International Conference on Knowledge Discovery and Data
Mining (KDD'99). We report on the topics addressed in the workshop, the
contributions and the discussions that took place in its framework.
Keywords: Web usage mining
KDD-99 Workshop on Large-Scale
Parallel KDD systems
M.
Zaki, C.T. Ho
(available in PDF
and Postscript
formats)
SIGMOD 99 Workshop on research
issues in data mining and knowledge discovery
K.
Shim, R. Srikant
(available in PDF
and Postscript
formats)
Interesting KDD news from SIGMOD
99
D.
Keim
(available in PDF
and Postscript
formats)
Book
Reviews
Data Mining Methods for Knowledge Discovery
by K.
Cios, W. Pedrycz and R. Swiniarski, Kluwer
(available in PDF)
ABSTRACT: This paper is a review of the book "Data
Mining Methods for Knowledge Discovery", by K. Cios, W. Pedrycz and R.
Swiniarski, Kluwer 1998, 495 pp.
Keywords: Data mining, Book review.
News,
Events and Announcements
(available in PDF
and Postscript
formats)