About SIGKDD Explorations

Newsletter of the ACM Special Interest Group on Knowledge Discovery and Data Mining

January 2002. Volume 3, Issue 2

Editorial by M. J. Zaki (available in PDF and Postscript formats or HTML)

Contributed Articles on Online, Interactive, and Anytime Data Mining

Mining Data Streams under Block Evolution
V. Ganti, J. Gehrke and R. Ramakrishnan
(available in PDF and Postscript formats)
ABSTRACT: In this paper we survey recent work on incremental data mining model maintenance and change detection under block evolution. In block evolution, a dataset is updated periodically through insertions and deletions of blocks of records at a time. We describe two techniques: (1) a generic algorithm for model maintenance that takes any traditional incremental data mining model maintenance algorithm and transforms it into an algorithm that allows restrictions on a temporal subset of the database, and (2) a generic framework for change detection that quantifies the difference between two datasets in terms of the data mining models they induce.

Towards Effective and Interpretable Data Mining by Visual Interaction
C. C. Aggarwal
(available in PDF and Postscript formats)
ABSTRACT: The primary aim of most data mining algorithms is to facilitate the discovery of concise and interpretable information from large amounts of data. However, many of the current formalizations of data mining algorithms have not quite reached this goal. One of the reasons for this is that the focus on using purely automated techniques has imposed several constraints on data mining algorithms. For example, any data mining problem such as clustering or association rules requires the specification of particular problem formulations, objective functions, and parameters. Such systems fail to take the user's needs into account very effectively. This makes it necessary to keep the user in the loop in a way that is both efficient and interpretable. One unique way of achieving this is by leveraging human visual perception on intermediate data mining results. Such a system combines the computational power of a computer and the intuitive abilities of a human to provide solutions that cannot be achieved by either alone. This paper will discuss a number of recent approaches to several data mining algorithms along these lines.

Requirements for Clustering Data Streams
D. Barbará
(available in PDF and Postscript formats)
ABSTRACT: Scientific and industrial examples of data streams abound in astronomy, telecommunication operations, banking and stock market applications, e-commerce and other fields. A challenge imposed by continuously arriving data streams is to analyze them and to modify the models that explain them as new data arrives. In this paper, we analyze the requirements for clustering data streams. We review some of the latest algorithms in the literature and assess whether they meet these requirements.

Interactive Mining and Knowledge Reuse for the Closed-Itemset Incremental-Mining Problem
L. Dumitriu
(available in PDF and Postscript formats)
ABSTRACT: Using concept lattices as a theoretical background for finding association rules has led to the design of algorithms like Charm, Close or Closet. While they are considered extremely appropriate for finding concepts for association rules, due to the smaller amount of results, they do not cover a certain area of significant results, namely the pseudo-intents that form the base for global implications. We have proposed an approach that, besides finding all proper partial implications, also finds pseudo-intents. The way our algorithm is devised, it allows certain important operations on concept lattices, like adding or extracting items, meaning we can reuse previously found results. It is a well-known fact that mining association rules may lead to a large amount of results. Since the mining results are meant to be understood by the user, we have come to the conclusion that the user will benefit more from starting small, with some of the items in the database, understanding a small amount of results, and then adding items and receiving only the extra results. This way the number of human interventions during the "full" mining process is increased and the process becomes user-driven.

MobiMine: Monitoring the Stock Market from a PDA
H. Kargupta, B.-H. Park, S. Pittie, L. Liu, D. Kushraj and K. Sarkar
(available in PDF and Postscript formats)
ABSTRACT: This paper describes an experimental mobile data mining system that allows intelligent monitoring of time-critical financial data from a hand-held PDA. It presents the overall system architecture and the philosophy behind the design. It explores one particular aspect of the system -- automated construction of a personalized focus area that calls for the user's attention. The module works using data mining techniques. The paper describes the data mining component of the system that employs a novel Fourier analysis-based approach to efficiently represent, visualize, and communicate decision trees over limited bandwidth wireless networks. The paper also discusses a quadratic programming-based personalization module that runs on the PDAs and the multi-media based user-interfaces. It reports experimental results using an ad hoc peer-to-peer IEEE 802.11 wireless network.

Reports from KDD-2001

KDD Cup 2001 Report
J. Cheng, C. Hatzis, H. Hayashi, M.-A. Krogel, S. Morishita, D. Page, J. Sese
(available in PDF and Postscript formats)
ABSTRACT: This paper presents results and lessons from KDD Cup 2001. KDD Cup 2001 focused on mining biological databases. It involved three cutting-edge tasks related to drug design and genomics.

MDM/KDD: Multimedia Data Mining for the Second Time
O. R. Zaïane and S. J. Simoff
(available in PDF and Postscript formats)
ABSTRACT: This brief report summarizes the presentations, conclusions and directions for future work that were discussed during the second edition of the International Workshop on Multimedia Data Mining. The report includes references to resources where one can find more information about the workshop format, the proceedings and the workshop participants.

Workshop Report: The Fourth Workshop on Mining Scientific Datasets, August 2001
C. Kamath
(available in PDF and Postscript formats)

Visual Data Mining -- KDD Workshop Report
S. G. Eick and D. A. Keim
(available in PDF and Postscript formats)

BIOKDD01: Workshop on Data Mining in Bioinformatics
M. J. Zaki, J. T. L. Wang, H. T. T. Toivonen
(available in PDF and Postscript formats)
ABSTRACT: In this report we provide a summary of the BIOKDD01 Workshop on Data Mining in Bioinformatics, held in conjunction with the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining on August 26, 2001 in San Francisco, California, USA.

When and How to Subsample: Report on the KDD-2001 Panel
P. Domingos
(available in PDF and Postscript formats)

Report on the SIGKDD 2001 Conference Panel "New Research Directions in KDD"
J. Gehrke
(available in PDF and Postscript formats)

Workshop Reports

VDM@ECML/PKDD2001: The International Workshop on Visual Data Mining at ECML/PKDD 2001
S. J. Simoff
(available in PDF and Postscript formats)
ABSTRACT: This brief report presents an overview of the International Workshop on Visual Data Mining, conducted on 4 September 2001 in conjunction with the 12th European Conference on Machine Learning (ECML'01) and the 5th European Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD'01). It includes a summary of the presentations and discussions, and provides pointers to relevant resources in the area.

Send comments and suggestions to sunita@it.iitb.ernet.in