This article is sponsored by the Java Community Process.

Leading-Edge Java
Mine Your Own Data with the JDM API
Exploring the Java Data Mining API
by Frank Sommers
July 7, 2005


The Java Data Mining API (JDM) is the first attempt to create a standard Java API for accessing data-mining tools from Java applications. JDM promises to bring to data mining what JDBC brought to databases, and to make data mining a new and useful part of an enterprise Java developer's tool chest. This article introduces basic data-mining concepts, and walks through sample JDM code that models customer behavior.

Queries and navigation represent the key ways we interact with data today. A query applies expressions written in a query language, such as SQL or XPath, to a data set, and produces a subset of that data. Navigation allows one to traverse relationships between data items via direct references. Examples of traversal include clicking on a URL or following object references.

Queries and navigation both assume that relationships between data are known in advance. In the case of a query language, for instance, the relationship between a company and its employees is represented by a link between company and employee IDs. With navigation, a Company object might produce a collection of Employees, and each Employee can be reached by iterating through that collection.
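
The contrast between the two styles can be sketched in a few lines of plain Java. This is only an illustration: the Company and Employee classes, their fields, and the two helper methods are hypothetical names chosen to mirror the example above, and an in-memory list stands in for a real database.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical domain classes, named after the Company/Employee example in the text.
class Employee {
    final int id;
    final int companyId; // plays the role of a foreign key to Company.id
    final String name;

    Employee(int id, int companyId, String name) {
        this.id = id;
        this.companyId = companyId;
        this.name = name;
    }
}

class Company {
    final int id;
    final String name;
    private final List<Employee> employees = new ArrayList<>();

    Company(int id, String name) {
        this.id = id;
        this.name = name;
    }

    void add(Employee e) {
        employees.add(e);
    }

    // Navigation entry point: the object hands out its related objects directly.
    List<Employee> getEmployees() {
        return employees;
    }
}

class QueryVsNavigation {
    // Query style: scan a flat data set and keep the rows whose ID link matches,
    // much like a WHERE clause on a company ID column.
    static List<String> namesByQuery(List<Employee> allEmployees, int companyId) {
        List<String> names = new ArrayList<>();
        for (Employee e : allEmployees) {
            if (e.companyId == companyId) {
                names.add(e.name);
            }
        }
        return names;
    }

    // Navigation style: follow the reference from a Company to its Employees
    // and iterate through that collection.
    static List<String> namesByNavigation(Company company) {
        List<String> names = new ArrayList<>();
        for (Employee e : company.getEmployees()) {
            names.add(e.name);
        }
        return names;
    }

    public static void main(String[] args) {
        Company acme = new Company(1, "Acme");
        Employee ann = new Employee(10, 1, "Ann");
        Employee bob = new Employee(11, 1, "Bob");
        Employee cy = new Employee(12, 2, "Cy");
        acme.add(ann);
        acme.add(bob);

        List<Employee> all = Arrays.asList(ann, bob, cy);
        System.out.println(namesByQuery(all, acme.id));
        System.out.println(namesByNavigation(acme));
    }
}
```

Both methods work only because the company-to-employee relationship was modeled ahead of time, once as an ID link and once as an object reference.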

Relationships between data items, however, are not always known in advance, especially in large databases. Subtle relationships between possibly hundreds of data attributes may collectively tell a story that queries or navigation alone cannot reveal. In such situations, instead of querying or navigating, you may wish to mine the data to bring those hidden relationships to life.

Data mining "is a process for finding patterns and relationships in the data, and using that knowledge to classify new data or gain insights into that data," says Mark Hornick, Oracle Senior Manager for Data Mining Technologies and spec lead for the Java Data Mining API, or JDM (JSRs 73 [1] and 247 [2]).
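
To make "finding patterns and using that knowledge to classify new data" concrete, here is a deliberately tiny sketch in plain Java. It does not use JDM, and it is far simpler than what a real data-mining tool does: it counts which outcome most often accompanies each value of a single attribute, then applies that majority rule to new records. The class name, method names, and the sample attribute values in the usage note are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// A toy majority-vote rule learner over one attribute (illustrative only, not JDM).
class MajorityRuleSketch {

    // Each row is {attributeValue, outcome}. For every attribute value,
    // learn the outcome that was observed most often together with it.
    static Map<String, String> learn(String[][] rows) {
        Map<String, Map<String, Integer>> counts = new HashMap<>();
        for (String[] row : rows) {
            String value = row[0];
            String outcome = row[1];
            counts.computeIfAbsent(value, k -> new HashMap<>())
                  .merge(outcome, 1, Integer::sum);
        }

        Map<String, String> rules = new HashMap<>();
        for (Map.Entry<String, Map<String, Integer>> entry : counts.entrySet()) {
            String best = null;
            int bestCount = -1;
            for (Map.Entry<String, Integer> o : entry.getValue().entrySet()) {
                // Ties are broken arbitrarily by map iteration order.
                if (o.getValue() > bestCount) {
                    best = o.getKey();
                    bestCount = o.getValue();
                }
            }
            rules.put(entry.getKey(), best);
        }
        return rules;
    }

    // Classify a new record by looking up its attribute value in the learned rules;
    // values never seen during learning fall back to the supplied default.
    static String classify(Map<String, String> rules, String value, String fallback) {
        return rules.getOrDefault(value, fallback);
    }
}
```

Given historical rows such as {"prepaid", "left"} and {"contract", "stayed"}, learn() yields a lookup table that classify() can apply to a customer it has never seen. Real mining algorithms weigh many attributes at once, which is exactly where hand-written queries stop being practical.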

Until recently, data mining remained the domain of highly skilled specialists. Developer and business tools, however, have started to bring data mining within the reach of any developer and user interested in gaining insight from large data stores. If that trend continues, data mining will occupy an increasingly important role in the arsenal of an enterprise developer tasked with building data access applications.

Data-mining tools were traditionally either built in-house or provided in products with vendor-specific interfaces. As a result, Java applications had no standard way to interact with data-mining software. The situation resembled database access before the JDBC standard: Each application had to rely on product-specific APIs for data access. The Java Data Mining API (JDM) defines a common Java API for interacting with data-mining systems. JDM is best understood as "JDBC for data mining." "JDBC is a very popular API, and we'd like to see JDM be similarly used for accessing data-mining capabilities from vendor products, whether the data be in databases or flat file systems," says Oracle's Hornick.

Because data mining is a complex set of activities, the initial JDM specification left several important data-mining functions to a future version of the spec. That allowed JDM 1.0 (JSR 73) to reach the final specification stage in August 2004, and subsequently to find its way into commercial implementations. JDM 2.0 (JSR 247) was accepted into the JCP in June of this year, and it aims to define many of the capabilities left out of the 1.0 spec.

The rest of this article introduces data-mining concepts, and provides a code example that maps those concepts to JDM classes and usage.

