The Artima Developer Community
This article is sponsored by the Java Community Process.

Leading-Edge Java
Catch Jackrabbit and the Java Content Repository API
by Frank Sommers
June 3, 2005



Relational and object databases lack many data management features required by modern applications, such as versioning, rich data references, inheritance, or fine-grained security. Content repositories extend databases with such additional capabilities. The Java Content Repository API (JSR 170) defines a standard to access content repositories from Java code, and promises to greatly simplify Java database programming. This article reviews the Java Content Repository API and its open-source implementation, Apache Jackrabbit, from a developer's perspective.

Sir Thomas More's famous treatise, Utopia [1], recounts the experiences of a fictitious traveler to an imaginary island where everyone lives well, all citizens are educated, and no one is left behind. Penned in 1516, Utopia describes how government, professions, social relations, travel, the military, religion, and even marriage work in that "ideal" world.

If Sir Thomas were writing today, he would do well to include a chapter on data management in his book. In an ideal world, what would data management be like? While we can only imagine More's answer, the Java Content Repository API (JSR 170) [2] expert group may have a partial one. The new API, which the JCP [3] approved as a final Java standard on May 31st, claims to radically simplify Java data management by creating a unified access model for data repositories.

If the Java Content Repository (JCR) API expert group's vision bears out, in five or ten years' time we will all program to repositories, not databases, according to David Nuescheler, CTO of Day Software [4], and JSR 170 spec lead. Repositories are an outgrowth of many years of data management research, and are best understood as fancy object stores especially suited to today's applications.

To experience first hand whether the JCR API's promise of simplifying Java data management is real or utopian, I took the JSR 170 reference implementation, Apache Jackrabbit [5], on a test drive. I built a small blogging application with JCR, and will share my experiences with you in this article.

My findings? The JCR is worth a serious look if you are building real-world, data-centric Java applications. And while programming to a content repository rather than a database can save serious development time, the devil, as you might expect, is in the details.

Not your father's database

Commercial repositories are often implemented on top of more traditional database, and even filesystem, technology. Therefore, repositories often serve as a layer between a data store, such as an RDBMS, and an application requiring access to persistent data. A repository typically consists of the following components [6]:

The relationships between these components are illustrated in figure 1.

Figure 1: Repository components.

What benefits do these components bring to a plain old database? According to Microsoft Research's Phil Bernstein, who served as architect of that company's object repository that first shipped in Visual Basic 5, a repository engine offers six features beyond those of a traditional relational or object database management system [6]:

  1. Object management: Managing repository objects means storing a repository object's state. That state comprises the object's property and attribute values. Repositories typically allow applications to manage objects via the repository API.

  2. Dynamic extensibility: Each repository object has a type. The repository information model is a collection of the possible object types in the repository, as well as of the objects that implement those types. The repository engine allows adding new types and extending existing types. In contrast to relational databases, a repository information model, including type information, is often implemented not as metadata, but as a collection of first-class repository objects. As a result, a repository often has no metadata in the sense of relational database metadata. This is roughly analogous to how a Java virtual machine represents types: type information is held in first-class objects of the type Class, and the JVM associates every non-Class object with the Class object that defines its type.

  3. Relationship management: While relational databases define entity relations between database objects, they do so at the level of the database schema (metadata), not in terms of actual database objects. By contrast, repositories allow object relationships to be specified in terms of first-class objects representing those relationships. For instance, two Page objects might be related via a Link object, denoting that one page links to another. Because Link is a repository object, it can be associated with a rich object type: For example, one describing a bi-directional link between the two pages. A repository engine enforces referential integrity between related objects.

  4. Notification: Objects both inside and outside the repository may listen to changes occurring to repository objects. The repository engine dispatches notifications as such changes take place.

  5. Version management: Most applications today require versioned data: Given a data item, an application must be able to access the current as well as all past versions of that data item. Neither relational nor object databases provide standard, out-of-the-box versioning, leaving versioning chores to each application accessing the data store. By contrast, keeping track of versions, and making those versions available to applications, is an important repository feature.

  6. Configuration management: Applications often need to keep track of subsets of repository objects. For instance, a single repository might contain objects belonging to several users or companies, or might comprise objects for several software packages. Such repository object subsets are termed configurations or workspaces.
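To make features 1, 4, and 5 more concrete, here is a toy, in-memory sketch in plain Java. It is not the JCR API and says nothing about how Jackrabbit works internally; the class ToyRepository and its methods are hypothetical names chosen for illustration. For simplicity it records a new version on every property change, whereas a real repository would more likely create versions only at points the application chooses.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

/**
 * Hypothetical illustration (not the JCR API) of three repository features:
 * object management (storing an object's property state), notification
 * (listeners told about changes), and version management (past states kept).
 */
public class ToyRepository {

    // Current property state of each object, keyed by object id.
    private final Map<String, Map<String, String>> current = new HashMap<>();
    // Every saved state of each object, oldest first.
    private final Map<String, List<Map<String, String>>> versions = new HashMap<>();
    // Callbacks invoked with the id of each object that changes.
    private final List<Consumer<String>> listeners = new ArrayList<>();

    public void addListener(Consumer<String> listener) {
        listeners.add(listener);
    }

    /** Sets a property, records the resulting state as a new version, then notifies listeners. */
    public void setProperty(String id, String name, String value) {
        Map<String, String> state = new HashMap<>(current.getOrDefault(id, Map.of()));
        state.put(name, value);
        current.put(id, state);
        // Store an immutable snapshot so later writes cannot alter history.
        versions.computeIfAbsent(id, k -> new ArrayList<>()).add(Map.copyOf(state));
        for (Consumer<String> listener : listeners) {
            listener.accept(id);
        }
    }

    /** Returns the current value of a property, or null if it was never set. */
    public String getProperty(String id, String name) {
        return current.getOrDefault(id, Map.of()).get(name);
    }

    /** Returns all saved states of an object, oldest first. */
    public List<Map<String, String>> history(String id) {
        return List.copyOf(versions.getOrDefault(id, List.of()));
    }

    public static void main(String[] args) {
        ToyRepository repo = new ToyRepository();
        repo.addListener(id -> System.out.println("changed: " + id));
        repo.setProperty("post1", "title", "Draft");   // prints "changed: post1"
        repo.setProperty("post1", "title", "Final");   // prints "changed: post1"
        System.out.println(repo.getProperty("post1", "title"));            // prints "Final"
        System.out.println(repo.history("post1").get(0).get("title"));    // prints "Draft"
    }
}
```

The point of the sketch is that versioning and change notification live in the storage layer itself, so an application reading "post1" gets both the current value and its earlier states without writing its own history tables.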
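Features 2 and 3 are easier to picture with a second hypothetical sketch, again in plain Java rather than the JCR API. Here types (ObjectType) and relationships (Link) are ordinary objects, a new type can extend an existing one, and the store refuses to delete an object that a link still references, a simple form of referential integrity. The names TypedStore, ObjectType, Item, and Link are invented for this example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch (not the JCR API): types and relationships are
 * first-class objects, and deletes that would break a link are refused.
 */
public class TypedStore {

    /** A type is itself an object; a new type may extend an existing one. */
    record ObjectType(String name, ObjectType supertype) {
        /** True if this type is the given type or extends it, directly or indirectly. */
        boolean isA(ObjectType other) {
            for (ObjectType t = this; t != null; t = t.supertype()) {
                if (t.equals(other)) {
                    return true;
                }
            }
            return false;
        }
    }

    /** A repository object with an id and a type. */
    record Item(String id, ObjectType type) {}

    /** A relationship is also an object, with its own (possibly rich) type. */
    record Link(String id, ObjectType type, Item from, Item to) {}

    private final Map<String, Item> items = new HashMap<>();
    private final List<Link> links = new ArrayList<>();

    public Item create(String id, ObjectType type) {
        Item item = new Item(id, type);
        items.put(id, item);
        return item;
    }

    /** Creates a link; both ends must already exist in the store. */
    public Link link(String id, ObjectType type, Item from, Item to) {
        if (!items.containsValue(from) || !items.containsValue(to)) {
            throw new IllegalArgumentException("both ends of a link must exist");
        }
        Link link = new Link(id, type, from, to);
        links.add(link);
        return link;
    }

    /** Deletes an item only if no link still references it; returns whether it was deleted. */
    public boolean delete(String id) {
        Item item = items.get(id);
        if (item == null) {
            return false;
        }
        for (Link link : links) {
            if (link.from().equals(item) || link.to().equals(item)) {
                return false; // deleting would leave a dangling link
            }
        }
        items.remove(id);
        return true;
    }

    public static void main(String[] args) {
        ObjectType page = new ObjectType("Page", null);
        ObjectType blogPost = new ObjectType("BlogPost", page); // extends Page
        ObjectType biLink = new ObjectType("BidirectionalLink", null);

        TypedStore store = new TypedStore();
        Item home = store.create("home", page);
        Item post = store.create("post1", blogPost);
        store.create("draft", blogPost);
        store.link("l1", biLink, home, post);

        System.out.println(blogPost.isA(page));     // prints "true"
        System.out.println(store.delete("post1"));  // prints "false": still linked
        System.out.println(store.delete("draft"));  // prints "true": nothing links to it
    }
}
```

Because the link carries its own type, an application could later distinguish, say, bi-directional links from one-way references without changing any schema, which is the kind of flexibility Bernstein's feature list describes.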

If your application can use any of the above features, then repositories might be for you. There are dozens of repository products to choose from. For starters, database vendors often ship a repository component as part of their high-end DBMS product (the Microsoft Repository ships with SQL Server, for instance) [7]. IDE and software configuration tool vendors also include repositories in their offerings. A version control system, such as CVS or Subversion, is a specialized repository [8]. In the near future, even some file systems, such as Sun's ZFS filesystem [9] and the WinFS filesystem that will ship with Microsoft's Longhorn operating system [10], will incorporate repository features. Many open source and commercial content management systems (CMS) are also based on repositories. And now, there is Jackrabbit, an open-source content repository from the Apache Incubator Project.

Copyright © 1996-2018 Artima, Inc. All Rights Reserved.