The Artima Developer Community
Java Community News
Natural-Language Processing with Java

Frank Sommers

Posted: Aug 7, 2007 4:13 PM
Summary
Algorithmic analysis of natural language can enhance many enterprise applications and lead to an improved user experience. In a recent article, Rod Coffin and Matt Smith demonstrate three open-source Java tools for natural-language processing.

Much of the input to enterprise applications comes from humans, in the form of natural language. The ability to analyze human-language input algorithmically can not only provide a better user experience, but can also make an application more effective, for example by automating the classification of user input.

Despite many decades of work on natural-language processing, relatively few applications take advantage of advances in the field or of the numerous open-source natural-language processing projects. In a recently published article, "Bad at Grammar? Cheat with Java Linguistics Tools," Rod Coffin and Matt Smith demonstrate three such Java tools: LingPipe for text classification, OpenNLP for sentence identification, and Inflector for pluralization.

The authors introduce text classification as the capability to:

Programmatically identify the language in which a text is written, its topic, sentiment (i.e. inflammatory, reasoned, etc.), or to identify a possible author. Most text classification techniques involve applying statistical methods to a training corpus (a set of known texts used for training systems) to develop a model for determining the most likely category of future text passages.

The article demonstrates text classification with LingPipe by creating a program that automatically determines which of two topics an email message is about.
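
To make the idea of training on a corpus concrete, here is a minimal sketch of a two-topic classifier in plain Java. It uses a simple naive Bayes model with add-one smoothing, which is one common statistical approach and not necessarily the one LingPipe applies. It does not use LingPipe's API; the class name TinyTextClassifier and its methods are hypothetical, chosen only for illustration.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

// Illustrative sketch only: a tiny multinomial naive Bayes classifier.
// This is NOT LingPipe's API; every name here is hypothetical.
class TinyTextClassifier {
    // category -> (word -> number of occurrences in training texts)
    private final Map<String, Map<String, Integer>> wordCounts = new HashMap<>();
    // category -> total number of word occurrences seen for that category
    private final Map<String, Integer> totalWords = new HashMap<>();
    // category -> number of training texts
    private final Map<String, Integer> docCounts = new HashMap<>();
    private final Set<String> vocabulary = new HashSet<>();
    private int totalDocs = 0;

    // Lowercase the text and split on anything that is not a letter a-z.
    private static String[] tokenize(String text) {
        return text.toLowerCase(Locale.ROOT).split("[^a-z]+");
    }

    // Add one known text (part of the training corpus) to a category.
    void train(String category, String text) {
        docCounts.merge(category, 1, Integer::sum);
        totalDocs++;
        Map<String, Integer> counts =
                wordCounts.computeIfAbsent(category, k -> new HashMap<>());
        for (String token : tokenize(text)) {
            if (token.isEmpty()) continue;
            counts.merge(token, 1, Integer::sum);
            totalWords.merge(category, 1, Integer::sum);
            vocabulary.add(token);
        }
    }

    // Return the category with the highest log-probability for the text,
    // or null if nothing has been trained yet.
    String classify(String text) {
        String best = null;
        double bestScore = Double.NEGATIVE_INFINITY;
        for (String category : docCounts.keySet()) {
            double score = Math.log((double) docCounts.get(category) / totalDocs);
            Map<String, Integer> counts = wordCounts.get(category);
            int total = totalWords.getOrDefault(category, 0);
            for (String token : tokenize(text)) {
                if (token.isEmpty()) continue;
                int count = counts.getOrDefault(token, 0);
                // Add-one smoothing keeps unseen words from zeroing the score.
                score += Math.log((count + 1.0) / (total + vocabulary.size()));
            }
            if (score > bestScore) {
                bestScore = score;
                best = category;
            }
        }
        return best;
    }
}
```

After calling train with a few messages per topic, classify returns the label of the topic whose training texts best match the words of a new message. A real corpus would need to be far larger than a handful of sentences for the result to be reliable.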

While identifying sentences seems easy enough, Coffin and Smith show that there is more to sentence identification than parsing text based on punctuation marks. They introduce OpenNLP, including its SentenceDetector sub-project:

OpenNLP is an umbrella project that includes several projects related to linguistics. The SentenceDetector included with OpenNLP tools uses a maximum entropy ... algorithm that is trained on a corpus of text extracted from the Wall Street Journal.
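
A short sketch shows why punctuation alone is not enough. The class below is a deliberately naive splitter written in plain Java for illustration; it is not OpenNLP's SentenceDetector, and the name NaiveSentenceSplitter is hypothetical. It breaks the text after every period, exclamation mark, or question mark that is followed by whitespace.

```java
import java.util.Arrays;
import java.util.List;

// Illustrative sketch only: a naive, punctuation-based sentence splitter.
// This is NOT OpenNLP's SentenceDetector; the name is hypothetical.
class NaiveSentenceSplitter {
    // Split wherever whitespace follows '.', '!' or '?'.
    static List<String> split(String text) {
        return Arrays.asList(text.trim().split("(?<=[.!?])\\s+"));
    }
}
```

Given the input "Dr. Smith arrived. He was late.", this splitter returns three pieces ("Dr.", "Smith arrived.", "He was late.") where a reader sees two sentences, because it treats the period in the abbreviation as a sentence boundary. Cases like this are what a trained model is meant to handle.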

Finally, the authors introduce Inflector, a java.net project that can pluralize English-language words:

Pluralization of English words is one of those problems where the 80 percent case is easy but the cost for the remaining 20 percent is exponentially more expensive because of English's many irregularities. One Java tool that can perform similar pluralization is the java.net Inflector project... Out of the box, Inflector doesn't handle 100 percent of English irregular words correctly but does provide a framework for handling user-specified pluralizations.
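
The split between the easy cases and the irregular ones can be sketched in plain Java as a few suffix rules plus a table of user-specified exceptions. This is not Inflector's API; the class SimplePluralizer and its methods are hypothetical, the rules are a small assumed subset of English, and the input is assumed to be a lowercase singular noun.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative sketch only: suffix rules plus user-supplied irregular forms.
// This is NOT the java.net Inflector API; every name here is hypothetical.
class SimplePluralizer {
    private final Map<String, String> irregulars = new HashMap<>();

    // Register a pluralization that the suffix rules would get wrong.
    void addIrregular(String singular, String plural) {
        irregulars.put(singular, plural);
    }

    // Assumes a lowercase singular English noun.
    String pluralize(String word) {
        String override = irregulars.get(word);
        if (override != null) {
            return override;
        }
        // box -> boxes, church -> churches, bus -> buses
        if (word.matches(".*(s|x|z|ch|sh)")) {
            return word + "es";
        }
        // query -> queries, but day -> days (vowel before the y)
        if (word.matches(".*[^aeiou]y")) {
            return word.substring(0, word.length() - 1) + "ies";
        }
        return word + "s";
    }
}
```

With only the suffix rules, pluralize("child") yields "childs"; after addIrregular("child", "children") it yields "children". The rules cover the common cases cheaply, and each irregular word has to be added by hand.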

Have you used natural-language processing tools in your projects? If so, what do you think of the effectiveness of such tools?

Copyright © 1996-2019 Artima, Inc. All Rights Reserved. - Privacy Policy - Terms of Use