The Artima Developer Community

Java Answers Forum
How do I extract strings from an HTML page, like the price of a book?

2 replies on 1 page. Most recent reply: Sep 16, 2002 10:40 AM by Don Hill

garvey

Posts: 1
Nickname: garvey3003
Registered: Sep, 2002

How do I extract strings from an HTML page, like the price of a book? Posted: Sep 14, 2002 11:06 AM
How can I parse HTML to extract text strings from a web page, such as the price, ISBN, and title of a book?
I'm trying to develop a web spider.
Please help.


Matt Gerrans

Posts: 1153
Nickname: matt
Registered: Feb, 2002

Re: How do I extract strings from an HTML page, like the price of a book? Posted: Sep 15, 2002 12:44 AM
Well, every page will be different, of course. You'll have to go analyze pages from Amazon.com, bn.com, fatbrain.com, and/or the publishers themselves and handle each one differently. Maybe some of them offer web services for getting that information; that would be the ideal solution.
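
To illustrate the "handle each one differently" point, here is a minimal sketch of an extractor for one imagined page layout, using only `java.util.regex`. The class name `BookPageScraper`, the three patterns, and the sample markup are all assumptions made up for illustration; no real store's pages are known to look like this, and each real site would need its own patterns.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical extractor for ONE imagined page layout.
// A real spider would need a separate set of patterns per site.
class BookPageScraper {
    // Text inside the <title> element, with surrounding whitespace trimmed.
    private static final Pattern TITLE = Pattern.compile(
            "<title>\\s*(.*?)\\s*</title>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    // "ISBN" followed by a run of digits, hyphens, or a trailing X.
    private static final Pattern ISBN = Pattern.compile(
            "ISBN:?\\s*([0-9][0-9Xx-]{8,16})");
    // A dollar amount with two decimal places, e.g. $39.99.
    private static final Pattern PRICE = Pattern.compile(
            "\\$\\s*(\\d+\\.\\d{2})");

    // Returns the fields that were found; a missing field maps to null.
    static Map<String, String> extract(String html) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("title", firstGroup(TITLE, html));
        fields.put("isbn", firstGroup(ISBN, html));
        fields.put("price", firstGroup(PRICE, html));
        return fields;
    }

    // First capture group of the first match, or null if nothing matches.
    private static String firstGroup(Pattern pattern, String text) {
        Matcher m = pattern.matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String html = "<html><head><title>Example Book</title></head>"
                + "<body><p>ISBN: 1234567890</p><p>Our price: $39.99</p></body></html>";
        System.out.println(extract(html));
    }
}
```

Patterns like these break as soon as the site changes its markup, which is why a web service, where one exists, is the better option.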

Don Hill

Posts: 70
Nickname: ssswdon
Registered: Jul, 2002

Re: How do I extract strings from an HTML page, like the price of a book? Posted: Sep 16, 2002 10:40 AM
If I were doing this, I would first turn the HTML into well-formed XML (XHTML). This can be done with a tool named "Tidy". Most of the mapping tools on the market use this tool.

After you have run the HTML through Tidy you will have well-formed XML, which you can load into a DOM Document and process with the standard XML APIs. As stated above, you can never guarantee that the HTML will stay the same, because websites change their HTML all the time.
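
The DOM step could be sketched roughly as follows, assuming the page has already been run through Tidy and is available as a well-formed XHTML string. The Tidy step itself is not shown. The class name `TidiedPageReader`, the sample markup, and the `class="price"` and `class="isbn"` attributes are hypothetical. The sample also omits the DOCTYPE and XHTML namespace that real Tidy output may carry; those would need extra handling (namespace-aware XPath, and a way to avoid fetching the DTD).

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;

// Hypothetical helper: reads fields from XHTML that is already well-formed.
class TidiedPageReader {
    // Parses a well-formed XHTML string into a DOM Document.
    static Document parse(String xhtml) {
        try {
            DocumentBuilder builder =
                    DocumentBuilderFactory.newInstance().newDocumentBuilder();
            return builder.parse(new ByteArrayInputStream(
                    xhtml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            // Raw, untidied HTML usually ends up here.
            throw new IllegalArgumentException("Input is not well-formed XML", e);
        }
    }

    // Evaluates an XPath expression and returns the trimmed text of the
    // first matching node, or an empty string if nothing matches.
    static String text(Document doc, String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, doc).trim();
        } catch (Exception e) {
            throw new IllegalArgumentException("Bad XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        String xhtml = "<html><head><title>Example Book</title></head><body>"
                + "<span class=\"isbn\">1234567890</span>"
                + "<span class=\"price\">$39.99</span></body></html>";
        Document doc = parse(xhtml);
        System.out.println(text(doc, "/html/head/title"));
        System.out.println(text(doc, "//span[@class='isbn']"));
        System.out.println(text(doc, "//span[@class='price']"));
    }
}
```

The XPath expressions still depend on the layout of each site, so they need updating whenever a site changes its markup.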

HTH

> How can I parse HTML to extract text strings from
> a web page, such as the price, ISBN, and title of a book?
> I'm trying to develop a web spider.
> Please help.

