The Artima Developer Community

Agile Buzz Forum
Taking control of XML.XMLParser's Network access

James Robertson


David Buck, Smalltalker at large
Taking control of XML.XMLParser's Network access Posted: Nov 28, 2007 6:14 AM

This post originated from an RSS feed registered with Agile Buzz by James Robertson.
Original Post: Taking control of XML.XMLParser's Network access
Feed Title: Michael Lucas-Smith
Feed URL: http://www.michaellucassmith.com/site.atom
Feed Description: Smalltalk and my misinterpretations of life

In Cincom Smalltalk, you can invoke the XML parser on a URI that points to a network resource, such as: (XML.XMLParser on: 'http://www.w3.org' asURI) scanDocument.

Most people who use the XML parser are quick to turn off validation in the hope that documents will parse faster, since the parser won't have to download all the entity rules needed for validation. If you're like me, you'd then have written this code:

(XML.XMLParser on: 'http://www.w3.org' asURI) validate: false; scanDocument

However, we quickly discover that even with validation off, the entities are still downloaded from the internet. This hurts performance, as the base network code has no caching mechanism, and let's face it, we don't really want to hit the internet unexpectedly when we might not even have connectivity.

So over the years, I've found myself applying the same patch to the XML parser over and over, so that it does not download external entities when validation is off. This was wrong, very wrong. You do need the entities to resolve names like &quot; correctly. So what you really want to do is take control of how the XML parser gets its external resources.

In VisualWorks 7.6 and future versions of ObjectStudio 8, you can now specify a block-based handler for resolving external entities, so it's nice and pluggable. For example, if we want to deny all internet access, we can write the following code:

(XML.XMLParser on: 'http://www.w3.org' asURI)
	validate: false;
	entityResolver: (XML.PluggableEntityResolver withBlock: [XML.ResolveEmptyResource]);
	scanDocument

Or if we want to log all external access, we can write the following code:

(XML.XMLParser on: 'http://www.w3.org' asURI)
	validate: false;
	entityResolver: (XML.PluggableEntityResolver withBlock: [:publicID :systemID |
		Core.Transcript cr; show: publicID; tab; show: systemID.
		XML.ResolveDefaultResource]);
	scanDocument

We can also decide case by case whether to allow the regular download code or deny it, as in the following:

(XML.XMLParser on: 'http://www.w3.org' asURI)
	validate: false;
	entityResolver: (XML.PluggableEntityResolver withBlock: [:publicID :systemID |
		(UI.Dialog confirm: 'Download the resource: ', systemID, ' ?')
			ifTrue:	[XML.ResolveDefaultResource]
			ifFalse:	[XML.ResolveEmptyResource]]);
	scanDocument

And finally, we can fake a download by returning our own stream. That way we could keep a local copy of the W3C HTML entities and avoid a download, while still getting the same parse as if we had hit the internet, e.g.:

(XML.XMLParser on: 'http://www.w3.org' asURI)
	validate: false;
	entityResolver: (XML.PluggableEntityResolver withBlock: [:publicID :systemID | '' readStream]);
	scanDocument
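
The local-copy idea can be sketched a little further by looking each system ID up in a table and falling back to the empty resource when there is no match. This is only a sketch under assumptions: the dictionary, its key and the entity declaration text are hypothetical placeholders, and it assumes that systemID arrives as a String that matches the key exactly. It uses only the resolver API shown in the examples above.

```smalltalk
"Hypothetical table mapping system IDs to locally stored entity text.
 The key and the declaration below are placeholders for illustration."
| localEntities |
localEntities := Dictionary new.
localEntities
	at: 'http://example.org/entities.ent'
	put: '<!ENTITY nbsp "&#160;">'.

(XML.XMLParser on: 'http://www.w3.org' asURI)
	validate: false;
	entityResolver: (XML.PluggableEntityResolver withBlock: [:publicID :systemID |
		| local |
		"Assumes systemID is a String equal to one of the keys above."
		local := localEntities at: systemID ifAbsent: [nil].
		local isNil
			ifTrue: [XML.ResolveEmptyResource]
			ifFalse: [local readStream]]);
	scanDocument
```

Swapping XML.ResolveEmptyResource for XML.ResolveDefaultResource in the fallback branch would let unknown resources go through the regular download code instead of being denied.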
