This post originated from an RSS feed registered with .NET Buzz
by Scott Hanselman.
Original Post: How to load HTML into mshtml.HTMLDocumentClass with UCOMIPersistFile and my ignorance
Feed Title: Scott Hanselman's ComputerZen.com
Feed URL: http://radio-weblogs.com/0106747/rss.xml
Feed Description: Scott Hanselman's ComputerZen.com is a .NET/WebServices/XML Weblog. I offer details of obscurities (internals of ASP.NET, WebServices, XML, etc) and best practices from real world scenarios.
What a weird one. I'm looking at the source for NDoc.Document.HtmlHelp2.Compiler.HtmlHelpFile.
It uses the Microsoft.mshtml interop Assembly to load an HTML file into the HTMLDocumentClass
for easy parsing.
Its code looks like this (DOESN'T WORK):
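using System;
using System.IO;
using System.Runtime.InteropServices;
using mshtml;

private HTMLDocumentClass GetHtmlDocument( FileInfo f )
{
    HTMLDocumentClass doc = null;
    try
    {
        doc = new HTMLDocumentClass();
        UCOMIPersistFile persistFile = (UCOMIPersistFile)doc;
        persistFile.Load( f.FullName, 0 );
        int start = Environment.TickCount;
        // Busy-wait: spins the CPU without ever letting this thread process
        // messages, so (apparently) the load never progresses and only the
        // timeout ends the loop.
        while( doc.body == null )
        {
            if( Environment.TickCount - start > 10000 )
            {
                throw new Exception( string.Format(
                    "The document {0} timed out while loading", f.Name ) );
            }
        }
        return doc;
    }
    catch
    {
        // The handler is not shown in the excerpt; rethrow as a placeholder.
        throw;
    }
}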
I went searching because it was taking up 100% CPU for an hour and never completed. Now I know why! :)
What's weird is this: the only way I could get it to
work (as IPersistFile is loading on another thread) was with this change (NOW
IT WORKS):
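using System;
using System.IO;
using System.Runtime.InteropServices;
using mshtml;

private HTMLDocumentClass GetHtmlDocument( FileInfo f )
{
    HTMLDocumentClass doc = null;
    try
    {
        doc = new HTMLDocumentClass();
        UCOMIPersistFile persistFile = (UCOMIPersistFile)doc;
        persistFile.Load( f.FullName, 0 );
        int start = Environment.TickCount;
        while( doc.readyState != "complete" )
        {
            // Run the message pump so the pending load work gets processed.
            System.Windows.Forms.Application.DoEvents();
            if( Environment.TickCount - start > 10000 )
            {
                throw new Exception( string.Format(
                    "The document {0} timed out while loading", f.Name ) );
            }
        }
        return doc;
    }
    catch
    {
        // The handler is not shown in the excerpt; rethrow as a placeholder.
        throw;
    }
}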
When I use Reflector to look inside DoEvents(), I can see
that it's doing more than a Sleep(0) (yield): it's actually running the message pump.
Am I missing something? Apparently IPersistFile needs the message pump?
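One plausible reading of why DoEvents() fixes it, shown as a self-contained C# toy model rather than real MSHTML code: if the document's loading work is posted back to the calling thread's message queue (an assumption, in line with the "needs the message pump" guess), then a loop that never drains that queue can never see the load finish. `FakeDocument`, `PumpDemo`, and `WaitForLoad` are hypothetical names, and the `Queue<Action>` only stands in for a Win32 message queue.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;

// Hypothetical stand-in for the document: it finishes loading only when the
// work items it posts to the calling thread's queue are actually dispatched.
class FakeDocument
{
    private readonly Queue<Action> _messageQueue;
    private int _chunksLeft = 3;

    public string ReadyState { get; private set; } = "loading";

    public FakeDocument(Queue<Action> messageQueue)
    {
        _messageQueue = messageQueue;
    }

    // Like an asynchronous Load: posts the first chunk of work and returns at once.
    public void Load()
    {
        _messageQueue.Enqueue(ParseNextChunk);
    }

    private void ParseNextChunk()
    {
        _chunksLeft--;
        if (_chunksLeft > 0)
            _messageQueue.Enqueue(ParseNextChunk); // more work, posted back to the same thread
        else
            ReadyState = "complete";
    }
}

static class PumpDemo
{
    // Returns true if the document reached "complete" before the timeout.
    public static bool WaitForLoad(bool pumpMessages, int timeoutMs)
    {
        var queue = new Queue<Action>();
        var doc = new FakeDocument(queue);
        doc.Load();

        var clock = Stopwatch.StartNew();
        while (doc.ReadyState != "complete")
        {
            // The DoEvents() analogue: dispatch one pending message.
            if (pumpMessages && queue.Count > 0)
                queue.Dequeue()();
            // Without pumping, nothing on this thread ever runs the queued work.
            if (clock.ElapsedMilliseconds > timeoutMs)
                return false;
        }
        return true;
    }

    static void Main()
    {
        Console.WriteLine($"busy-wait: {WaitForLoad(pumpMessages: false, timeoutMs: 200)}"); // False
        Console.WriteLine($"pumping:   {WaitForLoad(pumpMessages: true, timeoutMs: 200)}");  // True
    }
}
```

The busy-wait branch mirrors the original loop: it burns CPU until the timeout because the only code that could finish the load is sitting in a queue nobody drains.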
Well, it works, but it's gross.
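If the gross part is the inline loop, one option is to factor the wait into a small helper that pumps, checks, and sleeps briefly so it doesn't pin a core at 100% CPU. This is a sketch under assumptions: `PollingWait.WaitUntil` is a hypothetical helper, it still relies on the caller passing in something like `Application.DoEvents` as the pump, and the 10 ms sleep is an arbitrary choice.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

static class PollingWait
{
    // Hypothetical helper: calls pump between checks of isDone until it
    // returns true (result: true) or the timeout elapses (result: false).
    public static bool WaitUntil(Func<bool> isDone, Action pump, TimeSpan timeout, int sleepMs = 10)
    {
        var clock = Stopwatch.StartNew();
        while (!isDone())
        {
            if (clock.Elapsed > timeout)
                return false;
            pump();                // e.g. System.Windows.Forms.Application.DoEvents
            Thread.Sleep(sleepMs); // back off instead of spinning at 100% CPU
        }
        return true;
    }

    static void Main()
    {
        // Fake "pump" that makes progress each time it runs.
        int pumped = 0;
        bool ok = WaitUntil(() => pumped >= 3, () => pumped++, TimeSpan.FromSeconds(1));
        Console.WriteLine($"{ok} after {pumped} pumps"); // True after 3 pumps
    }
}
```

With the document from the post, the call would look roughly like `PollingWait.WaitUntil(() => doc.readyState == "complete", System.Windows.Forms.Application.DoEvents, TimeSpan.FromSeconds(10))`, throwing the same timeout exception when it returns false.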