What a weird one. I'm looking at the source for NDoc.Document.HtmlHelp2.Compiler.HtmlHelpFile.
It uses the Microsoft.mshtml interop assembly to load an HTML file into an HTMLDocumentClass
for easy parsing.
Its code looks like this (DOESN'T WORK):
private HTMLDocumentClass GetHtmlDocument( FileInfo f )
{
    HTMLDocumentClass doc = null;
    try
    {
        doc = new HTMLDocumentClass();
        UCOMIPersistFile persistFile = (UCOMIPersistFile)doc;
        persistFile.Load( f.FullName, 0 );
        int start = Environment.TickCount;
        // Busy-wait: nothing in this loop yields or pumps messages
        while( doc.body == null )
        {
            if ( Environment.TickCount - start > 10000 )
            {
                throw new Exception( string.Format(
                    "The document {0} timed out while loading", f.Name ) );
            }
        }
    }
    // (the rest of the method is not shown in this excerpt)
}
I went searching because it was taking up 100% CPU for an hour and never completed.
Now I know why! :)
What's weird is this: the only way I could get it to work (as IPersistFile appears
to load the document asynchronously) was with this change (NOW IT WORKS):
private HTMLDocumentClass GetHtmlDocument( FileInfo f )
{
    HTMLDocumentClass doc = null;
    try
    {
        doc = new HTMLDocumentClass();
        UCOMIPersistFile persistFile = (UCOMIPersistFile)doc;
        persistFile.Load( f.FullName, 0 );
        int start = Environment.TickCount;
        // Wait on readyState instead of body, and pump messages while waiting
        while( doc.readyState != "complete" )
        {
            System.Windows.Forms.Application.DoEvents();
            if ( Environment.TickCount - start > 10000 )
            {
                throw new Exception( string.Format(
                    "The document {0} timed out while loading", f.Name ) );
            }
        }
    }
    // (the rest of the method is not shown in this excerpt)
}
When I look at DoEvents() in Reflector, I can see that it's doing more than a
Sleep(0) (a yield); it's actually running the message pump.
Am I missing something? Apparently IPersistFile needs the message pump?
Well, it works, but it's gross.
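To make the yield-versus-pump difference a bit more concrete, here is a minimal sketch of the same wait loop pulled out into a helper, with the pump step passed in as a delegate. PumpingWait and WaitUntil are hypothetical names of my own, not anything from NDoc or mshtml, and the sketch assumes a newer C# compiler than the code above was written for (lambdas, Func and Action). It only shows the shape of the loop and is not a claim about how the load is actually driven internally.

```csharp
using System;
using System.Threading;

public static class PumpingWait
{
    // Hypothetical helper (not part of NDoc or mshtml).
    // Polls isReady until it returns true or timeoutMs elapses.
    // Between checks it calls pump, so the caller decides what
    // "let other work happen" means, for example dispatching
    // queued window messages.
    public static bool WaitUntil(Func<bool> isReady, Action pump, int timeoutMs)
    {
        int start = Environment.TickCount;
        while (!isReady())
        {
            if (Environment.TickCount - start > timeoutMs)
            {
                return false; // timed out; the caller decides whether to throw
            }
            pump();          // e.g. System.Windows.Forms.Application.DoEvents
            Thread.Sleep(1); // give up the time slice instead of spinning
        }
        return true;
    }
}
```

With something like this, the loop above would presumably shrink to a call such as PumpingWait.WaitUntil(() => doc.readyState == "complete", System.Windows.Forms.Application.DoEvents, 10000), followed by the same timeout exception when it returns false. Passing a do-nothing pump instead turns it back into a plain sleep-and-poll loop, which, if the message pump really is what the load depends on, would still never see the document complete.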