Sometimes you need to access HTML web content from within an
application. Why would you need to do this? Well, suppose you need
a Windows Service to periodically render a page with dynamic content and
attach it to an email. Your application would need a way to request and
save the remote page. Or perhaps you need to grab an
image from a web camera and save it to a file or display it in a
PictureBox within a WinForms application. This technique, often
called “Screen Scraping”, is simple (like so many other
things) with .NET. In fact, 4GuysFromRolla.com have a good article
describing how to do just this: Screen Scrapes in
ASP.NET. As they describe, with a few lines of code you can request an
Internet resource and work with its stream.
The WebClient class also provides three methods for downloading data from
a resource:
DownloadData: downloads data from a resource and returns a byte array.
DownloadFile: downloads data from a resource to a local file.
OpenRead: returns the data from the resource as a Stream.
A WebClient instance does not send optional HTTP headers by default. If
your request requires an optional header, you must add the header to the Headers
collection.
That last sentence basically means that, by default, you can only request
unsecured resources. If the site you want to screen scrape requires
cookie-based authentication, you will have to manually add the cookie header to
the outgoing request in order to be authenticated.
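To make the three download methods quoted above concrete, here is a minimal sketch of each one. The class and method names (DownloadSketch, AsBytes, ToFile, AsText) are made up for illustration, and the text variant assumes the response is UTF-8.

```csharp
using System;
using System.IO;
using System.Net;

// Hypothetical helper class; only the WebClient calls come from the framework.
public static class DownloadSketch
{
    // DownloadData: returns the resource body as a byte array.
    public static byte[] AsBytes(string url)
    {
        using (WebClient client = new WebClient())
        {
            return client.DownloadData(url);
        }
    }

    // DownloadFile: saves the resource body to a local file.
    public static void ToFile(string url, string path)
    {
        using (WebClient client = new WebClient())
        {
            client.DownloadFile(url, path);
        }
    }

    // OpenRead: exposes the resource body as a Stream, read here as text
    // (StreamReader defaults to UTF-8, which is an assumption about the page).
    public static string AsText(string url)
    {
        using (WebClient client = new WebClient())
        using (Stream stream = client.OpenRead(url))
        using (StreamReader reader = new StreamReader(stream))
        {
            return reader.ReadToEnd();
        }
    }
}
```

None of these calls sends a cookie, so against a site that requires authentication they would most likely come back with the login page rather than the protected content.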
Attaching a Fixed Authentication Cookie
So how do you find out what cookie header you need to send with the
request? One way is to use an application like ieHttpHeaders for IE or LiveHttpHeaders for
Firefox, request the resource in your browser, and inspect your headers manually. For
example, if you are logged in to CodeBetter.Com, you’ll see that we set a cookie
that looks something like this:
Cookie: CommunityServer-UserCookie [bunch of text to follow]
In order to screen scrape a secure page using the WebClient, you have
to add this cookie to the WebClient’s Headers collection manually. The code to do this
is simple; here’s a snippet:
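The original snippet is not reproduced here, so what follows is only a minimal sketch of the idea. The names FixedCookieSketch, CreateClient and Fetch are hypothetical, and cookieValue is assumed to be whatever text your browser sent after "Cookie: " (normally in name=value form).

```csharp
using System.Net;

// Hypothetical helper; not the original author's code.
public static class FixedCookieSketch
{
    // Creates a WebClient that sends the given cookie text with every request.
    // cookieValue is copied by hand from the browser's "Cookie:" request header.
    public static WebClient CreateClient(string cookieValue)
    {
        WebClient client = new WebClient();
        client.Headers.Add("Cookie", cookieValue);
        return client;
    }

    // Downloads a protected resource using the fixed cookie.
    public static byte[] Fetch(string url, string cookieValue)
    {
        using (WebClient client = CreateClient(cookieValue))
        {
            return client.DownloadData(url);
        }
    }
}
```

Because the cookie text is pasted in as a constant, this only works for as long as the remote site keeps accepting that exact cookie.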
Now, attaching a fixed cookie for requesting secure data is somewhat
brute-force and brittle. If the remote site ever changes its machine key, for
example, your application will break. Also, if your application needs to make
requests on behalf of different users, this method will not work, because the
cookie will be different for each user.
Generating a Dynamic Authentication Cookie
In a real-world application, to use this for anything useful, your
application must know something about how the authentication cookies are
generated, and it must dynamically generate the cookie to send. In order to do
this, you need to know the remote web application’s machine key (usually an
impossibility, unless you control both ends) and any other custom cookie data
that application sets.
Even if you do happen to know the machine key, your application needs
to have web context in order to generate the cookie. Why? Because in
order to generate a cookie you must have some code like the following:
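The original listing is missing, so this is a sketch of what such code typically looks like with ASP.NET forms authentication (System.Web on the .NET Framework). The class name AuthCookieSketch, the 30-minute lifetime, and the userData parameter are assumptions; a site with a custom cookie format would need its own logic here.

```csharp
using System;
using System.Web;
using System.Web.Security;

// Hypothetical helper; assumes standard ASP.NET forms authentication.
public static class AuthCookieSketch
{
    public static HttpCookie Create(string userName, string userData)
    {
        FormsAuthenticationTicket ticket = new FormsAuthenticationTicket(
            1,                            // ticket version
            userName,                     // user the ticket is issued to
            DateTime.Now,                 // issue time
            DateTime.Now.AddMinutes(30),  // expiration (assumed lifetime)
            false,                        // not a persistent cookie
            userData);                    // application-specific data (assumed)

        // Encrypt protects the ticket using the configured machine key,
        // so the site that reads the cookie must use the same key.
        string encrypted = FormsAuthentication.Encrypt(ticket);

        return new HttpCookie(FormsAuthentication.FormsCookieName, encrypted);
    }
}
```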
Here’s the gotcha: the constructor for FormsAuthenticationTicket
fails without a web context! What about creating an HttpContext object
manually? Well, it turns out that this is impractical: the runtime goes
through many steps in order to set this context up properly, and you would
have to reproduce them all yourself.
So what if your screen scraper is a WinForms application or a
Windows Service?
In this situation, one option for generating a cookie is to connect to a
Web application and request the cookie from it.
The best way to do this is to expose a Web Service method
that does your custom cookie generation and returns the cookie as a
string, which you then add to your WebClient.
What about security? If you’re worried about the security risk of
exposing a cookie generator via a Web Service, you could secure this service
using WSE and encrypt the conversation, or better yet don’t expose this service
outside of your firewall. Remember, this service should probably accept the
same credentials as your public site, and your application is generating this
cookie for web users anyhow, so it’s probably not much of a security risk for
you to expose this service.
The Code
So what does this all look like? Here’s an example of the sort of
Web Service code that you’ll need to host to return a cookie. (Note:
HttpCookie is not serializable, so you can’t make it the return type of your
Web Service method.)
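The original listing did not survive, so the following is a rough sketch of such a service, assuming standard ASP.NET forms authentication. The names CookieService, GetAuthCookie and IsValidUser, the namespace URI, and the 30-minute timeout are all hypothetical, and FormsAuthentication.Authenticate (which checks credentials stored in Web.Config) is only a stand-in for whatever credential check your site really uses.

```csharp
using System.Web.Security;
using System.Web.Services;

// Hypothetical service; the namespace URI is a placeholder.
[WebService(Namespace = "http://example.com/cookieservice")]
public class CookieService : WebService
{
    // Returns "name=value", ready to be added as a Cookie request header,
    // or null when the credentials are rejected.
    [WebMethod]
    public string GetAuthCookie(string userName, string password)
    {
        if (!IsValidUser(userName, password))
        {
            return null;
        }

        // Non-persistent ticket with an assumed 30-minute timeout.
        FormsAuthenticationTicket ticket =
            new FormsAuthenticationTicket(userName, false, 30);
        string encrypted = FormsAuthentication.Encrypt(ticket);

        // A plain string is returned because HttpCookie is not serializable.
        return FormsAuthentication.FormsCookieName + "=" + encrypted;
    }

    // Stand-in check: replace with the same validation your public site performs.
    private static bool IsValidUser(string userName, string password)
    {
        return FormsAuthentication.Authenticate(userName, password);
    }
}
```

On the client side, the returned string would be passed to the WebClient's Headers collection under the "Cookie" name before the protected page is requested.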
One more thing to note. By default, your machine key will be set to
auto-generate, and different applications will not share these keys. For your
Web Service to use the same key as your Web Site, you’ll need to manually set
your key using the <machineKey> element
in both Web.Config files. See Generate Machine Key
Elements for Web Farms for more on manually setting machine keys; it even
contains a web-based key generator.
Finally
Setting this up is a bear, but if you’ve done everything correctly, you
should be able to download secure web content effortlessly from Windows
applications. This technique can open up a whole host of applications for
providing secure content.