The Artima Developer Community

.NET Buzz Forum
Using DTS and Analysis Services to Analyse Blog Logs

Tim Sneath

Posts: 395
Nickname: timsneath
Registered: Aug, 2003

Tim Sneath is a .NET developer for Microsoft in the UK.
Using DTS and Analysis Services to Analyse Blog Logs Posted: Nov 25, 2003 2:46 AM

This post originated from an RSS feed registered with .NET Buzz by Tim Sneath.
Original Post: Using DTS and Analysis Services to Analyse Blog Logs
Feed Title: Tim Sneath's Blog
Feed URL: /msdnerror.htm?aspxerrorpath=/tims/Rss.aspx
Feed Description: Random mumblings on Microsoft, .NET, and other topics.

Following on from the entry I wrote about analysing blog logs, one or two people have asked me how I set everything up to do this. It's a really good use of Analysis Services, actually, and offers a good example of how Microsoft's business intelligence tools work together. Nothing is too complicated, although there are a number of steps involved in getting everything working. If you've a general idea of what DTS and Analysis Services are, you should be able to follow these instructions fairly well - drop a note in the comments section if you're struggling and I'll see what I can do to help out. Note that the blogs hosted here use BlogX, and so what I've written here presumes a log format that matches the one on our server.

Step 1: Generating a raw log file
The first step is of course to get hold of the raw log files that contain the information to be analysed. The log files on my server are stored in a separate file per day, so I use a standard command-line utility called wget to pull down these files over HTTP using a batch file. (Incidentally, here's a quick-and-dirty C# version of wget that I wrote.) Once I've got hold of the files, I concatenate them together into a single log file called bloglog.txt using a batch file. This is about 60MB in size, as of the last count.

Step 2: Generating a destination table
Next we need to create a database table into which the cleansed, transformed version of the log files will be inserted. To achieve this, simply execute the following SQL (or similar):

   CREATE TABLE [Hits] (
      [HitID] [bigint] IDENTITY (1, 1) NOT NULL,
      [HitTime] [datetime] NULL,
      [URL] [varchar] (400) NULL,
      [URLReferrer] [varchar] (400) NULL,
      [UserAgent] [varchar] (400) NULL,
      CONSTRAINT [PK_Hits] PRIMARY KEY CLUSTERED ([HitID])  
   ) 

Step 3: Generating a DTS Package
The main thing to do here is to copy across the data from the raw tab-delimited file to the Hits table. You can do this by creating a Transform Data task that copies the appropriate columns across (using Copy Column transforms). For the date column, you need to use an ActiveX Script transform, since the date format in the source file differs from the one the destination column expects. Here's the VBScript to slice the input appropriately:

   '  Copy each source column to the destination column
   Function Main()
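      ' Col003 is assumed to hold a timestamp along the lines of
      ' 2003-11-25T02:46:12; the slicing keeps the date (first 10
      ' characters) and the time (8 characters from position 12), and
      ' joins them with a space so SQL Server can parse it as a datetime.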
      DTSDestination("HitTime") = Left(DTSSource("Col003"), 10) & _
         " " & Mid(DTSSource("Col003"), 12, 8)
      Main = DTSTransformStat_OK
   End Function

Test the package, and then execute it. If all goes well, you should now have a SQL Server table containing the log file. (If you'd like to save time in this step, you can download the DTS package I created and modify it to match your own table and source file names.)
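If you want a quick sanity check that the load worked, a couple of throwaway queries along these lines (just an illustration - adjust the names if your table differs) will show the row count and the most recent entries:

   -- Rough sanity checks after the DTS package has run: how many rows
   -- were loaded, and what do the latest entries look like?
   SELECT COUNT(*) AS TotalHits FROM [Hits]

   SELECT TOP 10 [HitTime], [URL], [URLReferrer], [UserAgent]
   FROM [Hits]
   ORDER BY [HitTime] DESC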

(Optional) You might now want to go back and add some steps to the DTS package to clean up the table before each execution, run the batch file created in step 1, and even process the cube we'll create in step 4, so that the final package runs the whole end-to-end process on its own.
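For the clean-up part of that optional step, an Execute SQL task at the start of the package is all it takes; as a minimal sketch (assuming you reload the full log file each time), it only needs to run something like:

   -- Empty the Hits table before each reload so repeated runs of the
   -- package don't insert duplicate rows
   TRUNCATE TABLE [Hits]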

Step 4: Creating an Analysis Services Cube
Within Analysis Manager, create a new database. Set the data source to the SQL database you've just created. Now right-click on the Cubes node and choose the Cube Wizard. In a more complex data source, you'd split the fact table (the values you're browsing) and the dimension tables (the things you use to slice and dice the values) out into separate tables, but here everything is conveniently stored in the one table. HitID is the measure - although its value has no meaning in itself, we'll change how it's aggregated in a moment.

Create four dimensions, using the star schema model and accepting all the defaults:

  1. URL (standard dimension type based on the URL column)
  2. Referrer (standard dimension type based on the URLReferrer column)
  3. User Agent (standard dimension type based on the UserAgent column)
  4. Time (time dimension type, year / quarter / month / day hierarchy, based on the HitTime column)

When the wizard dumps you unceremoniously into the Cube Editor, select the HitID measure and click the Properties button in the bottom left-hand corner. Set the Aggregate Function to Count, not Sum. This switches the aggregations to count how many entries there are, rather than summing the values they contain. Lastly, choose Tools / Process Cube, design storage using the defaults and then execute the process. The cube is now available for browsing.

NB Make sure you choose the option to incrementally update the shared dimensions, if you've created the dimensions above as shared - otherwise the processing step will fail on repeated attempts.
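If it helps to picture what the cube is doing, the Count aggregation answers the same sort of question as a GROUP BY over the fact table - for example (plain SQL for illustration only; the cube pre-computes these aggregations for you):

   -- Conceptually what the cube returns when you slice hits by user agent
   SELECT [UserAgent], COUNT([HitID]) AS Hits
   FROM [Hits]
   GROUP BY [UserAgent]
   ORDER BY Hits DESC

The difference is that the cube lets you combine this with the time, URL and referrer dimensions interactively, rather than writing a fresh query for every question.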

Step 5: Use Excel to Browse the Cube
Now comes the fun part! In Excel, choose Data / PivotTable and PivotChart Report. Select External Data Source and get the data from the OLAP cube (you might have to set up a new query using Microsoft Query). Accept all the other defaults, and you'll wind up with an empty PivotTable embedded in the spreadsheet. You can now drag fields to the PivotTable - make sure you drag HitID to the centre of the table, and then drag the dimensions to rows or columns. The context-sensitive menus allow you to customise sorting, grouping and formatting. Create a PivotChart to see a graphical view of the same data.

Well done if you've got this far! Have fun analysing your blog data and identifying some of the trends. You might be surprised at which kinds of blog entries are the most popular, or which aggregators are visiting your site...
