This post originated from an RSS feed registered with .NET Buzz
by Tim Sneath.
Original Post: Experimenting with Yukon Analysis Services
Feed Title: Tim Sneath's Blog
Feed URL: /msdnerror.htm?aspxerrorpath=/tims/Rss.aspx
Feed Description: Random mumblings on Microsoft, .NET, and other topics.
Following on from the last
blog entry, when I wrote about creating an Analysis Services cube to work with
blog logs, I've been experimenting with doing the same things with Yukon. I hit a
few road bumps along the way, which helped me understand some important differences
in the Yukon Analysis Services architecture.
I should start with a brief eulogy of praise for the new version of Analysis Services.
The Yukon team have put an enormous amount of effort into revamping the architecture,
and the result of their labours is an environment that blows away the traditional
barriers between relational and multidimensional databases. You can now create a cube
based on a fully normalised relational database schema; you can use proactive caching
features to access active data in real-time; and you can do away with some of the
traditional limitations of one fact table per cube and one hierarchy per dimension.
The feature that gets a round of applause every time I demonstrate it, even from a
traditionally cynical British audience, is the "one click cube", more properly known
as IntelliCube. With this feature, you can point Analysis Services at a data source
and it can automatically determine the appropriate fact tables, dimensions, hierarchies
and levels - a wizard that can potentially turn hours of repetitive work into minutes.
Trust me - once you've played with the new stuff, it's painful to go back to the old
way of working, good though the 2000 release of Analysis Services is.
When I used IntelliCube on the website log table, however, I ran into a couple of
difficulties. The wizard ran through seamlessly, but it only detected one dimension.
Not only that, but it didn't find any hierarchies. Even when I tried to create a time
dimension, it didn't allow me to split it by a hierarchy such as year / quarter /
month, meaning that each of the 350,000 rows appeared separately without any aggregation
occurring. To be absolutely clear, the data source here is unusual: both facts and
dimensions occur in the same table. This is something that's not generally recommended
for good scalability, but quite viable for a comparatively small quantity of data
such as this. Given this fact, I posted a provocatively titled mail to the Picasso
team, suggesting that they'd perhaps not considered this particular usage scenario.
The development team were good enough to respond very quickly and a short phone call
later (many thanks, Thierry!) we'd determined what was going wrong. It turned out
to my surprise that Yukon was doing more or less the right thing, and the problem
was a few subtle changes in terminology that were causing me confusion.
Firstly, Yukon introduces the concept of attributes that can form part of a dimension.
An attribute offers an alternative to dimensions for slicing data. For instance, a
Customer dimension might include a traditional location-based hierarchy of country
/ region / city / street, as well as attributes representing (for instance) gender,
household income, and marital status. Wherever you'd traditionally use a dimension,
you can optionally choose to use an attribute instead, or even use multiple attributes
to create a custom dimension. On my cube, the IntelliCube wizard had simply created
each of the slicers (URL, referrer, user agent, time) as attributes rather than the
dimensions as existed in the previous release. So rather than having 1 cube and 4
dimensions, I instead have 1 cube, 1 dimension and 4 attributes in the Yukon version.
The second discrepancy related to the time dimension. Here there is a missing feature
in the Beta 1 release of the IntelliCube wizard, as the wizard only supports time
dimensions where the field has already been exploded into its constituent parts (year
/ month / day etc.) Analysis Services 2000 could automatically handle the separation
of a timestamp into its constituent parts, but it only created dimension members where
a row of data existed. This would mean that if your data only included values for
the 1st,2nd,4th and 5th of a month, you would wind up with holes in the dimension
members - as if the 3rd of the month never even existed! Yukon can automatically create
the members for a particular date period, which resolves this problem, but in Beta
1 this has to be done outside of the wizard.
The bottom line I hope people take away from this - Analysis Services in Yukon is
a very different beast: it's far more powerful and plays even better with the rest
of your operational data, but you'll need to make a few mental leaps if you're already
used to the current release of Analysis Services. Fortunately, we also include an
import utility that enables you to bring existing cubes across without fuss.
I'll write more about Analysis Services in a couple of weeks, and hopefully drill
further into some of the changes that have been made.