This post continues the series on the data-warehouse product-retirement project I’m currently leading. The previous post in this series was Descending Into Microsoft Analysis Services.
Update – Where We Have Reached
- The Analysis Services object-model traversing program has migrated to a Windows Forms tree viewer. The program is a “lazy” evaluator, in that it only traverses an object (or collection) when the user selects it. This makes the thing pretty quick to use. It is also all driven by reflection, so in theory it could be applied to any object structure. That generality is a feature I’ve not implemented, but I may do some time in the future. I should probably refactor the code and make a subclass of the TreeView, with the reflection code, into a “custom control”.
- The “define a Cube from the XML and SQL tables” function is working well. A couple of refinements have been made to it. These have included:
- Getting the display order working better. It’s not perfect; to get it fully right we would need to add a presentation-order column to all of the SQL tables which are used to build the dimensions.
- Reusing dimensions across “Cubes” which are built from the same SQL table. One little step forward for better data management. There is still a long way to go with improving the data management, but almost all of those improvements will have to come in subsequent projects.
- The data comparer has progressed, but has taken a detour. I’ll cover that in the “Dancing Goal Posts” section (next).
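The lazy, reflection-driven traversal behind the tree viewer can be sketched in outline. This is a minimal Python sketch of the idea only, not the actual WinForms/.NET code; the class and method names are my own invention:

```python
class LazyNode:
    """Wraps an object; children are discovered only when expand() is called."""

    def __init__(self, name, obj):
        self.name = name
        self.obj = obj
        self.children = None  # None means "not yet expanded"

    def expand(self):
        """Traverse one level via reflection, only on demand."""
        if self.children is not None:
            return self.children
        self.children = []
        for attr in dir(self.obj):
            if attr.startswith("_"):
                continue  # skip private/internal members
            try:
                value = getattr(self.obj, attr)
            except Exception:
                continue  # some properties may fail to evaluate
            if callable(value):
                continue  # show data properties, not methods
            self.children.append(LazyNode(attr, value))
        return self.children
```

Because the traversal is pure reflection, the same node class works over any object graph, which is what makes the generic “custom control” idea plausible.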
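The “cookie cutter” itself isn’t shown in these posts, but the general shape of the approach, turning an XML metadata file into a cube definition that can then be fed to the Analysis Services object model, can be sketched. This is a hypothetical Python sketch; the element and attribute names in the XML are my invention, not the real product’s schema:

```python
import xml.etree.ElementTree as ET

# A hypothetical XML fragment in the spirit of the metadata file;
# the real file's schema is not shown in the post.
XML = """
<cube name="Admissions">
  <measure name="Count" column="admission_count"/>
  <dimension name="Region" table="dim_region" key="region_key" label="region_name"/>
  <dimension name="Year" table="dim_year" key="year_key" label="year_label"/>
</cube>
"""

def parse_cube_definition(xml_text):
    """Turn the XML metadata into a plain cube definition that a
    'cookie cutter' could translate into object-model calls."""
    root = ET.fromstring(xml_text)
    return {
        "name": root.get("name"),
        "measures": [m.attrib for m in root.findall("measure")],
        "dimensions": [d.attrib for d in root.findall("dimension")],
    }
```

The point of the intermediate definition is that the translation step (conceptual model to Analysis Services model) stays in one place, separate from the XML parsing.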
Dancing Goal Posts
There are, or have been, a couple of “extra” requirements for the project. These have included:
- A “data salvage” aspect to the project. There are a number of years’ worth of data which we are going to salvage from the existing system. Hence, the data-compare function has taken on board an extra requirement: to extract all of the base data from the existing system. This is a detour which may lead to a couple of choices needing to be made. These choices are:
- To pour all of the data into one “Cube”. This exposes a number of issues:
- The “cookie cutter” which builds the SSAS cubes would need to be enhanced to add a time dimension (to hang the different “cubes” of data from).
- There have been revisions to the data, which have resulted in different “cubes” being created. Developing a dimensional model which caters for this as well could be a “bridge too far” for this project.
- There have been changes to the dimensions over time. In the old system, these changes have simply resulted in different “cubes” with different label sets. Building up an approach which implements “Slowly Changing Dimensions” could, again, be a “bridge too far” for the project.
- My Analysis Object Viewer may well need some enhancements, to get a better understanding of the scope of the dimensional issues mentioned above. These enhancements could include:
- Creating a cube for each of the salvaged datasets. This would be done by the existing “cookie cutter” we have for creating “Cubes”.
- Building something visual to show the dimensionality of each of the cubes. This would need to give me a “table-like” view of the dimensions, with hierarchy and code-set information as well. Probably a big job. Or,
- Build something which selects the set of cubes, then uses LINQ over the object model to determine the following:
- Sets the “time 0” cube, then runs time forward from there.
- Identifies “New Dimensions”. These would be dimensions which did not exist in the “time 0” cube.
- Identifies Changed Dimensions. There are a number of classes of change which will need to be detected. These include:
- New Codes, or Labels
- New Levels
- Different Hierarchy
- Differences in the Key pointing to a Label.
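The change detection sketched in that list (in the real project it would be LINQ queries over the Analysis Services object model) can be illustrated in Python over a simplified description of each cube’s dimensions. The data shapes here are assumptions for illustration only:

```python
def diff_dimensions(time0, later):
    """Compare a later cube's dimensions against the "time 0" cube.

    Each argument maps a dimension name to a simplified description:
    {"levels": [level names], "labels": {key: label}}.
    Returns a list of detected changes, tagged by class of change.
    """
    changes = []
    for name, dim in later.items():
        if name not in time0:
            changes.append(("new dimension", name))
            continue
        base = time0[name]
        # New codes or labels: labels present now but not at time 0.
        new_labels = set(dim["labels"].values()) - set(base["labels"].values())
        if new_labels:
            changes.append(("new codes/labels", name, sorted(new_labels)))
        # New levels vs a reordered/renamed hierarchy.
        if len(dim["levels"]) > len(base["levels"]):
            changes.append(("new levels", name))
        elif dim["levels"] != base["levels"]:
            changes.append(("different hierarchy", name))
        # Same key now pointing at a different label.
        moved = [k for k, v in dim["labels"].items()
                 if k in base["labels"] and base["labels"][k] != v]
        if moved:
            changes.append(("key points to different label", name, moved))
    return changes
```

Running it cube by cube in time order would give the “run time forward from time 0” view described above.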
- The data presentation layer for the project needs to be sorted through. There are a number of alternatives here which include:
- Excel, and the Pivot Table. I’ve already pointed Excel at some of the cubes we have built, and they did work OK. I’m not sure how well an Excel solution will work with some of the low-bandwidth sites that this solution will need to support. Hence, the next couple of alternatives.
- SharePoint, with the Excel parts. I’ve never explored how this works, so it will be a “voyage of discovery”. This solution has a number of potential benefits:
- it may work better in the low bandwidth sites,
- it may allow delegation of the management of the security of the cubes to clients,
- it may provide a better environment for doing “big” analysis jobs.
- SharePoint, with PowerPivot. A new thing from Microsoft, which I’ve yet to see in action. So, I’ve no idea what it gives above and beyond standard Excel functions.
- Reporting Services. This may need to be part of the mix, just for the routine reporting requirements, or maybe more widespread. Again, sight unseen, I’m not sure what this will add to the mix.
This is an ongoing “voyage of discovery”. I’m positive about the prospect of getting something which satisfies the “project brief”. Whatever doesn’t make it into this project will have to go on the “to do” list for some time in the future.
One of the projects I’ve got on the go at the new job is migrating a collection of reporting functions into the Microsoft way of doing things. There is a tight “timebox” on this project, and a diversity of reporting “cubes” which need to be migrated into Analysis Services cubes.
The other constraint on the project is that the new cubes should be a direct replacement of the original reporting solution. There should be no “enhancement” of the solution (to start with, at least). If we get the project to the finish line early, then there could be an opportunity to do some dimensional modelling on the implementations and “tidy up” the resulting Analysis cubes.
My plan is a four-pronged attack on the problem. Those prongs are:
- Use the XML file, which is used to load the data (and metadata) into the current system, to define the Analysis cubes. The developer I’ve got working on this is making good progress on a program which defines a cube programmatically, using the .NET object model for Analysis Services (namespace: Microsoft.AnalysisServices; assembly: Microsoft.AnalysisServices, in microsoft.analysisservices.dll). Defining the translation from the existing product’s conceptual model to the Analysis Services model will come next week. Then we will start to see how this could come together.
- A program which will traverse the object model and report all of the properties of a cube. Currently, I’ve a version of this which produces a text file. Moving it to a WinForms interface could be an option for next week, or I may “put it to bed” as it stands.
- A program which will programmatically compare the results of the same queries run against the two products. I expect that the developer building this will make some good inroads into it over the coming week as well.
- “Sweeping up the loose ends”. I’m uncertain of all the things which will end up in here. There is a bunch of documentation on the continued operation of the solution already in this bucket.
- The object model for Analysis Services looks good (so far). It would appear to allow these types of interaction with the product.
- There are a number of issues which the project will need to address as part of the migration. These are not immediately part of the translation process, but part of the bigger “making it work well” process. They include:
- Storage management. I’ve not delved into this part of SQL Server and Analysis Services. These cubes will consume space, and how that space is managed will be an issue to manage.
- Performance tuning. Another part of SQL Server and Analysis Services cubes which is unknown at present. The generation of commonly used aggregates (or maybe all rollups), and choosing an approach to their storage management, will be an issue.
- Instrumentation. What we currently get out of the systems, and what we should be getting (to make things more robust) is another open question.
- Integration Services. It is unclear at present how much new ETL will need to be built to deliver an automated solution. There are some ETL processes in existence which are used in the current implementation. How much of this the migration will need to modify is yet to be seen.
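The query-comparison prong above can be sketched in outline. This is a hedged Python sketch, assuming each system’s query result can be reduced to a mapping from a dimension-key tuple to a measure value; the real program would query the two products directly:

```python
def compare_results(old_rows, new_rows, tolerance=1e-9):
    """Compare the same query run against the old and new systems.

    Each argument maps a tuple of dimension keys to a measure value.
    Returns keys only in the old result, keys only in the new result,
    and shared keys whose values differ by more than the tolerance.
    """
    only_old = sorted(set(old_rows) - set(new_rows))
    only_new = sorted(set(new_rows) - set(old_rows))
    mismatches = {
        key: (old_rows[key], new_rows[key])
        for key in set(old_rows) & set(new_rows)
        if abs(old_rows[key] - new_rows[key]) > tolerance
    }
    return only_old, only_new, mismatches
```

A tolerance matters here because the two products may round or aggregate floating-point measures slightly differently even when the underlying data agrees.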
Yesterday was the first day in the new job. There was a small bit of the “new boy at the new school” about it, but it was not a big part of the day. Most of the day was consumed with meeting people (I hope I start remembering all those names), and a couple of external activities (a physiotherapy appointment to continue sorting out my back, and another with my GP to pick up some more repeats on some of the prescriptions).
The major achievement for the day was getting a work pass. The feeling that I’ve now established my working identity in the new organisation is definitely there. Which follows on from the “Betwixt and Between” blog post.
The new job, to start with, looks like being a data warehousing project. The design and implementation of data warehouses is a class of IT problem which I’m very well equipped to tackle. The replacement of a SuperWeb solution with a Microsoft SQL Server Analysis Services solution also greatly appeals to me. I’ve not previously had an opportunity to implement a Microsoft solution (my previous organisation being heavily an Oracle shop). Removing SuperWeb makes the job doubly appealing (the previous organisation did use it, and it was a choice I never supported, for multiple reasons).
The data warehousing project exposes a number of big questions about my data warehousing style, approach and understanding, which I’ll have to work through with the new organisation and the analytical clients of the data warehouse. These include:
- Data management principles. I’m not sure how mature this organisation is in understanding the benefits of good data management principles. Some of those principles start to manifest themselves in the design decisions which a data warehousing solution must make.
- A Variables Dimension. Stylistically, I “like having one of these”. From a data management perspective, it allows the identification of the same measure across all analysis outcomes. The question is “Are the users of the analysis cubes ready for this?”. Another question I have is “How do Analysis Services, and the Excel Pivot Table, react to having this form of construct?”. I’ll have to wait and see on this one.
- Conformed dimensions. These are a cornerstone of the “Kimball Warehouse Bus”. From an analysis perspective, they should allow for the pulling together of multiple cubes, and for the standardisation of the label sets, which also enables analysis. Another question is “How well does the Excel Pivot Table work with this construct?”.
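A tiny illustration of why conformed dimensions enable pulling cubes together: when two fact sets share the same dimension keys and labels, they can be combined directly. The figures below are invented purely for illustration:

```python
# A conformed "Region" dimension shared by two hypothetical fact sets.
region = {1: "NSW", 2: "VIC"}

# Two facts, both keyed by the SAME conformed dimension keys.
hospital_admissions = {1: 120, 2: 95}        # region_key -> admissions
population = {1: 7_000_000, 2: 6_000_000}    # region_key -> persons

# Because the keys and labels conform, the facts combine directly:
# admissions per 100,000 population, labelled by region name.
rate = {region[k]: hospital_admissions[k] / population[k] * 100_000
        for k in region}
```

Without conformance (different keys, or different label sets for the same concept), this join would need a mapping table first, which is exactly the data-management burden conformed dimensions remove.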
I’ll continue to blog about these concepts, and the progress of the implementation project. I’ve a bit of evangelising and training to do, to spread some understanding of the approaches which I’ve detailed above.
Today, I’m off to a Scrum project-management training course. This training should prove interesting. I’ve had an interest in this style of project management for a while, so some formal training on “how to do Scrum” will be useful.