Posts Tagged Visual Studio 2010

TPL Dataflow–First Tests


Introduction

Over the last couple of weeks, I attended Tech Ed Australia. The sessions by Joseph Albahari on what is coming in C# 5, and is available now in the Async CTP, have finally spurred me to go exploring.

What I wanted to achieve

I set out to start creating some simple examples of data flow networks which employ the asynchronous elements of the framework. The key elements of my testing and explorations were:

  • As is my wont, I always do my exploring with console applications, rather than more elaborate UIs. This put me in a bit of a bind, as most of the examples I could locate were built with Winforms or WPF / XAML UIs. I’d rather not spend time mucking about with the UI.
  • Having a UI thread makes some of the async examples far simpler, as there is a thread which wants to keep running until the application is closed. This makes doing things with the Async CTP data flow much simpler.
  • Most of the examples used external things like web pages to get the async happening. I wanted to just have a data driven, from within the program, approach.

Where Did I Get

Example 1

private static void Test1()
{
    Action<int> fred = (i) =>
    {
        int j = i + 1;
        Debug.WriteLine(j);
    };
    var a = new ActionBlock<int>(fred);
    a.Post(1);
    a.Post(2);
    a.Post(3);
    Debug.WriteLine("Test 1 Done");
}

This seems to work, but there is a hidden fault in the code (more on that later).

On the up side, getting a simple ActionBlock and posting to it may seem trivial, but it proves things are installed correctly.

Example 2

private static void Test3()
{
    var actor = new ActionBlock<int>((i) =>
    {
        Debug.WriteLine(i);
    });
    var trans1 = new TransformBlock<int, int>((i) =>
    {
        return i * 2;
    }
    );
    trans1.LinkTo(actor);
    for (int i = 1; i < 10; i++)
    {
        trans1.Post(i);
    }
    Debug.WriteLine("Done Test 3");
}

This is an example of wiring two async processing blocks together (the LinkTo in the code). Again, it is trivial, but it does prove that I am on the right track. Again, there is an error in here which does not become obvious in a trivial example.

Example 3

private static void Test4()
{
    var actor = new ActionBlock<int>((i) =>
    {
        Debug.WriteLine(
            string.Format("Action {0}", i));
    });
    var trans1 = new TransformBlock<int, int>((i) =>
    {
        int res = i * 2;
        Debug.WriteLine(
            String.Format("Transform of {0} to {1} Done", i, res));
        return res;
    }
    );
    trans1.LinkTo(actor);
    for (int i = 1; i < 10; i++)
    {
        if (trans1.Post(i))
            Debug.WriteLine(
                String.Format("Post of {0} Succeeded", i));
        else Debug.WriteLine(
                String.Format("Post of {0} Not Accepted", i));
    }
    Debug.WriteLine("Done Test 4");
}

This is very much like Example 2, but provides nicer output. The benefit of the output is that you can see that things are happening asynchronously.

Example 4

private static BroadcastBlock<int> Test5()
{
    int factor1 = 2;
    var trans1 = new TransformBlock<int, int>((i) =>
    {
        int res = i * factor1;
        Debug.WriteLine(
            String.Format("1 Transform of {0} to {1} Done", i, res));
        return res;
    }
    );
    int factor2 = 3;
    var trans2 = new TransformBlock<int, int>((i) =>
    {
        int res = i * factor2;
        Debug.WriteLine(
            String.Format("2 Transform of {0} to {1} Done", i, res));
        return res;
    }
    );
    var actor1 = new ActionBlock<int>((i) =>
    {
        Debug.WriteLine(
            string.Format("1 Action {0}", i));
    });
    var actor2 = new ActionBlock<int>((i) =>
    {
        Debug.WriteLine(
            string.Format("2 Action {0}", i));
    });
    trans1.LinkTo(actor1);
    trans2.LinkTo(actor2);
    BroadcastBlock<int> bcBlock =
        new BroadcastBlock<int>((i) =>
        {
            return i;
        });
    bcBlock.LinkTo(trans1);
    bcBlock.LinkTo(trans2);
    return bcBlock;
}

This is the part which builds the data flow network, which is getting more elaborate. There are five elements which are linked together to provide a more interesting network.

private static void Test5_Main()
{
    BroadcastBlock<int> bcBlock = Test5();
    for (int i = 0; i <= 10; i++)
    {
        if (bcBlock.Post(i))
            Debug.WriteLine(
                String.Format("Post of {0} Succeeded", i));
        else Debug.WriteLine(
                String.Format("Post of {0} Not Accepted", i));
    }
    bcBlock.Complete();
    while (bcBlock.Completion.IsCompleted == false)
    {
        Debug.WriteLine("Thread Sleeping");
        Thread.Sleep(1000);
    }
    Debug.WriteLine("Done");
    return;
}

This is the main for this example. There are a couple of key points, and a solution to the problem (or bug) I mentioned above. The bug was that sometimes the networks were not getting enough time to complete before the main thread exited. The result was that sometimes the output would be incomplete. The key elements are:

  • Posting the data into the network.
  • The Broadcast block sends that data into both of the two TransformBlock and ActionBlock chains.
  • The call to Complete tells the BroadcastBlock that there will be no more inputs and that, once it is empty, it should mark itself “Completed”.
  • The while loop keeps the main thread running, while the data passes through the network.
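The polling loop can also be replaced by waiting on a Completion task directly. The following is a minimal sketch, assuming the released System.Threading.Tasks.Dataflow package rather than the Async CTP build used in this post. In that library, completion does not flow across a link unless the link asks for it, so the sketch links with PropagateCompletion and waits on the last block in the chain instead of the first. The names CompletionSketch and Run are hypothetical.

```csharp
using System.Collections.Concurrent;
using System.Threading.Tasks.Dataflow;

public static class CompletionSketch
{
    // Hypothetical helper: pushes 1..9 through a two block network and
    // returns the doubled values once the whole network has drained.
    public static int[] Run()
    {
        var results = new ConcurrentQueue<int>();
        var actor = new ActionBlock<int>(i => results.Enqueue(i));
        var trans1 = new TransformBlock<int, int>(i => i * 2);

        // Assumption: DataflowLinkOptions comes from the released library.
        // PropagateCompletion forwards completion from trans1 to actor
        // once trans1 has handed over its last item.
        trans1.LinkTo(actor, new DataflowLinkOptions { PropagateCompletion = true });

        for (int i = 1; i < 10; i++)
        {
            trans1.Post(i);
        }

        trans1.Complete();        // no more input will arrive
        actor.Completion.Wait();  // block until the last item has been processed
        return results.ToArray();
    }
}
```

For the broadcast network, the same idea would mean waiting on the Completion of both final ActionBlocks, for instance with Task.WaitAll, rather than on the BroadcastBlock alone.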

Conclusions

  • These are simple, to the point of trivial, examples, but they are a place to start.
  • The Async CTP is something which I will continue to play with. The construction of these examples has been an interesting learning journey.
  • I have some ideas for more elaborate networks, and simulations of networks of relationships.
  • Future posts could start to explore more of what is available from the Async Data Flow CTP.

LINQ Performance Tuning: Using the LookUp Class


Introduction

In this blog post I hope to share with you the benefits of using the Lookup Class, and the ToLookup extension method on IEnumerable<T> which creates the Lookup Object.

The benefit of using this .Net Framework object can be a significant reduction in the time taken to execute LINQ statements. By significant, I mean that I reduced the elapsed time of a program which makes heavy use of LINQ to Objects and LINQ to XML from 15+ minutes to seconds.

If inspiration grabs me I may include some benchmarking tests in this post, or in a subsequent post. I’m always a bit cautious of creating artificial test cases for things like this. Artificial test cases can be very misleading, as they tend to show the technique in its best light. The real test of any technique is not in a “test tube”, but in how it helps resolve real programming and (in this case) performance problems.

The Lookup Class – Overview

This is a very interesting animal in the .Net Framework. It has the following interesting features:

  • It is only created through a factory method, ToLookup, which is a LINQ extension method on IEnumerable<T>.
  • There is no public constructor for the class.
  • The Lookup Class sits somewhere between the Dictionary and the List, in that it possesses properties of both classes. These properties include:
    • Like a Dictionary, it supports keyed access to the data. To use a SQL analogy, the Lookup behaves like an indexed table.
    • Like a List, it supports storing as many objects of a “type” as will fit (available memory, or the CPU architecture under which the process is running, being the limiting factor).
    • Unlike either List<T> or Dictionary<TKey, TValue>, one key can have multiple objects stored under it. So, it is very like a Dictionary<T1, List<T2>>. The implementation uses an IGrouping<T1, T2>, but the analogy holds true.
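The features listed above can be seen in a few lines. This is a small sketch with made-up sample data (words keyed by their first letter); the names LookupBasicsSketch, Build and CountFor are hypothetical. It also shows one difference from a Dictionary: indexing a key that is not present returns an empty sequence rather than throwing.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class LookupBasicsSketch
{
    // Hypothetical sample data: words keyed by their first letter.
    public static ILookup<char, string> Build()
    {
        var words = new List<string> { "apple", "avocado", "banana" };
        return words.ToLookup(w => w[0]);
    }

    // Number of objects stored under a key. A key which is absent
    // yields an empty sequence, so the result is simply zero.
    public static int CountFor(ILookup<char, string> lookup, char key)
    {
        return lookup[key].Count();
    }
}
```

Here the key 'a' holds two words, the key 'b' holds one, and the Count property of the lookup itself reports the number of keys.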

The Lookup Class – The Details

Internal Storage

The Lookup class is an interesting storage mechanism. The key features are:

  • The storage is by the key, which is the important aspect which is being leveraged when improving the performance of LINQ operations.
  • Unlike the Dictionary Class, the Lookup Class stores more than one object against the key.
  • Like the List Class, the set of objects stored against a key is not held in sorted order.
  • The opposite diagram is one way to visualise the way the Lookup Class stores the data. The important points to note are:
    • The Keys must be of the same type.
    • The Objects stored must be of the same type.
    • The order of the objects within a key is undetermined. This is very much the same as SQL tables, where the order of the rows is not guaranteed unless you use an Order By clause (which imposes an order).
    • There can be any number of objects stored under each key.

Creating An Instance of the Lookup Class

There is not too much to the creation of a Lookup Object, apart from these caveats:

  • It can only be created as the output of a LINQ operation.
  • It is an immutable object structure. The Lookup object is effectively “read only”; there is no way to mutate the content (add or remove elements, or change the key set).

The following example C# code shows two of the ways of creating a Lookup Object.

List<XElement> seed = new List<XElement>();
var example1 = (from element in seed
                select element)
               .ToLookup(A => A.Name);
var example2 = (from element in seed
                select new { key = element.Attribute("Test1").Value, value = element })
                    .Union(from element in seed
                           select new { key = element.Attribute("Test2").Value, value = element })
                    .ToLookup(A => A.key, A => A.value);

Example1 will result in a lookup indexed by the XName, containing an IGrouping of XElement for each key.

Example2 will result in a lookup indexed by the string value from the Attributes “Test1” and “Test2”, with the same XElement potentially falling under different key values.

There are variants of the ToLookup method which take an IEqualityComparer for the type of the Key. These, I must admit, I’ve not needed to use. My use cases for the Lookup Object have not included object types in the Key other than string, and the framework provided equality comparer for that has suited me fine (thus far).
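For completeness, here is a sketch of one of those comparer variants, assuming the goal is string keys which ignore case. The sample data and the name ComparerLookupSketch are hypothetical; StringComparer.OrdinalIgnoreCase is the framework comparer being passed in.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ComparerLookupSketch
{
    // Builds a lookup whose string keys are compared without regard to case,
    // so "Test" and "TEST" end up under the same key.
    public static ILookup<string, int> Build()
    {
        var pairs = new List<KeyValuePair<string, int>>
        {
            new KeyValuePair<string, int>("Test", 1),
            new KeyValuePair<string, int>("TEST", 2),
            new KeyValuePair<string, int>("other", 3),
        };
        return pairs.ToLookup(p => p.Key, p => p.Value, StringComparer.OrdinalIgnoreCase);
    }
}
```

With this comparer the lookup holds two keys rather than three, and indexing with "test" returns both of the first two values.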

Limitations Of The Lookup Class

There are a number of limitations which the Lookup Class comes with. Many of these limitations have been mentioned in this blog post already, but I’ll put a list of them in here (just for completeness). The limitations include (and these are the ones I’ve found, or found written about):

  • No public constructor. The only way to create a Lookup Object is to use the ToLookup factory method on the IEnumerable interface. Or, if you prefer, the Lookup Object can only be created as the output from LINQ operations.
  • There are no mutators for the Lookup Object. As an output from LINQ, this output sequence is effectively read only. There are no Add, or Remove, methods available for the Lookup object. If you need a different set of content, you will need to generate that set through LINQ, and create another Lookup Object.
  • The IGrouping structure of the storage which the Lookup Class uses is a bit of a thing to deal with. There are a couple of ways to unravel this structure (I’ve found two thus far, but that’s not to say this is an exhaustive list):
      List<XElement> seed = new List<XElement>();
      var example1 = (from element in seed
                      select element)
                     .ToLookup(A => A.Name);
      // One way to unwrap the IGrouping
      var unwrap1 = example1.SelectMany(A=>A);
      // Another way to unwrap the IGrouping
      var unwrap2 = from step1 in example1
                    from unwrapped in step1
                    select unwrapped;
    • Unwrap1 uses the SelectMany method (see: LINQ SelectMany and IGrouping for some more on the use of SelectMany). The “A=>A” is required by the method, and simply says flatten the multiple input sequences (one for each key value) into one output sequence.
    • Unwrap2 uses a nested from clause in the LINQ query syntax. If you’ve done anything with LINQ to XML, you will be familiar with using intermediate sequences in LINQ syntax.

Using An Instance Of The Lookup Class In LINQ

This is where the “rubber hits the road”, or more to the point where big performance improvements can be made in the execution of LINQ statements.  As with anything performance related please test these techniques in the context of your application. The following is what I have observed in the context of my application. I’ve been using equal measures of the Stopwatch Class and the Visual Studio Profiler to measure the impacts of my application of the Lookup Class.

The following is the fastest way in LINQ to work with the Lookup object (the best way I’ve found thus far).

List<XElement> seed = new List<XElement>();
var example1 = (from element in seed
                select element)
               .ToLookup(A => A.Name);
var dict1 = new Dictionary<XName, XElement>();
var join_sample = from lut in example1
                  join dict in dict1 on lut.Key equals dict.Key
                  select lut;

The LINQ join clause seems (this is by observation of the impact in the Visual Studio Profiler) to resolve through the indexes of the Dictionary and Lookup objects. The analogy is with the way a SQL execution plan will resolve a join through the indexes of the two tables being joined.
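To illustrate why keyed access can matter so much, here is a small sketch which contrasts scanning a list once per key with building a Lookup once and indexing into it. It is an assumption on my part that this is the kind of saving behind the numbers above, and the names LookupJoinSketch, CountMatchesByScan and CountMatchesByLookup are hypothetical. No timings are claimed; measure in your own application.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class LookupJoinSketch
{
    // Scan version: every key triggers a full pass over the items.
    public static int CountMatchesByScan(
        List<KeyValuePair<string, int>> items, List<string> keys)
    {
        return keys.Sum(k => items.Count(i => i.Key == k));
    }

    // Lookup version: one pass to build the index, then keyed access per key.
    public static int CountMatchesByLookup(
        List<KeyValuePair<string, int>> items, List<string> keys)
    {
        var index = items.ToLookup(i => i.Key);
        return keys.Sum(k => index[k].Count());
    }
}
```

Both methods return the same answer; the difference is only in how much of the item list is walked for each key.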

Conclusion

The performance boost that LINQ operations can gain through the judicious application of the Lookup Object makes learning to master the use of the object well worthwhile (if you want fast code).

The DGML generating and pagination process I’m currently building has benefitted significantly from the judicious use of these objects. I’ve taken the process of generating multiple files of DGML from minutes (in the 10 to 15 minutes mark) to seconds (20 to 30 seconds).

Linq To SQL – A bit of fun – Cataloguing my CD Collection


I’ve been having a bit of fun with Linq to SQL, building some programs which will generate a catalogue of the CDs I own. This journey has provided a couple of interesting lessons along the way. I’ll share the most prominent here.

Data Model

I’ve settled on a very simple data model for the relational schema, which I’m storing in SQL Server Express (which seems to work without problems). The relational data model maps into the following Objects when using the ORM (Object Relational Mapping) features in Visual Studio.

The data model has one big deficiency: it is designed to cater for my musical tastes, which are modern (Artists do Albums). If one wanted a “classical” focused, or accommodating, data model, then there are a bunch of fields one would add to the Artist (Conductor maybe, as a synonym for FullName in my model). To tell the truth, I’ve no idea what would need to be added. I would need to do some research, and talk with a classical music audiophile.

Building the data model in Visual Studio is a snack. The relational (SQL) project gives all the functionality to build tables, define keys, and define the referential integrity constraints (foreign keys). The Linq to SQL class object is a snack to work with: just drop the tables from the Server Explorer onto the design surface (the links get sucked up from the relational database tables).

Loading the Data

I’ve explored beforehand how Windows Media Player stores albums in the file system. The structure has “Shared Music” as the root directory, with each Artist as a sub directory below it, and each Album as a sub directory below its Artist.

The major functional code in this part of the process is in the following two C# methods.

        private List<DirDetail> ReadDirectory(string rootDir)
        {
            List<DirDetail> retVal = new List<DirDetail>();
            foreach (string artistDir in Directory.GetDirectories(rootDir))
            {
                retVal.AddRange(ReadAlbums(rootDir, artistDir)); 
            }
            return retVal;
        }

        private List<DirDetail> ReadAlbums(string rootDir, string artistDir)
        {
            List<DirDetail> retVal = new List<DirDetail>();
            foreach (string albumDir in Directory.GetDirectories(artistDir))
            {
                DateTime created = Directory.GetCreationTime(albumDir);
                ArtistAlbum Details = new ArtistAlbum(albumDir, rootDir);
                retVal.Add(new DirDetail(Details.Artist, Details.Album, created)); 
            }
            return retVal;
        }

The rest of the support objects are pretty simple data container objects. The use of List<DirDetail> and AddRange is a particularly elegant way of accumulating, in a type safe manner, a result list of things which need to be squirted into the SQL data tables.
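The support classes are not listed in this post. As a purely hypothetical sketch of what a container like ArtistAlbum might look like, the following derives the Artist and Album names from the last two segments of the album path. The property names match the calls above, but the body is an assumption and not the original code.

```csharp
using System.IO;

// Hypothetical sketch of the ArtistAlbum helper; the original class is not shown.
public sealed class ArtistAlbum
{
    public string Artist { get; private set; }
    public string Album { get; private set; }

    public ArtistAlbum(string albumDir, string rootDir)
    {
        // rootDir is accepted to match the call site in ReadAlbums, but this
        // sketch only needs the last two segments of albumDir.
        Album = Path.GetFileName(albumDir);
        Artist = Path.GetFileName(Path.GetDirectoryName(albumDir));
    }
}
```

This relies on the album path having no trailing separator, which is how Directory.GetDirectories returns its results.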

The following two methods show how Linq to SQL makes loading to the database tables so simple.

        internal void Update(List<DirDetail> loadValues)
        {
            Guid unfiledLocation = GetUnfiledLocation();
            foreach (DirDetail LoadAlbum in loadValues)
            {
                Guid ArtistID = GetArtistID(LoadAlbum.Artist);
                InsertAlbum(unfiledLocation, ArtistID, LoadAlbum.Album, LoadAlbum.Loaded);
            }
        }

        private void InsertAlbum(Guid unfiledLocation, Guid ArtistID, string LoadAlbum, DateTime LoadDateTime)
        {
            var AlbumQuery = from Albums in ctx.Albums
                                 where Albums.Title == LoadAlbum 
                                 && Albums.ArtistID == ArtistID
                                 select Albums.AlbumID;
            foreach (var LoadedAlbumID in AlbumQuery)
                return;
            Album newAlbum = new Album();
            newAlbum.AlbumID = Guid.NewGuid();
            newAlbum.ArtistID = ArtistID;
            newAlbum.DateLoadedToLibrary = LoadDateTime;
            newAlbum.LocationID = unfiledLocation;
            newAlbum.Title = LoadAlbum;
            ctx.Albums.InsertOnSubmit(newAlbum);
            ctx.SubmitChanges();
        }

The “foreach/return” is a lazy way of checking that the Album has not been loaded previously. Propagating the insert by just filling up the properties on the generated class and calling the “InsertOnSubmit” and “SubmitChanges” methods is just too simple. This old programmer appreciates the simplicity. No more building SQL statements, building parameters, or getting the data binding right: many lines of code I did not have to write.
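A less lazy form of the same check is the Any operator, which states the intent (does a matching row exist?) directly. The sketch below uses a hypothetical AlbumRow class and an in-memory IQueryable as stand-ins for the generated Album class and ctx.Albums, so that it runs without a database; the names AlbumExistsSketch and AlbumExists are also hypothetical.

```csharp
using System;
using System.Linq;

// Hypothetical stand-in for the Linq to SQL generated Album class.
public sealed class AlbumRow
{
    public Guid ArtistID { get; set; }
    public string Title { get; set; }
}

public static class AlbumExistsSketch
{
    // Returns true when an album with this title already exists for the artist.
    // In the real program the albums argument would be ctx.Albums.
    public static bool AlbumExists(IQueryable<AlbumRow> albums, Guid artistId, string title)
    {
        return albums.Any(a => a.Title == title && a.ArtistID == artistId);
    }
}
```

InsertAlbum could then begin with a single if statement which returns when AlbumExists is true, in place of the foreach.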

 

Next in this series – Using Linq to SQL in a Winforms UI (I promise I’ll write it; there are a couple of things which took a bit of figuring out in this end of Linq to SQL).
