Posts Tagged LINQ to Objects

LINQ Performance Tuning: Using the LookUp Class


Introduction

This blog posts I hope to share with you the benefits of using the Lookup Class and the IEnumerable.ToLookup method for creating the Lookup Object.

The benefits of using this .Net Framework object can be a significant reduction in the time  taken to execute LINQ statements. By significant I reduced the elapse time for execution of a heavily LINQ to Objects and LINQ to XML program from 15+ minutes, to seconds.

If inspiration grabs me I may include some benchmarking test in this post, or in a subsequent posts. I’m always a bit cautious of creating artificial test cases for things like this. The artificial test case can be very misleading, as they really demonstrate the technique in the best light. The real test of the of any technique is not in a “test tube”, but in how it helps resolve real programming and performance (in this case) problems.

The Lookup Class – Overview

This is a very interesting animal in the .Net Framework. It has the following interesting features:

  • It is only created through the application of the Factory Method, which is attached to IEnumerable.ToLookup as the LINQ extension method. Classes_Diag
  • There is no public constructor for the class.
  • The Lookup Class is somewhere in between the Dictionary and List. In that it possess properties of both class, as a mixture. These properties include:
    • Like a Dictionary it is supports keyed access to the data. To use an analogy with SQL the way SQL behaves the Lookup is like an indexed table.
    • Like a List it supports storing as many of a “type” in it as possible (available memory or the CPU architecture which the process is running under being the limiting factor).
    • Unlike either List<T> or Dictionary<T>, the one key can have multiple objects stored under it. So, it is a very like a Dictionary<T1, List<T2>> .  The implementation uses an IGrouping<T2>, but the analogy holds true.

The Lookup Class – The Details

Internal StorageLookUpStructure

The Lookup class is an interesting storage mechanism. The key features are:

  • The storage is by the key, which is the important aspect which is being leveraged when improving the performance of LINQ operations.
  • Unlike the Dictionary Class, the Lookup Class stores more than one object against the key.
  • Like the List Class the order of the set of objects stored against the key, are not in an ordered storage.
  • The opposite diagram is one way to visualise the way the Lookup Class stores the data. The important points to note are:
    • The Keys must be of the same type.
    • The Objects stored must be of the same type.
    • The order of the objects within a key is undetermined. This is vey much the same as SQL tables, where the order of the rows is not guaranteed, unless you use an Order By clause (which imposes an order).
    • There can be any number of objects stored under each key.

Creating An Instance of the Lookup Class

There is not too much in the creation of a Lookup Object, apart from the caveats which are:

  • It can only be created as the output of a LINQ operation.
  • It is an invariant object structure. The Lookup object is effectively “read only” there are no ways to mutate (add or remove elements or the key set) the content.

The following example C# code shows two of the ways of creating a Lookup Object.

List<XElement> seed = new List<XElement>();
var example1 = (from element in seed
                select element)
               .ToLookup(A => A.Name);
var example2 = (from element in seed
                select new { key = element.Attribute("Test1").Value, value = element })
                    .Union(from element in seed
                           select new { key = element.Attribute("Test2").Value, value = element })
                    .ToLookup(A => A.key, A => A.value);

Example1 will result in a lookup indexed by the XName, containing a set of IGrouping of XElement .

Example2 will result in a lookup indexed by the the string value from the Attributes “Test1” and “Test2”, with the same XElement potentially falling under the different key value.

There are variants of the ToLookup method which take an IEqualityComparer for the type of the Key. These I must admit I’ve not needed use. My use cases for the Lookup Object has not included object types in the Key other than string, and the framework provided equality comparer for that has suited me fine (this far).

Limitations Of The Lookup Class

There are number of limitations which the Lookup Class comes with. Many of these limitations have been mentioned in this blog post already, but I’ll put a list of them in here (just for completeness). The imitations include (and these are the ones I’ve found, or found written about):

  • No public constructor. The only way to create a Lookup Object is to use the ToLookup factory method on the IEnumerable interface. Or, if you prefer, the Lookup Object can only be created as the output from LINQ operations.
  • There are no mutators for the Lookup Object. As an output from LINQ, this output sequence is effectively read only. There are no Add, or Remove, methods available for the Lookup object. If you need a different set of content, you will need to generate that set through LINQ, and create another Lookup Object.
  • The IGrouping structure of the storage which the Lookup Class is a bit of a thing to deal with. There are a couple of way to unravel this structure (I’ve found two thus far, but that’s not to say this is an exhaustive list):
      List<XElement> seed = new List<XElement>();
      var example1 = (from element in seed
                      select element)
                     .ToLookup(A => A.Name);
      // One way to unwrap the IGrouping
      var unwrap1 = example1.SelectMany(A=>A);
      // Another way to unwrap the IGrouping
      var unwarp2 = from step1 in example1
                    from unwrapped in step1
                    select unwrapped;
    • Unwrap1 uses the SelectMany method ( see:  LINQ SelectMany and IGrouping for some more on the use of SelectMany ).  The “A=>A” is required for the method, and simply says flatten the multiple input sequences (one for each key value), into one output sequence.
    • Using a nested from clause in the LINQ syntax. If you’ve done anything with LINQ to XML you would be familiar with using intermediate sequences in LINQ syntax.

Using An Instance Of The Lookup Class In LINQ

This is where the “rubber hits the road”, or more to the point where big performance improvements can be made in the execution of LINQ statements.  As with anything performance related please test these techniques in the context of your application. The following is what I have observed in the context of my application. I’ve been using equal measures of the Stopwatch Class and the Visual Studio Profiler to measure the impacts of my application of the Lookup Class.

The following is the fastest way in LINQ to work with the Lookup object (the best way I’ve found thus far).

List<XElement> seed = new List<XElement>();
var example1 = (from element in seed
                select element)
               .ToLookup(A => A.Name);
var dict1 = new Dictionary<XName, XElement>();
var join_sample = from lut in example1
                  join dict in dict1 on lut.Key equals dict.Key
                  select lut;

The LINQ join clause seems (this is by observation of the impact in the Visual Studio Profiler) to resolve in the indexes of the Dictionary and Lookup objects. The analogy with the way a SQL execution plan will resolve in the indexes of two tables which are being joined.

Conclusion

The performance boost that LINQ operations can gain through the judicious application of the Lookup Object, makes learning to master the use of the object well worthwhile (if you want fast code).

The DGML generating and pagination process I’m currently building has benefitted significantly from the judicious use of these objects. I’ve taken the process of generating multiple files of DGML from minutes (in the 10 to 15 minutes mark) to seconds (20 to 30 seconds).

Advertisements

, , , , , , , , , , , , , ,

Leave a comment

Some LINQ Performance “Rules of thumb”


Use indexed classes

Use indexed classes, or more precisely use structures which support keyed access (Dictionary Class and Lookup Class). The SQL analogy of these structures being like indexed tables, in comparison to List which behaves like an unindexed SQL table, seems to hold true with LINQ.

There is a tipping point with both SQL and LINQ, where the cost of establishing the indexed storage is less than the performance improvement you gain from using it. With LINQ you will have to find that through experimentation with your own use. Use the Stopwatch Class and measure the impact of one strategy over the other.

The performance impact of using indexed data structures is particularly marked when you doing joins (equijoin) between sequences in LINQ think seriously about feeding the sequences into a structure which supports a key, and then joining on the keys.

Use the Join Syntax When Performing an Equijoin

Use the join syntax in LINQ. By inspections/divination this  syntax

var seed = new List<XElement>();
var seq1 = seed.Select(A => new { key = A.Name, value = A })
    .ToLookup(A => A.key, A => A.value);
var seq2 = seed.Select(A => new { key = A.Name, value = A })
    .ToLookup(A => A.key, A => A.value);
var example1 = from a in seq1
               join b in seq2 on a.Key equals b.Key
               select a;

is “better” (can be significantly faster) than syntax

var seed = new List<XElement>();
var seq1 = seed.Select(A => new { key = A.Name, value = A })
    .ToLookup(A => A.key, A => A.value);
var seq2 = seed.Select(A => new { key = A.Name, value = A })
    .ToLookup(A => A.key, A => A.value);
var example2 = from a in seq1
               from b in seq2
               where a.Key == b.Key
               select a;

I am not sure what is happening under the hood, but the execution elapse times are far better when using the join syntax. Part of the improvement probably comes from the fact that in the second version the string “==” operator is firing multiple time (that’s what the Visual Studio Profiler said).

Beware of Implicit Constructors Firing Multiple Times

This one is another which the Visual Studio Profiler highlighted. The following syntax will fire the XName constructor multiple times. The

var seed = new List<XElement>();
// NB, both == and the Attribute
var example3 = from a in seed
               where a.Name == "Fred" && a.Attribute("Fred").Value == "Value"
               select a;

The following is a very simple optimisation, but one which saves “a bucket load” of CPU cycles (multiple calls to the XName constructor).

var seed = new List<XElement>();
// Form the XName once, and use it multiple time
XName Fred = "Fred";
var example3 = from a in seed
               where a.Name == Fred && a.Attribute(Fred).Value == "Value"
               select a;

Beware of || in Where Clauses

This one comes straight from my experience with SQL (both Oracle and SQL Server), and translates into what I’ve observed in the execution of LINQ statements. The or operator (||) in LINQ where clauses can cause very slow execution. .

For simple cases you can get a performance boost just by rewriting the LINQ statement as a Union between the two different sides of the or. I’ll  leave it to you to verify the result you get out are the same. Also, this is another optimisation which using the Stopwatch Class to test the performance of a before and after case. For example:

var seed = new List<XElement>();
XName Fred = "Fred";
XName Joe = "Joe";
var example4 = from a in seed
               where a.Attribute(Fred).Value == "test1" || a.Attribute(Joe).Value == "test2"
               select a;

could be rewritten as:

var example5 = (from a in seed
                where a.Attribute(Fred).Value == "test1"
                select a)
                    .Union(from a in seed
                           where a.Attribute(Joe).Value == "test2"
                           select a);

Conclusion

That’s 4 lessons I’ve learned in the last couple of days performance tuning my  DGML generating and manipulating program. The Stopwatch Class and Visual Studio Profiler have been invaluable in identifying the performance bottlenecks, or hot spots, in the system. The judicious redesign of some of the sequences to use Dictionary and Lookup classes, and joining the sequences on the keys of those collections, yielded big improvements in the performance.

LINQ seems to be “balanced on a knife edge” at time. Minor redesigns can result in the execution time going from minutes to seconds in execution time.

, , , , , , , , , , , , ,

Leave a comment

%d bloggers like this: