Some LINQ Performance “Rules of thumb”


Use indexed classes

Use indexed classes or, more precisely, structures that support keyed access (the Dictionary class and the Lookup class). These structures behave like indexed SQL tables, whereas a List behaves like an unindexed table, and that analogy seems to hold true with LINQ.

There is a tipping point with both SQL and LINQ, where the cost of establishing the indexed storage is less than the performance improvement you gain from using it. With LINQ you will have to find that point through experimentation with your own workload. Use the Stopwatch class to measure the impact of one strategy over the other.
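As a rough illustration of that kind of measurement, here is a minimal sketch that times a linear scan of a List against a Dictionary built from the same data. The sample data, sizes and variable names are made up for the example, and it is written with top-level statements (C# 9 or later); the timings will vary from machine to machine.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

// Hypothetical sample data: 20,000 (id, name) pairs.
var items = Enumerable.Range(0, 20_000)
    .Select(i => new KeyValuePair<int, string>(i, "item" + i))
    .ToList();
// Probe 2,000 ids spread across the list (all of them exist).
var probes = Enumerable.Range(0, 2_000).Select(i => i * 10).ToList();

// Strategy 1: scan the unindexed List for every probe.
var sw = Stopwatch.StartNew();
int listHits = probes.Count(id => items.Any(kv => kv.Key == id));
sw.Stop();
TimeSpan listTime = sw.Elapsed;

// Strategy 2: pay once to build the index, then probe by key.
sw.Restart();
var index = items.ToDictionary(kv => kv.Key, kv => kv.Value);
int dictHits = probes.Count(id => index.ContainsKey(id));
sw.Stop();
TimeSpan dictTime = sw.Elapsed;

Console.WriteLine($"List scan:  {listHits} hits in {listTime.TotalMilliseconds:F1} ms");
Console.WriteLine($"Dictionary: {dictHits} hits in {dictTime.TotalMilliseconds:F1} ms (including build)");
```

The Dictionary timing deliberately includes the cost of building the index, because that is the cost the tipping point is about. With only a handful of probes the List scan may well win.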

The performance impact of using indexed data structures is particularly marked when you are doing joins (equijoins) between sequences in LINQ. Think seriously about feeding the sequences into a structure which supports a key, and then joining on the keys.
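A small sketch of what feeding sequences into keyed structures can look like. The Node and Link elements and their Id and Source attributes are invented for the example (loosely DGML-flavoured), not taken from my program, and the code uses top-level statements (C# 9 or later).

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Form the XNames once, outside the queries.
XName id = "Id";
XName source = "Source";

// Hypothetical data: nodes, and links that refer to nodes by Id.
var nodes = new[]
{
    new XElement("Node", new XAttribute(id, "a")),
    new XElement("Node", new XAttribute(id, "b")),
    new XElement("Node", new XAttribute(id, "c")),
};
var links = new[]
{
    new XElement("Link", new XAttribute(source, "a")),
    new XElement("Link", new XAttribute(source, "a")),
    new XElement("Link", new XAttribute(source, "c")),
};

// Build the keyed structures once. A Dictionary needs unique keys;
// a Lookup allows several elements per key.
var nodesById = nodes.ToDictionary(n => n.Attribute(id).Value);
var linksBySource = links.ToLookup(l => l.Attribute(source).Value);

// Join on the keys: each probe into the Lookup is a hash lookup, not a scan.
// Indexing a Lookup with a missing key yields an empty sequence.
var linkCounts = nodesById.ToDictionary(
    pair => pair.Key,
    pair => linksBySource[pair.Key].Count());

foreach (var pair in linkCounts)
    Console.WriteLine($"{pair.Key}: {pair.Value} outgoing link(s)");
```

A Lookup suits the "many" side of a relationship, and a Dictionary suits the side where each key appears exactly once.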

Use the Join Syntax When Performing an Equijoin

Use the join syntax in LINQ. By inspection (or divination), this syntax:

var seed = new List<XElement>();
var seq1 = seed.Select(A => new { key = A.Name, value = A })
    .ToLookup(A => A.key, A => A.value);
var seq2 = seed.Select(A => new { key = A.Name, value = A })
    .ToLookup(A => A.key, A => A.value);
var example1 = from a in seq1
               join b in seq2 on a.Key equals b.Key
               select a;

is “better” (it can be significantly faster) than this syntax:

var seed = new List<XElement>();
var seq1 = seed.Select(A => new { key = A.Name, value = A })
    .ToLookup(A => A.key, A => A.value);
var seq2 = seed.Select(A => new { key = A.Name, value = A })
    .ToLookup(A => A.key, A => A.value);
var example2 = from a in seq1
               from b in seq2
               where a.Key == b.Key
               select a;

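I am not sure exactly what is happening under the hood, but the elapsed execution times are far better when using the join syntax. Part of the improvement probably comes from the fact that in the second version the string “==” operator fires many more times (that’s what the Visual Studio Profiler said). That fits how LINQ to Objects works: a join clause calls Enumerable.Join, which builds a hash lookup over the inner sequence once, while two from clauses with a where compare every pair of elements.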
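To make the difference visible, here is a minimal sketch that writes both queries in method syntax and counts how often the comparison runs in the from/from/where version. The key strings and sequence sizes are invented for the example, and the code uses top-level statements (C# 9 or later).

```csharp
using System;
using System.Linq;

// Hypothetical keys: two sequences of 300 matching strings.
var left = Enumerable.Range(0, 300).Select(i => "k" + i).ToList();
var right = Enumerable.Range(0, 300).Select(i => "k" + i).ToList();

// Version 1: two from clauses and a where. The compiler turns this into
// SelectMany followed by Where, so the predicate runs once per pair.
int pairTests = 0;
var crossMatches = left
    .SelectMany(a => right, (a, b) => new { a, b })
    .Where(p => { pairTests++; return p.a == p.b; })
    .Count();

// Version 2: the join clause maps to Enumerable.Join, which hashes the
// inner sequence once and then probes it once per outer element.
var joinMatches = left.Join(right, a => a, b => b, (a, b) => a).Count();

Console.WriteLine($"from/from/where: {crossMatches} matches after {pairTests} comparisons");
Console.WriteLine($"join:            {joinMatches} matches");
```

Both versions find the same matches, but the first one runs the comparison once for every pair of elements, so its cost grows with the product of the two sequence lengths.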

Beware of Implicit Constructors Firing Multiple Times

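This is another one which the Visual Studio Profiler highlighted. The following syntax will fire the XName constructor multiple times, because each string literal is implicitly converted to an XName every time the where clause is evaluated.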

var seed = new List<XElement>();
// NB: both the == comparison and the Attribute call convert "Fred" to an XName
var example3 = from a in seed
               where a.Name == "Fred" && a.Attribute("Fred").Value == "Value"
               select a;

The following is a very simple optimisation, but one which saves “a bucket load” of CPU cycles (multiple calls to the XName constructor).

var seed = new List<XElement>();
// Form the XName once, and use it multiple times
XName Fred = "Fred";
var example3 = from a in seed
               where a.Name == Fred && a.Attribute(Fred).Value == "Value"
               select a;

Beware of || in Where Clauses

This one comes straight from my experience with SQL (both Oracle and SQL Server), and translates into what I’ve observed in the execution of LINQ statements. The or operator (||) in LINQ where clauses can cause very slow execution.

For simple cases you can get a performance boost just by rewriting the LINQ statement as a Union between the two different sides of the or. I’ll leave it to you to verify that the results you get out are the same; note that Union also removes duplicate elements. This is another optimisation where you should use the Stopwatch class to test the performance of the before and after cases. For example:

var seed = new List<XElement>();
XName Fred = "Fred";
XName Joe = "Joe";
var example4 = from a in seed
               where a.Attribute(Fred).Value == "test1" || a.Attribute(Joe).Value == "test2"
               select a;

could be rewritten as:

var example5 = (from a in seed
                where a.Attribute(Fred).Value == "test1"
                select a)
                    .Union(from a in seed
                           where a.Attribute(Joe).Value == "test2"
                           select a);

Conclusion

That’s four lessons I’ve learned in the last couple of days while performance tuning my DGML generating and manipulating program. The Stopwatch class and the Visual Studio Profiler have been invaluable in identifying the performance bottlenecks, or hot spots, in the system. The judicious redesign of some of the sequences to use the Dictionary and Lookup classes, and joining the sequences on the keys of those collections, yielded big improvements in performance.

LINQ seems to be “balanced on a knife edge” at times. Minor redesigns can take the execution time from minutes to seconds.
