Posts Tagged Linq to SQL

LINQ to SQL Use static CompiledQuery.Compile and static DataContext for BIG Performance Improvements


Introduction

This blog post is describes some of the steps I took to boost the performance of LINQ to SQL an applications (very significantly).

This “train of thought” was prompted by a blog post here on WordPress “LINQ To SQL Very Slow Performance Without Compile (CompileQuery)”. Which “got me thinking” about the data manipulation program I’ve been developing which transforms unstructured text (following a sort of pattern) into structured data, which I’m rendering using DGML (see: “Introduction to Directed Graph Markup Language ( DGML )” for a brief introduction to DGML). I probably post some of the details for generating DGML graphs on this blog (later). This program is using LINQ to SQL to do a bunch of lookups a various points. It had become a bit of a slow process, so was a perfect candidate for a “bit of a tune up” performance wise, and for the applications of some CompiledQuery.Compile enhancements.

The Implementation

There were a couple of steps in the implementation:

  • Move the declaration, and initialisation of the LINQ to SQL DataContext object (which is derived from the System.Data.Linq.DataContext Class ) from the method to a private static field on the class.
    private class DataElementsProcessor
    {
        private static DataElementsDataContext DEctx = new DataElementsDataContext();
  • Next create the declaration of the compiled query. For those reading this who are not familiar with parameterised classes and lambda expressions, I’ll try and explain what is happening in the declaration following the code snippets.
    • What is being replaced (the original LINQ to SQL query statement):
      var found = from de in DEctx.Data_Elements
                  where de.DE_Name == name
                  select de;
    • The replacement declaration:
    • private static Func<DataElementsDataContext, string, IEnumerable>
          GetDeByName = CompiledQuery.Compile((DataElementsDataContext DEctx, string name) =>
                          from de in DEctx.Data_Elements
                          where de.DE_Name == name
                          select de);
    • To explain the declaration:
      • private static: private because only this class should use this, static because I only want one of these available to all instances of the class.
      • Func> : This declares the type which the compiled LINQ to SQL Expression is. It encapusulates a delegate or lambda expression which takes the first two types as parameters and returns the sequence of DataELement objects.
      • GetDeByName: is the name of the delegate being created.
      • CompiledQuery.Compile: the method on which compiles the LINQ to SQL query (which takes the types identified in the Func< part of the declaration) and returned the compiled delegate.
      • (DataElementsDataContext DeCtx, string name) => : Identifies the types and names of the arguments to lambda expression.
      • from .. : the LINQ to SQL query which will be compiled.

Calling the Compiled Version

This is where it gets “tricky” (I’ll cover the “tricky” bits after the simple cases). The simple use cases are straight replacements:

  • Replace the query.
  • //var found = from de in DEctx.Data_Elements
    //            where de.DE_Name == name
    //            select de;
    var found = GetDeByName(DEctx, name);
  • Replace a bit more:
    • Which I think “works” a bit better. There is now no need for the variable “found” in the segment of code, and just testing the Any() on the sequence is all I wanted to do anyway.
  • //var found = GetDeByName(DEctx, name);
    // if(!found.Any())
    if (!GetDeByName(DEctx, name).Any())
    {
    
    }
  • The “tricky” bits:
    • The Sequence you get at runtime is a “iterate once only” sequence. So, the following code will throw an “System.InvalidOperationException” in System.Data.Linq.dll, with the message “<Message>The query results cannot be enumerated more than once.</Message>. In my code the situations where I’d do things which caused this type of exception was like following:
    • if (!found.Any())
      {
      }
      else
      {
          Debug.WriteLine("Rejected update already existed");
          // need to add this when I added changed to a compiled qu
          //found = GetDEByName(DECtx, DE);
          found.ToList().ForEach(A => Debug.WriteLine(String.Format("{0} {1}", A.DE_Name, A.DE_Classification)));
      }
    • This would break with the exception. The “remedy” I’ve applied has been to use a “ToList()“ to pull the results into a collection in one iteration, and then have the flexibility to iterate over the collection as much as I like. This approach cost a bit of memory, but these have been single row result sets, so there is not too much of a penalty.
      • There are probably other, more elegant, ways to “solve” this problem. But, as part of a quick upgrade to get a BIG performance improvement, they would have been more expensive.
    • // Need to pull the result set into a local list
      var found = GetDEByName(DECtx, DE).ToList();
      if (!found.Any())
      {
      }

The Results

On the program which I was working, the following table is the before, and after results.

Run 1 Run 2 Run 3 Run 4
Base Line No Compiles 114451 115143
All Queries Compile and Static Data Contexts 56906 58120 57177 56515
Percentage Improvement (start – end)/start 50.19077

This is a significant improvement in performance for what amounted to “not too much work”. The hard part was getting “my head around” the Func<> template type declaration. The numbers are from the Stopwatch Class and are ElapsedMilliseconds. The stopwatch object is the only way (superior to all other methods) to get elapsed times for executing objects in .Net.

Conclusion:

  • A vote of thanks to author of “Er. alokpandey’s Blog” for writing the article: “LINQ To SQL Very Slow Performance Without Compile (CompileQuery)”, without which I would not been prompted to implement this way of improving the performance of the program (an many more in the future I expect).
  • The conversion is not painless, if you do not understand what the Func<> is doing. Once you see what that piece of syntax is doing, then things start to fall into place.
  • I did not expect such a big gain in performance from the implementation of this change and I achieved. But, I’d not probed the code to discover where the time was being expended.
  • Apologies if you do not like the code snippet approach this post has taken. Most of the time I try and post code segments which can be used without too much modification. This time the topic is really about statements (declarations of things in the code and the syntax), rather being about “a way of doing something” (which a method can illustrate). So, I’ve used snippets of code to demonstrate the syntax used.
  • Again I yearn for the flexibility which Windows Live Spaces gave, where I could put file of code up as well as a blog post. Something which WordPress does not offer, so code snippets ibid is the solution which I’ve adopted.
Advertisements

, , , , , , , , , , , ,

1 Comment

Linq To SQL – A bit of fun – Cataloguing my CD Collection


I’ve been having a bit of fun with Linq to SQL and building some programs which will generate a catalogue of the CD’s I own. This journey has provided a couple of interesting lessons along the way. The most prominent I’ll share here.

Data Model

image I’ve settled on a very simple data model for the relational schema, which I’m storing in SQL Server Express (which seems to work without problems). The relational data model maps into the following Objects when using the ORM (Object Relational Mapping) features in Visual Studio.

The data model has one big deficiency, it is designed to cater for my musical tastes which are modern (Artists do Albums). If one wanted a “classical” focused, or accommodating, data model, then there are a bunch of fields one would add to the Artist (Conductor maybe as a synonym for FullName in my model). To tell the truth, I’ve no idea what would need to be added. I would need to do some research, and talk with a classical music audiophile.

Building the data model in Visual Studio is a snack. The relational (SQL) project gives all the functionality to build tables, define keys, define the referential integrity constraints (foreign keys). The Linq to SQL class object, is a snack to to work with, just drop the tables from the Server Explorer onto the design surface (the links get sucked up from the relational database tables.

Loading the Data

image

I’ve explored beforehand how Windows Media Player stores albums in the file system. You have a structure as shown. Where the “Shared Music” is the root directory from which all Artists (as sub directories), then Albums (as sub directories) below the Artist, are stored.

The major functional code in this part of the process is in the following two C# methods.

        private List<DirDetail> ReadDirectory(string rootDir)
        {
            List<DirDetail> retVal = new List<DirDetail>();
            foreach (string artistDir in Directory.GetDirectories(rootDir))
            {
                retVal.AddRange(ReadAlbums(rootDir, artistDir)); 
            }
            return retVal;
        }

        private List<DirDetail> ReadAlbums(string rootDir, string artistDir)
        {
            List<DirDetail> retVal = new List<DirDetail>();
            foreach (string albumDir in Directory.GetDirectories(artistDir))
            {
                DateTime created = Directory.GetCreationTime(albumDir);
                ArtistAlbum Details = new ArtistAlbum(albumDir, rootDir);
                retVal.Add(new DirDetail(Details.Artist, Details.Album, created)); 
            }
            return retVal;
        }

The rest of the support object are pretty simple data container object. The use of the List<DirDetails> and AddRange is a particularly elegant way of acuminating, in a type safe manner, a result list of things which need to be squirted into the SQL data tables.

The following two methods show how Linq to SQL makes loading to the database tables so simple.

        internal void Update(List<DirDetail> loadValues)
        {
            Guid unfiledLocation = GetUnfiledLocation();
            foreach (DirDetail LoadAlbum in loadValues)
            {
                Guid ArtistID = GetArtistID(LoadAlbum.Artist);
                InsertAlbum(unfiledLocation, ArtistID, LoadAlbum.Album, LoadAlbum.Loaded);
            }
        }

        private void InsertAlbum(Guid unfiledLocation, Guid ArtistID, string LoadAlbum, DateTime LoadDateTime)
        {
            var AlbumQuery = from Albums in ctx.Albums
                                 where Albums.Title == LoadAlbum 
                                 && Albums.ArtistID == ArtistID
                                 select Albums.AlbumID;
            foreach (var LoadedAlbumID in AlbumQuery)
                return;
            Album newAlbum = new Album();
            newAlbum.AlbumID = Guid.NewGuid();
            newAlbum.ArtistID = ArtistID;
            newAlbum.DateLoadedToLibrary = LoadDateTime;
            newAlbum.LocationID = unfiledLocation;
            newAlbum.Title = LoadAlbum;
            ctx.Albums.InsertOnSubmit(newAlbum);
            ctx.SubmitChanges();
        }

The “foreach/return” is a lazy way of  checking that the Album has not been loaded previously. Propagating the insert by just filling up the properties on the generated class and calling the “InsertOnSubmit”, “SubmitChanges” methods is just too simple. This old programmer appreciates the simplicity. No more building SQL statement, building parameters, getting the data binding right, many lines of code I did not have to write.

 

Next in this series – Using Linq to SQL in a Winforms UI (I promise I’ll write it, there are a couple things which took a bit of figuring out in this end of Linq to SQL)

Digg This

, , , , ,

Leave a comment

%d bloggers like this: