Posts Tagged Component Frameworks

Parallel Programming in .Net – Resources


Introduction

This may be the first of a series of blog posts on Parallel Programming with .Net 4.0. It is a “maybe” at this point in time, in that I intend to get into this topic over the coming weeks, but one never knows what else may crop up.

This started off being a look at the synchronised collections support in .Net 4.0, but I’ve decided that I should include some other topics. The other big inclusions are intended to be:

  • Parallel Looping constructs,
  • Parallel LINQ, which the parallel looping leads into, and
  • The Task Class addition to .Net 4.0 Framework.

When you include those, you pretty much have parallel programming in .Net 4.0 “in a nutshell”. This series may have the impetus to look into some of the more “esoteric”, “subtle” and all-round deeper issues associated with the development, implementation, testing, and deployment of parallel solutions in .Net. But, wait and see, I could come up with some “gems”, and “pearls of wisdom” on the way.
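As a rough taste of the three constructs listed above, the following is a minimal sketch (the class and method names are made up for this illustration) which computes the same sum of squares with a parallel loop, with Parallel LINQ, and with a Task.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

public static class ParallelNutshell
{
    // Parallel looping: sum the squares of 0..n-1 across multiple threads.
    public static long SumOfSquaresParallelFor(int n)
    {
        long total = 0;
        Parallel.For(0, n, i =>
        {
            // Interlocked keeps the shared total correct under concurrency.
            Interlocked.Add(ref total, (long)i * i);
        });
        return total;
    }

    // Parallel LINQ: the same calculation expressed as a query.
    public static long SumOfSquaresPlinq(int n)
    {
        return Enumerable.Range(0, n)
                         .AsParallel()
                         .Select(i => (long)i * i)
                         .Sum();
    }

    // Task: run the calculation on another thread and collect the result.
    public static long SumOfSquaresTask(int n)
    {
        Task<long> work = Task.Factory.StartNew(() =>
        {
            long total = 0;
            for (int i = 0; i < n; i++) total += (long)i * i;
            return total;
        });
        return work.Result; // blocks until the task has finished
    }

    public static void Main()
    {
        Console.WriteLine(SumOfSquaresParallelFor(1000));
        Console.WriteLine(SumOfSquaresPlinq(1000));
        Console.WriteLine(SumOfSquaresTask(1000));
    }
}
```

All three methods should agree on the result; which one is quickest will depend on the workload, so it is worth measuring rather than assuming.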

Resources

This is the start of a list of “good” resources on parallel programming in .Net. I’ll try and post further blog posts on the topic as I find more resources (I’ve some PDFs salted away on the topic as well, which I’ll try and dig up).

Parallel Programming with Microsoft .NET This is a MSDN Patterns and Practices publication. It contains very good information on parallel programming.

System.Collections.Concurrent Namespace This is the .Net namespace which contains some very useful “pre-cooked” collections which can be used in concurrent (parallel) programming. My preference is to always look at these collections first, and build my own concurrent collection as a last resort.

System.Threading.Tasks Namespace This is the .Net namespace which contains the Task object, and the associated infrastructure built into .Net for running parallel Tasks. Again, my preference is to look here first, and only resort to the Thread Class when things do not fit with the supplied .Net support.

Parallel Programming in the .NET Framework This is a “top level” page, which has links to much of the .Net Parallel support.

Threading in C# – Part 5 – Parallel Programming This looks to be very good on LINQ and the Parallel aspects which can be invoked in the LINQ context. There are also some details on the Task, and associated .Net Framework classes. I’ll be reading this very carefully (again, it will not be the first time I refer to these pages).

Patterns Of Parallel Programming This is a link to a PDF which contains “much” detail on C# .Net Parallel Programming. I’ve yet to digest this one (118 pages is a bit much for one sitting, I’ll have a couple of “bites at this cherry”).

This is just a start of a list of resources. I’ll post further lists of resources, as I dig into the topics I wish to cover.
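As a small illustration of the “pre-cooked” collections in the System.Collections.Concurrent Namespace mentioned above, here is a minimal sketch (the class and method names are hypothetical) which counts values from several threads without writing any locking code.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public static class ConcurrentCollectionSketch
{
    // Counts how many numbers in 0..n-1 fall into each remainder bucket (i % buckets).
    public static ConcurrentDictionary<int, int> CountByRemainder(int n, int buckets)
    {
        var counts = new ConcurrentDictionary<int, int>();
        Parallel.For(0, n, i =>
        {
            // AddOrUpdate inserts 1 for a new key, or increments the existing count.
            // No explicit lock is needed; the update delegate may be retried under contention.
            counts.AddOrUpdate(i % buckets, 1, (key, current) => current + 1);
        });
        return counts;
    }

    public static void Main()
    {
        ConcurrentDictionary<int, int> counts = CountByRemainder(1000, 4);
        foreach (var pair in counts)
        {
            Console.WriteLine("{0}: {1}", pair.Key, pair.Value);
        }
    }
}
```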


LINQ Performance Tuning: Using the LookUp Class


Introduction

In this blog post I hope to share with you the benefits of using the Lookup Class, and the ToLookup extension method (available on any IEnumerable<T>) for creating the Lookup Object.

The benefits of using this .Net Framework object can be a significant reduction in the time taken to execute LINQ statements. By significant, I mean that I reduced the elapsed time for execution of a program which makes heavy use of LINQ to Objects and LINQ to XML from 15+ minutes to seconds.

If inspiration grabs me I may include some benchmarking tests in this post, or in a subsequent post. I’m always a bit cautious of creating artificial test cases for things like this. Artificial test cases can be very misleading, as they really demonstrate the technique in the best light. The real test of any technique is not in a “test tube”, but in how it helps resolve real programming and performance (in this case) problems.

The Lookup Class – Overview

This is a very interesting animal in the .Net Framework. It has the following interesting features:

  • It is only created through a factory method, ToLookup, which is a LINQ extension method on IEnumerable<T>.
  • There is no public constructor for the class.
  • The Lookup Class is somewhere in between the Dictionary and List, in that it possesses properties of both classes. These properties include:
    • Like a Dictionary, it supports keyed access to the data. To use an analogy with SQL, the Lookup is like an indexed table.
    • Like a List, it supports storing as many objects of a “type” as possible (available memory, or the CPU architecture which the process is running under, being the limiting factor).
    • Unlike either List<T> or Dictionary<TKey, TValue>, one key can have multiple objects stored under it. So, it is very like a Dictionary<T1, List<T2>>. The implementation uses an IGrouping<T1, T2>, but the analogy holds true.
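The features above can be sketched in a few lines. This is only an illustration with made-up names and data, not code from the application discussed in this post.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LookupOverview
{
    public static ILookup<char, string> BuildLookup(IEnumerable<string> words)
    {
        // ToLookup is the route to a Lookup; the key here is the first letter.
        return words.ToLookup(word => word[0]);
    }

    public static void Main()
    {
        ILookup<char, string> lookup = BuildLookup(new[] { "apple", "avocado", "banana" });

        // Keyed access, as with a Dictionary, but one key can hold many values.
        foreach (string word in lookup['a'])
        {
            Console.WriteLine(word);
        }

        // A key that is not present yields an empty sequence rather than an exception.
        Console.WriteLine(lookup['z'].Count());
        Console.WriteLine(lookup.Contains('b'));
    }
}
```

The empty sequence for a missing key is one practical difference from a Dictionary, whose indexer throws when the key is absent.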

The Lookup Class – The Details

Internal Storage

The Lookup class is an interesting storage mechanism. The key features are:

  • The storage is by the key, which is the important aspect which is being leveraged when improving the performance of LINQ operations.
  • Unlike the Dictionary Class, the Lookup Class stores more than one object against the key.
  • Like the List Class, the set of objects stored against a key is not held in sorted order.
  • The opposite diagram is one way to visualise the way the Lookup Class stores the data. The important points to note are:
    • The Keys must be of the same type.
    • The Objects stored must be of the same type.
    • The order of the objects within a key is undetermined. This is very much the same as SQL tables, where the order of the rows is not guaranteed, unless you use an Order By clause (which imposes an order).
    • There can be any number of objects stored under each key.

Creating An Instance of the Lookup Class

There is not too much in the creation of a Lookup Object, apart from the caveats which are:

  • It can only be created as the output of a LINQ operation.
  • It is an immutable object structure. The Lookup object is effectively “read only”; there are no ways to mutate the content (add or remove elements, or change the key set).

The following example C# code shows two of the ways of creating a Lookup Object.

List<XElement> seed = new List<XElement>();
var example1 = (from element in seed
                select element)
               .ToLookup(A => A.Name);
var example2 = (from element in seed
                select new { key = element.Attribute("Test1").Value, value = element })
                    .Union(from element in seed
                           select new { key = element.Attribute("Test2").Value, value = element })
                    .ToLookup(A => A.key, A => A.value);

Example1 will result in a lookup indexed by the XName, containing an IGrouping of XElement for each key.

Example2 will result in a lookup indexed by the string value from the Attributes “Test1” and “Test2”, with the same XElement potentially falling under different key values.

There are variants of the ToLookup method which take an IEqualityComparer for the type of the Key. These, I must admit, I’ve not needed to use. My use cases for the Lookup Object have not included object types in the Key other than string, and the framework provided equality comparer for that has suited me fine (thus far).
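For completeness, here is a minimal sketch of one of those variants, using the framework supplied StringComparer.OrdinalIgnoreCase so that keys differing only by case fall into the same group. The file names and the ByExtension method are hypothetical, chosen just for the illustration.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class LookupWithComparer
{
    // Groups file names by extension, ignoring the case of the extension.
    public static ILookup<string, string> ByExtension(IEnumerable<string> fileNames)
    {
        return fileNames.ToLookup(
            name => Path.GetExtension(name),
            StringComparer.OrdinalIgnoreCase);
    }

    public static void Main()
    {
        ILookup<string, string> lookup = ByExtension(new[] { "a.CSV", "b.csv", "c.txt" });

        // Both "a.CSV" and "b.csv" are found under ".csv" because of the comparer.
        foreach (string name in lookup[".csv"])
        {
            Console.WriteLine(name);
        }
    }
}
```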

Limitations Of The Lookup Class

There are a number of limitations which the Lookup Class comes with. Many of these limitations have been mentioned in this blog post already, but I’ll put a list of them in here (just for completeness). The limitations include (and these are the ones I’ve found, or found written about):

  • No public constructor. The only way to create a Lookup Object is to use the ToLookup factory method, which is an extension method on the IEnumerable<T> interface. Or, if you prefer, the Lookup Object can only be created as the output from LINQ operations.
  • There are no mutators for the Lookup Object. As an output from LINQ, this output sequence is effectively read only. There are no Add, or Remove, methods available for the Lookup object. If you need a different set of content, you will need to generate that set through LINQ, and create another Lookup Object.
  • The IGrouping structure of the storage which the Lookup Class uses is a bit of a thing to deal with. There are a couple of ways to unravel this structure (I’ve found two thus far, but that’s not to say this is an exhaustive list):
      List<XElement> seed = new List<XElement>();
      var example1 = (from element in seed
                      select element)
                     .ToLookup(A => A.Name);
      // One way to unwrap the IGrouping
      var unwrap1 = example1.SelectMany(A=>A);
      // Another way to unwrap the IGrouping
      var unwrap2 = from step1 in example1
                    from unwrapped in step1
                    select unwrapped;
    • Unwrap1 uses the SelectMany method (see: LINQ SelectMany and IGrouping for some more on the use of SelectMany). The “A=>A” is required for the method, and simply says flatten the multiple input sequences (one for each key value) into one output sequence.
    • Unwrap2 uses a nested from clause in the LINQ syntax. If you’ve done anything with LINQ to XML you will be familiar with using intermediate sequences in LINQ syntax.

Using An Instance Of The Lookup Class In LINQ

This is where the “rubber hits the road”, or more to the point where big performance improvements can be made in the execution of LINQ statements.  As with anything performance related please test these techniques in the context of your application. The following is what I have observed in the context of my application. I’ve been using equal measures of the Stopwatch Class and the Visual Studio Profiler to measure the impacts of my application of the Lookup Class.

The following is the fastest way in LINQ to work with the Lookup object (the best way I’ve found thus far).

List<XElement> seed = new List<XElement>();
var example1 = (from element in seed
                select element)
               .ToLookup(A => A.Name);
var dict1 = new Dictionary<XName, XElement>();
var join_sample = from lut in example1
                  join dict in dict1 on lut.Key equals dict.Key
                  select lut;

The LINQ join clause seems (this is by observation of the impact in the Visual Studio Profiler) to resolve in the indexes of the Dictionary and Lookup objects. The analogy is with the way a SQL execution plan will resolve in the indexes of two tables which are being joined.
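If you want to try the Stopwatch approach on your own data, the following is a rough sketch of the kind of comparison involved. The Order type, the data, and the sizes are all invented for the illustration; it is not the code from my application, and the timings it prints will vary from machine to machine.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

public static class LookupTiming
{
    // Hypothetical record type, used only for this illustration.
    public sealed class Order
    {
        public int CustomerId { get; set; }
        public int Amount { get; set; }
    }

    public static List<Order> MakeOrders(int count, int customers)
    {
        return Enumerable.Range(0, count)
                         .Select(i => new Order { CustomerId = i % customers, Amount = i })
                         .ToList();
    }

    // Scans the whole list once per customer: roughly customers x orders comparisons.
    public static long TotalByScan(List<Order> orders, int customers)
    {
        long total = 0;
        for (int id = 0; id < customers; id++)
        {
            total += orders.Where(o => o.CustomerId == id).Sum(o => (long)o.Amount);
        }
        return total;
    }

    // Builds the Lookup once, then answers each customer by key.
    public static long TotalByLookup(List<Order> orders, int customers)
    {
        ILookup<int, Order> byCustomer = orders.ToLookup(o => o.CustomerId);
        long total = 0;
        for (int id = 0; id < customers; id++)
        {
            total += byCustomer[id].Sum(o => (long)o.Amount);
        }
        return total;
    }

    public static void Main()
    {
        List<Order> orders = MakeOrders(20000, 500);

        Stopwatch timer = Stopwatch.StartNew();
        long scanTotal = TotalByScan(orders, 500);
        timer.Stop();
        Console.WriteLine("Scan:   {0} ms", timer.ElapsedMilliseconds);

        timer.Restart();
        long lookupTotal = TotalByLookup(orders, 500);
        timer.Stop();
        Console.WriteLine("Lookup: {0} ms", timer.ElapsedMilliseconds);

        // Both approaches should produce the same answer.
        Console.WriteLine(scanTotal == lookupTotal);
    }
}
```

The point of the sketch is the shape of the work: the scan repeats a full pass for every key, while the Lookup pays for one pass up front and then goes straight to the matching group.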

Conclusion

The performance boost that LINQ operations can gain through the judicious application of the Lookup Object makes learning to master the use of the object well worthwhile (if you want fast code).

The DGML generating and pagination process I’m currently building has benefitted significantly from the judicious use of these objects. I’ve taken the process of generating multiple files of DGML from minutes (in the 10 to 15 minutes mark) to seconds (20 to 30 seconds).


CSV, ExpandoObject and LINQ


Introduction

I have been spurred to write this post for a number of reasons, which include:

  • As a response to JP’s Convert a CSV to XML using C#, and the idea that converting CSV to XML is a good way to handle (not so) arbitrary CSV data. I have to agree that it is a good way to handle the use case, but I had an inkling that using a Framework Object was another way to tackle it.
  • As a personal follow up on an MSDN Magazine Article, Expando Objects in C# 4 by Dino Esposito, which I read a while ago and had filed away, in the back of my mind, as an article that I should get back to and read again.
  • In part, because I wanted to have a “play” with the ExpandoObject Class in C#,  and
  • In part, because the idea of dynamic object, within the .Net Framework, was intriguing.

That’s where this post started. So, as you will see below, I’ve a quick bit of demonstration code which does some interesting things with the ExpandoObject Class.

CSV File Reading Detour

Along the way I was reminded of an important lesson about reading CSV files with .Net programs. That lesson is that, regardless of your preferred approach to parsing the lines of the CSV file (be that String.Split, Text.RegularExpressions.Regex, or hand coded parsing), the best way to read a CSV file is to use the built-in functionality in the .Net Framework.

So, what in .Net reads CSV files? There are two broad-brush areas in the .Net Framework which can read CSV files. These are the ODBC and OLEDB data access classes.

Technology | .Net Namespace | Classes
ODBC | System.Data.Odbc Namespace | OdbcDataAdapter Class, OdbcDataReader Class
OLEDB | System.Data.OleDb Namespace | OleDbDataAdapter Class, OleDbDataReader Class

I’ve included the classes which present the IDataReader Interface as well, because they can be very useful.

One of the uses of the IDataReader Interface is as an input into the SqlBulkCopy Class. The SqlBulkCopy Class is the way to load data into SQL Server with .Net. Why? It is a bulk loader, which is far more efficient than using a SQL Insert statement to load data into SQL Server. (I’ll put on my To Do list to post an example of using the SqlBulkCopy Class; I’ve one I wrote a while ago for a work project.)

Back To Reading CSV into an ExpandoObject

Back to reading the acronyms. The following demonstrates using the ExpandoObject as the container for the fields of a CSV file.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Dynamic;      // Needed for the ExpandoObject
using System.IO;           // Needed for the StreamReader
using System.Diagnostics;  // Needed for the Debug object, my standard dump location

namespace CSV_With_ExpandoObjects
{
    /// <summary>
    /// Default class generated for Console applications
    /// </summary>
    class Program
    {
        /// <summary>
        /// Main method, invoked by the execution framework.
        /// </summary>
        /// <param name="args">unused</param>
        static void Main(string[] args)
        {
            // Test file name to be loaded
            string FileName = @"..\..\..\Acronyms1.csv";
            // My CSV reader and load into ExpandoObject
            Acronyms acronyms = new Acronyms();
            List<ExpandoObject> acroynmList = acronyms.LoadCSV(FileName);
            // LINQ using ExpandoObjects
            // LINQ with some "fancy dancing" to cast the dynamic object in such a way that the properties can be used.
            // Not a very robust way of dealing with things, the cast expression could break if the property is not there.
            var a = from acroynm in acroynmList
                    where String.IsNullOrEmpty((string)((IDictionary<string, object>)acroynm)["Meaning"])
                    select (string)((IDictionary<string, object>)acroynm)["Acrynom"];
            // Just to check the LINQ statement worked
            Debug.WriteLine(String.Format("Acronyms without meanings {0}", a.Count()));
        }
    }
    /// <summary>
    /// Class which is used to read the CSV file of acronyms
    /// </summary>
    internal class Acronyms
    {
        /// <summary>
        /// CSV delimiters - Yes I know I'm using a very simple CSV form
        /// </summary>
        private char[] delimiters = { ',' };
        /// <summary>
        /// List of expando object created
        /// </summary>
        private List<ExpandoObject> loadedAcronyms;
        /// <summary>
        /// Default constructor
        /// </summary>
        public Acronyms()
        {
        }
        /// <summary>
        /// Method reads the filename as a CSV file, and creates the List of ExpandoObjects
        /// </summary>
        /// <param name="fileName">File of CSV to be read</param>
        /// <returns>List of ExpandoObject, one object for each line in the CSV file</returns>
        internal List<ExpandoObject> LoadCSV(string fileName)
        {
            List<string> Fields = ReadHeader(fileName);
            loadedAcronyms = ReadFileData(fileName, Fields);
            Debug.WriteLine(loadedAcronyms.Count);
            return loadedAcronyms;
        }
        /// <summary>
        /// Reads the data file of CSV.
        /// Skips the first line, as that has the header row.
        /// Very simple file read and parse the CSV.
        /// </summary>
        /// <param name="fileName">File of CSV to be read</param>
        /// <param name="Fields">The Header Row parsed to identify the columns</param>
        /// <returns>List of ExpandoObject, one object for each line in the CSV file</returns>
        private List<ExpandoObject> ReadFileData(string fileName, List<string> Fields)
        {
            List<ExpandoObject> result = new List<ExpandoObject>();
            using (StreamReader rdr = new StreamReader(fileName))
            {
                string line = rdr.ReadLine();
                while (!rdr.EndOfStream)
                {
                    line = rdr.ReadLine();
                    string[] tokens = line.Split(delimiters);
                    dynamic add = new ExpandoObject();
                    for (int i = 0; i < tokens.Length; i++)
                    {
                        if (i >= Fields.Count())
                            continue;
                        if (Fields[i] == "Acrynom")
                        {
                            add.Acrynom = tokens[i].Trim();
                            continue;
                        }
                        if (Fields[i] == "Meaning")
                        {
                            add.Meaning = tokens[i].Trim();
                            continue;
                        }
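                        ((IDictionary<string, object>)add).Add(new KeyValuePair<string, object>(Fields[i], tokens[i]));
                    }
                    // Demonstrates, in a limited context, that a Dynamic Object "plays"
                    //  like a real object with dynamically added properties.
                    // The checks run after the loop, so a line which is missing a column
                    //  gets an empty default instead of throwing when the property is read.
                    if (!((IDictionary<string, object>)add).ContainsKey("Acrynom"))
                    {
                        add.Acrynom = string.Empty;
                    }
                    if (!((IDictionary<string, object>)add).ContainsKey("Meaning"))
                    {
                        add.Meaning = string.Empty;
                    }
                    result.Add(add);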
                }
            }
            return result;
        }
        /// <summary>
        /// Reads the header line from a CSV file, and use that line as the field names for reading that file.
        /// </summary>
        /// <param name="fileName">Filename of the CSV to be read</param>
        /// <returns>List of strings, one string for each column in the header record</returns>
        private List<string> ReadHeader(string fileName)
        {
            string fieldList;
            using (StreamReader rdr = new StreamReader(fileName))
            {
                fieldList = rdr.ReadLine();
            }
            return fieldList.Split(delimiters, StringSplitOptions.RemoveEmptyEntries).Select(A=>A.Trim()).ToList();
        }
    }
}

Conclusion

An interesting way to read a CSV file. The ExpandoObject is probably not demonstrated in the best light (there are bound to be better ways to use it).
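One possibly tidier variation, offered as a sketch rather than a recommendation, is to type the LINQ range variable as dynamic so the query can use the properties directly instead of casting to IDictionary each time. The column names “Acronym” and “Meaning”, and the MakeRow helper, are assumptions made for this illustration and are not taken from the CSV file used above.

```csharp
using System;
using System.Collections.Generic;
using System.Dynamic;
using System.Linq;

public static class ExpandoQuerySketch
{
    // Builds an ExpandoObject from parallel lists of field names and values.
    public static ExpandoObject MakeRow(IList<string> fields, IList<string> values)
    {
        var row = new ExpandoObject();
        var members = (IDictionary<string, object>)row;
        for (int i = 0; i < fields.Count; i++)
        {
            // Columns with no value on this line default to an empty string.
            members[fields[i]] = i < values.Count ? values[i].Trim() : string.Empty;
        }
        return row;
    }

    // Typing the range variable as dynamic lets the query use row.Meaning directly.
    public static List<string> WithoutMeaning(IEnumerable<ExpandoObject> rows)
    {
        var query = from dynamic row in rows
                    where string.IsNullOrEmpty((string)row.Meaning)
                    select (string)row.Acronym;
        return query.ToList();
    }

    public static void Main()
    {
        var fields = new[] { "Acronym", "Meaning" };
        var rows = new List<ExpandoObject>
        {
            MakeRow(fields, new[] { "CSV", "Comma Separated Values" }),
            MakeRow(fields, new[] { "TBD" })
        };
        foreach (string acronym in WithoutMeaning(rows))
        {
            Console.WriteLine(acronym);
        }
    }
}
```

The query still fails at run time if a row lacks one of the properties, which is why MakeRow fills every column with a default.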

Reading CSV, as I commented above, should be done with the .Net Framework technology when robust reading is required. For quick or one-off bits of code with simple parsing requirements, the do-it-yourself approach is OK.

