LINQ Short Takes – Number 4 – Make Union into a UnionAll


Introduction

One thing that has exasperated and puzzled me for some time is the absence of a UnionAll from the default Linq implementation. Granted, the pure relational algebra that was the basis of the initial SQL implementations only defined Union. Modern SQL implementations have since added the Union All variant; SQL Server, for example, has UNION ALL (see: UNION (Transact-SQL)). Why the designers and implementers of Linq left out a UnionAll variant remains a mystery to me. Maybe it is because it is relatively easy to make the Union implementation operate in the way UnionAll operates.

The Solution – Design

The Linq Union extension method comes in two overloads. One of these overloads (see: Enumerable.Union(TSource) Method (IEnumerable(TSource), IEnumerable(TSource), IEqualityComparer(TSource)) (System.Linq)) takes an implementation of the IEqualityComparer Interface as an argument.

The way to get the Union extension method to operate like a UnionAll is to provide an implementation of this interface that says ‘everything is different’. This causes the distinct phase of the Union to keep everything, rather than following the default behaviour of throwing away the second, and any subsequent, occurrence of a duplicate.

The Solution – Implementation – EverythingDifferentEqualityComaperer Class

There is not much to the implementation. The key is the understanding that you want nothing to be equal, and hence all inputs to appear in the output. An Equals method that returns false in all cases achieves this requirement.

using System;
using System.Collections.Generic;

/// <summary>
/// A class which implements the IEqualityComparer Interface.
/// The implementation makes all elements not the same.
/// Currently, this class is used in the UnionAll extension method
/// to ensure that all rows are preserved.
/// </summary>
/// <typeparam name="TSource">The type of objects being compared.</typeparam>
internal class EverythingDifferentEqualityComaperer<TSource>
    : IEqualityComparer<TSource>
{
    /// <summary>
    /// Public constructor for the class.
    /// </summary>
    public EverythingDifferentEqualityComaperer( )
    {

    }

    #region IEqualityComparer<TSource> Members

    /// <summary>
    /// This method says 'everything is different'
    /// </summary>
    /// <param name="x">One of the objects of <typeparamref name="TSource"/> to be compared.
    /// The implementation ignores the object.</param>
    /// <param name="y">One of the objects of <typeparamref name="TSource"/> to be compared.
    /// The implementation ignores the object.</param>
    /// <returns>false for all values.</returns>
    bool IEqualityComparer<TSource>.Equals(TSource x, TSource y)
    {
        return false;
    }

    /// <summary>
    /// Returns the hash code of the object.
    /// </summary>
    /// <param name="obj">The object of <typeparamref name="TSource"/>.</param>
    /// <returns>Hash code of the object.</returns>
    int IEqualityComparer<TSource>.GetHashCode(TSource obj)
    {
        if (Object.ReferenceEquals(obj, null))
            return 0;
        return obj.GetHashCode( );
    }

    #endregion
}

The Solution – Implementation – UnionAll Extension Method

The following is the implementation of the UnionAll. It simply creates an instance of the EverythingDifferentEqualityComaperer class and calls the Linq Union extension method.

/// <summary>
/// This method implements the UnionAll extension method.
/// This extension method results in all of the rows in <paramref name="Source1"/>
/// having all the rows in <paramref name="Source2"/> appended.<br/>
/// This method makes use of the Linq extension method
/// <seealso cref="System.Linq.Enumerable.Union{TSource}(IEnumerable{TSource}, IEnumerable{TSource}, IEqualityComparer{TSource})"/>
/// with an implementation of the
/// <seealso cref="System.Collections.Generic.IEqualityComparer{T}"/>
/// </summary>
/// <typeparam name="TSource">
/// The type of objects in both <paramref name="Source1"/> and <paramref name="Source2"/>.
/// </typeparam>
/// <param name="Source1">
/// The input sequence which is used as the first set of rows in the output.
/// These rows are of <typeparamref name="TSource"/> type.
/// </param>
/// <param name="Source2">
/// The input sequence which is used as the second set of rows in the output.
/// These rows are of <typeparamref name="TSource"/> type.
/// </param>
/// <returns>An output IEnumerable of type <typeparamref name="TSource"/>
/// that contains all the rows from
/// <paramref name="Source1"/> and <paramref name="Source2"/>.
/// </returns>
/// <remarks>
/// This implementation uses the Linq extension method
/// <seealso cref="System.Linq.Enumerable.Union{TSource}(IEnumerable{TSource}, IEnumerable{TSource}, IEqualityComparer{TSource})"/>
/// with an implementation of the
/// <seealso cref="System.Collections.Generic.IEqualityComparer{T}"/>
/// that makes all objects not equal. This forces the distinct phase of
/// the union process to maintain all of the rows in the inputs.
/// </remarks>
public static IEnumerable<TSource> UnionAll<TSource>(
    this IEnumerable<TSource> Source1,
    IEnumerable<TSource> Source2)
{
    // The generic type argument is required on the declaration as well as on the constructor call.
    EverythingDifferentEqualityComaperer<TSource> AllDifferent =
        new EverythingDifferentEqualityComaperer<TSource>( );
    return Source1.Union(Source2, AllDifferent);
}
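
To see the effect, the following is a minimal usage sketch. It repeats a compact stand-in for the comparer and the extension method so that it compiles on its own; the UnionAllDemo class and its sample arrays are assumptions made for illustration only. It also prints the result of the built-in Enumerable.Concat, which keeps every element of both inputs as well, as a point of comparison.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Compact stand-in for the comparer shown above, repeated so the sketch is self-contained.
internal class EverythingDifferentEqualityComaperer<TSource> : IEqualityComparer<TSource>
{
    public bool Equals(TSource x, TSource y) { return false; }

    public int GetHashCode(TSource obj)
    {
        return ReferenceEquals(obj, null) ? 0 : obj.GetHashCode();
    }
}

public static class LINQ_Extension_Methods
{
    public static IEnumerable<TSource> UnionAll<TSource>(
        this IEnumerable<TSource> Source1, IEnumerable<TSource> Source2)
    {
        return Source1.Union(Source2, new EverythingDifferentEqualityComaperer<TSource>());
    }
}

// Hypothetical demo class; the sample data is made up for illustration.
public static class UnionAllDemo
{
    public static void Run()
    {
        int[] first = { 1, 2, 2, 3 };
        int[] second = { 3, 4 };

        Console.WriteLine(string.Join(",", first.Union(second)));    // duplicates removed
        Console.WriteLine(string.Join(",", first.UnionAll(second))); // duplicates kept
        Console.WriteLine(string.Join(",", first.Concat(second)));   // built-in, duplicates kept
    }
}
```

Where only the 'keep everything' behaviour is wanted, Concat is the more direct route, since it never builds the internal set that Union uses to look for duplicates.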

The Source Code

The following URLs contain the source code presented above. The UnionAll code needs to be in a static class with a declaration like the following (the class name is something you can choose):

    public static class LINQ_Extension_Methods 

https://craigwatson1962.files.wordpress.com/2012/02/linq-short-takes-4-unionall-code.docx
https://craigwatson1962.files.wordpress.com/2012/02/linq-short-takes-4-unionall-code.pdf

Why It Works

Supplying an object that implements the IEqualityComparer Interface allows you to control how the distinct phase of the standard Union Linq extension method operates. This is the Union built into the .Net Framework (see the MSDN documentation: Overview of the .NET Framework, and Enumerable.Union(TSource) Method (IEnumerable(TSource), IEnumerable(TSource), IEqualityComparer(TSource)) (System.Linq)). The default behaviour of the Union extension method is to keep only the distinct values from the input sources. The way I have found to make Union operate like a UnionAll is to provide an implementation of this interface that says ‘everything is different’. Because no two elements ever compare as equal, the distinct phase never finds a duplicate to discard, and so it keeps all input rows from both sources.

I suspect that the designers, and implementers, of the Linq extensions to the .Net Framework had a very different intention for the IEqualityComparer argument to the Union extension method. That intention, I speculate, would have been to allow users to implement the semantic meaning of equality for user-defined objects. The use of an IEqualityComparer implementation to ‘switch off’ the distinct process in the Union extension method would, I speculate, come as a bit of a surprise to them.
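
The role of the comparer can be illustrated with a simplified sketch of a hash-set based union. This is not the framework's source code; it only assumes that the distinct phase asks the supplied comparer whether an element has been seen before. The names SketchUnion and NeverEqual are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical comparer that never reports two elements as equal.
public sealed class NeverEqual<T> : IEqualityComparer<T>
{
    public bool Equals(T x, T y) { return false; }

    public int GetHashCode(T obj)
    {
        return ReferenceEquals(obj, null) ? 0 : obj.GetHashCode();
    }
}

public static class UnionSketch
{
    // Illustration only: the general shape of a union with a distinct phase.
    public static IEnumerable<T> SketchUnion<T>(
        IEnumerable<T> first, IEnumerable<T> second, IEqualityComparer<T> comparer)
    {
        // The set is the 'distinct phase'; Add returns false when the comparer
        // reports that an equal element is already present.
        var seen = new HashSet<T>(comparer);

        foreach (T item in first)
            if (seen.Add(item)) yield return item;

        foreach (T item in second)
            if (seen.Add(item)) yield return item;
    }
}
```

With the default comparer, the second occurrence of a value fails the Add call and is dropped. With a comparer whose Equals always returns false, Add never finds a match, so every element is yielded.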

Future Versions

The UnionAll implementation, for me now, is all that I want. Hence, I do not foresee making any changes or enhancements to the implementation. My satisfaction with the implementation is not entirely complete, though: I may include some argument null checking later. I will leave that for my experimentation with the Unit Testing Framework.

This implementation does highlight, for me at least, another gap in the standard Linq implementation. This gap is an extension method that appends a single value to an existing IEnumerable, which would seem like a natural next step.
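
As a rough sketch of what such an extension method might look like, the following appends one value to a sequence and includes the kind of argument null checking mentioned above. The name AppendValue and the class AppendSketch are hypothetical. Newer versions of .NET ship a built-in Enumerable.Append, so the different name also avoids clashing with it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class AppendSketch
{
    // Hypothetical extension method: returns the source sequence followed by one extra value.
    public static IEnumerable<TSource> AppendValue<TSource>(
        this IEnumerable<TSource> source, TSource value)
    {
        // Check eagerly, so the exception is thrown at the call site
        // rather than when the result is first enumerated.
        if (source == null) throw new ArgumentNullException("source");
        return AppendIterator(source, value);
    }

    private static IEnumerable<TSource> AppendIterator<TSource>(
        IEnumerable<TSource> source, TSource value)
    {
        foreach (TSource item in source)
            yield return item;
        yield return value;
    }
}
```

Splitting the method in two is a common pattern for iterator methods: a method containing yield return does not run any of its body until enumeration starts, so the null check lives in a separate, non-iterator wrapper.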

Conclusions

The implementation of the UnionAll extension method demonstrates that some of the standard Linq extension methods are very flexible. The flexibility of the standard extension methods allows bending them into implementations of new, and very useful, transformations.

The more I develop extension methods, the more I appreciate this addition to the C# Language. The combination of generics (see MSDN Documentation: Introduction to Generics (C# Programming Guide) and Generic Type Parameters (C# Programming Guide)) and extension methods results in very powerful code, and code which saves the developer ‘miles of code’.


C# Short Takes – 1 – XML Comments syntax for the cref attribute to a Generic Type Method


Introduction

For the last couple of blog posts I have been using the XML Documentation feature of C# and Sandcastle to generate chm help files. This has resulted in help for the methods I have been developing being available in many parts of the Visual Studio environment. Enabling IntelliSense and context-sensitive help has proved to be quite useful, and is a good thing to include with the source code I have been posting on the blog.

Things have been going quite well with this approach, until I ran into a problem with the <see/> tag and the cref attribute. This tag generates a hyperlink within the output files, after Sandcastle has processed them. These generated hyperlinks to methods in the documentation were a feature I wanted to enable in a couple of places within the generated documentation.

The Problem

The situation which caused me a headache was trying to make a pair of <see/> tag elements in the XML Documentation, each of which referenced one of a pair of generic methods. The method signatures were:

public static IEnumerable<TSource> ToIEnumerable<TSource>(TSource val)
public static IEnumerable<TSource> ToIEnumerable<TSource>(TSource val1, TSource val2)

The Solution

After trying a number of things and searching on the net, I discovered the following question on Stack Overflow: C#, XML-Doc: Refering to a generic type of a generic type in C# XML documentation?. The last answer to the question is the one that gave me the way to solve the problem.

Simply, what is required is the XML escaping of the < and > in the function name. For a full (I would expect) list of the XML escape sequences see: List of XML and HTML character entity references.

The resulting references in the <see/> tag end up as:

    /// <see cref="ToIEnumerable&lt;TSource&gt;(TSource)"/>
    /// <see cref="ToIEnumerable&lt;TSource&gt;(TSource, TSource)"/>
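
Put in context, the two tags might sit in the documentation comments as in the sketch below. The method bodies and the enclosing class name ToIEnumerableSketch are assumptions for illustration; only the signatures and the cref values come from the text above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ToIEnumerableSketch
{
    /// <summary>
    /// Wraps a single value in a sequence. For the two-value form see
    /// <see cref="ToIEnumerable&lt;TSource&gt;(TSource, TSource)"/>.
    /// </summary>
    /// <typeparam name="TSource">The type of the value.</typeparam>
    /// <param name="val">The value to wrap.</param>
    /// <returns>A sequence containing only <paramref name="val"/>.</returns>
    public static IEnumerable<TSource> ToIEnumerable<TSource>(TSource val)
    {
        yield return val;
    }

    /// <summary>
    /// Wraps two values in a sequence. For the single-value form see
    /// <see cref="ToIEnumerable&lt;TSource&gt;(TSource)"/>.
    /// </summary>
    /// <typeparam name="TSource">The type of the values.</typeparam>
    /// <param name="val1">The first value.</param>
    /// <param name="val2">The second value.</param>
    /// <returns>A sequence containing both values, in argument order.</returns>
    public static IEnumerable<TSource> ToIEnumerable<TSource>(TSource val1, TSource val2)
    {
        yield return val1;
        yield return val2;
    }
}
```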

Reasoning Why It Works

This makes a degree of logical sense when you consider the resulting context of this attribute. The C# compiler translates the XML Documentation into an XML file. Within an XML file, an attribute or element which contains a < or > character needs to have those characters translated into their XML escape equivalents. Thus, the target of the link will be in an XML escaped form, and the reference to the target should use the same XML escape sequences.

A Visual Studio Handy Hint

There is a built-in paste function, Edit.PasteAlternate. The blog post ‘What is Paste Alternate?’ describes what this function does. If you use this to paste a function signature from a C# file into an html file, you get the prototype with XML escapes included, plus a bunch of html. Using this paste variant may prove useful when you need a quick way of generating XML escapes.

In my version of Visual Studio, this function was not bound to a keystroke combination. You can use the Tools -> Options menu items to bring up the dialogue box that allows setting the keystroke combination. The keystroke combinations are set in the Environment -> Keyboard section of this dialogue box.
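
Another way to generate the escapes, offered here as an alternative sketch rather than anything related to the Visual Studio function above, is to let the framework do the escaping in a scratch program. The sketch assumes the SecurityElement.Escape method from the System.Security namespace; the class and method names CrefEscapeSketch and ToCref are hypothetical.

```csharp
using System;
using System.Security;

public static class CrefEscapeSketch
{
    // Escapes the characters that are not allowed inside an XML attribute value,
    // such as < and >, so that the signature can be pasted into a cref attribute.
    public static string ToCref(string signature)
    {
        return SecurityElement.Escape(signature);
    }
}
```

Calling CrefEscapeSketch.ToCref("ToIEnumerable<TSource>(TSource)") would give the escaped form used in the <see/> tags earlier in this post.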

Conclusions

I trust that, having read this far, you have found this blog post useful, or that it has helped solve your XML Documentation problems.


LINQ Short Takes – Number 3 – LINQ over Multiple Dimension Arrays and Lists


Introduction

This blog post presents an alternative approach to using LINQ against List(Of T) and multidimensional arrays (and other types). This approach builds on the preceding blog posts LINQ Short Takes – Number 1 – Enumerable.Range() and LINQ Short Takes – Number 2 – Using Method Syntax to Create a Cartesian Product.

An Alternative Approach to Multiple Dimension Arrays and Lists

The core of this approach is the use of two features of the C# language. These features are:

  1. The indexer which is available on some object types (see MSDN Article: Indexers (C# Programming Guide)), and the indexed access which array objects support. In the case of object collections which do not support the IList(Of T) interface, the LINQ extension method ElementAt provides an equivalent, and very usable, alternative.
  2. The use of LINQ to generate the index values that are used against the collection objects supporting indexers, and array objects.

The use of LINQ to generate the index values into collections or arrays is not the approach that developers normally use with these objects. This approach has benefits in terms of clarity of the resulting code. This approach is only applicable to certain classes of problems that a developer may encounter.

Example Notes:

There are a couple of points to note about the presented examples:

  1. I have used my LINQ extension method ToOutput to produce a dump of the result sequences. This extension method was the subject of my blog post Dumping a formatted IEnumerable to Output. That blog post also includes the source code for the extension method. Additionally, there are links to docx and pdf files of the source code.
  2. The source code for the methods in the following examples is available in docx and pdf files from the following URLs:
    https://craigwatson1962.files.wordpress.com/2012/02/linq-over-multiple-dimension-arrays-and-lists.docx
    https://craigwatson1962.files.wordpress.com/2012/02/linq-over-multiple-dimension-arrays-and-lists.pdf

Examples of Approach on List Collections

The following are LINQ query syntax examples of using LINQ to generate and enumerate the index into a List. Using the array, or indexer, syntax to access the elements of the List(Of T) Class makes calculating the difference between adjacent dates in the list trivial. The pure LINQ way to achieve this type of calculation between members of a List is not as simple, clear, or concise, as this approach.

private void LINQ_ALternative_List()
{
    List<DateTime> Dates = new List<DateTime>()
    {
        new DateTime(2012,1,1), new DateTime(2012,2,29),
        new DateTime(2012,8,13), new DateTime(2012,9,10)
    };
    var DaysGaps = from idx in Enumerable.Range(0, Dates.Count-1)
                   select Dates[idx + 1] - Dates[idx];
    Debug.WriteLine("Dumping DaysGaps");
    DaysGaps.ToOutput(FormatFunction:
        (val, position) => string.Format("[{0}]={1}\n", position, val.ToString("%d")));

    var DaysGaps1 = from idx in Enumerable.Range(0, Dates.Count - 1)
                    select (Dates[idx + 1] - Dates[idx]).Days;
    Debug.WriteLine("Dumping DaysGaps1");
    DaysGaps1.ToOutput(FormatFunction:
        (val, position) => string.Format("[{0}]={1}\n", position, val));
    return; // Allows a breakpoint at the end of the method.
}
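
For comparison, the sketch below puts the index-based query next to one possible pure LINQ formulation that pairs each date with its successor using Zip and Skip. The class and method names are hypothetical, and the results are returned as lists instead of being dumped with ToOutput, so that the sketch stands alone. The Math.Max guard is an addition: Enumerable.Range throws if the count is negative, which is what Count - 1 would be for an empty list.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DateGapSketch
{
    // Index-based approach: generate the indexes, then use the List indexer.
    public static List<int> GapsByIndex(List<DateTime> dates)
    {
        return (from idx in Enumerable.Range(0, Math.Max(dates.Count - 1, 0))
                select (dates[idx + 1] - dates[idx]).Days).ToList();
    }

    // Pure LINQ alternative: pair each element with the one that follows it.
    public static List<int> GapsByZip(List<DateTime> dates)
    {
        return dates.Zip(dates.Skip(1), (earlier, later) => (later - earlier).Days)
                    .ToList();
    }
}
```

Both methods produce the same gaps. Which one reads more clearly is a matter of taste; the index-based form generalises more easily to calculations that look further than one element ahead.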

Example of Approach on Dictionary Collection

The following example demonstrates the use of ElementAt method to achieve the same days difference calculation as the preceding example. This example is manipulating the values in the Value property of the KeyValuePair contained in a Dictionary(Of TKey, TValue) Class.

private void LINQ_Alternative_Dictionary()
{
    Dictionary<int, DateTime> Dict1 = new Dictionary<int, DateTime>()
    {
        {1, new DateTime(2012, 1,  1)}, {5, new DateTime(2012, 3, 15)},
        {7, new DateTime(2012, 4, 21)}, {9, new DateTime(2012, 12, 25)}
    };
    var a = Dict1.ElementAt(2);
    var DateDiffs = from idx in Enumerable.Range(0, Dict1.Count - 1)
                    select (Dict1.ElementAt(idx + 1).Value - Dict1.ElementAt(idx).Value).Days;
    Debug.WriteLine("Dumping DateDiffs");
    DateDiffs.ToOutput(FormatFunction:
        (val, position) => string.Format("[{0}]={1}\n", position, val));

    return; // Allows a breakpoint at the end of the method.
}
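
One caveat with positional access into a Dictionary: the order in which it enumerates its entries is not something the documentation guarantees, and ElementAt has to walk the collection on every call because Dictionary does not support IList. The following sketch, with hypothetical class and method names, sorts the entries by key once and then applies the same index-based calculation to the resulting list.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class DictionaryGapSketch
{
    public static List<int> GapsByKeyOrder(Dictionary<int, DateTime> dict)
    {
        // Materialise the values in key order once, so that positions are well
        // defined and each lookup is a direct index rather than a fresh enumeration.
        List<DateTime> ordered = dict.OrderBy(pair => pair.Key)
                                     .Select(pair => pair.Value)
                                     .ToList();

        return (from idx in Enumerable.Range(0, Math.Max(ordered.Count - 1, 0))
                select (ordered[idx + 1] - ordered[idx]).Days).ToList();
    }
}
```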

Examples of Approach on Multidimensional Arrays

The following is an example of using LINQ query syntax to access the elements of a 2-dimensional array. The code forms a Cartesian product between the sequences idx1 and idx2. This Cartesian product is used to access each of the elements of the array Multi1. The example creates an anonymous type (see MSDN article: Anonymous Types (C# Programming Guide) for further information) which contains the indexes and array element value.

There are a couple of other notable points:

  1. The use of the Array.GetLength Method to discover the number of elements that are required in the array index sequences. The use of the Array.GetLength Method results in code that is more robust.
  2. This example demonstrates accessing a 2-dimensional array. A similar pattern will work for any number of array dimensions.
  3. Arrays with more than one dimension do not support the IEnumerable(Of T) Interface. This results in multidimensional arrays not being directly usable as a LINQ data source, either natively or through the implementation of the .Net Framework (see MSDN article: Overview of the .NET Framework for further information). If you attempt to use a multidimensional array as a data source (in a from clause), the result is the error CS1935. The following is the full error message:
    error CS1935: Could not find an implementation of the query pattern for source type ‘int[*,*]’. ‘Select’ not found. Are you missing a reference to ‘System.Core.dll’ or a using directive for ‘System.Linq’?

private void LINQ_ALternative_Array()
{
    int[,] Multi1 = new int[,]
    {
        {1,2,3}, {4,5,6}, {7,8,9}
    };
    var dump1 = from idx1 in Enumerable.Range(0, Multi1.GetLength(0))
                from idx2 in Enumerable.Range(0, Multi1.GetLength(1))
                select new { idx1, idx2, val = Multi1[idx1, idx2] };
    Debug.WriteLine("Dumping dump1");
    dump1.ToOutput(FormatFunction:
        (val, position) => string.Format("[{0}] [{1},{2}]={3}\n", position, val.idx1, val.idx2, val.val));

    return; // Allows a breakpoint at the end of the method.
}
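
The same pattern extended to three dimensions might look like the sketch below, with one from clause and one GetLength call per dimension. The class and method names are hypothetical. The second method shows a different route that is sometimes enough: arrays do implement the non-generic IEnumerable, so Cast can turn a multidimensional array into a flat sequence when the index positions are not needed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ArrayIndexSketch
{
    // One Enumerable.Range per dimension; the Cartesian product visits every cell.
    public static List<string> Dump3D(int[,,] cube)
    {
        var cells = from i in Enumerable.Range(0, cube.GetLength(0))
                    from j in Enumerable.Range(0, cube.GetLength(1))
                    from k in Enumerable.Range(0, cube.GetLength(2))
                    select string.Format("[{0},{1},{2}]={3}", i, j, k, cube[i, j, k]);
        return cells.ToList();
    }

    // Flattening alternative: Cast<int>() enumerates the elements without their indexes.
    public static int SumAll(int[,] grid)
    {
        return grid.Cast<int>().Sum();
    }
}
```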

Conclusion

I hope that this blog post and the preceding two blog posts in this series LINQ Short Takes – Number 1 – Enumerable.Range() and LINQ Short Takes – Number 2 – Using Method Syntax to Create a Cartesian Product have added something useful to your kit bag of LINQ tools and techniques.

The approach of working with collections and arrays through an index (or indexes) is applicable to a particular class of problems, and it is one that I have not seen used elsewhere. That is not to say that I have conducted an exhaustive search of the web for other examples of this approach.

For those readers who have read LINQ Short Takes – Number 2 – Using Method Syntax to Create a Cartesian Product, and are interested: I indicated in that blog post that I would think about an extension method implementation of a general solution to the creation of Cartesian products using the Enumerable.Join Method. Well, I wrote that extension method this morning, and will ‘clean it up’, write some XML documentation, and post the solution on this blog sometime soon.
