Archive for February, 2012

C# Short Takes – 2 – String from IEnumerable


Introduction

I am posting this quick tip in response to a question, which seems to on some people’s minds who are reading my blog. The search terms ‘create string from ienumerable’ appear as hits on my blog statistics quite frequently. The way I create a string from an IEnumerable is in the Example Code section below.

The Method

The method I use to create a string from an IEnumerableis to use the string constructor that accepts a char[] (see the MSDN documentation: String Constructor (Char()) ). To get the IEnumerable into a char[] I use the Linq Extension Method ToArray() (see the MSDN documentation: Enumerable.ToArray(Of TSource) Method).

These Linq method call and string constructor results in the code in the ‘Example Code’ section.

The Example Code

The following shows a number of ways to create an IEnumerableobject. These IEnumerableobjects are then converted into char[] object. The char [] objects are then used in the string constructor.

private void String_Create_From_IEnumerable_char( )
{
    string TestString = "The quick red fox jumped over the lazy brown cow";
    IEnumerable<char> EnumerableChar= TestString.AsEnumerable();
    string Result1 = new string(EnumerableChar.ToArray( ));
    char[] TestCharArray = new char[] { 'T', 'h', 'e', ' ', 't', 'e', 's', 't', ',', 'v', 'a', 'l', 'u', 'e', '.' };
    string Result2 = new string(TestCharArray);
    IEnumerable<char> EnumerableChar1 = TestCharArray.AsEnumerable( );
    string Result3 = new string(EnumerableChar1.ToArray());
    IEnumerable<char> EnumerableChar2 = LINQ_Extension_Methods.LINQ_Extension_Methods.ToIEnumerable('A')
        .UnionAll(LINQ_Extension_Methods.LINQ_Extension_Methods.ToIEnumerable('B'))
        .UnionAll(LINQ_Extension_Methods.LINQ_Extension_Methods.ToIEnumerable('C'))
        .UnionAll(LINQ_Extension_Methods.LINQ_Extension_Methods.ToIEnumerable('D'))
        .UnionAll(LINQ_Extension_Methods.LINQ_Extension_Methods.ToIEnumerable('E'));
    string Result4 = new string(EnumerableChar2.ToArray( ));
    Func<char, IEnumerable<char>> LocalToIEnumerable1 = LINQ_Extension_Methods.LINQ_Extension_Methods.ToIEnumerable<char>;
    IEnumerable<char> EnumerableChar3 = LocalToIEnumerable1('Z')
        .UnionAll(LocalToIEnumerable1('Y')).UnionAll(LocalToIEnumerable1('X')).UnionAll(LocalToIEnumerable1('W'))
        .UnionAll(LocalToIEnumerable1('V')).UnionAll(LocalToIEnumerable1('U')).UnionAll(LocalToIEnumerable1('T'));
    string Result5 = new string(EnumerableChar3.ToArray( ));
    Func<char, char, IEnumerable<char>> LocalToIEnumerable2 = LINQ_Extension_Methods.LINQ_Extension_Methods.ToIEnumerable<char>;
    IEnumerable<char> EnumerableChar4 = LocalToIEnumerable2('A', ' ')
        .UnionAll(LocalToIEnumerable2('B', ' ')).UnionAll(LocalToIEnumerable2('C', ' '))
        .UnionAll(LocalToIEnumerable2('D', ' ')).UnionAll(LocalToIEnumerable2('E', ' '))
        .UnionAll(LocalToIEnumerable2('F', ' ')).UnionAll(LocalToIEnumerable2('G', ' '))
        .UnionAll(LocalToIEnumerable2('H', ' ')).UnionAll(LocalToIEnumerable2('I', ' '));
    string Result6 = new string(EnumerableChar4.ToArray( ));
    return;
}

Methods Referenced That Are Not In The.Net Framework

Method Blog Post That Describes The Method
ToIEnumerable LINQ Extension Method to Generate n-way Cartesian Product
UnionAll LINQ Short Takes – Number 4 –Make Union into a UnionAll

Conclusion

I trust that those readers who have been looking for a solution to this transformation find this blog post helpful answers the question you had.

, , , , , , , ,

3 Comments

LINQ Extension Method to Generate n-way Cartesian Product


Introduction

This blog post presents a LINQ extension method that implements the generation of Cartesian productfor any number of input sequences.

This implementation builds on the approach that I posted in LINQ Short Takes – Number 2 – Using Method Syntax to Create a Cartesian Product. Specifically, it uses the Enumerable Join Method to join the input IEnumerable sequences into the resulting Cartesian product.

Special Votes of Thanks and Referenced Articles:

Eric Lippert for Computing a Cartesian Product with LINQ, and
Ian Griffiths for LINQ and N-ary Cartesian Products

Whilst, the solution to the problem I present here is not the same as these articles present, these articles did push me down some of the correct paths. One thing that these articles helped me with was getting the ‘shape’ of the returned Cartesian product ‘right’ (correct by my judgement). The ‘shape’ I settled on was an IEnumerable, or a sequence of elements, each element being an IEnumerable with on element from each of the input IEnumerable objects.

The CartesianProduct Extension Method

In the following sections, I present the design, implementation, and some sample code, that describes the development process resulting in the CartesianProduct extension method.

CartesianProduct – Design Goals

There were numerous of design features that I wished to include in the implementation of CartesianProduct method. These features included:

  • The method must create a Cartesian product. This is a mandatory design criterion.
  • The order of the resulting Cartesian product elements must be in the same order as the input sequences. This may well be a defining attribute of a Cartesian product, but I include this characteristic as a design goal anyway. This is a mandatory design criterion.
  • The returned set of Cartesian product elements should be as generic as possible. This is a mandatory design criterion.
  • The returned set of Cartesian product elements should be easily processed using Linq. This is a mandatory design criterion.
  • The method’s implementation must be as an extension method. This is a mandatory design criterion.
  • The method’s implementation must be an extension method that operates like any other Linq extension method (see: IEnumerable(Of T) Interface). This allows the users of the extension method to chain it within a set of other calls to Linq extension methods calls. This allows the user to assemble Linq transformations that they require. This is a mandatory design criterion.
  • The method should accept, as generically as possible, the set of sequences that used to generate the Cartesian product. This is a mandatory design criterion.
  • The method should use the Linq Join extension method to build the Cartesian product. If had to resort to the Linq Aggregate extension method, then the two articles I have referenced above demonstrate that approach. Plagiarising those works, and representing there content here, was not an acceptable outcome. This was a highly desirable, leaning heavily to mandatory, design criterion.

CartesianProduct – Function Signature

The following is the function signature for the CartesianProduct extension method. Many features of the function signature directly address the design goals.

public static IEnumerable<IEnumerable<TSource>> CartesianProduct<TSource>
    (this IEnumerable<TSource> source,
    IEnumerable<IEnumerable<TSource>> inputs)
  • The return type IEnumerable<IEnumerable<TSource>> addresses the design goals for a generic return type, and a return type which can be processed with Linq. If it makes it easier to understand, you can interpret this type as an outer list of inner lists, each inner list being list made up of type the TSource objects. The following diagram attempts to show graphically what this ‘looks like’.

Figure 1 – IEnumerable of IEnumerable of T

Cartesian_Product_Output

 

  • The CartesianProduct<TSource> part of the method signature identifies this method as a parameterised type method (see MSDN article: Generic Methods (C# Programming Guide) for further information).declaration serves the design goal of requiring the method to accept as generically as possible inputs, and produce a generic output.
  • The argument this IEnumerable<TSource> source serves a couple of the design goals. Primarily, it flags this method as an extension method that should be attached, or associated, with any object that implements the IEnumerable(of T) Interface. Secondly, this argument identifies the first sequence used to assemble the Cartesian product.
  • The argument IEnumerable<IEnumerable<TSource>> inputs identifies the set of sequences to be processed to form the output Cartesian product. As with the return type, a loose interpretation is an outer list that contains inner list of objects. This satisfies the any number of input sequences design goal.

CartesianProduct – Implementation

The following is the implementation of the CartesianProduct extension method. I will detail many of the significant features after the source code.

public static IEnumerable<IEnumerable<TSource>> CartesianProduct<TSource>
    (this IEnumerable<TSource> source,
    IEnumerable<IEnumerable<TSource>> inputs)
{
    // Build the remaining work to be done var RemaimingJoinTarget = inputs.Skip(1);
    // Recursion end condition, no more to be done. if (RemaimingJoinTarget.Count( ) == 0)
    {
        // At the end of the inputs, return the Cartesian of two elements return source.Join(inputs.First( ),
            (outer) => 1, (inner) => 1,
            (outer, inner) =>
                ToIEnumerable(outer).UnionAll(ToIEnumerable(inner))
            //(new TSource[2] {outer, inner}).AsEnumerable<TSource>() );
    }
    else {
        // Recursive call down the list of inputs. // When coming back up add the source to the results already built. return source.Join(
            inputs.First( ).CartesianProduct(RemaimingJoinTarget),
            (outer) => 1, (inner) => 1,
            (outer, inner)
                => ToIEnumerable(outer).UnionAll(inner));
    }
}

The following are some of the significant features of the implementation:

  • Firstly, let me say that I find this implementation aesthetically pleasing. It is short, and concise, deceptively simple.
  • This implementation satisfies the design gaol of using the Linq Join extension method to form the Cartesian product.
  • This implementation satisfies the design gaol of does preserve of the order of the input sequences and reproduces that order in output Cartesian product.
  • This implementation satisfies the design gaol of being generic and catering for any input type. There are some nasty hoops the caller needs to jump through when mixing objects a with C# value types. Although, it may be nasty it does work. I have an example that shows what is required.
  • This is an implementation uses recursion, see the call to CartesianProduct in the Join contained in the else case. The version of recursion that is utilised here is tail recursion. The removal an element from the inputs parameter at each level of recursion controls the recursion. This technique of removing an element from a control sequence until nothing is left is an approach to controlling recursion I have used frequently.
  • There are two overloads of the helper function ToIEnumerable. These simply wrap an IEnumerable around the objects passed in. Using these functions allows the simple assembly of the result IEnumerable of objects with the Linq Union extension method.
  • The Cartesian product is formed within the Linq Join extension method. The same technique as described in LINQ Short Takes – Number 2 – Using Method Syntax to Create a Cartesian Product of setting the inner and outer join keys to 1.
  • The UnionAll LINQ extension method is described in LINQ Short Takes – Number 4 –Make Union into a UnionAll. The source code for the UnionAll extension method is in this blog post as well.

CartesianProduct – Helper Methods

There are two helper methods used in the implementation. These helper methods support the implementation of the CartesianProduct extension method. The source code for the helper methods is below.

public static IEnumerable<TSource> ToIEnumerable<TSource>(TSource val)
{
    yield return val;
}

And

public static IEnumerable<TSource> ToIEnumerable<TSource>(TSource val1, TSource val2)
{
    yield return val1;
    yield return val2;
}

There is one point worth noting about these methods. The implementation may not be the most efficient way to achieve the goal. The use of the yield return results in C# compiler implementing a ‘bunch of code’ under the covers. My justification for the implementation using yield return is simple. It results in very simple code for anyone who needs to maintain, or modify, the code.

CartesianProduct – Source Code

The following URL’s have the full source code, including XML Comments, for CartesianProduct and the two overloads of IEnumerable.
https://craigwatson1962.files.wordpress.com/2012/02/cartesianproduct-source-c-ode.docx
https://craigwatson1962.files.wordpress.com/2012/02/cartesianproduct-source-c-ode.pdf

Referenced LINQ Methods

The following are the Linq extension methods that are utilised in the implementation of the CartesianProduct extension method.

Join<(Of <<‘(TOuter, TInner, TKey, TResult>)>>)(IEnumerable<(Of <<‘(TOuter>)>>), IEnumerable<(Of <<‘(TInner>)>>), Func<(Of <<‘(TOuter, TKey>)>>), Func<(Of <<‘(TInner, TKey>)>>), Func<(Of <<‘(TOuter, TInner, TResult>)>>))

Skip<(Of <<‘(TSource>)>>)(IEnumerable<(Of <<‘(TSource>)>>), Int32)

Union<(Of <<‘(TSource>)>>)(IEnumerable<(Of <<‘(TSource>)>>), IEnumerable<(Of <<‘(TSource>)>>))

First<(Of <<‘(TSource>)>>)(IEnumerable<(Of <<‘(TSource>)>>))

CartesianProduct – Example Code

The following code assembles a range of different Cartesian products, and dumps them to the debug output window.

private void CartesianProducts_Extensions( )
{
    IEnumerable<int>[] SequsInt1 = new IEnumerable<int>[]
    {
        Enumerable.Range(0,5)
    };
    IEnumerable<int>[] SequsInt2 = new IEnumerable<int>[]
    {
        Enumerable.Range(0,5),
        Enumerable.Range(0,3)
    };
    IEnumerable<int>[] SequsInt3 = new IEnumerable<int>[]
    {
        Enumerable.Range(0,10),
        Enumerable.Range(0,5),
        Enumerable.Range(0,3)
    };
    //IEnumerable<IEnumerable<object>> ExpandedSequs1 = // LINQ_Extensions.ToIEnumerable(Enumerable.Range(0, 10).Cast<IEnumerable<object>>()); string[] SampleStrings = new string[]
    { "The", "quick", "red", "fox", "jumped",
        "over", "the", "lazy", "brown", "cow" };

    Action<IEnumerable<int>> IntDumper = (InnerSequence) =>
        {
            int column = 0;
            foreach (int innerVal in InnerSequence)
            {
                Debug.Write(string.Format("[{0}]={1}, ", column, innerVal));
                ++column;
            }
            Debug.WriteLine("");
        };

    var OuterInSource = Enumerable.Range(0, 10);
    var IntProd1 = Enumerable.Range(0, 10).CartesianProduct(SequsInt1);
    DumpCartesianProduct(IntProd1,
        (InnerSequence) =>
        {
            int column = 0;
            foreach (int innerVal in InnerSequence)
            {
                Debug.Write(string.Format("[{0}]={1}, ", column, innerVal));
                ++column;
            }
            Debug.WriteLine("");
        }
        );

    var IntProd1Vers2 = OuterInSource.CartesianProduct(SequsInt1);
    DumpCartesianProduct(IntProd1Vers2, IntDumper);

    var IntProd2 = Enumerable.Range(0, 10).CartesianProduct(SequsInt2);
    DumpCartesianProduct(IntProd2, IntDumper);

    var IntProd2Vers2 = OuterInSource.CartesianProduct(SequsInt2);
    DumpCartesianProduct(IntProd2Vers2, IntDumper);

    var IntProd3 = Enumerable.Range(0, 10).CartesianProduct(SequsInt3);
    DumpCartesianProduct(IntProd3, IntDumper);

    var IntProd3Vers2 = OuterInSource.CartesianProduct(SequsInt3);
    DumpCartesianProduct(IntProd3Vers2, IntDumper);

    // <IEnumerable<object>> Action<IEnumerable<object>> objDumper = (InnerSequence) =>
    {
        int column = 0;
        foreach (object innerVal in InnerSequence)
        {
            Debug.Write(string.Format("[{0}]={1}, ", column, innerVal));
            ++column;
        }
        Debug.WriteLine("");
    };
    var objStrings = SampleStrings.Cast<object>( );
    var objTransSequs1 = SequsInt1.ToList( ).Select(a => a.Select(b => (object) b));
    var MixedObjectProduct = objStrings.CartesianProduct(objTransSequs1);
    DumpCartesianProduct<object>(MixedObjectProduct, objDumper);

    var MixedObjectProduct2 = SampleStrings.CartesianProduct(objTransSequs1);
    DumpCartesianProduct<object>(MixedObjectProduct2, objDumper);

    IEnumerable<IEnumerable<object>> MixedSource =
        new IEnumerable<object>[]
        {
            from intVal in Enumerable.Range(0,5) select (object) intVal,
            from intVal in Enumerable.Range(0,5) select (object) new DateTime(2012, 1, 1).AddDays(intVal)
        };
    var MixedObjectProduct3 = SampleStrings.CartesianProduct(MixedSource);
    DumpCartesianProduct<object>(MixedObjectProduct3, objDumper);

    return;
}

private void DumpCartesianProduct<TSource>(
    IEnumerable<IEnumerable<TSource>> ResultSequence,
    Action<IEnumerable<TSource>> DumpLogic = null)
{
    if (DumpLogic != null)
    {
        foreach (IEnumerable<TSource> outputSeq in ResultSequence)
        {
            DumpLogic(outputSeq);
        }

    }
    return;
}

Possible Changes, Enhancements, Additional Overloads, Or Extra Features

I believe that additional versions of the CartesianProduct could be useful. These additional versions include:

· The creation of a variant of the Cartesian product method that accepts a variable number of arguments, as opposed to IEnumerable<IEnumerable<TSource> argument, could be useful.

Conclusion

For an implementation which started out as a ‘is it possible to do?’ question, the resulting implementation is quite effective and usable.

, , , , , , , , , , ,

1 Comment

LINQ Short Takes – Number 4 –Make Union into a UnionAll


Introduction

One thing that has exasperated and puzzled me for some time is the absence of a UnionAll from the Linq default implementation. Granted pure relational algebra that was the basis of the initial SQL implementations only defined Union. Now modern SQL implementations have had the UnionAll variant added. SQL Server has the Union All (see: UNION (Transact-SQL)). Why the designers and implementers of Linq ignored the UnionAll variant remains a mystery. Maybe it is because it is relatively easy to make Union implementation operate in the way UnionAll operates.

The Solution – Design

The Linq Union extension method comes in two overloads, one of these overloads (see: Enumerable.Union(TSource) Method (IEnumerable(TSource), IEnumerable(TSource), IEqualityComparer(TSource)) (System.Linq)) takes an IEqualityComparer Interface.

The way to get the Union extension method to operate like a UnionAll is to provide an implementation of this interface that says ‘everything is different’. This causes the distinct phase of the Union to keep everything, rather than default behaviour of throwing way the second, and subsequent, of any duplicate.

The Solution – Implementation – EverythingDifferentEqualityComaperer Class

There is not much to the implementation. Simply explained, the understanding that you want nothing to be equal, and hence have all inputs in the output is the key to this implementation. The Equals method returning false in all cases achieves this requirement.

/// <summary> /// A class which implements the IEqualityComparer Interface. /// The implementation makes all elements not the same. /// Currently, this is class used in the UnionAll extension method /// to achieve the all rows preserved. /// /// <typeparam name="TSource">The type of objects being compared.typeparam> internal class EverythingDifferentEqualityComaperer<TSource>
    : IEqualityComparer<TSource>
{
    /// <summary> /// Public constructor for the class. /// </summary> public EverythingDifferentEqualityComaperer( )
    {

    }

    #region IEqualityComparer<TSource> Members
    /// <summary> /// This method says 'everything is different' /// </summary> /// <param name="x">One of the objects of <typeparamref name="TSource"/> to be compared. /// The implementation ignores the object.</param> /// <param name="y">One of the objects of <typeparamref name="TSource"/> to be compared. /// The implementation ignores the object.</param> /// <returns>false for all values.</returns> bool IEqualityComparer<TSource>.Equals(TSource x, TSource y)
    {
        return false;
    }
    /// <summary> /// Returns the hash code of the object. /// </summary> /// <param name="obj">The object of <typeparamref name="TSource"/> /// Hash code of the object.</returns> int IEqualityComparer.GetHashCode(TSource obj)
    {
        if (Object.ReferenceEquals(obj, null))
            return 0;
        return obj.GetHashCode( );
    }

    #endregion }

The Solution – Implementation – UnionAll Extension Method

The following is the implementation of the UnionAll. It simply creates an instance of the EverythingDifferentEqualityComaperer class and calls the Linq Union extension method.

/// <summary> /// This method implements the UnionAll extension method. /// This extension method results in all of the rows in <paramref name="Source1"/> /// having all the rows in <paramref name="Source2"/> appended.<br/> /// This method makes use of the Linq extension method /// <seealso cref= /// "System.Linq.Enumerable.Union(IEnumerable , /// IEnumerable , IEqualityComparer )"/> /// with and implementation of the /// <seealso cref="System.Collections.Generic.IEqualityComparer< T>"/> /// ///  /// The type of objects in both <paramref name="Source1"/> and <paramref name="Source2"/>. /// </typeparam> /// <param name="Source1"> /// The input sequence which is used as the first set of rows in the output. /// These rows are of <typeparamref name="TSource"/> type. /// </param> /// <param name="Source2"> /// The input sequence which is used as the second set of rows in the output. /// These rows are of <typeparamref name="TSource"/> type. /// </param> /// <returns>An output IEnumerable of type <typeparamref name="TSource"/> /// that contains all the rows from /// <paramref name="Source1"/> and <paramref name="Source2"/>. /// </returns> /// <remarks> /// This implementation used the Linq extension method <seealso cref= /// "System.Linq.Enumerable.Union<TSource>(IEnumerable<TSource> , /// IEnumerable<TSource> , IEqualityComparer<TSource> )"/> /// with and implementation of the /// <seealso cref="System.Collections.Generic.IEqualityComparer< T>"/> /// that makes all objects not equal. This forces the distinct phase of /// the union process to maintain all of the rows in the inputs. /// </remarks> public static IEnumerable<TSource> UnionAll<TSource>(
    this IEnumerable<TSource> Source1,
    IEnumerable<TSource> Source2)
{
    EverythingDifferentEqualityComaperer AllDifferent =
        new EverythingDifferentEqualityComaperer<TSource>( );
    return Source1.Union(Source2, AllDifferent);
}

The Source Code

The following URLS contain the source code presented above. The UnionAll code needs to be in a class with the following attributes (the class name is something you can choose):

    public static class LINQ_Extension_Methods 

https://craigwatson1962.files.wordpress.com/2012/02/linq-short-takes-4-unionall-code.docx
https://craigwatson1962.files.wordpress.com/2012/02/linq-short-takes-4-unionall-code.pdf

Why It Works?

The supplying an object that implements the IEqualityComparer Interface allows you to control the way that the distinct phase of the standard (built-in into the .Net Framework [see MSDN documentation: Overview of the .NET Framework]) Union Linq Extension Method (see the MSDN Documentation: Enumerable.Union(TSource) Method (IEnumerable(TSource), IEnumerable(TSource), IEqualityComparer(TSource)) (System.Linq)) operates. The way, I have found, to make the Union extension method to operate like a UnionAll is to provide an implementation of this interface that says ‘everything is different’. The ‘everything is different’ implementation of IEqualityComparer causes the distinct phase of the standard Union extension method to keep all input rows from both. This has the effect of overriding the default behaviour of the Union extension method. The default behaviour of the Union extension method is to keep only the distinct values from the input sources.

I suspect that the intention of the designers, and implementers, of the Linq extensions to the .Net Framework had a very different intention for the usage of the IEqualityComparer argument to the Union extension method. That intention, I speculate, would have been to allow users to implement the semantic meaning of equality for user-defined objects. The use of IEqualityComparer argument’s implementation to ‘switch off’ the distinct process in the Union extension method would, I speculate, come as a bit of a surprise to them.

Future Versions

The UnionAll implementation, for me now, is all that I want. Hence, I do not foresee making any changes or enhancements to the implementation. My satisfaction with the implementation is not entirely complete though, I may include some argument null checking later. I will leave that for my experimentation with the Unit Testing Framework though.

This implementation does highlight, for me at least, another gap in the standard Linq implementation. This gap is an extension method that appends a single value to an existing IEnumerable, would seem like a natural next step.

Conclusions

The implementation of the UnionAll extension method demonstrates that some of the standard Linq extension methods are very flexible. The flexibility of standard extension methods allows bending them into implementations of new transformations that are new, and very useful.

The more I develop extension methods, the more I appreciate this addition to the C# Language. The combination of generics (see MSDN Documentation: Introduction to Generics (C# Programming Guide) and Generic Type Parameters (C# Programming Guide) ), and extension methods results in very powerful code, and code which saves the developer ‘miles of code’.

, , , , , , , , , , , , ,

3 Comments

%d bloggers like this: