Archive for February 18th, 2012

LINQ Short Takes – Number 2 – Using Method Syntax to Create a Cartesian Product


Introduction

This is the second post in a series (I have no idea how many blog posts this series may comprise) of LINQ Short Takes. The first post was LINQ Short Takes – Number 1 – Enumerable.Range() which presented the Enumerable.Range() method. This blog post utilises the Enumerable.Range() method to form a Cartesian product utilising the LINQ method syntax (see: MSDN (Microsoft Developer Network Web Site) Article – LINQ Query Syntax versus Method Syntax (C#)for further discussion of the topic).

The motivation for developing this approach was twofold:

1. I did need a sequence of integers that formed a Cartesian product.

2. I experienced a degree of exasperation, frustration and downright ‘that’s wrong and very tacky’ when I looked for LINQ Cartesian Products using Method syntax. The solutions I read used a SelectMany()and lacked clarity that any good code should have.

So, I decided that there must be a simpler way to form a Cartesian product from two sequences using LINQ method calls. What has resulted is a solution to the problem that, in my opinion, is clear and concise. The solutions implementation uses a LINQ Join() method call to build the Cartesian product.

Cartesian Products

Please excuse the diversion into a little bit set theory. This diversion explains why the LINQ Join() method succeeds in forming a Cartesian product.

A Cartesian productis a resulting sequence of two input sequences. The output sequence has that all elements in the first sequence join with each element of the second sequence. The resulting sequence is series of element pairs from both input sequences. The length of the output sequence is the product of the lengths of the two input sequences.

The join operation between the two sequences that form the Cartesian producthas the following properties. If all of the elements in the two input sequences match each other, we have a type of join with a special property. This type of join’s special property implies that the join key for all of the elements in the two sequences is the same value.

Armed with the conclusion that a Cartesian product has the join key for both inputs sequences should be the same value. The process of forming a C# statement which utilises the Join() method to form a Cartesian product between the two input sequences becomes a simple process.

Example Code

The following C# method contains three examples of forming a Cartesian product. I will comment a little more on the three examples after the code.

The code also utilises the LINQ extension method ToOutput. This was the subject of a previous blog post: Dumping a formatted IEnumerable to Output.

I have loaded this code in Work docx and pfd formats as well. The URL’s are:
https://craigwatson1962.files.wordpress.com/2012/02/cartesianproducts.docx
https://craigwatson1962.files.wordpress.com/2012/02/cartesianproducts.pdf

private void CartesianProducts()
{
    // Query Syntax Cartesian Product var Prod1 = from num1 in Enumerable.Range(0, 10)
                from num2 in Enumerable.Range(0, 10)
                select new { num1, num2 };
    Debug.WriteLine("/nDump of Prod1");
    // Dump the contents of Prod1 Prod1.ToOutput(FormatFunction:
        (objValue, Position) =>
        {
            return string.Format("[{0}] {1},{2}\n",
                Position, objValue.num1, objValue.num2);
        });

    // Generate a Cartesian product using LINQ method syntax from two sequences var Prod2 = Enumerable.Range(0, 10).Join(Enumerable.Range(0,10),
        (iVal1) => 1, // Key selectors make all elements in first sequence (iVal2) => 1, // match all elements in the second sequence. (iVal1, iVal2) =>
            new {iVal1, iVal2}); // Create an object with the two values. Debug.WriteLine("/nDump of Prod2");
    // Dump the contents of Prod2 Prod2.ToOutput(FormatFunction:
        (objValue, Position) =>
        {
            return string.Format("[{0}] {1},{2}\n",
                Position, objValue.iVal1, objValue.iVal2);
        });

    string[] SampleStrings = new string[]
    { "The", "quick", "red", "fox", "jumped",
        "over", "the", "lazy", "brown", "cow" };
    // Generate a 3 way Cartesian product var Prod3 = Enumerable.Range(0, 10)
                    .Join(Enumerable.Range(0, 5), iVal1 => 1, iVal2 => 1,
                        (iVal1, iVal2) => new { iVal1, iVal2 })
                    .Join(SampleStrings, iVals => 1, strVals => 1,
                        (iVals, strVal) =>
                            new {
                                intValue1 = iVals.iVal1,
                                intValue2 = iVals.iVal2,
                                strValue = strVal });
    Debug.WriteLine("/nDump of Prod3");
    // Dump the contents of Prod3 Prod3.ToOutput(FormatFunction:
        (objVal, Position) =>
            string.Format("[0] {1},{2},'{3}'\n",
                Position, objVal.intValue1, objVal.intValue2, objVal.strValue));
    Debug.WriteLine(string.Format(
        "Double Check\n\tSequence1 Length {0} Sequence2 Length {1}" +
        "SampleStrings Length{2}\n\t" +
        "Input Sequence Product Value {3} Result Length {4}",
        Enumerable.Range(0, 10).Count(),
        Enumerable.Range(0, 5).Count(),
        SampleStrings.Count(),
        Enumerable.Range(0, 10).Count()
            * Enumerable.Range(0, 5).Count()
            * SampleStrings.Count(),
        Prod3.Count()));

    return;
}

The examples of forming a Cartesian product with LINQ are:

  1. This example uses the LINQ query syntax to create a Cartesian product. It uses an unconstrained join between the two sequences. This is exactly how SQL operates when used to form a Cartesian product.
  2. The second example uses the LINQ method syntax to form a Cartesian product between two integer sequences. These sequences are generated using the Enumerable.Range () method. The Cartesian product is formed using the LINQ method Join(). The arguments to the overload of the Join() methods are: the second IEnumerable, the outer (first sequence) key selector, the inner (the second sequence) key selector, and the result selector. The key selectors in each part of the join are just 1, which ‘says’ this is the join key for each of the same sequences.
  3. The third example demonstrates the scalability of the approach, by joining three sequences in a Cartesian product. If you can do this with two, and three sequences, then I would expect the approach would work with as many sequences as you require. This example also demonstrates the joining of different types of object using this approach.

If you missed it above, this example method also uses the ToOutput LINQ extension method. The ToOutput extension method was the subject of a previous blog post: Dumping a formatted IEnumerable to Output, and the code for the extension method is included there.

Conclusions

  • I believe that the approach presented is a clear, clean, and concise, approach to generation of a Cartesian product using the LINQ method syntax.
  • I suspect that I will generalise this approach into another LINQ extension method. The may well become another blog post. There are a couple of interesting deign decisions which I need to make when designing and implementing this approach as an extension method. I will leave those discussions for that blog post.
Advertisements

, , , , , , , ,

3 Comments

LINQ Short Takes – Number 1 – Enumerable.Range()


Introduction

This blog post is the first of a series of shorter posts that will cover a number of brief LINQ topics.

This post will cover the Enumerable.Rangemethod. This is a ‘for free’ (part of the .Net Framework) method that I have found very useful. There have been a number of circumstances where I have used this method. This method offers a simple data source vital to addressing a number of classes of problems. With the use of this simple data source, some problems become trivial, rather than extremely complex with ‘vanilla’ LINQ.

There are a couple of subsequent posts, I will write, which build on the use of the Enumerable.Range method.

The Enumerable Range Method

This is quite a simple method. The MSDN documentation (see: Enumerable.Range) describes the method as follows.

“Generates a sequence of integral numbers within a specified range.”

This statement is almost dismissive of the many ways that this method provides a very flexible data source.

Some Examples Using Enumerable.Range

The following are a couple of simple examples of the use of the Enumerable.Range.

private void Simple_Range_Demos()
{
    // A Simple Range Example var Test1 = from seq in Enumerable.Range(0, 10)
                select seq;
    Test1.ToOutput();

    // Build a year calendar using Range DateTime Jan1 = new DateTime(2012, 1, 1);
    int Days = 0;
    if (DateTime.IsLeapYear(2012)) Days = 366;
    else Days = 365;
    var YearCalendar = from seq in Enumerable.Range(0, Days)
                       select Jan1.AddDays(seq);
    return;
}

Conclusions

These presented examples may seem trite and overly simple. In forthcoming post I will build upon the use of the Enumerable.Range to achieve some interesting, and useful LINQ techniques.

, , , , , , ,

3 Comments

Dumping a formatted IEnumerable to Output


Introduction

This blog post is a continuation from the preceding post LINQ Extension Method To Dump any IEnumerable. This blog post presents another LINQ extension method, which addresses, and provides a solution to, the limitations of the preceding approach. The significant limitation of the preceding approach was that the return type of the method was a string. This returned string runs the risk of overflowing a strings capacity when the IEnumerablehas a large number of elements, and/or the object to be output generates large amounts of formatted output.

This blog post presents a method that has the intention of taking an IEnumerable and pushing the contained objects to an output facility. The types of output facility utilised should be user defined. Personally, I use the Debug Output Window (see: Diagnostic Messages in the Output Window), and text files. I expect that this method will support, through the design and implementation of the extension method, support any output facility.

The ToOutput Extension Method

Introduction

Design Goals

This extension method attempts to achieve two sets of design goals. These design goals were:

Design Goals from the ToPrintString Implementation
The method uses an unconstrained type parameter to describe the objects in the input IEnumerable.
The method uses an unconstrained type parameter to describe the objects in the input IEnumerable,
The method consumes Lambda Expressions, or delegates, to supply user-defined functionality.
The method used optional parameters for the parameters that supply used defined functionality.
The method supplies meaningful default actions for any omitted parameter.
  • Improve on the efficiency and flexibility of the previous method (the ToPrintString extension method presented in by blog post LINQ Extension Method To Dump any IEnumerable ) and remove the potential limitation on the maximum size of a string. The design goals to achieve this aim were:
Enable the emitting of each object in the IEnumerable directly to the output target.
Enable an output function is as flexible as possible.
The output function must accept a Lambda Expression, or a delegate.
Have a return from the function that will not suffer any possible overflow, or out of memory, system runtime exceptions.

Method Implementation – The Function Signature

The implementation of the ToOutput extension method has the following function signature.

public static long ToOutput<TSource>(this IEnumerable<TSource> Input,
    Func<TSource, int, bool> WherePredicate = null,
    Func<TSource, int, string> FormatFunction = null,
    Func<string, bool> OutputFunction = null,
    bool SummaryCount = true)

The function signature has the following features that address the design goals:

  • As with the ToPrintString extension method, the type that the type parameter TSource will accept has no constraints. This maximises the type of object that this extension method will accept.
  • The argument Input is the attribute of the function signature that makes this method an extension method. Specifically, allows the compiler to make the extension method association between this extension method and the interface type IEnumerable of T. The interface IEnumerable of T is the interface that the LINQ extension methods manipulate.
  • All of the extension method arguments are options, apart from the mandatory Input argument.
  • The arguments FormatFunction and OutputFunction supply a meaningful default implementation if the argument is null.
  • The arguments WherePerdicate and FormatFunction use an int argument. This int value is the position of the object in the input IEnumerable.
  • As with the ToPrintString extension method, this extension accepts a WherePerdicate argument. This argument allows the methods caller to specify a Lambda Expression, or delegate, which will select the objects in the Input IEnumerable processed by the extension method.
  • The argument OutputFunction is one of the additions to the ToOutput over the ToPrintString extension method. The LINQ extension method LongCount invokes argument OutputFunction. The use of the LongCount LINQ extension method is why the return type of the Lambda Expression, or delegate, is a bool. This result is indicating success, or failure, of the output operation.
  • The function argument SummaryCount controls the writing of a summary count through the OutputFunction. This may be useful if the output destination is a UI control.

Method Implementation – The Full Method Implementation

The following is the implementation of the ToOutput extension method.

For those who would like the source code in a file, I have loaded the code as a Word Document (.docx format) and a PDF to:
https://craigwatson1962.files.wordpress.com/2012/02/tooutput_cs1.docx , and
https://craigwatson1962.files.wordpress.com/2012/02/tooutput_cs1.pdf .
I hope that readers who want to use the source code for this method will find this an acceptable way of sharing the source. It should be a simple copy and paste operation to load this code into Visual Studio, or your C# preferred development environment. These files also include the xml comments (see: XML Documentation Comments (C# Programming Guide) for further information) for the method. These xml comments will enable IntelliSense (see: Using IntelliSense for further explanation) for the method extension.

public static long ToOutput<TSource>(this IEnumerable<TSource> Input,
    Func<TSource, int, bool> WherePredicate = null,
    Func<TSource, int, string> FormatFunction = null,
    Func<string, bool> OutputFunction = null,
    bool SummaryCount = true)
{
    if (FormatFunction == null)
        FormatFunction = (InputObject, Position) =>
            string.Format("[{1}] {0}, ", Position, InputObject);
    if (OutputFunction == null)
        OutputFunction = (StringRepesentation) =>
        {
            Debug.Write(StringRepesentation);
            return true;
        };
    long ElementsDone = 0L;
    if (WherePredicate == null)
    {
        ElementsDone = Input
            .Select((InputObject, Position) => FormatFunction(InputObject, Position))
            .LongCount((StringRepesentation) => OutputFunction(StringRepesentation));
    }
    else {
        ElementsDone = Input
            .Where((InputObject, Position) => WherePredicate(InputObject, Position))
            .Select((InputObject, Position) => FormatFunction(InputObject, Position))
            .LongCount((StringRepesentation) => OutputFunction(StringRepesentation));
    }

    if (SummaryCount)
    {
        string OutputCount = string.Format("\nObjects Processed = {0:#,#}\n", ElementsDone);
        bool done = OutputFunction(OutputCount);
    }
    return ElementsDone;
}

Method Implementation – Noteworthy Features

The following are notable elements in the implementation:

  • This extension method uses the LINQ method syntax to invoke extension methods the processing pattern.
  • The diagram below shows the LINQ methods composed to form the processing pattern used by this extension method. This extension method’s processing pattern is a series of LINQ method calls that pass the callers delegates to the LINQ implementation.

  • The use of Debug.Write as a default action for the OutputFunction is my choice for the most appropriate default action for my development efforts. If the reader has other thoughts on the matter, they are more than welcome to install their preference.

Sample Code Invoking the ToOutput Extension Method

The sample code demonstrates a series of calls to the ToOutput extension method. The sample code demonstrates calls to the ToOutput extension method that all use the LINQ method call syntax.

At present, I cannot think of how to use this extension method in the LINQ query syntax. The reason for that inability is that in the design of this method. This method’s design characterises a ‘data sink’. I would call something a ‘data sink’ if it is a process just that consumes an IEnumerable(of T) and does not produce a sequence.

The pattern of consume an IEnumerable(of T) and produce an IEnumerable(of T) would characterise a ‘transformation’. Methods that conform to the ‘transformation’ pattern are inherently linkable into chains of processes.

The following are links to the sample code stored in docx and PDF formats.
https://craigwatson1962.files.wordpress.com/2012/02/simpletoprintstringtests_cs.docx
https://craigwatson1962.files.wordpress.com/2012/02/simpletoprintstringtests_cs.pdf

Conclusions

There a couple of thoughts I will raise in conclusion of the presentation of the ToOutput extension method. The thoughts include:

  • The development of LINQ extension method may appear daunting when you initially look at the topic. This is probably due to the syntax that the C# language requires to build an extension method is unfamiliar. I believe, and hopefully this and the proceeding blog posts demonstrates, that when you overcome that initial hurdle, implementing extension methods is not a significantly difficult process.
  • The use of Lambda Expressions or delegates to arguments to the extension methods is a very powerful and flexible implementation approach. This approach allows the abstraction between the pattern of processing and the actions applied at each step in the process. The extension method supplies the pattern of processing. The Lambda Expressions or delegates supply the caller-defined actions. This approach has much in common with functional programming approaches (see: Functional Programming topic in Wikipedia for an introduction to the topic).
  • This extension method does implement a solution that addresses, and resolves, the issue identified with the preceding blog post (LINQ Extension Method To Dump any IEnumerable). The strategy of writing directly to the output facility through the user supplied delegate, results in an implementation with a minimal memory requirement. Specifically, there is no dependence on the string object. This results in an implementation that cannot exceed the string objects finite capacity constraints.
    I have not evaluated whether this implementation is thread safe. A significant part the multi-thread safety of this extension method is dependent on the user’s implementation of the delegates passed into the extension method. The input delegates will need to be multi-thread safe to invoke this extension method the following fashion. The following is just one alternative for invoking this method in a parallel fashion.
    NB: This uses a “sledgehammer” to force parallel execution. The inclusion of the calls to AsUnordered() and WithDegreeOfParallelism(8) forces parallel execution, which is not what the ‘smarts’ in the implementation wants to do.
var Test1 = from counter1 in Enumerable.Range(0, 10000)
            select counter1;
var List1 = Test1.ToList();
var a = List1.AsParallel<int>().AsUnordered().WithDegreeOfParallelism(8).ToOutput(
    FormatFunction:
        (objValue, position) =>
            string.Format("[{0}] {1}\n", position, objValue));

The following the Parallel Stacks screen snippet, shows the multiple treads. In addition, the output comes out in a nicely unordered sequence, indicating parallel threads are in action.

, , , , , , , , , , , ,

4 Comments

%d bloggers like this: