# Archive for March 7th, 2012

### LINQ Pivot, or Crosstab, Extension Method

Posted by aussiecraig in CSharp, LINQ, Programming on March 7, 2012

# Introduction

This blog post is a continuation of the series of LINQ Short Takes. In this blog post, I present an implementation of a pivot, or crosstab, transformation in Linq. I have implemented it as a Linq extension method (specifically, an extension method on the IEnumerable(Of T) Interface).

This blog post builds on a number of preceding posts:

- Enumerable Range: LINQ Short Takes – Number 1 – Enumerable.Range()
- SelectMany: Linq SelectMany and IGrouping

## The Problem

The pivot, or crosstab, transformation is notoriously difficult to achieve in SQL. SQL was originally founded on concepts from relational algebra and tuple relational calculus. Linq, having some of its roots in SQL, shares SQL's difficulty in formulating a pivot, or crosstab, transformation. Fortunately, enough of the procedural and functional programming features of C# are carried into the Linq design and implementation to make expressing a generic solution to the pivot, or crosstab, transformation possible.

The following diagram attempts to show what the pivot transformation does to the data. At its core, the pivot transformation simply swaps columns and rows: the value at row i, column j of the input ends up at row j, column i of the output (a transposition). The difficulty comes in expressing this in a generic way.
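To make the definition concrete, here is a minimal sketch of what a pivot (transpose) produces, written with plain loops rather than Linq. It assumes rectangular data held in a jagged int array; the TransposeSketch class and Transpose method are hypothetical names used only for illustration.

```csharp
public static class TransposeSketch
{
    // Hypothetical helper: transposes a rectangular jagged array.
    // Assumes every row has the same length as the first row.
    public static int[][] Transpose(int[][] input)
    {
        int rows = input.Length;
        int cols = rows == 0 ? 0 : input[0].Length;
        var output = new int[cols][];
        for (int j = 0; j < cols; j++)
        {
            output[j] = new int[rows];
            for (int i = 0; i < rows; i++)
                output[j][i] = input[i][j]; // element (i, j) moves to (j, i)
        }
        return output;
    }
}
```

For the input { {1, 2, 3}, {4, 5, 6} } this yields { {1, 4}, {2, 5}, {3, 6} }. The Linq implementation below has to achieve the same result for any element type and any IEnumerable, and cope with rows of unequal length.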

## The Solution – Design

The following are the design considerations and design criteria, which then informed the implementation of the pivot transformation. These were the mandatory criteria:

- The implementation must be an extension method (see MSDN documentation: Extension Methods (C# Programming Guide) for further details). This allows Pivot to be applied to a sequence just like any other Linq transformation.
- The extension method must use a generic type parameter (see MSDN documentation: Generics (C# Programming Guide)) for the object type being manipulated. This allows the broadest range of objects as inputs to the pivot transform.
- The extension method must be an extension to the IEnumerable(Of T) Interface. This is the other half of the broadest-application criterion: it allows the pivot transformation to be applied to the broadest range of collections.
- The return type of the pivot transformation must be as generic as possible. This implies using the IEnumerable(Of T) Interface and a generic type parameter.
- The return type should preserve the type of object from the input sequence to the transformation. Failure to do this would mean that the pivot transformation is arbitrarily changing the input data. It is my belief that this would break the Linq transformation model the .Net Framework provides. Failure to adhere to this criterion would make the transformation far less usable in general.
- The implementation must be correct. I define correctness as: the pivot transformation should produce the generally accepted output of a pivot transformation.
- The implementation of the pivot transformation should be efficient. By this I mean it should be relatively quick for reasonable volumes of data. I did not want an implementation that showed up as a performance hot spot in any program that used the function.

The following were the desirable design criteria:

- The implementation should be reasonably robust in dealing with ragged data. By ragged, or jagged, data I mean that not all rows of data have the same number of columns. This may be a difficult, and expensive, design criterion to address.
- The implementation should be able to cater for null objects in the input, transferring the null into the output.

## The Solution – Implementation

The following is the method signature of the implementation.

```csharp
public static IEnumerable<IEnumerable<TSource>> Pivot<TSource>(
    this IEnumerable<IEnumerable<TSource>> pivotSource)
```

The Pivot<TSource> element of the function signature, together with the return type, addresses the following design criteria:

- The <TSource> introduces the generic type argument to the method. This addresses the ability to process any type of object with the transformation.
- The return type of IEnumerable<IEnumerable<TSource>> addresses the following elements of the design criteria:
  - The type argument TSource to the IEnumerable<IEnumerable<>> addresses the requirement to return a generic output.
  - The use of the IEnumerable(Of T) Interface addresses the requirement to support the broadest range of collections.

The signature element this IEnumerable<IEnumerable<TSource>> pivotSource, the argument to the method, addresses a number of the design criteria. These design criteria are:

- The use of this as the preface to the first argument in the argument list addresses, in part, the requirement for the implementation to be an extension method.
- The use of the IEnumerable(Of T) Interface addresses the requirement for the implementation to process the broadest range of collections.
- The use of the TSource type parameter enables the use of the method against the broadest range of objects possible.

In combination, the complete function signature addresses the following design criterion:

- The function signature addresses the requirement for type stability (not arbitrarily changing the type of the objects processed) by using the same type argument, TSource, in the definition of both the input and return types.

## The Source Code For The Pivot Implementation

Below is the source code for the implementation of the pivot extension method.

The bulk of the code is devoted to checking the argument. These argument checks protect the implementation from invalid arguments that could otherwise cause exceptions from deep inside the underlying Linq implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// The containing class name is illustrative; the original post does not show it.
// Extension methods must live in a non-nested, non-generic static class.
public static class PivotExtensions
{
    public static IEnumerable<IEnumerable<TSource>> Pivot<TSource>(
        this IEnumerable<IEnumerable<TSource>> pivotSource)
    {
        // Validation of the input arguments, and structure of those arguments.
        if (Object.ReferenceEquals(pivotSource, null))
            throw new ArgumentNullException("pivotSource", "The source IEnumerable cannot be null.");
        if (pivotSource.Count() == 0)
            throw new ArgumentOutOfRangeException("pivotSource", "The outer IEnumerable cannot be an empty sequence.");
        if (pivotSource.Any(A => Object.ReferenceEquals(A, null)))
            throw new ArgumentOutOfRangeException("pivotSource", "None of the inner IEnumerables in pivotSource can be null.");
        if (pivotSource.All(A => A.Count() == 0))
            throw new ArgumentOutOfRangeException("pivotSource", "All of the input inner sequences have no columns of data.");

        // Get the row lengths to check if the data needs squaring out
        int maxRowLen = pivotSource.Select(a => a.Count()).Max();
        int minRowLen = pivotSource.Select(a => a.Count()).Min();

        // Set up the input to the Pivot
        IEnumerable<IEnumerable<TSource>> squared = pivotSource;
        // If a square out is required
        if (maxRowLen != minRowLen)
            // Fill the tail of short rows with the default value for the type
            squared = pivotSource.Select(row => row.Concat(
                Enumerable.Repeat(default(TSource), maxRowLen - row.Count())));

        // Perform the Pivot operation on squared out data: output row N
        // collects the element at position N from every (squared) input row.
        var result = Enumerable.Range(0, maxRowLen)
            .Select((ColumnNumber) =>
            {
                return squared.SelectMany(row => row
                    .Where((Column, ColumnPosition) => ColumnPosition == ColumnNumber));
            });
        return result;
    }
}
```

### Key Features in the Implementation

The following features of the implementation are worth highlighting:

- To manage ragged data, the extension method makes all of the rows the same length (squares out the data). The square out process works as follows. The Concat Linq extension method appends the data required to fill out each short input row. The call to the Enumerable.Repeat Method builds the sequence containing the number of elements required to fill out the row. The default(TSource) expression obtains the default value for the type of object in the sequence.
- The implementation uses the Enumerable.Range Method to generate the sequence of column numbers. The sequence of column numbers then drives the transformation of each column of the pivotSource argument's inner IEnumerables into a row. I have previously written a blog post that describes using the Enumerable.Range Method (see: BLOG LINQ Short Takes – Number 1 – Enumerable.Range()).
- The implementation passes the length of the longest row among the inner IEnumerables of the pivotSource argument as the count argument of the Enumerable.Range Method, so the column numbers run from 0 to maxRowLen - 1. The code fragment that determines the longest row is: ‘pivotSource.Select(a => a.Count( )).Max( )’. Using the longest row in pivotSource, in part, addresses the design criterion requiring the pivot implementation to deal appropriately with ragged data.
- The implementation uses the Linq SelectMany extension method to flatten the sequence of column values from each of the input rows into a single IEnumerable for each output row.

### Downloadable Versions of the code

The following URLs contain the source code, including the XML Documentation Comments, for the implementation of the Pivot Linq extension method. There are pdf and docx versions of the code; these are the only types of text (rich text) file that WordPress allows to be uploaded to the web site.

- https://craigwatson1962.files.wordpress.com/2012/03/pivot_extension_method.docx
- https://craigwatson1962.files.wordpress.com/2012/03/pivot_extension_method.pdf

## How and Why the Pivot Transformation Works

The following sections describe the logical operations, and implementation details, of the pivot transformation. I have broken this into three segments: Overview, The Square Out Process, and The Data Pivot Process.

### Overview

The following diagram describes the broad logical flow of the implementation of the pivot extension method.

The validate arguments phase of the implementation is a relatively simple set of checks. The checks protect the remainder of the implementation from inputs that would cause an exception to be thrown. Since the implementation uses many Linq extension methods, these exceptions may originate from deep within the .Net Framework (see MSDN article: .NET Framework Conceptual Overview). Diagnosing why an exception was thrown from within the .Net Framework can be a very difficult process. It would be even harder if the developer were using the pivot extension method supplied as part of a class library (see MSDN article: Assemblies and the Global Assembly Cache (C# and Visual Basic)).

There are a couple of features of the implementation of the input validation phase of the pivot transformation worth noting. These features include:

- The null check on the pivotSource argument uses the Object.ReferenceEquals method. Using ReferenceEquals avoids the possibility that the Object.Equals method or the == operator has been overridden (or overloaded). What an overridden version implements could be inappropriate for argument validation.
- The argument validation uses a Linq extension method worth mentioning. This is the check that validates that none of the rows in the input is null. This check employs the Linq extension method ‘Any’ (see: Enumerable.Any(Of TSource) Method). Any is one of the Linq extension methods that I have overlooked in the past, so I note it here in case the reader has not seen it in action before.
- Another argument validation uses a Linq extension method worth mentioning. This is the check that at least one row in the input contains elements (that is, the rows are not all zero length). This check employs the Linq extension method ‘All’ (see: Enumerable.All(Of TSource) Method). All is another Linq extension method that I have overlooked in the past, so again I note it here in case the reader has not seen it in action before.
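The following sketch illustrates the points above. The Token class is hypothetical: its == operator is deliberately overloaded so that an empty token compares equal to null, which is exactly the kind of override that Object.ReferenceEquals ignores. The HasNullRow and AllRowsEmpty helpers in ValidationSketch are also hypothetical; they mirror the Any and All checks used in the Pivot validation.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical type: == is overloaded so that an empty token "equals" null.
public sealed class Token
{
    public string Text { get; }
    public Token(string text) { Text = text ?? ""; }

    public static bool operator ==(Token a, Token b)
    {
        // Treat a null reference as if it were an empty token.
        string ta = object.ReferenceEquals(a, null) ? "" : a.Text;
        string tb = object.ReferenceEquals(b, null) ? "" : b.Text;
        return ta == tb;
    }

    public static bool operator !=(Token a, Token b) => !(a == b);
    public override bool Equals(object obj) => obj is Token t && Text == t.Text;
    public override int GetHashCode() => Text.GetHashCode();
}

public static class ValidationSketch
{
    // Hypothetical helper: true if any inner row is a null reference (uses Any).
    public static bool HasNullRow<T>(IEnumerable<IEnumerable<T>> rows) =>
        rows.Any(row => object.ReferenceEquals(row, null));

    // Hypothetical helper: true if every inner row has no elements (uses All).
    public static bool AllRowsEmpty<T>(IEnumerable<IEnumerable<T>> rows) =>
        rows.All(row => !row.Any());
}
```

With this Token, new Token("") == null evaluates to true, while object.ReferenceEquals(new Token(""), null) is false: the reference check answers the question argument validation actually needs answered.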

Pivot Transform Top Level Processes

There are two processes in the implementation that I will describe in further depth. These are:

- I will describe in more detail the Linq process of squaring out the input data. The square out process makes all of the rows in the inner IEnumerable the same length. Only ragged input data has this transformation applied.
- I will also describe in more detail the Linq process of performing the pivot transformation. The only input to the pivot transformation is the squared out data. The output from the pivot transform is then the return value of the extension method.

### The Square Out Process

The following diagram illustrates the transformation to the row from the input to form the squared output. The transformation actually uses the default keyword (see the MSDN article: default Keyword in Generic Code (C# Programming Guide) for further details) to obtain the fill value used in the square out process.
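As a small illustration of the fill values the default keyword supplies, the hypothetical helper below simply returns default(T). For value types the filler is the zero value (0 for int, false for bool), while for reference types such as string it is null; in that case a padded cell cannot be told apart from a genuine null in the input.

```csharp
public static class DefaultSketch
{
    // Hypothetical helper: returns the same fill value the square out step uses.
    public static T FillValue<T>() => default(T);
}
```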

The following diagram attempts to describe the logical process flow of the Square Out process.

#### The Components of the Square Out Process

#### The Code for the Square Out Process

The following is the code used in the pivot transform to generate the squared out data.

```csharp
// Get the row lengths to check if the data needs squaring out
int maxRowLen = pivotSource.Select(a => a.Count()).Max();
int minRowLen = pivotSource.Select(a => a.Count()).Min();
// Set up the input to the Pivot
IEnumerable<IEnumerable<TSource>> squared = pivotSource;
// If a square out is required
if (maxRowLen != minRowLen)
    // Fill the tail of short rows with the default value for the type
    squared = pivotSource.Select(row => row.Concat(
        Enumerable.Repeat(default(TSource), maxRowLen - row.Count())));
```

#### The Square Out Process in Prose

There are only a few steps in the square out process. These steps are:

- Get the minimum and maximum column count from the input sequence. The code is pivotSource.Select(a => a.Count( )).Min( ) and pivotSource.Select(a => a.Count( )).Max( ).
- If the minimum and maximum column counts are not the same, the data needs squaring out and proceeds into the square out process. Otherwise, the data is already square and proceeds directly to the pivot transformation.
- For each row in the input sequence (the code begins: pivotSource.Select(row =>):
  - Concatenate the required filler onto the row for output. The Enumerable.Concat() call for each of the rows does this.
  - Generate the correct number, and correct type, of filler elements. The code: Enumerable.Repeat(default(TSource), maxRowLen - row.Count( )).
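The square out step can also be sketched as a standalone extension method, which is one of the refactorings this post contemplates. The SquareOutSketch class and SquareOut method are hypothetical names; the body reuses the same Count, Max, Concat, Repeat and default(TSource) calls as the Pivot implementation, assumes there are no null rows, and performs no other argument validation.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class SquareOutSketch
{
    // Hypothetical extension method: pads every row with default(TSource)
    // so that all rows have the length of the longest row.
    public static IEnumerable<IEnumerable<TSource>> SquareOut<TSource>(
        this IEnumerable<IEnumerable<TSource>> source)
    {
        // Longest row length; 0 if the outer sequence is empty.
        int maxRowLen = source.Select(row => row.Count()).DefaultIfEmpty(0).Max();
        return source.Select(row => row.Concat(
            Enumerable.Repeat(default(TSource), maxRowLen - row.Count())));
    }
}
```

For example, the ragged int rows {1, 2, 3} and {4} become {1, 2, 3} and {4, 0, 0}.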

### The Data Pivot Process

The diagram Linq Pivot Transformation (below) tries to express visually what the Linq statement that implements the Pivot does. I hope that readers of this blog post find that the diagram helps them understand how the pivot transformation works.

#### The Code For the Pivot Transformation

```csharp
// Perform the Pivot operation on squared out data
var result = Enumerable.Range(0, maxRowLen)
    .Select((ColumnNumber) =>
    {
        return squared.SelectMany(row => row
            .Where((Column, ColumnPosition) => ColumnPosition == ColumnNumber));
    });
return result;
```

#### The Pivot Process Logical Process

The keys to the operation of the pivot process are the following:

- The use of Enumerable.Range() to generate the sequence of column numbers. The pivot transformation takes each of the columns of the input sequence in turn and makes an output row from that column's values; together, these rows form the result.
- The SelectMany(), which flattens the values from each column into a single row for the output.
- The overload of Where whose predicate also receives the element's position, as in Where((Column, ColumnPosition) => ColumnPosition == ColumnNumber). Exposing the position allows selection of the correct column in each row.
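The per-column step can be seen in isolation in the sketch below. For a single column number, the indexed overload of Where keeps only the element at that position in each row, and SelectMany flattens those one-element sequences into one output row. The ColumnSketch class and Column method are hypothetical names, and the sketch assumes the rows are already squared out.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class ColumnSketch
{
    // Hypothetical helper: extracts column `columnNumber` from every row,
    // using the same SelectMany + indexed Where pattern as the Pivot method.
    public static IEnumerable<TSource> Column<TSource>(
        IEnumerable<IEnumerable<TSource>> rows, int columnNumber) =>
        rows.SelectMany(row => row
            .Where((value, position) => position == columnNumber));
}
```

For the rows {1, 2}, {3, 4}, {5, 6}, column 1 is {2, 4, 6}. Running this for every value produced by Enumerable.Range(0, maxRowLen) yields one output row per input column, which is the whole pivot.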

## Future Versions or Enhancements

The following are the potential enhancements I may make to the Pivot Linq Extension method.

- As I describe below, I have yet to satisfactorily resolve the question ‘Should the square out process be optional within the pivot?’. The enhancement to enable this would be to add a Boolean argument to the Pivot method, with a default value of true. It is my belief that most of the use cases for the pivot transformation would require squared out data. Nevertheless, I cannot completely dismiss the possibility that some use cases require the output of a pivot without squared out data.
- In addition, as described below, the question ‘Should the square out process be a separate Linq extension method?’ remains unresolved. I will leave refactoring the square out process into a separate Linq extension method, or a private support method within the library, until sometime in the future.
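One way the optional-argument enhancement could look is sketched below. This is not the published method: the PivotOptionsSketch class name is hypothetical, squareOut is the Boolean argument with a default value of true contemplated above, and argument validation is omitted for brevity. Passing squareOut: false shows the problem with ragged data.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class PivotOptionsSketch
{
    // Hypothetical variant of Pivot with an optional squareOut argument
    // (default true). Argument validation is omitted in this sketch.
    public static IEnumerable<IEnumerable<TSource>> Pivot<TSource>(
        this IEnumerable<IEnumerable<TSource>> pivotSource, bool squareOut = true)
    {
        int maxRowLen = pivotSource.Select(a => a.Count()).Max();

        // Pad short rows with default(TSource) only when requested.
        IEnumerable<IEnumerable<TSource>> rows = squareOut
            ? pivotSource.Select(row => row.Concat(
                Enumerable.Repeat(default(TSource), maxRowLen - row.Count())))
            : pivotSource;

        // Output row N collects element N of every input row that has one.
        return Enumerable.Range(0, maxRowLen)
            .Select(columnNumber => rows.SelectMany(row => row
                .Where((value, position) => position == columnNumber)));
    }
}
```

For the ragged rows {1, 2, 3}, {4}, {5, 6}, the default call produces {1, 4, 5}, {2, 0, 6}, {3, 0, 0}. With squareOut: false the second output row is {2, 6}: the 6 sits at position 1 even though it came from the third input row, which is why squaring out is the sensible default.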

# Making an IEnumerable<IEnumerable<of Source Type>>

## Introduction

It has become a concern to me that I have been using the generic data structure IEnumerable<IEnumerable<TSource>>, but have omitted any hints, or code examples, on how to build such a structure. This section addresses that concern and presents some of the ways I have used to create this type of data structure. Much of what I present comes from my unit tests for the pivot transformation. Although these are short code snippets, I trust that they will give the reader some clues for creating this type of data structure.

## Code Examples

I have decided to include links to the code examples in pdf and docx format. Pasting the code for the examples here would only increase the size of this post, which I believe is already too big.

- https://craigwatson1962.files.wordpress.com/2012/03/ienumerable-examples.docx
- https://craigwatson1962.files.wordpress.com/2012/03/ienumerable-examples.pdf
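The linked documents hold the actual examples; the sketch below shows a few common ways to obtain an IEnumerable<IEnumerable<T>> and is not taken from those documents. It relies on IEnumerable<T> being covariant: a jagged array or a list of lists converts implicitly because the inner types (int[] and List<string>) are reference types that implement IEnumerable<T>. The NestedSketch class and its member names are hypothetical.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class NestedSketch
{
    // A jagged array: int[][] converts to IEnumerable<IEnumerable<int>> via covariance.
    public static IEnumerable<IEnumerable<int>> FromJaggedArray() =>
        new int[][] { new[] { 1, 2, 3 }, new[] { 4, 5 } };

    // A list of lists: List<string> is a reference type implementing IEnumerable<string>.
    public static IEnumerable<IEnumerable<string>> FromListOfLists() =>
        new List<List<string>>
        {
            new List<string> { "a", "b" },
            new List<string> { "c" }
        };

    // Generated with Linq: 3 rows of 4 consecutive numbers (0-3, 4-7, 8-11).
    public static IEnumerable<IEnumerable<int>> FromRange() =>
        Enumerable.Range(0, 3).Select(r => Enumerable.Range(r * 4, 4));
}
```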

## Summary and Conclusions

There are many different ways to form an IEnumerable<IEnumerable<TSource>> structure. The above links present two of the basic strategies:

- The first set of examples shows an approach that centres on using an array, and the IEnumerable interface that the System.Array class implements.
- The second set of examples shows the approach of implementing the IEnumerable interface on a user-defined class. Object instances of that class, placed within an array definition, achieve the required structure.

As an aside, these objects also build random test data for a variety of data types. These classes built the test data for the pivot process. The inheritance hierarchy allowed creation of ragged data.
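The second strategy, a user-defined class that implements IEnumerable<T>, might look something like the following. This is a sketch under assumptions, not the author's test classes: RandomRow and ShortRandomRow are hypothetical names, the subclass only shortens the row to show how inheritance can produce ragged data, and a fixed seed keeps each row's values repeatable.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Hypothetical row type: yields Length pseudo-random ints from a seeded generator.
public class RandomRow : IEnumerable<int>
{
    private readonly int seed;
    protected virtual int Length => 5;

    public RandomRow(int seed) { this.seed = seed; }

    public IEnumerator<int> GetEnumerator()
    {
        var random = new Random(seed); // same seed, same sequence on every enumeration
        for (int i = 0; i < Length; i++)
            yield return random.Next(0, 100);
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
}

// Hypothetical subclass: overriding Length gives a shorter (ragged) row.
public class ShortRandomRow : RandomRow
{
    public ShortRandomRow(int seed) : base(seed) { }
    protected override int Length => 2;
}
```

An array such as new RandomRow[] { new RandomRow(1), new ShortRandomRow(2) } is then an IEnumerable<IEnumerable<int>> with rows of length 5 and 2, ready to be fed to Pivot.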

### Conclusions and Observations

There are a number of areas which are worthy of noting in my final remarks.

## The Pivot and Square Out Processes

I have yet to reach satisfactory conclusions to the following questions:

- Should the square out operation within the Pivot process be optional? As I note below, the Pivot on ragged data without the square out process gives results that I would describe as unacceptable (or plain ‘wrong’). This conclusion is strictly a matter of one man’s opinion, that man being me. There may be use cases where the results of a pivot transformation on unpadded (not squared out) data are what the caller of the method requires. This is one of the eternal dilemmas that library designers wrestle with and ruminate upon. The question, put succinctly, for the library designer to resolve is, ‘Does this design decision preclude some real use cases?’ The normal response to this dilemma is to make the behaviour an argument in the method signature and allow the user to choose between the behaviours. Fortunately, C# provides optional arguments with a default value, which offers one resolution to the dilemma. The library designer can give the library user the option in the method signature, and provide some guidance by using an optional argument whose default value satisfies the expected normal, and most common, use case. Before I publish this blog post, I may yet make switching on the square out process an optional Boolean argument, with a default value of true.
- Is the square out transformation useful as a standalone Linq extension method? Here again I am uncertain that there are use cases where this would be needed. In this case, I doubt that I will refactor out the square out process into a new extension method any time soon. I will wait and see if I find a need for the square out process as separate Linq extension method.

## Conclusions, and Observations, from the Design and Development Process

During the writing of this blog post, I have modified the code for the Pivot extension method a number of times. There have been two major sources for the modifications.

- One source of modifications was building a set of unit tests (see MSDN article: Verifying Code by Using Unit Tests) for the Pivot extension method. The process of designing, and implementing, the unit tests made me think more clearly about the validation of input arguments.
- Another source of modifications was writing this blog post. Writing this article made me think more clearly about ragged data as an input.

The most notable modifications were:

- The tightening up, and greater clarity, in the validation checks performed on the input data sequence.
- The addition of the square out process came from thinking more clearly, and deeply, about the processing of ragged input data. Specifically, I realised that the result of the transformation when processing ragged data was unacceptable. The output from a ragged input would have broken the ‘rows swapped to columns’ conceptual model of the transform: the pivot transformation applied to input data without the square out resulted in data from a row ending up in the wrong column. This realisation resulted in the implementation of the square out process.

The result of these modifications is a ‘better’ implementation of the pivot transformation. The formulation of the input argument is clearer. The output from the extension method is significantly more robust. This increase in robustness comes from the introduction of the square out process that addresses the problems that ragged input data brought.

## Observations from the Implementation

There were a number of ‘discoveries’ (they were always there, but I found them and had a use for them) in the standard Linq extension methods. These Linq extension methods included:

- The Enumerable.Any(Of TSource) Method is very useful for argument validation. I expect the Any Linq extension method will make more appearances in my code.
- The Enumerable.All(Of TSource) Method is another method that is very useful for argument validation. Again, I will be using it more frequently.
- The Enumerable.Repeat Method is another very handy Linq extension method. For a simple way, built into the .Net Framework, to build an IEnumerable of repeated values, this would appear to be the method to use.
- The Enumerable.Concat Method is yet another discovery. I previously developed a UnionAll extension method (see: LINQ Short Takes – Number 4 – Make Union into a UnionAll). The Enumerable.Concat method does exactly the same thing that the UnionAll implementation achieved.
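The difference between Union and Concat that the last point relies on is easy to demonstrate: Union behaves like SQL UNION and removes duplicates, while Concat behaves like UNION ALL and keeps every element, in order. Enumerable.Repeat is included for completeness. The ConcatSketch class is a hypothetical wrapper used only to make the values easy to check.

```csharp
using System.Linq;

public static class ConcatSketch
{
    static readonly int[] First = { 1, 2, 2 };
    static readonly int[] Second = { 2, 3 };

    // Union: set semantics, duplicates removed -> 1, 2, 3
    public static int[] UnionResult() => First.Union(Second).ToArray();

    // Concat: UNION ALL semantics, everything kept in order -> 1, 2, 2, 2, 3
    public static int[] ConcatResult() => First.Concat(Second).ToArray();

    // Repeat: the same value a given number of times -> 0, 0, 0
    public static int[] RepeatResult() => Enumerable.Repeat(0, 3).ToArray();
}
```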

The conclusion and observation I would draw is that no matter how well you think you know the .Net Framework library, there are still more useful classes and methods to find. Microsoft continually adds to my ‘voyage of discovery’ through the .Net Framework. Microsoft keeps the C# .Net environment relevant to emerging challenges in the IT industry by adding new features to the .Net Framework library, and the C# language, with every new version of the C# .Net environment. For me, this only adds to the ‘fun’ of using C# .Net: there always seems to be something new to learn how to use effectively.

###### Related articles

- LINQ Short Takes – Number 4 – Make Union into a UnionAll (craigwatson1962.wordpress.com)
- LINQ Extension Method to Generate n-way Cartesian Product (craigwatson1962.wordpress.com)
- C# Short Takes – 2 – String from IEnumerable (craigwatson1962.wordpress.com)
- Dumping a formatted IEnumerable to Output (craigwatson1962.wordpress.com)
- LINQ Extension Method To Dump any IEnumerable (craigwatson1962.wordpress.com)
- LINQ Short Takes – Number 2 – Using Method Syntax to Create a Cartesian Product (craigwatson1962.wordpress.com)
- LINQ Short Takes – Number 3 – LINQ over Multiple Dimension Arrays and Lists (craigwatson1962.wordpress.com)
- C# Short Takes – 1 – XML Comments syntax for the cref attribute to a Generic Type Method (craigwatson1962.wordpress.com)
- Get previous and next item in a IEnumerable using LINQ (stackoverflow.com)
- LINQ to DataGridViewRowCollection (stackoverflow.com)