Posts Tagged Class (computer programming)

LINQ Short Takes – Number 4 – Make Union into a UnionAll


Introduction

One thing that has exasperated and puzzled me for some time is the absence of a UnionAll from the default Linq implementation. Granted, the pure relational algebra that was the basis of the initial SQL implementations only defined Union. Modern SQL implementations have since added the Union All variant; SQL Server has it (see: UNION (Transact-SQL)). Why the designers and implementers of Linq did not include a method named UnionAll remains a mystery to me; the closest built-in operator, Concat, goes by a different name. Maybe it is because it is relatively easy to make the Union implementation operate in the way UnionAll operates.

The Solution – Design

The Linq Union extension method comes in two overloads. One of these overloads (see: Enumerable.Union(TSource) Method (IEnumerable(TSource), IEnumerable(TSource), IEqualityComparer(TSource)) (System.Linq)) takes an implementation of the IEqualityComparer interface.

The way to get the Union extension method to operate like a UnionAll is to provide an implementation of this interface that says ‘everything is different’. This causes the distinct phase of the Union to keep everything, rather than the default behaviour of throwing away the second, and any subsequent, occurrence of a duplicate.

The Solution – Implementation – EverythingDifferentEqualityComaperer Class

There is not much to the implementation. The key is that you want nothing to be equal, so that every input element appears in the output. An Equals method that returns false in all cases achieves this. Such a comparer deliberately breaks the usual expectation that an object is equal to itself, so it is best kept internal to this one use.

/// <summary>
/// A class which implements the IEqualityComparer Interface.
/// The implementation makes all elements not the same.
/// Currently, this class is used in the UnionAll extension method
/// to preserve all rows.
/// </summary>
/// <typeparam name="TSource">The type of objects being compared.</typeparam>
internal class EverythingDifferentEqualityComaperer<TSource>
    : IEqualityComparer<TSource>
{
    /// <summary>
    /// Public constructor for the class.
    /// </summary>
    public EverythingDifferentEqualityComaperer( )
    {

    }

    #region IEqualityComparer<TSource> Members

    /// <summary>
    /// This method says 'everything is different'
    /// </summary>
    /// <param name="x">One of the objects of <typeparamref name="TSource"/> to be compared.
    /// The implementation ignores the object.</param>
    /// <param name="y">One of the objects of <typeparamref name="TSource"/> to be compared.
    /// The implementation ignores the object.</param>
    /// <returns>false for all values.</returns>
    bool IEqualityComparer<TSource>.Equals(TSource x, TSource y)
    {
        return false;
    }

    /// <summary>
    /// Returns the hash code of the object.
    /// </summary>
    /// <param name="obj">The object of <typeparamref name="TSource"/>.</param>
    /// <returns>Hash code of the object.</returns>
    int IEqualityComparer<TSource>.GetHashCode(TSource obj)
    {
        if (Object.ReferenceEquals(obj, null))
            return 0;
        return obj.GetHashCode( );
    }

    #endregion
}

The Solution – Implementation – UnionAll Extension Method

The following is the implementation of the UnionAll. It simply creates an instance of the EverythingDifferentEqualityComaperer class and calls the Linq Union extension method.

/// <summary>
/// This method implements the UnionAll extension method.
/// This extension method results in all of the rows in <paramref name="Source1"/>
/// having all the rows in <paramref name="Source2"/> appended.<br/>
/// This method makes use of the Linq extension method
/// <seealso cref="System.Linq.Enumerable.Union{TSource}(IEnumerable{TSource}, IEnumerable{TSource}, IEqualityComparer{TSource})"/>
/// with an implementation of the
/// <seealso cref="System.Collections.Generic.IEqualityComparer{T}"/>.
/// </summary>
/// <typeparam name="TSource">
/// The type of objects in both <paramref name="Source1"/> and <paramref name="Source2"/>.
/// </typeparam>
/// <param name="Source1">
/// The input sequence which is used as the first set of rows in the output.
/// These rows are of <typeparamref name="TSource"/> type.
/// </param>
/// <param name="Source2">
/// The input sequence which is used as the second set of rows in the output.
/// These rows are of <typeparamref name="TSource"/> type.
/// </param>
/// <returns>An output IEnumerable of type <typeparamref name="TSource"/>
/// that contains all the rows from
/// <paramref name="Source1"/> and <paramref name="Source2"/>.
/// </returns>
/// <remarks>
/// This implementation uses the Linq extension method
/// <seealso cref="System.Linq.Enumerable.Union{TSource}(IEnumerable{TSource}, IEnumerable{TSource}, IEqualityComparer{TSource})"/>
/// with an implementation of the
/// <seealso cref="System.Collections.Generic.IEqualityComparer{T}"/>
/// that makes all objects not equal. This forces the distinct phase of
/// the union process to maintain all of the rows in the inputs.
/// </remarks>
public static IEnumerable<TSource> UnionAll<TSource>(
    this IEnumerable<TSource> Source1,
    IEnumerable<TSource> Source2)
{
    EverythingDifferentEqualityComaperer<TSource> AllDifferent =
        new EverythingDifferentEqualityComaperer<TSource>( );
    return Source1.Union(Source2, AllDifferent);
}
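For comparison, here is a minimal sketch of the two behaviours on a small input. It uses only the built-in Union and Concat methods; the class name UnionAllDemo and its method names are made up for this illustration. Concat appends one sequence to another without removing anything, so it is a handy reference for the output a UnionAll is expected to produce.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class UnionAllDemo
{
    // Built-in Union: duplicates within and across the inputs are dropped.
    public static int[] DistinctUnion(int[] first, int[] second)
    {
        return first.Union(second).ToArray();
    }

    // Built-in Concat: every element of both inputs is kept, in order.
    public static int[] KeepEverything(int[] first, int[] second)
    {
        return first.Concat(second).ToArray();
    }

    public static void Run()
    {
        int[] a = { 1, 2, 2 };
        int[] b = { 2, 3 };
        Console.WriteLine(string.Join(",", DistinctUnion(a, b)));   // 1,2,3
        Console.WriteLine(string.Join(",", KeepEverything(a, b)));  // 1,2,2,2,3
    }
}
```

The second line of output is the 'all rows preserved' result that the SQL Union All produces for the same two inputs.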

The Source Code

The following URLs contain the source code presented above. The UnionAll code needs to be in a static class declared like the following (the class name is something you can choose):

    public static class LINQ_Extension_Methods 

https://craigwatson1962.files.wordpress.com/2012/02/linq-short-takes-4-unionall-code.docx
https://craigwatson1962.files.wordpress.com/2012/02/linq-short-takes-4-unionall-code.pdf

Why It Works?

Supplying an object that implements the IEqualityComparer interface allows you to control how the distinct phase of the standard Union Linq extension method operates (Union is built into the .Net Framework [see MSDN documentation: Overview of the .NET Framework]; for the method itself see the MSDN documentation: Enumerable.Union(TSource) Method (IEnumerable(TSource), IEnumerable(TSource), IEqualityComparer(TSource)) (System.Linq)). The default behaviour of the Union extension method is to keep only the distinct values from the input sources. The way I have found to make it operate like a UnionAll is to provide an implementation of this interface that says ‘everything is different’. With that comparer no element is ever judged a duplicate of one already seen, so the distinct phase keeps all input rows from both sources.
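The mechanism can be modelled with a few lines of code. The following is a simplified sketch of a union's distinct phase, not the framework's actual source: it assumes the distinct phase behaves like a HashSet built with the supplied comparer. UnionSketch, UnionLike, KeepAll and NeverEqual are hypothetical names used only here.

```csharp
using System.Collections.Generic;

public static class UnionSketch
{
    // Hypothetical comparer for this sketch: hash codes are real,
    // but Equals never reports a match.
    private sealed class NeverEqual<T> : IEqualityComparer<T>
    {
        public bool Equals(T x, T y) { return false; }
        public int GetHashCode(T obj) { return obj == null ? 0 : obj.GetHashCode(); }
    }

    // Simplified model of a union: an element is emitted only when the
    // set of already-seen elements accepts it as new.
    public static IEnumerable<T> UnionLike<T>(
        IEnumerable<T> first, IEnumerable<T> second, IEqualityComparer<T> comparer)
    {
        HashSet<T> seen = new HashSet<T>(comparer);
        foreach (T item in first)
            if (seen.Add(item)) yield return item;
        foreach (T item in second)
            if (seen.Add(item)) yield return item;
    }

    // With NeverEqual, the set accepts every element, so nothing is dropped.
    public static List<int> KeepAll(IEnumerable<int> first, IEnumerable<int> second)
    {
        return new List<int>(UnionLike(first, second, new NeverEqual<int>()));
    }
}
```

Calling UnionLike with EqualityComparer<int>.Default instead reproduces the usual distinct result, which shows that the comparer alone decides what counts as a duplicate.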

I suspect that the designers, and implementers, of the Linq extensions to the .Net Framework had a very different intention for the IEqualityComparer argument to the Union extension method. That intention, I speculate, would have been to allow users to implement the semantic meaning of equality for user-defined objects. The use of an IEqualityComparer implementation to ‘switch off’ the distinct process in the Union extension method would, I speculate, come as a bit of a surprise to them.
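As an illustration of that presumably intended use, the sketch below passes a comparer that defines equality semantically. It uses the built-in StringComparer.OrdinalIgnoreCase rather than a hand-written comparer for a user-defined type, purely to keep the example short; ComparerIntentDemo and CaseInsensitiveUnion are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ComparerIntentDemo
{
    // Strings that differ only by case are treated as the same value,
    // so Union keeps the first spelling it encounters.
    public static List<string> CaseInsensitiveUnion(
        IEnumerable<string> first, IEnumerable<string> second)
    {
        return first.Union(second, StringComparer.OrdinalIgnoreCase).ToList();
    }
}
```

For the inputs { "a", "B" } and { "A", "b", "c" } this returns "a", "B", "c".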

Future Versions

The UnionAll implementation, for me now, is all that I want. Hence, I do not foresee making any changes or enhancements to the implementation. My satisfaction with the implementation is not entirely complete, though: I may include some argument null checking later. I will leave that for my experimentation with the Unit Testing Framework.

This implementation does highlight, for me at least, another gap in the standard Linq implementation. An extension method that appends a single value to an existing IEnumerable would seem like a natural next step.
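A minimal sketch of what such a method could look like follows. The names AppendSketch and AppendValue are hypothetical; the name AppendValue is chosen so that it does not clash with the Enumerable.Append method that later versions of .NET provide.

```csharp
using System.Collections.Generic;

public static class AppendSketch
{
    // Yields every element of source, then the single extra value.
    // Execution is deferred, like the built-in Linq operators.
    public static IEnumerable<T> AppendValue<T>(this IEnumerable<T> source, T value)
    {
        foreach (T item in source)
            yield return item;
        yield return value;
    }
}
```

With this in scope, new[] { 1, 2 }.AppendValue(3) enumerates 1, 2, 3.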

Conclusions

The implementation of the UnionAll extension method demonstrates that some of the standard Linq extension methods are very flexible. That flexibility allows bending them into new, and very useful, transformations.

The more I develop extension methods, the more I appreciate this addition to the C# Language. The combination of generics (see MSDN Documentation: Introduction to Generics (C# Programming Guide) and Generic Type Parameters (C# Programming Guide)) and extension methods results in very powerful code, and code which saves the developer ‘miles of code’.


LINQ Extension Method To Dump any IEnumerable


Introduction

I have been doing some development with LINQ recently, and will present some of the generally useful LINQ extension methods in this (and some forthcoming) blog posts.

This post will focus on the most generally useful LINQ extension methods I have developed. These methods produce a formatted dump of the contents of a sequence (IEnumerable<T>, to be precise).

These methods have evolved to their current form through the application of the DRY Principle (Don’t Repeat Yourself). I found that I was repeatedly writing very similar code to dump the contents of LINQ result sequences. The repetition of very similar code through the project led me to develop these extension methods.

The Class Defining the Extension Methods

The C# compiler is very pedantic about the rules for defining extension methods. The containing class must be marked as static. The requirements for the implementation of extension methods are described in Extension Methods (C# Programming Guide). The following is the class definition which I have been using to contain the LINQ extension methods.

    public static class LINQ_Extensions 

The full version of the class definition, with the comments that generate IntelliSense context-sensitive help, is as follows:

    /// <summary>
    /// Class which supplies LINQ extension Methods.
    /// Extension methods are:
    /// <see cref="Window"/>,
    /// <see cref="AllValuesDistinct"/>,
    /// <see cref="ToPrintString"/>,
    /// <see cref="ToIntegralValue"/>,
    /// <see cref="ToBigIntValue"/>.
    /// </summary>
    public static class LINQ_Extensions

Failure to mark the class as static will result in the compiler error CS1106.

error CS1106: Extension method must be defined in a non-generic static class
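As a minimal sketch of a class shape that satisfies the compiler, consider the following. ShapeOfAnExtensionClass and CountItems are hypothetical names used only to show the rule; they are not part of the LINQ_Extensions class.

```csharp
using System.Collections.Generic;

// static and non-generic: both are required of a class that holds extension methods.
public static class ShapeOfAnExtensionClass
{
    // The 'this' modifier on the first parameter is what makes
    // the method callable as source.CountItems().
    public static int CountItems<T>(this IEnumerable<T> source)
    {
        int count = 0;
        foreach (T item in source)
            count++;
        return count;
    }
}
```

Removing the static keyword from the class declaration is enough to reproduce the CS1106 error quoted above.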

Introducing The ToPrintString LINQ Extension Method

This is the first of a pair of LINQ extension methods I will present. This method is the one I first implemented. The second extension method I will present is very similar to this method, but addresses some of its limitations. I detail the most significant limitation of this implementation further on in this blog post.

ToPrintString – Design Decisions

The design of this extension method needed to enable a number of features. These design features included:

1) I wanted external control over a couple of points in the processing of this extension method. These points of control accept Lambda Expressions, enabling the caller to supply the functionality that is required.

2) I wanted the extension method to work with any type of object. Hence, the use of a generic type parameter to the implementation (see: Introduction to Generics (C# Programming Guide) for further information).

3) The type parameter for this implementation should not be constrained. Hence, it will work with declared structs, declared classes, and anonymous types (see Anonymous Types (C# Programming Guide) for further information).

4) I wanted a simple signature to the extension method. This desire led me to implement the method using Optional Parameters (see: Named and Optional Arguments (C# Programming Guide)) with Default Values that provide a useful (in my opinion) implementation. This has resulted in an implementation that can be invoked with no arguments.

5) I wanted the flexibility to select which elements of the sequence are dumped. To achieve this, I wanted to have a where predicate, like the LINQ Where method, as part of the implementation.

6) I wanted to have all of the information which LINQ can supply available. The particular piece of information that I wanted available was the position in the sequence each object occupies.

ToPrintString – Implementation Decisions

The decisions made in the implementation fall into two groups. These groups are the implementation decisions that support, or implement, the design goals, and those which support an efficient and effective implementation. The following is the signature for the implementation of the extension method.

public static string ToPrintString<TSource>(
    this IEnumerable<TSource> InputSequence,
    Func<TSource, int, bool> WherePredicate = null,
    Func<TSource, int, string> FormatFunction = null,
    Func<StringBuilder, string, StringBuilder> ConcatenateFunction = null)

Implementation Features Supporting The Design Goals

The signature of the extension method implements a number of the design goals for the method. These design goals and implementations include:

1) The three Func<> arguments expose the points in the method where the caller can supply custom functionality. This satisfies the goal of allowing the caller to supply functionality that is required.

2) The parts of the function’s signature that use the <TSource> type parameter provide the flexibility to apply the extension method to any type of object contained in a sequence.

3) The function signature does not contain any type parameter constraints (see: Constraints on Type Parameters (C# Programming Guide)). This further enables the flexibility of the method, allowing application to any type.

4) The three Func<> arguments to the method are declared with a default value of null. This allows the method to be invoked using a plain .ToPrintString() call. I will write more about the use of null as a default value further on in this blog post.

5) The method argument Func<TSource, int, bool> WherePredicate = null enables the capability to apply a logical expression to select objects from the sequence. The signature used for the WherePredicate is the same as that of the indexed overload of the LINQ Where extension method.

6) The int argument to both Func<TSource, int, bool> WherePredicate and Func<TSource, int, string> FormatFunction is the position in the sequence that the object occupies. This is a zero-based number. When a WherePredicate is supplied and rejects objects, the FormatFunction receives the position within the filtered sequence.
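The indexed overloads of Where and Select that these signatures mirror can be tried in isolation. The following is a small sketch with hypothetical names (IndexedOverloadsDemo, EvenValuesWithPositions); it also shows how the position is renumbered after filtering.

```csharp
using System.Collections.Generic;
using System.Linq;

public static class IndexedOverloadsDemo
{
    public static List<string> EvenValuesWithPositions(IEnumerable<int> values)
    {
        return values
            // Indexed Where: position is the index in the original sequence.
            .Where((value, position) => value % 2 == 0)
            // Indexed Select: position now counts within the filtered sequence.
            .Select((value, position) => string.Format("[{0}] {1}", position, value))
            .ToList();
    }
}
```

For the input { 1, 2, 3, 4 } the result is "[0] 2" and "[1] 4": the values 2 and 4 sat at positions 1 and 3 originally, but are numbered 0 and 1 once the filter has run.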

Implementation Decisions Supporting An Effective Implementation

A couple of choices were made within the implementation in an attempt to make it efficient and effective. These choices include:

· The FormatFunction and ConcatenateFunction arguments fall back to useable, and for me useful, default values if the argument is null. See below (The Implementation of ToPrintString) for the default values implemented.

· The extension method uses the LINQ Extension Method Aggregate to perform the output, or resulting, string concatenation. The Aggregate extension method seems to be the clearest expression of intent in forming the output from the extension method.

· The Aggregate method uses a StringBuilder object to assemble the result. The use of the StringBuilder object results in a more efficient implementation when compared with just concatenating String Objects (in general, and when the size of the output string can get large).
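The Aggregate and StringBuilder combination can be seen on its own in the short sketch below. AggregateDemo and JoinWithLeadingSpace are hypothetical names; the lambda is the same as the default ConcatenateFunction described in this post.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text;

public static class AggregateDemo
{
    public static string JoinWithLeadingSpace(IEnumerable<string> parts)
    {
        return parts
            // The StringBuilder is the seed; each step appends to it and
            // hands the same builder on to the next step.
            .Aggregate(new StringBuilder(),
                (builder, part) => builder.AppendFormat(" {0}", part))
            .ToString();
    }
}
```

For { "a", "b" } this returns " a b". Because a seed is supplied, an empty input simply yields an empty string.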

The Implementation of ToPrintString

The following is the implementation of the ToPrintString extension method.

/// <summary>
/// Builds a printable string from the enumerable.
/// </summary>
/// <typeparam name="TSource">Type of the source object contained in the enumerable.</typeparam>
/// <param name="InputSequence">The enumerable which is formatted for printing.</param>
/// <param name="WherePredicate">[Optional] A where predicate used to select the objects to be output.<br/>
/// Predicate signature:
/// <code>Func&lt;TSource InputObject, int PositionInSequence, bool ReturnValue&gt;</code>
/// Predicate Arguments:
/// <list type="number">
/// <item><description>[Input Parameter] <br/>
/// The object from the input sequence. <br/>
/// The type of the object is the same as the declaration of the sequence.<br/>
/// For compound sequences like Dictionary the object is a KeyValuePair.
/// </description></item>
/// <item><description>[Input Parameter] <br/>
/// The position in the input sequence which the object occupies.
/// This is a base zero number.</description></item>
/// <item><description>[Return Parameter] <br/>
/// Indicates if the object should be selected (true case) or excluded (false case).
/// </description></item>
/// </list>
/// </param>
/// <param name="FormatFunction">[Optional] A formatting function which will convert
/// the position in the sequence and the object into a string value.<br/>
/// Function Signature:
/// <code>Func&lt;TSource InputObject, int PositionInSequence, string ReturnValue&gt;</code>
/// Function Arguments:
/// <list type="number">
/// <item>[Input Parameter] The object from the input sequence. <br/>
/// The type of the object is the same as the declaration of the sequence.<br/>
/// For compound sequences like Dictionary the object is a KeyValuePair.</item>
/// <item>[Input Parameter] The position in the input sequence which the object occupies.
/// This is a base zero number.<br/>
/// If a where clause is used and rejects objects,
/// then this number is the position in the result of the where clause sequence.
/// </item>
/// <item>[Return Parameter] The required string representation of the object.</item>
/// <item>[Default Value]<br/>The following is used if the FormatFunction argument is null.
/// <code>(Source, Position) => string.Format("[{0}] {1}", Position, Source);</code>
/// </item>
/// </list>
/// </param>
/// <param name="ConcatenateFunction">[Optional]<br/>
/// A function which concatenates the string versions of the object into one string.<br/>
/// Function Signature:
/// <code>Func&lt;StringBuilder ResultString, string ObjectStringValue, StringBuilder ReturnValue&gt;</code>
/// Arguments:
/// <list type="number">
/// <item>[Input Parameter]<br/>
/// The StringBuilder object which is used to collect the input object formatted strings.</item>
/// <item>[Input Parameter]<br/>
/// The string representation of the object, generated by the FormatFunction.</item>
/// <item>[Return Parameter]<br/>StringBuilder result from the ConcatenateFunction.</item>
/// <item>[Default Value]<br/>This is used if the ConcatenateFunction argument is null.
/// <code>ConcatenateFunction = (Result, Value) => Result.AppendFormat(" {0}", Value);</code>
/// </item>
/// </list>
/// </param>
/// <returns>String of formatted and concatenated values.</returns>
/// <remarks>
/// This extension method can potentially exceed the maximum capacity of the string object.
/// </remarks>
/// <example>
/// <code>// Simple tests case - no function arguments
/// string output = "Test1".ToPrintString();
/// Debug.WriteLine(
///     string.Format("Characters in Test1 = {0}", output));
/// </code>
/// </example>
public static string ToPrintString<TSource>(
    this IEnumerable<TSource> InputSequence,
    Func<TSource, int, bool> WherePredicate = null,
    Func<TSource, int, string> FormatFunction = null,
    Func<StringBuilder, string, StringBuilder> ConcatenateFunction = null)
{
    if (FormatFunction == null)
        FormatFunction = (Source, Position) => string.Format("[{0}] {1}", Position, Source);
    if (ConcatenateFunction == null)
        ConcatenateFunction = (Result, Value) => Result.AppendFormat(" {0}", Value);
    StringBuilder retVal;
    if (WherePredicate == null)
    {
        retVal = InputSequence
            .Select((a, pos) => FormatFunction(a, pos))
            .Aggregate(new StringBuilder(),
            (ReturnString, Value) => ConcatenateFunction(ReturnString, Value));
    }
    else {
        retVal = InputSequence
            .Where((InputObject, Position) => WherePredicate(InputObject, Position))
            .Select((InputObject, Position) => FormatFunction(InputObject, Position))
            .Aggregate(new StringBuilder(),
            (ReturnString, Value) => ConcatenateFunction(ReturnString, Value));
    }
    return retVal.ToString();
}

Examples of code calling the ToPrintString Extension Method

The following is a method that demonstrates a number of invocations of the extension method ToPrintString. I hope that it shows the main ways that this method could be invoked.

private void SimpleToPrintStringTests()
{
    // Simple tests case - no function arguments
    string output = "Test1".ToPrintString();
    Debug.WriteLine(
        string.Format("Characters in Test1 = {0}", output));

    // Declaring a simple array
    int[] SimpleArray = new int[] { 1, 2, 3, 4 };
    // Test Against an array with no arguments.
    output = SimpleArray.ToPrintString();
    Debug.WriteLine(
        string.Format("Simple 4 Element int array {0}", output));
    // Supplying a where clause
    output = SimpleArray.ToPrintString((val, pos) => val % 2 == 0);
    Debug.WriteLine(
        string.Format("Simple 4 Element int array, with a where clause {0}", output));
    // Supplying an optional argument for the Format Function
    output = SimpleArray.ToPrintString(FormatFunction: (val, pos) => val.ToString());
    Debug.WriteLine(
        string.Format(
        "Simple 4 Element int array, with an optional Format Function argument\n{0}" , output));
    // Supplying a optional argument for the Concatenation Function
    output = SimpleArray.ToPrintString(
        ConcatenateFunction: (carry, val) => carry.AppendFormat("{0}, ", val));
    Debug.WriteLine(
        string.Format(
        "Simple 4 Element int array, with an optional Concatenation Function argument\n{0}" , output));
    // Declaring a where predicate
    Func<int, int, bool> WherePredicate =
        (InputObject, Position) =>
        {
            if (Position % 2 == 0)
                return false;
            return true;
        };
    // Supplying a where predicate as a externally declared function
    output = SimpleArray.ToPrintString(WherePredicate);
    Debug.WriteLine(
        string.Format(
        "Simple 4 Element int array, with a Where Predicate argument declared externally\n{0}" , output));

    // Declaring a list of objects
    List<Tuple<int, string>> SimpleObjects = new List<Tuple<int, string>>()
    {
        Tuple.Create(1, "Test"),        Tuple.Create(200, "String Test"),
        Tuple.Create(-21, "Testing"),   Tuple.Create(0, "the quick brown")
    };
    // External Where Predicate
    Func<Tuple<int, string>, int, bool> Where1 =
        (obj, pos) =>
        {
            if (obj.Item1 >= 0) return true;
            else return false;
        };
    // External Format Predicate
    Func<Tuple<int, string>, int, string> Format1 =
        (obj, pos) => string.Format(
            "[{0}] int value={1} string value ={2}\n" , pos, obj.Item1, obj.Item2);
    // Calling using external declared predicates
    output = SimpleObjects.ToPrintString(Where1, Format1);
    Debug.WriteLine(
        string.Format("Processing a list of object with external predicates\n{0}" , output));
    // Another where predicate
    Func<Tuple<int, string>, int, bool> Where2 =
        (obj, pos) =>
        {
            if (obj.Item1 < 0) return true;
            else return false;
        };
    // Calling using external declared predicates
    output = SimpleObjects.ToPrintString(Where2, Format1);
    Debug.WriteLine(
        string.Format("Processing a list of object with external predicates 2\n{0}" , output));

    // A more complex object collection to test against
    Dictionary<long, int?> DictTest = new Dictionary<long, int?>()
    {
        { 234L, null },         {-44345L, 65742 },
        { -5644, null },        {6799032L, 8765464 }
    };
    // Supplying where and format as more complex inline lambda expressions
    output = DictTest.ToPrintString((dictObj, pos) => dictObj.Value.HasValue,
        (dictObj, pos) =>
            string.Format("Key={0} Value={1}\n", dictObj.Key, dictObj.Value.Value));
    Debug.WriteLine(
        string.Format("Processing a dictionary with inline lambda expressions\n{0}" , output));
    // A format function for a dictionary.
    // NB: You need to unwrap the Dictionary into passed KeyValuePair objects.
    // Also, lambda capture of the Dictionary Object to supply the Count property.
    Func<KeyValuePair<long, int?>, int, string> Format2 =
        (dictObj, position) =>
        {
            if (dictObj.Value.HasValue)
                return string.Format("Object {0} of {1} Key = {2} Value = {3}\n",
                    position, DictTest.Count(), dictObj.Key, dictObj.Value);
            else return string.Format("Object {0} of {1} Key = {2} Value = null\n",
                    position, DictTest.Count(), dictObj.Key);
        };
    output = DictTest.ToPrintString(FormatFunction: Format2);
    Debug.WriteLine(
        string.Format(
        "Processing the KeyValPair objects with named external function\n{0}" , output));
    return;
}

Limitations Of The Implementation

There is one significant limitation of this implementation: the use of a string as the return type. The string object has a finite (but quite large) limit on the length of the string that can be stored in the object. A sequence with many elements, and/or a large amount of information formatted per object, could exceed the maximum size of a string. The System.String documentation says that this limit is about 2GB. Because the characters of a .NET string are stored as UTF-16 code units of two bytes each, 2GB works out to roughly one billion characters.

Mechanically, or within the string class, the upper bound on the length is the maximum positive value of an Int32, or 2,147,483,647 characters. The System.String class uses Int32 for the Length property and as arguments to many methods, and probably internally as well. This dependence on the Int32 is why I would conclude that no string can be longer than Int32.MaxValue, although the 2GB memory limit is likely to be reached well before that.

In a subsequent blog post, probably the next blog post, I will detail another extension method that addresses this limitation.

Conclusions

There are a number of points that I think are worth noting. These concluding remarks include:

· Building this extension method was not particularly difficult.

· The use of the Func<> delegate takes a bit of getting used to. There are plenty of examples showing how to use it in the .Net framework library.

· A Func<> type carries no more information in the method signature than the argument and return types shown above. The use and meaning of the arguments has to come in the supporting documentation for the method.

· A default value of null for the optional Func<> arguments is all C# allows, because the default value of an optional parameter must be a compile-time constant. The fallback behaviour therefore has to be assigned inside the method body.
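The null-default pattern from the last point can be reduced to a few lines. The sketch below uses hypothetical names (OptionalDelegateDemo, Describe, formatter) and follows the same 'test for null, then assign a lambda' approach as ToPrintString.

```csharp
using System;

public static class OptionalDelegateDemo
{
    // null is the only default a delegate-typed optional parameter can declare.
    public static string Describe(int value, Func<int, string> formatter = null)
    {
        // The real default is supplied here, inside the method body.
        if (formatter == null)
            formatter = v => "value=" + v;
        return formatter(value);
    }
}
```

Describe(5) returns "value=5", while Describe(5, v => (v * 2).ToString()) returns "10".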


An Object Which Contains Variant Data – Handling Arrays


Introduction

I struck this one working with WMI (Windows Management Instrumentation) and the C# classes in the System.Management Namespace.

The property value is just an object, which can contain either a single value or an array of values.

This post shows how I addressed the problem, and an interesting feature of C#.

The Object I was working with

[Figure: screenshots of the object]

What an Array Looks Like

[Figure: screenshot of the array]

The Code to Dump the Values in the Array

        private static void DumpArrayValues(PropertyData PropArray, int Indent)
        {
            Array x1 = (Array)PropArray.Value;
            String TabIndent = new string('\t', Indent);
            for (int i = 0; i < x1.Length; i++)
            {
                Debug.WriteLine(String.Format("{0} [{1}] {2}", TabIndent, i, x1.GetValue(i)));
            }
        }

Interesting Features

[Figure: diagram of the Value type relationships]

  • Value is declared as a plain object, so at run time it can hold either a System.String or a System.Array (I have shown it as multiple inheritance; I know C# does not support multiple inheritance, but that is what it looks like to me).
  • Casting the property value to an Array object allows you to get the length of the array (the number of elements in it).
  • String.Format takes care of the different object types.
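A sketch of how the single-value and array cases could be told apart before dumping follows. To stay self-contained it takes a plain object rather than a PropertyData, and the names VariantValueDemo and Describe are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class VariantValueDemo
{
    public static List<string> Describe(object value)
    {
        List<string> lines = new List<string>();
        // 'as' yields null when value is not an array, so no exception is thrown.
        Array asArray = value as Array;
        if (asArray != null)
        {
            for (int i = 0; i < asArray.Length; i++)
                lines.Add(String.Format("[{0}] {1}", i, asArray.GetValue(i)));
        }
        else
        {
            // A single value: String.Format takes care of the object type.
            lines.Add(String.Format("{0}", value));
        }
        return lines;
    }
}
```

Describe("abc") gives the single line "abc", while Describe(new[] { "a", "b" }) gives "[0] a" and "[1] b". The loop assumes a one-dimensional array.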
