LINQ over FileInfo And Using Let for Clarity


Introduction

A couple of days ago I was reading a question on StackOverflow about using the FileInfo class with LINQ. As it happened, I had been delving into this “neck of the woods” myself over the same few days. I wanted to measure the relative merits of various compression formats for graphics files: given the same image stored in a number of formats (bmp, gif, jpeg, tiff and wdp), which format produces the smallest file? That led me to use LINQ over the directory structure, with the FileInfo class supplying each file's size.
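For readers new to combining LINQ with FileInfo, here is a minimal sketch of the directory-walking half of the problem: for each immediate subdirectory of a root folder, it totals the sizes of the files directly inside it. `FolderSizes` and `BytesPerSubdirectory` are hypothetical names used only for illustration, and the sketch assumes .NET Framework 4.0 or later for `DirectoryInfo.EnumerateDirectories` and `EnumerateFiles`.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper: total bytes of the files directly inside each subdirectory of root.
static class FolderSizes
{
    public static Dictionary<string, long> BytesPerSubdirectory(string root)
    {
        return (from sub in new DirectoryInfo(root).EnumerateDirectories()
                // FileInfo.Length is the file size in bytes; an empty folder sums to 0.
                let bytes = sub.EnumerateFiles().Sum(f => f.Length)
                select new { sub.Name, bytes })
               .ToDictionary(x => x.Name, x => x.bytes);
    }
}
```

Files sitting directly in the root are ignored, because only the subdirectories are enumerated.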

The Solution

The following is my way of working through a directory tree containing a large number of files that share the same base name and differ only in extension, and determining which format is smallest for each image. It then summarises those results into a single line reporting the winning format, how often it won, and its average saving.
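The listing in this post looks up five known extensions for each .bmp file. As a swapped-in alternative, not the approach used in the listing, the same "smallest file per image" question can be answered by grouping every file in a folder by its name without extension. `SmallestFormat` and `SmallestPerImage` are hypothetical names; this is a sketch that assumes every file in the folder is one variant of some image.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper: for each base name, the extension of its smallest file.
static class SmallestFormat
{
    public static Dictionary<string, string> SmallestPerImage(string directory)
    {
        return (from fi in new DirectoryInfo(directory).EnumerateFiles()
                group fi by Path.GetFileNameWithoutExtension(fi.Name) into sameImage
                // Order the variants of one image by size; the first is the smallest.
                let smallest = sameImage.OrderBy(f => f.Length).First()
                select new { Name = sameImage.Key, Extension = smallest.Extension })
               .ToDictionary(x => x.Name, x => x.Extension);
    }
}
```

Because it never names the formats, this version copes with images that are missing some formats, at the cost of not reporting the individual sizes.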

There are a couple of features worth noting:

  • The use of a function (ExtractName) in the first LINQ statement. This makes pulling out the name part of each file so much easier. While building the solution, I put a breakpoint in the function to make sure all of the string manipulations were happening correctly. That was before I added the part of the LINQ statement that creates the FileInfo objects for each of the files.
  • The use of the let clause in LINQ. For me the let clause is a real sanity saver. I find it much simpler to build up a LINQ statement when I have some local variables to work with, and it really helps me keep my logic clear.
  • The use of the Tuple class, which is new in .NET Framework 4.0. I’m finding Tuple a real boon. It is particularly useful on those occasions where I need to tie a couple of values (or instances of classes) together and treat them as one for some process. Sure, you could create a class that does the same thing. But for internal, “just for a moment while I do something” cases, the Tuple class is really handy.
  • The use of 1.0 * in the BestRatio method. This shows my old-fashioned roots, which come from working with FORTRAN-based systems years ago. Multiplying by 1.0 “kicks” the calculation out of long/integer arithmetic and into floating point, so the division does not truncate.
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Linq;

// One row per image: the size in bytes of each format, plus the winning format.
class Results
{
    public string Dir { get; set; }
    public string NamePart { get; set; }
    public long BmpSize { get; set; }
    public long JpegSize { get; set; }
    public long TiffSize { get; set; }
    public long GifSize { get; set; }
    public long WdpSize { get; set; }
    public string BestType { get; set; }
    // Percentage saving of the smallest file relative to the largest.
    public double BestRatio { get; set; }

    public Results()
    {
    }

    public Results(string Dir, string NamePart,
        long BmpSize, long JpegSize, long TiffSize, long GifSize, long WdpSize,
        string BestType, double BestRatio)
    {
        this.Dir = Dir;
        this.NamePart = NamePart;
        this.BmpSize = BmpSize;
        this.JpegSize = JpegSize;
        this.TiffSize = TiffSize;
        this.GifSize = GifSize;
        this.WdpSize = WdpSize;
        this.BestType = BestType;
        this.BestRatio = BestRatio;
    }
}

class CheckFilesSizes
{
    public CheckFilesSizes()
    {
    }

    // AnalysisVersion is not used by this version of the analysis.
    internal void Analysis(int AnalysisVersion, string DirectoryRoot)
    {
        List<Results> results = new List<Results>();
        foreach (var dir in Directory.EnumerateDirectories(DirectoryRoot))
        {
            // One Results row per .bmp file; the other formats are found by sharing its base name.
            var analysis1 = (from names in Directory.EnumerateFiles(dir, "*.bmp")
                             let namePart = ExtractName(names)
                             let fiBmp = new FileInfo(Path.Combine(dir, namePart + ".bmp"))
                             let fiJpeg = new FileInfo(Path.Combine(dir, namePart + ".jpg"))
                             let fiTiff = new FileInfo(Path.Combine(dir, namePart + ".tiff"))
                             let fiGif = new FileInfo(Path.Combine(dir, namePart + ".gif"))
                             let fiWdp = new FileInfo(Path.Combine(dir, namePart + ".wdp"))
                             // FileInfo.Length throws FileNotFoundException for a missing file,
                             // so skip any image that is not present in every format.
                             where fiJpeg.Exists && fiTiff.Exists && fiGif.Exists && fiWdp.Exists
                             select new Results(dir, namePart,
                                 fiBmp.Length, fiJpeg.Length, fiTiff.Length, fiGif.Length, fiWdp.Length,
                                 PickBestType(fiBmp.Length, fiJpeg.Length, fiTiff.Length, fiGif.Length, fiWdp.Length),
                                 BestRatio(fiBmp.Length, fiJpeg.Length, fiTiff.Length, fiGif.Length, fiWdp.Length)))
                            .ToList(); // run the query once, rather than once for Count and again for AddRange
            Debug.WriteLine(analysis1.Count);
            results.AddRange(analysis1);
        }
        if (results.Count == 0)
        {
            Debug.WriteLine("No complete image sets found.");
            return;
        }
        // Group by winning format, most frequent winner first.
        var bestTypes = (from result in results
                         group result by result.BestType into groups
                         select new
                         {
                             Key = groups.Key,
                             Freq = groups.Count(),
                             Average = groups.Average(A => A.BestRatio)
                         }).OrderByDescending(A => A.Freq);
        var top = bestTypes.First();
        Debug.WriteLine("The answer is {0} {1} times, by {2:F1}%", top.Key, top.Freq, top.Average);
    }

    // Percentage saving of the smallest file relative to the largest.
    private double BestRatio(long bmp, long jpeg, long tiff, long gif, long wdp)
    {
        List<long> sizes = new List<long>() { bmp, jpeg, tiff, gif, wdp };
        long best = sizes.Min();
        long worst = sizes.Max();
        // 1.0 * forces floating-point division instead of truncating long division.
        double percentageChange = ((1.0 * (worst - best)) / (1.0 * worst)) * 100.0;
        return percentageChange;
    }

    // Name of the format with the smallest size.
    private string PickBestType(long bmp, long jpeg, long tiff, long gif, long wdp)
    {
        List<Tuple<long, string>> test = new List<Tuple<long, string>>()
            {
                Tuple.Create(bmp, "Bmp"), Tuple.Create(jpeg, "Jpeg"), Tuple.Create(tiff, "Tiff"),
                Tuple.Create(gif, "Gif"), Tuple.Create(wdp, "Wdp")
            };
        // OrderBy is a stable sort, so on equal sizes the earliest entry in the list wins.
        return test.OrderBy(A => A.Item1).First().Item2;
    }

    // File name without directory or extension, e.g. C:\images\photo.bmp gives "photo".
    private string ExtractName(string fileName)
    {
        return Path.GetFileNameWithoutExtension(fileName);
    }
}
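The 1.0 * trick in BestRatio is easiest to see side by side with the integer version of the same formula. This is a hypothetical demo: the class `RatioDemo` and its methods are illustrative names, not part of the listing, and the explicit `(double)` cast is shown as an equivalent alternative rather than as the code above.

```csharp
// Hypothetical demo of the "1.0 *" trick used in BestRatio.
static class RatioDemo
{
    // long / long truncates toward zero, so any saving smaller than the worst size becomes 0.
    public static double IntegerRatio(long best, long worst)
    {
        return ((worst - best) / worst) * 100.0;
    }

    // Multiplying by 1.0 promotes the numerator to double before the division happens.
    public static double FloatRatio(long best, long worst)
    {
        return ((1.0 * (worst - best)) / worst) * 100.0;
    }

    // An explicit cast does the same job and states the intent directly.
    public static double CastRatio(long best, long worst)
    {
        return ((double)(worst - best) / worst) * 100.0;
    }
}
```

With best = 25 and worst = 100, the integer version returns 0 because 75 / 100 truncates to 0 before the multiplication by 100.0, while the other two return 75.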

Conclusions

There is a lot one can do with LINQ. Frequently I find that things start to “look tacky”, and coming at the problem with a fresh approach can be very beneficial. There are two main points I wish to make here:

  • Use the let clause in LINQ statements. Judicious use of let can make complex LINQ far more readable, and can improve performance by computing a value once instead of repeating the same expression (see my blog post on let: Craig’s Eclectic Blog » LINQ to XML: using let, yield return and SelectMany).
  • The Tuple class is invaluable. In cases where you need to keep a couple of properties together and use them like a class in a list (List<class x>), a tuple can prove handy. The syntax List<Tuple<type1, type2>> is very convenient compared to creating a class with a very limited life expectancy (used in one place, for one case).
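Both points can be seen in one small sketch. `LetAndTupleDemo`, `Measure` and the call counter are hypothetical; `Measure` merely stands in for an expensive step such as creating a FileInfo and reading its Length. The gain from let comes from evaluating an expression once per element rather than once per use, and the Tuple pairs each name with its size without a dedicated class.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical demo: let evaluates an expression once per element,
// and Tuple pairs two values without declaring a throwaway class.
static class LetAndTupleDemo
{
    public static int Calls; // counts calls to the "expensive" function

    static long Measure(string name)
    {
        Calls++;
        return name.Length * 100L; // stand-in for an expensive lookup such as FileInfo.Length
    }

    // Without let, Measure runs once in the where and again in the select for each survivor.
    public static List<Tuple<string, long>> WithoutLet(IEnumerable<string> names)
    {
        return (from n in names
                where Measure(n) > 300
                select Tuple.Create(n, Measure(n))).ToList();
    }

    // With let, Measure runs exactly once per element and the result is reused.
    public static List<Tuple<string, long>> WithLet(IEnumerable<string> names)
    {
        return (from n in names
                let size = Measure(n)
                where size > 300
                select Tuple.Create(n, size)).ToList();
    }
}
```

For the three names photo.bmp, a.gif and ab, WithoutLet calls Measure five times (three in the where, two more in the select for the elements that pass), whereas WithLet calls it three times; both return the same two tuples.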