
Pagination of XPS files into Graphics format files in A3 or A4 Pages


Introduction

This is a continuation of the previous post (Craig’s Eclectic Blog » Convert an XPS to JPEG, PNG, TIFF or BMP in A4 Pages), in which I presented a solution for producing A3 or A4 pages from a large XPS file. There were a number of things in that solution which I was not entirely happy with, and this post addresses them.

Things which were “not to my satisfaction”

There were a few problems I had with the solution presented. These included:

  • The paper size occurred twice in the function’s arguments: once as a size, and once as a string for encoding into the file name. This looked like a classic case of a class which should have existed but was overlooked.
  • There were a number of places where the string for the encoding was passed into an API class, or used as part of the file name generation. Again, this looked like a case of an enumeration which should have existed in the solution but was missing. One could also argue that the .Net Framework APIs should be using a similar enumeration.
  • There was a better way to handle the encoding of page parts. This solution is a “bit” more efficient.

There were also a couple of things which should have been included in the previous post and which I forgot to put in. These included:

  • The Main method needs the [STAThread] attribute (STAThreadAttribute). The attribute (see “Attributes (C# and Visual Basic)” on MSDN for an explanation of attributes in C#) is vital: the program will throw an exception without it. The Main method also gives a clue as to why I cleaned up some of the memory management in the source code which this started from: I had over 500 XPS files to tile into workable, user-consumable files.
  • There was an optimisation of the tiling process which I should have included: align the longest edge of the paper with the longest edge of the bitmap to be tiled. I have only my empirical “feel” for the subject to go on, but this should minimise the number of tiles produced.
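To make the tiling optimisation concrete, here is a minimal sketch that counts the tiles needed for each paper orientation. The class and method names are hypothetical, and it uses Math.Ceiling rather than the rounding in the listing further down, so treat it as an illustration of the idea rather than the code from the solution.

```csharp
using System;

// Hypothetical helper for illustration; not part of the original solution.
public static class TileCountSketch
{
    // Number of paper-sized tiles needed to cover an image of the given size.
    public static int CountTiles(double imageWidth, double imageHeight,
                                 double paperWidth, double paperHeight)
    {
        int across = (int)Math.Ceiling(imageWidth / paperWidth);
        int down = (int)Math.Ceiling(imageHeight / paperHeight);
        return across * down;
    }

    public static void Main()
    {
        // A landscape 2200 x 1500 image tiled with A4 at the 780 x 1100 units used in this post.
        int portraitPaper = CountTiles(2200, 1500, 780, 1100);   // paper as supplied
        int landscapePaper = CountTiles(2200, 1500, 1100, 780);  // paper rotated to match the image
        Console.WriteLine("Portrait paper:  {0} tiles", portraitPaper);   // 6
        Console.WriteLine("Landscape paper: {0} tiles", landscapePaper);  // 4
    }
}
```

The rule is only a heuristic. For a 3000 x 1000 image the same calculation gives 4 tiles with portrait paper and 6 with landscape paper, so counting both orientations and taking the smaller is a cheap safeguard.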

The Classes in the Final Solution

The following diagram is the class call structure, generated by Visual Studio:

XPS_to_Graphic_Structure

The following is the class diagram for the final solution (“final” being a loosely used term, in that it is final only until I think of another improvement, or a new requirement, for the solution).

ClassDiagram1

The Main Program

There are a couple of key points to note here:

  • The STAThread attribute is necessary. The underlying APIs used by the program require the STA (Single Threaded Apartment) threading model.
  • The redesign of the interface into the pagination process has yielded a far more flexible API. It can now specify multiple:
    • Encodings
    • Paper sizes
    • Output file types (one big one, multiple pages in one file, and multiple files encoded by paper size)
  • The “skip done” flag provides a rudimentary restart facility.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;

namespace XpsConverter
{
    /// <summary>
    /// Program for the conversion of XPS files to graphics files in:
    ///     Multiple output formats
    ///     Multiple tilings of the image (A3 or A4 currently)
    /// </summary>
    class Program
    {
        /// <summary>
        /// Where the XPS files are found
        /// </summary>
        private static string XPS_Path =
            @"H:\Visual Studio 2010\Projects\ProduceDGML_from_XML\DGML_From_LINQ\XPS";
        /// <summary>
        /// Where the graphics files are written
        /// </summary>
        private static string JPG_Path =
            @"H:\Visual Studio 2010\Projects\Convert_XPS_to_BPM\JPG_Files\";

        /// <summary>
        /// Main for the execution of the program.
        /// </summary>
        /// <param name="args">Command line arguments (not used)</param>
        [STAThread]
        static void Main(string[] args)
        {
            List<PaperSize> papers = new List<PaperSize>()
            {
                new PaperSize(PaperSizes.A3),
                new PaperSize(PaperSizes.A4)
            };

            List<EncoderTypes> singleFileOuptut = new List<EncoderTypes>
            {
                EncoderTypes.gif, EncoderTypes.png, EncoderTypes.jpg,
                EncoderTypes.bmp, EncoderTypes.tiff, EncoderTypes.wdp
                //EncoderTypes.gif
            };
            var filesList = Directory.GetFiles(XPS_Path, "*.xps");
            foreach (string fileName in filesList)
            {
                //XPS_Outputs_Producer producer = new XPS_Outputs_Producer();
                XPS_Outputs_Producer.XPS_To_Pages(fileName, JPG_Path, false, true,
                    papers, singleFileOuptut, true,
                    EncoderTypes.tiff, papers);
            }
        }
    }
}

The Encoder Encapsulation

This is simply a very thin wrapper class around the Framework’s encoder classes. It allows the creation of a specific encoder on the basis of the EncoderTypes enumeration. My API design philosophy would have the Framework working in this manner, through a factory class. One day I may convert this into an extension method, but for now it does the job.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Windows.Media.Imaging;

namespace XpsConverter
{
    /// <summary>
    /// A very thin wrapper class over the native framework bit map
    /// encoding classes.
    /// </summary>
    public class EncoderEncapsulation
    {
        private EncoderTypes _encoderType;
        private BitmapEncoder _encoder;
        public EncoderEncapsulation(EncoderTypes requiredType)
        {
            this._encoderType = requiredType;
            this._encoder = null;
        }
        public BitmapEncoder Encoder
        {
            get
            {
                if (this._encoder != null)
                    return this._encoder;
                switch (this._encoderType)
                {
                    case EncoderTypes.png:
                        this._encoder = new PngBitmapEncoder();
                        return this._encoder;
                    case EncoderTypes.jpg:
                        this._encoder = new JpegBitmapEncoder();
                        return this._encoder;
                    case EncoderTypes.tiff:
                        this._encoder = new TiffBitmapEncoder();
                        return this._encoder;
                    case EncoderTypes.gif:
                        this._encoder = new GifBitmapEncoder();
                        return this._encoder;
                    case EncoderTypes.bmp:
                        this._encoder = new BmpBitmapEncoder();
                        return this._encoder;
                    case EncoderTypes.wdp:
                        this._encoder = new WmpBitmapEncoder();
                        return this._encoder;
                    default:
                        this._encoder = new BmpBitmapEncoder();
                        return this._encoder;
                }
            }
        }
        public EncoderTypes EncoderType
        {
            get
            {
                return this._encoderType;
            }
        }
    }
}

The File Namer Class

This is simply a file name generator class. It works with the pages in the XPS file and the tiles being generated.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace XpsConverter
{
    /// <summary>
    /// This class is just a file name generating object.
    /// There is a degree of structure in the input names,
    /// and the generated names embellish on that with extra metadata.
    /// </summary>
    public class FileNamer
    {
        private string _filename;

        private FileNamer()
        {
            this._filename = null;
        }

        public FileNamer(string FilePath, string fileNamePart,
            int pageNo, int iTotalPages, EncoderTypes FileType):this()
        {
            StringBuilder res = new StringBuilder(FilePath);
            if(FilePath[FilePath.Length -1] != '\\')
                res.Append('\\');
            int iPage = pageNo + 1;
            int iPageTotal = iTotalPages;
            res.AppendFormat("{0}_Page_{1}_of_{3}.{2}",
                fileNamePart, iPage,
                Enum.GetName(typeof(EncoderTypes), FileType), iPageTotal);
            this._filename = res.ToString();
        }

        public FileNamer(string RootOutputPath, string fileNamePart,
            int pageNum, int iTotalPages,
            int iHrozTiles, int iVertTiles, int i, int j,
            PaperSize paperSize, EncoderTypes encoderType) : this()
        {
            int iPart = ((i * (iVertTiles)) + j) + 1;
            int iTotal = (iHrozTiles) * (iVertTiles);
            int iPage = pageNum + 1;

            StringBuilder result = new StringBuilder(RootOutputPath);
            if (RootOutputPath[RootOutputPath.Length - 1] != '\\')
                result.Append('\\');
            result.AppendFormat("{0}_Page_{1}_of_{6}_Part_{2}_of_{3}_{5}.{4}",
                fileNamePart, iPage, iPart, iTotal,
                Enum.GetName(typeof(EncoderTypes), encoderType),
                Enum.GetName(typeof(PaperSizes), paperSize.PaperSizeEnum),
                iTotalPages);
            this._filename = result.ToString();
        }

        public FileNamer(string rootOutputPath,
            string fileNamePart, EncoderTypes MultiPageEncoder, PaperSize paper)
        {
            StringBuilder result = new StringBuilder(rootOutputPath);
            if (rootOutputPath[rootOutputPath.Length - 1] != '\\')
                result.Append('\\');
            result.AppendFormat("{0}_{2}.{1}", fileNamePart,
                Enum.GetName(typeof(EncoderTypes), MultiPageEncoder),
                Enum.GetName(typeof(PaperSizes), paper.PaperSizeEnum));
            this._filename = result.ToString();
        }

        public string FileName
        {
            get
            {
                return this._filename;
            }
        }
    }
}
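
To show what the tile constructor above produces, here is a small sketch that reuses its format string and part-number arithmetic. The wrapper class, method names and sample values are hypothetical; only the format string and the part calculation are taken from the listing.

```csharp
using System;

// Hypothetical wrapper for illustration; the format string and the
// part-number formula are copied from the FileNamer listing.
public static class FileNameFormatSketch
{
    // Part number for tile (i, j), counted column by column, starting at 1.
    public static int PartNumber(int i, int j, int vertTiles)
    {
        return (i * vertTiles) + j + 1;
    }

    public static string TileName(string namePart, int page, int totalPages,
        int part, int totalParts, string paper, string extension)
    {
        return string.Format("{0}_Page_{1}_of_{6}_Part_{2}_of_{3}_{5}.{4}",
            namePart, page, part, totalParts, extension, paper, totalPages);
    }

    public static void Main()
    {
        // Tile in column 1, row 0 of a grid that is 2 tiles high and 3 tiles wide.
        int part = PartNumber(1, 0, 2);
        Console.WriteLine(TileName("Diagram", 1, 2, part, 6, "A4", "png"));
        // Diagram_Page_1_of_2_Part_3_of_6_A4.png
    }
}
```

Putting the page, part and paper size in the name means the tiles for one source file sit together in a directory listing.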

Bitmap Metadata Generation

The following is the major function in my bitmap metadata generation class. It took a bit of “trial and exception” to determine which metadata elements are valid for each of the encoding types.

public BitmapMetadata MakeMetadata(EncoderTypes encoderType)
{
    string encoder = System.Enum.GetName(typeof(EncoderTypes), encoderType);
    if (encoderType == EncoderTypes.wdp || encoderType == EncoderTypes.gif || encoderType == EncoderTypes.bmp)
        return null;
    BitmapMetadata metadata = new BitmapMetadata(encoder);
    switch (encoderType)
    {
        case EncoderTypes.png:
            metadata.DateTaken = this.Taken;
            break;
        case EncoderTypes.jpg:
            metadata.ApplicationName = System.Diagnostics.Process.GetCurrentProcess().MainModule.FileName;
            metadata.Author = this.Author;
            metadata.Comment = this.Comment;
            metadata.Copyright = this.Copyright;
            metadata.DateTaken = this.Taken;
            metadata.Keywords = this.Keywords;
            metadata.Subject = this.Subject;
            metadata.Title = this.Title;
            break;
        case EncoderTypes.tiff:
            metadata.ApplicationName = System.Diagnostics.Process.GetCurrentProcess().MainModule.FileName;
            metadata.Author = this.Author;
            metadata.Comment = this.Comment;
            metadata.Copyright = this.Copyright;
            metadata.DateTaken = this.Taken;
            metadata.Keywords = this.Keywords;
            metadata.Subject = this.Subject;
            metadata.Title = this.Title;
            break;
        case EncoderTypes.gif:
            Debug.WriteLine("GIF files do not appear to support metadata. Should not get here");
            break;
        case EncoderTypes.bmp:
            Debug.WriteLine("BMP files do not appear to support metadata. Should not get here!");
            break;
        case EncoderTypes.wdp:
            Debug.WriteLine("WDP files do not appear to support metadata. Should not get here!");
            break;
        default:
            Debug.WriteLine("Should not get here");
            break;
    }
    return metadata;
}

Paper Size Encapsulation

The class, and enumeration, wrap the details of paper size into a usable class. Also, the class includes overrides of Equals and GetHashCode.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Windows;

namespace XpsConverter
{
    /// <summary>
    /// Enumeration which contains the paper types for which size information is kept
    /// </summary>
    public enum PaperSizes { A4, A3 }

    /// <summary>
    /// Encapsulation of the Paper size information.
    /// </summary>
    public class PaperSize
    {
        private Size _paperSize;
        private PaperSizes _paperSizeEnum;
        private PaperSize()
        {

        }

        public PaperSize(PaperSizes paperSize)
        {
            this._paperSizeEnum = paperSize;
            switch (paperSize)
            {
                case PaperSizes.A4:
                    this._paperSize = new Size(780, 1100);
                    break;
                case PaperSizes.A3:
                    this._paperSize = new Size(1560, 2200);
                    break;
                default:
                    this._paperSize = new Size(780, 1100);
                    break;
            }
        }
        public PaperSizes PaperSizeEnum
        {
            get
            {
                return this._paperSizeEnum;
            }
        }
        public Size CurrentSize
        {
            get
            {
                return this._paperSize;
            }
        }
        public int Height
        {
            get
            {
                return (int)this._paperSize.Height;
            }
        }
        public int Width
        {
            get
            {
                return (int) this._paperSize.Width;
            }
        }

        public override bool Equals(object obj)
        {
            if (obj == null)
                return false;
            PaperSize temp = obj as PaperSize;
            if (temp == null)
                return false;
            if (temp._paperSizeEnum == this._paperSizeEnum)
                return true;
            return false;
        }

        public override int GetHashCode()
        {
            return this._paperSizeEnum.GetHashCode();
        }
    }
}

The Pagination Process

This is where most of the work happens.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.IO;
using System.Windows.Xps.Packaging;
using System.Windows.Documents;
using System.Windows.Media.Imaging;
using System.Diagnostics;
using System.Windows;

namespace XpsConverter
{
    public static class XPS_Outputs_Producer
    {
        /// <summary>
        /// This is the main entry point in to the process of converting an
        /// XPS file into graphics files
        /// </summary>
        /// <param name="fileName">The name of the file to be processed.
        /// It should include all of the path information to find the file</param>
        /// <param name="RootOutputPath">The path to the output root</param>
        /// <param name="SkipDone">Skip files which have already been created</param>
        /// <param name="OneBigOne">Generate a graphics file which is not tiled into parts</param>
        /// <param name="papers">The list of paper sizes to tile the XPS file into</param>
        /// <param name="singleFileOuptut">The list of encodings which the XPS file
        /// is to be converted into</param>
        /// <param name="multiplePageOutput">The flag used to indicate that one file
        /// with multiple pages (frames in graphics file terms) is to be created</param>
        /// <param name="MultiPageEncoder">The encoding which is to be used
        /// for the multiple page output</param>
        /// <param name="multiplePapers">The list of papers which the XPS is to be tiled into.</param>
        internal static void XPS_To_Pages(string fileName, string RootOutputPath,
            bool SkipDone ,
            bool OneBigOne,
            List<PaperSize> papers,
            List<EncoderTypes> singleFileOuptut,
            bool multiplePageOutput ,
            EncoderTypes MultiPageEncoder,
            List<PaperSize> multiplePapers)
        {
            string fileNamePart = Path.GetFileNameWithoutExtension(fileName);
            FileNameDecoder decoder = new FileNameDecoder(fileNamePart);
            string DirectoryPart = decoder.MakeDirectoryPartsFromName(RootOutputPath);
            RootOutputPath = DirectoryPart;
            FileNameDecoder.copyXPSFile(fileName, RootOutputPath, fileNamePart);
            if (SkipDone &&
                AlreadyDone(new FileNamer(RootOutputPath, fileNamePart, 0, 0, singleFileOuptut[0])))
                return;

            MultiplePageFilePaperAndEncoder multiPageOutput = null;
            if(multiplePageOutput)
            {
                multiPageOutput = new MultiplePageFilePaperAndEncoder();
                foreach(PaperSize paper in multiplePapers)
                {
                    multiPageOutput.Add(new EncoderEncapsulation(MultiPageEncoder),  paper,
                        new FileNamer(RootOutputPath, fileNamePart, MultiPageEncoder, paper));
                }
            }
            XpsDocument xpsDoc = null;
            try
            {
                xpsDoc = ProcessXPSDocument(fileName, RootOutputPath, OneBigOne, papers,
                    singleFileOuptut, multiplePageOutput, fileNamePart, decoder,
                    multiPageOutput, xpsDoc);
                xpsDoc = null;
            }
            finally
            {
                if (xpsDoc != null)
                    xpsDoc.Close();
            }
        }

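        /// <summary>
        /// This method processes the XPS document and extracts the pages of that document.
        /// </summary>
        /// <param name="fileName"></param>
        /// <param name="RootOutputPath"></param>
        /// <param name="OneBigOne"></param>
        /// <param name="papers"></param>
        /// <param name="singleFileOuptut"></param>
        /// <param name="multiplePageOutput"></param>
        /// <param name="fileNamePart"></param>
        /// <param name="decoder"></param>
        /// <param name="multiPageOutput"></param>
        /// <param name="xpsDoc"></param>
        /// <returns></returns>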
        private static XpsDocument ProcessXPSDocument(string fileName, string RootOutputPath,
            bool OneBigOne, List<PaperSize> papers, List<EncoderTypes> singleFileOuptut,
            bool multiplePageOutput, string fileNamePart, FileNameDecoder decoder,
            MultiplePageFilePaperAndEncoder multiPageOutput, XpsDocument xpsDoc)
        {
            using (xpsDoc = new XpsDocument(fileName, System.IO.FileAccess.Read))
            {
                FixedDocumentSequence docSeq = xpsDoc.GetFixedDocumentSequence();
                int iTotalPages = docSeq.DocumentPaginator.PageCount;
                for (int pageNum = 0; pageNum < iTotalPages; ++pageNum)
                {
                    DocumentPage docPage = null;
                    try
                    {
                        docPage = ProcessBitMapImage(RootOutputPath, OneBigOne,
                            papers, singleFileOuptut, fileNamePart, decoder,
                            multiPageOutput, docSeq, iTotalPages, pageNum, docPage);
                        docPage = null;
                    }
                    finally
                    {
                        if (docPage != null)
                            docPage.Dispose();
                    }
                }
                if (multiplePageOutput)
                {
                    GenerateMultpliePageOutputs(multiPageOutput);
                }
            }
            return xpsDoc;
        }

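        /// <summary>
        /// This method renders the page into a BitMap and then calls the required
        /// pagination routines.
        /// </summary>
        /// <param name="RootOutputPath"></param>
        /// <param name="OneBigOne"></param>
        /// <param name="papers"></param>
        /// <param name="singleFileOuptut"></param>
        /// <param name="fileNamePart"></param>
        /// <param name="decoder"></param>
        /// <param name="multiPageOutput"></param>
        /// <param name="docSeq"></param>
        /// <param name="iTotalPages"></param>
        /// <param name="pageNum"></param>
        /// <param name="docPage"></param>
        /// <returns></returns>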
        private static DocumentPage ProcessBitMapImage(string RootOutputPath, bool OneBigOne,
            List<PaperSize> papers, List<EncoderTypes> singleFileOuptut, string fileNamePart,
            FileNameDecoder decoder, MultiplePageFilePaperAndEncoder multiPageOutput,
            FixedDocumentSequence docSeq, int iTotalPages, int pageNum, DocumentPage docPage)
        {
            using (docPage = docSeq.DocumentPaginator.GetPage(pageNum))
            {
                RenderTargetBitmap renderTarget =
                    new RenderTargetBitmap((int)docPage.Size.Width,
                                            (int)docPage.Size.Height,
                                            96, // WPF (Avalon) units are 96dpi based
                                            96,
                                            System.Windows.Media.PixelFormats.Default);
                renderTarget.Render(docPage.Visual);
                if (OneBigOne)
                {
                    MakeOutputFile(RootOutputPath, fileNamePart, pageNum, iTotalPages, singleFileOuptut, renderTarget, decoder);
                }
                MakeOutputFiles(RootOutputPath, fileNamePart, pageNum, iTotalPages, singleFileOuptut, papers, renderTarget, decoder, multiPageOutput);
            }
            return docPage;
        }

        /// <summary>
        /// Produces the output files for each of the multiple page outputs
        /// </summary>
        /// <param name="MultiplePageEncoders"></param>
        private static void GenerateMultpliePageOutputs(
            MultiplePageFilePaperAndEncoder MultiplePageEncoders)
        {
            foreach(Tuple<EncoderEncapsulation, FileNamer> output in MultiplePageEncoders.GetOutputDetails())
            {
                ProcudeOutputFile(output);
            }
        }

        /// <summary>
        /// Does the writing of the encoded bitmap to output file.
        /// </summary>
        /// <param name="output"></param>
        private static void ProcudeOutputFile(Tuple<EncoderEncapsulation, FileNamer> output)
        {
            FileStream pageOutStream = null;
            using (pageOutStream = new FileStream(output.Item2.FileName, FileMode.Create, FileAccess.Write))
            {
                try
                {
                    output.Item1.Encoder.Save(pageOutStream);
                    pageOutStream.Close();
                    pageOutStream = null;
                }
                catch (AccessViolationException ex)
                {
                    Debug.WriteLine(ex);
                    Debug.WriteLine(
                        "{0} File {1} is broken trying next", DateTime.Now.ToLongTimeString(), output.Item2.FileName);
                    pageOutStream.Close();
                    File.Delete(output.Item2.FileName);
                }
                finally
                {
                    if (pageOutStream != null)
                    {
                        pageOutStream.Dispose();
                    }
                }
            }
        }

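        /// <summary>
        /// Crops the Image to each of the Paper sized tiles, and then produces the output file.
        /// </summary>
        /// <param name="RootOutputPath"></param>
        /// <param name="fileNamePart"></param>
        /// <param name="pageNum"></param>
        /// <param name="iTotalPages"></param>
        /// <param name="singleFileOuptut"></param>
        /// <param name="papers"></param>
        /// <param name="renderTarget"></param>
        /// <param name="decoder"></param>
        /// <param name="MultiEncoders"></param>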
        private static void MakeOutputFiles(string RootOutputPath,
            string fileNamePart,
            int pageNum, int iTotalPages,
            List<EncoderTypes> singleFileOuptut,
            List<PaperSize> papers,
            RenderTargetBitmap renderTarget,
            FileNameDecoder decoder,
            MultiplePageFilePaperAndEncoder MultiEncoders)
        {
            foreach (PaperSize PaperSize in papers)
            {
                int iHorzTiles;
                int iVertTiles;
                Size Paper = CalculateTiles(renderTarget, PaperSize.CurrentSize, out iHorzTiles, out iVertTiles);
                for (int i = 0; i < iHorzTiles; i++)
                {
                    for (int j = 0; j < iVertTiles; j++)
                    {
                        Int32Rect crop = CalculateCropRectangle(renderTarget, Paper, i, j);
                        if (crop.X == renderTarget.Width || crop.Y == renderTarget.Height)
                            continue;
                        CroppedBitmap croppedBitmap = new CroppedBitmap(renderTarget, crop);
                        AddToMultiPageEncoders(MultiEncoders, croppedBitmap, decoder, iHorzTiles, iVertTiles, i, j, PaperSize, pageNum, iTotalPages);
                        foreach (EncoderTypes encoderType in singleFileOuptut)
                        {
                            FileNamer namer = new FileNamer(
                                RootOutputPath, fileNamePart, pageNum, iTotalPages,
                                iHorzTiles, iVertTiles, i, j,
                                PaperSize, encoderType);
                            EncoderEncapsulation encoder = new EncoderEncapsulation(encoderType);
                            BitmapEncoder enc = encoder.Encoder;
                            enc.Frames.Add(BitmapFrame.Create(croppedBitmap, null,
                                decoder.MakeMetadata(encoderType, pageNum,  iTotalPages, i,j, iHorzTiles, iVertTiles), null));
                            ProduceOutputFile(namer, enc);
                        }
                    }
                }
            }
        }

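        /// <summary>
        /// Adds the cropped bitmap to the correct paper size's multiple page output encoder
        /// </summary>
        /// <param name="MultiplePageEncoders"></param>
        /// <param name="croppedBitmap"></param>
        /// <param name="Decoder"></param>
        /// <param name="iHorzTiles"></param>
        /// <param name="iVertTiles"></param>
        /// <param name="i"></param>
        /// <param name="j"></param>
        /// <param name="paper"></param>
        /// <param name="iPage"></param>
        /// <param name="iPages"></param>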
        private static void AddToMultiPageEncoders(
            MultiplePageFilePaperAndEncoder MultiplePageEncoders,
            CroppedBitmap croppedBitmap,
            FileNameDecoder Decoder,
            int iHorzTiles, int iVertTiles,
            int i, int j, PaperSize paper,
            int iPage, int iPages)
        {
            if (MultiplePageEncoders == null)
                return;
            BitmapEncoder encoder = MultiplePageEncoders.getEncoder(paper);
            encoder.Frames.Add(BitmapFrame.Create(croppedBitmap, null,
                Decoder.MakeMetadata(MultiplePageEncoders.getEncoderType(paper), iPage, iPages, i, j, iHorzTiles, iVertTiles),
                null));
        }

        private static Int32Rect CalculateCropRectangle(RenderTargetBitmap renderTarget, Size Paper, int i, int j)
        {
            Int32Rect crop = new Int32Rect((int)Math.Min(i * Paper.Width, renderTarget.Width),
                  (int)Math.Min(j * Paper.Height, renderTarget.Height),
                  (int)Math.Min(Paper.Width, renderTarget.Width - ((i) * Paper.Width)),
                  (int)Math.Min(Paper.Height, renderTarget.Height - (j * Paper.Height)));
            return crop;
        }

        /// <summary>
        /// This routine tries to optimise the way the tiles are carved off the base image.
        /// It is more efficient (in terms of number of tiles required) to carve the image
        /// up with the tile's longest edge matching the image's longest edge.
        /// (An unproven assertion - the math is probably quite long winded to prove).
        /// </summary>
        /// <param name="renderTarget">The bitmap which will have tiles produced</param>
        /// <param name="Paper">The size of the tile to create</param>
        /// <param name="iHrozTiles">The number of horizontal tiles</param>
        /// <param name="iVertTiles">The number of vertical tiles</param>
        /// <returns>The tile size actually used (rotated for landscape images)</returns>
        private static Size CalculateTiles(RenderTargetBitmap renderTarget, Size Paper, out int iHrozTiles, out int iVertTiles)
        {
            // Portrait image: use the paper as supplied (the paper sizes are defined in portrait).
            Size tile = Paper;
            if (renderTarget.Height <= renderTarget.Width)
            {
                // Landscape (or square) image: rotate the paper so the long edges align.
                tile = new Size(Paper.Height, Paper.Width);
            }
            iHrozTiles = ((int)(renderTarget.Width / tile.Width)) + 1;
            iVertTiles = ((int)(renderTarget.Height / tile.Height)) + 1;
            return tile;
        }

        /// <summary>
        /// Encodes the full size (same as the original) bit map to required encoding.
        /// </summary>
        /// <param name="RootOutputPath"></param>
        /// <param name="fileNamePart"></param>
        /// <param name="pageNum"></param>
        /// <param name="iTotalPages"></param>
        /// <param name="singleFileOuptut"></param>
        /// <param name="renderTarget"></param>
        /// <param name="decoder"></param>
        private static void MakeOutputFile(string RootOutputPath, string fileNamePart, int pageNum, int iTotalPages,
            List<EncoderTypes> singleFileOuptut,
            RenderTargetBitmap renderTarget,
            FileNameDecoder decoder)
        {
            foreach (EncoderTypes encoding in singleFileOuptut)
            {
                FileNamer namer = new FileNamer(RootOutputPath, fileNamePart, pageNum, iTotalPages, encoding);
                try
                {
                    EncoderEncapsulation encoderType = new EncoderEncapsulation(encoding);
                    BitmapEncoder encoder = encoderType.Encoder;
                    //BitmapFrame framed = BitmapFrame.Create(renderTarget);
                    encoder.Frames.Add(BitmapFrame.Create(renderTarget, null,
                        decoder.MakeMetadata(encoding), null));
                    ProduceOutputFile(namer, encoder);

                }
                catch (AccessViolationException ex)
                {
                    Debug.WriteLine(ex);
                }
            }
        }

        /// <summary>
        /// Does the writing of the encoded bitmap to the output file.
        /// </summary>
        /// <param name="namer"></param>
        /// <param name="encoder"></param>
        private static void ProduceOutputFile(FileNamer namer, BitmapEncoder encoder)
        {
            FileStream pageOutStream = null;
            try
            {
                using (pageOutStream = new FileStream(namer.FileName, FileMode.Create, FileAccess.Write))
                {
                    encoder.Save(pageOutStream);
                    pageOutStream.Close();
                    pageOutStream = null;
                }
            }
            finally
            {
                if (pageOutStream != null)
                    pageOutStream.Dispose();
            }
        }

        /// <summary>
        /// Checks to see if the file exists
        /// </summary>
        /// <param name="namer"></param>
        /// <returns></returns>
        private static bool AlreadyDone(FileNamer namer)
        {
            return File.Exists(namer.FileName);
        }

    }
}

LINQ Performance Tuning: Using the LookUp Class


Introduction

In this blog post I hope to share with you the benefits of using the Lookup Class, and the IEnumerable.ToLookup method for creating the Lookup object.

The benefit of using this .Net Framework object can be a significant reduction in the time taken to execute LINQ statements. By significant, I mean that I reduced the elapsed execution time of a program which makes heavy use of LINQ to Objects and LINQ to XML from 15+ minutes to seconds.
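
As a rough illustration of where that kind of saving can come from, the sketch below computes the same totals twice: once with a Where scan per key, and once through a Lookup built up front. The data shape (customer id and amount pairs) and all the names are assumptions made for the example, not code from the program mentioned above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical example; names and data shape are assumptions for illustration.
public static class LookupTotalsSketch
{
    // Slow form: every customer id triggers a full scan of the orders list.
    public static Dictionary<int, int> TotalsWithWhere(
        IEnumerable<int> customerIds, List<Tuple<int, int>> orders)
    {
        return customerIds.ToDictionary(
            id => id,
            id => orders.Where(o => o.Item1 == id).Sum(o => o.Item2));
    }

    // Faster form: the orders are indexed once, then each customer is a keyed fetch.
    public static Dictionary<int, int> TotalsWithLookup(
        IEnumerable<int> customerIds, List<Tuple<int, int>> orders)
    {
        ILookup<int, Tuple<int, int>> ordersByCustomer = orders.ToLookup(o => o.Item1);
        return customerIds.ToDictionary(
            id => id,
            id => ordersByCustomer[id].Sum(o => o.Item2));
    }

    public static void Main()
    {
        var orders = new List<Tuple<int, int>>
        {
            Tuple.Create(1, 10), Tuple.Create(2, 5), Tuple.Create(1, 7)
        };
        Dictionary<int, int> totals = TotalsWithLookup(new[] { 1, 2, 3 }, orders);
        foreach (KeyValuePair<int, int> pair in totals)
            Console.WriteLine("Customer {0}: {1}", pair.Key, pair.Value);
    }
}
```

With a handful of rows the two forms are indistinguishable. The difference shows when both collections are large, because the Where form rescans every order for every customer.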

If inspiration grabs me I may include some benchmarking tests in this post, or in a subsequent post. I’m always a bit cautious of creating artificial test cases for things like this. Artificial test cases can be very misleading, as they tend to show the technique in its best light. The real test of any technique is not in a “test tube”, but in how it helps resolve real programming and (in this case) performance problems.

The Lookup Class – Overview

This is a very interesting animal in the .Net Framework. It has the following interesting features:

  • It is only created through a factory method: the ToLookup extension method on IEnumerable<T>, which is part of LINQ.
  • There is no public constructor for the class.
  • The Lookup Class sits somewhere between the Dictionary and the List, in that it possesses a mixture of the properties of both classes. These properties include:
    • Like a Dictionary, it supports keyed access to the data. To use an analogy with SQL, the Lookup is like an indexed table.
    • Like a List, it supports storing as many objects of a “type” as will fit (available memory, or the CPU architecture which the process is running under, being the limiting factor).
    • Unlike either List<T> or Dictionary<TKey, TValue>, the one key can have multiple objects stored under it. So, it is very like a Dictionary<T1, List<T2>>. The implementation uses an IGrouping<T1, T2>, but the analogy holds true.
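The features listed above can be seen in a few lines. This is a minimal sketch rather than anything from my real program: the word list and the name LookupOverviewSketch are made up for the illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LookupOverviewSketch
{
    // Hypothetical sample data: words keyed by their first letter.
    public static ILookup<char, string> BuildLookup()
    {
        var words = new List<string> { "apple", "avocado", "banana", "cherry", "cranberry" };
        // ToLookup is the factory method; there is no public constructor to call.
        return words.ToLookup(w => w[0]);
    }

    public static void Main()
    {
        ILookup<char, string> byInitial = BuildLookup();

        // Keyed access like a Dictionary, but one key yields a sequence of values.
        foreach (string word in byInitial['a'])
            Console.WriteLine(word);

        // A key which is not present yields an empty sequence, not an exception.
        Console.WriteLine(byInitial['z'].Any());

        // Count is the number of distinct keys, not the number of stored objects.
        Console.WriteLine(byInitial.Count);
    }
}
```

The empty sequence for a missing key is the behaviour which differs most from a Dictionary, where the indexer throws for a key that is not present.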

The Lookup Class – The Details

Internal Storage

The Lookup class is an interesting storage mechanism. The key features are:

  • The storage is by the key, which is the important aspect being leveraged when improving the performance of LINQ operations.
  • Unlike the Dictionary Class, the Lookup Class stores more than one object against the key.
  • Like the List Class, no sort order is imposed on the set of objects stored against the key.
  • The diagram is one way to visualise the way the Lookup Class stores the data. The important points to note are:
    • The Keys must be of the same type.
    • The Objects stored must be of the same type.
    • The objects within a key are not sorted. This is very much the same as SQL tables, where the order of the rows is not guaranteed unless you use an Order By clause (which imposes an order).
    • There can be any number of objects stored under each key.

Creating An Instance of the Lookup Class

There is not too much in the creation of a Lookup Object, apart from the caveats which are:

  • It can only be created as the output of a LINQ operation.
  • It is an immutable object structure. The Lookup object is effectively “read only”; there are no ways to mutate the content (add or remove elements, or change the key set).

The following example C# code shows two of the ways of creating a Lookup Object.

// Needs: using System.Collections.Generic; using System.Linq; using System.Xml.Linq;
List<XElement> seed = new List<XElement>();
var example1 = (from element in seed
                select element)
               .ToLookup(A => A.Name);
var example2 = (from element in seed
                select new { key = element.Attribute("Test1").Value, value = element })
                    .Union(from element in seed
                           select new { key = element.Attribute("Test2").Value, value = element })
                    .ToLookup(A => A.key, A => A.value);

Example1 will result in a lookup indexed by the XName, where each key holds an IGrouping of XElement.

Example2 will result in a lookup indexed by the string values from the attributes “Test1” and “Test2”, with the same XElement potentially falling under different key values.

There are variants of the ToLookup method which take an IEqualityComparer for the type of the Key. These, I must admit, I’ve not needed to use. My use cases for the Lookup Object have not included object types in the Key other than string, and the framework provided equality comparer for that has suited me fine (thus far).
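For completeness, here is a minimal sketch of one of the comparer variants. It is not from my program; the sample pairs and the name LookupComparerSketch are invented, and StringComparer.OrdinalIgnoreCase is just one comparer the framework happens to provide.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LookupComparerSketch
{
    // Hypothetical data: keys which differ only by case.
    public static ILookup<string, int> BuildCaseInsensitive()
    {
        var pairs = new[]
        {
            new KeyValuePair<string, int>("Alpha", 1),
            new KeyValuePair<string, int>("ALPHA", 2),
            new KeyValuePair<string, int>("beta", 3)
        };
        // Key selector, element selector, then the comparer used to match keys.
        return pairs.ToLookup(p => p.Key, p => p.Value, StringComparer.OrdinalIgnoreCase);
    }

    public static void Main()
    {
        ILookup<string, int> lookup = BuildCaseInsensitive();
        // "Alpha" and "ALPHA" are treated as the same key under this comparer.
        Console.WriteLine(lookup.Count);
        foreach (int value in lookup["alpha"])
            Console.WriteLine(value);
    }
}
```

With the default comparer the same input would produce three keys, so the comparer choice changes what counts as “the same key” for every later access.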

Limitations Of The Lookup Class

There are a number of limitations which the Lookup Class comes with. Many of these limitations have been mentioned in this blog post already, but I’ll put a list of them in here (just for completeness). The limitations include (and these are the ones I’ve found, or found written about):

  • No public constructor. The only way to create a Lookup Object is to use the ToLookup factory method on the IEnumerable interface. Or, if you prefer, the Lookup Object can only be created as the output from LINQ operations.
  • There are no mutators for the Lookup Object. As an output from LINQ, this output sequence is effectively read only. There are no Add, or Remove, methods available for the Lookup object. If you need a different set of content, you will need to generate that set through LINQ, and create another Lookup Object.
  • The IGrouping structure of the storage which the Lookup Class uses is a bit of a thing to deal with. There are a couple of ways to unravel this structure (I’ve found two thus far, but that’s not to say this is an exhaustive list):
      List<XElement> seed = new List<XElement>();
      var example1 = (from element in seed
                      select element)
                     .ToLookup(A => A.Name);
      // One way to unwrap the IGrouping
      var unwrap1 = example1.SelectMany(A=>A);
      // Another way to unwrap the IGrouping
      var unwrap2 = from step1 in example1
                    from unwrapped in step1
                    select unwrapped;
    • Unwrap1 uses the SelectMany method (see: LINQ SelectMany and IGrouping for some more on the use of SelectMany). The “A=>A” is required for the method, and simply says flatten the multiple input sequences (one for each key value) into one output sequence.
    • Unwrap2 uses a nested from clause in the LINQ syntax. If you’ve done anything with LINQ to XML you will be familiar with using intermediate sequences in LINQ syntax.
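Because there are no Add or Remove methods, “changing” a lookup really means unwrapping it and building a new one. The following is a minimal sketch of that idea under my own assumptions: the helper WithExtra and the sample words are hypothetical, and keying by string length is only there to keep the example short.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LookupRebuildSketch
{
    // Hypothetical helper: returns a NEW lookup holding the original content plus the extras.
    public static ILookup<int, string> WithExtra(ILookup<int, string> original, IEnumerable<string> extra)
    {
        return original
            .SelectMany(group => group)   // unwrap the IGrouping structure
            .Concat(extra)                // append the additional elements
            .ToLookup(s => s.Length);     // build the replacement lookup
    }

    public static void Main()
    {
        ILookup<int, string> original = new[] { "one", "two", "three" }.ToLookup(s => s.Length);
        ILookup<int, string> extended = WithExtra(original, new[] { "four", "six" });

        // The original is untouched; the new lookup has the extra key and values.
        Console.WriteLine(original.Count);
        Console.WriteLine(extended.Count);
        foreach (string s in extended[3])
            Console.WriteLine(s);
    }
}
```

This costs a full pass over the content each time, so it suits the “build once, read many times” pattern rather than frequent updates.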

Using An Instance Of The Lookup Class In LINQ

This is where the “rubber hits the road”, or more to the point where big performance improvements can be made in the execution of LINQ statements.  As with anything performance related please test these techniques in the context of your application. The following is what I have observed in the context of my application. I’ve been using equal measures of the Stopwatch Class and the Visual Studio Profiler to measure the impacts of my application of the Lookup Class.

The following is the fastest way in LINQ to work with the Lookup object (the best way I’ve found thus far).

List<XElement> seed = new List<XElement>();
var example1 = (from element in seed
                select element)
               .ToLookup(A => A.Name);
var dict1 = new Dictionary<XName, XElement>();
var join_sample = from lut in example1
                  join dict in dict1 on lut.Key equals dict.Key
                  select lut;

The LINQ join clause seems (this is by observation of the impact in the Visual Studio Profiler) to resolve through the keys of the Dictionary and Lookup objects, rather than by repeatedly scanning the sequences. The analogy is with the way a SQL execution plan will resolve a join through the indexes of the two tables which are being joined.
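To illustrate where the time goes, the following minimal sketch contrasts scanning a list once per key with building a lookup once and using keyed access. It is an illustration only, not a benchmark and not code from my program; the Item class and both method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LookupJoinSketch
{
    // Hypothetical element type standing in for the real data.
    public sealed class Item
    {
        public string Name;
        public int Value;
    }

    // Slow shape: every wanted name causes a full scan of the list.
    public static int SumByScan(List<Item> items, IEnumerable<string> wanted)
    {
        int total = 0;
        foreach (string name in wanted)
            foreach (Item item in items.Where(i => i.Name == name))
                total += item.Value;
        return total;
    }

    // Faster shape: one pass to build the lookup, then keyed access per name.
    public static int SumByLookup(List<Item> items, IEnumerable<string> wanted)
    {
        ILookup<string, Item> byName = items.ToLookup(i => i.Name);
        int total = 0;
        foreach (string name in wanted)
            foreach (Item item in byName[name])
                total += item.Value;
        return total;
    }

    public static void Main()
    {
        var items = new List<Item>
        {
            new Item { Name = "a", Value = 1 },
            new Item { Name = "b", Value = 2 },
            new Item { Name = "a", Value = 3 }
        };
        var wanted = new[] { "a", "c" };
        // Both shapes give the same answer; only the amount of work differs.
        Console.WriteLine(SumByScan(items, wanted));
        Console.WriteLine(SumByLookup(items, wanted));
    }
}
```

As with anything performance related, the gain depends on how many times the keyed access is repeated, so measure it in your own application.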

Conclusion

The performance boost that LINQ operations can gain through the judicious application of the Lookup Object, makes learning to master the use of the object well worthwhile (if you want fast code).

The DGML generating and pagination process I’m currently building has benefitted significantly from the judicious use of these objects. I’ve taken the process of generating multiple files of DGML from minutes (in the 10 to 15 minutes mark) to seconds (20 to 30 seconds).


LINQ to SQL Use static CompiledQuery.Compile and static DataContext for BIG Performance Improvements


Introduction

This blog post describes some of the steps I took to boost the performance of a LINQ to SQL application (very significantly).

This “train of thought” was prompted by a blog post here on WordPress, “LINQ To SQL Very Slow Performance Without Compile (CompileQuery)”. That “got me thinking” about the data manipulation program I’ve been developing, which transforms unstructured text (following a sort of pattern) into structured data, which I’m rendering using DGML (see: “Introduction to Directed Graph Markup Language ( DGML )” for a brief introduction to DGML). I will probably post some of the details for generating DGML graphs on this blog (later). This program uses LINQ to SQL to do a bunch of lookups at various points. It had become a bit of a slow process, so it was a perfect candidate for a “bit of a tune up” performance wise, and for the application of some CompiledQuery.Compile enhancements.

The Implementation

There were a couple of steps in the implementation:

  • Move the declaration, and initialisation of the LINQ to SQL DataContext object (which is derived from the System.Data.Linq.DataContext Class ) from the method to a private static field on the class.
    private class DataElementsProcessor
    {
        private static DataElementsDataContext DEctx = new DataElementsDataContext();
  • Next create the declaration of the compiled query. For those reading this who are not familiar with parameterised classes and lambda expressions, I’ll try and explain what is happening in the declaration following the code snippets.
    • What is being replaced (the original LINQ to SQL query statement):
      var found = from de in DEctx.Data_Elements
                  where de.DE_Name == name
                  select de;
    • The replacement declaration:
      private static Func<DataElementsDataContext, string, IEnumerable>
          GetDeByName = CompiledQuery.Compile((DataElementsDataContext DEctx, string name) =>
                          from de in DEctx.Data_Elements
                          where de.DE_Name == name
                          select de);
    • To explain the declaration:
      • private static: private because only this class should use this, static because I only want one of these available to all instances of the class.
      • Func<DataElementsDataContext, string, IEnumerable>: This declares the type of the compiled LINQ to SQL expression. It encapsulates a delegate or lambda expression which takes the first two types as parameters and returns the sequence of DataElement objects.
      • GetDeByName: is the name of the delegate being created.
      • CompiledQuery.Compile: the method which compiles the LINQ to SQL query (which takes the types identified in the Func< part of the declaration) and returns the compiled delegate.
      • (DataElementsDataContext DEctx, string name) => : Identifies the types and names of the arguments to the lambda expression.
      • from .. : the LINQ to SQL query which will be compiled.
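If the Func<> part of the declaration is the sticking point, it may help to see the same shape without LINQ to SQL involved at all. The following is a minimal sketch using LINQ to Objects; it does not use CompiledQuery.Compile, and the names FuncDeclarationSketch and GetByName are made up for the illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FuncDeclarationSketch
{
    // Func<T1, T2, TResult>: the first two types are the parameters,
    // the last type is what the delegate returns.
    public static readonly Func<IEnumerable<string>, string, IEnumerable<string>>
        GetByName = (source, name) =>
            from s in source
            where s == name
            select s;

    public static void Main()
    {
        var names = new[] { "alpha", "beta", "alpha" };
        // The delegate is called exactly like a method.
        foreach (string match in GetByName(names, "alpha"))
            Console.WriteLine(match);
    }
}
```

The compiled query declaration has the same three parts: a static field, a Func<> type describing the parameters and the result, and a lambda whose body is the query.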

Calling the Compiled Version

This is where it gets “tricky” (I’ll cover the “tricky” bits after the simple cases). The simple use cases are straight replacements:

  • Replace the query.
    //var found = from de in DEctx.Data_Elements
    //            where de.DE_Name == name
    //            select de;
    var found = GetDeByName(DEctx, name);
  • Replace a bit more:
    • Which I think “works” a bit better. There is now no need for the variable “found” in the segment of code, and just testing the Any() on the sequence is all I wanted to do anyway.
    //var found = GetDeByName(DEctx, name);
    // if(!found.Any())
    if (!GetDeByName(DEctx, name).Any())
    {
    
    }
  • The “tricky” bits:
    • The sequence you get at runtime is an “iterate once only” sequence. So, code which enumerates it a second time will throw a “System.InvalidOperationException” in System.Data.Linq.dll, with the message “The query results cannot be enumerated more than once.” In my code, the situations where I did things which caused this type of exception were like the following:
      if (!found.Any())
      {
      }
      else
      {
          Debug.WriteLine("Rejected update already existed");
          // need to add this when I added changed to a compiled qu
          //found = GetDeByName(DEctx, name);
          found.ToList().ForEach(A => Debug.WriteLine(String.Format("{0} {1}", A.DE_Name, A.DE_Classification)));
      }
    • This would break with the exception. The “remedy” I’ve applied has been to use a “ToList()” to pull the results into a collection in one iteration, which gives the flexibility to iterate over the collection as much as I like. This approach costs a bit of memory, but these have been single row result sets, so there is not too much of a penalty.
      • There are probably other, more elegant, ways to “solve” this problem. But, as part of a quick upgrade to get a BIG performance improvement, they would have been more expensive.
      // Need to pull the result set into a local list
      var found = GetDeByName(DEctx, name).ToList();
      if (!found.Any())
      {
      }
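The “iterate once only” behaviour can be imitated without a database. The following is a minimal sketch, assuming a made-up OneShotSequence class which stands in for the compiled query result; it is not how LINQ to SQL is implemented, it only reproduces the symptom and the ToList() remedy.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-in for a result which may only be enumerated once.
public sealed class OneShotSequence<T> : IEnumerable<T>
{
    private readonly IEnumerable<T> _inner;
    private bool _consumed;

    public OneShotSequence(IEnumerable<T> inner) { _inner = inner; }

    public IEnumerator<T> GetEnumerator()
    {
        if (_consumed)
            throw new InvalidOperationException("The query results cannot be enumerated more than once.");
        _consumed = true;
        return _inner.GetEnumerator();
    }

    IEnumerator IEnumerable.GetEnumerator() { return GetEnumerator(); }
}

public static class EnumerateOnceSketch
{
    // Any() is the first enumeration, ToList() is the second, so this throws.
    public static bool SecondPassThrows()
    {
        var found = new OneShotSequence<string>(new[] { "row" });
        if (!found.Any())
            return false;
        try
        {
            found.ToList();
            return false;
        }
        catch (InvalidOperationException)
        {
            return true;
        }
    }

    // The remedy: ToList() first, then test and iterate the list freely.
    public static int SafeCount()
    {
        List<string> found = new OneShotSequence<string>(new[] { "row" }).ToList();
        return found.Any() ? found.Count : 0;
    }

    public static void Main()
    {
        Console.WriteLine(SecondPassThrows());
        Console.WriteLine(SafeCount());
    }
}
```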

The Results

For the program on which I was working, the following table shows the before and after results.

                                               Run 1    Run 2    Run 3    Run 4
Base Line No Compiles                         114451   115143
All Queries Compile and Static Data Contexts   56906    58120    57177    56515
Percentage Improvement (start – end)/start  50.19077
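The percentage figure can be reproduced from the run times if each row is averaged first and the (start – end)/start formula is then applied to the two averages. The following is a minimal sketch of that arithmetic; the averaging step is my reading of how the figure falls out of the numbers, and the name ImprovementSketch is made up.

```csharp
using System;
using System.Globalization;
using System.Linq;

public static class ImprovementSketch
{
    // (start - end) / start, as a percentage, using the average of each set of runs.
    public static double PercentageImprovement(double[] before, double[] after)
    {
        double start = before.Average();
        double end = after.Average();
        return (start - end) / start * 100.0;
    }

    public static void Main()
    {
        // ElapsedMilliseconds values taken from the results table.
        double[] baseLine = { 114451, 115143 };
        double[] compiled = { 56906, 58120, 57177, 56515 };

        double improvement = PercentageImprovement(baseLine, compiled);
        Console.WriteLine(improvement.ToString("F5", CultureInfo.InvariantCulture));
    }
}
```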

This is a significant improvement in performance for what amounted to “not too much work”. The hard part was getting “my head around” the Func<> template type declaration. The numbers are from the Stopwatch Class and are ElapsedMilliseconds. The Stopwatch object is, in my experience, the best way to get elapsed times for executing code in .Net.

Conclusion:

  • A vote of thanks to the author of “Er. alokpandey’s Blog” for writing the article “LINQ To SQL Very Slow Performance Without Compile (CompileQuery)”, without which I would not have been prompted to implement this way of improving the performance of the program (and many more in the future, I expect).
  • The conversion is not painless if you do not understand what the Func<> is doing. Once you see what that piece of syntax is doing, then things start to fall into place.
  • I did not expect as big a gain in performance from this change as I achieved. But then, I’d not probed the code to discover where the time was being expended.
  • Apologies if you do not like the code snippet approach this post has taken. Most of the time I try to post code segments which can be used without too much modification. This time the topic is really about statements (declarations of things in the code and the syntax), rather than being about “a way of doing something” (which a method can illustrate). So, I’ve used snippets of code to demonstrate the syntax used.
  • Again I yearn for the flexibility which Windows Live Spaces gave, where I could put a file of code up as well as a blog post. That is something which WordPress does not offer, so code snippets, as above, are the solution which I’ve adopted.
