Archive for November, 2010

Convert an XPS to JPEG, PNG, TIFF or BMP in A4 Pages


Introduction

This tale begins with DGML graphs in Visual Studio. If you want to share a DGML graph with others you have only a couple of alternatives:

  1. Save the graph as an XPS file (Microsoft XML Paper Specification: see Wikipedia on Open XML Paper Specification or the Microsoft XML Paper Specification), or
  2. Share the DGML graph itself (XML content files: see How to: Edit and Customize Graph Documents or the XML Schema for DGML).

Both of these options have pitfalls waiting for the unwary.

  1. For the XPS path your pitfalls include:
    • A dependence on the XPS Viewer (see: What is the XPS Viewer?). That is fine if you have the viewer installed. But if you don’t have the XPS Viewer installed, or don’t have the privileges to install software on the machine, you’re snookered. XPS is potentially a file format which your clients cannot read.
    • If the graph is larger than the paper your printer accepts, you probably cannot print the diagram. You are now dependent on the printer having a “poster print” feature (which tiles the large output image onto multiple pages). This again may not be an easily resolved problem.
  2. For the DGML file path your pitfalls include:
    • A dependence on Visual Studio. If you don’t have a version of Visual Studio, again you’re snookered. I’ve not looked to see if the Visual Studio Express versions (which are free; check the Microsoft licensing terms and conditions to see if you can use this path) support DGML viewing. Again you’re relying on the clients being able to download and install a copy of Visual Studio, which may not be an option in many work environments.

In my case some of my clients are internal to the organisation, and some are external. In both cases I cannot guarantee that either set of clients will be able to read a DGML or XPS file. So, what alternatives are available? The only path which will guarantee that the clients can read the files is to use a graphics file format (TIFF, JPEG, PNG or BMP), and to “chop up” the big images into A4 chunks.

Other Options:

I did look for some other options. But the Microsoft Office suite does not seem to like to play with XPS files. There were a couple of options I did try:

  • Using Word to read an XPS file. I was hoping that the XPS would come in as an image which I could embed into a Word document. I did try to suck an XPS file into Word (2007 and 2010 versions), but both reported that the XPS file was illegal (and a Microsoft product wrote it!).
  • I did try Visio to read an XPS file. Again I was hoping that it would load the XPS as an image. But Visio does not seem to have any idea about XPS files. Neither File Open nor Insert an Image accepts XPS as a format to be processed.

Converting the XPS to a graphics File Format

My quest for a solution started here (How to convert xps documents to other formats, for example bmp ?). The core code from the solution on MSDN is below.

static public void SaveXpsPageToBitmap(string xpsFileName)
{
    XpsDocument xpsDoc = new XpsDocument(xpsFileName, System.IO.FileAccess.Read);
    FixedDocumentSequence docSeq = xpsDoc.GetFixedDocumentSequence();

    // You can get the total page count from docSeq.PageCount
    for (int pageNum = 0; pageNum < docSeq.DocumentPaginator.PageCount; ++pageNum)
    {
        DocumentPage docPage = docSeq.DocumentPaginator.GetPage(pageNum);
        BitmapImage bitmap = new BitmapImage();
        RenderTargetBitmap renderTarget =
            new RenderTargetBitmap((int)docPage.Size.Width,
                                    (int)docPage.Size.Height,
                                    96, // WPF (Avalon) units are 96dpi based
                                    96,
                                    System.Windows.Media.PixelFormats.Default);

        renderTarget.Render(docPage.Visual);

        BitmapEncoder encoder = new BmpBitmapEncoder();  // Choose type here ie: JpegBitmapEncoder, etc
        encoder.Frames.Add(BitmapFrame.Create(renderTarget));

        FileStream pageOutStream =
            new FileStream(xpsFileName + ".Page" + pageNum + ".bmp", FileMode.Create, FileAccess.Write);
        encoder.Save(pageOutStream);
        pageOutStream.Close();
    }
}

There are a number of things which one should note about the code.

  • There are a number of resources which should be “Disposed” which are not.
  • If you want to deal with large XPS images, then you will need to compile it as x64. If you run the code without building it as x64, you may get an exception.
  • You will need the following References for the project to compile (figure: VS_References_For_XPS).
  • You will also need the following using statements.
    using System.Windows.Xps.Packaging;
    using System.Windows.Documents;
    using System.Windows.Media.Imaging;
    using System.IO;

Cleaning up the Code

The following is the shell of a solution which has some of the memory management, and file handling cleaned up.

For those who do not know, the using statement makes sure that the object is cleaned up correctly. The following is from MSDN:

The using statement allows the programmer to specify when objects that use resources should release them. The object provided to the using statement must implement the IDisposable interface. This interface provides the Dispose method, which should release the object’s resources.

MSDN: using Statement (C# Reference)

The resulting code is:

    static public void SaveXpsPageToBitmap(string xpsFileName)
    {
        using (XpsDocument xpsDoc = new XpsDocument(xpsFileName, System.IO.FileAccess.Read))
        {
            FixedDocumentSequence docSeq = xpsDoc.GetFixedDocumentSequence();
            // The total page count comes from the document's paginator
            for (int pageNum = 0; pageNum < docSeq.DocumentPaginator.PageCount; ++pageNum)
            {
                using (DocumentPage docPage = docSeq.DocumentPaginator.GetPage(pageNum))
                {
                    // Render the page into an in-memory bitmap of the same size
                    RenderTargetBitmap renderTarget =
                        new RenderTargetBitmap((int)docPage.Size.Width,
                                                (int)docPage.Size.Height,
                                                96, // WPF (Avalon) units are 96dpi based
                                                96,
                                                System.Windows.Media.PixelFormats.Default);
                    renderTarget.Render(docPage.Visual);
                    BitmapEncoder encoder = new BmpBitmapEncoder();  // Choose type here, e.g. JpegBitmapEncoder
                    encoder.Frames.Add(BitmapFrame.Create(renderTarget));
                    using (FileStream pageOutStream =
                        new FileStream(xpsFileName + ".Page" + pageNum + ".bmp", FileMode.Create, FileAccess.Write))
                    {
                        // The using block closes the stream, so no explicit Close() is needed
                        encoder.Save(pageOutStream);
                    }
                }
            }
            xpsDoc.Close();
        }
    }

A Minor Digression: Encoders and Options Supported in File Formats

The core of the above solution is the BitmapEncoder class, and the classes derived from it. The encoders available are (straight from MSDN):

System.Windows.Media.Imaging.BmpBitmapEncoder
System.Windows.Media.Imaging.GifBitmapEncoder
System.Windows.Media.Imaging.JpegBitmapEncoder
System.Windows.Media.Imaging.PngBitmapEncoder
System.Windows.Media.Imaging.TiffBitmapEncoder
System.Windows.Media.Imaging.WmpBitmapEncoder

Of the different encodings, and various graphics file formats, I was interested in the following capabilities:

Format | Class | Metadata | Multiple frames (multiple pages in one file)
BMP | System.Windows.Media.Imaging.BmpBitmapEncoder | No | No
GIF | System.Windows.Media.Imaging.GifBitmapEncoder | No | Yes
JPEG | System.Windows.Media.Imaging.JpegBitmapEncoder | Frame level, not global | No
PNG | System.Windows.Media.Imaging.PngBitmapEncoder | Frame level, not global | No
TIFF | System.Windows.Media.Imaging.TiffBitmapEncoder | Frame level, not global | Yes
WMP | System.Windows.Media.Imaging.WmpBitmapEncoder | (not clear) | No
Pagination, Tiling, or Cropping to a Paper Size (or any size you like)

The following are the core routines of my solution.  This one does the writing of the separate tile files.

/// <summary>
/// Produces a series of graphics files in the format specified from the
/// input XPS file.
/// </summary>
/// <param name="JPG_Path">Used to generate the output file name</param>
/// <param name="fileNamePart">The filename without directory or extensions,
/// which is used to generate the output file</param>
/// <param name="pageNum">Used to generate the output file name</param>
/// <param name="renderTarget">The bit map representation of the page to
/// be tiled into multiple parts</param>
/// <param name="Paper">The size of the tiles (in pixels - image is 96 dpi)
/// of the image that will be created.</param>
/// <param name="PaperType">A string which describes the tile size (e.g. A3).
/// Used in creation of the output file name</param>
/// <param name="extension">The type of graphics file created.
/// Used to determine the encoder used, and to create the metadata for the file.</param>
private void GeneratePagesAsFiles(string JPG_Path, string fileNamePart,
    int pageNum, RenderTargetBitmap renderTarget,
    Size Paper, string PaperType, string extension)
{
    int iHrozTiles = ((int)(renderTarget.Width / Paper.Width)) + 1;
    int iVertTiles = ((int)(renderTarget.Height / Paper.Height)) + 1;
    BitmapEncoder encoder;
    for (int i = 0; i < iHrozTiles; i++)
    {
        for (int j = 0; j < iVertTiles; j++)
        {
            string outputFileName = MakeFileName(JPG_Path, fileNamePart,
                i, j, iHrozTiles, iVertTiles, "." + extension, PaperType);
            if (File.Exists(outputFileName))
                continue;
            Int32Rect crop = new Int32Rect(
                (int)Math.Min(i * Paper.Width, renderTarget.Width),
                (int)Math.Min(j * Paper.Height, renderTarget.Height),
                (int)Math.Min(Paper.Width, renderTarget.Width - ((i) * Paper.Width)),
                (int)Math.Min(Paper.Height, renderTarget.Height - (j * Paper.Height)));
            if (crop.X == renderTarget.Width || crop.Y == renderTarget.Height)
                continue;
            CroppedBitmap cb1 = new CroppedBitmap(renderTarget, crop);
            BitmapMetadata metadata =
                MakeMetadata(extension, fileNamePart, i, j, iHrozTiles, iVertTiles);
            switch (extension)
            {
                case "png": encoder = new PngBitmapEncoder(); break;
                case "jpg": encoder = new JpegBitmapEncoder(); break;
                case "tiff": encoder = new TiffBitmapEncoder(); break;
                case "gif": encoder = new GifBitmapEncoder(); break;
                case "bmp": encoder = new BmpBitmapEncoder(); break;
                case "wdp": encoder = new WmpBitmapEncoder(); break;
                default: extension = "png"; encoder = new PngBitmapEncoder(); break;
            }
            encoder.Frames.Add(BitmapFrame.Create(cb1, null, metadata, null));
            using (FileStream pageOutStream =
                new FileStream(outputFileName, FileMode.Create, FileAccess.Write))
            {
                encoder.Save(pageOutStream);
                pageOutStream.Close();
            }
        }
    }
}

The paper size object which is passed in is one of the following (96 dpi * paper dimensions (in inches)):

        Size A4Paper = new Size(780, 1100); // rounded to make checking the math simpler
        Size A3Paper = new Size(1100, 1560); // A3 is two A4 sheets side by side (doubling both sides gives A2)
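
For anyone who wants the unrounded figures, the arithmetic can be sketched as below. This is a minimal sketch and not part of my solution: PaperSizes and ToPixels are made-up names, and the millimetre dimensions are the ISO 216 values (A4 is 210 x 297 mm, A3 is 297 x 420 mm).

```csharp
using System;

static class PaperSizes
{
    const double MillimetresPerInch = 25.4;
    const double Dpi = 96.0; // WPF units are 96 per inch

    // Converts a paper dimension in millimetres to whole pixels at 96 dpi.
    public static int ToPixels(double millimetres)
    {
        return (int)Math.Round(millimetres / MillimetresPerInch * Dpi);
    }
}
```

With these numbers A4 comes out at roughly 794 x 1123 pixels, so the rounded 780 x 1100 above leaves a small margin on each tile.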

The following is the core routine which generates a multiple page TIFF file:

/// <summary>
/// Writes a multiple page TIFF file, tiled into pages
/// </summary>
/// <param name="JPG_Path">The path for the output file, used to make
/// the output file name</param>
/// <param name="fileNamePart">The filename without path or extension,
/// used to make the output file name</param>
/// <param name="pageNum">The page number in the XPS file,
/// used to make the output file name</param>
/// <param name="renderTarget">The bit map image of the XPS page</param>
/// <param name="Paper">The size of the output tiles required</param>
/// <param name="PaperSize">The description of the tile size (e.g. A3),
/// used to make the output file name</param>
private void GeneratePagesAsMultiPageFile(string JPG_Path, string fileNamePart,
    int pageNum, RenderTargetBitmap renderTarget, Size Paper, string PaperSize)
{
    int iHrozTiles = ((int)(renderTarget.Width / Paper.Width)) + 1;
    int iVertTiles = ((int)(renderTarget.Height / Paper.Height)) + 1;
    BitmapMetadata metadata = MakeMetadata(fileNamePart, "tiff");
    TiffBitmapEncoder encoder = new TiffBitmapEncoder();
    using (FileStream pageOutStream = new FileStream(
        MakeFileName(JPG_Path, fileNamePart, pageNum,
        "MultiplePage", ".tiff", PaperSize),
        FileMode.Create, FileAccess.Write))
    {
        for (int i = 0; i < iHrozTiles; i++)
        {
            for (int j = 0; j < iVertTiles; j++)
            {
                Int32Rect crop = new Int32Rect(
                    (int)Math.Min(i * Paper.Width, renderTarget.Width),
                    (int)Math.Min(j * Paper.Height, renderTarget.Height),
                    (int)Math.Min(Paper.Width, renderTarget.Width - ((i) * Paper.Width)),
                    (int)Math.Min(Paper.Height, renderTarget.Height - (j * Paper.Height)));
                if (crop.X == renderTarget.Width || crop.Y == renderTarget.Height)
                    continue;
                CroppedBitmap cb1 = new CroppedBitmap(renderTarget, crop);
                encoder.Frames.Add(BitmapFrame.Create(cb1, null, metadata, null));
            }
        }
        encoder.Save(pageOutStream);
        pageOutStream.Close();
    }
}
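
Both routines above share the same tiling arithmetic, so it may help to see it on its own. The following is a sketch only, with no WPF dependency: Tile is a made-up stand-in for Int32Rect, Tiler and MakeTiles are hypothetical names, and all sizes are assumed to be positive whole pixels. It uses ceiling division for the tile counts, which is a variation on the "+ 1 and skip the empty tile" approach in my routines.

```csharp
using System;
using System.Collections.Generic;

// A plain stand-in for Int32Rect so the sketch has no WPF dependency.
struct Tile
{
    public int X, Y, Width, Height;
}

static class Tiler
{
    // Splits an image into tiles no larger than the paper size,
    // walking column by column as the routines above do.
    public static List<Tile> MakeTiles(int imageWidth, int imageHeight,
                                       int paperWidth, int paperHeight)
    {
        // Ceiling division: an exact multiple yields no empty trailing tile.
        int columns = (imageWidth + paperWidth - 1) / paperWidth;
        int rows = (imageHeight + paperHeight - 1) / paperHeight;

        var tiles = new List<Tile>();
        for (int i = 0; i < columns; i++)
        {
            for (int j = 0; j < rows; j++)
            {
                tiles.Add(new Tile
                {
                    X = i * paperWidth,
                    Y = j * paperHeight,
                    // Edge tiles are clipped to whatever is left of the image.
                    Width = Math.Min(paperWidth, imageWidth - i * paperWidth),
                    Height = Math.Min(paperHeight, imageHeight - j * paperHeight)
                });
            }
        }
        return tiles;
    }
}
```

For example, a 1000 x 1100 image on 780 x 1100 paper gives two tiles, the second one only 220 pixels wide.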

Possible Enhancements

There are a couple of things which could be “sharpened” up in the solution presented. These include:

  • Not outputting blank pages. This is potentially possible: the bit map of a blank cropped image would have the same value at every location. Another option is putting a “this page is blank” message onto the page.
  • Printing the page number, and/or the (x,y) location, onto the outputs. This again is possible.
  • Support for more paper sizes. This is just a process of defining different paper sizes. The current A4 and A3 suit my requirements.
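
The blank page check could look something like the sketch below. I have not built this into the solution; BlankPage and IsUniform are hypothetical names, and the sketch assumes the raw pixel bytes have already been copied out of the cropped image (in WPF, BitmapSource.CopyPixels would be the likely route) with no padding bytes at the end of each row.

```csharp
using System;

static class BlankPage
{
    // Returns true when every pixel in the buffer equals the first pixel.
    // bytesPerPixel would be 4 for a 32-bit pixel format.
    public static bool IsUniform(byte[] pixels, int bytesPerPixel)
    {
        if (pixels == null || bytesPerPixel <= 0 || pixels.Length < bytesPerPixel)
            return true; // nothing to draw counts as blank

        for (int i = bytesPerPixel; i < pixels.Length; i++)
        {
            // Compare each byte with the byte at the same offset in pixel 0.
            if (pixels[i] != pixels[i % bytesPerPixel])
                return false;
        }
        return true;
    }
}
```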

Conclusions

It took a bit of hunting to get to this solution, but it does work (most of the time – I’ll expand on that next).

I suspect that there is a limit on the size of an XPS diagram which can be handled by these APIs. I have one XPS file which “blows up” when trying to process it. I’ve more investigating to do before I have completely diagnosed this. I suspect that it will end up being a bug report for Microsoft.


Debug.WriteLine in C# 4.0 and .Net Framework 4.0


Introduction

This one is a very short blog post. There is one improvement which has made its way into C# 4.0 and the .Net Framework 4.0 which I wish to share.

This improvement is a very simple one. If I had a paranoid streak, I would say Microsoft must have been watching the code I’ve been writing. Why? Because the code I write is littered with the following:

int i = 0;
// The old way
Debug.WriteLine(String.Format("What is i {0} - old way", i));

I use the debug output window in Visual Studio heavily. Shoving formatted strings into the debug output window is one of my standard ways of producing debug diagnostics from programs in development.

The Enhancement

[ConditionalAttribute("DEBUG")]
public static void WriteLine(
    string format,
    params Object[] args
);

The above is the prototype of the addition to the Debug class, another overload of the WriteLine method. This WriteLine method is documented here: Debug.WriteLine.

Simply put, this is a WriteLine method which encapsulates the String.Format call. It results in code like:

int i = 0;
// The old way
Debug.WriteLine(String.Format("What is i {0} - old way", i));
// New in Version 4 of C# and .Net Framework
Debug.WriteLine("What is i {0} - New C# 4.0 way", i);
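
One thing worth checking before relying on the new overload: Debug also has a WriteLine(string message, string category) overload, and C# overload resolution picks that one when the only extra argument is a string. The sketch below reproduces the two overload shapes in a hypothetical DebugLike class (it is not part of the framework, and it returns the text rather than writing it) so the binding can be seen without a debugger attached.

```csharp
using System;

static class DebugLike
{
    // Same shape as Debug.WriteLine(string message, string category).
    public static string WriteLine(string message, string category)
    {
        return category + ": " + message;
    }

    // Same shape as the new Debug.WriteLine(string format, params object[] args).
    public static string WriteLine(string format, params object[] args)
    {
        return string.Format(format, args);
    }
}
```

Calling DebugLike.WriteLine("What is s {0}", "abc") binds to the category overload, so the placeholder is never filled in. Casting the argument to object, or passing a non-string such as an int, selects the formatting overload.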

Conclusion

It’s a great improvement. It will save me a lot of keystrokes when I’m developing. But this enhancement has a downside: now I need to remember to use it!


LINQ Performance Tuning: Using the LookUp Class


Introduction

In this blog post I hope to share with you the benefits of using the Lookup Class and the IEnumerable.ToLookup method for creating the Lookup Object.

The benefit of using this .Net Framework object can be a significant reduction in the time taken to execute LINQ statements. By significant I mean that I reduced the elapsed time for execution of a program which makes heavy use of LINQ to Objects and LINQ to XML from 15+ minutes to seconds.

If inspiration grabs me I may include some benchmarking tests in this post, or in a subsequent post. I’m always a bit cautious of creating artificial test cases for things like this. Artificial test cases can be very misleading, as they really demonstrate the technique in the best light. The real test of any technique is not in a “test tube”, but in how it helps resolve real programming and performance (in this case) problems.

The Lookup Class – Overview

This is a very interesting animal in the .Net Framework. It has the following interesting features:

  • It is only created through a factory method, ToLookup, which is attached to IEnumerable as a LINQ extension method.
  • There is no public constructor for the class.
  • The Lookup Class is somewhere in between the Dictionary and the List, in that it possesses a mixture of the properties of both classes. These properties include:
    • Like a Dictionary it supports keyed access to the data. To use an analogy with SQL, the Lookup behaves like an indexed table.
    • Like a List it supports storing as many of a “type” in it as possible (available memory or the CPU architecture which the process is running under being the limiting factor).
    • Unlike either List<T> or Dictionary<TKey, TValue>, the one key can have multiple objects stored under it. So, it is very like a Dictionary<T1, List<T2>>. The implementation uses an IGrouping<T1, T2>, but the analogy holds true.

The Lookup Class – The Details

Internal Storage

[Figure: LookUpStructure, a diagram of the way the Lookup Class stores its data]

The Lookup class is an interesting storage mechanism. The key features are:

  • The storage is by the key, which is the important aspect being leveraged when improving the performance of LINQ operations.
  • Unlike the Dictionary Class, the Lookup Class stores more than one object against the key.
  • Like the List Class, the set of objects stored against a key is not held in any sorted order.
  • The diagram is one way to visualise the way the Lookup Class stores the data. The important points to note are:
    • The Keys must be of the same type.
    • The Objects stored must be of the same type.
    • The order of the objects within a key is undetermined. This is very much the same as SQL tables, where the order of the rows is not guaranteed, unless you use an Order By clause (which imposes an order).
    • There can be any number of objects stored under each key.

Creating An Instance of the Lookup Class

There is not too much in the creation of a Lookup Object, apart from the caveats which are:

  • It can only be created as the output of a LINQ operation.
  • It is an immutable object structure. The Lookup object is effectively “read only”: there are no ways to mutate the content (add or remove elements, or change the key set).

The following example C# code shows two of the ways of creating a Lookup Object.

List<XElement> seed = new List<XElement>();
var example1 = (from element in seed
                select element)
               .ToLookup(A => A.Name);
var example2 = (from element in seed
                select new { key = element.Attribute("Test1").Value, value = element })
                    .Union(from element in seed
                           select new { key = element.Attribute("Test2").Value, value = element })
                    .ToLookup(A => A.key, A => A.value);

Example1 will result in a lookup indexed by the XName, containing a set of IGrouping of XElement.

Example2 will result in a lookup indexed by the string value from the Attributes “Test1” and “Test2”, with the same XElement potentially falling under the different key values.

There are variants of the ToLookup method which take an IEqualityComparer for the type of the Key. These I must admit I’ve not needed to use. My use cases for the Lookup Object have not included object types in the Key other than string, and the framework provided equality comparer for that has suited me fine (thus far).
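
For completeness, here is a small sketch of the comparer variant using plain strings rather than XElement. LookupDemo and ByFirstLetter are made-up names for illustration, and the words are assumed to be non-empty.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class LookupDemo
{
    // Groups words under their first letter, ignoring case, by passing
    // an IEqualityComparer<string> to ToLookup.
    public static ILookup<string, string> ByFirstLetter(IEnumerable<string> words)
    {
        return words.ToLookup(w => w.Substring(0, 1),
                              StringComparer.OrdinalIgnoreCase);
    }
}
```

Given "apple", "Avocado" and "banana", the key "a" holds two words. Asking for a key which is not present, such as "z", returns an empty sequence rather than throwing, which is one more difference from a Dictionary.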

Limitations Of The Lookup Class

There are a number of limitations which the Lookup Class comes with. Many of these limitations have been mentioned in this blog post already, but I’ll put a list of them in here (just for completeness). The limitations include (and these are the ones I’ve found, or found written about):

  • No public constructor. The only way to create a Lookup Object is to use the ToLookup factory method on the IEnumerable interface. Or, if you prefer, the Lookup Object can only be created as the output from LINQ operations.
  • There are no mutators for the Lookup Object. As an output from LINQ, this output sequence is effectively read only. There are no Add, or Remove, methods available for the Lookup object. If you need a different set of content, you will need to generate that set through LINQ, and create another Lookup Object.
  • The IGrouping structure of the storage which the Lookup Class uses is a bit of a thing to deal with. There are a couple of ways to unravel this structure (I’ve found two thus far, but that’s not to say this is an exhaustive list):
      List<XElement> seed = new List<XElement>();
      var example1 = (from element in seed
                      select element)
                     .ToLookup(A => A.Name);
      // One way to unwrap the IGrouping
      var unwrap1 = example1.SelectMany(A=>A);
      // Another way to unwrap the IGrouping
      var unwrap2 = from step1 in example1
                    from unwrapped in step1
                    select unwrapped;
    • Unwrap1 uses the SelectMany method (see: LINQ SelectMany and IGrouping for some more on the use of SelectMany). The “A=>A” is required for the method, and simply says flatten the multiple input sequences (one for each key value) into one output sequence.
    • Unwrap2 uses a nested from clause in the LINQ syntax. If you’ve done anything with LINQ to XML you would be familiar with using intermediate sequences in LINQ syntax.

Using An Instance Of The Lookup Class In LINQ

This is where the “rubber hits the road”, or more to the point where big performance improvements can be made in the execution of LINQ statements.  As with anything performance related please test these techniques in the context of your application. The following is what I have observed in the context of my application. I’ve been using equal measures of the Stopwatch Class and the Visual Studio Profiler to measure the impacts of my application of the Lookup Class.

The following is the fastest way in LINQ to work with the Lookup object (the best way I’ve found thus far).

List<XElement> seed = new List<XElement>();
var example1 = (from element in seed
                select element)
               .ToLookup(A => A.Name);
var dict1 = new Dictionary<XName, XElement>();
var join_sample = from lut in example1
                  join dict in dict1 on lut.Key equals dict.Key
                  select lut;

The LINQ join clause seems (this is by observation of the impact in the Visual Studio Profiler) to resolve in the indexes of the Dictionary and Lookup objects. The analogy is with the way a SQL execution plan will resolve in the indexes of two tables which are being joined.
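
To illustrate the index analogy in its simplest form (this is not the join above, just a sketch with made-up names and data shapes), compare a query which scans the whole list on every call with one which builds the Lookup once and then goes straight to the key:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class LookupVersusScan
{
    // Slow when called repeatedly: every call walks the whole list.
    public static int CountByScan(List<KeyValuePair<string, int>> rows, string key)
    {
        return rows.Where(r => r.Key == key).Count();
    }

    // Pay once to build the Lookup; each later query is a keyed access.
    public static ILookup<string, int> BuildIndex(List<KeyValuePair<string, int>> rows)
    {
        return rows.ToLookup(r => r.Key, r => r.Value);
    }
}
```

Both routes give the same answers. The difference only shows up when the same data is queried many times, which is where the Stopwatch and the Profiler come in.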

Conclusion

The performance boost that LINQ operations can gain through the judicious application of the Lookup Object, makes learning to master the use of the object well worthwhile (if you want fast code).

The DGML generating and pagination process I’m currently building has benefitted significantly from the judicious use of these objects. I’ve taken the process of generating multiple files of DGML from minutes (in the 10 to 15 minutes mark) to seconds (20 to 30 seconds).
