MSBuild Zip task without external dependencies

I have seen quite a few build systems in my life. Most of them were very complicated, with a lot of tools, dependencies, Perl scripts, batch files, nested targets files and God knows what else. Figuring out how something worked (or, more often, why something did not work) was time consuming and very frustrating. One of the reasons for this was that people kept adding stuff to the build system but no one ever removed anything. What was even more annoying was that new dependencies oftentimes added tens of new files just to enable one small thing. Not only were the enlistments huge (how about ~200GB without QA tests?) but configuring a machine to be able to build all this and run the tests was a sort of black magic.

Because of all this I always felt bad about the fact that MSBuild did not have a Zip task available out-of-the-box. I feel that not having this one thing is the first step to having a build system that everyone hates. Yes, I know there wasn’t a Zip library in .NET Framework until now. Yes, I know there are third party Zip libraries out there. Yes, I know about MSBuild Community Tasks. Yes, I know MSBuild can somehow zip files internally as it can create VSIX files which are zip files… oops – this probably was not the best example. Anyways, not having a built-in Zip task means that you need to add some dependencies to your build system to be able to build your project. This leads to a build system that no one wants to touch for fear of breaking something.

What about sharing just a single project? Like for instance my Code First view gen templates? Is it OK to tell people – “you can build it on your own – here is the project, but first you need to install this and this and this or it won’t build”? I don’t think it’s OK, but until now I could not do much about it. Fortunately, a Zip library was finally added to .NET Framework 4.5. This allowed me to create my own Zip task.

How is this task different from, for instance, the Zip task from the MSBuild Community Tasks? I created it as an inline task. As a result it is just a small text file I can import into my projects. It can be checked in to my source control. It does not require any additional components to be installed or present on the machine apart from what’s already there. If I need to know what the task is doing I can see the source without Reflector. I can easily change the task without having to recompile half of my build system just to be able to build what I actually want to build. (See the Disclaimer at the bottom of the page.) The task looks like this:

<UsingTask TaskName="Zip" TaskFactory="CodeTaskFactory" AssemblyFile="$(MSBuildToolsPath)\Microsoft.Build.Tasks.v4.0.dll">
  <ParameterGroup>
    <InputFileNames ParameterType="Microsoft.Build.Framework.ITaskItem[]" Required="true" />
    <OutputFileName ParameterType="System.String" Required="true" />
    <OverwriteExistingFile ParameterType="System.Boolean" Required="false" />
  </ParameterGroup>
  <Task>
    <Reference Include="System.IO.Compression" />
    <Using Namespace="System.IO.Compression" />
    <Code Type="Fragment" Language="cs">
    <![CDATA[
      const int BufferSize = 64 * 1024;

      var buffer = new byte[BufferSize];
      // FileMode.CreateNew throws if the output file already exists.
      var fileMode = OverwriteExistingFile ? FileMode.Create : FileMode.CreateNew;

      using (var outputFileStream = new FileStream(OutputFileName, fileMode))
      {
        using (var archive = new ZipArchive(outputFileStream, ZipArchiveMode.Create))
        {
          foreach (var inputFileName in InputFileNames.Select(f => f.ItemSpec))
          {
            // Entries are named by file name only, so the resulting archive is flat.
            var archiveEntry = archive.CreateEntry(Path.GetFileName(inputFileName));

            // Open the input for reading only so that read-only files can be zipped too.
            using (var fs = new FileStream(inputFileName, FileMode.Open, FileAccess.Read))
            {
              using (var zipStream = archiveEntry.Open())
              {
                // Copy the file into the entry one buffer at a time.
                int bytesRead = -1;
                while ((bytesRead = fs.Read(buffer, 0, BufferSize)) > 0)
                {
                  zipStream.Write(buffer, 0, bytesRead);
                }
              }
            }
          }
        }
      }
    ]]>
    </Code>
  </Task>
</UsingTask>

Using the task is simple. Put the task in a separate file and import that file into the csproj file. In fact the file is available in my GitHub repo – https://github.com/moozzyk/MSBuild-Tasks. Once you import the file into the project you just invoke the task as you would invoke any other task – for example (this is an actual excerpt from one of my csproj files):


  <Import Project="common.tasks" />
  
  <Target Name="BeforeBuild">
    <ItemGroup>
      <FilesToZip Include="$(ProjectDir)\PayloadUnzipped\*.*" />
    </ItemGroup>
    <Zip 
      InputFileNames="@(FilesToZip)"
      OutputFileName="$(ProjectDir)$(TargetZipFile)"
      OverwriteExistingFile="true" />
  </Target>

That’s pretty much it. Works for me and hopefully will work for you.

Disclaimer:
I am not trying to diminish MSBuild Community Tasks or claim that inline tasks will solve all the problems of this world. I am trying to say that for small, simple tasks, inline tasks can be much more convenient.

24 thoughts on “MSBuild Zip task without external dependencies”

    1. This particular task will only compress files. You can write a similar task for extracting files. You can use ZipArchive to do that. The code to extract files would look like this:

      using (ZipArchive zipArchive = new ZipArchive(stream))
      {
        foreach (ZipArchiveEntry entry in zipArchive.Entries)
        {
          // ExtractToFile is an extension method defined in System.IO.Compression.FileSystem
          entry.ExtractToFile(entry.Name);
        }
      }
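
      The snippet above could be wrapped in an inline task along the same lines as the Zip task. This is only a sketch: the task name `Unzip` and the parameters `InputFileName` and `OutputFolder` are hypothetical, and it assumes that referencing System.IO.Compression.FileSystem (where ExtractToFile is defined) works from an inline task on your machine.

```xml
<UsingTask TaskName="Unzip" TaskFactory="CodeTaskFactory" AssemblyFile="$(MSBuildToolsPath)\Microsoft.Build.Tasks.v4.0.dll">
  <ParameterGroup>
    <!-- Hypothetical parameter names, chosen for this sketch -->
    <InputFileName ParameterType="System.String" Required="true" />
    <OutputFolder ParameterType="System.String" Required="true" />
  </ParameterGroup>
  <Task>
    <Reference Include="System.IO.Compression" />
    <!-- ExtractToFile is an extension method that lives in this assembly -->
    <Reference Include="System.IO.Compression.FileSystem" />
    <Using Namespace="System.IO.Compression" />
    <Code Type="Fragment" Language="cs">
    <![CDATA[
      // Make sure the target folder exists before extracting anything.
      Directory.CreateDirectory(OutputFolder);

      using (var inputStream = new FileStream(InputFileName, FileMode.Open, FileAccess.Read))
      using (var archive = new ZipArchive(inputStream, ZipArchiveMode.Read))
      {
        foreach (var entry in archive.Entries)
        {
          // Directory entries have an empty Name; there is nothing to extract for them.
          if (string.IsNullOrEmpty(entry.Name)) continue;

          // Only the file name is used, so the archive is extracted flat
          // (mirroring the flat archives the Zip task creates).
          entry.ExtractToFile(Path.Combine(OutputFolder, entry.Name), true);
        }
      }
    ]]>
    </Code>
  </Task>
</UsingTask>
```

      Passing `true` as the second argument overwrites existing files; drop it if you would rather have the task fail in that case.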

  1. Thanks! Besides, thanks to you now I know how to write the build scripts in C#. 🙂 You have no idea how much suffering it is to write it in some clumsy XML (horrors like Ant or Maven)… Now I can finally format strings and add numbers while building things! pew!

    1. I prefer the built-in constructs and tasks over ad-hoc ones. They are tested and in some cases MSBuild can infer some additional information from them (e.g. order of things). For formatting numbers you don’t have to write a script. Every property in MSBuild is a string and as such you can use string methods on it, e.g. $(Version.Substring(0, 2)). You can also use static methods like this: $([System.Guid]::NewGuid()). You can find more interesting stuff here: http://msdn.microsoft.com/en-us/library/dd633440.aspx
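
      To make the property functions mentioned above a bit more concrete, here is a small standalone project file that could be run with MSBuild directly. The property names and values (`Version`, `BuildNumber` and so on) are made up for illustration; the `[MSBuild]::Add` call is one of the built-in arithmetic functions and is included because "adding numbers" came up.

```xml
<Project ToolsVersion="4.0" DefaultTargets="ShowProperties" xmlns="http://schemas.microsoft.com/developer/msbuild/2003">
  <PropertyGroup>
    <!-- Hypothetical values, for illustration only -->
    <Version>12.3.4</Version>
    <BuildNumber>41</BuildNumber>

    <!-- Instance method called on the property's string value -->
    <MajorVersion>$(Version.Substring(0, 2))</MajorVersion>

    <!-- Static method and static property calls -->
    <BuildId>$([System.Guid]::NewGuid())</BuildId>
    <BuildDate>$([System.DateTime]::Now.ToString('yyyy-MM-dd'))</BuildDate>

    <!-- Arithmetic through the built-in [MSBuild] functions -->
    <NextBuildNumber>$([MSBuild]::Add($(BuildNumber), 1))</NextBuildNumber>
  </PropertyGroup>

  <Target Name="ShowProperties">
    <Message Importance="high" Text="Major version: $(MajorVersion)" />
    <Message Importance="high" Text="Build id: $(BuildId)" />
    <Message Importance="high" Text="Build date: $(BuildDate)" />
    <Message Importance="high" Text="Next build number: $(NextBuildNumber)" />
  </Target>
</Project>
```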

      1. Thanks for the links, but, frankly, after using something like SCons or Rake, MSBuild doesn’t shine… And so far every attempt at using XML for programming sucked big time 🙂 MSBuild files are no exception. Just for the sake of convenience, I keep a SCons build for myself (which I’m not allowed to share due to company policies, not even with my coworkers) alongside the MSBuild project. Just to give you a sense of magnitude: the SCons build is about TEN times shorter and a lot more concise than its counterpart. I had no need to reinvent the wheel for zipping files (I actually had to rework your code to add directory zipping). Wait, I also wanted to export gzipped tar files… and I also had to generate MD5 hashes of the packages etc…

        1. I am not trying to argue that MSBuild is or is not the best. I am just saying that it is possible to call static methods without resorting to scripts and that it’s better to use built-in things instead of reinventing the wheel and writing your own tasks/scripts for something that is already supported. Finally, while there are many different tools, sometimes you cannot escape MSBuild and you have to play by its rules. If you have a tool you can use which does a better job than MSBuild then I don’t see a reason to use MSBuild instead of this tool.

      2. Well, my argument was that instead of providing you with X or X+Y built-in functions it is a lot better to give you the language in which you can roll your own, because that would be X^Y (x to the power y) functions 🙂 When the build script isn’t itself written in a language in which you can add more functionality, you will inevitably back yourself into a corner where you have to extend it, but the tool is just so bad (or no one knows how to use it) that you will give up on it and either do a lot of manual labour or, which happens most of the time, just won’t do any more advanced automation.

        My argument against using XML was that it imposes arbitrary limits on what you can or cannot do in your code (those examples you have are too trivial), once you want to define a function locally, because you need to recurse through directories or to define a class – you will not be able to do that in the subset of syntax available in XML. And there’s no reason why things should be like that. It’s just the self imposed limitation of whoever chose to use XML for the task.

        1. More does not always mean better. In general I am a big proponent of having basic building blocks that can be composed together to build functionality, as opposed to trying to cover each possible scenario just by throwing in as many components as possible without even thinking about whether they could possibly be useful.
          Xml has its own disadvantages (especially verbosity) but I look at it more as implementation detail. Regardless of the way you choose to express your program what makes the real difference is the engine that reads, interprets and executes the program. Real limitations are there and not in Xml itself.
          Looking at MSBuild I believe what’s allowed in Xml are the basic building blocks. For more complicated things you just need to use the right tool for the job which is an MSBuild task.

      3. > More doesn’t mean better – sure. But whoever uses MSBuild (or Ant, or Maven, or Grails, or any flavor of Make) will inevitably create _a lot_ more entities than whoever uses SCons, Rake or Grunt – this is because the latter provide a complete programming language to back you up. This means, most importantly, code reuse. Second, but no less important – familiar abstractions, such as functions, variables and data types common to many programming languages.

        Each tool from the first group tries in some sense to reinvent the abstractions – but all they can come up with is half-arsed, mostly useless replicas of their well-known prototypes (and that only if you are lucky!).

        You need to distinguish between providing more on demand, or forcing you to create more, even though you might not want that. The second group of build tools scales a lot better with complexity of your builds. It also allows you to manage build scripts in the same way you would manage any other programming project. Use all the same tools, familiar paradigms, code editors and so on.

        XML is limiting not only because of verbosity. There is poor editor support (you don’t know how to structure your program) and non-existent modularity. XML has less expressive power than most of the programming languages today when describing data. It cannot describe things like circular graphs or references in general, and it cannot introduce new language entities (like macros for example). It cannot introduce new restrictions (you cannot restrict an arbitrarily chosen entity to only behave in the way you want it to). It is more or less as if you were trying to write in assembly language: very few terminals, very few valid grammar productions, yet massive code listings which you cannot navigate in any sensible way. You have to struggle to build your language abstraction on top of what you are given, and soon you begin to feel the futility and redundancy of the effort 🙂

  2. Hello,
    thanks for the nifty bit of code, which I immediately started using. But I ran into a problem: I can’t get it to handle non-ASCII filenames correctly. If I output the filenames with Console.WriteLine just before the call to archive.CreateEntry they are correct. But they get garbled in the zip file. And when I unpack them they are still garbled.

    I tried passing a UTF-8 encoding when creating the ZipArchive, but it changed nothing. The MSDN documents say that it should be able to handle non-ASCII…

    Have you had any problems with this?

  3. In an MSBuild script I convert some odt files to pdf files and want to collect them in a zip file (using your code pretty much as it is). It is a bunch of user manuals in different languages, for example in Finnish: “Käyttäjän opas (FIN).pdf”, which becomes “K+ñytt+ñj+ñn opas (FIN).pdf”.

    If I pack it with “Send to Compressed folder” the name is preserved.

    The MSDN description of ZipArchive clearly says that:
    “For entry names that contain characters outside the ASCII range, the language encoding flag is set, and entry names are encoded by using UTF-8.”

    I would really like to have a self-contained solution, so that I (and my colleagues) don’t have to depend on more installed software.

    1. I’m not sure of exactly how wvxvw zipped directories, but I successfully did so by changing the Code fragment to:

      ZipFile.CreateFromDirectory(InputFolderName, OutputFileName);

      Much cleaner.

      Just make sure to update the ParameterGroup to use “InputFolderName” instead of “InputFileNames” and to add a task to delete the zip file before creating a new one.
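
      Put together, the variant described above might look roughly like the sketch below. The task name `ZipDirectory` and the target name `ZipPayload` are hypothetical, and the extra reference to System.IO.Compression.FileSystem is an assumption on my part (that is the assembly where the ZipFile class is defined in .NET Framework 4.5). The paths in the usage target are borrowed from the example in the post.

```xml
<UsingTask TaskName="ZipDirectory" TaskFactory="CodeTaskFactory" AssemblyFile="$(MSBuildToolsPath)\Microsoft.Build.Tasks.v4.0.dll">
  <ParameterGroup>
    <InputFolderName ParameterType="System.String" Required="true" />
    <OutputFileName ParameterType="System.String" Required="true" />
  </ParameterGroup>
  <Task>
    <Reference Include="System.IO.Compression" />
    <!-- Assumption: ZipFile is defined in this assembly, so it is referenced as well -->
    <Reference Include="System.IO.Compression.FileSystem" />
    <Using Namespace="System.IO.Compression" />
    <Code Type="Fragment" Language="cs">
    <![CDATA[
      // Zips the whole folder, including subfolders; throws if the output file already exists.
      ZipFile.CreateFromDirectory(InputFolderName, OutputFileName);
    ]]>
    </Code>
  </Task>
</UsingTask>

<Target Name="ZipPayload">
  <!-- The built-in Delete task does nothing if the file is not there -->
  <Delete Files="$(ProjectDir)$(TargetZipFile)" />
  <ZipDirectory
    InputFolderName="$(ProjectDir)PayloadUnzipped"
    OutputFileName="$(ProjectDir)$(TargetZipFile)" />
</Target>
```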

  4. I’ve made some adjustments to the version of the Zip task available on GitHub. Here’s my version of the class:


    f.ItemSpec))
    {
      var archiveName = Flatten ? Path.GetFileName(inputFileName) : (String.IsNullOrEmpty(RemoveRoot) ? inputFileName : inputFileName.Replace(RemoveRoot, ""));
      var archiveEntry = archive.CreateEntry(archiveName);
      using (var fs = new FileStream(inputFileName, FileMode.Open))
      {
        using (var zipStream = archiveEntry.Open())
        {
          fs.CopyTo(zipStream);
        }
      }
    }
    }
    ]]>

    Changes are as follows:
    a) Changed name from “Zip” to “NativeZip” (to avoid conflicts with MSBuild.Community.Tasks)
    b) Added Flatten and RemoveRoot parameters. If Flatten is set to true, this gets the behavior that the task used to have, creating a flat ZIP file. If RemoveRoot is used, the path specified will be removed from the paths of the files in the ZIP file. If neither is used, the full path of the input file will be used for the path in the ZIP file.
    c) Added call to Directory.CreateDirectory() to make sure that the directory in which the ZIP file will reside exists ahead of time. Without this, the code would fail if the directory did not exist ahead of time.

    Feel free to incorporate this into the GitHub version if you’d like.

  5. I can’t get this to work: it seems to be compiling and executing the inline task with .NET 4.0 rather than 4.5, so it doesn’t load System.IO.Compression even though it’s referenced.

    1. I am not sure I understand what “executing the inline task with .NET 4.0” means. Do you have .NET Framework 4.5(.x) installed on the machine? If you don’t then obviously things won’t work. If you do, then note that .NET Framework 4.5+ is an in-place update, so you can’t have .NET Framework 4 alongside it. If it is failing when compiling the task, find out how it is being compiled (AFAIR MSBuild creates a temp .cs file containing your code and also one with compilation options; alternatively you can use Procmon to check how csc.exe is invoked). What version of MSBuild are you using (and how did you check this)?
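
      One possible way to check which MSBuild is actually running the build is a throwaway target that prints a few of MSBuild’s reserved properties. The target name `ShowMSBuildInfo` is made up; the properties themselves are standard. This is only a diagnostic sketch and does not by itself tell you how the inline task gets compiled.

```xml
<Target Name="ShowMSBuildInfo">
  <!-- Tools version the project is being built with -->
  <Message Importance="high" Text="MSBuildToolsVersion: $(MSBuildToolsVersion)" />
  <!-- Folder the tasks assembly from the UsingTask element is loaded from -->
  <Message Importance="high" Text="MSBuildToolsPath: $(MSBuildToolsPath)" />
  <!-- Folder of the MSBuild binaries currently in use -->
  <Message Importance="high" Text="MSBuildBinPath: $(MSBuildBinPath)" />
</Target>
```

      Add it to the project (or to the imported tasks file) and run it with `/t:ShowMSBuildInfo`. Running `msbuild /version` from the same command prompt shows the version of the executable itself.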
