Category Archives: Xml

Unit testing XSD schemas

Once in a while a new task pops up that no one is really eager to work on. In my experience, on teams that do not focus on or make extensive use of Xml-related technologies, most (if not all) tasks that have anything to do with XSD schemas belong to this group. This was the case in our team recently, and I ended up being the “volunteer” since the schedule was tight and I had previously worked on the managed Xml team. So I started refreshing my rusty XSD skills and soon had something that more or less worked. It was a good starting point, but then I asked myself: “how do I test this?” I needed something lightweight that would fit in our unit tests. I briefly searched the Internet but could not find anything suitable. As the old saying goes, necessity is the mother of invention, so I came up with my own way of testing the schema. I like it because it consists of just 3 small helper methods (less than 30 lines in total) and one helper schema, and most of the unit tests are just 2-3 lines. The tests also helped me come up with a better design than I originally had. Note that I don’t know if this is the “right” approach or if it would scale to bigger schemas. I only know that it worked fine for the schema I had to write.

So, let’s say we need to write a schema for Xml files that have a structure like this:

<Settings>
  <ServiceProvider Type="typeName">
    <Setting Name="Setting1" Value="Value1" />
    <Setting Name="Setting2" Value="Value2" />
  </ServiceProvider>

  <Factory Type="typeName">
    <Setting Name="Setting1" Value="Value1" />
    <Setting Name="Setting2" Value="Value2" />
    <Setting Name="Setting3" Value="Value3" />
  </Factory>
</Settings>

and that both ServiceProvider and Factory elements are optional.

First we need to create a starting schema. For new schemas I usually create a sample Xml file, open it in Visual Studio and use Xml → Create Schema. The schema created by Visual Studio is not really usable, but it gives me something I can iterate on. The main problem with the generated schema is that all the types are defined inline. This makes it hard to test, since ideally we would like to test each type separately. Generating inline types leads to another problem: each element gets its own type even if the same element is used repeatedly (let alone cases where the same type is used for different elements or where inheritance is involved). The key to testing a schema is to have simple types. The simpler the type, the easier it is to test. Once a type is tested it can be used as a building block for more complicated types, and it does not need to be comprehensively tested again as part of them. For the Xml structure above we can identify three types:

  • Setting (for Setting element)
  • ServiceTypeInitializer (a common type for ServiceProvider and Factory elements)
  • Settings (for Settings element)

The problem with unit testing all these types in isolation is that the schema itself should not allow any element other than Settings as the document element. Fortunately, for testing purposes we can create a helper schema that allows document elements of types that are normally not allowed at the top level. We will conditionally add this helper schema to the schema set used for validating the input Xml. Why does the helper schema need to be added conditionally? The tested schema should not allow any element other than Settings as the document element. So, when testing the Settings element we must not add the helper schema to the schema set; otherwise we could not verify that Settings is the only element allowed as the document element. Let’s see what this looks like in practice. Here is the schema created by refactoring the initial schema generated by Visual Studio:


<?xml version="1.0" encoding="utf-8"?>
<xs:schema attributeFormDefault="unqualified" elementFormDefault="qualified"
  xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <xs:element name="Settings" type="Settings_Type" />

  <xs:complexType name="Settings_Type">
    <xs:sequence>
      <xs:element name="ServiceProvider" type="ServiceTypeInitializer_Type" minOccurs="0" maxOccurs="1" />
      <xs:element name="Factory" type="ServiceTypeInitializer_Type" minOccurs="0" maxOccurs="1" />
    </xs:sequence>
  </xs:complexType>

  <xs:complexType name="ServiceTypeInitializer_Type">
    <xs:sequence>
      <xs:element maxOccurs="unbounded" name="Setting" type="Setting_Type" />
    </xs:sequence>
    <xs:attribute name="Type" type="xs:string" use="required" />
  </xs:complexType>

  <xs:complexType name="Setting_Type">
    <xs:attribute name="Name" type="xs:string" use="required" />
    <xs:attribute name="Value" type="xs:string" use="required" />
  </xs:complexType>
</xs:schema>

Now let’s create the helper schema that will allow testing each of the types separately:


<?xml version="1.0" encoding="utf-8"?>
<xs:schema attributeFormDefault="unqualified" elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Setting" type="Setting_Type" />
  <xs:element name="ServiceTypeInitializer" type="ServiceTypeInitializer_Type" />
</xs:schema>

(In the above schema the Settings element is not present since it’s already allowed at the top level by the other schema). After creating the helper schema we need a function that will validate Xml documents against our schemas:


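// Requires: using System.Collections.Generic; using System.IO;
//           using System.Xml; using System.Xml.Schema;
// schemaUnderTest and helperSchema are assumed to be XmlSchema fields
// of the test class (loading them is not shown here).
private static IEnumerable<ValidationEventArgs> RunValidation(string inputXml, bool includeHelperSchema)
{
    var schemaSet = new XmlSchemaSet();
    schemaSet.Add(schemaUnderTest);

    // The helper schema exposes the inner types as document elements.
    if (includeHelperSchema)
    {
        schemaSet.Add(helperSchema);
    }

    var readerSettings = new XmlReaderSettings()
    {
        Schemas = schemaSet,
        ValidationType = ValidationType.Schema,
        // Note: assigning (rather than OR-ing) replaces the default flags
        // (ProcessIdentityConstraints | AllowXmlAttributes). This schema
        // defines no identity constraints, so nothing is lost here.
        ValidationFlags = XmlSchemaValidationFlags.ReportValidationWarnings,
    };

    // Collect errors and warnings instead of letting errors throw.
    var events = new List<ValidationEventArgs>();
    readerSettings.ValidationEventHandler += (s, e) => { events.Add(e); };

    using (var reader = XmlReader.Create(new StringReader(inputXml), readerSettings))
    {
        // Reading the whole document drives the validation.
        while (reader.Read())
        {
        }
    }

    return events;
}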
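
The method above uses two fields, schemaUnderTest and helperSchema, that are not shown. Here is a minimal sketch of one way to load them, assuming both schemas are available as strings (for example read from .xsd files). The SchemaLoader class and its Load method are hypothetical names used only for illustration, not part of the original test suite:

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static class SchemaLoader
{
    // Parses an XSD document from a string into an XmlSchema object.
    // XmlSchema.Read only parses the document; references to types such as
    // Setting_Type are resolved later, when the XmlSchemaSet that contains
    // the schema is compiled. This is why the helper schema can refer to
    // types that are defined in the schema under test.
    public static XmlSchema Load(string xsd)
    {
        using (var reader = XmlReader.Create(new StringReader(xsd)))
        {
            // Fail fast if the XSD itself has a problem.
            return XmlSchema.Read(reader, (sender, e) =>
            {
                throw new InvalidOperationException(e.Message, e.Exception);
            });
        }
    }
}
```

With something like this in place, each field could be initialized once per test class from the text of the corresponding schema file.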

There are two interesting points here. First, we need to turn on reporting of validation warnings. This is because XmlSchemaSet has a nasty behavior where no error is reported if the document element of the validated Xml document is in a different namespace than the targetNamespace of the schema. This may result in accepting documents that are not validated at all. Turning on warnings is the first step to catching this condition. The second interesting point is that, when no handler is set, schema validation throws exceptions for validation errors but not for warnings. So, to catch the condition where the expected and actual namespaces don’t match, we have to set XmlReaderSettings.ValidationEventHandler, which is invoked for both validation errors and warnings. Other than that the method is pretty straightforward: we create an XmlSchemaSet instance and add the schema under test and, conditionally, the helper schema. Then we create an XmlReaderSettings object and set it up for schema validation. We use the reader settings to create a validating XmlReader. Finally we read the input Xml with the validating reader; all errors and warnings are reported by invoking the validation event handler we set.
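
To make the namespace problem concrete, here is a small self-contained sketch. It assumes a hypothetical one-element schema with no targetNamespace and a document whose root element sits in an unrelated namespace (urn:typo, made up for this example). The NamespaceMismatchDemo class is not part of the test suite; it only shows how the validation flags change what gets reported:

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;
using System.Xml.Schema;

public static class NamespaceMismatchDemo
{
    // Hypothetical schema: a single Settings element, no targetNamespace.
    private const string Xsd =
        @"<xs:schema xmlns:xs=""http://www.w3.org/2001/XMLSchema"">
            <xs:element name=""Settings"" />
          </xs:schema>";

    // Validates the document with the given flags and returns every
    // validation event (errors and warnings) that was raised.
    public static List<ValidationEventArgs> Validate(string xml, XmlSchemaValidationFlags flags)
    {
        var schemaSet = new XmlSchemaSet();
        schemaSet.Add(null, XmlReader.Create(new StringReader(Xsd)));

        var settings = new XmlReaderSettings
        {
            Schemas = schemaSet,
            ValidationType = ValidationType.Schema,
            ValidationFlags = flags,
        };

        var events = new List<ValidationEventArgs>();
        settings.ValidationEventHandler += (s, e) => events.Add(e);

        using (var reader = XmlReader.Create(new StringReader(xml), settings))
        {
            while (reader.Read())
            {
            }
        }

        return events;
    }
}
```

Validating <Settings xmlns="urn:typo" /> with XmlSchemaValidationFlags.None should produce no events at all, so the document looks valid even though nothing was checked. With XmlSchemaValidationFlags.ReportValidationWarnings the same document should produce a warning saying that no schema information could be found for the element, which is what the tests rely on.
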
With the test driver method ready we can start writing test cases. We write test cases for each type, starting from “leaf” types (i.e. types that are defined using only pre-defined schema types) and moving on to more complex types. If a type contains an element of a type that has already been tested, we just test that the schema accepts an Xml document with the simplest child element of that type and, if the element is mandatory, that the document is rejected when it does not contain the element. If there are multiple elements of the same type, we write test cases for the type itself rather than for every element of that type (the elements will be covered when testing their parent type). If there were a type hierarchy, we would write test cases for the base type and then test cases just for what was added (or removed, in the case of derivation by restriction) in the derived type. The test cases themselves are simple: in most cases a hardcoded minimal Xml document is validated using the validation method we created, and we check that the expected errors are reported or, for valid Xml documents, that there are no errors. Some examples:


[Fact]
public void Schema_accepts_minimal_valid_Xml()
{
    Assert.Empty(RunValidation("<Settings />", false));
}

[Fact]
public void Schema_rejects_Setting_Type_without_Name()
{
    var error = 
        RunValidation(@"<Setting Value=""ABC"" />", true)
        .Single();

    Assert.Equal(XmlSeverityType.Error, error.Severity);
    Assert.Equal(
        "The required attribute 'Name' is missing.",
        error.Message);
}
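
The same pattern extends to the next type up. Below is a self-contained sketch of checks for ServiceTypeInitializer_Type, with both schemas from this post embedded as strings so that it runs on its own. The TypeLevelChecks class and its HasError helper are hypothetical names, and the checks are written as plain methods rather than xUnit facts so that no test framework is assumed:

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml;
using System.Xml.Schema;

public static class TypeLevelChecks
{
    // The schema under test, as shown earlier in the post.
    private const string MainXsd = @"
<xs:schema attributeFormDefault=""unqualified"" elementFormDefault=""qualified"" xmlns:xs=""http://www.w3.org/2001/XMLSchema"">
  <xs:element name=""Settings"" type=""Settings_Type"" />
  <xs:complexType name=""Settings_Type"">
    <xs:sequence>
      <xs:element name=""ServiceProvider"" type=""ServiceTypeInitializer_Type"" minOccurs=""0"" maxOccurs=""1"" />
      <xs:element name=""Factory"" type=""ServiceTypeInitializer_Type"" minOccurs=""0"" maxOccurs=""1"" />
    </xs:sequence>
  </xs:complexType>
  <xs:complexType name=""ServiceTypeInitializer_Type"">
    <xs:sequence>
      <xs:element maxOccurs=""unbounded"" name=""Setting"" type=""Setting_Type"" />
    </xs:sequence>
    <xs:attribute name=""Type"" type=""xs:string"" use=""required"" />
  </xs:complexType>
  <xs:complexType name=""Setting_Type"">
    <xs:attribute name=""Name"" type=""xs:string"" use=""required"" />
    <xs:attribute name=""Value"" type=""xs:string"" use=""required"" />
  </xs:complexType>
</xs:schema>";

    // The helper schema that exposes the inner types as document elements.
    private const string HelperXsd = @"
<xs:schema attributeFormDefault=""unqualified"" elementFormDefault=""qualified"" xmlns:xs=""http://www.w3.org/2001/XMLSchema"">
  <xs:element name=""Setting"" type=""Setting_Type"" />
  <xs:element name=""ServiceTypeInitializer"" type=""ServiceTypeInitializer_Type"" />
</xs:schema>";

    // Same idea as RunValidation in the post, but loading the schemas
    // from the embedded strings above.
    public static List<ValidationEventArgs> RunValidation(string inputXml, bool includeHelperSchema)
    {
        var schemaSet = new XmlSchemaSet();
        schemaSet.Add(null, XmlReader.Create(new StringReader(MainXsd)));
        if (includeHelperSchema)
        {
            schemaSet.Add(null, XmlReader.Create(new StringReader(HelperXsd)));
        }

        var readerSettings = new XmlReaderSettings
        {
            Schemas = schemaSet,
            ValidationType = ValidationType.Schema,
            ValidationFlags = XmlSchemaValidationFlags.ReportValidationWarnings,
        };

        var events = new List<ValidationEventArgs>();
        readerSettings.ValidationEventHandler += (s, e) => events.Add(e);

        using (var reader = XmlReader.Create(new StringReader(inputXml), readerSettings))
        {
            while (reader.Read())
            {
            }
        }

        return events;
    }

    // True if validation reported at least one error (warnings are ignored).
    public static bool HasError(string inputXml, bool includeHelperSchema) =>
        RunValidation(inputXml, includeHelperSchema).Any(e => e.Severity == XmlSeverityType.Error);
}
```

With the helper schema included, a ServiceTypeInitializer element with a Type attribute and a single Setting child should validate with no events, while dropping either the Setting child (it is mandatory, since minOccurs defaults to 1) or the Type attribute should make HasError return true. Without the helper schema, a Setting element used as the document element should be reported, because only Settings is declared at the top level.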

An example test suite using xUnit can be found on my GitHub. The Readme contains details about requirements, setting up the environment, building and running tests. If you just want to see what’s most interesting (i.e. the code) you can find it here.

Pawel Kadluczka

Xslt 1.0 biggest issues (kind of) solved

Xslt 1.0 has a number of issues that can make the life of an Xml developer frustrating. A lot of them are addressed by Xslt 2.0. Unfortunately the .NET Framework does not have an Xslt 2.0 compliant processor. Fortunately most of the biggest Xslt 1.0 pain points have workarounds. Having workarounds rather than real solutions is almost never ideal but…

The world has moved on, have you? Xml APIs you should avoid using.

There are a few Xml APIs you should not be using. In some cases the compiler makes this obvious – the API is marked as obsolete and you will get a warning when compiling an application that uses any of these APIs. All the obsolete APIs have their replacements. The replacement for the obsolete XmlSchemaCollection class is…

Effective Xml Part 5: Something went really wrong – OutOfMemoryException and StackOverflowException thrown when using XslCompiledTransform

So, your application is crashing and it is crashing in a bad way. After spending hours debugging and trying different things you figured out that it is the Xslt stylesheet that causes all the problems. How come? XslCompiledTransform is a compiler. It’s a bit different from C# or VB.NET compilers…

Effective Xml Part 4: Let me project this (Xml file) for you

Xml is ubiquitous. No doubt about it. It is being used almost everywhere and by almost everyone. This includes places where huge amounts of data are being processed. This means the Xml files (or streams) used there are also huge. And the bigger the Xml file the harder it is to process. The two biggest problems…

Effective Xml Part 3: Didn’t you say XslCompiledTransform was fast?

“XslCompiledTransform is fast.”
“Really?”
“Yeah, XslCompiledTransform is fast… if used correctly.”

Effective Xml Part 2: How to kill the performance of an app with XPath…

XPath expressions are pretty flexible. This flexibility allows for very creative ways of using XPath. Unfortunately some of them are suboptimal and cause bad performance of apps. This is especially visible in Xslt transformations where stylesheets contain tens if not hundreds of XPath expressions. Here is the list of the most common bad practices (or even anti-patterns) I have …

Effective Xml Part 1: Choose the right API

This is the first part of a mini-series of blog posts about using Xml on the .NET Framework platform in an effective way. Although I will be focusing on the .NET Framework platform I hope that at least some of the information will be general enough to apply to working with Xml on any platform.
