postheadericon A DSL for XML in PowerShell: New-XDocument

In July of last year I wrote a PowerShell script with the goal of allowing me to generate XML from PowerShell with a simple markup that would look a little like the resulting XML ... this week I was using that script again, and had a couple of issues that made me go back and look at the source.

While I was playing with the source and tweaking things a little bit to improve the way it handles namespaces, I started playing with the idea that I could improve the syntax. At the very least, I thought, I ought to be able to do away with all those “xe” aliases…

Well, I was able to. ( [new] new version) And what’s more, I managed to dramatically clean up the way namespaces work, and make it so that really, the only ugly part of the syntax is the initial declaration of namespaces! I’m going to start with two examples, and use them to walk you through the features :)

Example 1

The simplest example I could think of is to list all the files in a folder, with the file size and last modified stamp:

[string]$xml = New-XDocument folder -path $pwd {
   foreach($file in Get-ChildItem) {
      file -Modified $file.LastWriteTimeUtc -Size $file.Length { $file.Name }
   }
}

The output of that, when run on my formats folder, looks like this:

<folder path="C:\Users\Jaykul\Documents\WindowsPowerShell\formats">
  <file modified="2009-11-07T07:27:00Z" size="30474">CliXml.xsd</file>
  <file modified="2009-11-07T07:27:40.48001Z" size="14314">format.xsd</file>
  <file modified="2010-01-16T21:30:06.0562796Z" size="18275">NppExternalLexers.xml</file>
  <file modified="2009-03-18T21:28:51.6579351Z" size="5802">Recommender.Types.Format.ps1xml</file>
  <file modified="2009-11-07T07:27:40.518029Z" size="5107">types.xsd</file>
</folder>

You can immediately see what the script does: New-XDocument (which is aliased as ‘xml’) actually generates the root xml node, so the first argument to it is the name of that node, and any other arguments become attributes … except for the script block. That script block turns into the contents of the node.

Inside the script block, PowerShell code is parsed as usual, but whenever a command that doesn’t exist is encountered, it is turned into an xml node! Pretty simple, right? Of course, if you wanted to create a node with a name that’s already taken by a PowerShell command, you can just replace file with New-XElement file, or (using aliases) xe 'file', which explicitly creates an xml node with the given name.

That’s pretty much it for our first example, so let’s look at a more complicated example, with multiple namespaces, and deeper nesting.

Example 2

This time, we’ll create an Atom document, and we’ll include some namespace extensions (including a made up one for listing my files as we did above):

New-XDocument (([XNamespace]"http://www.w3.org/2005/Atom") + "feed")          `
              -fi ([XNamespace]"http://huddledmasses.org/schemas/FileInfo")   `
              -dc ([XNamespace]"http://purl.org/dc/elements/1.1")             `
              -$([XNamespace]::Xml +'lang') "en-US" -Encoding "UTF-16"        `
{
   title {"Huddled Masses: You can do more than breathe for free..."}
   link {"http://HuddledMasses.org/"}
   updated {(Get-Date -f u) -replace " ","T"}
   author {
      name {"Joel Bennett"}
      uri {"http://HuddledMasses.org/"}
   }
   id {"http://HuddledMasses.org/" }

   entry {
      title {"A DSL for XML in PowerShell: New-XDocument"}
      link {"http://HuddledMasses.org/A-DSL-for-XML-in-PowerShell-New-XDocument/" }
      id {"http://HuddledMasses.org/A-DSL-for-XML-in-PowerShell-New-XDocument/" }
      updated {(Get-Date 2010/03/03 -f u) -replace " ","T"}
      summary {"A while back, I posted a simple mini language for generating XML from PowerShell script. However, I was using it the other day, and I really just felt that the markup was ugly, since it was littered with 'xe' marks and such."}
      link -rel license -href "http://creativecommons.org/licenses/by/3.0/" -title "CC By-Attribution"
      dc:rights { "Copyright 2010, Some rights reserved (licensed under the Creative Commons Attribution 3.0 Unported license)" }
      category -scheme "http://huddledmasses.org/tag/" -term "xml"
      category -scheme "http://huddledmasses.org/tag/" -term "PowerShell"
      category -scheme "http://huddledmasses.org/tag/" -term "DSL"
      fi:folder -Path "~\Formats" {
         foreach($file in Get-ChildItem (Join-Path (Split-Path $profile) Formats)) {
            fi:file -Created $file.CreationTimeUtc -Modified $file.LastWriteTimeUtc -Size $file.Length { $file.Name }
         }
      }
   }
} | % { $_.Declaration.ToString(); $_.ToString() }

There are four things you should notice, in particular:

First: the initial tag has a [XNamespace] added to it. You can specify a tag name that has a namespace by adding them together this way, or by embedding the namespace in the string like "{http://www.w3.org/2005/Atom}feed" instead. Either way works. This initial namespace becomes the default namespace for the document. If you don’t specify a namespace on tags later, they automatically belong to that one.

Second: when you want to add additional namespaces, you can do so with a custom prefix like: -dc ([XNamespace]"http://purl.org/dc/elements/1.1"), and that prefix (dc) takes on a special meaning. When you want to have a tag later on that is part of that namespace, you just prefix the tag, like dc:rights —the same way you would in XML.

Third: any number of attributes can be specified using the -name value syntax, but anything in a {scriptblock} becomes the content — and is subject to the same rules as the outer sections.

Fourth: This generates an XDocument. When you cast an XDocument to string, the xml declaration is left off, so if you want it, you need to manually add it via $XDocument.Declaration. Incidentally, XDocuments are not XMLDocuments, but they are trivially castable to them.

The output of that particular section of New-XDocument is this:

<feed xmlns:dc="http://purl.org/dc/elements/1.1" xmlns:fi="http://huddledmasses.org/schemas/FileInfo" xml:lang="en-US" xmlns="http://www.w3.org/2005/Atom">
 
  <link />http://HuddledMasses.org/
  <updated>2010-03-04T00:44:31Z</updated>
  <author>
    <name>Joel Bennett</name>
    <uri>http://HuddledMasses.org/</uri>
  </author>
  <id>http://HuddledMasses.org/</id>
  <entry>
   
    <link />http://HuddledMasses.org/A-DSL-for-XML-in-PowerShell-New-XDocument/
    <id>http://HuddledMasses.org/A-DSL-for-XML-in-PowerShell-New-XDocument/</id>
    <updated>2010-03-03T00:00:00Z</updated>
    <summary>A while back, I posted a simple mini language for generating XML from PowerShell script. However, I was using it the other day, and I really just felt that the markup was ugly, since it was littered with 'xe' marks and such.</summary>
    <link rel="license" href="http://creativecommons.org/licenses/by/3.0/" title="CC By-Attribution" />
    <dc:rights>Copyright 2010, Some rights reserved (licensed under the Creative Commons Attribution 3.0 Unported license)</dc:rights>
    <category scheme="http://huddledmasses.org/tag/" term="xml">
    <category scheme="http://huddledmasses.org/tag/" term="PowerShell">
    <category scheme="http://huddledmasses.org/tag/" term="DSL">
    <fi:folder path="~\Formats">
      <fi:file created="2009-11-07T07:27:00Z" modified="2009-11-07T07:27:00Z" size="30474">CliXml.xsd</fi:file>
      <fi:file created="2009-11-07T07:27:40.4529965Z" modified="2009-11-07T07:27:40.48001Z" size="14314">format.xsd</fi:file>
      <fi:file created="2009-02-07T13:56:12Z" modified="2010-01-16T21:30:06.0562796Z" size="18275">NppExternalLexers.xml</fi:file>
      <fi:file created="2009-08-09T19:10:06.3647094Z" modified="2009-03-18T21:28:51.6579351Z" size="5802">Recommender.Types.Format.ps1xml</fi:file>
      <fi:file created="2009-11-07T07:27:40.4970185Z" modified="2009-11-07T07:27:40.518029Z" size="5107">types.xsd</fi:file>
    </fi:folder>
  </category></category></category></entry>
</feed>

The New-XDocument script itself is on PoshCode in the Xml Module 4 along with a few interesting functions like Select-XML (which improves over the built-in by being able to ignore namespaces when you write XPath) and Remove-XmlNamespace (which was instrumental in removing namespaces for Select-Xml). There’s also a Format-Xml for pretty-printing, and a Convert-Xml for processing XSL transformations.

I’ll probably post some more examples of this in the next week or two, and I really should write some commentary about the function itself, which uses the tokenizer to discover which “commands” are really xml nodes … but for now, I’ll leave you to enjoy.

Reblog this post [with Zemanta]

7 Responses to “A DSL for XML in PowerShell: New-XDocument”

  • Wow, that on-the-fly implementation of unrecognized command names is very clever! But the potential conflicts (or more importantly, potential future conflicts) seems too dangerous to use in real scripts. I wonder if it might be possible to temporarily alter the command lookup table so that only full cmdlet names could be used in the XML DSL and any “short” names would always be treated as XElements.

    • Yeah, that’s a great idea. Consider it implemented!

      As of version 4.2 the only commands that are NOT converted are ones which have a dash in them, and are either a Cmdlet, a Function, or an ExternalScript (ie: a ps1 file in your path or specified by full name). Aliases, Applications, ScriptBlocks, and Functions or Scripts without ‘verb-noun’ syntax in their name will be ignored (and thus, treated as xml tags). Honestly, it was probably enough to check for “-” but since that’s technically a legal character in xml node names (although I don’t think I’ve ever seen it used), I figured I needed the combination :) .

      I’ve updated the links above, and changed my example script to use Get-ChildItem instead of the now disallowed ls ;)

  • Jason Archer:

    Very cool idea!

    It would be nice if Format-XML could also take the path to an XML file.

  • Mark Dunlop:

    Just looked at 4.3 – looks great.

    I’ve been building a load of code using Mike Kay’s Saxonica libraries, and have recently been tempted to do a proper job of building some advanced functions. Reasoning behind that was the high level of standards compliance, especially wrt XQuery

    Since you’re working in the same namespace, so to speak, I wonder if you’d had any thoughts on doing this.

    • I haven’t really looked at Saxonica in at least a couple of years, so it’s safe to say I wasn’t planning on doing anything with it ;)

      • Mark Dunlop:

        The challenge is that to do a full implementation as advanced functions, with all the optional parameters, looks like over 1000 lines of code :)

        Your solution is looking more and more appealing by the second. Especially since I can just pick it up and use it.

Archives