Although there are a few “CSS Selector” libraries, most browsers haven’t even implemented CSS3 selectors, never mind frameworks like .Net or scripting languages like JavaScript or PowerShell ;) so XPath queries remain the most powerful way to find specific data in an XML file. By extension, they also work for XHTML and even HTML files, if you can convert them with something like SgmlReader.

There are a lot of XPath tutorials around the web, so I won't go into XPath itself very much. I just wanted to write a brief note about using XPath with documents that have namespaces, particularly from .Net. The problem is that to select nodes that are in a namespace, you must use a namespace prefix and a namespace manager (XmlNamespaceManager). So even if it’s the default namespace, if there’s an xmlns="..." on the document, you have to create a prefix and use it in your query.
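The following minimal sketch shows the problem in action. It loads a trimmed copy of the event document shown below, keeping only one Data element for brevity, and runs the same query with and without a prefix. The prefix name "e" is an arbitrary choice. Any name works as long as it is bound to the right namespace URI.

```powershell
# Trimmed copy of the sample event: xmlns on <Event> sets a default namespace.
[xml]$xml = @'
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <EventData>
    <Data Name="BootStartTime">2009-11-22T08:45:11.640400200Z</Data>
  </EventData>
</Event>
'@

# The child element inherits the default namespace without declaring it.
$xml.Event.EventData.NamespaceURI

# An unprefixed name in an XPath 1.0 query means "no namespace", so this finds nothing.
$unprefixed = $xml.SelectNodes("//Data")

# Binding a prefix to the default namespace URI makes the same query match.
$ns = New-Object Xml.XmlNamespaceManager $xml.NameTable
$ns.AddNamespace("e", "http://schemas.microsoft.com/win/2004/08/events/event")
$prefixed = $xml.SelectNodes("//e:Data", $ns)

"Without a prefix: $($unprefixed.Count) match(es)"   # Without a prefix: 0 match(es)
"With the e: prefix: $($prefixed.Count) match(es)"   # With the e: prefix: 1 match(es)
```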

The bottom line is this: if you have an XML document that looks like this:


<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <EventID>100</EventID>
    <Version>2</Version>
    <Level>1</Level>
    <Task>4002</Task>
    <Opcode>34</Opcode>
  </System>
  <EventData>
    <Data Name="BootTsVersion">2</Data>
    <Data Name="BootStartTime">2009-11-22T08:45:11.640400200Z</Data>
    <Data Name="BootEndTime">2009-11-22T08:48:23.432178500Z</Data>
    <Data Name="SystemBootInstance">18</Data>
    <Data Name="UserBootInstance">15</Data>
  </EventData>
</Event>
 

The bit at the top, xmlns="http://schemas.microsoft.com/win/2004/08/events/event", assigns a default namespace. The root element and every unprefixed element inside it belong to that namespace. Sometimes you’ll see something like this instead (e.g., in RSS):


<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">
   <channel>
      <item>
         <title>Buttons</title>
         <link>http://www.flickr.com/photos/jaykul/4114286262/</link>
         <description>Quick cut-n-paste from a screenshot</description>
         <pubDate>Tue, 17 Nov 2009 20:46:31 -0800</pubDate>
         <guid isPermaLink="false">tag:flickr.com,2004:/photo/4114286262</guid>
         <media:content url="http://farm3.static.flickr.com/2552/4114286262_001e2f8905_o.png" type="image/jpeg" height="157" width="337"/>
         <media:title>Buttons</media:title>
         <media:description type="html">&lt;p&gt;Quick cut-n-paste from a screenshot into your mockup&lt;/p&gt;</media:description>
         <media:thumbnail url="http://farm3.static.flickr.com/2552/4114286262_cba8aaf967_s.jpg" height="75" width="75"/>
         <media:credit role="photographer">Jaykul</media:credit>
         <media:category scheme="urn:flickr:tags">screenshot snagit</media:category>
      </item>
   </channel>
</rss>

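This one binds the specific prefix “media” to the namespace URI “http://search.yahoo.com/mrss/”. Only the elements written with the media: prefix are in that namespace. The plain RSS elements are in no namespace at all.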
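This minimal sketch uses a trimmed copy of the feed above to show the difference in practice. A plain query finds <title>, but <media:credit> can only be found once the prefix is bound through Select-Xml's -Namespace parameter. The hashtable key here happens to match the document's own prefix. That is a convenience, not a requirement.

```powershell
# Trimmed copy of the Flickr feed item: only media:* elements are namespaced.
[xml]$rss = @'
<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">
  <channel>
    <item>
      <title>Buttons</title>
      <media:title>Buttons</media:title>
      <media:credit role="photographer">Jaykul</media:credit>
    </item>
  </channel>
</rss>
'@

# <title> has no prefix and no default namespace is declared, so no namespace map is needed.
$title = Select-Xml -Xml $rss -XPath "//item/title"

# media:credit is in the MRSS namespace, so its prefix must be bound to that URI.
$ns = @{ media = "http://search.yahoo.com/mrss/" }
$credit = Select-Xml -Xml $rss -XPath "//item/media:credit" -Namespace $ns

$title.Node.InnerText    # Buttons
$credit.Node.InnerText   # Jaykul
```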

Selecting nodes with namespaces

In either case, if you want to select a node that’s in a namespace from .Net, you have to specify that namespace in your XPath query. In the first example that means ALL the nodes. In the second example it means only the ones whose names start with media:. PowerShell 2.0 has a Select-Xml cmdlet that accepts a -Namespace parameter: you simply provide a hashtable mapping prefixes to namespace URIs.

If you had loaded the first document above into $xml, you could select the BootStartTime and BootEndTime values with an XPath query like //e:Data[@Name = 'BootStartTime' or @Name = 'BootEndTime']. First, though, we have to DEFINE that “e” prefix. To do that with Select-Xml, you just pass it to the command:


$xpath = @{
    Start="//e:Data[@Name = 'BootStartTime']/text()"
    End="//e:Data[@Name = 'BootEndTime']/text()"
}
$ns = @{ e = "http://schemas.microsoft.com/win/2004/08/events/event" }

[DateTime]$Start = Select-Xml -Xml $xml -XPath $xpath.Start -Namespace $ns |
                   Select -Expand Node | Select -expand Value

[DateTime]$End = Select-Xml -Xml $xml -XPath $xpath.End -Namespace $ns |
                 Select -Expand Node | Select -expand Value

($End - $Start).ToString() # Displays: 00:03:11.7917783
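The "or" query mentioned earlier can also fetch both values in a single pass. The sketch below loads its own trimmed copy of the event so it runs on its own. It keys each result by its Name attribute instead of relying on position, and $times is just an illustrative name. It calls GetAttribute("Name") rather than .Name because PowerShell's XML adapter also exposes attributes as properties. An attribute called Name can therefore collide with the element's own Name property.

```powershell
# Trimmed copy of the sample event so the snippet runs on its own.
[xml]$xml = @'
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <EventData>
    <Data Name="BootStartTime">2009-11-22T08:45:11.640400200Z</Data>
    <Data Name="BootEndTime">2009-11-22T08:48:23.432178500Z</Data>
  </EventData>
</Event>
'@

$ns = @{ e = "http://schemas.microsoft.com/win/2004/08/events/event" }
$xpath = "//e:Data[@Name = 'BootStartTime' or @Name = 'BootEndTime']"

# Collect both matches from one query, keyed by their Name attribute.
$times = @{}
foreach ($match in Select-Xml -Xml $xml -XPath $xpath -Namespace $ns) {
    $times[$match.Node.GetAttribute("Name")] = [DateTime]$match.Node.InnerText
}

($times["BootEndTime"] - $times["BootStartTime"]).ToString()
```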
 

Of course, you don’t have to use Select-Xml. You can do this in plain .Net without cmdlets, which is what you would have to do in C#. In fact, depending on the situation, it might even be simpler:


$xpath = @{
    Start="//e:Data[@Name = 'BootStartTime']/text()"
    End="//e:Data[@Name = 'BootEndTime']/text()"
}

$ns = New-Object Xml.XmlNamespaceManager $xml.NameTable
$ns.AddNamespace( "e", "http://schemas.microsoft.com/win/2004/08/events/event" )

[DateTime]$Start = $xml.SelectSingleNode( $xpath.Start, $ns ).Value
[DateTime]$End   = $xml.SelectSingleNode( $xpath.End, $ns ).Value

($End - $Start).ToString() # Displays: 00:03:11.7917783
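You can reuse the same namespace manager for every query. Every step in a path that names an element needs the prefix, not just the last one. The sketch below reads every Data element under EventData into a lookup table. The $data name is illustrative, and [ordered] needs PowerShell 3.0 or later.

```powershell
[xml]$xml = @'
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <EventData>
    <Data Name="BootTsVersion">2</Data>
    <Data Name="BootStartTime">2009-11-22T08:45:11.640400200Z</Data>
    <Data Name="BootEndTime">2009-11-22T08:48:23.432178500Z</Data>
    <Data Name="SystemBootInstance">18</Data>
    <Data Name="UserBootInstance">15</Data>
  </EventData>
</Event>
'@

$ns = New-Object Xml.XmlNamespaceManager $xml.NameTable
$ns.AddNamespace("e", "http://schemas.microsoft.com/win/2004/08/events/event")

# Event, EventData and Data are all in the default namespace, so every step carries e:.
# Leaving the prefix off any step (e.g. /e:Event/EventData/e:Data) would match nothing.
$data = [ordered]@{}
foreach ($node in $xml.SelectNodes("/e:Event/e:EventData/e:Data", $ns)) {
    $data[$node.GetAttribute("Name")] = $node.InnerText
}

$data["SystemBootInstance"]   # 18
```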
 

Selecting nodes regardless of namespaces

There are, however, two ways to avoid specifying the namespaces. The first is to avoid specifying the element name at all. That is fairly easy in the first example, because the “BootStartTime” and “BootEndTime” values only appear on the Data elements we’re interested in. Even in a file with boatloads of events, no other element will have a Name="BootStartTime" attribute. The second way is to filter with the local-name() function and specify just the “local” name, which is the part of the name without the namespace prefix.

So to ignore the element name, we can just use * as a wildcard. We no longer need the namespace; only the XPath pattern changes:


$xpath = @{
    Start="//*[@Name = 'BootStartTime']/text()"
    End="//*[@Name = 'BootEndTime']/text()"
}

[DateTime]$Start = (Select-Xml -xml $xml $xpath.Start).Node.Value
[DateTime]$End   = (Select-Xml -xml $xml $xpath.End).Node.Value
($End - $Start).ToString() # Displays: 00:03:11.7917783
 

Or, to specify the local name, we can use the XPath local-name() function. Again, we don’t need a namespace; we just change the XPath to a more specific match:


$xpath = @{
    Start="//*[local-name() = 'Data' and @Name = 'BootStartTime']/text()"
    End="//*[local-name() = 'Data' and @Name = 'BootEndTime']/text()"
}

[DateTime]$Start = $xml.SelectSingleNode( $xpath.Start ).Value
[DateTime]$End   = $xml.SelectSingleNode( $xpath.End ).Value
($End - $Start).ToString() # Displays: 00:03:11.7917783
 
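One caveat applies to both namespace-free approaches: local-name() (like *) ignores namespaces entirely. In a mixed document such as the RSS feed above, it can therefore match elements from different namespaces. The minimal sketch below runs on a trimmed copy of that feed. Adding the XPath 1.0 namespace-uri() function narrows the match again, still without a namespace manager.

```powershell
[xml]$rss = @'
<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss/">
  <channel>
    <item>
      <title>Buttons</title>
      <media:title>Buttons</media:title>
    </item>
  </channel>
</rss>
'@

# Matches both <title> and <media:title>, because only the local name is compared.
$anyTitle = $rss.SelectNodes("//*[local-name() = 'title']")

# Also comparing namespace-uri() keeps just the media:title element.
$mediaTitle = $rss.SelectNodes("//*[local-name() = 'title' and namespace-uri() = 'http://search.yahoo.com/mrss/']")

$anyTitle.Count     # 2
$mediaTitle.Count   # 1
```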


2 Responses to “XPath and Namespaces in PowerShell”