<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Huddled Masses &#187; Converting</title>
	<atom:link href="http://huddledmasses.org/tag/converting/feed/" rel="self" type="application/rss+xml" />
	<link>http://huddledmasses.org</link>
	<description>You can do more than breathe for free...</description>
	<lastBuildDate>Sat, 28 Jan 2012 21:37:20 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
<cloud domain='huddledmasses.org' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
		<item>
		<title>Converting HTML to XML in PowerShell</title>
		<link>http://huddledmasses.org/converting-html-to-xml-in-powershell/</link>
		<comments>http://huddledmasses.org/converting-html-to-xml-in-powershell/#comments</comments>
		<pubDate>Fri, 16 Nov 2007 21:40:32 +0000</pubDate>
		<dc:creator>Joel 'Jaykul' Bennett</dc:creator>
				<category><![CDATA[Huddled]]></category>
		<category><![CDATA[Cmdlet]]></category>
		<category><![CDATA[Converting]]></category>
		<category><![CDATA[Development]]></category>
		<category><![CDATA[Html]]></category>
		<category><![CDATA[Parsing]]></category>
		<category><![CDATA[PowerShell]]></category>
		<category><![CDATA[Scripting]]></category>
		<category><![CDATA[Xml]]></category>

		<guid isPermaLink="false">http://HuddledMasses.org/converting-html-to-xml-in-powershell/</guid>
		<description><![CDATA[I&#8217;ll write up more information later, but a couple of people have asked for this in #PowerShell on irc.freenode.net, and I had it already written, so here you go &#8230; my ConvertFrom-Html cmdlet (in a Huddled.HtmlSnapin). It converts HTML to valid XML using the SGML Parser which was available on GotDotNet years ago. It only works [...]]]></description>
			<content:encoded><![CDATA[	<p>I&#8217;ll write up more information later, but a couple of people have asked for this in #PowerShell on irc.freenode.net, and I had it already written, so here you go &#8230; my ConvertFrom-Html cmdlet (in a Huddled.HtmlSnapin). It converts <span class="caps">HTML</span> to valid <span class="caps">XML</span> using the <span class="caps">SGML</span> Parser which was available on GotDotNet years ago. It only works with local files for now (it doesn&#8217;t download from a <span class="caps">URL</span> yet), so fetch the page first. Use it like this:</p>

	<div class="posh code posh" style="font-family:monospace;"><br />
<span style="color: #660033; font-weight: bold;">$url</span> <span style="color: #66cc66;">=</span> <span style="color: #009900;">&quot;http://huddledmasses.org/&quot;</span><br />
<span style="color: #660033; font-weight: bold;">$file</span> <span style="color: #66cc66;">=</span> <span style="color: #0066cc; font-style: italic;">Join-<span style="font-style: normal;">Path</span></span> <span style="color: #660033; font-weight: bold;">$pwd</span> <span style="color: #009900;">&quot;HuddledMasses.html&quot;</span><br />
<br />
<span style="color: #660033; font-weight: bold;">$client</span> <span style="color: #66cc66;">=</span> <span style="color: #0066cc; font-style: italic;">new-<span style="font-style: normal;">object</span></span> System.<span style="color: #003366;">Net</span>.<span style="color: #003366;">WebClient</span><br />
<span style="color: #660033; font-weight: bold;">$client</span>.<span style="color: #003366;">DownloadFile</span><span style="color: #333;">&#40;</span> <span style="color: #660033; font-weight: bold;">$url</span>, <span style="color: #660033; font-weight: bold;">$file</span> <span style="color: #333;">&#41;</span> <span style="color: #666666; font-style: italic;"># NOTE: DownloadFile needs a full path, because .NET resolves relative paths against the process working directory rather than $pwd</span><br />
<br />
<span style="color: #660033; font-weight: bold;">$xml</span> <span style="color: #66cc66;">=</span> <span style="color: #0066cc; font-style: italic;">ConvertFrom-<span style="font-style: normal;">Html</span></span> <span style="color: #660033; font-weight: bold;">$file</span><br />
<br />
<span style="color: #666666; font-style: italic;"># Or even</span><br />
<span style="color: #333;">&#40;</span><span style="color: #0066cc; font-style: italic;">ConvertFrom-<span style="font-style: normal;">Html</span></span> <span style="color: #660033; font-weight: bold;">$file</span><span style="color: #333;">&#41;</span>.<span style="color: #003366;">Save</span><span style="color: #333;">&#40;</span><span style="color: #660033; font-weight: bold;">$file</span><span style="color: #333;">&#41;</span></div>

	<p>The source code to my plugin may be considered public domain, and is included in <a href="http://HuddledMasses.org/wordpress/wp-content/uploads/2007/11/huddledhtmlsnapin.zip">the Huddled <span class="caps">HTML</span> SnapIn Zip</a>.  </p>

	<p><del>However, the SgmlReader library is a Microsoft Sample which is licensed under <a href="http://dev.live.com/sampleseula.aspx">the old MS Samples license</a> which doesn&#8217;t allow reuse with viral open source software. I&#8217;ve seen some work being done on an <a href="http://www.codeplex.com/htmlagilitypack">HtmlAgilityPack</a> on CodePlex (using a Creative Commons <a href="http://www.codeplex.com/htmlagilitypack/Project/License.aspx"><span class="caps">ASA</span> license</a>) but I have not really looked at it except to see that it has several active issues related to entity encoding and dropping malformed tags which I haven&#8217;t encountered in SgmlReader &#8230;</del></p>

	<p>The <a href="http://code.msdn.microsoft.com/SgmlReader">SgmlReader</a> library has been re-released on <a href="http://code.msdn.microsoft.com/"><span class="caps">MSDN</span> Code Gallery</a> under an Ms-PL license, and all is well with the world. :D</p>]]></content:encoded>
			<wfw:commentRss>http://huddledmasses.org/converting-html-to-xml-in-powershell/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

