<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Huddled Masses &#187; BinarySearch</title>
	<atom:link href="http://huddledmasses.org/tag/binarysearch/feed/" rel="self" type="application/rss+xml" />
	<link>http://huddledmasses.org</link>
	<description>You can do more than breathe for free...</description>
	<lastBuildDate>Fri, 27 Apr 2012 05:42:40 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.2</generator>
<cloud domain='huddledmasses.org' port='80' path='/?rsscloud=notify' registerProcedure='' protocol='http-post' />
		<item>
		<title>Custom IComparers in PowerShell (and Add-Type for v1)</title>
		<link>http://huddledmasses.org/custom-icomparers-in-powershell-and-add-type-for-v1/</link>
		<comments>http://huddledmasses.org/custom-icomparers-in-powershell-and-add-type-for-v1/#comments</comments>
		<pubDate>Wed, 17 Dec 2008 04:53:44 +0000</pubDate>
		<dc:creator>Joel 'Jaykul' Bennett</dc:creator>
				<category><![CDATA[Huddled]]></category>
		<category><![CDATA[.Net]]></category>
		<category><![CDATA[BinarySearch]]></category>
		<category><![CDATA[PowerShell]]></category>
		<category><![CDATA[Scripting]]></category>
		<category><![CDATA[Searching]]></category>

		<guid isPermaLink="false">http://huddledmasses.org/?p=878</guid>
		<description><![CDATA[Someone asked the question in #PowerShell (on irc.Freenode.net): How do I find an item (in an array) based on one of its properties? Actually, the question was rather more complicated than that. They were importing a bunch of users from a csv file, and wanted to sort them and search them based on a specific [...]]]></description>
			<content:encoded><![CDATA[	<p>Someone asked the question in #PowerShell (on irc.Freenode.net):</p>

	<h3>How do I find an item (in an array) based on one of its properties?</h3>

	<p>Actually, the question was rather more complicated than that. They were importing a bunch of users from a csv file, and wanted to sort them and search them based on a specific column. There are many ways to skin this cat. Imagine that you have a <span class="caps">CSV</span> file, and have imported it, like so:</p>

	<div class="posh code posh" style="font-family:monospace;"><br />
<span style="color: #0066cc; font-style: italic;">Set-<span style="font-style: normal;">Content</span></span> users.<span style="color: #003366;">csv</span> @<span style="color: #009900;">&quot;<br />
LastName, FirstName, UserName, Url<br />
Bennett, Joel, Jaykul, http://HuddledMasses.org<br />
Rottenberg, Hal, HalR9000, http://halr9000.com<br />
Hicks, Jeffrey, SapienScripter, http://blog.sapien.com/<br />
&quot;</span>@<br />
<br />
<span style="color: #660033; font-weight: bold;">$users</span> <span style="color: #66cc66;">=</span> <span style="color: #0066cc; font-style: italic;">Import-<span style="font-style: normal;">Csv</span></span> .\users.<span style="color: #003366;">csv</span></div>

	<p>Now, imagine that the <span class="caps">CSV</span> file has thousands of users in it, and that you need to not only sort the data by first name or last name on demand, but you also need to pull users from the list (by name) on demand.</p>

	<p>These are trivial tasks in PowerShell:<span id="more-878"></span></p>

	<div class="posh code posh" style="font-family:monospace;"><br />
<span style="color: #660033; font-weight: bold;">$users</span> <span style="color: #66cc66;">=</span> <span style="color: #660033; font-weight: bold;">$users</span> <span style="color: #66cc66;">|</span> <span style="color: #660033;">Sort</span> Lastname, Firstname<br />
<span style="color: #666666; font-style: italic;"># or </span><br />
<span style="color: #660033; font-weight: bold;">$users</span> <span style="color: #66cc66;">=</span> <span style="color: #660033; font-weight: bold;">$users</span> <span style="color: #66cc66;">|</span> <span style="color: #660033;">Sort</span> Firstname, Lastname<br />
<br />
<span style="color: #666699; font-weight: bold;">return</span> <span style="color: #660033; font-weight: bold;">$users</span> <span style="color: #66cc66;">|</span> <span style="color: #660033;">Where</span> <span style="color: #333;">&#123;</span> <span style="color: #660033; font-weight: bold;">$_</span>.<span style="color: #003366;">Firstname</span> <span style="color: #000066;">-eq</span> <span style="color: #009900;">&quot;Bob&quot;</span> <span style="color: #333;">&#125;</span> <span style="color: #66cc66;">|</span> <span style="color: #660033;">Select</span> <span style="color: #000066;">-First</span> <span style="color: #cc66cc;">1</span><br />
&nbsp;</div>

	<p>This <em>works</em> but will have some performance issues when the list gets too long, because your search algorithm is &#8230; well, we could say it&#8217;s O(n), or just say it&#8217;s slow, because you&#8217;re basically going through the whole list every time. And I do mean the <span class="caps">WHOLE</span> list. You were looking for &#8220;Bob&#8221; ... but the <code>Where-Object</code> cmdlet doesn&#8217;t know your data is sorted, and doesn&#8217;t know you only want one user, so it&#8217;s going to iterate the whole list, because <code>Select-Object</code> can&#8217;t signal it to stop.</p>
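	<p>To make the cost concrete, here is a minimal sketch of a hand-written linear scan that stops at the first match, which the pipeline above cannot do. The <code>Find-FirstUser</code> function and the <code>$sample</code> data are hypothetical, made up for illustration. It is still O(n) in the worst case; it just skips the wasted work after a hit:</p>

	<div class="posh code posh" style="font-family:monospace;"><br />
# Hypothetical sample data (New-Object -Property assumes PowerShell 2.0 or later)<br />
$sample = @(<br />
&nbsp; &nbsp;(New-Object PSObject -Property @{ Firstname = &quot;Alice&quot;; Lastname = &quot;Jones&quot; }),<br />
&nbsp; &nbsp;(New-Object PSObject -Property @{ Firstname = &quot;Bob&quot;; Lastname = &quot;Smith&quot; }),<br />
&nbsp; &nbsp;(New-Object PSObject -Property @{ Firstname = &quot;Bob&quot;; Lastname = &quot;Young&quot; })<br />
)<br />
<br />
# Hypothetical helper: returns the first user whose Firstname matches, then stops looking<br />
function Find-FirstUser($Users, $Firstname) {<br />
&nbsp; &nbsp;foreach ($user in $Users) {<br />
&nbsp; &nbsp; &nbsp; if ($user.Firstname -eq $Firstname) { return $user }<br />
&nbsp; &nbsp;}<br />
}<br />
<br />
Find-FirstUser $sample &quot;Bob&quot;<br />
&nbsp;</div>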

	<p>There are several ways to solve this, but what it comes down to is that you need a (better) search algorithm than Where-Object can implement: simply put, you&#8217;d like to use a binary search, since the data is already sorted.</p>

	<h4>Binary Search</h4>

	<p>I&#8217;m not going to get into this much, but to put it simply: a binary search algorithm is pretty much the fastest search you can have, but it depends on the data already being sorted.  It works by starting in the middle, and comparing the middle item to the item you are searching for. If what you&#8217;re searching for comes <span class="caps">BEFORE</span> that item, then it discards the second half of the items, and examines the middle item of the first half; if it comes <span class="caps">AFTER</span>, it discards the first half instead.  In this way, it eliminates half the remaining items each time, and returns the correct item (if it exists) in roughly log<sub>2</sub>(n) steps.</p>
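	<p>Purely to illustrate the idea, here is a rough sketch of that halving loop over a plain string array. The <code>Find-SortedIndex</code> function is hypothetical, and it assumes the array is already sorted in ordinal order (the order <code>[string]::CompareOrdinal</code> uses):</p>

	<div class="posh code posh" style="font-family:monospace;"><br />
# Hypothetical helper: index of $Target in the sorted array $Sorted, or -1 if it is absent<br />
function Find-SortedIndex([string[]]$Sorted, [string]$Target) {<br />
&nbsp; &nbsp;$low = 0<br />
&nbsp; &nbsp;$high = $Sorted.Length - 1<br />
&nbsp; &nbsp;while ($low -le $high) {<br />
&nbsp; &nbsp; &nbsp; $mid = [int][Math]::Floor(($low + $high) / 2)<br />
&nbsp; &nbsp; &nbsp; $ord = [string]::CompareOrdinal($Sorted[$mid], $Target)<br />
&nbsp; &nbsp; &nbsp; if ($ord -eq 0) { return $mid }<br />
&nbsp; &nbsp; &nbsp; # Middle item sorts before the target: drop the first half. Otherwise drop the second half.<br />
&nbsp; &nbsp; &nbsp; if ($ord -lt 0) { $low = $mid + 1 } else { $high = $mid - 1 }<br />
&nbsp; &nbsp;}<br />
&nbsp; &nbsp;return -1<br />
}<br />
<br />
Find-SortedIndex @(&quot;Hal&quot;, &quot;Jeffrey&quot;, &quot;Joel&quot;) &quot;Joel&quot;<br />
&nbsp;</div>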

	<h4>So how do we do that in PowerShell?</h4>

	<p>Well, that&#8217;s easy: Just use the static method: <code>[Array]::BinarySearch</code>&#8230; The only problem is, in order for BinarySearch to work, the items have to be not only sorted, but comparable.  Because we&#8217;re trying to compare PSObjects, you can&#8217;t just compare two items from the $users array and determine the correct order&#8230;</p>

	<p>If you wanted to use <code>[Array]::Sort</code> you could use an overload that lets you pass in a second array (of comparable items) to serve as the keys for the non-comparable items, but this won&#8217;t work for BinarySearch, which has no such overload:</p>
<div class="posh code posh" style="font-family:monospace;"><br />
<span style="color: #003366; font-weight: bold;"><span style="color: #333;">&#91;</span><span style="color: #003366; font-weight: bold;">Array</span><span style="color: #333;">&#93;</span></span>::<span style="color: #660033;">Sort</span><span style="color: #333;">&#40;</span> $<span style="color: #333;">&#40;</span>@<span style="color: #333;">&#40;</span><span style="color: #660033; font-weight: bold;">$users</span><span style="color: #66cc66;">|%</span><span style="color: #333;">&#123;</span><span style="color: #660033; font-weight: bold;">$_</span>.<span style="color: #003366;">Firstname</span><span style="color: #66cc66;">+</span><span style="color: #009900;">&quot; &quot;</span><span style="color: #66cc66;">+</span><span style="color: #660033; font-weight: bold;">$_</span>.<span style="color: #003366;">Lastname</span><span style="color: #333;">&#125;</span><span style="color: #333;">&#41;</span><span style="color: #333;">&#41;</span>,<span style="color: #660033; font-weight: bold;">$users</span> <span style="color: #333;">&#41;</span></div>

	<p>So what you need is an &#8220;IComparer&#8221;: a class which implements a &#8220;Compare&#8221; method that can be used to determine the order of the objects.  Sadly, this requires a custom type, something you can&#8217;t do in pure PowerShell script, believe it or not.  PowerShell does not have the ability to create custom classes (which is how you can tell it&#8217;s not a real programming language), so when you need a custom type in PowerShell you have to drop to another .Net language like C# or IronPython. </p>

	<p>In PowerShell 2.0 there is a cmdlet &#8220;Add-Type&#8221; which can take a string representing C# source-code, and compile it in memory for immediate use, but in PowerShell 1.0 you need to have a script version of it, which I called New-Type, to avoid confusion because it only handles one of Add-Type&#8217;s four usage models:</p>

	<p><script type="text/javascript" src="http://PoshCode.org/embed/720"></script></p>

	<p>Once you have Add-Type (or New-Type, as used here), you can create a ScriptBlock-based implementation of IComparer, by passing this as a string:</p>

	<div class="posh code posh" style="font-family:monospace;"><span style="color: #0066cc; font-style: italic;">New-<span style="font-style: normal;">Type</span></span> @<span style="color: #009900;">&quot;</span></div><br />
<div class="csharp code csharp" style="font-family:monospace;"><br />
<span style="color: #0600FF;">using</span> <span style="color: #008080;">System</span><span style="color: #008000;">;</span><br />
<span style="color: #0600FF;">using</span> <span style="color: #008080;">System.Collections</span><span style="color: #008000;">;</span><br />
<span style="color: #0600FF;">using</span> <span style="color: #008080;">System.Management.Automation</span><span style="color: #008000;">;</span><br />
&nbsp; &nbsp;<span style="color: #0600FF;">public</span> <span style="color: #0600FF;">sealed</span> <span style="color: #FF0000;">class</span> ScriptBlockComparer <span style="color: #008000;">:</span> IComparer<br />
&nbsp; &nbsp;<span style="color: #000000;">&#123;</span><br />
&nbsp; &nbsp; &nbsp; ScriptBlock _Comparer<span style="color: #008000;">;</span><br />
&nbsp; &nbsp; &nbsp; <span style="color: #0600FF;">public</span> ScriptBlockComparer<span style="color: #000000;">&#40;</span>ScriptBlock Comparer<span style="color: #000000;">&#41;</span> <span style="color: #000000;">&#123;</span> ComparerScript <span style="color: #008000;">=</span> Comparer<span style="color: #008000;">;</span> <span style="color: #000000;">&#125;</span><br />
&nbsp; &nbsp; &nbsp; <span style="color: #0600FF;">public</span> ScriptBlock ComparerScript<br />
&nbsp; &nbsp; &nbsp; <span style="color: #000000;">&#123;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;get <span style="color: #000000;">&#123;</span> <span style="color: #0600FF;">return</span> _Comparer<span style="color: #008000;">;</span> <span style="color: #000000;">&#125;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;set <span style="color: #000000;">&#123;</span> _Comparer <span style="color: #008000;">=</span> value<span style="color: #008000;">;</span> <span style="color: #000000;">&#125;</span><br />
&nbsp; &nbsp; &nbsp; <span style="color: #000000;">&#125;</span><br />
&nbsp; &nbsp; &nbsp; <span style="color: #0600FF;">public</span> <span style="color: #FF0000;">int</span> Compare<span style="color: #000000;">&#40;</span><span style="color: #FF0000;">object</span> x, <span style="color: #FF0000;">object</span> y<span style="color: #000000;">&#41;</span><br />
&nbsp; &nbsp; &nbsp; <span style="color: #000000;">&#123;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span style="color: #0600FF;">try</span> <span style="color: #000000;">&#123;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #0600FF;">return</span> <span style="color: #000000;">&#40;</span><span style="color: #FF0000;">int</span><span style="color: #000000;">&#41;</span>ComparerScript.<span style="color: #0000FF;">Invoke</span><span style="color: #000000;">&#40;</span>x, y<span style="color: #000000;">&#41;</span><span style="color: #000000;">&#91;</span><span style="color: #FF0000;">0</span><span style="color: #000000;">&#93;</span>.<span style="color: #0000FF;">BaseObject</span><span style="color: #008000;">;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span style="color: #000000;">&#125;</span> <span style="color: #0600FF;">catch</span><span style="color: #000000;">&#40;</span>Exception ex<span style="color: #000000;">&#41;</span> <span style="color: #000000;">&#123;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp; &nbsp; <span style="color: #0600FF;">throw</span> <a href="http://www.google.com/search?q=new+msdn.microsoft.com"><span style="color: #008000;">new</span></a> InvalidOperationException<span style="color: #000000;">&#40;</span><span style="color: #666666;">&quot;Comparer Script failed to return an integer!&quot;</span>, ex<span style="color: #000000;">&#41;</span><span style="color: #008000;">;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; &nbsp;<span style="color: #000000;">&#125;</span><br />
&nbsp; &nbsp; &nbsp; <span style="color: #000000;">&#125;</span><br />
&nbsp; &nbsp;<span style="color: #000000;">&#125;</span></div><br />
<div class="posh code posh" style="font-family:monospace;"><span style="color: #009900;">&quot;@</span></div>

	<p>So now you have everything you need, and you can write your custom comparer, and do binary search. The one thing you need to be aware of is that when you&#8217;re doing binary search, you&#8217;re going to want to pass in the thing you&#8217;re searching for (i.e. the first name), not an Object with a FirstName property. So your comparer needs to handle the case where the second parameter is a string:</p>

	<div class="posh code posh" style="font-family:monospace;"><br />
<span style="color: #660033; font-weight: bold;">$cmp</span> <span style="color: #66cc66;">=</span> <span style="color: #0066cc; font-style: italic;">New-<span style="font-style: normal;">Object</span></span> ScriptBlockComparer <span style="color: #333;">&#123;</span><br />
<span style="color: #666699; font-weight: bold;">Param</span><span style="color: #333;">&#40;</span><span style="color: #660033; font-weight: bold;">$one</span>, <span style="color: #660033; font-weight: bold;">$two</span><span style="color: #333;">&#41;</span><br />
<span style="color: #666699; font-weight: bold;">if</span><span style="color: #333;">&#40;</span><span style="color: #660033; font-weight: bold;">$two</span> <span style="color: #000066;">-isnot</span> <span style="color: #003366; font-weight: bold;"><span style="color: #333;">&#91;</span><span style="color: #003366; font-weight: bold;">string</span><span style="color: #333;">&#93;</span></span><span style="color: #333;">&#41;</span> <span style="color: #333;">&#123;</span> <span style="color: #666699; font-weight: bold;">throw</span> <span style="color: #009900;">&quot;This comparer expects to compare objects to strings&quot;</span> <span style="color: #333;">&#125;</span><br />
&nbsp; &nbsp;<span style="color: #660033; font-weight: bold;">$first</span>,<span style="color: #660033; font-weight: bold;">$last</span><span style="color: #66cc66;">=</span> <span style="color: #660033; font-weight: bold;">$two</span>.<span style="color: #333399; font-weight: bold; font-style: italic;">Split</span><span style="color: #333;">&#40;</span><span style="color: #009900;">&quot; &quot;</span><span style="color: #333;">&#41;</span><br />
&nbsp; &nbsp;<span style="color: #660033; font-weight: bold;">$ord</span> <span style="color: #66cc66;">=</span> <span style="color: #660033; font-weight: bold;">$one</span>.<span style="color: #003366;">Firstname</span>.<span style="color: #003366;">CompareTo</span><span style="color: #333;">&#40;</span><span style="color: #660033; font-weight: bold;">$first</span><span style="color: #333;">&#41;</span> <br />
&nbsp; &nbsp;<span style="color: #666699; font-weight: bold;">if</span><span style="color: #333;">&#40;</span><span style="color: #660033; font-weight: bold;">$ord</span> <span style="color: #000066;">-eq</span> <span style="color: #cc66cc;">0</span> <span style="color: #000066;">-and</span> <span style="color: #660033; font-weight: bold;">$last</span><span style="color: #333;">&#41;</span> <span style="color: #333;">&#123;</span> <span style="color: #660033; font-weight: bold;">$ord</span> <span style="color: #66cc66;">=</span> <span style="color: #660033; font-weight: bold;">$one</span>.<span style="color: #003366;">LastName</span>.<span style="color: #003366;">CompareTo</span><span style="color: #333;">&#40;</span><span style="color: #660033; font-weight: bold;">$last</span><span style="color: #333;">&#41;</span> <span style="color: #333;">&#125;</span><br />
&nbsp; &nbsp;<span style="color: #666699; font-weight: bold;">return</span> <span style="color: #660033; font-weight: bold;">$ord</span><br />
<span style="color: #333;">&#125;</span><br />
<br />
<span style="color: #666666; font-style: italic;"># Now you can use this to search</span><br />
<span style="color: #660033; font-weight: bold;">$users</span><span style="color: #333;">&#91;</span><span style="color: #003366; font-weight: bold;"><span style="color: #333;">&#91;</span><span style="color: #003366; font-weight: bold;">Array</span><span style="color: #333;">&#93;</span></span>::<span style="color: #003366;">BinarySearch</span><span style="color: #333;">&#40;</span> <span style="color: #660033; font-weight: bold;">$users</span>, <span style="color: #009900;">&quot;Joel&quot;</span>, <span style="color: #660033; font-weight: bold;">$cmp</span> <span style="color: #333;">&#41;</span><span style="color: #333;">&#93;</span><br />
<span style="color: #666666; font-style: italic;"># Or this</span><br />
<span style="color: #660033; font-weight: bold;">$users</span><span style="color: #333;">&#91;</span><span style="color: #003366; font-weight: bold;"><span style="color: #333;">&#91;</span><span style="color: #003366; font-weight: bold;">Array</span><span style="color: #333;">&#93;</span></span>::<span style="color: #003366;">BinarySearch</span><span style="color: #333;">&#40;</span> <span style="color: #660033; font-weight: bold;">$users</span>, <span style="color: #009900;">&quot;Joel Bennett&quot;</span>, <span style="color: #660033; font-weight: bold;">$cmp</span> <span style="color: #333;">&#41;</span><span style="color: #333;">&#93;</span><br />
&nbsp;</div>

	<p>Now, you may not think this is a big deal, so let me just share with you the results of searching a collection of about 49000 users:</p>

<table>
<tr><th>Duration (sec)</th><th>Command</th></tr>
<tr><td>6.74000</td><td>$users = $users | Sort FirstName, LastName</td></tr>
<tr><td>6.68200</td><td>$users | Where { $_.FirstName -eq &quot;Joel&quot; }</td></tr>
<tr><td>0.02900</td><td>$users[[Array]::BinarySearch( $users, &quot;Joel&quot;, $cmp )]</td></tr>
</table>

	<p>What you see here is two things:</p>
	<ol>
		<li>Searching for a single user using <code>Where-Object</code> takes pretty much as long as (re)sorting the whole array, because every single item must be checked.</li>
		<li>Using BinarySearch rocks  <img src='http://huddledmasses.org/wordpress/wp-includes/' alt=':D' class='wp-smiley' /> </li>
	</ol>
	<p>What you might not notice is that <code>[Array]::BinarySearch</code> returns an index, not the item &#8230; so it&#8217;s particularly useful if you need to find the first of several matching items, because you can just step forward or backward from the result. </p>
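	<p>Working with the returned index takes a little care: <code>[Array]::BinarySearch</code> returns a negative number when the item is not found, and when there are duplicates it may land on any one of them. Here is a minimal sketch over a hypothetical, already sorted <code>$names</code> array that handles both cases:</p>

	<div class="posh code posh" style="font-family:monospace;"><br />
[string[]]$names = &quot;Al&quot;, &quot;Bob&quot;, &quot;Bob&quot;, &quot;Bob&quot;, &quot;Cy&quot; &nbsp; # hypothetical sorted data<br />
<br />
$i = [Array]::BinarySearch($names, &quot;Bob&quot;)<br />
if ($i -lt 0) {<br />
&nbsp; &nbsp;# Not found: the bitwise complement is the index where the item would be inserted<br />
&nbsp; &nbsp;&quot;Not found; would insert at index $(-bnot $i)&quot;<br />
} else {<br />
&nbsp; &nbsp;# With duplicates, any matching index may come back, so walk back to the first one<br />
&nbsp; &nbsp;while ($i -gt 0 -and $names[$i - 1] -eq &quot;Bob&quot;) { $i-- }<br />
&nbsp; &nbsp;&quot;First match at index $i&quot;<br />
}<br />
&nbsp;</div>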

	<p><strong>BinarySearch is not always faster</strong>: If your array is initially unsorted, the requirement to pre-sort the array costs basically the same as any speedup you might get for a single search. Also, because of the call to the script, if the array is <strong>very</strong> small, like the three users in the sample above, it&#8217;s actually faster to just loop through them (20ms versus 24ms on my PC). The BinarySearch starts paying off pretty quickly as you scale up the number of searches and records, but it <strong>does take <em>both</em></strong> more records (the break-even point for just the search is about 10, in this contrived example), and more searches (if you have to eat the cost of the initial sort).</p>
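	<p>If you want to find the break-even point on your own data, <code>Measure-Command</code> is one way to time both approaches. This is only a sketch: the <code>$names</code> test data is generated, and because it searches plain strings with the default comparer it does not include the cost of calling a script-based comparer, so treat the numbers as a rough lower bound:</p>

	<div class="posh code posh" style="font-family:monospace;"><br />
# Hypothetical test data: 10,000 zero-padded names, generated in sorted order<br />
[string[]]$names = 0..9999 | ForEach-Object { &quot;user{0:d5}&quot; -f $_ }<br />
$target = &quot;user09999&quot;<br />
<br />
$linear = Measure-Command { $names | Where-Object { $_ -eq $target } }<br />
$binary = Measure-Command { [Array]::BinarySearch($names, $target) }<br />
<br />
&quot;Where-Object : {0:n1} ms&quot; -f $linear.TotalMilliseconds<br />
&quot;BinarySearch : {0:n1} ms&quot; -f $binary.TotalMilliseconds<br />
&nbsp;</div>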

	<p>In any case, New-Type certainly has many other uses, and the <strong>ScriptBlockComparer</strong> can be used for other things as well &#8230; although I will say that as far as I can tell, sorting is almost always better performed with the sort cmdlet than with [Array]::Sort and an IComparer.  <img src='http://huddledmasses.org/wordpress/wp-includes/' alt=';)' class='wp-smiley' /> </p>]]></content:encoded>
			<wfw:commentRss>http://huddledmasses.org/custom-icomparers-in-powershell-and-add-type-for-v1/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

