In response to Kirk Munro’s comment on my Writing Cmdlets for the PowerShell Pipeline post:

You know, I’ve looked at your articles about cmdlets/functions in the pipeline and I feel you’re missing something. The purpose of the InputObject parameter is to pass in a collection as a single object. This is as opposed to using the pipeline where a collection is passed along the pipeline one item at a time. There are cases where you want to pass in a collection as a collection.

Quite simply, I disagree. The documentation for these parameters says quite clearly that inputObject “Specifies an object or objects to input to the cmdlet.” This clearly means that I should be able to pass multiple objects, and have them treated as multiple objects, not as a single array object.

If you look at your example (Select -First 3 -Unique -InputObject $a), this does in fact work. It receives one object, an array. It then selects the first 3 objects, but there is only 1 so that is moot. And lastly it selects unique objects, but again there is only 1 so that is moot as well and finally the object is output using the default formatter. In this case the default formatter is showing the contents of the array.

In this example, Select-Object has no reason to take a single object as an input object, at all. The only time that it would be useful for Select-Object to take a single inputObject would be in combination with the property parameters. In fact, if you want to Select-Object from an array to get the first of last n objects, or to get a set of unique objects, you have to pass the objects in via the pipeline — there’s no other way to make it select from an array. If that was indeed the intent, it should have been written as a separate ParameterSet, and the documentation should be changed to reflect that only a single object can be passed in, and that you can’t use the inputObject parameter with the first, last, or unique parameters at all. That’s worse than useless, it’s misleading and confusing.

Kirk is absolutely right that if you assume that the InputObject argument is only allowed to take a single object, then the behavior is correct – but it’s not logical. In fact, the behavior you see in the output of this command is so useless as to be a bug – even if the documentation did not say the parameter accepts multiple objects as input:


Select-Object -input 4,5,6,4,8 -first 2 -unique
4
5
6
4
8
 

I know that that’s the way the built-in cmdlets work.

But quite frankly, just because someone important wrote something useless is no reason to emulate the behavior. The inputObject parameter IS the same parameter which pipeline objects go into. There’s no logical explanation for us to get different results when we pass an array in via the parameter by name instead of via pipeline: the PowerShell pipeline passing the things in the pipeline into the –inputObject parameter … it’s not using some mystical variable like it does in script functions.

Of course, we all know the powershell pipeline unwraps arrays — that’s convenient, and we can work around it when we really want to pass an array in:


PS> @([int[]]@(1,2,3,4)) | % { "Hi: $_" } # typecast and wrapped isn’t enough...
Hi: 1
Hi: 2
Hi: 3
Hi: 4

PS> @(,[int[]]@(1,2,3,4)) | % { "Hi: $_" } # put it as a member of another array
Hi: 1 2 3 4

PS> @(,@(,[int[]]@(1,2,3,4))) | % { "Hi: $_" }  # go too deep and “stuff” happens.
Hi: System.Int32[]
 

My point in all of this is that InputObject is actually a very useful parameter, because there are cases where you really want to pass a collection as a collection into a cmdlet and then do something with it. By making InputObject instead split the collection passed in and pipeline it through, you’re forcing users to wrap collections in an array just to get them passed in as a collection, and personally I don’t feel they should have to do that.

While it’s true that passing in an array is sometimes desirable that’s not the reason the parameter exists, and I don’t believe it should be the default behavior here. It should be just as easy for me to use the cmdlet with the inputObject parameter directly as it is to input them via the pipeline. If I put in unwrapping for the inputObject parameter, you can work around it in the same way I did in the examples above. Incidentally, I think *PowerShell* should unwrap arrays to ValueFromPipeline parameters regardless of whether they’re on the pipeline, but I recognize it’s probably too late for that.

Basically, this is my argument: If inputObject unwraps arrays, the syntax for passing an array by wrapping it in @(,$array) is simple, for those rare occasions when that’s actually what you want. But if it does not unwrap arrays, you’re forced to call it via a separate pipeline, because unwrapping the array and passing it in one at a time in a foreach loop will almost certainly not do the same thing, and this is much uglier — and not compatible with use within the pipeline, particularly if you need to pass the pipeline output into a different parameter.

I guess my final word would be to agree with Kirk that “InputObject … isn’t documented clearly enough” … in fact, it’s clearly behaving incorrectly according to the documentation, and that’s why I originally proposed to unwrap the inputObject parameter when it’s passed as a parameter: to make it work the way the documentation suggests it would, which seems to me to be a better way than the way it actually works.

I’ll write up more information later, but a couple people have asked for this in #PowerShell on irc.freenode.net, and I had it already written, so here you go … my ConvertFrom-Html cmdlet (in a Huddled.HtmlSnapin). It converts HTML to valid xml using the SGML Parser which was available on GotDotNet years ago. It only works with files (doesn’t do URL downloads yet). Use it like this:


$url = "http://huddledmasses.org/"
$file = Join-Path $pwd "HuddledMasses.html"

$client = new-object System.Net.WebClient
$client.DownloadFile( $url, $file ) #NOTE: You need to use a full path here, not relative

$xml = ConvertFrom-Html $file

# Or even
(ConvertFrom-Html $file).Save($file)
 

The source code to my plugin may be considered public domain, and is included in the Huddled HTML SnapIn Zip.

However, the SgmlReader library is a Microsoft Sample which is licensed under the old MS Samples license which doesn’t allow reuse with viral open source software. I’ve seen some work being done on an HtmlAgilityPack on CodePlex (using a Creative Commons ASA license) but I have not really looked at it except to see that it has a several active issues related to entity encoding and dropping malformed tags which I haven’t encountered in SgmlReader …

Well, the first alpha CTP release of PowerShell 2.0 is out, and there’s a lot of new stuff in it … but I won’t repeat the list from the PowerShell blog, because I’m sure you’ve seen it five or six times already. Instead, lets just skip straight to talking about one of the features we’ve been hearing about the longest: in PowerShell 2, you can create Cmdlets in script … bringing nearly full parity between whats possible in a C# cmdlet and what’s possible in script.

There are a few caveats still (Parameter Sets aren’t working yet, and neither is help, really), and a few surprises … there’s a few downsides to PowerShell script vs C# ... but in this particular context one thing that stands out is that in C# the BeginProcessing, ProcessRecord, and EndProcessing blocks are actually methods which can call each other, and as demonstrated in my tutorial for writing cmdlets that work in the pipeline, they can be recursive — without getting duplicate variables.

A sample Script Cmdlet

In the interests of being the first to publish an interesting script cmdlet ;) and to continue my recent trend of talking about writing for the PowerShell pipeline, I’ve merged the logic of my script function and my pipeline cmdlet into a single sample script cmdlet for PowerShell 2.0 and it works great!

A few observations from the process, in no particular order:

  • If you recursively call your cmdlet from within itself, you have to test for parameters using the new $CommandLineParameters.ContainsKey because parameter variables keep their values through recursion if you don’t explicitly pass a value.
  • $CommandLineParameters.ContainsKey works differently in the Begin block where it will return $false for arguments which will get their values from the pipeline, than in the Process block where it will treat values which were passed as CommandLineParameters the same as those which were passed via the pipeline.
  • If you want to see how your function behaves in a pipeline, you should make sure to test it at different points in the pipeline: at the front, in the middle, and at the end.
  • Cmdlets are functions: they show up in the Function provider.
  • Cmdlets are functions: they have to be dot-sourced before you can call them.
  • Cmdlets are not functions: they are a single command Cmdlet which takes a name (which must have a – in it) and a couple of other parameters followed by a function script block.
  • When you recurse by executing &($MyInvocation.InvocationName), that second invocation has an InvocationName of “&” ... so you can’t go any further (this might be a good thing, if you want to stop recursion at one level no matter what. If you want to go further, you need to put your commands into a string, and use Invoke-Expression.

#requires -version 2.0
###################################################################################################
## A Template for Script Cmdlets which can _also_ be executed in the pipeline ....
##   by Joel Bennett, in hopes it will help...
## Version History
## v1.0 - First public release (after over 9 different versions in my various other functions)
## v1.2 - Show the use of Write-Output, and change "return" in the BEGIN to "Write-Output" to avoid
##      the pooling of the output from the process block when it's invoked as a function.
## v1.3 - Switched back to "break" instead of "return" so that if you pass via the pipeline AND via
##      the inputObject, only the inputObject gets process (this is how cmdlets behave).
##      - Cleaned up the comments, and removed the confusing alternate method and $args handling
##
## v2.0 - First Version as a Script Cmdlet.
##      This is much easier with support for [ValueFromPipeline] and [ValueFromPipelineByName]
##
###################################################################################################
Cmdlet Test-PipelineV2 -ConfirmImpact low -snapin Huddled.Tests
{
   Param (
      [Position(0)] [ConsoleColor] $Color,
      [Position(1)] [Mandatory] [ValueFromPipeline] [String[]] $InputObject
   )
   BEGIN {
      if ($CommandLineParameters.ContainsKey("InputObject")) {
         # Don't do anything here, because we're about to get re-invoked...
         $FromArgs = $true
      } else {
         # Normal "run-once" BEGIN processing
         $FromArgs = $false
         Write-Verbose "Begin $Color"
      }
   }
   PROCESS {
      # We no longer have to test for $_ or even to see if the [ValueFromPipeline] param is set
      # It *HAS* to be set, because it's a [Mandatory] parameter :)
      if ($FromArgs) {
         # Don't do anything here except re-invoke ourselves.
         Write-Output $InputObject | &($MyInvocation.InvocationName) $Color
      } else {
         # Normal Pipeline-friendly per-item processing
         Write-Host "Process: $InputObject" -Fore $Color
         ## You should make a practice of explicitly calling Write-Output on things
         ## That's how you emit them into the pipeline instead of just printing them
         Write-Output $InputObject
      }
   }
   END {
      if ($FromArgs) { 
         # Don't do anything here ... it just confuses things
      } else {
         # Normal "run-once" END processing
         Write-Verbose "End $Color"
      }
   }
}

 

A test case


## Test Script:
##
## "a","b","c" | Test-PipelineV2 "Cyan" -verbose
## @("a","b","c") | Test-PipelineV2 "Cyan" -verbose
## Test-PipelineV2 "Cyan" @("a","b","c") -verbose
##
## "a","b","c" | Test-PipelineV2 "Cyan" -verbose | Test-PipelineV2 "Green" -verbose
## @("a","b","c") | Test-PipelineV2 "Cyan" -verbose | Test-PipelineV2 "Green" -verbose
## Test-PipelineV2 "Cyan" @("a","b","c") -verbose  | Test-PipelineV2 "Green" -verbose
###################################################################################################
## Expected Output (sorry, no color here...)

VERBOSE: Begin Cyan
Process: a
a
Process: b
b
Process: c
c
VERBOSE: End Cyan
VERBOSE: Begin Cyan
Process: a
a
Process: b
b
Process: c
c
VERBOSE: End Cyan
VERBOSE: Begin Cyan
Process: a
a
Process: b
b
Process: c
c
VERBOSE: End Cyan
VERBOSE: Begin Cyan
VERBOSE: Begin Green
Process: a
Process: a
a
Process: b
Process: b
b
Process: c
Process: c
c
VERBOSE: End Cyan
VERBOSE: End Green
VERBOSE: Begin Cyan
VERBOSE: Begin Green
Process: a
Process: a
a
Process: b
Process: b
b
Process: c
Process: c
c
VERBOSE: End Cyan
VERBOSE: End Green
VERBOSE: Begin Green
VERBOSE: Begin Cyan
Process: a
Process: a
a
Process: b
Process: b
b
Process: c
Process: c
c
VERBOSE: End Cyan
VERBOSE: End Green
 

In a continuation of what is, sadly, becoming a series on how the PowerShell Pipeline works … Karl Prosser brought to my attention that certain powershell commands which have an -InputObject parameter don’t actually work when you pass something into it … so I thought I should create a cmdlet to show you how to correctly handle the InputObject parameter with the ValueFromPipeline set so you can pass the input in either way.

To demonstrate the problem, try this:


$a = @("A","B","A","C")
$a | Select -First 3 -Unique
Select -First 3 -Unique -InputObject $a
 

This should expose two weirdnesses about how the Select-Object cmdlet works:

  1. The -First parameter affects the input before the -Unique parameter does.
  2. When you pass the input in via -InputObject, the whole array is treated as a single object, and the command basically doesn’t do anything.

The big problem with this behavior is that there’s essentially no hint that you’ve done something wrong — there’s actually no way to make Select-Object work properly except by passing the objects in via the pipeline. The bigger problem is that it would have been simple for the Microsoft team to catch this and alert you, but they didn’t — so you probably won’t even notice there’s a problem until you run it on a trivial data set like my example. The even bigger problem is that it doesn’t just affect Select-Object (try it with Where-Object, just for instance). (more…)