Huddled Masses
You can do more than breathe for free...
Browse: Home / PowerShell Convert Delimiters, Import Tab Delimited Text…

PowerShell Convert Delimiters, Import Tab Delimited Text…

By Joel 'Jaykul' Bennett on 07-Oct-2007

Well, just for a change, lets cut straight to the script ( [new] 5pm EST Sunday):


Function Convert-Delimited([regex]$from,[string]$to)
{
  process
  {
    $_ = $_ -replace "(?:`"((?:(?:[^`"]|`"`"))+)(?:`"$from|`"`$))|(?:((?:.(?!$from))*.)(?:$from|`$))","Þ`$1`$2Þ$to"
    $_ = $_ -replace "Þ(?:$to|Þ)?`$","Þ"
    $_ = $_ -replace "`"`"","`"" -replace "`"","`"`""
    $_ = $_ -replace "Þ((?:[^Þ`"](?!$to))+)Þ($to|`$)","`$1`$2"
    $_ = $_ -replace "Þ","`"" -replace "Þ","`""
    $_
  }
}

Basically, this is a set of regular expressions that can be run in the pipeline to convert delimited text from one delimiter to another. If the new delimiter needs quotes added to make it work correctly, the converter will do so (actually, it adds a marker character Þ around every field, and then removes them where they are not needed, and replaces them with quotes where they are — long story, and I welcome any improvements).

Just as an example, assume a column-based text file like this Hosts.txt file (notice that the columns are separated by two or more spaces, since there are spaces within the text columns):


Host          Standout Features                   Languages                            Space/Bandwidth   Monthly
BlueHost      PostgreSQL + MySQL, Free Cert       PHP5,Perl,Python,Ruby                   300GB/3TB      .95/mo for 2yrs
DreamHost     Jabber,SVN,WebDAV                   PHP4,PHP5,Perl,Python,Ruby              500GB/5TB      .95/mo for 2yrs
NetFirms Adv  2 domains, Shared Cert              PHP4,PHP5,Perl,Ruby,Python              250GB/2TB      .95/yr with "995" code
NetFirms Bus  Windows + Linux, MsSQL, 5 domains   ASP,ASP.NET,PHP4,PHP5,Perl,Ruby,Python  300GB/3TB      .95/mo for 3mo

You can convert it to CSV by using this function, and write it back out to file like this: Get-Content Hosts.txt | Convert-Delimiter " + " "," | Set-Content Hosts.csv. Now, since my original problem was all about importing the files I also went ahead and wrote a slightly more magical function called Import-Delimited:


Function Import-Delimited([regex]$delimiter=",", [string]$PsPath="")
{
  begin{
    $script:tmp = [IO.Path]::GetTempFileName()
    write-debug "Using tempfile $($script:tmp)"
   
    Function Import-String([string]$inputString){
      if($inputString.Length -gt 0 ) {
      write-debug "Importing $inputString"
      if(($inputString -as [IO.FileInfo]).Exists) {
        Get-Content $inputString | Convert-Delimiters $delimiter "," | Add-Content $script:tmp
      } elseif( ((Join-Path $pwd $inputString) -as [IO.FileInfo]).Exists) {
        Get-Content (Join-Path $pwd $inputString) | Convert-Delimiters $delimiter "," | Add-Content $script:tmp
      } else {
        $inputString | Convert-Delimiters  $delimiter "," | Add-Content $script:tmp
      }
      } else {
        write-debug "Nothing to Import"
      }
    }
   
    Import-String $PsPath
  }
 
  process{
    Import-String $_
  }
  end
  {
    Import-Csv $script:tmp
    Remove-Item $script:tmp
  }
}

This will allow you to directly import most delimited text files. But not only that, it can import text from the pipeline! So you can actually create objects out of any legacy application that outputs delimited text! Just for example, with our previous text file, we could do any of the following one-liners:


$hosts = Import-Delimited " + " Hosts.txt

$hosts = Get-Content Hosts.txt | Import-Delimited " + "

$hosts = Get-Content Hosts.txt | Convert-Delimiters $delimiter "," | Import-Delimited # CSV is the default

Any questions? Here’s the pair of functions as a single script download. :)

Question 1: What’s with the “ + “?

It’s a regular expression meaning one or more spaces followed by a space (ie: two or more spaces).

Question 2: Regular expressions? How do I just use this to import a tab-delimited text file?

The regular expressions for separators should be pretty simple, really. For instance, a tab-delimited text file would just be “`t” and a comma-delimited file is just a comma: “,” ... if you wanted to make sure to get rid of any spaces around the commas, you could use “ +, +” but we should mostly be taking care of that already.

Question 3: I get weird errors about the value of argument “name” being invalid?

Import-Delimited actually uses Import-Csv in the background, and requires a header row which has a name for each column: You can’t have a two separators in a row, or a trailing separator (although the scripts try to clean that up). Incidentally, this is a really good point: you must have the first row be the delimited headers, otherwise your first object will be treated as the header row.

Similar Posts:

  • The problem with calling legacy/native apps from PowerShell
  • What Scope Am I In?
  • A DSL for XML in PowerShell: New-XDocument
  • Creating WPF UIs for PowerShell with PowerBoots and Visual Studio WPF Designer
  • Better error messages for PowerShell ValidatePattern

Posted in Huddled | Tagged Import-Csv, PowerShell, RegEx, Scripting

« Previous Next »

Lijit Search

Tags

.Net .Net 2008 Scripting Games Automation Bugs Design Development Funny Gadgets GeoShell GUI Huddled Masses Internet licensing Microsoft Modules My Software News Personal PInvoke Pipeline Politics PoshCode PoshConsole PowerBoots PowerShell PowerShell Functions PowerTips Rants Recommender Repository Scripting ShowUI Software Solutions Textile Tips User Group UserInterface WalkThrough WebHosting Windows 7 WordPress WPF Xml

About Huddled Masses

This is web site is dedicated to the musings of Joel Bennett (aka Jaykul) about technology, software, software development, the web, and the world.

Any resemblance of the views expressed and the views of my employer, my terminal, or the view out my window are purely coincidental. The resemblance between them and my own views is non-deterministic. The question of the existence of views in the absence of anyone to hold them is left as an exercise for the reader.

P.S.: I occasionally link to things I think are great. When I do, I occasionally find a "referral code" so I can make a little cash. I promise that I don't link to anything just because of that cash (I wouldn't cross the street for the amount of cash those links bring in, never mind write a whole blog post) ... but I do not promise that things I link to will stay great as time passes, nor that you will agree with me about their greatness!

Archives

  • January 2012
  • October 2011
  • August 2011
  • July 2011
  • June 2011
  • March 2011
  • February 2011
  • January 2011
  • November 2010
  • August 2010

Copyright © 2012 Joel Bennett.

Powered by WordPress and Hybrid.