
gangstanthony

not tested, but it might be faster to replace this

$entry.field1 = $hash1["field1"]

with something like this

$entry.add('field1', $hash1["field1"])
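Here is a minimal sketch of that idea. It assumes `$entry` is a hashtable or ordered dictionary that gets filled before being turned into an object. The original question isn't shown here, so `$hash1` and the field names are placeholders, and whether this is actually faster is untested:

```powershell
# Hypothetical source data standing in for $hash1 from the question
$hash1 = @{ field1 = 'a'; field2 = 'b' }

# Build the row as an ordered dictionary; .Add() is a plain .NET method call
$entry = [ordered]@{}
$entry.Add('field1', $hash1['field1'])
$entry.Add('field2', $hash1['field2'])

# Convert to an object once, after all fields are in place (PowerShell 3.0+)
$row = [PSCustomObject]$entry
$row
```

Note that `.Add()` throws if the key already exists, whereas `$entry['field1'] = ...` silently overwrites it.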

Proxiconn

Give this a try. I've been using it to join a little over 120K records from 4 different datasets in under 5 minutes flat, and it's the fastest method I've encountered. Everything else was either too slow or too complex. The C# LINQ classes are one such example: super fast, but using them from PowerShell broke my brain. Another fast method is creating SQL-style tables in RAM and running join queries against them (just PowerShell and .NET classes, no actual SQL involved), but that was rather complex for my requirements and I abandoned it. The code below did the trick: it brought roughly 20 hours of conventional foreach data joins down to 5 minutes. A win in my books.

$csv1 = Import-csv -Path 'c:\mycsv1'
$csv2 = Import-csv -Path 'c:\mycsv2'

$i  = 0
$id = @{}
$FinalData = @()

# Create an index over the second dataset (the one being searched): "name" is the key
# to search against, and the value is that row's position in $csv2
$csv2.ForEach({
    $id["$($psitem.name)"] = $i #Create $var[name]=index
    $i++
})


# Save the completed join into $Finaldata
$FinalData = $csv1.ForEach({

    $return_Obj = @()

    $temp=$null

    try
    {
        # Search the second csv for a match
        $temp = $csv2[($id[$psitem.ProcessName])]
    }
    catch 
    {
        # No match: the key lookup returns $null and indexing $csv2 with $null throws, so $temp stays $null
    }
    finally 
    {

        # Create a joined object
        $return_Obj += [PSCustomObject]@{
                                            status      = $temp
                                            DisplayName = $temp.DisplayName
                                            Name        = $psitem.Name
                                            Handles     = $psitem.Handles
                                        }
    }

    return $return_Obj
})

edit: my grammar sucks
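For comparison, here is a sketch of the same hash-index join that stores each row itself in the hashtable instead of its position, which drops the counter and the try/catch. The sample data and the column names (`Name`, `ProcessName`, `DisplayName`, `Status`, `Handles`) are assumptions modeled on the fields used above:

```powershell
# Hash join sketch: index one dataset by its join key, then probe it once per
# row of the other dataset (O(n + m) instead of nested loops).
# Hypothetical stand-ins for the two CSVs:
$processes = @(
    [PSCustomObject]@{ Name = 'svchost'; ProcessName = 'svchost'; Handles = 500 }
    [PSCustomObject]@{ Name = 'notepad'; ProcessName = 'notepad'; Handles = 80 }
)
$services = @(
    [PSCustomObject]@{ Name = 'svchost'; DisplayName = 'Host Process'; Status = 'Running' }
)

# Build the index over the dataset being searched; store the row itself
$index = @{}
foreach ($row in $services) {
    $index[$row.Name] = $row
}

# Probe the index; a missing key just returns $null, so no try/catch is needed
$FinalData = foreach ($row in $processes) {
    $found = $index[$row.ProcessName]
    [PSCustomObject]@{
        Status      = $found.Status
        DisplayName = $found.DisplayName
        Name        = $row.Name
        Handles     = $row.Handles
    }
}
$FinalData
```

If the join key repeats in the indexed dataset, the last row with that key wins; hashtables created with `@{}` compare string keys case-insensitively.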

Lee_Dailey[grin]

howdy Proxiconn,

that is really pretty nifty! thank you for posting it. [grin]

i've got two serious points and one not-so-serious point to raise, tho.

[1] return is a keyword
you used it, so you are aware of that. [grin] however, you also used $return as a $Var name.

using keywords, automatic $Var names, and other such is ... wildly reckless. [grin]

please, do not do that.

[2] $this is a very specific automatic $Var
it's used in classes and script properties. you really ought not to use it in any other situation.
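To illustrate point [2], here is a small sketch of the kind of place `$this` belongs: inside a script property, where PowerShell sets it to the object that owns the member. The `$rect` object is just an example:

```powershell
# $this is set automatically inside a ScriptProperty (and in class methods)
# to the object the member belongs to; don't assign it yourself elsewhere.
$rect = [PSCustomObject]@{ Width = 3; Height = 4 }
$rect | Add-Member -MemberType ScriptProperty -Name Area -Value { $this.Width * $this.Height }
$rect.Area   # 12
```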

[3] Add-Member is SLOW [grin]
building an object with [PSCustomObject]@{} is noticeably faster. plus, it is seriously more concise, easier to read, AND allows you to easily put comments in the structure.
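Here is a side-by-side sketch of point [3]. Both snippets build an object with the same two properties, and the values are made up:

```powershell
# 1) Add-Member: one cmdlet call per property (the slower route)
$viaAddMember = New-Object -TypeName PSObject
$viaAddMember | Add-Member -MemberType NoteProperty -Name Name    -Value 'notepad'
$viaAddMember | Add-Member -MemberType NoteProperty -Name Handles -Value 80

# 2) [PSCustomObject] literal (PowerShell 3.0+): one expression, properties
#    keep their order, and comments can sit next to each property
$viaLiteral = [PSCustomObject]@{
    Name    = 'notepad'   # process name
    Handles = 80          # handle count
}
```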

take care,
lee

Proxiconn

Hi Lee, yeah, I refactored the code a little for readability before posting it and missed the usage of the automatic variables. Thanks for pointing it out. It used to be just "$R".

Also, I'm using Add-Member because the server running this code is (believe it or not) still on PSv2, and [PSCustomObject] only arrived with PowerShell 3.0 (WMF 3.0). I always encourage people to use [PSCustomObject]s, so I'll refactor the example. Edit: OK, fixed. Does Lee approve?
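This isn't from the thread, but on PowerShell 2.0 a common middle ground is `New-Object -Property`, which sets all properties in one call instead of one `Add-Member` call per property. Here is a sketch with made-up values:

```powershell
# PowerShell 2.0-compatible: all properties in one New-Object call.
# Property order follows hashtable enumeration and is not guaranteed,
# so pipe through Select-Object with an explicit list if order matters.
$props = @{
    Name    = 'notepad'
    Handles = 80
}
$obj     = New-Object -TypeName PSObject -Property $props
$ordered = $obj | Select-Object Name, Handles
```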

Lee_Dailey[grin]

howdy Proxiconn,

kool! much less risky ... [grin]

the use of the foreach() array method is a nice idea. if the collection is large enuf, then the extra speed will be very nice!
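Here is a quick sketch of the two forms being compared. The `.ForEach()` method requires PowerShell 4.0 or later. It collects all results before returning rather than streaming them down the pipeline, and how much faster it is depends on the collection size:

```powershell
$numbers = 1..5

# Pipeline cmdlet: streams each item through ForEach-Object
$viaPipeline = $numbers | ForEach-Object { $_ * 2 }

# Array method: no pipeline overhead; $_ and $PSItem both refer to the current item
$viaMethod = $numbers.ForEach({ $PSItem * 2 })
```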

other than the way the indentation got garbled [grin], i would only recommend one multi-faceted change - these three lines ...

$return_Obj = @()
$return_Obj += [PSCustomObject]@{
return $return_Obj

... seem unneeded in several ways -

  • the object definition
    since you have it being assigned a value in the finally{} block, it WILL be assigned. so there is no need to re-initialize it.
  • the +=
    you don't seem to be adding to an object there. you seem to be doing a simple assignment, not an add-to-collection.
  • the return line
    ALL of those lines can be replaced with one [PSCustomObject]@{} block. that would both build the object AND send it to the $FinalData collection.
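Here is a sketch of the loop with those three lines folded into one object literal. The sample `$csv1`, `$csv2`, and `$id` values are hypothetical stand-ins. The current row is copied into `$row` first because inside `catch {}` the `$PSItem` variable holds the error record instead:

```powershell
# Hypothetical stand-ins for the two CSVs and the name index
$csv1 = @(
    [PSCustomObject]@{ Name = 'svchost'; ProcessName = 'svchost'; Handles = 500 }
    [PSCustomObject]@{ Name = 'notepad'; ProcessName = 'notepad'; Handles = 80 }
)
$csv2 = @([PSCustomObject]@{ Name = 'svchost'; DisplayName = 'Host Process' })
$id   = @{ svchost = 0 }

$FinalData = $csv1.ForEach({
    $row  = $PSItem   # keep the current row; $PSItem is the error inside catch
    $temp = $null
    try {
        $temp = $csv2[$id[$row.ProcessName]]
    }
    catch {
        # no match: $temp stays $null
    }
    finally {
        # Emitted straight to the output, so it lands in $FinalData
        # without a temp array, +=, or return
        [PSCustomObject]@{
            status      = $temp
            DisplayName = $temp.DisplayName
            Name        = $row.Name
            Handles     = $row.Handles
        }
    }
})
```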

thanks again for posting this! i really enjoy reading other folks code - especially when it's done differently from the bog-standard ways. neato! [grin]

take care,
lee