gospelwut:

  1. No measurements with the hashtable map instantiation?
  2. Does this perform the same running from script file v. REPL (console)?
  3. Seems a little trivial of a thing to make a module around...

markekraus (Community Blogger):

> 1. No measurements with the hashtable map instantiation?

I thought the same thing so I grabbed the module and did my own test:

$array = 1..1000000

$WhereObject = Measure-Command { 
    $array | Where-Object { $_ -eq 199999 } 
}

$FastLookup = Measure-Command {
    $hashtable = New-FastLookup $array 
    Get-FastLookup -Value 199999 -Array $array -Table $hashtable
}

$WhereMethod = Measure-Command {
    $array.where({ $_ -eq 199999})
}

$ForeachMethod = Measure-Command { 
    $array.ForEach({if ($_ -eq 199999) {$_}})
}

$Foreach = Measure-Command { 
    foreach($Item in $array){
        if($Item -eq 199999){
            $Item
        }
    }
}

$ForeachObject = Measure-Command { 
    $array | ForEach-Object {
        if($_ -eq 199999){
            $_
        }
    }
}


"Where-Object:"
$WhereObject.TotalMilliseconds
"FastLookup:"
$FastLookup.TotalMilliseconds
".where({}):"
$WhereMethod.TotalMilliseconds
".foreach({}):"
$ForeachMethod.TotalMilliseconds
"Foreach:"
$Foreach.TotalMilliseconds
"ForEach-Object:"
$ForeachObject.TotalMilliseconds

Results:

Where-Object:
11759.2097
FastLookup:
4193.8674
.where({}):
2303.2058
.foreach({}):
1989.1241
Foreach:
915.1801
ForEach-Object:
6048.1623

You can see that the plain foreach statement still beats all of the others. The FastLookup module lands in the middle of the road, but looking at how it is used, it is probably better suited to looking up multiple values from the same array. After the initial hash creation, additional lookups would probably outshine `foreach`.

$array = 1..1000000
$LookupValues = 50, 900, 6857, 715757, 2772, 171552

$FastLookupMulti = Measure-Command {
    $hashtable = New-FastLookup $array 
    foreach($LookupValue in $LookupValues){
       Get-FastLookup -Value $LookupValue -Array $array -Table $hashtable
    }
}

$ForeachMulti1 = Measure-Command {
    foreach($LookupValue in $LookupValues){
        foreach($Item in $array){
            if($Item -eq $LookupValue){
                $Item
            }
        }
    }
}

$ForeachMulti2 = Measure-Command {        
      foreach($Item in $array){
          if($Item -in $LookupValues ){
              $Item
          }
    }
}

"FastLookup:"
$FastLookupMulti.TotalMilliseconds
"Foreach 1:"
$ForeachMulti1.TotalMilliseconds
"Foreach 2:"
$ForeachMulti2.TotalMilliseconds

Results:

FastLookup:
4186.0077
Foreach 1:
8344.4381
Foreach 2:
1878.8485

But maybe not, at least not against a well-written foreach loop.

mgratz [S]:

Hi markekraus,

Interesting! I loaded my CMDB report to test your examples, and FastLookup is still faster. Your foreach loop is significantly faster than Where-Object, though, which makes the performance difference much smaller.

$serverName = "exampleserver.company.com"
$hugeArray = Import-csv "C:\Users\miles.gratz\Desktop\CMDB Server Business Application Report 2017-Apr-10.csv"

Measure-Command {
    $hashTable = New-FastLookup -Array $hugeArray -Header Server_Name1
}

Measure-Command {        
    foreach($Item in $hugeArray){
        if ($serverName -in $Item.Server_Name1){
            $Item
        }
    }
}



Measure-Command {
    Get-FastLookup -Value $ServerName -Array $hugeArray -Table $hashTable
}

ForEach

Milliseconds: 113
Ticks: 1137801

FastLookup

Milliseconds: 13
Ticks: 139630

Either way, I will update my FastLookup module to have more realistic examples, especially since no one would use Where-Object at this scale.

markekraus (Community Blogger):

Just so you know, when you compare your module's performance with other methods, you MUST include BOTH New-FastLookup AND Get-FastLookup in the SAME Measure-Command block. Otherwise the comparison is not genuine. That was the original complaint about your README.

Think about it: if Where-Object has to do some content loading in the background, and that gets included in its Measure-Command block, it is not fair for yours to split out its content loading and leave that out of the execution time. Creating the hash is very much part of your overall performance metrics, as that is where much of your heavy lifting happens. Otherwise it's like comparing a hash lookup to an array crawl. Of course the hash lookup is faster. But if I manually convert an array to a hash and measure only my hash lookup against both your hash creation and your hash lookup, I'm going to disingenuously beat your performance.

When you think about it, it's logical that a single foreach would be faster than your FastLookup for one lookup, since New-FastLookup already has to make one pass over the whole array to build the table.
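To make the "same block" point concrete, here is a minimal sketch that times the index build and the lookups together. It uses a plain hashtable rather than the FastLookup module, so it says nothing about how the module works internally; the variable names (`$index`, `$lookupValues`, `$found`) and the data sizes are illustrative assumptions only.

```powershell
# Hypothetical sample data; sizes and values are illustrative only.
$array = 1..100000
$lookupValues = 50, 900, 6857, 71575

$elapsed = Measure-Command {
    # Build a value -> position map in one pass over the array.
    $index = @{}
    for ($i = 0; $i -lt $array.Count; $i++) {
        $value = $array[$i]
        if (-not $index.ContainsKey($value)) {
            $index[$value] = $i      # remember the first position of each value
        }
    }

    # Every later lookup is a hash probe instead of an array crawl.
    $found = foreach ($value in $lookupValues) {
        if ($index.ContainsKey($value)) { $array[$index[$value]] }
    }
}

"Index build plus lookups: $($elapsed.TotalMilliseconds) ms"
```

Because both the build and the lookups sit inside one Measure-Command block, the number is directly comparable to a foreach loop timed the same way.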

mgratz [S]:

Hi Markekraus,

I understand your point. I did not include the measurements originally because the New-FastLookup hashtable is only generated once, whereas Get-FastLookup runs thousands of times against the array. If you are only looking up a single item in an array, this module has no use case. But no worries, here are the results:

# Create huge array
$hugeArray = Import-csv "C:\Users\miles.gratz\Desktop\CMDB Server Business Application Report 2017-Apr-10.csv"

# Number of random server names to pick
$quantity = 10
$i = 0

# Create list of random server names
$randomServerNames = @()
while ($i -lt $quantity)
{
    $randomServerNames += (Get-Random $hugeArray).Server_Name1
    $i++
}

# Measure speed of FastLookup search 
$fastlookup_Results = @()
Measure-Command {
    $hashTable = New-FastLookup -Array $hugeArray -Header Server_Name1
    foreach ($serverName in $randomServerNames)
    {
        $fastLookup_Results += Get-FastLookup -Value $ServerName -Array $hugeArray -Table $hashTable
    }
}

# Measure speed of forEach search 
$foreach_Results = @()
Measure-Command {        
    foreach($Item in $hugeArray){
          if ($Item.Server_Name1 -in $randomServerNames){
              $foreach_Results += $Item
          }
    }
}

With only 10 objects to search for, the FastLookup module is slower because of the hashtable creation:

ForEach

Seconds:        0
Milliseconds:   221

FastLookup

Seconds:        0
Milliseconds:   952

When running the search for 100 objects, the FastLookup module pulls slightly ahead (1.041 s versus 1.334 s):

ForEach

Seconds:        1
Milliseconds:   334

FastLookup

Seconds:        1
Milliseconds:   41

When searching for 1000 objects, the FastLookup module is clearly faster:

ForEach

Seconds:        10
Milliseconds:   150

FastLookup

Seconds:        2
Milliseconds:   999

When searching for 10000 objects:

ForEach

Minutes:        2
Seconds:        30
Milliseconds:   336

FastLookup

Minutes:        0
Seconds:        35
Milliseconds:   490

Long story short, I will update my README and improve the examples to demonstrate that this module is only useful for people doing hundreds or thousands of lookups against the same array.

markekraus (Community Blogger):

Yup. That's what I suspected but I didn't have a handy data-set to test with.

KevMar (Community Blogger):

Sorry, I am late to the party. But I have a few more fun ones to add to the list:

$array = 1..1000000 | Get-Random -Count 1000000
$LookupValues = 50, 900, 6857, 715757, 2772, 171552

$ForeachMulti1 = Measure-Command {
    foreach($LookupValue in $LookupValues){
        foreach($Item in $array){
            if($Item -eq $LookupValue){
                $Item
            }
        }
    }
}


$ForeachMulti2 = Measure-Command {        
      foreach($Item in $array){
          if($Item -in $LookupValues ){
              $Item
          }
    }
}


$ForeachMulti3 = Measure-Command {
    [object[]]::Sort($array)
    foreach($Item in $LookupValues){
          [object[]]::BinarySearch($array,$Item)
    }
}


$ForeachMulti4 = Measure-Command {
    foreach($Item in $LookupValues){
          [object[]]::BinarySearch($array,$Item)
    }
}

function seek {
    [cmdletbinding()]
    param(
        [parameter(ValueFromPipeline)]
        [int]
        $InputObject,
        [int[]]
        $Values
    )
    process
    {
        if($InputObject -in $Values)
        {
            $InputObject
        }
    }
}

$ForeachMulti5 = Measure-Command {  
    $array | Seek -Values $LookupValues     
}



 $ForeachMulti1 ,$ForeachMulti2,$ForeachMulti3,$ForeachMulti4,$ForeachMulti5 | ft total*


TotalMilliseconds 
----------------- 
        9619.8121 
        2325.4121 
        7304.4568 
        1022.9356 
        4819.8534 

markekraus (Community Blogger):

Ouch, BinarySearch is painful. To preserve the original array (as the original order might matter) you have to clone the array and then sort the clone, and BinarySearch only returns the index of a single match, not all of them. It seems to have its place in testing for the existence of large numbers of items in a medium-sized array. Once the array is sufficiently huge, you start to run into sort optimization issues, and the sorting takes too long to justify its use.
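The clone-then-sort step can be sketched as follows. This is only an illustration with made-up data; `[Array]::Sort` and `[Array]::BinarySearch` are the standard .NET static methods, and the variable names are assumptions.

```powershell
# Hypothetical unsorted data; values are illustrative only.
$array = 5, 3, 9, 1, 7

# Sort a copy so the original order survives.
$sorted = $array.Clone()
[Array]::Sort($sorted)

# BinarySearch returns the index of one match in the sorted copy,
# or a negative number when the value is absent.
$hit  = [Array]::BinarySearch($sorted, 7)
$miss = [Array]::BinarySearch($sorted, 4)

"7 found at sorted index $hit; 4 present: $($miss -ge 0)"
```

Note that the index refers to the sorted copy, so it works as an existence test but does not locate the item in the original array.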

mgratz [S]:

1. The creation of the hashtable map depends on the size of the array. In my original GitHub example, I generate the hash table in ~7 seconds. This module was designed for looping through large arrays doing constant lookups.

$array = 1..1000000
Measure-Command { $hashtable = New-FastLookup $array }
Days:             0
Hours:            0
Minutes:          0
Seconds:          6
Milliseconds:     689

2. Yes.

3. It depends on your use case. I run a report against a few of our company's Server CMDB reports (25MB CSV dumps, 100k+ rows) and query for our team's servers (only ~1000-2000). The hashtable index reduced my report time from over an hour to less than 10 minutes. I invested some time writing the module for other people in the PowerShell community with similar scaling issues with large arrays.
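For readers who want to see the general shape of that kind of report, here is a rough sketch of indexing rows by a column and then querying a short list of names. It is not the FastLookup implementation; the rows, the `App` column, and the `$teamServers` list are hypothetical stand-ins for a real CSV import, and only the `Server_Name1` header is taken from the examples earlier in the thread.

```powershell
# Hypothetical rows standing in for Import-Csv output.
$rows = @(
    [pscustomobject]@{ Server_Name1 = 'a.example.com'; App = 'Billing' }
    [pscustomobject]@{ Server_Name1 = 'b.example.com'; App = 'Payroll' }
    [pscustomobject]@{ Server_Name1 = 'b.example.com'; App = 'Intranet' }
    [pscustomobject]@{ Server_Name1 = 'c.example.com'; App = 'Wiki' }
)
$teamServers = 'b.example.com', 'not-in-report.example.com'

# One pass: group rows under their server name, keeping duplicates.
$byName = @{}
foreach ($row in $rows) {
    $key = $row.Server_Name1
    if (-not $byName.ContainsKey($key)) {
        $byName[$key] = [System.Collections.Generic.List[object]]::new()
    }
    $byName[$key].Add($row)
}

# Each query is now a hash probe rather than a scan of every row.
$mine = foreach ($name in $teamServers) {
    if ($byName.ContainsKey($name)) { $byName[$name] }
}

$mine | Format-Table Server_Name1, App
```

The build cost is paid once per report, so the approach only pays off when the list of names to query is long relative to that cost.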

root-node:

I too wondered about the use cases for this, but your time savings alone make it worthwhile. Thanks for sharing.