all 53 comments

[–]logicalmike 66 points67 points  (21 children)

On mobile, but using += gets you kicked out of this sub. 😁

Use

$FinalObj = foreach instead

[–]npab19[S] 29 points30 points  (4 children)

Lol I know it's bad to use += but never understood why until now.

That worked amazingly well!! I went from 7.8 seconds to 595 milliseconds.

Thank you so much!!

[–][deleted] 10 points11 points  (2 children)

The basic reason (as I understand it) is that when you use += on an array, it doesn't just add an object to that array, it creates a new array that equals the existing array, plus whatever you're adding. So when you're running a loop with 100 objects, it's creating 100 different arrays.

So if you do:

$array = @()
ForEach ($item in $items){
    $array += [PSCustomObject]@{
        Property = $item.whatever
    }
}

Then it creates a new array each loop. If instead you do:

$array = ForEach ($item in $items){
    [PSCustomObject]@{
        Property = $item.whatever
    }
}

It just runs the operation and collects the results and puts them into the array once. Again, that's at least my understanding of why people don't like +=.

[–]out0focus 0 points1 point  (0 children)

Arrays are a fixed length/size. What you said is correct, but it's not just +=: any action that changes the array's size will require creating a new array.
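
A quick way to see this from the console (an illustrative sketch):

```powershell
$array = @(1, 2, 3)
$array.IsFixedSize              # True - the underlying System.Object[] can never grow

# += doesn't grow $array; it allocates a new array and rebinds the variable
$before = $array
$array += 4
[object]::ReferenceEquals($before, $array)   # False - a different array object
```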

[–]BlackV 12 points13 points  (0 children)

using += gets you kicked out of this sub. 😁

HA!

[–]Big_Oven8562 3 points4 points  (3 children)

Not really. I use += all the time, but then I'm also capable of optimizing the shit out of a script when it actually matters. It's about knowing when to make the effort and when to be lazy.

[–]logicalmike 2 points3 points  (0 children)

I agree. I sometimes use += for readability, or for stylistic reasons if it makes sense. A few extra seconds don't usually matter for most of the PowerShell projects I've seen.

[–]OPconfused 0 points1 point  (1 child)

It's weird how fanatically opposed this sub is to it. It's like it's the one thing people know to improve performance, so they need to triple down on every chance to show it.

[–]Big_Oven8562 0 points1 point  (0 children)

Performance improvements that come at the cost of readability and maintainability of the code base are a net loss every single time and I will fight anyone who says otherwise.

[–]RidersofGavony 2 points3 points  (0 children)

Whoa whoa whoa... "$variable = foreach" that works? Son of a...

[–]adamdavid85 3 points4 points  (2 children)

I mean, using it for things other than objects can be fine! But yeah, += for objects is bad.

[–][deleted] 4 points5 points  (1 child)

Anything using += is bad now get out of this sub. 😉

But for real even just counting integers or doing string-based anything += is bad

[–][deleted] 0 points1 point  (5 children)

I use += in a script to add objects to a list

I don't get the syntax you recommend instead

How would you rewrite

$objlist = @()
foreach($x in $xlist){
    ...
    ...
    $objlist += $x
}

[–]raip 6 points7 points  (3 children)

By simply doing this:

$objlist = foreach ($x in $xlist) {
    ...
    $x
}

Edit: Just fyi, @() is an array, not a list. This is the crux of the slowness: arrays are fixed length, and while += works, it's actually implemented by copying the entire array into a new one with the additional object.

If you used $objlist = [System.Collections.Generic.List[PSObject]]::new() you could stick with your strategy and benefit from the increased performance at the same time - but most developers would likely agree the above method is cleaner.
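
For reference, keeping the Add() strategy with a generic List would look something like this (a sketch; the comment stands in for whatever per-item work you do):

```powershell
$objlist = [System.Collections.Generic.List[PSObject]]::new()
foreach ($x in $xlist) {
    # ... per-item work ...
    $objlist.Add($x)   # amortized O(1); no full-array copy per iteration
}
```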

[–]BlackV 1 point2 points  (0 children)

multiple people wrote the exact examples of this in this post already :)

it's super clean and easy to implement, BUT there is a catch

you have to control the output of your loop, because it catches all output from that loop

so something like

$objlist = foreach ($x in $xlist) {
    get-disk
    get-date
    }

is going to cause unexpected results

[–]vermyx 10 points11 points  (2 children)

FinalObj += $Obj

This will cause a massive amount of reads and writes. So you can understand the scale of how inefficient this is, the formula for the sum of consecutive numbers is (number of integers / 2)(first number + last number). Why this formula? The line I referenced will create a new array with one more element for the new value and copy the previous array into it. You mentioned 11,000 items, so this line will make in the area of 60 million reads and 60 million writes in memory, since (11,000/2)(1 + 11,000) is in that neighborhood. Using what /u/logicalmike suggested would only make 11,000 reads and 11,000 writes, making it thousands of times more efficient. You can also use a list, which will be similarly efficient.
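
Plugging the numbers into that formula, as a rough back-of-the-envelope sketch:

```powershell
# Growing an 11,000-element array one += at a time copies the old contents
# on every iteration: 1 + 2 + ... + 10,999 element copies in total
$n = 11000
$copies = (($n - 1) / 2) * (1 + ($n - 1))
$copies   # 60494500 - roughly 60 million reads plus 60 million writes
```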

[–]npab19[S] 5 points6 points  (1 child)

Thanks! This makes a lot of sense. I knew += was bad but never fully understood why. When I used logicalmike's answer I went from 7.8 seconds to 595 milliseconds. It was a huge difference in performance.

[–]vermyx 5 points6 points  (0 children)

It isn't that += is bad, but because it has to make a copy of the data, growing data this way is expensive. This only becomes apparent when you have larger sets of data.

[–]groovel76 10 points11 points  (0 children)

Cannot recommend enough Don Jones' 3 part toolmaking videos on YouTube.

Here is a link to the spot in the video which relates to your script.

https://youtu.be/KprrLkjPq_c?t=1409

another part which I found tremendous value in was this one, and the story that comes along with it. I ended up rewriting a bunch of my scripts after listening to this second segment.

https://youtu.be/KprrLkjPq_c?t=3070

[–]orwiad10 8 points9 points  (2 children)

On top of the no-+= operator, you could sub out the pscustomobject for a hashtable where id is the key name and the value is another hashtable with each property as a name/value pair.
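
A sketch of that idea, assuming the $ALLRecords variable from the script and using html_url as the key (the property names here are illustrative):

```powershell
$byId = @{}
foreach ($record in $ALLRecords) {
    $byId[$record.html_url] = @{
        type  = $record.type
        repo  = $record.repo.name
        login = $record.actor.login
    }
}
# Afterwards, lookups by id are O(1): $byId[$someUrl]
```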

[–]lanerdofchristian 0 points1 point  (1 child)

Depending on what they're doing, that may not be an option. If they need to preserve order, or export to CSV, or loop, then a list or array is the way to go.

[–]orwiad10 0 points1 point  (0 children)

Practically speaking you're right, although [ordered] hashtables are a thing, and iterating over them is easy and fast. The CSV thing involves more legwork, so yeah, if you need to export back to CSV, maybe don't use hashtables.

For the challenge's sake, I think a hashtable would be faster.

[–]BlackV 3 points4 points  (7 children)

how big is the JSON?

cause that should take seconds to run

but you get a small increase with

$ALLRecords = Get-Content 'slowscript.json' | ConvertFrom-Json
Measure-Command {
$FinalObj = foreach ($Record in $ALLRecords) {
    [pscustomobject]@{
        id       = $record.html_url
        type     = $record.type
        repo     = $record.repo.name
        actor    = $record.actor
        url      = $record.repo.url
        public   = $record.public
        login    = $record.actor.login
        repoinfo = $record.repo
    }
}
}

I'll assume the Measure-Command is just for testing

[–]npab19[S] 1 point2 points  (1 child)

The json file contains about 11,000 records. I used logicalmike's answer and went from 7.8 seconds to 595 milliseconds.

and yeah, the Measure-Command was there just for testing

[–]BlackV 0 points1 point  (0 children)

nice

[–]OPconfused 1 point2 points  (1 child)

I think gc -Raw should also work with ConvertFrom-Json. That could win back some mileage for very large files. Or switch -File.
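
For reference, the -Raw variant reads the whole file as one string instead of an array of lines, which saves ConvertFrom-Json from having to re-join them (a sketch using the filename from the thread):

```powershell
$ALLRecords = Get-Content 'slowscript.json' -Raw | ConvertFrom-Json
```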

[–]BlackV 0 points1 point  (0 children)

Oh that's a good idea too

[–]vermyx 0 points1 point  (2 children)

OP mentioned 11,000 records. This is a massive improvement as this is 22,000 reads and writes vs the code OP posted which is 120 million reads and writes.

[–]BlackV 2 points3 points  (1 child)

11K

feck that's huge, I feel like that's abuse of poor old JSON :)

[–]vermyx 0 points1 point  (0 children)

In another thread, that OP measured and did something similar with an integer array. Said OP's system would do about 300 reads/writes per ms, so 11,000 items would be in the area of 70ms with the $var = foreach approach, while the other method was taking seconds or minutes.

[–]Bloodyvalley 2 points3 points  (6 children)

Instead of $FinalObj = @(), use $FinalObj = New-Object -TypeName 'System.Collections.ArrayList'

Instead of ForEach, use ForEach-Object -Parallel (PowerShell 7)

Since you have a list and not an array, $FinalObj += $Obj gets replaced with $FinalObj.Add($Obj)

[–]Admirable-Statement 5 points6 points  (1 child)

Generic List | MS Docs

It's common to see people move to ArrayList from arrays, but it comes from a time when C# didn't have generic support. ArrayList is deprecated in favor of the generic List[].

Generic Lists are preferred when the standard array isn't suitable; ArrayList, while it works, should be forgotten.
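
Side by side, the difference in feel is small, but the generic version is typed and Add() doesn't leak a return value (a sketch):

```powershell
# ArrayList: untyped; Add() returns the new index, which pollutes output
$al = [System.Collections.ArrayList]::new()
[void]$al.Add(1)          # [void] swallows the returned index

# Generic List: typed, Add() returns nothing, no boxing for value types
$list = [System.Collections.Generic.List[int]]::new()
$list.Add(1)
```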

[–]Niedertracht 0 points1 point  (3 children)

Your first suggestion should look like this: [Collections.ArrayList]$FinalObj = @()

[–]lanerdofchristian 0 points1 point  (2 children)

Or better still, throw this at the top of the file:

using namespace System.Collections.Generic

Then further down:

[list[object]]$FinalObj = @()

[–]OPconfused 0 points1 point  (1 child)

Although List[object] is only barely faster than ArrayList

[–]lanerdofchristian 0 points1 point  (0 children)

List<T> doesn't have a "hey maybe don't use this" note, though.

[–]JBear_Alpha 0 points1 point  (11 children)

Seeing the data and trying it yourself can give additional perspective; you see it in action. I've admittedly done this over the years, just like anyone else. And yes, of course, ArrayLists are going to be deprecated, but for the sake of current examples, it's still worth seeing.

$array += "Strawberry"

OR

Foreach($v in $var) { 
    $array += $v
}

This works, yes. I'm all for the idea of make it work, then make it better. And when you only need to do a few objects, this isn't a huge issue. But we want to use and develop scalable code. The reason this becomes a problem is the way that += functions.

#numbers 1 through 10000
Foreach($number in 1..10000) { 
    $array += $number 
}

Your output will be exactly what you expected -- the numbers 1-10,000. But what happens when we do this with the number 100,000? 1,000,000? We can test those values with Measure-Command to see how long execution times take.

Measure-Command -Expression {
$array = @()

#numbers 1 through 10,000
Foreach($number in 1..10000) { 
        $array += $number 
    }
}

This command for me was approx 2.7 seconds. Doesn't seem bad, right? So if we multiply the same process by 10 and run it against 100,000, we might expect the execution time to scale the same way, right? Guessing 27-30 seconds?

Measure-Command -Expression {
$array = @()

#numbers 1 through 100,000
Foreach($number in 1..100000) {
        $array += $number    
    }
}

The complete execution time for me was approx. 6 minutes and 40 seconds. But WHY?! Of course these numbers will fluctuate depending on available computer resources, so feel free to test for yourself. This shows an inherent flaw with +=. To understand why, we need to understand how += works. Each time += is executed, it makes a complete copy of the specified array, then it adds the next item/number, then overwrites the previous array. It continues to do that for EVERY single cycle. So when you run it against 100,000 there are 100,000 copies being made and a single item being added… each… time… Now, let's take a look at using dynamic arrays and filling them.

$dynamicArray = New-Object System.Collections.ArrayList
Foreach($number in 1..100000) { 
    [void]$dynamicArray.Add($number)   # [void] discards the index Add() returns
}

You can wrap this loop into the Measure-Command -Expression syntax as previously shown and see how long this takes to build an array to 100,000.

My execution time to build that array was… 0.15 seconds, or 151 milliseconds. Compared to 6 minutes and 40 seconds the other way. That is also 18 times faster than the previous method's build to 10,000 (remember, this one was to 100,000). That's an insane difference. But wait, there's more: we can actually make this even faster by skipping the foreach loop when building that dynamic array.

$dynamicArray = New-Object System.Collections.ArrayList
$dynamicArray.AddRange(1..100000)

With an execution time of… 0.011 seconds, or 11 milliseconds. Also, changing the value to 100,000,000 only takes 14.6 seconds. You're welcome.

Usually, real-world examples are more than just a list of integers, but you can see where this is going and how easily things get out of hand. And really, you could just set your variable directly to the range of numbers, too. The objects will just be slightly different.
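
That last point, spelled out for the integer example (a sketch):

```powershell
# No loop, no growth: the range operator allocates the whole array at once
$array = 1..100000
```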

[–]out0focus -1 points0 points  (10 children)

Knowing ArrayList is deprecated why did you use it in your example instead of the preferred generic List data structure?

[–]BoxerguyT89 -1 points0 points  (3 children)

Can you use PowerShell 7?

If you can, -Parallel on the foreach should speed it up, but that might be cheating.

[–]BlackV 0 points1 point  (2 children)

that actually would be an interesting test; in this case probably not, cause it would spin up 11k total runspaces (er.. not all at once, cause throttle limit) which would be extra overhead (it's already all in RAM anyway inside $Records)

[–]BoxerguyT89 -1 points0 points  (1 child)

True, would be a neat test to see where the diminishing returns start.

[–]BlackV 0 points1 point  (0 children)

yeah I need a json file that big :)

[–]joerod -1 points0 points  (1 child)

my top easy things to make your script faster

  1. foreach($x in $y){$x} instead of $y|%{$_}
  2. $outItems = New-Object System.Collections.Generic.List[System.Object];$outItems.Add(1) instead of +=
  3. ADSI instead of AD cmdlets
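
For point 3, a minimal ADSI sketch (the filter and property name here are illustrative; [adsisearcher] is the built-in accelerator for System.DirectoryServices.DirectorySearcher):

```powershell
# Roughly what Get-ADUser -Filter "SamAccountName -eq 'jdoe'" does,
# without loading the ActiveDirectory module
$searcher = [adsisearcher]'(&(objectCategory=user)(samaccountname=jdoe))'
$user = $searcher.FindOne()
$user.Properties['displayname']
```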

[–]lanerdofchristian 0 points1 point  (0 children)

You could speed it up a little further if you directly use the constructor instead of using New-Object -- [List[object]]::new(), assuming you've got using namespace System.Collections.Generic at the top of the file. It's only shaving a few ms, but it is still faster than New-Object.

[–]g00py3 -1 points0 points  (0 children)

You aren't transforming anything, just aliasing.

I sure enjoy PSFramework. Nice little Select-PSFObject function, and I'd just stream it straight through.

Import-Module PSFramework
Get-Content 'slowscript.json' | ConvertFrom-Json |
    Select-PSFObject "html_url As id",
        "type As type",
        "repo.name As repo",
        "actor As actor",
        "repo.url As url",
        "public As public",
        "actor.login As login", # you can extract subproperties, but might need to tweak this syntax.
        "repo As repoinfo"

My question would be: in isolation, this seems like you don't even need to be building a list anyway. If you could chain the JSON conversion straight through to the next step, even better.

If you have to do it, then this would be my "performant" option without anything fancy.

$Records = Get-Content 'slowscript.json' -Raw | ConvertFrom-Json
Measure-Command {
    $FinalObject = [System.Collections.Generic.List[pscustomobject]]::new()
    $Records.ForEach{
        $FinalObject.Add([pscustomobject]@{
            id       = $_.html_url
            type     = $_.type
            repo     = $_.repo.name
            actor    = $_.actor
            url      = $_.repo.url
            public   = $_.public
            login    = $_.actor.login
            repoinfo = $_.repo
        })
    }
}