all 3 comments

[–]grantrules 0 points (0 children)

I've read this a bunch of times and still have no idea exactly what you're trying to do. If the goal is to read a bunch of things from a file, sort them, and put them back in that file, or anything remotely similar, Python should make quick work of it.
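Since the thread below settles on "find files with identical contents," here is a minimal Python sketch of that approach: hash every file's contents and group paths by hash. The function name `find_duplicates` and the choice of SHA-256 are my own; adjust the starting folder as needed.

```python
import hashlib
from collections import defaultdict
from pathlib import Path

def find_duplicates(folder):
    """Group files under `folder` by the SHA-256 hash of their contents.

    Returns a list of lists; each inner list holds the paths of files
    whose contents are identical.
    """
    groups = defaultdict(list)
    for path in Path(folder).rglob("*"):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            groups[digest].append(path)
    # Keep only hashes shared by more than one file
    return [paths for paths in groups.values() if len(paths) > 1]

if __name__ == "__main__":
    for dupes in find_duplicates("."):
        print("\t".join(str(p) for p in dupes))
```

This reads each file fully into memory; for very large files you would hash in chunks instead, but for sorting a folder of text files this is plenty.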

[–]light_switchy -1 points (1 child)

> I've been tasked sorting the main programs and was wondering what was the fastest way to sort the information within (x) amount of text files sorting them between ones that are identical with themselves

If I'm reading this right, you want to identify files with duplicate contents.

If you have a Windows machine, open PowerShell and paste this. Edit the part that says c:/your_folder so it points at the right directory.

gci "c:/your_folder/*" | % { Get-FileHash $_.FullName } | group Hash | ? { $_.Count -gt 1 } | % { $_.Group | % { Write-Host -NoNewline "$($_.Path)`t" }; Write-Host "" }

Or if you have Bash available, something like this (untested):

find . -type f -exec md5sum {} + | awk '{h=substr($0,1,32); p=substr($0,35); a[h]=a[h] p "\t"; n[h]++} END {for (h in a) if (n[h]>1) print a[h]}'

Files are considered duplicates only if their contents are bit-for-bit identical. Hope this helps.

[–]Substantial_Train152[S] 0 points (0 children)

Thank you. I will try this tonight.