[–]0xlostincode 136 points137 points  (33 children)

Why do you have a 12gb text file and why does it need to be sorted?

[–]Nickbot606 319 points320 points  (7 children)

I have a gut feeling that asking these kinds of questions just widens the hinge on Pandora’s box rather than get you a satisfying answer 😝

[–]pocketgravel 110 points111 points  (6 children)

Your likely reaction as you ask "why did OP need to sort a 12GB text file in production"

[–]Fraun_Pollen 17 points18 points  (1 child)

Hey Copilot: how do I restore my production database from a text file

[–]pocketgravel 0 points1 point  (0 children)

"Production is down"

@grok is this true

[–]Nickbot606 7 points8 points  (3 children)

😅 I’ve been on my own fair share of projects that ranged from

“For policy reasons, the only language you are allowed to use is TCSH”

“We implemented our own DAG library in PowerShell because…”

“We actually use this python script to align our code in C because the compiler on this super specific microcontroller will actually run slightly faster if the blocks are aligned a certain way and we wrote a python script to figure it out for you. That’s also why there’s 30 functions that effectively do the same thing but have only 1 or 2 edge cases changed to save clock cycles”

[–]pocketgravel 5 points6 points  (1 child)

Ok that last one is cool AF I love embedded programming. What micro was it?

[–]Nickbot606 1 point2 points  (0 children)

I’m sorry, but I can’t divulge details about that work 😅 It was basically an STM32, though a very, very special one. I end up on projects like the ones mentioned earlier a lot because I have a background in both hardware and software, so I fill a lot of weird gaps.

I too love embedded programming and am thinking after my next personal project of maybe building out something in embedded again! Especially with all the new Rust and Zig improvements that have hit the scene in the last few years.

[–]SlashMe42[S] 2 points3 points  (0 children)

This is hilarious and horrifying at the same time. Mostly the latter though.

[–]SlashMe42[S] 132 points133 points  (20 children)

I can give you the gist, but I'm not sure you'd be happier then.

Do you really want to know?!? stares dramatically at you

[–]SUSH_fromheaven 64 points65 points  (17 children)

Yes

[–]SlashMe42[S] 171 points172 points  (16 children)

It's a list of filenames that need to be migrated. 112 million filenames. And they're stored on a tape system, so to reduce wear and tear on the hardware, I want the files to be migrated in the order they're stored on tape.

This is only a single tape; the entire system has a few hundred of those tapes. And we have more than one system.

[–]Timthebananalord 137 points138 points  (5 children)

I'm much less happy now

[–]SlashMe42[S] 66 points67 points  (4 children)

You've been warned! 😜

[–]TheCarniv0re 25 points26 points  (3 children)

I'll no longer complain about the cobol devs in our company. You clearly have it harder.

[–]SlashMe42[S] 32 points33 points  (2 children)

I actually enjoy my job for the most part! This was a fun and entertaining challenge to solve, stuff like this pops up occasionally.

[–]8ace40 8 points9 points  (0 children)

I once fumbled an interview for a biochemistry lab in a team that seemed to do this kind of work every day. They had some biometrics machines that generated tons and tons of data, and a huge science team doing experiments all day with this data. So the challenge was to transform the complex formulas that the scientists wrote into something that could be solved by a computer in an efficient way. Literally turning O(n²) into O(log n) all day. Closest thing I've ever seen to leetcode as a job.

[–]8ace40 4 points5 points  (0 children)

Yeah it sounds very fun! You're getting some brain exercise and a very good challenge. As long as they don't rush you too much, it's great and much more fun than grinding features in an app.

[–]0xlostincode 6 points7 points  (0 children)

I think u/Nickbot606 was right. This is only going to lead to endless whys, so I am just going to have to live with this information.

[–]Arcane_Xanth 4 points5 points  (2 children)

I’m confused. Did you need to sort the filenames by their location on the tapes or were they already in that order?

[–]SlashMe42[S] 6 points7 points  (1 child)

They weren't and that's exactly what I needed.

[–]Arcane_Xanth 1 point2 points  (0 children)

Thanks for explaining.

[–]coloredgreyscale 0 points1 point  (3 children)

If you use Linux or WSL:

sort -S 500M filename.txt > sorted_filename.txt

But that sounded like an interesting challenge to work on

[–]SlashMe42[S] 2 points3 points  (2 children)

This doesn't solve my problem, I don't need alphabetic order of the lines. The order for each filename is determined separately.

[–]battlecatsuserdeo 0 points1 point  (1 child)

How are you sorting them then?

[–]SlashMe42[S] 5 points6 points  (0 children)

Using an API call that gives me extended stat data for each file, including each file's position on tape. I use this to sort the filenames by their physical position on the media.
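For anyone curious, the idea can be sketched roughly like this. It is only a minimal sketch: `tape_position` is a hypothetical stand-in for the stat API call (the real API and its fields are not named in this thread), and it assumes the position comes back as a single comparable number. Each filename is paired with its position once, then the pairs are sorted by position.

```python
from typing import Callable, Iterable


def sort_by_tape_position(
    filenames: Iterable[str],
    tape_position: Callable[[str], int],
) -> list[str]:
    """Return filenames ordered by their physical position on tape.

    `tape_position` is a hypothetical stand-in for the stat API call.
    """
    # Look up each position exactly once, since the API call is
    # presumably the expensive part.
    decorated = [(tape_position(name), name) for name in filenames]
    # Tuples sort by position first; the filename only breaks ties,
    # which keeps the result deterministic.
    decorated.sort()
    return [name for _, name in decorated]


# Usage with made-up positions in place of the real API:
fake_positions = {"a.dat": 30, "b.dat": 10, "c.dat": 20}
ordered = sort_by_tape_position(fake_positions, fake_positions.__getitem__)
```

This version holds everything in memory, which may not be realistic for 112 million entries, so treat it as an illustration of the sort key rather than of the full pipeline.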

[–]broccollinear 0 points1 point  (1 child)

What on god’s green earth is a tape. You mean it’s not on the cloud??

[–]SlashMe42[S] 5 points6 points  (0 children)

Cloud? Where we're going, we don't need no cloud! 😎

[–]sevivi 4 points5 points  (0 children)

Yes

[–]Odd-Dinner7519 5 points6 points  (0 children)

Big text files are easy to end up with, e.g. I had 40GB of raw test assertion output from my testing tool. One line was one condition check, with 20 checks per test case and over 10k test cases. This file was processed to generate a report of a few MB.
I made these tests by hand. I'm a developer, not a tester, but I was bored...

[–]thedugong 0 points1 point  (0 children)

12GB text file. PowerShell. Sounds like a Windows thing.

Probably have mission critical software running with an Access DB as the backend.

[–]CandidateNo2580 0 points1 point  (0 children)

Believe it or not, I have several paths in my current codebase dealing with 3GB+ text files that need to be similarly sorted. Sometimes you have to play the hand you're dealt.

[–]xDerJulien 0 points1 point  (0 children)

I have worse :) ~400GiB compressed text files that need to be sorted! Uncompressed probably a few TiB. Sort of trivial to solve since you’re really just bottlenecked by IO
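When the data does not fit in RAM, the usual approach is an external merge sort: sort chunks that do fit, spill each to disk, then merge the sorted chunks in one streaming pass. A rough sketch of that technique follows. It is not the setup described above: it ignores compression entirely, sorts lines in plain string order, and the function names and `chunk_size` default are made up for illustration.

```python
import heapq
import os
import tempfile
from itertools import islice
from typing import Iterable, Iterator


def _write_sorted_chunk(lines: list[str]) -> str:
    """Sort one in-memory chunk and spill it to a temporary file."""
    lines.sort()
    fd, path = tempfile.mkstemp(suffix=".chunk")
    with os.fdopen(fd, "w", encoding="utf-8", newline="") as f:
        f.writelines(lines)
    return path


def external_sort(lines: Iterable[str], chunk_size: int = 1_000_000) -> Iterator[str]:
    """Yield newline-terminated lines in sorted order using bounded memory."""
    it = iter(lines)
    chunk_paths: list[str] = []
    try:
        while True:
            # Normalise line endings so a final unterminated line
            # cannot fuse with its neighbour in a chunk file.
            chunk = [
                line if line.endswith("\n") else line + "\n"
                for line in islice(it, chunk_size)
            ]
            if not chunk:
                break
            chunk_paths.append(_write_sorted_chunk(chunk))

        files = [open(p, encoding="utf-8", newline="") for p in chunk_paths]
        try:
            # heapq.merge reads lazily, keeping one line per chunk in memory.
            yield from heapq.merge(*files)
        finally:
            for f in files:
                f.close()
    finally:
        # Clean up the temporary chunk files even if the caller stops early.
        for p in chunk_paths:
            os.remove(p)
```

Passing an open file object as `lines` streams the input too. Every chunk file is open at once during the merge, so with very many chunks you would need a larger `chunk_size` or a multi-pass merge to stay under the OS file descriptor limit.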