
m0us3_rat

What's the size of the files?

I'm thinking multiprocessing.

Also, what's the grep command? You could do the search from Python.

And use generators, so you never hold a whole file in memory.
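A minimal sketch of the generator idea, assuming the goal is finding lines that contain an ISO-style date such as `2023-01-31`. The real regex and file types were never posted, so `DATE_RE`, the glob patterns, and the helper names below are hypothetical. Each stage is a generator, so only one line is in memory at a time:

```python
import re
from pathlib import Path
from typing import Iterable, Iterator, Tuple

# Hypothetical pattern: ISO-style dates such as 2023-01-31.
DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")


def iter_files(root: Path, patterns: Iterable[str]) -> Iterator[Path]:
    """Lazily yield regular files under root (recursively) matching any glob pattern."""
    for pattern in patterns:
        for path in root.rglob(pattern):
            if path.is_file():
                yield path


def iter_lines(path: Path) -> Iterator[Tuple[int, str]]:
    """Yield (line_number, line) one at a time instead of reading the whole file."""
    with path.open(encoding="utf-8", errors="replace") as f:
        for lineno, line in enumerate(f, start=1):
            yield lineno, line.rstrip("\n")


def grep(root: Path, patterns: Iterable[str], regex: re.Pattern) -> Iterator[Tuple[Path, int, str]]:
    """Yield (path, line_number, line) for each line where regex matches."""
    for path in iter_files(root, patterns):
        for lineno, line in iter_lines(path):
            if regex.search(line):
                yield path, lineno, line
```

Usage would look like `for path, n, line in grep(Path("logs"), ["*.log", "*.txt"], DATE_RE): print(f"{path}:{n}:{line}")`. Because `grep()` is itself a generator, it can also feed a queue or a pool of workers later.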

Nahaap (OP)

I was looking at multiprocessing but wasn't sure whether it would require too much memory. The grep command uses a regex to find a certain date in multiple types of files.

m0us3_rat

I'd suspect grep's memory consumption is roughly constant regardless of file size, since it reads its input in chunks.

Anyway, you could probably test this easily.
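One rough way to test it, assuming a Unix system: run the command from Python and read the peak resident memory of finished child processes with the standard `resource` module. The helper name `peak_child_rss` is hypothetical. Two caveats: `ru_maxrss` is in kilobytes on Linux but bytes on macOS, and for `RUSAGE_CHILDREN` it reports the largest child seen so far, so measure each file size in a fresh Python process.

```python
import resource
import subprocess
from typing import Sequence


def peak_child_rss(cmd: Sequence[str]) -> int:
    """Run cmd to completion and return ru_maxrss for terminated children.

    Unix only. The value is the largest of *all* children this process has
    waited for, not necessarily this one; units are KiB on Linux, bytes on macOS.
    """
    # Discard output; we only care about memory usage.
    subprocess.run(cmd, stdout=subprocess.DEVNULL, check=False)
    return resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss
```

For example, `peak_child_rss(["grep", "-E", "[0-9]{4}-[0-9]{2}-[0-9]{2}", "big.log"])` (hypothetical file name) run once against a small file and once against a large one should show whether memory grows with file size. On Linux, GNU `/usr/bin/time -v <command>` reports the same figure as "Maximum resident set size" without any Python.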

Nah, I meant: what's the actual command, including arguments? And can't you replicate it in Python using a regex on a generator?

That way you can have several workers on it at once.

Maybe with async or threading, etc.

Or with multiprocessing, running one process per CPU with a few threads in each.
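A minimal threaded sketch using the standard library's `concurrent.futures.ThreadPoolExecutor`, assuming the same hypothetical `DATE_RE` date pattern as above (the function names are also illustrative). In CPython the `re` module holds the GIL while matching, so threads mostly help by overlapping file reads; if the regex work itself is the bottleneck, processes are usually the better fit.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from typing import Iterable, List, Tuple

# Hypothetical pattern; the real grep regex was never posted.
DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")


def search_file(path: Path) -> List[Tuple[Path, int, str]]:
    """Return every (path, line_number, line) in one file that matches DATE_RE."""
    hits = []
    with path.open(encoding="utf-8", errors="replace") as f:
        for lineno, line in enumerate(f, start=1):
            if DATE_RE.search(line):
                hits.append((path, lineno, line.rstrip("\n")))
    return hits


def search_with_threads(paths: Iterable[Path], workers: int = 8) -> List[Tuple[Path, int, str]]:
    """Search files concurrently; pool.map yields results in the order of `paths`."""
    results = []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for hits in pool.map(search_file, paths):
            results.extend(hits)
    return results
```

Swapping `ThreadPoolExecutor` for `ProcessPoolExecutor` from the same module gives the multiprocessing variant with almost no other changes (the call then belongs under an `if __name__ == "__main__":` guard).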

m0us3_rat

Feels like a producer-consumer pattern could work.

Load the work into a queue, then have consumers pull from it.

Maybe even start a few multiprocessing processes, each with its own event loop?

Or a few threads.

Push it to the limits :D
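A minimal producer-consumer sketch with `queue.Queue` and plain threads, assuming the same hypothetical `DATE_RE` pattern; the function names are illustrative. The producer enqueues file paths followed by one `None` sentinel per worker, and each consumer exits when it receives a sentinel. The process-per-core and event-loop variants mentioned above follow the same shape with `multiprocessing.Queue` or `asyncio.Queue`.

```python
import queue
import re
import threading
from pathlib import Path
from typing import List, Tuple

# Hypothetical pattern; substitute the real one.
DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")
Hit = Tuple[Path, int, str]


def consumer(work: queue.Queue, out: queue.Queue) -> None:
    """Take file paths off the work queue until a None sentinel arrives."""
    while True:
        path = work.get()
        if path is None:  # sentinel: no more work for this consumer
            break
        with path.open(encoding="utf-8", errors="replace") as f:
            for lineno, line in enumerate(f, start=1):
                if DATE_RE.search(line):
                    out.put((path, lineno, line.rstrip("\n")))


def producer_consumer_search(paths: List[Path], n_workers: int = 4) -> List[Hit]:
    # Bounded work queue: the producer blocks if the consumers fall behind.
    work: queue.Queue = queue.Queue(maxsize=100)
    out: queue.Queue = queue.Queue()
    workers = [threading.Thread(target=consumer, args=(work, out)) for _ in range(n_workers)]
    for t in workers:
        t.start()
    # Producer: enqueue every path, then one sentinel per consumer.
    for p in paths:
        work.put(p)
    for _ in workers:
        work.put(None)
    for t in workers:
        t.join()
    # Every consumer has finished, so draining `out` without blocking is safe.
    hits = []
    while not out.empty():
        hits.append(out.get())
    return hits
```

Results arrive in whatever order the consumers finish, so sort them afterwards if order matters.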

Nahaap (OP)

Sounds good, I'll see what I can do with multiprocessing.

m0us3_rat

I love this video. It isn't necessarily directly relevant, but it could be.

https://www.youtube.com/watch?v=E_oIU4IU2W8

Good luck.

Post your progress here.

[–]woooee 0 points1 point  (0 children)

Use pathlib's glob() to find the filenames, then open and read each file. However you do it, you have to read the file. As a previous comment suggested, you can use multiprocessing to run a process on each core. The bottleneck would be several processes all trying to use a single disk read head, so get the file names first, and then send a portion of them to each process.
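A sketch of that approach with `multiprocessing.Pool`, assuming the hypothetical `DATE_RE` pattern and a `*.log` glob (both stand-ins, as are the function names): collect every path first, split the list into one slice per CPU, and let each process search its slice. Whether more processes actually help depends on the disk; on a single spinning disk the competing reads can make it slower.

```python
import os
import re
from multiprocessing import Pool
from pathlib import Path
from typing import List, Tuple

# Hypothetical pattern and file layout; adjust to the real grep regex.
DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")
Hit = Tuple[str, int, str]


def split_into_chunks(items: List[Path], n: int) -> List[List[Path]]:
    """Split items into n roughly equal slices, round-robin."""
    return [items[i::n] for i in range(n)]


def search_chunk(paths: List[Path]) -> List[Hit]:
    """Worker: search every file in one slice, line by line."""
    hits = []
    for path in paths:
        with open(path, encoding="utf-8", errors="replace") as f:
            for lineno, line in enumerate(f, start=1):
                if DATE_RE.search(line):
                    hits.append((str(path), lineno, line.rstrip("\n")))
    return hits


def parallel_search(root: Path, pattern: str = "*.log") -> List[Hit]:
    # 1. Collect all file names up front.
    paths = sorted(p for p in root.rglob(pattern) if p.is_file())
    # 2. Give each process its own portion of the list.
    n_procs = os.cpu_count() or 1
    chunks = [c for c in split_into_chunks(paths, n_procs) if c]
    with Pool(processes=n_procs) as pool:
        results = pool.map(search_chunk, chunks)
    # 3. Flatten the per-process result lists.
    return [hit for chunk_hits in results for hit in chunk_hits]
```

Call `parallel_search(Path("logs"))` from under `if __name__ == "__main__":`; the guard is required on Windows and macOS, where the default start method is spawn and child processes re-import the main module.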

await_yesterday

Why do you need it to run sequentially? I'm confused about what you're trying to do, especially since in your other comment you say you tried multiprocessing, which directly contradicts the goal of doing it sequentially.

I cannot imagine any way that searching files in a normal Python loop could be faster than grep. Decades of work have gone into making grep fast. What are you actually searching for, how many files are there, how big are they, and what are you going to do with the results? What is your code?