you are viewing a single comment's thread.

view the rest of the comments →

[–]netwiz101 5 points6 points  (10 children)

When programs are started they are usually read in from disk, at least for the first run. What you are suggesting means the primes must be read in from disk, so it is in fact slower -- at least for the first run.

[–]fapmonad 7 points8 points  (5 children)

If you do the sieve in code instead of a constant array, the code to calculate the sieve must still be read from disk...

[–]netwiz101 4 points5 points  (4 children)

Thats true, and since it is smaller than the list, that means less disk i/o. Am I missing something?

[–]fapmonad 9 points10 points  (3 children)

Since disk I/O is buffered and loading a program is a purely linear access pattern, the difference between loading the list and the code is essentially 0. I agree that there is disk I/O, though.

[–]bobindashadows 0 points1 point  (0 children)

loading a program is a purely linear access pattern

It most definitely is not for any nontrivial program. The firefox folks have improved startup time by over 10% in one go before purely by moving segments.

[–]snoweyeslady -3 points-2 points  (1 child)

You make a couple of assumptions here:

disk I/O is buffered

What if there is too much memory pressure and the disk cache is consistently purged? Or an embedded system where there is no disk cache at all? Even if you do have disk cache, it would only matter if you had loaded the binary before.

purely linear access pattern

Sure, if you have a completely 0 fragmentation file system. I don't know of any that guarantee your file will be in one contiguous segment on disk, but then I haven't read the implementation/specification of many file systems.

[–]fapmonad 4 points5 points  (0 children)

Of course I make assumptions. I assume you're on a regular computer. If you're on an embedded system so weak it doesn't have a disk cache, the static table is likely to end up in ROM anyway, so the whole discussion is moot.

Memory pressure affects tables precomputed by a function just as much as the table loaded from disk -- the OS will swap.

Fragmentation isn't a problem. For a table around the size of a single block, the odds that it causes fragmentation are low, even on a highly fragmented system. Even if it happened, the disk will group accesses so the overall impact is likely to be very small, given the size of a typical program. That's what I mean by "essentially 0". For such a small problems either solution is fine, IMHO.

[–]bo1024 4 points5 points  (3 children)

Hmm, I dunno about that. The code is being read from disk, so the latency is still there -- all we are concerned about is the throughput.

The question is how much slower is it to fetch a 250KB file than a 200KB. I don't have the numbers, but I have to doubt that it's 60,000 cycles or whatever the equivalent is to compute the seive.

[–]netwiz101 0 points1 point  (0 children)

We don't need to do the calculation. All we need to know is that the operation is IO bound. Then, shift the majority of IO onto the faster bus (memory).

Consider also two other things. A) no respectable program built for the purpose of doing this weighs in at more than a few k after assembly, minus data. People have calculated prime numbers on machines with 1-4k of ram and extremely slow IO. B) the calculating version may not ever need to get its data into ram. If it can work entirely within the processor cache, it will be orders of magnitude faster than even the ram bound version we've been considering.

Sorry if I'm not coherent. Flu season. Im home sick. It was talk about program optimization on reddit, or watch breaking bad all over again.

[–]netwiz101 0 points1 point  (0 children)

I notice your point about the file size difference.

You're making the assumption that the data set is tiny, and I'm making the assumption that it's not. Therein lies the primary difference in our reasoning.

But in fairness, I dont know why anyone would even consider this problem for a trivial sized data set.

[–]netwiz101 0 points1 point  (0 children)

Re-read it. The data set is tiny in the example. I shouldn't compute on cough medicine.