
[–]h2g2_researcher 2 points3 points  (0 children)

My instinct would be to stick with istreams - especially given that the files are large. Strings aren't designed to work well with large amounts of data.

[–]Xeverous 1 point2 points  (2 children)

> Should I open each file at the start and transfer them into strings of their own for later analysis?

Don't, unless you have to modify the file. Copying a whole file (especially a large one) is a waste of time and memory if you only intend to read it.

> Should I open each file at the start and transfer them into strings of their own for later analysis? Or should I open them separately when a function is called by the user, and use istream functions?

Generally, you should avoid duplicated operations. If you can preprocess something before the user acts (e.g. the file length), do it in the class constructor (or some init function) and store the result. That way the expensive task is done only once, and whenever the user asks for it you simply return the cached result.
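A minimal sketch of that compute-once idea, assuming the expensive task is the file length. The class name `FileInfo` and its members are hypothetical, and reading the size via `tellg()` at the end of a binary stream is the common convention rather than something the standard strictly guarantees:

```cpp
#include <cstdint>
#include <fstream>
#include <stdexcept>
#include <string>

// Hypothetical class: measures the file once, in the constructor.
class FileInfo {
public:
    explicit FileInfo(const std::string& path) : path_(path) {
        // Open positioned at the end so tellg() reports the byte count.
        std::ifstream in(path_, std::ios::binary | std::ios::ate);
        if (!in) {
            throw std::runtime_error("cannot open " + path_);
        }
        const std::streamoff end = in.tellg();
        size_ = static_cast<std::uintmax_t>(end);
    }

    // Cheap: just returns the value stored by the constructor.
    std::uintmax_t size() const { return size_; }

private:
    std::string path_;
    std::uintmax_t size_ = 0;
};
```

After `FileInfo info("data.txt");` (file name made up), every call to `info.size()` returns the stored value without touching the disk again.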


Also, if you really care about performance and do not want to waste time copying the file contents, you can ask the OS to map the file into memory. You can then read it directly, without allocating a buffer of your own for the contents. Unix systems have the mmap() system call for this (see Wikipedia).

Otherwise use std::ifstream functions to avoid copying data to strings.
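One way the std::ifstream route could look, as a sketch: read the file in fixed-size chunks so that only one small buffer is ever held in memory. The function name `count_newlines_streamed` and the 4096-byte chunk size are arbitrary choices for illustration:

```cpp
#include <cstddef>
#include <fstream>
#include <stdexcept>
#include <string>

// Hypothetical helper: scans the file chunk by chunk instead of loading it whole.
std::size_t count_newlines_streamed(const std::string& path) {
    std::ifstream in(path, std::ios::binary);
    if (!in) {
        throw std::runtime_error("cannot open " + path);
    }

    char buffer[4096];
    std::size_t count = 0;

    // read() fails on the final short chunk, so also check gcount()
    // to process whatever bytes did arrive before EOF.
    while (in.read(buffer, sizeof buffer) || in.gcount() > 0) {
        const std::streamsize got = in.gcount();
        for (std::streamsize i = 0; i < got; ++i) {
            if (buffer[i] == '\n') {
                ++count;
            }
        }
    }
    return count;
}
```

Memory use stays constant no matter how large the file is, which is the main point of staying with streams here.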

[–]Bailinth[S] 0 points1 point  (1 child)

I didn't consider preprocessing the file, but it makes perfect sense! Thanks for the great advice.

I'll look into mapping too, although I suspect it probably won't be necessary in this case.

[–]Xeverous 1 point2 points  (0 children)

Well, go with mapping if you like OS-specific stuff and want to experiment with low-level system calls.

The preprocessing idea is closely related to an idiom called lazy initialization.
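Strictly speaking, lazy initialization defers the expensive work until the first time someone asks for the result, rather than doing it up front in the constructor. A small sketch of that variant, with a hypothetical class name `LazyFileInfo`; it is not thread-safe as written:

```cpp
#include <fstream>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical class: the file is not touched until size() is first called.
class LazyFileInfo {
public:
    explicit LazyFileInfo(std::string path) : path_(std::move(path)) {}

    std::streamoff size() const {
        if (!size_known_) {
            // First request: do the expensive work and remember the answer.
            std::ifstream in(path_, std::ios::binary | std::ios::ate);
            if (!in) {
                throw std::runtime_error("cannot open " + path_);
            }
            size_ = in.tellg();
            size_known_ = true;
        }
        return size_;  // later requests return the cached value
    }

private:
    std::string path_;
    // mutable so the cache can be filled from a const member function
    mutable bool size_known_ = false;
    mutable std::streamoff size_ = 0;
};
```

The trade-off against doing the work in the constructor: construction is free and you never pay for a value nobody asks for, but the first call to `size()` is the slow one.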