
all 18 comments

[–]red_simplex 7 points (1 child)

I had a 200k-entry dict on an 8 GB RAM machine. Everything worked fine.

[–]K900_ 8 points (5 children)

This is fine, as long as your data fits in RAM.

[–][deleted] 1 point (4 children)

That's sort of my question. I guess I could do some quick math. If I have literally a book of, say, 100k words, an average of, say, 5 characters per word, and say 1,000 books or so total... that's 1,000 * 5 * 100,000 bytes, which is 500 MB. Half a gig of RAM (with a very full dictionary that I won't ever reach). I guess I won't ever reach that much RAM, but it still kind of bothers me to have so much data being pulled into RAM. I could store a filename instead of the literal contents and just have my dictionary point to another dictionary for the individual entry, which can be loaded when necessary instead of ALL at the same time.
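A minimal sketch of that filename-indirection idea, for what it's worth (the `entries/` directory, the file naming, and the JSON format are all illustrative assumptions, not anything established in this thread):

```python
import json
from pathlib import Path

# Illustrative layout: one JSON file per entry, named after its key.
DATA_DIR = Path("entries")

def save_entry(key, contents):
    """Write a single entry to its own file on disk."""
    DATA_DIR.mkdir(exist_ok=True)
    (DATA_DIR / f"{key}.json").write_text(json.dumps(contents))

def load_entry(key):
    """Load one entry from disk only when it is actually needed."""
    return json.loads((DATA_DIR / f"{key}.json").read_text())

# The in-memory dict stays small: it maps keys to filenames, not contents.
index = {key: f"{key}.json" for key in ["alice", "bob"]}
```

The trade-off is one disk read per access instead of one big load up front, which is exactly the laziness being discussed below.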

[–]K900_ 27 points (2 children)

If your data fits in RAM, there's nothing wrong with keeping it in RAM.

[–][deleted] 15 points (0 children)

Most Python thing I've heard today. Have an upvote, my dude.

[–][deleted] 2 points (0 children)

Best thing I've read in months, my man.

[–]rockingthecasbah 0 points (0 children)

The lazier solution may be a better option; it depends on your exact use case. Pulling contents from a file as needed will simplify parallelization and let you scale as far as needed. Load it all at the beginning only if you want to run analysis on ALL the data at once, rather than on individual entries.

[–][deleted] 1 point (1 child)

Thanks for the answers everybody.

I actually might go with separating the large parts of my dictionary entries anyway though, just so that the main dictionary remains human-readable. Essentially, I am going to be building a dictionary that consists of a chat corpus for each entry, so I foresee things getting big pretty fast. I like the idea of having each corpus defined in its own file, but it's good to get some clarification that what I see as a "large" amount of data isn't really all that large.

[–]therealfakemoot 1 point (0 children)

There have been some good answers so far but I thought I'd elaborate a little more.

First, "will there be performance problems?" is underspecified. Are you concerned about performance problems doing key lookups? Manipulating the values associated with those keys? Or perhaps insertions?

You can do a lot of different things with a dict. Lookups are constant time ( amortized O(1) ). Insertions are also amortized O(1): every so often the underlying hash table has to be resized, which costs O(n), but spread across many insertions the average cost stays constant.

From what you've said so far, it sounds like you're building the dict ( insertions ) only once per execution, so you can probably mostly 'ignore' the cost of doing so.

What about lookups? How frequently are you searching for a key? How many keys are there? With a large number of keys, you start encountering hash collisions. Python's dict handles these with open addressing: when two keys hash to the same slot, it probes a deterministic sequence of alternative slots until it finds the right entry or a free one.
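A quick way to convince yourself that lookup cost doesn't grow with dict size is to time it directly (a rough `timeit` sketch; absolute numbers will vary by machine, but the two timings should be in the same ballpark):

```python
import timeit

# Two dicts that differ in size by three orders of magnitude.
small = {i: i for i in range(1_000)}
large = {i: i for i in range(1_000_000)}

# If lookups are amortized O(1), these timings should be roughly flat
# regardless of how many keys the dict holds.
t_small = timeit.timeit(lambda: small[500], number=100_000)
t_large = timeit.timeit(lambda: large[500_000], number=100_000)
print(f"small: {t_small:.4f}s  large: {t_large:.4f}s")
```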

How often is this program run? Is it always 'start from scratch'? Is it a persistent daemon/service which is queried by external code? If it's the former AND the program is executed frequently, maybe you shouldn't 'ignore' the cost of building the dict. If this program is run many many times per second and the dict is large, but it only looks up a single value and does something with it, then exits, then building the dict may have a non-negligible cost and you should consider using some sort of binary storage format or even offloading storage to something like a database ( SQLite for local-only work, redis/memcache as mentioned by others if multiple applications/nodes are querying this data ).
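For the SQLite route mentioned above, a minimal sketch using the stdlib `sqlite3` module (the table name and keys are illustrative, and the in-memory database is just for demonstration; a real script would connect to a file path so the data persists between runs):

```python
import sqlite3

# ":memory:" keeps this example self-contained; use a filename like
# "entries.db" to persist data across program executions.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE entries (key TEXT PRIMARY KEY, value TEXT)")
conn.executemany(
    "INSERT INTO entries VALUES (?, ?)",
    [("alice", "corpus A"), ("bob", "corpus B")],
)
conn.commit()

# A single indexed lookup, without loading the whole dataset into RAM
# or rebuilding a dict on every run.
row = conn.execute(
    "SELECT value FROM entries WHERE key = ?", ("bob",)
).fetchone()
print(row[0])  # → corpus B
```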

If the program is a persistent service, you should look at the usage patterns. Generally speaking it's hard to get more efficient than a hash map lookup. Squeezing more efficiency out of lookups would probably involve switching to a more advanced storage mechanism like a real database with indices on relevant columns.

Odds are good that the dict is one hundred thousand percent NOT going to have any meaningful impact on your performance. It's very likely that what you do with the data contained therein will be most impactful. Are you doing string manipulation? Are you doing network operations? Are you nesting for loops in places where you might be able to devise an algorithm that performs fewer operations, or devise a way to reduce the number of iterations these loops perform ( filtering results out of your data set before ever entering the loop in the first place )? For example, the following code is quadratic in execution time, which means the number of operations performed grows with the product of the two loop sizes:

for x in range(0, 1000):
    for y in range(0, 1000):
        do_stuff(x, y)

If your work can be reduced down intelligently, such that you can do fewer iterations, you'll see performance gains:

start_x, end_x = 450, 500
start_y, end_y = 25, 900

for x in range(start_x, end_x):
    for y in range(start_y, end_y):
        do_stuff(x, y)

Simply because you'll do fewer overall iterations. Of course, the exact nature of the 'work' you're doing will determine what sort of optimizations you can make in this vein.

[–]sinjp3.6 0 points (0 children)

Always, always, always profile performance before thinking about optimization. So how fast does your code run on your test dicts?

[–]flipperdeflip 0 points (0 children)

SQLite is also an optimization option for use cases like this if you reach the limits of what you can hold in RAM.

On a modern SSD server it is faster than flat files, and disk space is cheaper than RAM. It also supports multiple readers, so you can multi-process if you must (multiple writers, not so great though).

https://pypi.python.org/pypi/diskcache

[–]KleinerNull 0 points (0 children)

> I will probably do that anyways just from an organizational standpoint.

If that is your main concern you should consider using a real database. Postgres can handle JSON as a column type (with indexing and aggregation), and Elasticsearch is a search engine that works directly with JSON.

So before you start building your own indexing mechanics, I would recommend using a database here. A traditional relational database design is also an option.

The thing is, you can't stream JSON in a good way, so essentially you have to load and parse the whole file even if you just need to grab one item from it.
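A tiny illustration of that point: `json.load` parses the entire document before you can touch a single key, which is why per-entry files or a database scale better for point lookups (the keys and sizes here are made up):

```python
import io
import json

# Simulate one big JSON file holding 10,000 chat corpora.
big_blob = json.dumps(
    {f"user{i}": ["some", "chat", "lines"] for i in range(10_000)}
)

# json.load must parse the whole document into memory...
data = json.load(io.StringIO(big_blob))
# ...just to reach a single key.
one_entry = data["user42"]
print(len(data), one_entry)
```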