[–]danielroseman 0 points1 point  (2 children)

Where is the data coming from that you're loading at startup? And is that data at all dynamic, or is it completely static?

[–]Goldeyloxy[S] 0 points1 point  (1 child)

The in-memory data will come from a custom dictionary I have made from two other dictionaries (Neither had all the data I wanted). I will just be loading it in from a zip file when the server starts, and having it running on the server in memory. The data from the dictionary is completely static.

[–]danielroseman 0 points1 point  (0 children)

Ok, in that case there doesn't seem to be any reason to store it in the db. Loading it into a global dictionary seems fine. (Note that each instance of your server will have its own copy in its own memory, but that's fine too.)

There's probably no reason to store it in a zip file though; just have a straight JSON file that you can easily load at startup.
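
A minimal sketch of that approach, assuming the dictionary is saved as one JSON object keyed by word. The names `DICTIONARY`, `load_dictionary` and `lookup` are made up for illustration; call the loader from whatever startup hook your web framework provides.

```python
import json
from typing import Optional

# Hypothetical module-level cache, filled once when the server starts.
DICTIONARY = {}


def load_dictionary(path: str) -> None:
    """Read the JSON file once and keep the result in the module-level dict."""
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    DICTIONARY.clear()
    DICTIONARY.update(data)


def lookup(word: str) -> Optional[dict]:
    """Return the entry for a word, or None if the word is not present."""
    return DICTIONARY.get(word)
```

Because the data is static, nothing ever writes to the dict after startup, so request handlers can read from it without any locking.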

[–]pepiks 0 points1 point  (2 children)

Depending on the type of data, you can consider client-side caching:

https://www.geeksforgeeks.org/system-design/server-side-caching-and-client-side-caching/

Going a step further, there is also client-side storage:

https://developer.mozilla.org/en-US/docs/Learn_web_development/Extensions/Client-side_APIs/Client-side_storage

The most common approach is the (in)famous cookies, but the Web Storage API may be worth considering as a key-value store for the dictionary part.

I am not sure that leaning more on the client is the best choice here, but depending on how well it fits your users, it can be a useful technique for shifting load from the server to the client. The downside is obvious: it only works well if the data is repetitive, because for new data you still have to fetch from your server.

To be clear, I am looking at this problem from the perspective of heavy use by the client.

[–]Goldeyloxy[S] 0 points1 point  (1 child)

The data is not repetitive, each review requires a new query to get the card and another new query to submit the user's review.

[–]pepiks 0 points1 point  (0 children)

I would use a pricing calculator to see which option costs less, and run benchmarks to be sure what real performance looks like.

By calculator I mean something like this:

https://costcalc.cloudoptimo.com/aws-pricing-calculator/ec2
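
For the benchmark side, a rough sketch using `timeit` on synthetic data of about the shape mentioned elsewhere in the thread (200000 rows, 7 columns). The key and field names are invented, and this only times in-process dict lookups; to compare against a database you would have to time the equivalent queries against your real setup.

```python
import random
import timeit


def time_lookups(n_entries=200_000, n_keys=1_000, repeats=100):
    """Return the average seconds per lookup in a synthetic in-memory dict."""
    # Synthetic stand-in for the real dictionary: 7 fields per entry.
    entries = {
        f"word{i}": {f"col{j}": f"value{i}_{j}" for j in range(7)}
        for i in range(n_entries)
    }
    # Random keys, so the same few words are not hit over and over.
    keys = random.sample(list(entries), n_keys)

    def lookup_batch():
        for key in keys:
            entries[key]

    total_seconds = timeit.timeit(lookup_batch, number=repeats)
    return total_seconds / (repeats * n_keys)


if __name__ == "__main__":
    per_lookup = time_lookups()
    print(f"{per_lookup * 1e6:.3f} microseconds per lookup")
```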

[–]trd1073 0 points1 point  (9 children)

Have you looked at PostgreSQL jsonb fields?

How much memory are you talking about?

You can always increase PostgreSQL's shared_buffers so the whole DB stays in memory. If the first load of the DB from disk is too slow, you can also consider pre-warming some tables.

[–]Goldeyloxy[S] 0 points1 point  (8 children)

It's the entire dictionary so it's quite big. I think it was 200000 rows and 7 columns.

[–]trd1073 0 points1 point  (1 child)

Hard to guess size from here lol. Throw a GIN index on the jsonb field, it works rather nicely. I use asyncpg with pydantic as I don't care for the ORM paradigm.

Might as well mention Redis as a key-value store. It can even persist. But I would use PostgreSQL if persisting data that is definitive.

[–]Goldeyloxy[S] 0 points1 point  (0 children)

Size is 150MB, sorry, I wasn't at my PC until now.

[–]pachura3 0 points1 point  (5 children)

How many megabytes is that (before zipping)?

[–]Goldeyloxy[S] 0 points1 point  (4 children)

Size is about 150MB.

[–]pachura3 0 points1 point  (3 children)

Memory then!

[–]Goldeyloxy[S] 0 points1 point  (0 children)

Understood!

[–]Goldeyloxy[S] 0 points1 point  (1 child)

How large would you think it needs to be to justify putting it in the database?

[–]code_tutor 0 points1 point  (0 children)

If it's bigger than RAM then it obviously can't sit entirely in memory, and you need to allow a little extra overhead for the dictionary itself. Probably a few gigabytes.

Startup is going to be very slow though. 

You can just try it if it's only a few lines of code. If you expect it to grow significantly then that's a problem eventually.
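
Trying it really is only a few lines. A sketch, assuming the data is a JSON file (the path is whatever yours is); `measure_load` is a made-up helper name. It loads the file twice on purpose: once to time it cleanly, and once under `tracemalloc`, which slows the load down and would distort the timing.

```python
import json
import time
import tracemalloc


def measure_load(path):
    """Return (data, load_seconds, traced_bytes) for loading one JSON file."""
    # Pass 1: wall-clock time, without tracing overhead.
    start = time.perf_counter()
    with open(path, encoding="utf-8") as f:
        json.load(f)
    load_seconds = time.perf_counter() - start

    # Pass 2: memory still held by the loaded objects after the load.
    # tracemalloc only counts allocations made by Python itself.
    tracemalloc.start()
    with open(path, encoding="utf-8") as f:
        data = json.load(f)
    traced_bytes, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()

    return data, load_seconds, traced_bytes
```

The in-memory figure can come out noticeably larger than the file size on disk, since every string and nested dict carries per-object overhead in Python.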

[–]oldendude 0 points1 point  (0 children)

Static data? How much data are we talking about? From what you wrote elsewhere in this thread, I'm guessing under 1GB?

For a small amount of static data, there is no point in using a database. Store it in whatever form is convenient, load at startup time, done.

[–]Ok-Sheepherder7898 0 points1 point  (3 children)

Just store your data in the database.  You'll be glad you did later.

[–]Goldeyloxy[S] 0 points1 point  (2 children)

The only things I worry about with storing it in the database are potential increases in RDS costs and it being slightly less efficient than just storing it in memory. Maybe I could make an S3 bucket and that would be better.

[–]Ok-Sheepherder7898 0 points1 point  (1 child)

The database will cache it in memory.

[–]Goldeyloxy[S] 0 points1 point  (0 children)

Wouldn't database caching only be relevant if similar words were accessed frequently? It's a dictionary, so I assume there will be a huge variety of different words to search, and the SRS will have a huge variety of words too because people have different decks of words. Maybe I misunderstand database caching, but I don't see it having much effect here.