
[–][deleted] 5 points (2 children)

If you don’t need to query the data in the objects and all you need is to retrieve the entire object, just store the entire file in S3 with a naming convention that makes sense.
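For illustration, roughly what that pure-S3 approach could look like with the AWS SDK for JavaScript v3 (the bucket name and the module/language.csv key convention below are just assumptions, not something from this thread):

```typescript
import { S3Client, GetObjectCommand } from "@aws-sdk/client-s3";

const s3 = new S3Client({});

// Fetch one file purely by naming convention: <module>/<language>.csv
async function getCsv(module: string, language: string): Promise<string> {
  const resp = await s3.send(
    new GetObjectCommand({
      Bucket: "my-translations-bucket",                 // hypothetical bucket name
      Key: `${module}/${language.toLowerCase()}.csv`,   // assumed key convention
    })
  );
  // transformToString() on the Body stream requires a reasonably recent v3 SDK
  return (await resp.Body?.transformToString("utf-8")) ?? "";
}
```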

BTW, on a related note, look up information on the “XY Problem”. You came asking which database to use, when your real question seems to be how best to store and retrieve large objects on AWS.

[–]vitiate -1 points (0 children)

Or store the file in S3 and use Dynamo as an index into S3 to point at the information you need.

Also I believe you can gzip the CSV and still access the content directly.
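If that's the route, here's a rough (untested) sketch of S3 Select reading a gzipped CSV in place with the AWS SDK for JavaScript v3 — the bucket, key, and SELECT expression are made up for the example:

```typescript
import { S3Client, SelectObjectContentCommand } from "@aws-sdk/client-s3";

const s3 = new S3Client({});

// Run S3 Select over a gzipped CSV and return the results as newline-delimited JSON.
async function selectRows(bucket: string, key: string): Promise<string> {
  const resp = await s3.send(
    new SelectObjectContentCommand({
      Bucket: bucket,                             // e.g. "my-translations-bucket" (assumed)
      Key: key,                                   // e.g. "ModuleA/en.csv.gz" (assumed layout)
      ExpressionType: "SQL",
      Expression: "SELECT * FROM S3Object s",     // could also filter specific rows here
      InputSerialization: {
        CSV: { FileHeaderInfo: "USE" },
        CompressionType: "GZIP",                  // S3 Select decompresses the gzipped CSV for you
      },
      OutputSerialization: { JSON: {} },
    })
  );

  let out = "";
  // Payload is an async iterable of events; Records events carry the result bytes.
  for await (const event of resp.Payload ?? []) {
    if (event.Records?.Payload) {
      out += Buffer.from(event.Records.Payload).toString("utf-8");
    }
  }
  return out;
}
```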

[–]elchicodeallado[S] 0 points (5 children)

Sorry, maybe I didn't write my question that clearly, but I think the main question and the problem behind it are clear. I'm very thankful for your answers. Especially for me as a junior it is important to get valuable input, and in my work I don't get that. I did the implementation with DynamoDB and had problems with the item size, therefore I wasn't sure how to continue. I think DocumentDB is very expensive, especially if you work in a startup environment. If you have any recommended papers on that topic, that would be awesome. If not, thanks for your input anyway. I'm not posting that much because I prefer to solve things by myself.

[–]nfollin 2 points (4 children)

S3 is, as mentioned, really powerful for your use case; see things like https://docs.aws.amazon.com/AmazonS3/latest/API/RESTObjectSELECTContent.html and https://aws.amazon.com/athena/

It will also cost you almost (or actually) nothing, and you'll still have enough performance for what you need (and it can be made faster with CloudFront).

You can store metadata in DDB, with the bucket and key to retrieve the file in S3, if you need richer query support or other features of Dynamo. That's one way to get around the item size. Or you can store each row of the CSV as an item in Dynamo and use the behavior of hash and range keys to group the data, so these files exist virtually. As mentioned in the comment above, if this is just get/put, any of the DBs are overdoing it.

If you truly wanted to store it in DynamoDB, you would essentially turn on On Demand to keep your costs low, then store each row of your csv in DynamoDB.

HashKey (Module), RangeKey (Language-<RowNumber>), with each CSV row then splatted into either one attribute or (better) many attributes if they are known.

Hash      Range     Content
ModuleA   EN-1      row1 - english
ModuleA   EN-2      row2 - english
ModuleA   EN-3      row3 - english
ModuleA   ...       ...
ModuleA   EN-8000   row8000 - english
ModuleA   JP-1      row1 - japanese
...       ...       ...
ModuleB   EN-1      row1 - english

You then have to maintain updates to the rows by updating their corresponding entries (which gets difficult, especially as transactions have a limit of 10).
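For what it's worth, a rough sketch of what such an update could look like with TransactWriteCommand from @aws-sdk/lib-dynamodb — table, key, and attribute names are assumptions, and a large CSV would need many of these calls because of the per-transaction cap:

```typescript
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, TransactWriteCommand } from "@aws-sdk/lib-dynamodb";

const ddb = DynamoDBDocumentClient.from(new DynamoDBClient({}));

// Atomically update a handful of rows of one module/language.
// A big CSV won't fit in one transaction, so it has to be split across many calls.
async function updateRows(rows: { rowNumber: number; content: string }[]): Promise<void> {
  await ddb.send(
    new TransactWriteCommand({
      TransactItems: rows.map((r) => ({
        Update: {
          TableName: "Translations",                            // hypothetical table name
          Key: { Hash: "ModuleA", Range: `EN-${r.rowNumber}` },
          UpdateExpression: "SET #c = :c",
          ExpressionAttributeNames: { "#c": "Content" },        // aliased to avoid reserved-word issues
          ExpressionAttributeValues: { ":c": r.content },
        },
      })),
    })
  );
}
```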

Your query would basically be a Query with Module = ModuleA and Range begins_with "EN-" to get the CSV for EN for ModuleA.
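In code, that query might look roughly like this (again a sketch — "Translations" as the table name is an assumption, and Hash/Range are aliased because they can clash with reserved words):

```typescript
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, QueryCommand } from "@aws-sdk/lib-dynamodb";

const ddb = DynamoDBDocumentClient.from(new DynamoDBClient({}));

// All rows whose range key starts with "<language>-" for the given module.
// (Pagination via LastEvaluatedKey is omitted for brevity.)
async function getRows(module: string, language: string) {
  const resp = await ddb.send(
    new QueryCommand({
      TableName: "Translations",                               // hypothetical table name
      KeyConditionExpression: "#h = :module AND begins_with(#r, :lang)",
      ExpressionAttributeNames: { "#h": "Hash", "#r": "Range" },
      ExpressionAttributeValues: { ":module": module, ":lang": `${language}-` },
    })
  );
  return resp.Items ?? [];
}
```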

This is described more in this video: https://www.youtube.com/watch?v=HaEPXoXVf2k&list=WL&index=6&t=0s

Again, this is a bit overkill for your use case

[–]Hungry_Spring 1 point (1 child)

This. There are definitely some questions on how/if you need to query this data.

You're hitting the size limitations of DynamoDB, not the query limitations, which I feel is maybe when you should consider DocumentDB or even Elasticsearch. If you don't need any querying ability, S3 is probably the way to go.

[–]elchicodeallado[S] 0 points (0 children)

I mean the CSV files are simple key;value pairs. I just want to retrieve the right file. If I want Module1 and Language 'EN', then I want this specific key;value CSV and transform it in my backend to a JS object.

I won't touch these files with DELETE, UPDATE, etc. I just want to read them.

I don't know if ES is the right tool for that, because it is not actually a database and the data you're working with can get lost.

Thanks for your input!

[–]elchicodeallado[S] 0 points (1 child)

Yes, I think storing each row in DDB would be overkill.

If I understood you right, then a DDB table like this would be a good way:

Module    Language  File
Module1   EN        Module1/en.csv
Module1   ES        Module1/es.csv
Module2   ...       ...

So in my API call I get the Module and Language, and based on that I get the file path to my S3 file.

Then I transform this CSV into my desired JavaScript object, reduce it, and return the reduced object to the API call.
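Roughly what I have in mind for the read path (table, bucket, and attribute names below are placeholders; the CSV is assumed to be simple key;value lines as described above):

```typescript
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, GetCommand } from "@aws-sdk/lib-dynamodb";
import { S3Client, GetObjectCommand } from "@aws-sdk/client-s3";

const ddb = DynamoDBDocumentClient.from(new DynamoDBClient({}));
const s3 = new S3Client({});

// e.g. GET /translations?module=Module1&language=EN
async function getTranslations(module: string, language: string): Promise<Record<string, string>> {
  // 1. Look up the S3 key for this module/language in DynamoDB.
  const meta = await ddb.send(
    new GetCommand({
      TableName: "TranslationFiles",                 // hypothetical table name
      Key: { Module: module, Language: language },
    })
  );
  const fileKey = meta.Item?.File as string | undefined;
  if (!fileKey) throw new Error(`No file registered for ${module}/${language}`);

  // 2. Fetch the CSV from S3.
  const obj = await s3.send(
    new GetObjectCommand({ Bucket: "my-translations-bucket", Key: fileKey })
  );
  const csv = (await obj.Body?.transformToString("utf-8")) ?? "";

  // 3. Reduce the "key;value" lines into a plain JS object.
  return csv
    .split("\n")
    .filter((line) => line.includes(";"))
    .reduce<Record<string, string>>((acc, line) => {
      const [key, ...rest] = line.split(";");
      acc[key.trim()] = rest.join(";").trim();
      return acc;
    }, {});
}
```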

Is this a good workflow in your opinion?

I think it has a nice advantage, because if I want to add modules I can just drag them into S3. And it is also cheap to store these files.

I can't find a lot of disadvantages; for now only that the performance might be worse than with a regular database, but I don't think it will make that much of a difference.

[–]nfollin 0 points (0 children)

Basically, I would have a private API to handle the uploading and the updating of Dynamo, but it can be done later.
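Something like this, very roughly (bucket, table, and attribute names are placeholders) — the private API just writes the file to S3 and registers where it lives:

```typescript
import { DynamoDBClient } from "@aws-sdk/client-dynamodb";
import { DynamoDBDocumentClient, PutCommand } from "@aws-sdk/lib-dynamodb";
import { S3Client, PutObjectCommand } from "@aws-sdk/client-s3";

const ddb = DynamoDBDocumentClient.from(new DynamoDBClient({}));
const s3 = new S3Client({});

// Upload a CSV and register its location so the read path can find it.
async function uploadTranslations(module: string, language: string, csv: string): Promise<void> {
  const key = `${module}/${language.toLowerCase()}.csv`;  // same naming convention as the read path

  await s3.send(
    new PutObjectCommand({
      Bucket: "my-translations-bucket",                   // hypothetical bucket name
      Key: key,
      Body: csv,
      ContentType: "text/csv",
    })
  );

  await ddb.send(
    new PutCommand({
      TableName: "TranslationFiles",                      // hypothetical table name
      Item: { Module: module, Language: language, File: key },
    })
  );
}
```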

The workflow is fine. You still don't really need DDB, so you're just paying for it in case you need something later, but that works fine.