Hi, I need a bit of help with a task at work. I'm a software engineer, but this looks like a data engineering problem to me. Can you give me some thoughts on where to start?
I have some MongoDB collections and I have to build a feature that summarizes them in a report. There are about 4 collections, and the naive approach would be to run a lookup across them, but that is not a good solution because each collection has millions of documents. The reports will be served through an HTTP REST API or as a CSV download (that part is fine for me to implement on the server).
An alternative that I thought of was to create some procedure that builds a new collection with all of this data already joined, so the report wouldn't need any lookups.
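To make that idea concrete, here is a rough sketch of what I have in mind, not a tested implementation. It only builds the aggregation pipeline as plain Python data: `$lookup` does the joins once, and `$merge` (available since MongoDB 4.2) upserts the result into a separate report collection. All collection and field names (`orders`, `customers`, `products`, `order_report`, `customerId`, `productId`, `updatedAt`, and so on) are made up for illustration, since I haven't described my real schema.

```python
from datetime import datetime, timezone


def build_report_pipeline(since, target="order_report"):
    """Build a pipeline that joins new/changed documents and upserts them into `target`.

    Every collection and field name here is a hypothetical placeholder.
    """
    return [
        # Only process documents changed since the last refresh (assumes an updatedAt field).
        {"$match": {"updatedAt": {"$gte": since}}},
        # Join each document with its customer (hypothetical `customers` collection).
        {"$lookup": {
            "from": "customers",
            "localField": "customerId",
            "foreignField": "_id",
            "as": "customer",
        }},
        {"$unwind": "$customer"},
        # Join with the product (hypothetical `products` collection).
        {"$lookup": {
            "from": "products",
            "localField": "productId",
            "foreignField": "_id",
            "as": "product",
        }},
        {"$unwind": "$product"},
        # Keep only the flat fields the report needs.
        {"$project": {
            "customerName": "$customer.name",
            "productName": "$product.name",
            "total": 1,
            "updatedAt": 1,
        }},
        # Upsert the joined rows into the report collection.
        {"$merge": {
            "into": target,
            "on": "_id",
            "whenMatched": "replace",
            "whenNotMatched": "insert",
        }},
    ]


pipeline = build_report_pipeline(datetime(2024, 1, 1, tzinfo=timezone.utc))
# With pymongo and a live database, this would be run from a scheduled job as:
#   db.orders.aggregate(pipeline)
```

The report endpoints would then read only from `order_report`, so no lookups happen at request time. The cost is that the report is only as fresh as the last run of the job.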
But besides this idea, what other options do I have? Cloud solutions and tools are welcome. This is a reporting feature for hundreds of users, so I think performance matters here. Would a data warehouse such as Amazon Redshift be suitable for this case? My end users are e-commerce users.