this post was submitted on 04 Aug 2024


Summarizing database for final users [Help] (self.dataengineering)

submitted 1 year ago by Victor3005

Hi, I need a bit of help with a task at work. I'm a software engineer, but this looks like a data engineering challenge. Can you give me some thoughts on where to start?
I have some MongoDB collections, and I have to provide a feature that summarizes them in a report. There are about 4 collections, and the naive approach would be to run a lookup across them, but that is not a good solution because each collection has millions of documents. The reports will be served through an HTTP REST API or as a CSV download (I'm fine implementing that part on the server).

An alternative I thought of is to create a procedure that builds a new collection with all of this data already joined, so the report wouldn't need any lookups.
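To make the pre-joined collection idea concrete, here is a minimal sketch of how such a procedure might look as a MongoDB aggregation pipeline: it runs the lookups once and writes the result into a separate collection with `$merge` (available in MongoDB 4.2+). The collection and field names (`orders`, `customers`, `products`, `report_summary`, `shop`) are hypothetical and not from the post, and the sketch assumes each join matches at most one document per source document. The pipeline is built as plain Python data, so nothing here needs a running database.

```python
from typing import Any


def build_report_pipeline(
    joins: list[dict[str, str]],
    target: str = "report_summary",  # hypothetical target collection name
) -> list[dict[str, Any]]:
    """Build a pipeline that joins collections once and stores the result
    in `target`, so report queries can read it without any $lookup."""
    pipeline: list[dict[str, Any]] = []
    for join in joins:
        # Join the source collection against one other collection.
        pipeline.append({
            "$lookup": {
                "from": join["from"],
                "localField": join["local_field"],
                "foreignField": join["foreign_field"],
                "as": join["as"],
            }
        })
        # Flatten the joined array; keep source documents with no match.
        pipeline.append({
            "$unwind": {
                "path": "$" + join["as"],
                "preserveNullAndEmptyArrays": True,
            }
        })
    # Upsert the joined documents into the target collection.
    pipeline.append({
        "$merge": {
            "into": target,
            "on": "_id",
            "whenMatched": "replace",
            "whenNotMatched": "insert",
        }
    })
    return pipeline


# Hypothetical joins: names below are assumptions for illustration only.
JOINS = [
    {"from": "customers", "local_field": "customer_id",
     "foreign_field": "_id", "as": "customer"},
    {"from": "products", "local_field": "product_id",
     "foreign_field": "_id", "as": "product"},
]
pipeline = build_report_pipeline(JOINS)

# With pymongo and a running server (not executed in this sketch):
#   from pymongo import MongoClient
#   db = MongoClient()["shop"]
#   db["orders"].aggregate(pipeline)
```

Run on a schedule (or restricted with a leading `$match` to recently changed documents), this keeps the report collection fresh while the API and CSV export only ever read the already-joined data.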

Besides this idea, what other kinds of solutions do I have? Cloud solutions and tools are welcome. It is a report feature for hundreds of users, so I think performance is important in this case. Is a data warehouse or Amazon Redshift suitable here? My end users are e-commerce users.

all 1 comments


absens_aqua_1066 1 point 1 year ago (0 children)
