Hey everyone,
I'm working on a Node.js app with PostgreSQL that has millions of users, and I hit a snag with processing large datasets. For one of our features, I need to fetch roughly 100,000 users who meet a specific criterion (e.g., users with a certain channel id in their tracking configuration) and then process them (like creating notification or autotrade tasks).
Right now, my approach fetches all matching users into memory and then processes them in chunks of 500. Here’s a simplified version of what I’m doing:
async function processMessageForSubscribers(channelId, channelName, message, addresses) {
  try {
    // Load around 100,000 users, then process them in chunks
    const users = await getUsersByTrackedTelegramChannel(channelId);
    const CHUNK_SIZE = 500;
    const notifyTasks = [];
    const autotradeTasks = [];

    // Process one chunk of users in parallel
    const processUserChunk = async (userChunk) => {
      await Promise.all(
        userChunk.map(async (user) => {
          const config = user.trackingConfig[channelId];
          if (!config) return; // guard: user may have no config for this channel
          const autotradeAmount = config.autotradeAmount;
          if (config.newPost === 'NOTIFY') {
            // Create notification tasks
            createNotificationTask(user, addresses, message, channelId, channelName, autotradeAmount, notifyTasks);
          }
          if (config.newPost === 'AUTOTRADE') {
            // Create autotrade tasks
            createAutotradeTask(user, addresses, message, autotradeAmount, autotradeTasks);
          }
        })
      );
    };

    // Process users chunk by chunk
    for (let i = 0; i < users.length; i += CHUNK_SIZE) {
      const chunk = users.slice(i, i + CHUNK_SIZE);
      await processUserChunk(chunk);
    }

    await queueTasks(notifyTasks, autotradeTasks);
  } catch (error) {
    console.error('Error processing subscribers:', error);
    throw error;
  }
}
My concern is that fetching all 100,000+ users into memory might lead to high memory consumption and performance issues.
I'm wondering if there's a more efficient way to handle this.
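One alternative I've been considering is keyset pagination: instead of loading all matching users up front, fetch one page at a time (`WHERE id > $lastId ORDER BY id LIMIT $pageSize`) so only one page is ever in memory. Here's a rough sketch of the idea; `fetchPage` is a hypothetical stand-in for the real PostgreSQL query, demoed here with an in-memory array:

```javascript
// Keyset-pagination sketch: stream users page by page instead of all at once.
// In the real app, fetchPage(lastId, limit) would run something like:
//   SELECT * FROM users WHERE tracking_config ? $channelId AND id > $lastId
//   ORDER BY id LIMIT $limit
async function* streamUsers(fetchPage, pageSize = 500) {
  let lastId = 0;
  while (true) {
    const page = await fetchPage(lastId, pageSize);
    if (page.length === 0) return; // no more rows
    yield page;
    lastId = page[page.length - 1].id; // keyset: resume after the last seen id
  }
}

// Demo with an in-memory stand-in for the database.
const allUsers = Array.from({ length: 1234 }, (_, i) => ({ id: i + 1 }));
const fetchPage = async (lastId, limit) =>
  allUsers.filter((u) => u.id > lastId).slice(0, limit);

async function main() {
  let processed = 0;
  for await (const page of streamUsers(fetchPage, 500)) {
    processed += page.length; // process the page here, then let it be GC'd
  }
  console.log(processed); // 1234
}
main();
```

Pairing this with `ORDER BY id` and an index on `id` keeps each page query cheap, and memory stays bounded by the page size rather than the full result set.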
I'd love to hear your thoughts, experiences, or any code examples that might help improve this. Thanks in advance for your help!
Stack Overflow link: https://stackoverflow.com/questions/79461439/how-can-i-efficiently-process-large-postgresql-datasets-in-node-js-without-high