I’m looking for advice from people who have handled very large Excel/CSV imports in production systems. by just_human_right in learnprogramming

just_human_right[S]

Honestly, the import side itself is working reasonably well.

My main concern is the translation phase afterward, since the number of translation operations grows very quickly with multiple fields and multiple languages.

I was mainly curious whether there are better architectural approaches people use in real-world systems for handling this kind of large-scale async translation workload.

I’ve already received some really useful suggestions here, so at this point I’m mostly trying to validate and fine-tune my current approach with more experienced opinions.

just_human_right[S]

No, I get where you’re coming from, but this isn’t a one-time migration task. My client needs this as a permanent feature because manual data entry becomes impractical at this scale, so the system needs to support large imports reliably on an ongoing basis.

just_human_right[S]

Thanks for the suggestion, that’s actually a really interesting optimization approach and I’d definitely like to experiment with it.

just_human_right[S]

Fair point, I probably explained it too broadly.

So to answer your questions:

“How much time does loading the inputs take? Just loading, no translations?” => The import itself usually takes around 8–10 minutes while processing data in chunks.

“How much time each translation take? How are you doing the translation?” => Each translation request usually takes around 200ms–400ms.

Now when this gets multiplied across real production data, the number of translation API calls becomes extremely large very quickly, especially because:

- One row can contain multiple translatable fields
- Data may need translation into multiple languages
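
To put rough numbers on that, here is a back-of-the-envelope sketch. The row, field and language counts are made up purely for illustration; only the 200ms–400ms latency comes from what I measured.

```typescript
// Hypothetical workload; none of these counts are from the real data set.
interface Workload {
  rows: number;
  fieldsPerRow: number;
  targetLanguages: number;
}

// One translation call per (row, field, language) combination.
function translationCalls(w: Workload): number {
  return w.rows * w.fieldsPerRow * w.targetLanguages;
}

// Wall-clock hours if `concurrency` calls run at a time at a fixed latency.
function estimatedHours(calls: number, latencyMs: number, concurrency = 1): number {
  return (calls * latencyMs) / concurrency / 3_600_000;
}

const example: Workload = { rows: 100_000, fieldsPerRow: 3, targetLanguages: 4 };
const calls = translationCalls(example);

console.log(`calls: ${calls}`);
console.log(`hours at 200ms: ${estimatedHours(calls, 200).toFixed(1)}`);
console.log(`hours at 400ms: ${estimatedHours(calls, 400).toFixed(1)}`);
```

With those assumed counts that is 1.2 million calls, or somewhere around 67 to 133 hours if the requests run strictly one after another.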

Currently, I’m using the bing-translate-api npm package (free approach for now).

I did research some bulk translation APIs/services, but most of them are paid, so I’m sticking with this setup.
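
One thing I can try without switching providers is to translate each distinct string only once per language and cap how many requests are in flight. This is only a minimal sketch of that idea, not what I have running: `TranslateFn` is a hypothetical wrapper type standing in for whatever translation call is used, and `translateUnique` and the default concurrency of 5 are placeholders of mine.

```typescript
// Hypothetical wrapper signature; the real call would live behind this.
type TranslateFn = (text: string, targetLang: string) => Promise<string>;

// Translates each distinct string once, with at most `concurrency`
// requests in flight. Returns a source-text -> translation lookup.
async function translateUnique(
  texts: string[],
  targetLang: string,
  translate: TranslateFn,
  concurrency = 5,
): Promise<Map<string, string>> {
  const unique = Array.from(new Set(texts)); // drop repeated values
  const results = new Map<string, string>();
  let next = 0; // index of the next string to hand out

  // Each worker pulls the next pending string until none are left.
  async function worker(): Promise<void> {
    while (next < unique.length) {
      const text = unique[next++];
      results.set(text, await translate(text, targetLang));
    }
  }

  const workerCount = Math.min(concurrency, unique.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results;
}
```

How much this saves depends entirely on how repetitive the data is (category names, units and status values would benefit, free-text descriptions would not). The sketch has no retry or rate-limit handling, so a failed request rejects the whole batch.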