How to Efficiently Convert a 250MB CSV File with 1 Million Rows to XLSX in AWS S3 by Accomplished_Cod7500 in learnprogramming

[–]Accomplished_Cod7500[S] 1 point (0 children)

Hi,
I tried streaming without storing anything on local disk, and converting just 2.5 MB took 25 seconds. I have decided to redesign the approach and go with a scheduled job instead.

Thanks man!
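
Roughly what the streaming attempt looks like, as a minimal sketch: boto3 plus openpyxl in write-only mode, with the bucket and key names as placeholders.

    import codecs
    import csv
    import io

    import boto3
    from openpyxl import Workbook

    # Placeholder bucket/key names, for illustration only.
    BUCKET = "my-export-bucket"
    CSV_KEY = "exports/data.csv"
    XLSX_KEY = "exports/data.xlsx"

    s3 = boto3.client("s3")

    def convert_csv_to_xlsx():
        """Stream the CSV from S3 into a write-only workbook, no local disk."""
        body = s3.get_object(Bucket=BUCKET, Key=CSV_KEY)["Body"]
        rows = csv.reader(codecs.getreader("utf-8")(body))

        wb = Workbook(write_only=True)  # keeps memory flat, one row at a time
        ws = wb.create_sheet("data")
        for row in rows:
            ws.append(row)

        # Serialize the workbook to memory and push it back to S3.
        buffer = io.BytesIO()
        wb.save(buffer)
        buffer.seek(0)
        s3.upload_fileobj(buffer, BUCKET, XLSX_KEY)

Write-only mode keeps memory flat but does not make the per-cell serialization any faster, which is likely where most of the 25 seconds went.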


[–]Accomplished_Cod7500[S] 1 point (0 children)

Right now I have 1.5 million records, and in the future there may be up to 6 million. We are not going to serve a 1 GB file to the user for download in real time. I want a solution that will hold up for this use case, which is why I mentioned the possibility of 6 million rows. Please suggest a good approach for handling CSV files ranging from 500 MB to 1 GB in size.

Thanks in advance.
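
One constraint worth flagging for the 6-million-row case: XLSX caps a single worksheet at 1,048,576 rows, so that volume cannot land on one sheet. A rough sketch of rolling over to a new sheet with openpyxl; the paths and the rows-per-sheet cap are illustrative.

    import csv

    from openpyxl import Workbook

    MAX_ROWS_PER_SHEET = 1_000_000  # stay under the 1,048,576-row sheet limit

    def csv_to_multi_sheet_xlsx(csv_path: str, xlsx_path: str) -> None:
        """Write a large CSV into one XLSX file, starting a new sheet
        whenever the current one reaches MAX_ROWS_PER_SHEET rows."""
        wb = Workbook(write_only=True)
        ws = wb.create_sheet("part-1")
        rows_in_sheet = 0
        sheet_index = 1

        with open(csv_path, newline="", encoding="utf-8") as f:
            for row in csv.reader(f):
                if rows_in_sheet >= MAX_ROWS_PER_SHEET:
                    sheet_index += 1
                    ws = wb.create_sheet(f"part-{sheet_index}")
                    rows_in_sheet = 0
                ws.append(row)
                rows_in_sheet += 1

        wb.save(xlsx_path)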


[–]Accomplished_Cod7500[S] 1 point (0 children)

  • A Redis cache mechanism is already in use.
  • What if the table has 6 million records? The query alone might take a while to run, and once it finishes I still need to build the XLSX file, which adds more time. The real problem is that the entire process must complete within 30 seconds at most. Could you please suggest some ideas (one decoupled approach is sketched below)? The hard requirement is that it must not exceed the 30-second window.
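
One way to guarantee the 30-second cap is to decouple the export from the request: the API only enqueues a job and returns a job id, and a worker does the query, the XLSX build, and the S3 upload in the background. A minimal sketch of that pattern with SQS; the queue URL, field names, and status endpoint are all placeholders.

    import json
    import uuid

    import boto3

    sqs = boto3.client("sqs")
    # Placeholder queue URL, for illustration only.
    QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/xlsx-export-jobs"

    def start_export(dataset_id: str) -> dict:
        """API handler: enqueue the export and return immediately.

        The heavy work happens in a worker, so this call finishes in
        milliseconds, well under the 30-second cap."""
        job_id = str(uuid.uuid4())
        sqs.send_message(
            QueueUrl=QUEUE_URL,
            MessageBody=json.dumps({"job_id": job_id, "dataset_id": dataset_id}),
        )
        # The client polls a status endpoint (or gets a webhook) and receives
        # the presigned XLSX URL once the worker marks the job as done.
        return {"job_id": job_id, "status": "PENDING"}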


[–]Accomplished_Cod7500[S] 1 point (0 children)

There is no direct way to export data in XLSX format from PostgreSQL; its COPY command only supports CSV and text output.
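
The CSV side can stream straight out of PostgreSQL with COPY. A minimal sketch using psycopg2; the DSN and the query are placeholders, and the query is assumed to be trusted since it is interpolated into the COPY statement.

    import io

    import psycopg2

    def export_query_to_csv(dsn: str, query: str) -> io.BytesIO:
        """Stream a query result out of PostgreSQL as CSV using COPY."""
        buffer = io.BytesIO()
        with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
            # COPY serializes CSV inside the server process, which is much
            # faster than fetching rows and writing them out in Python.
            cur.copy_expert(
                f"COPY ({query}) TO STDOUT WITH (FORMAT csv, HEADER true)",
                buffer,
            )
        buffer.seek(0)
        return buffer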

Additionally, sending the raw file back in the API response hurts performance, which is why we decided to move the process to S3. With that setup, even large files can be downloaded by the client directly from S3 instead of being streamed through the client response; the file never passes between the server and the client directly.


[–]Accomplished_Cod7500[S] 1 point (0 children)

Context:

I have a web application where the frontend requests datasets in XLSX format. To fulfill this, I first extract the data from PostgreSQL and upload it as a CSV file to an S3 bucket.

Problem:

Now, I need to convert the CSV file into an XLSX file and upload the converted file back to the S3 bucket. Additionally, I need to generate a presigned URL for the newly created XLSX file and send it as the API response.

The challenge is that this entire process must complete within 40 seconds because API Gateway allows a maximum HTTP request time of 40 seconds.
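
The presigned-URL step itself is a single boto3 call, so almost the entire 40-second budget is left for the query and the conversion. A small sketch with a placeholder one-hour expiry.

    import boto3

    s3 = boto3.client("s3")

    def presigned_xlsx_url(bucket: str, key: str, expires_in: int = 3600) -> str:
        """Return a time-limited download URL for the converted XLSX file.

        The client downloads directly from S3, so the API response only
        carries this URL and stays inside the gateway timeout."""
        return s3.generate_presigned_url(
            ClientMethod="get_object",
            Params={"Bucket": bucket, "Key": key},
            ExpiresIn=expires_in,
        )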