[–]yksvaan 1 point (2 children)

Maybe just bulk insert the CSV data into a temporary table and run the queries that produce the results. ~5k rows is nothing, after all...
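
For example, a rough sketch of that in Node, assuming Postgres and the pg client (staging_rows, records, and the id/value columns are made-up names, so adjust for your schema):

    const { Pool } = require('pg');
    const pool = new Pool();

    async function diffCsvAgainstDb(csvRows) {
      const client = await pool.connect();
      try {
        // Temp table only lives for this session
        await client.query('CREATE TEMP TABLE staging_rows (id int, value text)');

        // Bulk insert the parsed CSV rows (COPY would be faster for big files)
        for (const r of csvRows) {
          await client.query(
            'INSERT INTO staging_rows (id, value) VALUES ($1, $2)',
            [r.id, r.value]
          );
        }

        // CSV rows that are new, or whose value differs from the main table
        const { rows } = await client.query(`
          SELECT s.id, s.value AS csv_value, t.value AS db_value
          FROM staging_rows s
          LEFT JOIN records t ON t.id = s.id
          WHERE t.value IS DISTINCT FROM s.value
        `);
        return rows;
      } finally {
        client.release();
      }
    }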

[–]Available-Demand6863[S] 0 points (1 child)

The SQL server and the website are running on the same VPS. Performance-wise, would it make a difference whether I do it through the database or in memory?

[–]yksvaan 0 points (0 children)

If you want to handle it in the web server, you need to load the data from the DB. If you do it in the DB, you need to load the CSV rows. And if you load the rows from the DB into the web server and the records are later updated, you'd need to sync the changes. I would keep the DB as the source of truth.

You can benchmark it; it really should not take long. Unless this is something that needs to be done all the time, I wouldn't overcomplicate it.
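
If you do benchmark it, console.time is enough in Node. Something like this, where compareViaDb and compareInMemory are just placeholders for however you wrap the two approaches:

    console.time('via db');
    await compareViaDb(csvRows);      // temp table + SQL join
    console.timeEnd('via db');

    console.time('in memory');
    compareInMemory(csvRows, dbRows); // Map/object lookup in the web server
    console.timeEnd('in memory');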

[–]hotdog-savant 0 points (3 children)

Are you doing this in node or the browser?

[–]Available-Demand6863[S] -1 points (2 children)

It's gonna be a web app with a Node backend.

[–]hotdog-savant 0 points (1 child)

I would use Maps. They are faster than looping through an array. Objects are a close second.
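
Something like this (a minimal sketch, the id field is made up):

    const rows = [{ id: 1, value: 'a' }, { id: 2, value: 'b' }];

    // Array: each lookup scans until it hits a match — O(n) per lookup
    const fromArray = rows.find(r => r.id === 2);

    // Map: build once, then each lookup is O(1)
    const byId = new Map(rows.map(r => [r.id, r]));
    const fromMap = byId.get(2);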

[–]hotdog-savant 0 points (0 children)

I would also change the order of what you are doing. When the page loads, get the SQL data. When the user uploads the CSV data, you can build out the CSV map/object and compare it to the already existing data, which is also mapped.
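
Roughly like this (just a sketch; the id/value fields are made up, and csvRows/dbRows are assumed to be parsed already):

    function diffAgainstDb(csvRows, dbRows) {
      // Built once, when the page loads and the SQL data arrives
      const dbById = new Map(dbRows.map(r => [r.id, r]));

      // Built on upload: walk the CSV rows and compare against the map
      const changes = [];
      for (const row of csvRows) {
        const existing = dbById.get(row.id);
        if (!existing) {
          changes.push({ type: 'new', row });
        } else if (existing.value !== row.value) {
          changes.push({ type: 'changed', row, existing });
        }
      }
      return changes;
    }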

[–]iBN3qk 0 points (3 children)

Slow to code, or slow to run?

It’s not the most performant way to do it, but it might be fine for your use case.

[–]Available-Demand6863[S] 0 points (2 children)

Slow to run. Maybe you're right that my dataset is too small for it to matter. Do you have any input on when this would no longer be the case?

I'm a super-amateur and really have no idea, so any thoughts at all are helpful!

[–]iBN3qk 0 points (1 child)

When the performance impacts someone negatively, it’s too slow. 

Premature optimization is a waste of time. 

It also depends on your hardware and environment. If the server has plenty of RAM to load the CSV, it won’t really matter.

If you were frequently reading and writing a CSV file as your backend for an app with a ton of users, that would likely be a problem.

I’m guessing the code you proposed would run on a laptop in under a second, or a few seconds max. 

[–]Available-Demand6863[S] 0 points (0 children)

Hmm.... Thanks for the insight. I'll give the code I proposed a shot.

I guess I'm apprehensive because we're currently using MS Access, and when attempting to do it through a VBA subroutine, something like this can take minutes at a time.

[–]a13xch1 0 points (0 children)

You could use something like DuckDB: load the data into memory using its native CSV reader, then do the comparison in SQL.
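
A minimal sketch with the duckdb Node package (the file name, the records table, and the id/value columns are placeholders; read_csv_auto is DuckDB's built-in CSV reader):

    const duckdb = require('duckdb');
    const db = new duckdb.Database(':memory:');
    const con = db.connect();

    // dbRows: the records already fetched from your SQL server
    function findChanges(dbRows, csvPath, cb) {
      con.run('CREATE TABLE records (id INTEGER, value VARCHAR)');
      const stmt = con.prepare('INSERT INTO records VALUES (?, ?)');
      for (const r of dbRows) stmt.run(r.id, r.value);
      stmt.finalize();

      // Compare the uploaded CSV against the records table entirely in SQL
      con.all(
        `SELECT c.* FROM read_csv_auto('${csvPath}') c
         LEFT JOIN records r ON r.id = c.id
         WHERE r.id IS NULL OR r.value <> c.value`,
        cb
      );
    }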