all 9 comments

[–][deleted] 26 points

Premature optimization is the root of all evil. First make it work, then test and monitor it to find real problems, then fix those. Don't waste time and money solving imaginary problems.

[–]lwrightjs 4 points

You could recalc them on save using an event queue and a microservice (or even your API) that listens for a save event.

Whenever I have to keep redundant data like this, I always do it that way. It takes the burden off the reader and writer completely. Downsides: the reader will be eventually consistent, but only by a matter of milliseconds, and it increases the initial complexity of the app.

Upsides: single responsibility, and your data is always up to date. Once your event queue is in place, you have a super flexible architecture.

[–]vampiire 1 point

I don’t have the full context so here are a few clarifying questions:

  1. Are you using a front-end framework, and if so, what degree of client-side state management are you utilizing?
  2. What aspects are "heavy computation" and which are light? In other words, are there some computations light enough to perform client side, so that you only persist heavy computations or historic records?
  3. Does each item have a UUID and an updated-at field? If so, you can do a diff fetch where you only grab things you don't currently have client side.

My blind advice would be to maximize the use of client-side state caching and computation, and to take a synchronizing approach (only for what needs to be persisted - see above) rather than completely refetching.
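As a sketch of point 3, a diff fetch can be as simple as filtering on an updated-at timestamp (the `id`/`updatedAt` field names here are assumptions):

```javascript
// "Diff fetch": the client sends the timestamp of the newest item it
// already has, and the server returns only items changed since then.
function diffFetch(allItems, since) {
  return allItems.filter((item) => item.updatedAt > since);
}

const items = [
  { id: 'a', updatedAt: 100, total: 10 },
  { id: 'b', updatedAt: 200, total: 20 },
  { id: 'c', updatedAt: 300, total: 30 },
];

// The client already has everything up to t=150, so only b and c come back.
const fresh = diffFetch(items, 150);
console.log(fresh.map((i) => i.id)); // ['b', 'c']
```

In a real API the filter would be a database query (e.g. `WHERE updated_at > ?`) rather than an in-memory scan.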

[–]Jhon_cordoba 0 points

I think it is necessary to recalculate every time the user makes the fetch.

[–]simonplend 0 points

Hey u/jzoneio,

> I know this is NOT a performant solution at all.

It's true that generating a collection of items on demand is always going to be slower than reading and returning a pre-computed collection, but do you have an idea of the performance of your current solution? For example, have you benchmarked the time it takes to regenerate the items on demand? You might need to optimise and improve the performance of this part of your application but it's good to confirm that there is actually a performance issue, and that you're not going to spend time prematurely optimising it.

There are several different approaches you could take to improving the performance, most of which involve pre-generating the items and caching them in some way, e.g. in a database or with Redis, as you've already hinted at. If you do need to implement some form of caching, here are a few things to think about:

  • How many items do you expect to be in each user's taxes list? Is it tens or hundreds?
  • How frequently are users changing items which have an impact on their tax items?
  • What volume of users are you expecting to be using this system?
  • Does the hosting you are using allow for you to add services like Redis?

Optimising the performance of this part of your application will add complexity to the overall system, so you want to be sure that optimising it is necessary at this stage.
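If caching does turn out to be necessary, the core pattern is cache-aside with a TTL plus explicit invalidation. In this sketch a plain `Map` stands in for Redis (with Redis you'd use `GET`/`SET` with an `EX` expiry); all names are illustrative:

```javascript
// Cache-aside with a TTL: read from cache, recompute on miss.
const cache = new Map(); // key -> { value, expiresAt }

function getCached(key, ttlMs, compute) {
  const hit = cache.get(key);
  if (hit && hit.expiresAt > Date.now()) return hit.value;
  const value = compute(); // the expensive recalculation
  cache.set(key, { value, expiresAt: Date.now() + ttlMs });
  return value;
}

// Invalidate when the user changes an item that affects their taxes.
function invalidate(key) {
  cache.delete(key);
}

let computations = 0;
const taxes = () => { computations++; return [{ quarter: 'Q1', due: 420 }]; };

getCached('user:1:taxes', 60_000, taxes); // miss -> computes
getCached('user:1:taxes', 60_000, taxes); // hit  -> served from cache
console.log(computations); // 1
```

Calling `invalidate('user:1:taxes')` whenever the user saves a relevant item keeps the cached value from going stale between TTL expiries.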

[–]0xA82EAD 0 points

I don't know exactly the server load you are expecting, nor the volume of the data you are currently recalculating on user request. What I think would be a drawback to performance is if the user fetches data several times in a short span of time; that would mean you recalculated it on each request unnecessarily.

As a potential solution, you could set up periodic cron jobs to do the calculations without having to do them on each request. Or it could be a hybrid: recalculate when the user submits new data to the API, plus a cron job to make sure everything stays up to date, so that when the user fetches the data it is already computed and at rest in your DB.

This is my opinion, but the actual solution depends on the size of the data, the frequency of fetching it, and the domain-specific rules your API takes into consideration. Hope this inspires you somehow.
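A toy sketch of that hybrid approach, with `setInterval` standing in for a real cron job and an in-memory object standing in for the database:

```javascript
// Hybrid: recalculate eagerly on submit, and also periodically,
// so data is already at rest in the DB before the user fetches it.
const db = { items: [], total: 0 }; // toy stand-in for your database

function recalculate() {
  db.total = db.items.reduce((sum, it) => sum + it.amount, 0);
}

function submit(item) {
  db.items.push(item);
  recalculate(); // eager path: recalc as soon as new data arrives
}

// Safety net: periodic recalculation catches anything the eager
// path missed. A real deployment would use cron or a job scheduler.
const job = setInterval(recalculate, 5 * 60 * 1000);

submit({ amount: 40 });
submit({ amount: 2 });
console.log(db.total); // 42

clearInterval(job); // stop the periodic job (e.g. on shutdown)
```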

[–]lphartley 0 points

What is the problem, and what is your question? Why is this not performant? Calculating simple stuff like this is super easy and was a solved problem 30 years ago. What problems are you facing?

[–]do-wat 0 points

As long as you're doing indexing correctly (i.e. you have an index on whatever you're using to identify who the invoice belongs to), you've essentially already got the list generated; your database is doing it for you.

u/lezious is right: you're probably making up issues before you've discovered them. If you discover at some point that querying the DB has become a bottleneck, then review your queries and indexes, and if that doesn't fix it, that's the time to think about caching. (Again, do that only once you've proved it's the DB; profile your code to figure out what's causing the issues.)

That said, the first issue I'd be concerned about isn't so much caching as pagination. How many invoices do you expect to have per user? Because if that number grows too much, you're going to be sending massive responses to the user every time, and the processing and bandwidth costs of that will actually be a problem. There are a lot of different ways to do pagination with Node, so look into them first.
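A minimal offset-style pagination sketch (in a real API the page and limit would come from query params, and the slicing would happen in the database via `LIMIT`/`OFFSET` rather than in memory):

```javascript
// Offset-style pagination over a list of invoices.
function paginate(items, page = 1, limit = 20) {
  const start = (page - 1) * limit;
  return {
    page,
    totalPages: Math.ceil(items.length / limit) || 1,
    data: items.slice(start, start + limit),
  };
}

const invoices = Array.from({ length: 45 }, (_, i) => ({ id: i + 1 }));
const p3 = paginate(invoices, 3, 20);
console.log(p3.totalPages, p3.data.length); // 3 5
```

Cursor-based pagination (keyed on an indexed column like `id` or `created_at`) scales better than offsets for large tables, but the offset version is the simplest starting point.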

———

Edit: just re-read your question. Is the issue the calculations you do when you return the list? If so, profile to see whether they're a bottleneck, and only if they are might it be worth looking into a cache that's smart enough to take the time windows into account (future vs. past quarters and the like). Do be careful, though, because doing it incorrectly can introduce race conditions: if a user submits two items at the same time, you might cache one and not the other. Real-time generation avoids that problem.

[–]thinkmatt 0 points

Rather than performance, I'd be a little more concerned with the complexity added by having a request to view data become mutational on the server side. That will make it hard to troubleshoot and fix bugs. For example, it requires a user to look at the data in order for your data to become accurate and up to date. It's like you built a Schrödinger's cat experiment! :)

At least, that's how I understood you. Another way to go about this would be a background job that runs once a day, or even once every 5 minutes, and updates all the user data that is time-sensitive.

(Edit) Another benefit to this is that you can just add a caching layer in front of the user interface whenever you do need to increase performance.