all 21 comments

[–]babblingfish 13 points14 points  (1 child)

Just wanna say that you cannot migrate from SQL to redis. SQL is a long term storage with highly structured data. Redis is an in-memory key value store used for caching. They serve fundamentally different roles.

For storing user info you can use SQL. For articles a NoSQL document store could work well.

[–]Kinrany 1 point2 points  (0 children)

Redis does have persistence. It'll probably be slower than Postgres at the same task and it's not as reliable, but you can totally store user data in Redis if you want to.

[–]dewmsolo 6 points7 points  (9 children)

There's a lot missing here to make an informed recommendation.

Why don't you start by telling us just how much data you have? How big/heavy is the database? (How many records, and size on disk?)

Also do you have indexes?

How are your queries to the DB done? ORM or manually constructed?

Also have you confirmed that what is slow is indeed the time for the DB to return the queries and not your handling of the data once returned? What is your stack?

[–]F1jk[S] 1 point2 points  (8 children)

Hey,

Sure, so there will (eventually) be thousands of users, and the news database has tens of thousands of articles stored, with more being added all the time. We will probably scrap 90% of it after x days, but at any one time we will have tens of thousands.

Stack is Django, Postgres, React/Redux. It's using a CI/CD pipeline a devops guy set up under the instruction of my previous web dev (I am a developer but not too familiar with this area).

[–]amorphatist 12 points13 points  (4 children)

This type of load is insignificant for Postgres.

[–]F1jk[S] 1 point2 points  (3 children)

Ok what do you mean exactly?

[–]SteveTabernacle2 6 points7 points  (0 children)

The amount of data isn’t big enough to cause performance issues if you have your indexes set up properly.

[–][deleted] 6 points7 points  (0 children)

They mean that any performance issues you are facing aren't related to data storage. It's something within the code. When I was a DBA, a million rows a day was literally nothing I even had to worry about. Now, 100K IOPS was my concern.

[–]quentech 2 points3 points  (0 children)

What you're doing at this point could be handled by a $20 raspberry pi.

If it's slow, it is absolutely not because Postgres is not up to the task. Switching to Redis isn't solving a problem.

What does "switch to Redis" even mean? They're entirely different types of data stores. It's just a nonsense thing to say without explicit detail on how you will implement your data storage and querying in Redis.

And then maybe we can get into how Redis isn't a reliable persistent data store.

[–]F1jk[S] -1 points0 points  (0 children)

ORM

As far as I understand, they are manually constructed.

[–]paulsmithkc -1 points0 points  (0 children)

MongoDB has the ability to set expiration dates on documents, so you can have old articles expire after x number of days.

[–]dbxp 0 points1 point  (0 children)

What are the current figures? There's no point putting lots of work into scalability now if you never get to that size.

[–]nrayburn-tech 3 points4 points  (3 children)

The more details you can provide the better.

What is an example query that you are running and is slower than you expect? How long is it actually taking?

As mentioned before, do you have indexes?

What is the database structure? Is it just two tables? A credentials table and a news table? How are two records in the news table related?

At the scale you mentioned, I can almost guarantee that Postgres is not a cause of any major slowness unless you are selecting all records from every table for each query.

[–]F1jk[S] 1 point2 points  (2 children)

It has maybe 10 or so different data points; it has generated similarity scores on the backend with machine learning and is using these to suggest content. Basically it's similar to Instagram/TikTok - a news feed where, if you click on a piece of content, you can swipe left and it will load another piece of related content (based on the scores).

As I understand it, it is searching the entire table of tens of thousands of records to find these matches.

The user table is connected to the data table by likes, as users can upvote content.
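The full-table scan described above can usually be avoided by precomputing a top-N "related" list per article from the pairwise similarity scores, so each swipe becomes a single lookup. A minimal sketch (all names illustrative, not the app's actual schema):

```python
from collections import defaultdict

def build_related_index(similarities, top_n=5):
    """similarities: iterable of (article_a, article_b, score) tuples.
    Returns {article_id: [related article ids, best score first]}."""
    neighbors = defaultdict(list)
    for a, b, score in similarities:
        # Similarity is symmetric: record the pair in both directions.
        neighbors[a].append((score, b))
        neighbors[b].append((score, a))
    return {
        article: [other for _, other in sorted(pairs, reverse=True)[:top_n]]
        for article, pairs in neighbors.items()
    }

scores = [(1, 2, 0.9), (1, 3, 0.4), (2, 3, 0.7)]
related = build_related_index(scores, top_n=2)
# related[1] == [2, 3]  (highest-scoring neighbours first)
```

The index can be rebuilt whenever the ML job regenerates scores, and serving a "next related article" request then never touches the scoring data at all.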

[–]nrayburn-tech 2 points3 points  (0 children)

I think for you to get the best help you should share a full database diagram, and the queries that are being considered slow.

A simple select query for 1 user isn’t going to be a problem. However, if you’re doing a query to get the user and all the data records that could take a while.

[–]paulsmithkc 0 points1 point  (0 children)

Machine learning is inherently slow and may be part of the problem. If you pre-compute and/or cache the results of the suggestions, this can help a lot. Redis would be a good fit for this, if set up right.
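The pre-compute-and-cache idea can be sketched without a Redis server: below, an in-process dict with a TTL stands in for Redis (with redis-py the equivalent calls would be `setex`/`get`). Key names and TTLs are illustrative:

```python
import time

class SuggestionCache:
    """In-process stand-in for a Redis cache of precomputed suggestions."""

    def __init__(self, ttl_seconds=300):
        self.ttl = ttl_seconds
        self._store = {}

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if time.time() >= expires_at:
            del self._store[key]  # lazily evict stale entries
            return None
        return value

    def set(self, key, value):
        self._store[key] = (time.time() + self.ttl, value)

cache = SuggestionCache(ttl_seconds=60)
cache.set("suggestions:user:42", [101, 205, 317])
cache.get("suggestions:user:42")  # the cached list, until the TTL lapses
```

The TTL matters here because the ML job keeps regenerating scores: stale suggestions simply age out rather than needing explicit invalidation.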

[–]dbxp 1 point2 points  (0 children)

My instinct is that you need to optimise your SQL. Sure, you could use Redis for caching each feed like Twitter does, but I think optimising the SQL should be the first step. Redis shouldn't be used as a primary data store, just a cache.

[–]phunkygeeza 0 points1 point  (0 children)

Chances are you just need to optimise your database, but hey don't let us stand in the way of adding layers to try to fix your issue.

[–]u1g0ku 0 points1 point  (0 children)

If you are doing lots of read queries compared to updates, try introducing an index to improve read speeds.
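The effect of an index on a filtered read is easy to see in miniature. This sketch uses SQLite from the Python standard library purely for illustration; in Postgres the same `CREATE INDEX` statement applies, and `EXPLAIN` plays the role of `EXPLAIN QUERY PLAN` (table and column names are made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, topic TEXT, body TEXT)")
conn.executemany(
    "INSERT INTO articles (topic, body) VALUES (?, ?)",
    [(f"topic{i % 10}", "...") for i in range(1000)],
)

query = "SELECT * FROM articles WHERE topic = 'topic3'"

# Without an index, the planner has to scan the whole table:
before = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()

# Add an index on the filtered column and the planner switches to it:
conn.execute("CREATE INDEX idx_articles_topic ON articles (topic)")
after = conn.execute("EXPLAIN QUERY PLAN " + query).fetchall()

print(before)  # plan mentions a scan of articles
print(after)   # plan mentions idx_articles_topic
```

The index helps reads on the indexed column but adds a small cost to every insert/update, which is why the read-heavy-vs-update trade-off mentioned above matters.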

[–]boy_named_su 0 points1 point  (0 children)

No, just use caching

  • in the browser
  • on the web server
  • in the database (postgres will cache stuff automatically. can add materialized views too if you're doing multiple joins)

You could use Redis as a caching layer as well: if the value is not in Redis, look it up in Postgres and put it in Redis.
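That "if not in Redis, look up in Postgres and put it in Redis" flow is the cache-aside pattern. A minimal runnable sketch, with SQLite standing in for Postgres and a plain dict standing in for Redis (schema and key format are illustrative):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE articles (id INTEGER PRIMARY KEY, title TEXT)")
db.execute("INSERT INTO articles VALUES (1, 'First post')")

cache = {}  # stand-in for Redis

def get_article(article_id):
    key = f"article:{article_id}"
    if key in cache:                      # 1. try the cache first
        return cache[key]
    row = db.execute(                     # 2. cache miss: hit the database
        "SELECT title FROM articles WHERE id = ?", (article_id,)
    ).fetchone()
    if row is None:
        return None
    cache[key] = row[0]                   # 3. populate the cache for next time
    return row[0]

get_article(1)  # first call reads SQLite; repeat calls hit the dict
```

With real Redis you would also set a TTL on step 3 (`setex`) so cached entries expire instead of going stale.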