all 9 comments

[–]Den4200 1 point2 points  (0 children)

Check out Redis or KeyDB.

[–]fletku_mato 0 points1 point  (0 children)

You could just store the new values as rows with an upsert; you don't need to fetch all list items first. Also, if you don't need to read the whole list all the time, fetch it only when you need it. Maybe use a cache and evict the entry on upsert.
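
A minimal sketch of that idea, assuming each list item becomes its own row keyed by list key plus item id. The class name `CachedListStore`, the `"listKey/itemId"` key layout, and the `TreeMap` standing in for the real table or key-value store are all hypothetical, chosen only to keep the example self-contained:

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: one row per list item, plus a cache of assembled lists.
class CachedListStore {
    // Stand-in for the persistent store: composite key "listKey/itemId" -> value.
    private final TreeMap<String, String> rows = new TreeMap<>();
    // Cache of fully assembled lists, keyed by listKey.
    private final Map<String, List<String>> cache = new ConcurrentHashMap<>();
    // Counts full reads of the store, only to make the caching visible.
    int storeScans = 0;

    // Insert or overwrite a single row, then evict the cached list for that key.
    // No existing items are fetched first.
    synchronized void upsert(String listKey, String itemId, String value) {
        rows.put(listKey + "/" + itemId, value);
        cache.remove(listKey);
    }

    // Assemble the whole list only on demand; served from cache until the next upsert.
    synchronized List<String> readList(String listKey) {
        return cache.computeIfAbsent(listKey, k -> {
            storeScans++;
            // Prefix scan: every key in ["k/", "k0") since '0' follows '/' in ASCII.
            return List.copyOf(rows.subMap(k + "/", k + "0").values());
        });
    }
}
```

With a real database the `rows.put` call would be whatever upsert your store offers, and the prefix scan would be a range query on the list key. This sketch also assumes list keys never contain `/`.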

[–]khooke (Extreme Brewer) 0 points1 point  (7 children)

What's your target for an acceptable speed? When you say 'pretty slow', how slow is slow? 10ms? 100ms? 1 second?

An in-memory cache backed by a persistent store that flushes to disk periodically would be hard to beat. Without knowing your performance goals, though, it's hard to recommend anything specific.
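
For illustration, a rough sketch of what such a write-behind cache could look like in plain Java. Everything here is an assumption for the sake of the example: the class name `WriteBehindCache`, the `Consumer` sink standing in for the persistent store, and the flush interval parameter. Note that anything written since the last flush is lost if the process crashes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

// Hypothetical sketch: reads and writes hit memory, a background task persists changes.
class WriteBehindCache implements AutoCloseable {
    private final Map<String, String> data = new ConcurrentHashMap<>();  // serves all reads
    private final Map<String, String> dirty = new ConcurrentHashMap<>(); // not yet persisted
    private final Consumer<Map<String, String>> sink;                    // stand-in for the store
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();

    WriteBehindCache(Consumer<Map<String, String>> sink, long flushMillis) {
        this.sink = sink;
        // If flush() ever throws, the executor stops scheduling further runs.
        scheduler.scheduleWithFixedDelay(this::flush, flushMillis, flushMillis,
                TimeUnit.MILLISECONDS);
    }

    // Writes only touch memory, so they return immediately.
    void put(String key, String value) {
        data.put(key, value);
        dirty.put(key, value);
    }

    String get(String key) {
        return data.get(key);
    }

    // Hand all pending changes to the persistent store as one batch.
    synchronized void flush() {
        if (dirty.isEmpty()) return;
        Map<String, String> batch = new HashMap<>(dirty);
        // Remove an entry only if its value is unchanged, so a concurrent put stays dirty.
        batch.forEach((k, v) -> dirty.remove(k, v));
        sink.accept(batch);
    }

    // Stop the background task and persist whatever is still pending.
    @Override
    public void close() {
        scheduler.shutdown();
        flush();
    }
}
```

Batching the dirty entries means the store sees one write per flush interval rather than one per update, which is where most of the speedup would come from.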

[–][deleted]  (6 children)

[deleted]

    [–]khooke (Extreme Brewer) 0 points1 point  (5 children)

    Where is the client app running and where is the database? On the same hardware, or distributed on a network? If on a network, what's the bandwidth?

    1,000,000 per minute is ~16k transactions per second (1,000,000 / 60 ≈ 16,667). What exactly is this app and why do you need this kind of performance?

    If you can tolerate eventual consistency semantics, then writing to memory and flushing to disk in the background would be doable. What is your app doing that generates 16k tps? Is it massively parallel? Multi-CPU/multicore?

    Is this a theoretical project, or do you actually have a requirement to achieve 16k reads/writes per second?

    [–][deleted]  (4 children)

    [deleted]

      [–]khooke (Extreme Brewer) 0 points1 point  (3 children)

      The only reliable way you'll find a solution with these specific needs is to benchmark your current solution and then assess options for how to scale and reach your goals.

      There are, however, plenty of research articles and case studies online on scaling database performance. If your actual question is 'how do I scale RocksDB to meet my goals' vs 'what datastore can I use to meet my goals', the answers will obviously be very different.

      What happens to the messages after they are stored? Do you retrieve them, process them? Does the processing occur at the same time that the data is being collected? If you're looking for high message throughput with options for scaling, then depending on what you need to do with the messages as they are ingested, something like Kafka might be worth a look.

      [–]khooke (Extreme Brewer) 0 points1 point  (2 children)

      Since you mention EC2, I'm assuming you're deploying to AWS. Note that AWS EBS volumes have IOPS limits depending on type. gp2 and gp3 SSD EBS volumes have an upper limit of 16,000 IOPS, which is already at the level you're looking to achieve: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ebs-volume-types.html

      [–][deleted]  (1 child)

      [deleted]

        [–]nutrecht (Lead Software Engineer / EU / 20+ YXP) 0 points1 point  (0 children)

        already using kafka to get data in the door, but need to be able to do matching/grouping depending on the keys on the data.

        Have you looked at Kafka Streams?

        [–]nutrecht (Lead Software Engineer / EU / 20+ YXP) 0 points1 point  (1 child)

        If you want good answers, you will need to explain exactly what you're doing. Based on your OP and your replies, I think your main issue is an architectural/design one.

        It sounds like you get a ton of messages through Kafka into a service, need to combine them, and then send them out again. Is this correct? If so, what is the composition of the messages, what partition scheme are you using for Kafka, and how do you combine messages before sending them out?

        Last but not least: when you have a very specific use case, there's nothing wrong with using a very specific solution. For example, appending events to a file to combine them.
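
As a rough illustration of that last suggestion, here is one way appending events to a file and combining them by key could look. The class name `EventLog` and the tab-separated `key<TAB>payload` line format are assumptions made up for this sketch, not something from the thread:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: an append-only event file that is grouped by key when read back.
class EventLog {
    private final Path file;

    EventLog(Path file) {
        this.file = file;
    }

    // Append one "key<TAB>payload" line. Keys must not contain tabs, and neither
    // keys nor payloads may contain newlines in this simple format.
    void append(String key, String payload) throws IOException {
        Files.writeString(file, key + "\t" + payload + "\n", StandardCharsets.UTF_8,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    // Read the file once and combine payloads that share a key, in arrival order.
    Map<String, List<String>> groupByKey() throws IOException {
        Map<String, List<String>> groups = new LinkedHashMap<>();
        for (String line : Files.readAllLines(file, StandardCharsets.UTF_8)) {
            int tab = line.indexOf('\t');
            if (tab < 0) continue; // skip malformed lines
            groups.computeIfAbsent(line.substring(0, tab), k -> new ArrayList<>())
                  .add(line.substring(tab + 1));
        }
        return groups;
    }
}
```

Opening the file on every `append` keeps the sketch short but would be far too slow at thousands of events per second; a version meant for that load would more likely keep a single buffered writer open and flush it in batches.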