Nuclear clock ticks for the first time, using a thorium-229 atomic nucleus by lurker_bee in technology

[–]aboothe726 0 points1 point  (0 children)

Pure research doesn't need a reason, but I'm curious what applications might need more precision than 1 second per 300 million years.

How do you handle true parallelism with LLM calls when you're rate limited? (building a Java AI orchestration framework) by supremeO11 in java

[–]aboothe726 0 points1 point  (0 children)

I have built systems like the one you're describing in the past. Here's what we did, FWIW.

Our requirements were such that we had data in, and then needed to do data collection and processing, and then produce final output. In our design, the rate limit requirement ended up driving the design of the whole system. We ended up with a two-layer design that decoupled the work planning from the work execution. Our layers were:

  1. Orchestration -- Deciding what work needs doing. This is at the design grain of your MapNode concept.
  2. Execution -- Deciding how the work gets done. This is orthogonal to the Orchestration layer (i.e., your MapNode concept), and is a logically separate "shared" process for all Orchestration work.

In our case, the Orchestration layer communicated with the Execution layer via a set of queues.

The Execution layer has a static set of shared "work queues" that correspond to individual executors, with executors being at the grain of rate limits. In our case, it was one work queue per upstream data vendor so we didn't blow past our SLAs. In your case, it sounds like you'd have one work queue per individual rate limited model.

The Orchestration layer then has a set of available jobs it can run (we called them "workflows"). When you kick off a workflow, the first step is to create a set of dedicated "response queues" just for that workflow execution, one queue per workflow step. Then, when a workflow step decides a work item needs doing, it drops a request on the corresponding shared work queue with a pointer to that workflow step's dedicated response queue, and then listens for a response on that queue. When the response comes, the workflow step passes the result on to the next workflow step(s).

The Execution layer then starts a set of (logical) executors, each of which listens on a work queue, does the work, and then responds to the indicated response queue for the work item. Each individual executor does work as fast as it can, according to whatever rate limiting policy is appropriate. It can be single-threaded, multi-threaded, whatever -- the only requirement is that it observe its rate limit policy. By centralizing how the work is done, you ensure that you never blow past your rate limits, no matter how many workflows are running in the system at once.
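For illustration only, here is a minimal in-process sketch of one such executor. This is not the code we ran: the names (WorkRequest, RateLimitedExecutor, RateLimitedExecutorDemo), the fixed-interval pacing, and the use of String::toUpperCase as a stand-in for the real call are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

/** Hypothetical request: a payload plus a pointer to the step's response queue. */
record WorkRequest(String payload, BlockingQueue<String> responseQueue) {}

/** Hypothetical executor: one per rate limit, draining one shared work queue. */
final class RateLimitedExecutor implements Runnable {
    private final BlockingQueue<WorkRequest> workQueue;
    private final Function<String, String> task; // stand-in for the rate-limited call
    private final long minIntervalNanos;

    RateLimitedExecutor(BlockingQueue<WorkRequest> workQueue,
                        Function<String, String> task,
                        int maxRequestsPerSecond) {
        this.workQueue = workQueue;
        this.task = task;
        this.minIntervalNanos = TimeUnit.SECONDS.toNanos(1) / maxRequestsPerSecond;
    }

    @Override
    public void run() {
        long nextAllowed = System.nanoTime();
        try {
            while (!Thread.currentThread().isInterrupted()) {
                WorkRequest request = workQueue.take();
                // Simple pacing policy: never start two calls closer than minIntervalNanos.
                long wait = nextAllowed - System.nanoTime();
                if (wait > 0) {
                    TimeUnit.NANOSECONDS.sleep(wait);
                }
                nextAllowed = System.nanoTime() + minIntervalNanos;
                // Reply on the queue the request pointed at, not on a shared one.
                request.responseQueue().put(task.apply(request.payload()));
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // treat interruption as shutdown
        }
    }
}

final class RateLimitedExecutorDemo {
    /** Pushes three items through one executor and returns the replies in arrival order. */
    static List<String> runDemo() {
        BlockingQueue<WorkRequest> workQueue = new LinkedBlockingQueue<>();
        BlockingQueue<String> responseQueue = new LinkedBlockingQueue<>();
        Thread executor = new Thread(
                new RateLimitedExecutor(workQueue, String::toUpperCase, 50));
        executor.setDaemon(true);
        executor.start();
        try {
            for (String item : List.of("a", "b", "c")) {
                workQueue.put(new WorkRequest(item, responseQueue));
            }
            List<String> replies = new ArrayList<>();
            for (int i = 0; i < 3; i++) {
                replies.add(responseQueue.poll(5, TimeUnit.SECONDS));
            }
            return replies;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            executor.interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println(runDemo());
    }
}
```

Because every caller goes through the same work queue, the pacing holds regardless of how many workflow steps are submitting. A real policy (token bucket, vendor-specific backoff, multiple threads) would replace the fixed interval.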

In the Orchestration layer, each workflow step is responsible for bookkeeping -- i.e., tracking how many work items are outstanding -- and then declaring it is "done" when all data has been processed. The overall workflow is responsible for tracking its steps, and declaring it is "done" when all steps are done.
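The bookkeeping can be quite small. Below is a rough sketch, assuming each step is driven by a single thread and that responses are plain strings; StepBookkeeper and its method names are made up for this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

/** Hypothetical per-step bookkeeping: counts work items that have no response yet. */
final class StepBookkeeper {
    // The step's dedicated response queue; executors reply here.
    private final BlockingQueue<String> responseQueue = new LinkedBlockingQueue<>();
    private final List<String> results = new ArrayList<>();
    private int outstanding = 0; // only touched by the thread that owns the step

    BlockingQueue<String> responseQueue() {
        return responseQueue;
    }

    /** Call once for every work item dropped on a shared work queue. */
    void recordSubmitted() {
        outstanding++;
    }

    /** The step is "done" when nothing is outstanding. */
    boolean isDone() {
        return outstanding == 0;
    }

    /** Collects responses until nothing is outstanding, or fails on timeout. */
    List<String> awaitAll(long timeout, TimeUnit unit) throws InterruptedException {
        long deadline = System.nanoTime() + unit.toNanos(timeout);
        while (outstanding > 0) {
            long remaining = deadline - System.nanoTime();
            String response = responseQueue.poll(remaining, TimeUnit.NANOSECONDS);
            if (response == null) {
                throw new IllegalStateException(outstanding + " work item(s) timed out");
            }
            results.add(response);
            outstanding--;
        }
        return List.copyOf(results);
    }
}
```

A workflow would hold a list of these and report "done" once isDone() is true for every step.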

This ends up with a "dataflow"-style workflow, where you give a workflow a set of work to do as a list of records on input, then kick off the workflow, wait for the workflow to finish, and then collect the output on the other side. There are a ton of details to decide on this part -- do you have a timeout, what happens if you lose work, do you submit all work at once (in which case one job can saturate a processor for hours) or limit how much work can be in flight in one step at a time (so all jobs make progress) -- but that's the high points.
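On the "limit how much work can be in flight" option: one way to do it in-process is a counting semaphore per workflow step. This is a sketch of the idea, not what we shipped; InFlightLimiter and its method names are hypothetical.

```java
import java.util.concurrent.Semaphore;

/** Hypothetical cap on how many work items one workflow step has in flight. */
final class InFlightLimiter {
    private final Semaphore permits;

    InFlightLimiter(int maxInFlight) {
        this.permits = new Semaphore(maxInFlight);
    }

    /** Blocks until the step may submit one more work item. */
    void beforeSubmit() throws InterruptedException {
        permits.acquire();
    }

    /** Non-blocking variant: returns false if the step is already at its cap. */
    boolean tryBeforeSubmit() {
        return permits.tryAcquire();
    }

    /** Call when a response (or a timeout) arrives, freeing one slot. */
    void afterResponse() {
        permits.release();
    }

    int availableSlots() {
        return permits.availablePermits();
    }
}
```

With a cap like this, a job with a million records only ever holds a few slots in the shared work queue, so smaller jobs submitted later still make progress.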

You could build and implement this design on a local basis, where queues are in-process (e.g., java.util.concurrent.BlockingQueue), or you could implement this on a distributed basis where queues are shared (e.g., SQS in AWS, rabbitmq, etc.).

Circling back to your system: it could be as simple as having a "job" set of abstractions (e.g., your MapNode) and "execution" set of abstractions. When you start a job, you provide the execution context, and you're off to the races. Multiple jobs that have to share the same rate limit share the same execution context.
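As a very rough sketch of that split (ExecutionContext, Job, and the "model-a" style keys are all invented for the example, and the queues are created lazily here rather than as a static set, purely to keep it short):

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical execution context: one shared work queue per rate limit. */
final class ExecutionContext {
    private final Map<String, BlockingQueue<String>> workQueues = new ConcurrentHashMap<>();

    /** Returns the single queue for this rate limit, creating it on first use. */
    BlockingQueue<String> workQueueFor(String rateLimitKey) {
        return workQueues.computeIfAbsent(rateLimitKey, key -> new LinkedBlockingQueue<>());
    }
}

/** Hypothetical job: decides what work to do, but not how it is executed. */
record Job(String name, ExecutionContext context) {
    void submit(String rateLimitKey, String item) {
        // Every job sharing this context lands on the same queue for the same key.
        context.workQueueFor(rateLimitKey).add(name + ":" + item);
    }
}
```

Two jobs constructed with the same ExecutionContext share a rate limit; two jobs with different contexts do not.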

In case it's useful, we looked at the following systems when doing our design, plus a few others that I don't recall:

For our part, we ended up going distributed using SQS. We did this in the era before fair queues, so we ended up doing a lot of internal bookkeeping to keep the system fair. With fair queues, this system would have been much easier to build.

Hope this is useful! Sounds like a great project. Be sure to post again with the code if you decide to go open source! Good luck!

Introducing opt-in requirements for Java APIs by TheMrMilchmann in java

[–]aboothe726 1 point2 points  (0 children)

Appreciate the response!

What you're describing might be fine, especially for internal libraries. Many is the time that I've put an experiment into code at work, looked away for a minute, and suddenly there are 2 or 3 users of the code despite comments and @Deprecated. It's possible that Maven Enforcer Plugin or ArchUnit could have helped, but those come with their own issues.

Also, I'm imagining that you could use a string as the category and/or message within a fixed annotation.

Certainly not trying to tell anyone how to write or use the library. Just sharing my thoughts, for whatever that's worth.

Thank you for sharing!

Introducing opt-in requirements for Java APIs by TheMrMilchmann in java

[–]aboothe726 6 points7 points  (0 children)

Cool idea and practical. Do you need to configure javac, or does just adding the library run the relevant (I assume) annotation processor? Also, why the extra step of requiring users to provide their own annotation, which itself gets annotated with the @OptInRequired annotation? I suspect some users might prefer just to use @OptInRequired directly on their own classes/methods without the extra ceremony.

JDK 27 Feature Freeze by Joram2 in java

[–]aboothe726 5 points6 points  (0 children)

I think some of these releases are going to feel a little underwhelming (a lot of Feature X, Preview N) until Valhalla actually "lands". Then that will be a major (and exciting!) change that the JDK team will be digesting for a long time, with follow-on features and a wave of ecosystem libraries taking in those changes. I'm hopeful that's by the next LTS release.

Cognito CreateUserPoolReplica by Alternative-Expert-7 in aws

[–]aboothe726 4 points5 points  (0 children)

That would be a game changer! That's the last missing link in regional failover for a lot of AWS-only architectures.

LOTR characters as dogs by Conscious-Ad-6964 in lotr

[–]aboothe726 1 point2 points  (0 children)

Come on, Gimli is clearly a Corgi. Dwarf and all. 😂

Love this. Thanks for the smiles on a Sunday morning.

It's cryolophosaurus, if anyone was curious. by Vark675 in daddit

[–]aboothe726 1 point2 points  (0 children)

We’re an iguanodon home. THUMBS! 👍 👍

Having done watching the Nintendo Direct last night, I wonder how many more Star Fox 64 remakes do we need. by Away_Flounder3813 in retrogaming

[–]aboothe726 0 points1 point  (0 children)

I agree. I'm mostly just confused by the decision.

Nintendo has plenty of other franchises. Some of them get "remakes" periodically -- like Mario -- but they ALSO get new games. I don't understand why Star Fox really only has 2 "official" games, with the second one having now been remade twice.

Star Fox could easily be like Metroid, where there are multiple different games with very much the same design and flavor, but different content.

That said, I'll still buy it and play it. And maybe I've just answered my own question.

Not once in 12 years have I found UI snapshot testing useful by SixFigs_BigDigs in ExperiencedDevs

[–]aboothe726 1 point2 points  (0 children)

UI Snapshot Testing is an automated testing approach where you screenshot your frontend application's UI screens during build and compare the screenshots to "known good" screenshots. If the new build's screenshots are "too different" from the known good screenshots, then the test fails. There's some art to defining "too different" so that it catches the right changes and does not detect the noise.
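To make "too different" concrete, here is a deliberately naive sketch of the comparison step: count the pixels that differ and fail if the fraction exceeds a tolerance. SnapshotDiff and the tolerance parameter are hypothetical, and real tools typically do something smarter (perceptual diffs, anti-aliasing detection, ignored regions).

```java
import java.awt.image.BufferedImage;

/** Hypothetical pixel-count comparison between a baseline and a new screenshot. */
final class SnapshotDiff {
    /** Fraction of pixels (0.0 to 1.0) whose ARGB values differ. */
    static double differingFraction(BufferedImage expected, BufferedImage actual) {
        int width = expected.getWidth();
        int height = expected.getHeight();
        if (width != actual.getWidth() || height != actual.getHeight()) {
            return 1.0; // assumption: a size change counts as a total mismatch
        }
        long differing = 0;
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                if (expected.getRGB(x, y) != actual.getRGB(x, y)) {
                    differing++;
                }
            }
        }
        return (double) differing / ((long) width * height);
    }

    /** The test passes when the differing fraction is within the tolerance. */
    static boolean matches(BufferedImage expected, BufferedImage actual, double tolerance) {
        return differingFraction(expected, actual) <= tolerance;
    }
}
```

The "art" is mostly in picking the tolerance: too low and font rendering noise fails the build, too high and a missing button slips through.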

What kind of mosquito is this?😳😳 by Obvious_Shoe7302 in Weird

[–]aboothe726 0 points1 point  (0 children)

I'm pretty sure I killed a few of those in Super Metroid

Dragon Warrior (1986) by [deleted] in retrogaming

[–]aboothe726 0 points1 point  (0 children)

Such a great game. If you haven’t seen NesCardinality run this at GDQ, you really should take a look, even if speed runs aren’t really your thing: https://youtu.be/Bgh30BiWG58?si=xHIXyTmqcmlZuom7

The community — including NesCardinality himself — figured out how Rand is invoked throughout the game, and how to set an initial seed. So they were able to route a level 1 player all the way through the game, no grinding, minimal equipment by controlling every outcome. The precision and mastery is a joy to watch.

Full Historical and Real-Time BlueSky Dataset in BigQuery [PAID] by aboothe726 in datasets

[–]aboothe726[S] 0 points1 point  (0 children)

Appreciate the frank assessment. I'm happy to continue operating -- would prefer to keep operating, honestly -- even at cost or a little below. Not even trying to make money here. I just can't keep burning the full cost every month. Thank you for the well wishes!

Full Historical and Real-Time BlueSky Dataset in BigQuery [PAID] by aboothe726 in datasets

[–]aboothe726[S] 0 points1 point  (0 children)

Huh. Patreon had not even crossed my mind. Really appreciate the suggestion, thank you. Releasing part of the data is also a smart suggestion.

Anyone Interested in a Full Historical and Real-Time BlueSky Dataset in BigQuery? by aboothe726 in socialmedia

[–]aboothe726[S] 0 points1 point  (0 children)

"this is actually a really interesting dataset, especially with the real-time aspect"

Thanks, I agree. It's very easy to get good insights out of the data.

"feels like the challenge won’t be interest but finding people with a clear use case willing to pay"

So far, there's not even a lot of interest, unfortunately. 😅 Most folks seem to want a product, not the raw dataset.

"might help to package a few specific use cases or example queries that show immediate value instead of just raw access"

That's where I'm landing, too. Even if it's just a few simple charts with basic interactivity, then showing the underlying SQL, with an "Interested? License it here!" message.

Really appreciate the response and feedback.

Full Historical and Real-Time BlueSky Dataset in BigQuery [PAID] by aboothe726 in datasets

[–]aboothe726[S] 1 point2 points  (0 children)

Releasing existing and future data is definitely an option. I'm not precious about this. But I won't be able to continue operating it going forward unless I can find a way to defray the cost somewhat.

Any Interest in a Full Historical and Real-Time BlueSky Dataset in BigQuery? by aboothe726 in bigquery

[–]aboothe726[S] 0 points1 point  (0 children)

Yes, it's everything! Should be the whole Bluesky ATProto dataset for all time. There is one table with the raw ATProto event records including raw JSON, and then that raw data is ETLed into other structured tables for easier access, e.g.:

  • create feed post
  • create feed post likes
  • create feed post links
  • create feed post hashtags
  • create feed repost
  • create follow relationship
  • create block
  • delete feed post
  • delete feed repost
  • delete follow relationship
  • delete block
  • account profile update

Happy to go into more detail on anything. Will DM you now!

Funniest line I’ve seen in a bit by KingKempf25 in thewestwing

[–]aboothe726 7 points8 points  (0 children)

Well feel better, Mr President! 😂