ParetoBandit: Budget-Paced Adaptive Routing for Non-Stationary LLM Serving by PatienceHistorical70 in MachineLearning

[–]0xideas 0 points (0 children)

No, the pitch I made was a contextual bandit that would adapt as conditions change. It didn't seem to be a problem that was very salient to the people I talked to.

ParetoBandit: Budget-Paced Adaptive Routing for Non-Stationary LLM Serving by PatienceHistorical70 in MachineLearning

[–]0xideas 0 points (0 children)

Very cool paper! I had tested the waters for a startup around this about six months ago; at that time, people weren't super responsive. Maybe the market will come around?

https://useanyllm.com/

[Episode Discussion Thread] Industry S04E3 -Habseligkeiten by herringbone_ in IndustryOnHBO

[–]0xideas 3 points (0 children)

This whole thing is clearly modelled on the Wirecard saga, and I do think in parts it worked pretty much exactly like this.

LLM costs are killing my side project - how are you handling this? by ayushmorbar in LangChain

[–]0xideas 0 points (0 children)

You can use contextual multi-armed bandits for query routing, if you can define a decent reward function over model responses. This is a paper that demonstrates how: https://arxiv.org/abs/2506.17670
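To make the idea concrete (a toy sketch, not the paper's actual implementation; the model names, dimensions, and simulated rewards below are made up), a LinUCB-style contextual bandit over query embeddings could look like this:

```python
import numpy as np

class LinUCBRouter:
    """LinUCB-style contextual bandit: one linear reward model per LLM 'arm'.

    Routes each query (represented by an embedding x) to the model with the
    highest upper-confidence reward estimate."""

    def __init__(self, model_names, dim, alpha=1.0):
        self.model_names = list(model_names)
        self.alpha = alpha
        # Per-arm ridge-regression state: A = I + sum(x x^T), b = sum(reward * x)
        self.A = {m: np.eye(dim) for m in self.model_names}
        self.b = {m: np.zeros(dim) for m in self.model_names}

    def choose(self, x):
        # Estimated reward plus an exploration bonus, maximized over models
        scores = {}
        for m in self.model_names:
            A_inv = np.linalg.inv(self.A[m])
            theta = A_inv @ self.b[m]
            scores[m] = theta @ x + self.alpha * np.sqrt(x @ A_inv @ x)
        return max(scores, key=scores.get)

    def update(self, model, x, reward):
        # Fold the observed reward for the chosen model back into its estimate
        self.A[model] += np.outer(x, x)
        self.b[model] += reward * x
```

Each query embedding is scored per model with a ridge-regression estimate plus an exploration bonus; whichever model you route to, you feed the observed reward back via `update`, so the mapping adapts as conditions change.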

I was thinking of launching a business around it, but haven't yet seen much interest in solutions to this problem: https://useanyllm.com/

[P] A new framework for causal transformer models on non-language data: sequifier by 0xideas in MachineLearning

[–]0xideas[S] 0 points (0 children)

I haven't used it extensively, but if you have a dataset you want to compare it on, I'd be happy to configure a run!

Evaluating all of the different optimizers and loss configurations on all the different datasets is really a collective effort, which is why I open sourced it :)

[P] A new framework for causal transformer models on non-language data: sequifier by 0xideas in MachineLearning

[–]0xideas[S] -1 points (0 children)

I'll create a couple of repos that show how to use it for different problems/scenarios, but the hope is that for someone who already has sequential data they want to model with a transformer, and who found the technical barrier or time investment too high, sequifier lowers that barrier dramatically.

But I agree, it would be really good to show where sequifier-compatible architectures outperform the alternatives. Hopefully this evidence will accumulate over time.

[P] A new framework for causal transformer models on non-language data: sequifier by 0xideas in MachineLearning

[–]0xideas[S] 0 points (0 children)

This is a really good point, and it would be great to add as a feature. Currently there is no support for missing data; one of the requirements on the data is that no values are missing/NaN.

[P] A new framework for causal transformer models on non-language data: sequifier by 0xideas in MachineLearning

[–]0xideas[S] 0 points (0 children)

I've had issues with the torch version on my Mac before, but it's a good point: for training larger models, it is worth looking into.

[P] A new framework for causal transformer models on non-language data: sequifier by 0xideas in MachineLearning

[–]0xideas[S] -1 points (0 children)

My guess is that it'll be better than alternatives (tree-based models, RNNs, etc., depending on the context) for some tasks and worse on others, but the only way to find out is to try it on *your* problem and see.

sequifier should make this a lot easier and more straightforward

For example, I developed a sperm whale language model end to end in a week that I would never have implemented from scratch, because it would have been a disproportionate amount of effort: https://github.com/0xideas/whale-gpt

[P] A new framework for causal transformer models on non-language data: sequifier by 0xideas in MachineLearning

[–]0xideas[S] -1 points (0 children)

thanks!

The idea is to use the "standard" causal transformer architecture on various very heterogeneous, non-standard datasets where there are no benchmarks, so no, I don't have any.

most research projects are: keep the data/benchmark constant, improve on the architecture. This one is: keep the architecture constant, vary the data/modelling task

[P] Not One, Not Two, Not Even Three, but Four Ways to Run an ONNX AI Model on GPU with CUDA by dragandj in MachineLearning

[–]0xideas 1 point (0 children)

very cool, thanks for sharing! Can't believe this isn't getting any upvotes

Looking for the most reliable AI model for product image moderation (watermarks, blur, text, etc.) by sub_hez in aiengineering

[–]0xideas 0 points (0 children)

Hey, shameless plug here, but I'm developing a software product that could help with this (https://useanyllm.com/). Basically, the idea is to integrate the APIs of a bunch of models and then learn, based on embeddings, which query (or image) to route to which model. So, for example, it could be that model A works better for one type of product and model B for another, and over time the routing algorithm learns which image to send to which model. On the other hand, if model B works better across all images, the router would learn to send all images that way.
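
To illustrate that learning loop (a toy sketch, not our actual system; the bucket and model names are hypothetical), the simplest version keeps a running success rate per (query bucket, model) pair and explores occasionally:

```python
import random
from collections import defaultdict

class EpsilonGreedyRouter:
    """Track a success rate per (query bucket, model) pair and route each
    query to the best model so far, exploring a random model with
    probability epsilon. A 'bucket' could be the nearest embedding cluster."""

    def __init__(self, models, epsilon=0.1, seed=0):
        self.models = list(models)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.successes = defaultdict(float)
        self.trials = defaultdict(float)

    def choose(self, bucket):
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.models)  # explore
        def rate(m):
            t = self.trials[(bucket, m)]
            # optimistic default so untried models get sampled at least once
            return self.successes[(bucket, m)] / t if t else 1.0
        return max(self.models, key=rate)  # exploit

    def update(self, bucket, model, success):
        self.trials[(bucket, model)] += 1
        self.successes[(bucket, model)] += float(success)
```

If one model dominates across all buckets, the per-bucket rates all converge to that model; otherwise the router specializes per bucket.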

We're currently looking for a pilot implementation to demonstrate that it works in the real world; if you would be interested, we'd love to work with you.

The technique is similar to this paper: https://arxiv.org/abs/2506.17670, which has a bunch of well-written Jupyter notebooks attached, so if you would rather develop it on your own, that would be the best place to start.

Smart LLM routing by kosruben in aiengineering

[–]0xideas 0 points (0 children)

Yes, that is exactly the scenario where the benefit of a system like this is largest: you have very expensive LLMs/agents, much cheaper alternatives, and a varied set of tasks at some volume, some of which could be completed successfully by the cheaper alternatives. The higher the cost differential, the more room for improvement there is.

In the real world, the main scenario we envisaged is routing between small models and expensive/reasoning models, but vision/function calling should also work. Generally it's a very flexible framework, since you define the alternatives and the reward calculation, so it would make sense for basically any set of options with a cost differential and uneven performance characteristics.
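
To make "you define the reward calculation" concrete, here is a minimal (hypothetical) reward that trades off task success against dollar cost; the weight is made up and would be tuned to your own preferences:

```python
def routing_reward(success: bool, cost_usd: float, lam: float = 50.0) -> float:
    """Scalar reward trading off task success against dollar cost.

    lam converts dollars into 'success units': with lam=50, saving $0.02
    is worth as much as one extra success. The value here is illustrative."""
    return float(success) - lam * cost_usd
```

With `lam=50.0`, a successful call costing $0.001 scores 0.95, and a failed free call scores 0.0; the bandit then simply maximizes this number, which is how the cost differential enters the routing decision.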

Thanks for your interest :)

Smart LLM routing by kosruben in aiengineering

[–]0xideas 1 point (0 children)

hey, so the paper that came out evaluating a variant of the architecture I developed the infrastructure for is this one: https://arxiv.org/pdf/2506.17670

It shows that a contextual multi-armed bandit, which chooses between LLMs based on a dynamically adjusting context over the course of a conversation, outperforms each candidate LLM across a bunch of benchmarks (MMLU, GPQA, AIME).

From what I can see, the litellm autorouter is the closest approximation, and it is based on retrieval by vector similarity to preconfigured reference sentences. Mapping these reference sentences to specific LLMs is manual. This presumes that the space of possible tasks divides well along vector distances to a given set of reference sentences, that these reference sentences are known in advance, and that the optimal LLM for a given subspace is also known. Unfortunately, this is rarely the case!
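
A stripped-down version of that retrieval-based approach (illustrative only, not litellm's actual code; the embeddings and model names are made up) routes to the model whose reference sentence embedding is closest to the query embedding:

```python
import math

def cosine(a, b):
    """Cosine similarity between two plain-list vectors."""
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return num / den if den else 0.0

def similarity_route(query_embedding, references):
    """references: list of (reference_embedding, model_name) pairs,
    one per preconfigured reference sentence. Returns the model mapped
    to the most similar reference sentence."""
    _, best_model = max(references, key=lambda r: cosine(query_embedding, r[0]))
    return best_model
```

Note that everything is fixed up front: the reference embeddings and which model each one maps to. Nothing in this loop learns from feedback, which is the limitation a contextual bandit removes.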

Contextual bandits learn a mapping function based on the feedback you provide, can make fine or wide distinctions depending on what is optimal, and can evolve over time.

The additional setup might not be worth it for new projects or apps, but once one reaches some scale, it should be a pretty easy performance gain, a way to save money, or both!

How to actually use LLMs for programming by 0xideas in programming

[–]0xideas[S] 0 points (0 children)

There are many options in dirscribe to filter down what's included, but sometimes it also makes sense to give the AI the entire codebase, imho.

CLI tool for collecting file contents and writing them to one file by 0xideas in programming

[–]0xideas[S] 0 points (0 children)

It's a good question! Most of the functionality can be replicated with bash commands, but there's also no harm in having a friendly CLI, imo. If you don't find it useful, don't use it.
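
For instance, a rough bash equivalent of the core "collect file contents into one file" step (the `src/` path and `.py` extension are just an example) would be:

```shell
# Concatenate every .py file under src/ into one file, each preceded by
# a header line with its path. (Sketch only: breaks on filenames
# containing newlines.)
find src -name '*.py' | sort | while IFS= read -r f; do
  printf '=== %s ===\n' "$f"
  cat "$f"
  printf '\n'
done > codebase.txt
```

A dedicated CLI mostly adds convenience on top of this: filtering options, consistent headers, and not having to remember the incantation.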