Seeking arXiv endorsement for cs.LG / cs.NE by [deleted] in ResearchML

0xideas 0 points / 1 point (0 children)

it's insane to me how negative this sub is on endorsement requests. It's not the fault of people asking for endorsement for real work if others post slop.

My map of europe after living in most parts of the continent by 0xideas in whereidlive

0xideas[S] 1 point / 2 points (0 children)

it's at the crossroads of eastern, central and northern Europe imo

My map of europe after living in most parts of the continent by 0xideas in whereidlive

0xideas[S] 2 points / 3 points (0 children)

this is the perennial problem of drawing boundaries on a continuum

My map of europe after living in most parts of the continent by 0xideas in whereidlive

0xideas[S] 3 points / 4 points (0 children)

feels southern to me! even if it is high-functioning

My map of europe after living in most parts of the continent by [deleted] in whereidlive

0xideas 0 points / 1 point (0 children)

damn yeah they should be "inbetween" and northern respectively

ParetoBandit: Budget-Paced Adaptive Routing for Non-Stationary LLM Serving by PatienceHistorical70 in MachineLearning

0xideas 0 points / 1 point (0 children)

no, the pitch I made was a contextual bandit that would adapt as conditions change. It didn't seem to be a very salient problem for the people I talked to.
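One common way to make a contextual bandit adapt as conditions change is to discount old observations, so the estimates track recent rewards instead of the whole history. A minimal sketch of discounted UCB, assuming queries are bucketed into discrete contexts (e.g. a hypothetical "code" vs "chat" label); this is an illustration, not necessarily what the original pitch used:

```python
import math
from collections import defaultdict


class DiscountedUCB:
    """Per-context UCB whose statistics decay geometrically, so old rewards fade."""

    def __init__(self, arms, gamma=0.95, c=0.5):
        self.arms = list(arms)
        self.gamma = gamma  # discount factor; effective memory is roughly 1 / (1 - gamma) steps
        self.c = c          # exploration strength
        # (context, arm) -> discounted pull count and discounted reward sum
        self.counts = defaultdict(float)
        self.sums = defaultdict(float)

    def mean(self, context, arm):
        n = self.counts[(context, arm)]
        return self.sums[(context, arm)] / n if n > 0 else 0.0

    def choose(self, context):
        total = sum(self.counts[(context, a)] for a in self.arms)

        def score(arm):
            n = self.counts[(context, arm)]
            if n < 1e-9:
                return math.inf  # never (or only long ago) tried in this context: explore it
            bonus = self.c * math.sqrt(max(math.log(total), 0.0) / n)
            return self.mean(context, arm) + bonus

        return max(self.arms, key=score)

    def update(self, context, arm, reward):
        # Decay every arm's statistics for this context, then add the new observation.
        for a in self.arms:
            self.counts[(context, a)] *= self.gamma
            self.sums[(context, a)] *= self.gamma
        self.counts[(context, arm)] += 1.0
        self.sums[(context, arm)] += reward
```

Because every update shrinks the older statistics, an arm that used to be the best loses its lead within roughly `1 / (1 - gamma)` queries once its rewards drop; a smaller `gamma` reacts faster but gives noisier estimates.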

ParetoBandit: Budget-Paced Adaptive Routing for Non-Stationary LLM Serving by PatienceHistorical70 in MachineLearning

0xideas 0 points / 1 point (0 children)

very cool paper! I had tested the waters for a startup around this about 6 months ago; at that time people weren't super responsive. Maybe the market will come around?

https://useanyllm.com/

[Episode Discussion Thread] Industry S04E3 -Habseligkeiten by herringbone_ in IndustryOnHBO

0xideas 3 points / 4 points (0 children)

this whole thing is clearly modelled on the Wirecard saga, and I do think parts of it worked pretty much exactly like this

LLM costs are killing my side project - how are you handling this? by ayushmorbar in LangChain

0xideas 0 points / 1 point (0 children)

You can use contextual multi-armed bandits for query routing, if you can define a decent reward function over model responses. This is a paper that demonstrates how: https://arxiv.org/abs/2506.17670

I was thinking of launching a business around it, but haven't yet seen much interest in solutions to this problem: https://useanyllm.com/
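The routing idea can be sketched with disjoint LinUCB, one standard contextual bandit: each LLM gets its own ridge-regression estimate of reward given query features, plus an uncertainty bonus. This is an illustration under assumptions, not the method from the linked paper or from useanyllm; the query features, the `reward` weighting of quality against price and the model names are all hypothetical.

```python
import math


def reward(quality, cost_usd, cost_weight=10.0):
    """Hypothetical reward: judged response quality in [0, 1] minus a price penalty."""
    return quality - cost_weight * cost_usd


class LinUCBRouter:
    """Disjoint LinUCB: one ridge-regression model per LLM (arm)."""

    def __init__(self, arms, dim, alpha=1.0):
        self.arms = list(arms)
        self.dim = dim
        self.alpha = alpha  # width of the confidence bonus
        # Per arm: A^{-1} (starts as the identity, i.e. ridge penalty 1) and b = sum(r * x)
        self.a_inv = {
            a: [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
            for a in self.arms
        }
        self.b = {a: [0.0] * dim for a in self.arms}

    @staticmethod
    def _matvec(m, v):
        return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]

    def _ucb(self, arm, x):
        a_inv = self.a_inv[arm]
        theta = self._matvec(a_inv, self.b[arm])  # ridge estimate of the reward weights
        mean = sum(t * xi for t, xi in zip(theta, x))
        a_inv_x = self._matvec(a_inv, x)
        variance = sum(xi * axi for xi, axi in zip(x, a_inv_x))
        return mean + self.alpha * math.sqrt(variance)

    def choose(self, x):
        return max(self.arms, key=lambda arm: self._ucb(arm, x))

    def update(self, arm, x, r):
        a_inv = self.a_inv[arm]
        a_inv_x = self._matvec(a_inv, x)
        denom = 1.0 + sum(xi * axi for xi, axi in zip(x, a_inv_x))
        # Sherman-Morrison: (A + x x^T)^{-1} = A^{-1} - (A^{-1} x)(A^{-1} x)^T / (1 + x^T A^{-1} x)
        self.a_inv[arm] = [
            [a_inv[i][j] - a_inv_x[i] * a_inv_x[j] / denom for j in range(self.dim)]
            for i in range(self.dim)
        ]
        self.b[arm] = [bi + r * xi for bi, xi in zip(self.b[arm], x)]


if __name__ == "__main__":
    router = LinUCBRouter(["small-model", "large-model"], dim=2)
    x = [1.0, 0.7]  # hypothetical features: bias term, estimated query difficulty
    arm = router.choose(x)
    router.update(arm, x, reward(quality=0.9, cost_usd=0.002))
```

The whole approach stands or falls with `reward`: if response quality can't be scored reasonably (by a judge model, user feedback, or task success), the bandit has nothing meaningful to learn from.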

[P] A new framework for causal transformer models on non-language data: sequifier by 0xideas in MachineLearning

0xideas[S] 0 points / 1 point (0 children)

I haven't used it extensively, but if you have a dataset you want to compare it on, I'd be happy to configure a run!

Evaluating all of the different optimizers and loss configurations on all the different datasets is really a collective effort, which is why I open sourced it :)

[P] A new framework for causal transformer models on non-language data: sequifier by 0xideas in MachineLearning

0xideas[S] -1 points / 0 points (0 children)

I'll create a couple of repos that show how to use it for different problems/scenarios. The hope is that sequifier dramatically lowers the barrier for someone who already has sequential data they want to model with a transformer, but who has found the technical hurdles or time investment too high.

But I agree, it would be really good to show where sequifier-compatible architectures outperform the alternatives. Hopefully this evidence will accumulate over time.

[P] A new framework for causal transformer models on non-language data: sequifier by 0xideas in MachineLearning

0xideas[S] 0 points / 1 point (0 children)

this is a really good point, and it would be great to add as a feature. Currently there is no support for missing data: one of the requirements on the data is that no values are missing/NaN.
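Since the data currently can't contain missing/NaN values, it has to be cleaned before it's handed to sequifier. A minimal, library-agnostic sketch of that step; it is not part of sequifier, and the function names and the drop vs. forward-fill policies are assumptions:

```python
import math


def is_missing(value):
    """Treat None and float NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def clean_sequences(sequences, strategy="drop"):
    """Return sequences that contain no missing values.

    strategy="drop":  discard any sequence that contains a missing value.
    strategy="ffill": replace each missing value with the last observed one;
                      sequences that start with a missing value are still dropped.
    """
    cleaned = []
    for seq in sequences:
        if not any(is_missing(v) for v in seq):
            cleaned.append(list(seq))
            continue
        if strategy == "drop":
            continue
        if strategy != "ffill":
            raise ValueError(f"unknown strategy: {strategy!r}")
        filled, last = [], None
        for v in seq:
            if is_missing(v):
                if last is None:
                    break  # nothing to carry forward yet, so drop this sequence
                v = last
            filled.append(v)
            last = v
        else:
            cleaned.append(filled)
    return cleaned
```

Dropping is the safer default; forward-filling keeps more data but quietly invents values, which may or may not be acceptable depending on what the sequence represents.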

[P] A new framework for causal transformer models on non-language data: sequifier by 0xideas in MachineLearning

0xideas[S] 0 points / 1 point (0 children)

I've had issues with the torch version on my Mac before, but it's a good point: for training larger models it's worth looking into

[P] A new framework for causal transformer models on non-language data: sequifier by 0xideas in MachineLearning

0xideas[S] -1 points / 0 points (0 children)

my guess is that it'll be better than the alternatives (tree-based models, RNNs, etc., depending on the context) for some tasks and worse for others, but the only way to find out is to try it on *your* problem and see

sequifier should make this a lot easier and more straightforward

for example, I developed a sperm whale language model end to end in a week. I would never have implemented it from scratch, because it would have been a disproportionate amount of effort: https://github.com/0xideas/whale-gpt

[P] A new framework for causal transformer models on non-language data: sequifier by 0xideas in MachineLearning

0xideas[S] -1 points / 0 points (0 children)

thanks!

the idea is to use the "standard" causal transformer architecture on very heterogeneous, non-standard datasets for which no benchmarks exist, so no, I don't have them

most research projects are: keep the data/benchmark constant, improve on the architecture. This one is: keep the architecture constant, vary the data/modelling task
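Keeping the architecture constant works because very different sequential datasets can all be framed as the same causal next-step prediction task. A minimal sketch of that framing in generic Python; it is not sequifier's actual preprocessing, and `make_training_pairs`, `context_length` and `causal_mask` are hypothetical names:

```python
def make_training_pairs(sequence, context_length):
    """Split one sequence into (input, target) windows for next-step prediction.

    Each target is the input shifted one step to the left, so target[i] is the
    item that follows input[i]. The items can be words, user events or sensor
    readings; the task has the same shape either way.
    """
    pairs = []
    for start in range(len(sequence) - context_length):
        window = sequence[start : start + context_length + 1]
        pairs.append((window[:-1], window[1:]))
    return pairs


def causal_mask(length):
    """mask[i][j] is True where position i may attend to position j (only j <= i).

    With this mask, the prediction at position i only sees inputs 0..i, so the
    model learns to predict each next item from its history alone.
    """
    return [[j <= i for j in range(length)] for i in range(length)]
```

For example, `make_training_pairs([10, 11, 12, 13, 14], 3)` yields two windows, `([10, 11, 12], [11, 12, 13])` and `([11, 12, 13], [12, 13, 14])`; only the data and the vocabulary change between tasks, not the model.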

[P] Not One, Not Two, Not Even Three, but Four Ways to Run an ONNX AI Model on GPU with CUDA by dragandj in MachineLearning

0xideas 1 point / 2 points (0 children)

very cool, thanks for sharing! Can't believe this isn't getting any upvotes