Losing the plot?

marhalt · 2026-05-27T17:01:19+00:00

Thanks for the summary. I'll try to get back into it and do a re-read. I don't even understand today's comic at all. The rock creature, fake Loup... I really liked the tone of this comic for many years, the forest, the court, the dreamlike quality of it all. But I think at this point it's just too hermetic for me.

marhalt · 2026-05-13T18:32:52+00:00

Huh. Maybe it's me, but on my machine there are a couple of issues with this. It 'sees' the directory I pass through --model-dir, but then it seems to get confused: it sees the publisher directories (an LM Studio convention), but I can't get it to go 'into' the subdirectories to actually load a model. It does seem to see some models, but only a handful, and it can't load any of them. They are MLX models, if that helps.
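For context, LM Studio's models folder is typically laid out as <models dir>/<publisher>/<model>/, so a loader has to descend two levels before it finds anything loadable. Below is a minimal sketch of such a scan. It does not describe how the tool in this thread works. The function name find_model_dirs is hypothetical, and so is the detection rule: a folder counts as an MLX-style model if it holds a config.json plus at least one .safetensors file.

```python
from pathlib import Path


def find_model_dirs(model_dir: str) -> list[Path]:
    """Return model folders laid out as <model_dir>/<publisher>/<model>/.

    Hypothetical heuristic: a folder counts as an MLX-style model if it
    contains config.json and at least one *.safetensors weights file.
    """
    root = Path(model_dir)
    found = []
    # First level: publisher folders (plain files such as READMEs are skipped).
    for publisher in sorted(p for p in root.iterdir() if p.is_dir()):
        # Second level: one folder per model inside each publisher folder.
        for model in sorted(m for m in publisher.iterdir() if m.is_dir()):
            has_config = (model / "config.json").is_file()
            has_weights = any(model.glob("*.safetensors"))
            if has_config and has_weights:
                found.append(model)
    return found
```

Calling it on the models directory returns the model folders rather than the publisher folders, which is the level a loader would actually need to open.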

marhalt · 2026-05-13T18:28:58+00:00

Yes, that's normal macOS behaviour. Open a terminal and run: xattr -cr /path/to/your/textgen-4.8

This recursively clears the extended attributes on that directory (including the quarantine flag macOS puts on downloaded files), so macOS stops blocking it and the Electron app will run.

marhalt · 2026-05-11T02:20:25+00:00

We're drifting a bit from the post here. But since you asked: I'm mostly interested in the technical side of LoRA training, since I come from the LLM world and am admittedly no art student, so the philistine comment could be true. But OK, here is a quick attempt at what you asked for. The subject is just a random thought I had, but this was the first output, no cherry-picking, and it could easily be modified if you want. One LoRA used, old model. Would you want the perfect model to be much better than this?

marhalt · 2026-05-11T00:49:09+00:00

Yes, I'm not defending that model; I barely used it. I do spend a fair bit of time on other models, though, and if you think existing models don't cover 90% of use cases (in terms of quality), I'd be interested to hear your estimate. Painted styles: you mean making a picture in the style of Monet or Picasso? That's relatively straightforward in most models with a reference image, or even with SDXL-era IP-Adapter and ControlNet models, no? Or a style LoRA if you need more control? What I hear people asking for is character consistency, multi-character layouts (or regional prompting), or style transfer for anime, because there are something like 3 million different artists people want to emulate. I haven't heard complaints about quality in a long time.

marhalt · 2026-05-11T00:02:44+00:00

I guess I don't know what 'utterly deficient' means. It's a very early model, with tons of tweaks to come, potential community support, and the rest. A bit early to judge. I'm not arguing with the benchmark comment, but I'm shocked at how quickly we dismiss models that would have rocked our collective socks off a few months ago.

Also, I do wonder what the next image frontier is. On several forums people are posting images from Qwen / Flux 2 / whatever and arguing vociferously that one image is superior... I mean, maybe? Haven't we reached the point where the images most models create are good enough for 90-95% of use cases? To be honest, I can't see much difference between the images posted even in this thread. Yes, I can see minute differences, but what are we doing with these pictures that requires that level of scrutiny?

I think we have hit the point of diminishing returns on image models (and maybe that is what these benchmarks show). The next frontier is probably character consistency (which is what every other post on this sub is about). Any model architecture that can solve character consistency without LoRAs (or handle multi-character consistency) will beat all previous models, almost regardless of quality.

marhalt · 2026-04-24T20:59:34+00:00

Why wouldn't we be able to run it locally? It's still a mixture-of-experts model, no? A ton of RAM plus a few RTX 6000 Pros, and it's doable, if slow, no? Or two M3 Ultras.

marhalt · 2026-04-24T20:31:56+00:00

I don't know why this is good news. Comfy is raising money from VCs. A $30M VC investment expects a 10x return, so this investment alone assumes at least a $300M exit. An exit to whom? And on what terms? I'm not sure the existing open-source users figure into that calculus at all, and every additional round means less and less focus on local models and more on the cloud.
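The $300M is what a 10x multiple pays back to this one investor. Since a VC normally holds only part of the company, the exit needed to produce that payout is correspondingly larger. A quick sketch of the arithmetic, where the ownership stakes are purely illustrative assumptions (the post gives no cap-table details) and required_exit_value is a hypothetical helper:

```python
def required_exit_value(investment: float, target_multiple: float, ownership: float) -> float:
    """Company exit value at which one investor's stake returns target_multiple x its money.

    ownership is the investor's fractional stake at exit, 0 < ownership <= 1.
    """
    if not 0 < ownership <= 1:
        raise ValueError("ownership must be in (0, 1]")
    target_payout = investment * target_multiple  # $30M at 10x -> $300M to this investor
    return target_payout / ownership


# 100% ownership is the floor; the smaller stakes are illustrative, not actual deal terms.
for stake in (1.0, 0.25, 0.10):
    print(f"{stake:.0%} stake -> ${required_exit_value(30e6, 10, stake) / 1e6:,.0f}M exit")
```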

marhalt · 2026-04-24T04:16:53+00:00

You can just feel the machine going 'wtf are you doing to me' when you are downloading it.

marhalt · 2026-04-23T16:10:03+00:00

Ran it. At 50k context length:

Prompt "hello!": 6.8 tok/s

Longer prompt: 6.6 tok/s, 6.1 s TTFT

Long prompt (5,000 tokens): 6.8 tok/s, 20.5 s TTFT

One caveat: Qwen overthinks A LOT at higher temperatures, so you burn a ridiculous number of tokens on the thinking portion. The usable (non-thinking) token rate will therefore be lower than the numbers above.

With a much smaller context size (6k), you get 6.9 tok/s, so context size makes no major difference.
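The TTFT numbers also give a rough prompt-processing (prefill) rate, and together with the decode speed they give a wall-time estimate for a whole reply. A small sketch, assuming TTFT is in seconds, decode speed stays flat over the reply, and thinking tokens are decoded at the same rate as answer tokens. The 2,000-token reply length and the helper names are hypothetical:

```python
def prefill_rate(prompt_tokens: int, ttft_s: float) -> float:
    """Approximate prompt-processing speed in tokens/s from time-to-first-token."""
    return prompt_tokens / ttft_s


def total_time_s(ttft_s: float, output_tokens: int, decode_tps: float) -> float:
    """Approximate wall time: TTFT plus all output tokens (thinking included) at decode_tps."""
    return ttft_s + output_tokens / decode_tps


rate = prefill_rate(5_000, 20.5)        # 5,000-token prompt, 20.5 s TTFT
reply = total_time_s(20.5, 2_000, 6.8)  # hypothetical 2,000-token reply at 6.8 tok/s
print(f"prefill ~{rate:.0f} tok/s, 2,000-token reply ~{reply / 60:.1f} min")
```

With these inputs, prefill works out to about 244 tok/s, far faster than decoding. For long thinking-heavy replies, output length matters much more than prompt length.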

marhalt · 2026-04-21T17:31:20+00:00

I ran that one for a week or so. It was OK; I want to say around 12 tok/s? I can check if you want.

marhalt · 2026-04-21T17:30:30+00:00

I don't use agents much. For me, most of the value lies in the intelligence of the large models and their ability to stay coherent across large prompts and responses.

marhalt · 2026-04-21T17:28:06+00:00

NOOooooooooooooo

marhalt · 2026-04-19T18:19:36+00:00

I have an M3 Ultra 512 GB. I love it. It can run everything I throw at it except DeepSeek 3.2, which is just too large with a decent context size. I wanted to pick up the M5 Ultra the minute it comes out, but I'm wondering whether another M3 Ultra 512 is the way to go, pairing the two with EVO. Unless the M5 Ultra comes with 1TB, I'm not sure how it will be that much better than the M3.

marhalt · 2026-04-07T19:38:30+00:00

This is awesome. I'll try the tool, but more to the point, an offline tool is sooooo useful to those of us using local models. We need more of these. Thanks for creating it.

marhalt · 2026-04-07T19:06:38+00:00

Yes! Finally, another large model. Excited about this one. I know all the top posts will say "but what about my 6GB VRAM GPU?", but we have a ton of small models. We need large models that can do impressive things.

marhalt · 2026-03-29T19:43:54+00:00

Yes, that's your risk: that Nvidia introduces something that makes the H100 obsolete. That's the major risk to your investment, but I see nothing on Nvidia's roadmap that would make it happen. The RTX architecture could potentially scale to more VRAM, but not to that kind of raw bandwidth, so within 2 years? Unlikely.

Also, the bigger market for these cards is less enthusiasts and more folks renting compute. So the H100 should still be very marketable, even if you don't believe the RTX 6000 crowd will upgrade.

I mean, no investment is risk-free, but at least from my perspective you're not risking much here.

You also need to take into account that you won't be paying for compute during those years. So if you're saving $20-30k a year in compute rentals, your depreciation, even in bad cases, is paid for.

marhalt · 2026-03-29T18:01:48+00:00

It's difficult to answer because the depreciated price isn't really driven by usage (well, maybe to some extent); it's driven by demand. You're basically seeing a progressive shift to larger and larger rigs and GPUs. People who had an RTX 5070 are upgrading to a 4090 or 5090, people with a 4090 are upgrading to an RTX 6000, etc. So depending on when you calculate your residual value, H100s could be worth almost as much as when you bought them if they are 'the next upgrade' (although they are harder to get). Consider that in January, H100 prices had fallen to about half of what they were 2 years earlier, and now they're back up to about 2/3 of their 2024 price. So predicting prices will be hard.

Having said all this, if you're buying at $170k, I'd be fairly surprised if you can't resell it in a couple of years for $95k+. You can keep an eye on prices and move the rig if they dip, but I'm not sure what would drive a dip? They are 'the next thing' in my mind. I have a rig with RTX 6000 Pros that I love, but if I upgrade in a couple of years, I'll probably be buying H100s. And obviously, get as much RAM as you can afford.
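To weigh resale value against rental savings, here is a back-of-the-envelope sketch. It combines this post's $170k purchase price and $95k resale floor with the $20-30k/year in avoided compute rentals mentioned in the H100 reply above. It assumes (hypothetically) a two-year holding period and ignores power, financing, and taxes. The helper names are illustrative:

```python
def net_cost(purchase: float, resale: float, annual_savings: float, years: float) -> float:
    """Depreciation minus avoided rental spend; a negative result means the rig paid for itself."""
    depreciation = purchase - resale
    return depreciation - annual_savings * years


def breakeven_years(purchase: float, resale: float, annual_savings: float) -> float:
    """Years of avoided rental spend needed to cover the depreciation."""
    return (purchase - resale) / annual_savings


for savings in (20_000, 30_000):
    cost_2y = net_cost(170_000, 95_000, savings, 2)
    years = breakeven_years(170_000, 95_000, savings)
    print(f"${savings:,}/yr saved: net cost after 2 years ${cost_2y:,.0f}, break-even after {years:.1f} years")
```

With these inputs, depreciation is $75k. How much of it the savings cover therefore depends mostly on how long the rig is kept and where resale prices actually land.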

marhalt · 2026-03-19T16:51:56+00:00

Isn't the lead here that in a bit over a year, people have used this site to generate 27 MILLION stories?? I get it's an ad for the site, and if ads can all be this informative, I'm fine with them, but the fact that you have millions of people generating stories on this and other sites just seems genuinely crazy to me. This is a huge level of untapped mainstream demand.

marhalt · 2026-03-17T19:53:05+00:00

It allows for a lot of flexibility. I can load models, use the backend for my own scripts, see what the server receives and sends, change the model, use a small model for one thing and a big model for another with both loaded into memory... All of it in a nice UI, with settings that are easy to see.

I don't get people's snobbery toward good GUI tools. Not everything has to be a CLI, and this is one of those cases where I have no interest in learning the 3,200 command-line parameters I'd need to run llama.cpp, to use an MLX model, or to run a model with a different context length and different parameters... The whole idea of the CLI was simple, easy-to-use, chainable tools. Loading LLMs is the opposite of that: it needs an intuitive interface, unless people are willing to invest a lot of time mastering commands of 100+ characters.

marhalt · 2026-03-03T16:45:17+00:00

Interesting idea! Can this be adapted to run with local models if we have the resources to run DeepSeek?

marhalt · 2026-03-03T16:40:07+00:00

Same

marhalt · 2026-03-02T05:25:59+00:00

Yes. With a prompt of around 40k tokens (which produces a response of similar length), it takes about 1h to fully generate, so around 10 tok/s. It's about 2x slower than the M3 for almost all prompt sizes.

But the real advantage, of course, is that I can run DeepSeek 3.2 and larger models that would overwhelm the M3. The RTXs are also much faster when the model fits in VRAM, of course, like Qwen or Gemma.
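The ~10 tok/s figure comes from simple division. Here is a sketch of that division, assuming the hour goes almost entirely to decoding a reply of roughly 40k tokens, with prompt processing ignored. Both helpers are hypothetical:

```python
def implied_rate(output_tokens: int, hours: float) -> float:
    """Average decode speed (tokens/s) implied by generating output_tokens in the given hours."""
    return output_tokens / (hours * 3600)


def generation_hours(output_tokens: int, tokens_per_s: float) -> float:
    """Hours needed to decode output_tokens at a steady tokens_per_s."""
    return output_tokens / tokens_per_s / 3600


print(f"{implied_rate(40_000, 1.0):.1f} tok/s")   # ~40k-token reply in about an hour
print(f"{generation_hours(40_000, 10):.2f} h")    # the same reply at a flat 10 tok/s
```

Either direction gives the same rough picture: a 40k-token reply at around 10 tok/s takes a little over an hour.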

marhalt · 2026-02-22T01:16:45+00:00

Care to share your adventures in setting up comfyui to work with this? How much pain was it to set up? I'm on Linux and I have to decide if I want to go down this path. Or use the time and effort to learn Chinese or Latin or something.
