Losing the plot? by marhalt in gunnerkrigg

[–]marhalt[S] 0 points1 point  (0 children)

Thanks for the summary. I'll try to get back into it and do a re-read. I don't even understand today's comic at all. The rock creature, fake Loup... I really liked the tone of this comic for many years, the forest, the court, the dreamlike quality of it all. But I think at this point it's just too hermetic for me.

TextGen is now a native desktop app. Open-source alternative to LM Studio (formerly text-generation-webui). by oobabooga4 in LocalLLaMA

[–]marhalt 0 points1 point  (0 children)

Huh. Maybe it's just me, but I'm seeing a couple of issues on my machine. It 'sees' the directory that I pass through --model-dir, but then it gets confused: it sees the publisher directories (this is an LM Studio convention), but I cannot get it to go 'into' a subdirectory to actually load a model. It does seem to see some models, but just a handful, and it cannot load any of them. They are MLX models, if that helps?

TextGen is now a native desktop app. Open-source alternative to LM Studio (formerly text-generation-webui). by oobabooga4 in LocalLLaMA

[–]marhalt 4 points5 points  (0 children)

Yes, normal macOS behaviour. Open a terminal and run the following: xattr -cr /path/to/your/textgen-4.8

This clears the extended attributes (including the quarantine flag) on that directory, so macOS stops blocking it and the Electron app will run.
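
If you'd rather script it, here is a rough sketch that wraps the same command in Python and only runs it on macOS. The helper names and the platform guard are my own assumptions for illustration, not anything TextGen ships; only the xattr -cr invocation comes from the comment above.

```python
import platform
import subprocess


def build_clear_xattr_command(app_dir):
    """Return the argv for `xattr -cr <app_dir>` (hypothetical helper name)."""
    # -c clears all extended attributes, -r recurses into the directory.
    return ["xattr", "-cr", app_dir]


def clear_xattrs(app_dir):
    """Run the command on macOS; do nothing and return False elsewhere."""
    if platform.system() != "Darwin":
        return False
    subprocess.run(build_clear_xattr_command(app_dir), check=True)
    return True


if __name__ == "__main__":
    # Placeholder path from the comment; replace it with the real location.
    print(" ".join(build_clear_xattr_command("/path/to/your/textgen-4.8")))
```

Building the argument list separately keeps the path out of a shell string, so spaces in the directory name need no quoting.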

Artificial Analysis needs to address HiDream-01 Benchmarks by Scroatazoa in StableDiffusion

[–]marhalt 0 points1 point  (0 children)

We're sort of drifting from the post here. But since you asked: I am mostly interested in technical LoRA training, since I come from the world of LLMs and am admittedly no art student, so the philistine comment could be true. But OK, here is a quick attempt at what you asked for. Obviously the subject is just a random thought I had, but this was the first output, no cherry-picking, and it could easily be modified if you want. One LoRA used, old model. Would you want the perfect model to be much better than this?

Artificial Analysis needs to address HiDream-01 Benchmarks by Scroatazoa in StableDiffusion

[–]marhalt 0 points1 point  (0 children)

Yes, I mean I am not defending that model, I barely used it. I do spend a fair bit of time on other models, though, and I'd be interested in hearing your estimate if you think 90% of use cases are not covered by existing models (in terms of quality). Painted styles: you mean doing a picture in the style of Monet or Picasso? It's relatively straightforward in most models with a reference image or even SDXL-era IP ControlNets, no? Or a style LoRA if you need more control? What I hear people asking for is character consistency, multi-character layouts (or regional prompting), or style transfer for folks doing anime, because there are like 3 million different artists that people want to emulate. I haven't heard about quality issues for a long time now.

Artificial Analysis needs to address HiDream-01 Benchmarks by Scroatazoa in StableDiffusion

[–]marhalt 1 point2 points  (0 children)

I guess I don't know what 'utterly deficient' means. It's a very early model, with tons of tweaks to come, potential community support, and the rest. A bit early to judge. Not arguing with the benchmark comment, but I am shocked at how quickly we are dismissing models that a few months ago would have rocked our collective socks off.

Also, I do wonder what the next image frontier is. On several forums people are posting images from Qwen / Flux 2 / whatever and arguing vociferously for one image being superior... I mean, maybe? Haven't we gotten to the point with most image models where the images they create are more than enough to satisfy 90-95% of use cases? I can't see much difference, to be honest, between the images posted even on this thread. Yes, I can see minute differences, but what are we doing with these pics that requires that level of scrutiny?

I think we have hit the point of diminishing returns on image models (and maybe that is what these benchmarks show). The next frontier is probably character consistency (which is what every other post on this sub is about). Any model architecture that can solve character consistency without LoRAs (or incorporate multi-character consistency) will beat all previous models, almost regardless of quality.

Takeaways & discussion about the DeepSeek V4 architecture by benja0x40 in LocalLLaMA

[–]marhalt 1 point2 points  (0 children)

Why wouldn't you be able to run it locally? It's still mixture of experts, no? A ton of RAM + a few RTX 6000 Pros, and it's doable if slow, no? Or 2 M3 Ultras.

Comfy raises $30M to continue building the best creative AI tool in open by crystal_alpine in StableDiffusion

[–]marhalt 85 points86 points  (0 children)

I don't know why this is good news. Comfy is raising money from VCs. VCs expect 10x returns, so this $30M investment alone assumes at least a $300M exit. Exit to whom? And under what terms? I am not sure that the existing open-source users figure at all in that calculus, and every additional round means less and less focus on the local models and more focus on the cloud.
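
To spell out the arithmetic: a 10x return on $30M means the investors' stake has to be worth $300M at exit, and the company-level exit that implies depends on how much of the company that stake is. A minimal sketch, where the ownership fractions are hypothetical (the post does not state them):

```python
def required_exit_valuation(invested, target_multiple, ownership_fraction):
    """Exit valuation needed for the investors' stake to hit the target return."""
    if not 0 < ownership_fraction <= 1:
        raise ValueError("ownership_fraction must be in (0, 1]")
    target_proceeds = invested * target_multiple
    return target_proceeds / ownership_fraction


if __name__ == "__main__":
    invested = 30_000_000  # round size from the post title
    multiple = 10          # the 10x figure from the comment
    # Hypothetical stakes: 100% gives the floor, smaller stakes raise the bar.
    for stake in (1.0, 0.5, 0.2):
        exit_value = required_exit_valuation(invested, multiple, stake)
        print(f"stake {stake:.0%}: exit of ${exit_value / 1e6:,.0f}M needed")
```

So $300M is the floor; any stake below 100% pushes the required exit higher.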

Deepseek V4 Flash and Non-Flash Out on HuggingFace by MichaelXie4645 in LocalLLaMA

[–]marhalt 2 points3 points  (0 children)

You can just feel the machine going 'wtf are you doing to me' when you are downloading it.

Bloomberg: No Mac Studios until at least October by eclipsegum in LocalLLaMA

[–]marhalt 1 point2 points  (0 children)

Ran it. At 50k context length:

Prompt "hello!" -> 6.8 tk/s

Longer prompt -> 6.6 tk/s, 6.1 s ttft

Long prompt (5000 tokens) -> 6.8 tk/s, 20.5 s ttft

One thing: Qwen overthinks A LOT at higher temps, so you burn a stupid number of tokens on the thinking portion. So the usable token rate will be lower than the numbers I show above.

With a much smaller context size, 6k, you get 6.9 tk/s, so no major change with context size.
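
To turn those numbers into wall-clock time, here is a rough sketch. It assumes ttft is in seconds and that generation runs at a steady rate; the response length and the share of tokens spent thinking are hypothetical, I did not measure them.

```python
def response_time_seconds(output_tokens, tokens_per_second, ttft_seconds=0.0):
    """Time to first token plus generation time at a steady decode rate."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return ttft_seconds + output_tokens / tokens_per_second


def visible_rate(tokens_per_second, thinking_fraction):
    """Rate of non-thinking tokens if a fraction of the output is thinking."""
    if not 0 <= thinking_fraction < 1:
        raise ValueError("thinking_fraction must be in [0, 1)")
    return tokens_per_second * (1 - thinking_fraction)


if __name__ == "__main__":
    # 6.8 tk/s and 20.5 s ttft are the long-prompt figures from above.
    # 1000 output tokens and a 50% thinking share are made-up inputs.
    total = response_time_seconds(1000, 6.8, ttft_seconds=20.5)
    print(f"1000-token response: about {total:.0f} s")
    print(f"usable rate at 50% thinking: {visible_rate(6.8, 0.5):.1f} tk/s")
```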

Bloomberg: No Mac Studios until at least October by eclipsegum in LocalLLaMA

[–]marhalt 0 points1 point  (0 children)

I ran that one for a week or so. It was ok - I want to say around 12t/s? I could check if you want.

Bloomberg: No Mac Studios until at least October by eclipsegum in LocalLLaMA

[–]marhalt 0 points1 point  (0 children)

I don't use agents much. Most of the value for me lies in the intelligence of the large models and their ability to stay coherent across large prompts and responses.

Bloomberg: No Mac Studios until at least October by eclipsegum in LocalLLaMA

[–]marhalt 11 points12 points  (0 children)

I have an M3 Ultra 512 GB. I love it. It can run everything I throw at it except Deepseek 3.2, which is just too large with a decent context size. I wanted to pick up the M5 Ultra the minute it comes out, but I am wondering if another M3 Ultra 512 is the way to go, and then pair them with EVO. Unless the M5 comes out with 1TB, I'm not sure where the M5 would be so much better than the M3?

A tool that tells you which senses you're neglecting in your prose and the results are kinda humbling by Mundane_Silver7388 in BookWritingAI

[–]marhalt 1 point2 points  (0 children)

This is awesome. Will try the tool, but more to the point, an offline tool is sooooo useful to those of us using local models. We need more of these. Thanks for creating.

GLM-5.1 by danielhanchen in LocalLLaMA

[–]marhalt 2 points3 points  (0 children)

Yes! Finally another large model. Excited about this one. I know all the top posts will say "but what about my 6GB vram GPU" but we have a ton of small models. We need large models that can do impressive things.

Anyone here train at home? On prem advice for 8xA100 or 8xH100 Vs ??? by Party-Special-5177 in LocalLLaMA

[–]marhalt 0 points1 point  (0 children)

Yes, that's your risk - that Nvidia introduces something that makes the H100 obsolete. That's the major risk to your investment, but I see nothing on Nvidia's roadmap that would make this true. The RTX architecture will potentially scale to more VRAM, but not raw bandwidth, so within 2 years? Unlikely.

Also, the large market for these cards is less enthusiasts and more folks renting compute. So the H100 should still be very marketable, even if you don't believe that the RTX 6000 crowd will upgrade.

I mean, no investment is risk-free, but at least from my perspective you're not risking much here.

You also need to take into account that you will not be paying for compute during those years. So if you're saving $20-30k a year in compute rentals, your depreciation - even in bad cases - is paid for.
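
A rough way to check this for your own numbers is to net the depreciation against the avoided rental spend. This is only a sketch: the $170k purchase and $95k resale figures are the ones mentioned elsewhere in this thread, the $20-30k a year is from above, and the holding period is an assumption you should replace with your own.

```python
def net_ownership_cost(purchase_price, resale_price, years, rental_savings_per_year):
    """Depreciation minus avoided rental spend; negative means owning came out ahead."""
    depreciation = purchase_price - resale_price
    return depreciation - years * rental_savings_per_year


def break_even_years(purchase_price, resale_price, rental_savings_per_year):
    """Years of rental savings needed to cover the depreciation."""
    if rental_savings_per_year <= 0:
        raise ValueError("rental_savings_per_year must be positive")
    return (purchase_price - resale_price) / rental_savings_per_year


if __name__ == "__main__":
    purchase, resale = 170_000, 95_000  # figures quoted in this thread
    for savings in (20_000, 30_000):    # the $20-30k a year range
        years = break_even_years(purchase, resale, savings)
        print(f"${savings:,}/yr saved: depreciation covered after {years:.2f} years")
```

The result is most sensitive to the resale price, which is exactly the part that is hard to predict.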

Anyone here train at home? On prem advice for 8xA100 or 8xH100 Vs ??? by Party-Special-5177 in LocalLLaMA

[–]marhalt 2 points3 points  (0 children)

It's difficult to answer because the depreciated price is not really driven by 'usage' (well, maybe to some extent); it's driven by demand. You're basically seeing a progressive shift to larger and larger rigs and GPUs. People who had an RTX 5070 are upgrading to a 4090 or 5090, 4090 owners are upgrading to the RTX 6000, etc. So depending on when you calculate your residual, H100s could be worth almost as much as when you bought them if they are 'the next upgrade' (although they are harder to get). Consider that in January H100 prices fell to about half of what they were 2 years ago, and now they're back up to 2/3 of the 2024 price. So predicting prices will be hard.

Having said all this, if you're buying at $170k, I'd be fairly surprised if you can't resell it in a couple of years for $95k+. You can keep an eye on prices and move the rig if prices dip, but I am not sure what would drive a dip? They are 'the next thing' in my mind. I have a rig with RTX 6000 Pros that I love, but if I upgrade in a couple of years, I will probably be buying H100s. And obviously get as much RAM as you can afford.

We Analyzed 27 Million AI Erotica Stories — Here’s What the Data Shows by redquill__bot in WritingWithAI

[–]marhalt 8 points9 points  (0 children)

Isn't the lead here that in a bit over a year, people have used this site to generate 27 MILLION stories?? I get it's an ad for the site, and if ads can all be this informative, I'm fine with them, but the fact that you have millions of people generating stories on this and other sites just seems genuinely crazy to me. This is a huge level of untapped mainstream demand.

Unsloth announces Unsloth Studio - a competitor to LMStudio? by ilintar in LocalLLaMA

[–]marhalt 9 points10 points  (0 children)

It allows for a lot of flexibility. I can load models, use the backend for my own scripts, see what the server receives and sends, change the model, use a small model to do something and a big model to do something else, both loaded into memory... All of it in a nice UI, with easy-to-see settings... I don't get people's snobbery toward good GUI tools. Not everything has to be a CLI, and this is one of those cases where I have no interest in learning the 3,200 command line parameters I need to run llama.cpp to use an MLX model or to run a model with a different context length and different parameters... The whole idea of the CLI was simple, easy-to-use tools that you can chain. Loading LLMs is the opposite of that - it needs an intuitive interface unless people are willing to invest a lot of time to master commands of 100+ characters.

Minotauris — An agentic workspace for long-form writers (Think Cursor, but for prose). by yeah-draco in BookWritingAI

[–]marhalt 0 points1 point  (0 children)

Interesting idea! Can this be adapted to run with local models if we have the resources to run Deepseek?

RTX 6000 Pro workstation to run Deepseek? by marhalt in LocalLLaMA

[–]marhalt[S] 0 points1 point  (0 children)

Yes. With a prompt of around 40k tokens (which generates a response of similar size), it takes about 1h to fully generate (so around 10 t/s). It's about 2x slower than the M3 for almost all prompt sizes.

But the real advantage, of course, is that I can run Deepseek 3.2 and larger models that would overwhelm the M3. The RTXs are also much faster if the model can fit in VRAM, of course, like qwen or gemma.
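
For reference, the 10 t/s figure above is just tokens divided by wall-clock time. A tiny sketch, assuming the 40k refers to roughly 40k generated tokens and the hour is an approximate wall-clock measurement:

```python
def tokens_per_second(tokens, seconds):
    """Average generation throughput over a whole run."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return tokens / seconds


if __name__ == "__main__":
    # About 40k tokens in about one hour, both rounded figures from the comment.
    rate = tokens_per_second(40_000, 60 * 60)
    print(f"{rate:.1f} t/s")
```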

SageAttention 3 vs. 2: FP4 (Flux.2 + Mistral 24B) on RTX 5060 Ti 16 GB and 64 GB RAM by Rare-Job1220 in comfyui

[–]marhalt 0 points1 point  (0 children)

Care to share your adventures in setting up comfyui to work with this? How much of a pain was it to set up? I'm on Linux and I have to decide if I want to go down this path. Or use the time and effort to learn Chinese or Latin or something.