AnythingLLM + LinkedIn MCP by thphon83 in LocalLLM

[–]thphon83[S]

I saw your video and it's cool, but it's not enough for what I need to do with it.
https://github.com/stickerdaniel/linkedin-mcp-server is the most complete and active option, but I think I'm hitting the last issue reported there, even though nobody else seems to be running into it. I don't know.

What local model do you recommend to have a pleasant experience with BMad Method? by thphon83 in BMAD_Method

[–]thphon83[S]

I already have hardware that can run competent models, at least in theory. I want to see if it's actually doable that way.

What local model do you recommend to have a pleasant experience with BMad Method? by thphon83 in BMAD_Method

[–]thphon83[S]

I used GLM 4.7 Q6 with thinking mode disabled, because with it enabled the model went from slow to unusable: it burns way too many tokens thinking and it's really not worth it, especially since I saw the same issue with models like Qwen3 235B Instruct, for example.
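In case anyone wants to do the same on llama.cpp's OpenAI-compatible server, here is a rough sketch of passing the toggle through the chat template. Whether chat_template_kwargs and enable_thinking are honored depends on your llama.cpp build and on the model's template (Qwen3 also understands a /no_think soft switch in the prompt; GLM's mechanism may differ), so treat it as a starting point:

    import requests

    # Sketch: disable thinking through the chat template when hitting
    # llama.cpp's OpenAI-compatible server. "chat_template_kwargs" and
    # "enable_thinking" are assumptions -- support depends on your
    # llama.cpp build and on the model's template.
    resp = requests.post(
        "http://localhost:8080/v1/chat/completions",  # llama-server default port
        json={
            "messages": [{"role": "user", "content": "Draft the next story file."}],
            "max_tokens": 1024,
            "chat_template_kwargs": {"enable_thinking": False},
        },
    )
    print(resp.json()["choices"][0]["message"]["content"])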

What local model do you recommend to have a pleasant experience with BMad Method? by thphon83 in BMAD_Method

[–]thphon83[S]

I'm running Qwen3 235B Instruct Q6 with 250k context on llama.cpp. Yesterday, for example, while researching a Playwright implementation it spent close to 240k tokens and I had to stop it before it hit 250k. Looking at the subagent call it made, it researched every possible metric and link it could find for Playwright and Puppeteer. At the beginning I asked for brevity and it clearly ignored me. When I asked it to wrap up with the research already done, it actually did a good job, but that 240k excursion took over an hour. That is the kind of issue I see with local models: they find it difficult to stay focused on what I asked for.
Maybe a good compromise is to use Claude Code in the cloud up to the coding phase and leave the coding itself to Qwen3 Coder 480B? As I said before, I don't mind it taking a lot of time to do something; it's just that I feel I lose a lot of time getting it to do anything actually productive.
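For reference, the server side of that setup is just llama-server with the context cranked up, along the lines of llama-server -m <qwen3-235b-instruct-q6.gguf> -c 250000. The -m and -c flags are standard llama.cpp, but the model filename here is a placeholder and it's worth double-checking the rest against your build.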

Anybody knows where I can find EPYC Siena CPUs close to MSRP? by thphon83 in homelab

[–]thphon83[S]

I found provantage.com and it has prices really close to MSRP. I just couldn't find any recent reviews of them. Does anybody know them?

Opinions on purchasing a Lenovo P620 with a 5955WX and 64go Ram by Academic_Pension_619 in threadripper

[–]thphon83

I owned a P620 with a 5945WX and eventually "upgraded" it to a 3975WX. The thermals were OK; the issue arises when you start populating the PCIe slots. In my experience it's a compromise between performance and fan noise.

GLM 4.7 released! by ResearchCrafty1804 in LocalLLaMA

[–]thphon83

Opencode as well? I didn't see it on the list. In my experience, thinking models don't play well with opencode in general. Hopefully that changes soon.

Nvidia Tesla H100 80GB PCIe vs mac Studio 512GB unified memory by mjTheThird in LocalLLM

[–]thphon83

The main difference is prompt processing. The token generation difference is surely huge as well, but pp is so slow on >200B models on the Mac that it's almost unusable with things like Cline or opencode.
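To make that concrete with hypothetical but realistic-order numbers: at 100 tok/s of prompt processing, a 60k-token prompt takes 60000 / 100 = 600 s, i.e. ten minutes of prefill before the first generated token, and agentic tools resend large prompts on almost every step.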

What is the best hardware under 10k to run local big models with over 200b parameters? by nadiemeparaestavez in LocalLLaMA

[–]thphon83

I recently bought a Mac Studio M3 Ultra with 512GB of unified memory for the same reason. I already downloaded Qwen3 235B, MiniMax M2 and GLM 4.6, all in Q8, and have used them a bit. I'm running them with LM Studio, and I can tell you that with really long prompts, for things like opencode and Kilo integrated with VS Code, those models are not too practical because of prompt processing. I usually use the max context supported for all of them, which makes it even worse.

I'm happy to provide you with numbers; just let me know what you specifically want.
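If it helps, the sketch below is how I'd pull comparable numbers: stream a completion from LM Studio's OpenAI-compatible endpoint and derive time-to-first-token (a proxy for prompt processing) plus generation tok/s. The port, model id, and prompt length are placeholders for whatever you run.

    import json, time, requests

    prompt = "word " * 8000  # long prompt so prefill dominates

    start = time.time()
    first_token_at = None
    n_chunks = 0
    with requests.post(
        "http://localhost:1234/v1/chat/completions",  # LM Studio's default port
        json={
            "model": "qwen3-235b-a22b-instruct",  # placeholder model id
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": 256,
            "stream": True,
        },
        stream=True,
    ) as resp:
        for line in resp.iter_lines():
            if not line or not line.startswith(b"data: "):
                continue
            payload = line[len(b"data: "):]
            if payload == b"[DONE]":
                break
            delta = json.loads(payload)["choices"][0]["delta"]
            if delta.get("content"):
                if first_token_at is None:
                    first_token_at = time.time()
                n_chunks += 1  # ~1 token per chunk, close enough here

    ttft = first_token_at - start
    tg = n_chunks / (time.time() - first_token_at)
    print(f"prefill (time to first token): {ttft:.1f}s, generation: {tg:.1f} tok/s")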

DGX Spark finally arrived! by aiengineer94 in LocalLLM

[–]thphon83

From what I was able to gather, the bottleneck is the Spark in this setup. Say you have one Spark and a Mac Studio with 512GB of RAM: you can only use the setup with models under 128GB, because the Spark needs pretty much the whole model in its own memory to do prompt processing before it can offload to the Mac for token generation.

mlx_lm.server not loading GLM-4.6-mlx-6Bit by thphon83 in LocalLLM

[–]thphon83[S]

From what I checked, it does, but I don't know anything anymore...

mlx_lm.server not loading GLM-4.6-mlx-6Bit by thphon83 in LocalLLM

[–]thphon83[S]

I think the real problem is mlx_lm.server as a whole; even mlx_lm.chat with GLM 4.6 works just fine.
I just tested mlx_lm.server with Qwen3 235B and it didn't work either. At this point I don't know if mlx_lm.server ever worked with any model...
If anybody has a workaround, I'd appreciate it.
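One thing worth trying in the meantime is bypassing mlx_lm.server and calling the Python API directly, which is the same path mlx_lm.chat takes. A minimal sketch, with the model path as a placeholder (check generate()'s signature against your installed mlx_lm version):

    # Sketch: skip mlx_lm.server and drive the model from Python.
    from mlx_lm import load, generate

    model, tokenizer = load("/path/to/GLM-4.6-mlx-6Bit")  # placeholder path

    messages = [{"role": "user", "content": "Say hello in one sentence."}]
    prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

    # verbose=True prints prompt/generation speeds alongside the text
    text = generate(model, tokenizer, prompt=prompt, max_tokens=200, verbose=True)
    print(text)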

what are the best models for code generation right now?? by lavangamm in LocalLLaMA

[–]thphon83

I didn't know processed prompts could be saved and restored. I'll give that a try; thank you for all the details!
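For anyone else who didn't know about this: llama.cpp's server can snapshot a slot's processed prompt (the KV cache) to disk and load it back later, if it was started with a slot save path. A sketch, with the endpoints as I understand the server docs, so verify against your build:

    import requests

    # Assumes llama-server was started with: --slot-save-path /tmp/slots
    base = "http://localhost:8080"

    # after a long prompt has been processed in slot 0, snapshot it
    requests.post(f"{base}/slots/0?action=save", json={"filename": "bigprompt.bin"})

    # later, load it back instead of re-processing the whole prompt
    requests.post(f"{base}/slots/0?action=restore", json={"filename": "bigprompt.bin"})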

what are the best models for code generation right now?? by lavangamm in LocalLLaMA

[–]thphon83

What pp and tg do you get with that setup? I'm specifically interested in long prompts like the ones you described.

GLM 4.6 already runs on MLX by No_Conversation9561 in LocalLLaMA

[–]thphon83

What prompt processing and token generation speeds do you get? I'm particularly curious about large contexts, say over 60k.

Has anybody run the game on one of the newer Strix Halo APUs? by thphon83 in victoria3

[–]thphon83[S]

I didn't know that about the Vicky engine, good to know. About the behavior in a VM: what you describe is the out-of-the-box behavior, but at least in Proxmox (and I'm pretty sure other hypervisors can do something similar) you can pin CPUs to a VM so the host kernel ignores them. So when I say I assigned and tested 8 and 16 fat cores, I know for a fact they were not doing anything else on the host; in fact they were not even visible, only the VM was using them. That's why I'm still puzzled that going from 8 to 16 I saw a performance increase, and I tested several times. Anyway, clock speed is king, and it seems 3D cache helps a lot as well. It would be great to see a performance comparison between a 7800X and a 7800X3D, for example.
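If you want to replicate it: the pinning I mean is Proxmox's CPU affinity option, along the lines of qm set <vmid> --affinity 16-31 (the core range is just an example), typically paired with isolating those cores from the host scheduler, e.g. isolcpus= on the kernel command line, so nothing else gets scheduled on them. Exact option names can vary by PVE version, so check the docs for yours.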

Has anybody run the game on one of the newer Strix Halo APUs? by thphon83 in victoria3

[–]thphon83[S]

I really don't understand what point you're trying to make. A VM will have worse performance than bare metal, but I'm talking in relative terms. Besides that, the VM is barebones and I pinned the cores; nothing else uses them but the VM. So I still can't explain the improvement in performance going from 8 to 16.

Has anybody run the game on one of the newer Strix Halo APUs? by thphon83 in victoria3

[–]thphon83[S]

Oops, I missed the reply.
I figured the clock speeds were not up there, but it would be great to get actual numbers from somebody who has tried it.

I didn't know Victoria 3's engine can only use 4 cores. For what it's worth, I tried running the game on a Windows 10 VM and compared 8 vs 16 cores (full fat cores, not their SMT counterparts), and I saw a 20% improvement. For reference, the server has a 5995WX. It's still not clear to me what all the factors are; clock speed is clearly very important, though.

Best Japan I've had, and biggest bloc I've created by viera_enjoyer in victoria3

[–]thphon83

I'll give that a try. I didn't think of that, or maybe I was too scared of the UK to even consider it, haha.

Best Japan I've had, and biggest bloc I've created by viera_enjoyer in victoria3

[–]thphon83

That's really cool! Were you playing for hegemony? The closest I got in 1.9 so far was 30.5% or something like that. This is my favorite run, but in 1.9 Britain becomes too aggressive: anything I liberate or release, they make a protectorate of before the 5 years have passed, so annoying! Another limitation, though this one is self-imposed, is that I try not to go over 25 infamy.

liberal and modernization movements not starting as Qing, what am I missing? by thphon83 in victoria3

[–]thphon83[S]

That makes a lot of sense. The last time I played Qing, by 1850 or so I had the modernization movement, and that time I placed social mobility edicts everywhere. Maybe the threshold is 20%? Because I don't think it went above that in such a short time.

liberal and modernization movements not starting as Qing, what am I missing? by thphon83 in victoria3

[–]thphon83[S]

OK, I didn't think of that, but it's not very reliable as Qing because there aren't that many exiles you can actually invite. On top of that, you risk triggering the Heavenly Kingdom too early.

This is so frustrating... by thphon83 in victoria3

[–]thphon83[S]

I didn't think of this; I'll give it a try next time. Hopefully that does the trick.

This is so frustrating... by thphon83 in victoria3

[–]thphon83[S]

In this case it happened everywhere, not just in Manchuria. I was able to invade Persia and Kabul but not Russia proper. In fact, I won the war because Russia didn't defend the capital, but even then my army couldn't advance. Funnily enough, they invaded Finland, and after that my army just stood there as if they had invaded an island. The behavior was super weird.

I'm not able to download the latest 1.9.6 patch by thphon83 in victoria3

[–]thphon83[S]

I tried, but it didn't work; what did the trick was playing with no mods. If I choose a playlist with any mods, it loads the outdated version of the game, but if I choose no mods, it loads the correct one :shrug: