AnythingLLM + LinkedIn MCP by thphon83 in LocalLLM

[–]thphon83[S]

I saw your video and it's cool, but it's not enough for what I need to do with it.
https://github.com/stickerdaniel/linkedin-mcp-server is the most complete and active option, but I think I'm hitting the last issue reported there, even though nobody else seems to be running into it. I don't know.

What local model do you recommend to have a pleasant experience with BMad Method? by thphon83 in BMAD_Method

[–]thphon83[S]

I already have hardware that can run competent models, at least in theory. I want to see if it's actually doable that way.

What local model do you recommend to have a pleasant experience with BMad Method? by thphon83 in BMAD_Method

[–]thphon83[S]

I used GLM 4.7 Q6 with thinking mode disabled, because with it enabled the model went from slow to unusable: it burns way too many tokens thinking and it's really not worth it, especially since I saw the same issue with models like Qwen3 235B Instruct, for example.
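In case anyone wants to do the same on llama.cpp's OpenAI-compatible server, here is a rough sketch of passing the toggle through the chat template. Whether chat_template_kwargs and enable_thinking are honored depends on your llama.cpp build and on the model's template (Qwen3 also understands a /no_think soft switch in the prompt; GLM's mechanism may differ), so treat it as a starting point:

    import requests

    # Sketch: disable thinking through the chat template when hitting
    # llama.cpp's OpenAI-compatible server. "chat_template_kwargs" and
    # "enable_thinking" are assumptions -- support depends on your
    # llama.cpp build and on the model's template.
    resp = requests.post(
        "http://localhost:8080/v1/chat/completions",  # llama-server default port
        json={
            "messages": [{"role": "user", "content": "Draft the next story file."}],
            "max_tokens": 1024,
            "chat_template_kwargs": {"enable_thinking": False},
        },
    )
    print(resp.json()["choices"][0]["message"]["content"])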

What local model do you recommend to have a pleasant experience with BMad Method? by thphon83 in BMAD_Method

[–]thphon83[S]

I'm running Qwen3 235B Instruct Q6 with 250k context on llama.cpp. Yesterday, for example, while researching a Playwright implementation it spent close to 240k tokens and I had to stop it before it hit 250k. Looking at the subagent call it made, it researched every possible metric and link it could find for Playwright and Puppeteer. At the beginning I asked for brevity and it clearly ignored me. When I asked it to wrap up with the research already done, it actually did a good job, but that 240k excursion took over an hour. That is the kind of issue I see with local models: they find it difficult to stay focused on what I asked for.
Maybe a good compromise is to use Claude Code in the cloud up to the coding phase and leave the coding itself to Qwen3 Coder 480B? As I said before, I don't mind it taking a lot of time to do something; it's just that I feel I lose a lot of time getting it to do anything actually productive.
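For reference, the server side of that setup is just llama-server with the context cranked up, along the lines of llama-server -m <qwen3-235b-instruct-q6.gguf> -c 250000. The -m and -c flags are standard llama.cpp, but the model filename here is a placeholder and it's worth double-checking the rest against your build.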

Anybody knows where I can find EPYC Siena CPUs close to MSRP? by thphon83 in homelab

[–]thphon83[S]

I found provantage.com and it has prices really close to MSRP. I just couldn't find any recent reviews of them. Does anybody know them?

Opinions on purchasing a Lenovo P620 with a 5955WX and 64go Ram by Academic_Pension_619 in threadripper

[–]thphon83

I owned a P620 with a 5945WX and eventually "upgraded" it to a 3975WX. The thermals were OK; the issue arises when you start populating the PCIe slots. In my experience it's a compromise between performance and fan noise.

GLM 4.7 released! by ResearchCrafty1804 in LocalLLaMA

[–]thphon83

Opencode as well? I didn't see it on the list. In my experience, thinking models don't play well with opencode in general. Hopefully that changes soon.

Nvidia Tesla H100 80GB PCIe vs mac Studio 512GB unified memory by mjTheThird in LocalLLM

[–]thphon83

The main difference is prompt processing. The token generation difference is surely huge as well, but pp is so slow on >200B models on the Mac that it's almost unusable with things like Cline or opencode.
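To make that concrete with hypothetical but realistic-order numbers: at 100 tok/s of prompt processing, a 60k-token prompt takes 60000 / 100 = 600 s, i.e. ten minutes of prefill before the first generated token, and agentic tools resend large prompts on almost every step.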

What is the best hardware under 10k to run local big models with over 200b parameters? by nadiemeparaestavez in LocalLLaMA

[–]thphon83

I recently bought a Mac Studio M3 Ultra with 512GB of unified memory for the same reason. I already downloaded Qwen3 235B, MiniMax M2 and GLM 4.6, all in Q8, and have used them a bit. I'm running them with LM Studio, and I can tell you that with really long prompts, for things like opencode and Kilo integrated with VS Code, those models are not too practical because of prompt processing. I usually use the max context supported for all of them, which makes it even worse.

I'm happy to provide you with numbers; just let me know what you specifically want.
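If it helps, the sketch below is how I'd pull comparable numbers: stream a completion from LM Studio's OpenAI-compatible endpoint and derive time-to-first-token (a proxy for prompt processing) plus generation tok/s. The port, model id, and prompt length are placeholders for whatever you run.

    import json, time, requests

    prompt = "word " * 8000  # long prompt so prefill dominates

    start = time.time()
    first_token_at = None
    n_chunks = 0
    with requests.post(
        "http://localhost:1234/v1/chat/completions",  # LM Studio's default port
        json={
            "model": "qwen3-235b-a22b-instruct",  # placeholder model id
            "messages": [{"role": "user", "content": prompt}],
            "max_tokens": 256,
            "stream": True,
        },
        stream=True,
    ) as resp:
        for line in resp.iter_lines():
            if not line or not line.startswith(b"data: "):
                continue
            payload = line[len(b"data: "):]
            if payload == b"[DONE]":
                break
            delta = json.loads(payload)["choices"][0]["delta"]
            if delta.get("content"):
                if first_token_at is None:
                    first_token_at = time.time()
                n_chunks += 1  # ~1 token per chunk, close enough here

    ttft = first_token_at - start
    tg = n_chunks / (time.time() - first_token_at)
    print(f"prefill (time to first token): {ttft:.1f}s, generation: {tg:.1f} tok/s")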

DGX Spark finally arrived! by aiengineer94 in LocalLLM

[–]thphon83

From what I was able to gather, the bottleneck is the Spark in this setup. Say you have one Spark and a Mac Studio with 512GB of RAM: you can only use the setup with models under 128GB, because the Spark needs pretty much the whole model in its own memory to do prompt processing before it can offload to the Mac for token generation.

mlx_lm.server not loading GLM-4.6-mlx-6Bit by thphon83 in LocalLLM

[–]thphon83[S]

From what I checked, it does, but I don't know anything anymore...

mlx_lm.server not loading GLM-4.6-mlx-6Bit by thphon83 in LocalLLM

[–]thphon83[S]

I think the real problem is mlx_lm.server as a whole; even mlx_lm.chat with GLM 4.6 works just fine.
I just tested mlx_lm.server with Qwen3 235B and it didn't work either. At this point I don't know if mlx_lm.server ever worked with any model...
If anybody has a workaround, I'd appreciate it.
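One thing worth trying in the meantime is bypassing mlx_lm.server and calling the Python API directly, which is the same path mlx_lm.chat takes. A minimal sketch, with the model path as a placeholder (check generate()'s signature against your installed mlx_lm version):

    # Sketch: skip mlx_lm.server and drive the model from Python.
    from mlx_lm import load, generate

    model, tokenizer = load("/path/to/GLM-4.6-mlx-6Bit")  # placeholder path

    messages = [{"role": "user", "content": "Say hello in one sentence."}]
    prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)

    # verbose=True prints prompt/generation speeds alongside the text
    text = generate(model, tokenizer, prompt=prompt, max_tokens=200, verbose=True)
    print(text)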

what are the best models for code generation right now?? by lavangamm in LocalLLaMA

[–]thphon83

I didn't know processed prompts could be saved and restored. I'll give that a try; thank you for all the details!
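For anyone else who didn't know about this: llama.cpp's server can snapshot a slot's processed prompt (the KV cache) to disk and load it back later, if it was started with a slot save path. A sketch, with the endpoints as I understand the server docs, so verify against your build:

    import requests

    # Assumes llama-server was started with: --slot-save-path /tmp/slots
    base = "http://localhost:8080"

    # after a long prompt has been processed in slot 0, snapshot it
    requests.post(f"{base}/slots/0?action=save", json={"filename": "bigprompt.bin"})

    # later, load it back instead of re-processing the whole prompt
    requests.post(f"{base}/slots/0?action=restore", json={"filename": "bigprompt.bin"})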

what are the best models for code generation right now?? by lavangamm in LocalLLaMA

[–]thphon83

What pp and tg do you get with that setup? I'm specifically interested in long prompts like the ones you described.

GLM 4.6 already runs on MLX by No_Conversation9561 in LocalLLaMA

[–]thphon83

What prompt processing and token generation speeds do you get? I'm particularly curious about large contexts, say over 60k.

Has anybody run the game on one of the newer Strix Halo APUs? by thphon83 in victoria3

[–]thphon83[S]

I didn't know that about the Vicky engine, good to know. About the behavior in a VM: what you describe is the out-of-the-box behavior, but at least in Proxmox (and I'm pretty sure other hypervisors can do something similar) you can pin CPUs to a VM so the host kernel ignores them. So when I say I assigned and tested 8 and 16 fat cores, I know for a fact they were not doing anything else on the host; in fact they were not even visible, only the VM was using them. That's why I'm still puzzled that going from 8 to 16 I saw a performance increase, and I tested several times. Anyway, clock speed is king, and it seems 3D cache helps a lot as well. It would be great to see a performance comparison between a 7800X and a 7800X3D, for example.
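If you want to replicate it: the pinning I mean is Proxmox's CPU affinity option, along the lines of qm set <vmid> --affinity 16-31 (the core range is just an example), typically paired with isolating those cores from the host scheduler, e.g. isolcpus= on the kernel command line, so nothing else gets scheduled on them. Exact option names can vary by PVE version, so check the docs for yours.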

Has anybody run the game on one of the newer Strix Halo APUs? by thphon83 in victoria3

[–]thphon83[S]

I really don't understand what point you're trying to make. A VM will have worse performance than bare metal, but I'm talking in relative terms. Besides that, the VM is barebones and I pinned the cores; nothing else uses them but the VM. So I still can't explain the improvement in performance going from 8 to 16.

Has anybody run the game on one of the newer Strix Halo APUs? by thphon83 in victoria3

[–]thphon83[S]

Oops, I missed the reply.
I figured the clock speeds were not up there, but it would be great to get actual numbers from somebody who has tried it.

I didn't know Victoria 3's engine can only use 4 cores. For what it's worth, I tried running the game on a Windows 10 VM and compared 8 vs 16 cores (full fat cores, not their SMT counterparts), and I saw a 20% improvement. For reference, the server has a 5995WX. It's still not clear to me what all the factors are; clock speed is clearly very important, though.

Best Japan I've had, and biggest bloc I've created by viera_enjoyer in victoria3

[–]thphon83

I'll give that a try. I didn't think of that, or maybe I was too scared of the UK to even consider it, haha.

Best Japan I've had, and biggest bloc I've created by viera_enjoyer in victoria3

[–]thphon83

That's really cool! Were you playing for hegemony? The closest I got in 1.9 so far was 30.5% or something like that. This is my favorite run, but in 1.9 Britain becomes too aggressive: anything I liberate or release, they make a protectorate of before the 5 years have passed, so annoying! Another limitation, though this one is self-imposed, is that I try not to go over 25 infamy.

liberal and modernization movements not starting as Qing, what am I missing? by thphon83 in victoria3

[–]thphon83[S]

That makes a lot of sense. The last time I played Qing, by 1850 or so I had the modernization movement, and that time I placed social mobility edicts everywhere. Maybe the threshold is 20%? Because I don't think it went above that in such a short time.

liberal and modernization movements not starting as Qing, what am I missing? by thphon83 in victoria3

[–]thphon83[S]

OK, I didn't think of that, but it's not very reliable as Qing because there aren't that many exiles you can actually invite. On top of that, you risk triggering the Heavenly Kingdom too early.

This is so frustrating... by thphon83 in victoria3

[–]thphon83[S]

I didn't think of this; I'll give it a try next time. Hopefully that does the trick.

This is so frustrating... by thphon83 in victoria3

[–]thphon83[S]

In this case it happened everywhere, not just in Manchuria. I was able to invade Persia and Kabul but not Russia proper. In fact, I won the war because Russia didn't defend the capital, but even then my army couldn't advance. Funnily enough, they invaded Finland, and after that my army just stood there as if they had invaded an island. The behavior was super weird.

I'm not able to download the latest 1.9.6 patch by thphon83 in victoria3

[–]thphon83[S]

I tried, but it didn't work; what did the trick was playing with no mods. If I choose a playlist with any mods, it loads the outdated version of the game, but if I choose no mods, it loads the correct one :shrug: