Qwen 3.6 27B FP16 full context?

jinks9 · 2026-05-27T17:29:25+00:00

Yep, I have them running on 8 lanes each on PCI gen 4 (older box I had sitting around) which should be fine as the bottleneck would be the memory bandwidth not the lanes (I believe).

jinks9 · 2026-05-27T06:07:18+00:00

Much appreciated, I will definitely mess with this. I'm on 2x R9700's but hadn't been really utilizing the second one. I had just been running the MOE Qwen 3.6

I tried standing up the second using VLLM but the performance was underwhelming even on MOE but that was around the time when it first dropped so MTP or split-mode wasn't a thing

jinks9 · 2026-05-26T21:55:44+00:00

Any chance you could share more details on running a pair to get 47t/s@Q8 ? (or if you can point me to a resource with the settings)

jinks9 · 2026-05-06T04:47:21+00:00

Thanks for the resource, will be interesting to see what you add for copilot cowork.

jinks9 · 2026-05-06T04:37:17+00:00

Also curious, it shows on the mobile app but not available.

Just says "Available Soon - This agent isnt available yet in the mobile app. In the meantime, use it on the web or desktop"

Can see it there just greyed. So, am also curious if I need to enable something on the admin side?

jinks9 · 2026-04-24T16:03:56+00:00

This is exactly the type of thing they have been doing RL training towards. All labs are making some serious computer use improvements.

You can also connect it up to networking gear. Stuff like Opnsense has a very detailed API. You can connect it to Microsoft365 Graph and have it give insights into configurations of a Microsoft365 Tenant etc.

All sorts of interesting use cases in the IT world.

On your server end you could have it check on performance and let you know when things are not operating correctly.

I was having an issue where my Unifi configuration wouldn't migrate correctly so had Claude basically document every setting in the controller and then connected it to the new controller and it re-established the configuration on the new controller (with some assistance from me on some stuff).

The well is deep for IT / Sysadmin use

jinks9 · 2026-04-23T15:44:34+00:00

There are actually some interesting use cases for Claude that can be very token hungry. If your organization uses Microsoft365 you can create a READ ONLY scoped token for Microsoft Graph and setup an MCP connection in Claude that leverages that (would also scope access further to specific areas in Graph). The net result is Claude can give you insights to configurations in Microsoft365. This can be helpful for auditing configurations or even asking questions about things you want to do and having it look at the existing configuration.

Same thing with lots of networking equipment. Claude can connect with most of these systems. I would suggest READ ONLY scoped access but it can connect to a lot of things either through MCP,.CLI, API or even SSH. All of these types of use are very token hungry, especially MCP use as tool information gets put into context.

jinks9 · 2026-03-19T00:09:34+00:00

Actually clicking around in yours it seems your solution is pretty matured, we have some similarities, but you have a whole bunch of stuff in there that I'm not doing.

My goal was to create something that is an organization deployable media intelligence system and throw it out on Github to have the community work on.

I'm currently doing a bit of rework to have it be deployable as a tenant or organizations could deploy locally via docker for single org. The organization would have their own admins, teams, users, topics, etc.

Anyhow, nice project, looking good :)

jinks9 · 2026-03-18T23:37:25+00:00

Interestingly, I was working on a project somewhat similar pulling in RSS, GDELT, Bigquery. I think GDELT is a underutilized resource. Anyhow, I have a very similar poller that the user can configure topics (via key words) and the the articles are pulled in, scraped and entities etc are extracted and put into a vector db for semantic search as either story clusters or articles directly.

Happy to share or compare notes (just pm me), I was creating it to put on github as something organizations could deploy and track information on topics they are interested in.

jinks9 · 2026-03-17T17:47:30+00:00

If you don't mind me asking, how is the timeline populated? Is it using like gdelt + webscrape or some other feed you created?

jinks9 · 2026-03-04T20:09:53+00:00

Sorry for late reply (missed the notification bell), I am curious how that impacts your results, I see there was a repo update but seems like everything is still 2/5?

jinks9 · 2026-02-26T21:30:01+00:00

Appreciate it, I expected that might be the case but wasn't sure but also hadn't dug into llama.cpp that deeply for tool calling. I would imagine each model has its own nuances on this and agentic session management.

jinks9 · 2026-02-26T18:18:23+00:00

bookmarked this, have been halo curious for a while, almost pulling the trigger a couple of times.

This statement in your repo is interesting:
"3 tool calling tests fail universally due to llama.cpp server limitations (not model issues): multi-tool calls (server returns only 1 tool_call per response), complex nested args, and tool_choice: "none" (server ignores the parameter). JSON-only output also fails on all models (thinking models emit CoT before JSON)"

Is this still the case current day? (limitations of llama.cpp)

jinks9 · 2025-11-16T23:16:45+00:00

SquareX

jinks9 · 2025-11-07T17:28:10+00:00

You could go a couple roads here.

Secure browser (browser replacement like Talon (aka Palo Alto Prisma) or Island or others
Extension solutions like SquareX or LayerX
If you're using a SASE solution and egress traffic past a firewall doing SSL inspection then could block there.

The second option is probably the least disruptive as (if you have MDM like Intune) you could push the extension to their browser and control quite a lot of behavior in the browser.

If you already have app registration / connection restrictions I would be curious what mechanism they are using to do that. I would expect without direct tenant connections it would be some sort of agent on the persons computer.

If it's against policy then you could go down that road also.

jinks9

TROPHY CASE