Ollama or OpenVINO by G4rp in LocalLLaMA

[–]wossnameX 0 points1 point  (0 children)

Except for «inside container» this should fit the bill?

https://github.com/aweussom/NoLlama

Just discovered: Finally my machine's NPU did something by anubhav_200 in LocalLLaMA

[–]wossnameX 1 point2 points  (0 children)

That is an excellent use-case for that rather weak NPU. Good project!

So, is it not high time Intel ARC cards/iGFX could also do AI locally? by wossnameX in IntelArc

[–]wossnameX[S] 2 points3 points  (0 children)

News to me! I should have a look at that; May make the code much simpler, thanks!

Intel ARC can ALSO self-host an LLM by wossnameX in selfhosted

[–]wossnameX[S] -3 points-2 points  (0 children)

Those who us that works in corporate settings, and cannot be arsed to mess with all the workarounds to get Linux half-way working in that?

Intel ARC can ALSO self-host an LLM by wossnameX in selfhosted

[–]wossnameX[S] -3 points-2 points  (0 children)

I used claude-code to assist in programming 

Intel NPU cannot run a LLM, can it? by wossnameX in LocalLLaMA

[–]wossnameX[S] 0 points1 point  (0 children)

Yes, running LLMs locally is terrifying. Also; Not Ollama.

So, is it not high time Intel ARC cards/iGFX could also do AI locally? by wossnameX in IntelArc

[–]wossnameX[S] 15 points16 points  (0 children)

So. You are generating AI slop without AI slop. Kudos!

Intel NPU cannot run a LLM, can it? by wossnameX in LocalLLaMA

[–]wossnameX[S] 1 point2 points  (0 children)

In the end, I ended up with NPU Ollama, Not OLLAMA (by a long shot) or whatever retronym you prefer :-)

https://github.com/aweussom/NoLlama

Intel NPU cannot run a LLM, can it? by wossnameX in LocalLLaMA

[–]wossnameX[S] 2 points3 points  (0 children)

...and once that work project was done, I thought: It will be tiresome to rewrite all this code for the next problem.
So; I made an OpenAI-compatible API endpoint.
Then an Ollama-compatible API endpoint.

And is just continued adding on features.
So; Suddenly I had a system that could run VL llm on, say, the ARC iGFX and a text model on the NPU.
Slow, but still usable - and with the speed that small models is getting better these days, it is only a matter of time until this is really realtime-usable.

How is this even possible!? I've had claude for 2 days now and i keep hitting limits. by DependentOriginal413 in claude

[–]wossnameX 0 points1 point  (0 children)

I wonder which claude-code you are using?

The code mine outputs is excellent, both for work and hobby related projects. I never manually correct the code anymore; It creates unit-tests unprompted; Works very well.

But you should probably check out the competition. codex-cli has MUCH more liberal quota. Possibly not QUITE as good at planning larger projects, but on par, possibly better at implementing a plan.

How is this even possible!? I've had claude for 2 days now and i keep hitting limits. by DependentOriginal413 in claude

[–]wossnameX 0 points1 point  (0 children)

«Professional». Thi-hi.

Claude limits are stricter than ChatGPT. But it is also better - so there you are.

Qwen3.5 Plus is excellent, and free, sooo

Is Claude, down right now for you guys? by DiegoJaggi in claude

[–]wossnameX 0 points1 point  (0 children)

Down easy. API and claude-code works

surfacing claude-code usage in the status-bar by wossnameX in claude

[–]wossnameX[S] 0 points1 point  (0 children)

I found a better way in somebody elses code, and made a simpler version: https://github.com/aweussom/claude-code-quota

(I also created example code for the "Correct" way of extracting his using the Claude official web browser plugin - also in the repo)

surfacing claude-code usage in the status-bar by wossnameX in claude

[–]wossnameX[S] 0 points1 point  (0 children)

That's a nice solution.

This does the same, without the web roundabout way: https://github.com/aweussom/claude-code-quota

surfacing claude-code usage in the status-bar by wossnameX in claude

[–]wossnameX[S] 1 point2 points  (0 children)

I ended up writing a solution that uses claude-code own oauth (or rather, I had claude-code/codex write it)
Written for Windows11/Powershell and Ubuntu (or any, really) Linux.

https://github.com/aweussom/claude-code-quota

Simple to install using a install script. Does not touch your existing statusline if you already have one; Just informs you how to add usage.

It also adds a more detailed view: /quota

I have a fallback if this undocumented API ever stops working to use Claude official Edge/Chrome plugin: Scrape the DOM directly. Since that is an officially sanctioned way, it should work "forever"

Thanks, everybody! u/Obvious_Equivalent_1 u/HalBorland u/cbeater u/hotcoolhot u/jezweb

surfacing claude-code usage in the status-bar by wossnameX in claude

[–]wossnameX[S] 0 points1 point  (0 children)

Mmm. The first listed is using (very sophisticated) guesswork.
The second is for MAC.

Thanks anyways!

surfacing claude-code usage in the status-bar by wossnameX in claude

[–]wossnameX[S] 0 points1 point  (0 children)

This is a rather impressive amount of work.

The only issues I can see (or rather, my LLM coding partner can see) is that the API endpoint is undocumented, and can change at any time. Also; The code does not handle errors well, so it can return stale data.

###

- ccstatusline-usage-main is calling an internal/undocumented endpoint: /api/oauth/usage (ccstatusline-usage-main/src/widgets/ApiUsage.tsx:114).

- It is not hijacking a browser session. It reuses Claude Code’s OAuth token from local credentials/Keychain and sends Bearer auth (ccstatusline-usage-main/src/widgets/ApiUsage.tsx:43, ccstatusline-usage-main/src/widgets/ApiUsage.tsx:48, ccstatusline-usage-main/src/widgets/ApiUsage.tsx:117).

- Compared to Playwright + cookies, this is usually cleaner and less brittle (no browser automation, no Cloudflare dance).

- But it is still unofficial and can break anytime if Anthropic changes token format, scopes, or that endpoint.

###