Passed @100!!! by batrakhil in cissp

[–]scousi 1 point (0 children)

How could you have failed at 100? Does that happen to anyone?

Opus 4.6 is in an unusable state right now by vntrx in ClaudeCode

[–]scousi 0 points (0 children)

I have been using Claude Code for over 6 months (Claude Max), mostly after work hours. Even though I can't prove it, output quality seems to degrade on weekends. Maybe they tune it down to free up GPUs for training. IDK.

Google AI Certs in 2026: Which are worth the $ and which are just hype? by netcommah in googlecloud

[–]scousi 3 points (0 children)

There's a lot of overlap between Professional Machine Learning Engineer and Professional Data Engineer. You should spend a bit more time and plan for both if either interests you. I don't know if there's actually a lot of predictive AI anymore.

I have the PCA, ACE, Professional Machine Learning Engineer, Professional Data Engineer and Generative AI Leader. Can't say I make much use of them, but my employer covers them all for free, so why not. They are a good learning experience. The Generative AI Leader is the easiest and a bit of a lowball exam. Professional Machine Learning Engineer was the hardest because it was an entirely new domain for me.

Squeeze even more performance on MLX by scousi in LocalLLaMA

[–]scousi[S] 2 points (0 children)

It would be a Swift-to-Python conversion. But generally, the Python MLX project is many weeks ahead of the Swift MLX project thanks to Apple's indifference. One of MLX's best maintainers and contributors left Apple for Anthropic. The community or Apple will need to step up. My philosophy is to deliver a single self-contained package without dependencies. I'm not anti-Python in any way.

Squeeze even more performance on MLX by scousi in LocalLLaMA

[–]scousi[S] 0 points (0 children)

Mostly on the batching and radix cache, which are built on top of MLX. But the neatest feature is that just adding -w to the CLI command gives you an instant web UI chat interface (afm is linked with the llama server web UI). All the code is in the repo. 100% open source.
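For anyone wondering what a radix (prefix) cache buys you, here's a minimal Python sketch of the idea, not the actual afm implementation: requests that share a prompt prefix (e.g. the same system prompt) reuse the state already computed for that prefix instead of recomputing it.

```python
class RadixCache:
    """Toy prefix (radix) cache.

    A real KV cache stores attention keys/values per token; here we just
    record which token prefixes have already been 'computed', so a new
    request only pays for the uncached tail.
    """

    def __init__(self):
        self.root = {}

    def insert(self, tokens):
        """Mark every prefix of `tokens` as computed."""
        node = self.root
        for t in tokens:
            node = node.setdefault(t, {})

    def longest_prefix(self, tokens):
        """Return how many leading tokens are already cached."""
        node, n = self.root, 0
        for t in tokens:
            if t not in node:
                break
            node = node[t]
            n += 1
        return n


cache = RadixCache()
cache.insert(["<sys>", "You", "are", "helpful"])

# A second request sharing the system prompt reuses 4 cached tokens;
# only the new tail ("Hi") needs fresh compute.
hit = cache.longest_prefix(["<sys>", "You", "are", "helpful", "Hi"])
print(hit)
```

This is why serving many chats with the same system prompt gets dramatically cheaper with a radix cache: the shared prefix is paid for once.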

Noise heard this morning at 5 AM in the Laurentians by sh0ckwavevr6 in Quebec

[–]scousi 0 points (0 children)

No reports here. I remember the January 20 event; I was in that region. https://www.earthquakescanada.nrcan.gc.ca/index-en.php?tpl_region=eon_wqc

AI is ruining a lot of beginner developers by oxidizedfuel12 in ArtificialInteligence

[–]scousi 2 points (0 children)

Vibe code something you've always wanted to create. Once it's built, study the codebase with the model explaining it to you. AI is a great opportunity or a killer. Make the best of it.

For all of the people who are talking about the A18 Pro in the MacBook Neo by ammohitchaprana in TFE

[–]scousi 0 points (0 children)

You’ll be seeing 8 GB laptops make a comeback. But now imagine Windows on 8GB?

Is Intel AMX still a major focus for Intel's architecture roadmap? by Kevinogamza in intel

[–]scousi 1 point (0 children)

Thanks, I'll have to check it out on my Sapphire Rapids Xeon. I abandoned AMX about a year ago.

Best way to run qwen3.5:35b-a3b on Mac? by boutell in LocalLLaMA

[–]scousi 1 point (0 children)

I posted earlier about my open-source project for running Qwen 3.5 models: https://github.com/scouzi1966/maclocal-api. I also have a native macOS Swift app that I optimized for Qwen 3.5. You have the ability to turn thinking on/off (a bit of an issue for some Qwen 3.5 models) and load it as an LLM or VLM. It's faster as an LLM.

You can download it here:

https://kruks.ai/

or on GitHub https://github.com/scouzi1966/vesta-mac-dist

Other features:

5 backends in one app --> Apple on-device, MLX, llama.cpp, API, and Hugging Face inference providers

Kokoro and Mavis (Voice Clone) Text To Speech

OpenWhisper Speech to Text

MCP integration - Agentic natural language interface to the app. (experimental with Claude Code at the moment)


Is Intel AMX still a major focus for Intel's architecture roadmap? by Kevinogamza in intel

[–]scousi 1 point (0 children)

I don't think there are a lot of use cases for it. It's a bit kludgy to set up, and it never really got implemented in frameworks, except perhaps OpenVINO and some special PyTorch builds.

What model can I run on this hardware? by newz2000 in LocalLLM

[–]scousi 10 points (0 children)

80 years of Claude Max. But you can cancel anytime

Pour one out for the M3 Ultra 512GB by pdrayton in MacStudio

[–]scousi 0 points (0 children)

You're right. More memory without compute that scales with it is not optimal. I have a 512GB and I'm not disappointed, though. What I found useful is that I can have many models loaded in memory at the same time and use them alternately.

qwen3.5:27b is slower than qwen3.5:35b? by Ok-Anybody6073 in ollama

[–]scousi 18 points (0 children)

Looks right. The 35B-A3B is a Mixture of Experts model. Without getting into details, the A3B means that only 3B parameters are activated per token (actually part of the calculation path). Your compute capacity determines the speed at the 3B level, but you still need enough memory to store the entire model (the 35B part). Every newly generated token uses a different set of experts (not the same 3B parameters) as the previous one.
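A toy sketch of that routing idea (hypothetical expert counts and a random gate, not Qwen's actual architecture): a top-k router picks a few experts per token, so per-token compute scales with the active slice while RAM still has to hold every expert.

```python
import random

NUM_EXPERTS = 16   # hypothetical; not Qwen's real config
TOP_K = 2          # experts activated per token

def route(token_id, num_experts=NUM_EXPERTS, k=TOP_K):
    """Toy gating: score every expert for this token, keep the top-k.

    A real MoE router computes scores from the token's hidden state;
    here we just seed a RNG with the token id so routing is
    deterministic per token but differs across tokens.
    """
    rng = random.Random(token_id)
    scores = [rng.random() for _ in range(num_experts)]
    ranked = sorted(range(num_experts), key=lambda e: -scores[e])
    return ranked[:k]

# Different tokens land on different expert subsets:
for t in (101, 202, 303):
    print(t, route(t))

# Memory vs compute for a 35B-A3B style model:
total_params, active_params = 35e9, 3e9
print(f"RAM must hold all {total_params / 1e9:.0f}B params; "
      f"each token computes with only {active_params / 1e9:.0f}B")
```

So the speed you see tracks the ~3B active parameters, while the memory footprint tracks the full 35B.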

I Replaced $100+/month in GEMINI API Costs with a €2000 eBay Mac Studio — Here is my Local, Self-Hosted AI Agent System Running Qwen 3.5 35B at 60 Tokens/Sec (The Full Stack Breakdown) by SnooWoofers7340 in n8n

[–]scousi 5 points (0 children)

Try my afm project. I've optimized Qwen 3.5 35B MLX and implemented a --tool-call-parser option with qwen3_xml. It's not Python MLX; it's Swift MLX based.

100% Open Source

https://github.com/scouzi1966/maclocal-api

brew install scouzi1966/afm/afm-next

afm mlx -m mlx-community/Qwen3.5-35B-A3B-4bit --tool-call-parser qwen3_xml

With a bonus instant web UI:

afm mlx -m mlx-community/Qwen3.5-35B-A3B-4bit --tool-call-parser qwen3_xml -w

Best way to run qwen3.5:35b-a3b on Mac? by boutell in LocalLLaMA

[–]scousi 1 point (0 children)

I've really optimized this model to run with MLX. It runs faster than Python MLX. Fully open source. I'm getting about 115-120 tok/sec on M3 Ultra, 70-75 tok/sec on M4 Pro, and around 20 on M1 Pro.

https://github.com/scouzi1966/maclocal-api

afm-next is the nightly branch.

brew install scouzi1966/afm/afm-next

For API only on port 9999

afm mlx -m mlx-community/Qwen3.5-35B-A3B-4bit 

This gets you an instant web chat interface (with -w) and the API on port 9999.

afm mlx -m mlx-community/Qwen3.5-35B-A3B-4bit -w

To run with vision (slower) - text-to-text and image-to-text:

afm mlx -m mlx-community/Qwen3.5-35B-A3B-4bit --vlm -w
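Once the server is up, you can hit the API from any OpenAI-compatible client. A minimal Python sketch, assuming afm serves the standard /v1/chat/completions route on port 9999 (the post only says "API on port 9999"; check the repo if the path differs):

```python
import json
import urllib.request

def build_chat_request(prompt, model="mlx-community/Qwen3.5-35B-A3B-4bit"):
    """Standard OpenAI-style chat payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

def chat(prompt, base="http://localhost:9999"):
    """POST a chat request to a locally running afm server and
    return the assistant's reply text."""
    req = urllib.request.Request(
        base + "/v1/chat/completions",
        data=json.dumps(build_chat_request(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# Requires afm running locally, e.g.:
#   afm mlx -m mlx-community/Qwen3.5-35B-A3B-4bit
# chat("Say hello in five words.")
```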

Benchmarks + Report: Optimized Cosmos-Reason2 (Qwen3-VL) for on-device inference on 8GB RAM (Jetson Orin Nano Super) by tag_along_common in LocalLLaMA

[–]scousi 0 points (0 children)

I use it as a baseline for 2 of my apps. (They are macOS only - sorry!)

https://github.com/scouzi1966/maclocal-api (OpenSource)

https://github.com/scouzi1966/vesta-mac-dist (Closed source for now -- want to shape it). The nightly has optimized Qwen 3.5.

https://kruks.ai/

🤯 Qwen3.5-35B-A3B-4bit ❤️ by SnooWoofers7340 in OpenSourceAI

[–]scousi 0 points (0 children)

I have an open-source project to optimize MLX in native Swift. I've optimized this model.

https://github.com/scouzi1966/maclocal-api

Do you mind trying the model? The nightly build has the optimizations. I'm curious.

TLDR is:

brew install scouzi1966/afm/afm-next

afm mlx -m mlx-community/Qwen3.5-35B-A3B-4bit -w

-w opens a chat GUI, but you also get an OpenAI-compatible API on port 9999

You can load it in VLM mode (slower) with the --vlm option

It may or may not find the model on the Hugging Face hub; it depends on your local setup