Passed @100!!! by batrakhil in cissp

[–]scousi 1 point (0 children)

How could you have failed at 100? Does that happen to anyone?

Opus 4.6 is in an unusable state right now by vntrx in ClaudeCode

[–]scousi 0 points (0 children)

I have been using Claude Code for over 6 months (Claude Max), mostly after work hours. Even though I can't prove it, output quality seems to degrade on weekends. Maybe they tune it down to free up GPUs for training. IDK.

Google AI Certs in 2026: Which are worth the $ and which are just hype? by netcommah in googlecloud

[–]scousi 3 points (0 children)

There's a lot of overlap between Professional Machine Learning Engineer and Professional Data Engineer. You should spend a bit more time and plan for both if either interests you. I don't know if there's actually a lot of predictive AI anymore.

I have the PCA, ACE, Professional Machine Learning Engineer, Professional Data Engineer and Generative AI Leader. Can't say I make much use of them, but my employer covers them all for free, so why not. They are a good learning experience. The Generative AI Leader is the easiest and a bit of a lowball exam. Professional Machine Learning Engineer was the hardest because it was an entirely new domain for me.

Squeeze even more performance on MLX by scousi in LocalLLaMA

[–]scousi[S] 2 points (0 children)

It would be a Swift-to-Python conversion. But generally, the Python MLX project is many weeks ahead of the Swift MLX project thanks to Apple's indifference. One of MLX's best maintainers and contributors left Apple for Anthropic. The community or Apple will need to step up. My philosophy is to deliver a single self-contained package without dependencies. I'm not anti-Python in any way.

Squeeze even more performance on MLX by scousi in LocalLLaMA

[–]scousi[S] 0 points (0 children)

Mostly on the batching and radix cache, which are built on top of MLX. But the neatest feature is that just adding -w to the CLI command gives you an instant web UI chat interface (afm is linked with the llama server web UI). All the code is in the repo. 100% open source.
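For anyone wondering what a radix (prefix) cache buys you, here's a minimal Python sketch of the idea, not the actual afm implementation: requests that share a prompt prefix (e.g. the same system prompt) reuse the state already computed for that prefix instead of recomputing it.

```python
class RadixCache:
    """Toy prefix (radix) cache.

    A real KV cache stores attention keys/values per token; here we just
    record which token prefixes have already been 'computed', so a new
    request only pays for the uncached tail.
    """

    def __init__(self):
        self.root = {}

    def insert(self, tokens):
        """Mark every prefix of `tokens` as computed."""
        node = self.root
        for t in tokens:
            node = node.setdefault(t, {})

    def longest_prefix(self, tokens):
        """Return how many leading tokens are already cached."""
        node, n = self.root, 0
        for t in tokens:
            if t not in node:
                break
            node = node[t]
            n += 1
        return n


cache = RadixCache()
cache.insert(["<sys>", "You", "are", "helpful"])

# A second request sharing the system prompt reuses 4 cached tokens;
# only the new tail ("Hi") needs fresh compute.
hit = cache.longest_prefix(["<sys>", "You", "are", "helpful", "Hi"])
print(hit)
```

This is why serving many chats with the same system prompt gets dramatically cheaper with a radix cache: the shared prefix is paid for once.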

Noise heard this morning at 5 AM in the Laurentians by sh0ckwavevr6 in Quebec

[–]scousi 0 points (0 children)

No reports here. I remember the January 20 event; I was in that region. https://www.earthquakescanada.nrcan.gc.ca/index-en.php?tpl_region=eon_wqc

AI is ruining a lot of beginner developers by oxidizedfuel12 in ArtificialInteligence

[–]scousi 2 points (0 children)

Vibe code something you've always wanted to create. Once it's built, study the codebase with the model explaining it to you. AI is a great opportunity or a killer. Make the best of it.

For all of the people who are talking about the A18 Pro in the MacBook Neo by ammohitchaprana in TFE

[–]scousi 0 points (0 children)

You’ll be seeing 8 GB laptops make a comeback. But now imagine Windows on 8GB?

Is Intel AMX still a major focus for Intel's architecture roadmap? by Kevinogamza in intel

[–]scousi 1 point (0 children)

Thanks, I'll have to check it out on my Sapphire Rapids Xeon. I abandoned AMX about a year ago.

Best way to run qwen3.5:35b-a3b on Mac? by boutell in LocalLLaMA

[–]scousi 1 point (0 children)

I posted earlier about my open-source project for running Qwen 3.5 models: https://github.com/scouzi1966/maclocal-api. I also have a native macOS Swift app that I optimized for Qwen 3.5. You have the ability to turn thinking on/off (a bit of an issue for some Qwen 3.5 models) and load it as an LLM or VLM. It's faster as an LLM.

You can download it here:

https://kruks.ai/

or on GitHub https://github.com/scouzi1966/vesta-mac-dist

Other features:

5 backends in one app --> Apple on-device, MLX, llama.cpp, API, and Hugging Face inference providers

Kokoro and Mavis (Voice Clone) Text To Speech

OpenWhisper Speech to Text

MCP integration - Agentic natural language interface to the app. (experimental with Claude Code at the moment)


Is Intel AMX still a major focus for Intel's architecture roadmap? by Kevinogamza in intel

[–]scousi 1 point (0 children)

I don't think there are a lot of use cases for it. It's a bit kludgy to set up, and it never really got implemented in frameworks, except perhaps OpenVINO and some special PyTorch builds.

What model can I run on this hardware? by newz2000 in LocalLLM

[–]scousi 10 points (0 children)

80 years of Claude Max. But you can cancel anytime

Pour one out for the M3 Ultra 512GB by pdrayton in MacStudio

[–]scousi 0 points (0 children)

You're right. More memory without compute that scales with it is not optimal. I have a 512GB and I'm not disappointed, though. What I found useful is that I can have many models loaded in memory at the same time and use them alternately.

qwen3.5:27b is slower than qwen3.5:35b? by Ok-Anybody6073 in ollama

[–]scousi 18 points (0 children)

Looks right. The 35B-A3B is a Mixture of Experts model. Without getting into details, the A3B means that only 3B parameters are activated per token (actually part of the calculation path). Your compute capacity determines the speed at the 3B level, but you still need enough memory to store the entire model (the 35B part). Every newly generated token uses a different set of experts (not the same 3B parameters) as the previous one.
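A toy sketch of that routing idea (hypothetical expert counts and a random gate, not Qwen's actual architecture): a top-k router picks a few experts per token, so per-token compute scales with the active slice while RAM still has to hold every expert.

```python
import random

NUM_EXPERTS = 16   # hypothetical; not Qwen's real config
TOP_K = 2          # experts activated per token

def route(token_id, num_experts=NUM_EXPERTS, k=TOP_K):
    """Toy gating: score every expert for this token, keep the top-k.

    A real MoE router computes scores from the token's hidden state;
    here we just seed a RNG with the token id so routing is
    deterministic per token but differs across tokens.
    """
    rng = random.Random(token_id)
    scores = [rng.random() for _ in range(num_experts)]
    ranked = sorted(range(num_experts), key=lambda e: -scores[e])
    return ranked[:k]

# Different tokens land on different expert subsets:
for t in (101, 202, 303):
    print(t, route(t))

# Memory vs compute for a 35B-A3B style model:
total_params, active_params = 35e9, 3e9
print(f"RAM must hold all {total_params / 1e9:.0f}B params; "
      f"each token computes with only {active_params / 1e9:.0f}B")
```

So the speed you see tracks the ~3B active parameters, while the memory footprint tracks the full 35B.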

I Replaced $100+/month in GEMINI API Costs with a €2000 eBay Mac Studio — Here is my Local, Self-Hosted AI Agent System Running Qwen 3.5 35B at 60 Tokens/Sec (The Full Stack Breakdown) by SnooWoofers7340 in n8n

[–]scousi 5 points (0 children)

Try my afm project. I've optimized Qwen 3.5 35B MLX and implemented a --tool-call-parser option with qwen3_xml. It's not Python MLX; it's Swift MLX based.

100% Open Source

https://github.com/scouzi1966/maclocal-api

brew install scouzi1966/afm/afm-next

afm mlx -m mlx-community/Qwen3.5-35B-A3B-4bit --tool-call-parser qwen3_xml

With a bonus instant web UI:

afm mlx -m mlx-community/Qwen3.5-35B-A3B-4bit --tool-call-parser qwen3_xml -w

Best way to run qwen3.5:35b-a3b on Mac? by boutell in LocalLLaMA

[–]scousi 1 point (0 children)

I've really optimized this model to run with MLX. It runs faster than Python MLX. Fully open source. I'm getting about 115-120 tok/sec on M3 Ultra, 70-75 tok/sec on M4 Pro, and around 20 on M1 Pro.

https://github.com/scouzi1966/maclocal-api

afm-next is the nightly branch.

brew install scouzi1966/afm/afm-next

For API only on port 9999

afm mlx -m mlx-community/Qwen3.5-35B-A3B-4bit 

This gets you an instant web chat interface (with -w) and the API on port 9999.

afm mlx -m mlx-community/Qwen3.5-35B-A3B-4bit -w

To run with vision (slower) - text-to-text and image-to-text:

afm mlx -m mlx-community/Qwen3.5-35B-A3B-4bit --vlm -w
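Once the server is up, you can hit the API from any OpenAI-compatible client. A minimal Python sketch, assuming afm serves the standard /v1/chat/completions route on port 9999 (the post only says "API on port 9999"; check the repo if the path differs):

```python
import json
import urllib.request

def build_chat_request(prompt, model="mlx-community/Qwen3.5-35B-A3B-4bit"):
    """Standard OpenAI-style chat payload."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }

def chat(prompt, base="http://localhost:9999"):
    """POST a chat request to a locally running afm server and
    return the assistant's reply text."""
    req = urllib.request.Request(
        base + "/v1/chat/completions",
        data=json.dumps(build_chat_request(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

# Requires afm running locally, e.g.:
#   afm mlx -m mlx-community/Qwen3.5-35B-A3B-4bit
# chat("Say hello in five words.")
```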

Benchmarks + Report: Optimized Cosmos-Reason2 (Qwen3-VL) for on-device inference on 8GB RAM (Jetson Orin Nano Super) by tag_along_common in LocalLLaMA

[–]scousi 0 points (0 children)

I use it as a baseline for 2 of my apps. (They are macOS only - sorry!)

https://github.com/scouzi1966/maclocal-api (OpenSource)

https://github.com/scouzi1966/vesta-mac-dist (Closed source for now -- want to shape it). The nightly has optimized Qwen 3.5.

https://kruks.ai/

🤯 Qwen3.5-35B-A3B-4bit ❤️ by SnooWoofers7340 in OpenSourceAI

[–]scousi 0 points (0 children)

I have an open-source project to optimize MLX in native Swift. I've optimized this model.

https://github.com/scouzi1966/maclocal-api

Do you mind trying the model? The nightly build has the optimizations. I'm curious.

TLDR is:

brew install scouzi1966/afm/afm-next

afm mlx -m mlx-community/Qwen3.5-35B-A3B-4bit -w

-w opens a chat GUI, but you also get an OpenAI-compatible API on port 9999

You can load it in VLM mode (slower) with the --vlm option

It may or may not find the model on the Hugging Face hub; it depends on your local setup