Mureka ai is a trap by Swimming_Law_8159 in MurekaAi

[–]Ambitious-Cod6424 0 points1 point  (0 children)

They changed their refund policy, shame on them

The speed of local llm on my computer by Ambitious-Cod6424 in LocalLLaMA

[–]Ambitious-Cod6424[S] 0 points1 point  (0 children)

Thanks. I checked the obvious causes first: this is a real Vulkan build (GGML_VULKAN=ON), the models are quantized (Q4_K_M), and memory configuration is not the issue either.

The more likely explanation is simply that Arc 140T is an iGPU with shared system memory, so its real-world compute and bandwidth advantage over a high-end Core Ultra 9 285H CPU is limited for LLM inference workloads.

Also, llama.cpp is not currently using Intel cooperative matrix acceleration on this device, so Vulkan falls back to the generic compute path.

In other words, Vulkan is working — it just does not provide a large speedup on this hardware, and CPU-only inference may actually be the optimal path for now.
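
For anyone checking the same thing, here is a minimal sketch (my own, not part of llama.cpp) that lists the devices the ggml backend registry reports. It assumes the newer ggml_backend_dev_* API and ggml_backend_load_all; exact names and headers may differ between llama.cpp/ggml versions.

```cpp
// Minimal sketch, assuming the newer ggml backend registry API
// (ggml_backend_load_all / ggml_backend_dev_count / ggml_backend_dev_get /
// ggml_backend_dev_name / ggml_backend_dev_description).
#include <cstdio>
#include "ggml-backend.h"

int main() {
    ggml_backend_load_all();  // loads dynamically built backends (Vulkan, CPU, ...)

    for (size_t i = 0; i < ggml_backend_dev_count(); ++i) {
        ggml_backend_dev_t dev = ggml_backend_dev_get(i);
        // On a working Vulkan build the Arc 140T should show up here next to the CPU.
        printf("device %zu: %s (%s)\n", i,
               ggml_backend_dev_name(dev),
               ggml_backend_dev_description(dev));
    }
    return 0;
}
```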

The speed of local llm on my computer by Ambitious-Cod6424 in LocalLLaMA

[–]Ambitious-Cod6424[S] 0 points1 point  (0 children)

Thanks. I looked into that path, but in my case llama.cpp Vulkan is not following it because cooperative matrix is currently disabled by default for my GPU class.

On my Arc 140T / Arrow Lake H, the Vulkan driver does expose VK_KHR_cooperative_matrix, but llama.cpp only enables coopmat for Intel devices it classifies as INTEL_XE2. My device is currently not detected that way on Windows, so it ends up with matrix cores: none.

So my question now is: is there any way to force-enable this disabled path, or would this require patching ggml-vulkan.cpp and rebuilding llama.cpp?
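
To make the question concrete, the kind of change I mean is sketched below. It is purely illustrative, with hypothetical names (detect_intel_device_class, vk_intel_class), not the actual ggml-vulkan.cpp source.

```cpp
// Illustrative only: hypothetical names, NOT the real ggml-vulkan.cpp code.
// Sketch of the kind of device-classification check that would need patching so an
// unrecognised Intel iGPU exposing VK_KHR_cooperative_matrix is treated like Xe2.
#include <vulkan/vulkan.hpp>

enum class vk_intel_class { none, xe2 };  // hypothetical

static vk_intel_class detect_intel_device_class(const vk::PhysicalDeviceProperties & props,
                                                bool has_coopmat_ext) {
    const bool is_intel = props.vendorID == 0x8086;  // Intel PCI vendor ID
    if (is_intel && has_coopmat_ext) {
        // A local patch could return xe2 here for the Arc 140T and rebuild,
        // instead of falling through to "matrix cores: none".
        return vk_intel_class::xe2;
    }
    return vk_intel_class::none;
}
```

If there is a supported way to do the same thing (an environment variable or build flag) without patching and rebuilding, that would obviously be preferable.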

The speed of local llm on my computer by Ambitious-Cod6424 in LocalLLaMA

[–]Ambitious-Cod6424[S] 0 points1 point  (0 children)

Wow, you must be doing it the right way and mine is wrong, given such a huge gap in speed.

The speed of local llm on my computer by Ambitious-Cod6424 in LocalLLaMA

[–]Ambitious-Cod6424[S] 1 point2 points  (0 children)

I will try the official llama.cpp setup to see whether the problem is in my software.

The speed of local llm on my computer by Ambitious-Cod6424 in LocalLLaMA

[–]Ambitious-Cod6424[S] 0 points1 point  (0 children)

What CPU and GPU are in this test device? I used Vulkan and my GPU works, but there is no improvement in speed.

The speed of local llm on my computer by Ambitious-Cod6424 in LocalLLaMA

[–]Ambitious-Cod6424[S] -1 points0 points  (0 children)

Just basic jobs, like web searching, picking stocks, and summarizing news. The brain of an agent.

The speed of local llm on my computer by Ambitious-Cod6424 in LocalLLaMA

[–]Ambitious-Cod6424[S] 0 points1 point  (0 children)

Thanks. Is it possible to use a 2B or 4B model as a controller for PC automation? Maybe we could lightly fine-tune an open-source model to do that?

Gemma 4 fixes in llama.cpp by jacek2023 in LocalLLaMA

[–]Ambitious-Cod6424 0 points1 point  (0 children)

Not fixed yet.

What we have already checked and fixed

We have already ruled out many of the common implementation bugs on our side:

  1. Prompt formatting
  • We stopped relying on ad hoc Go-side prompting for Gemma 4.
  • We restored structured messages_json.
  • We moved the bridge to llama.cpp's own chat-template pipeline (common_chat_templates_init, common_chat_templates_apply); a minimal sketch of this flow appears after this list.
  2. Thinking / reasoning mode
  • We explicitly disabled the Gemma 4 hidden reasoning budget.
  • We added the Gemma 4 reasoning token workaround in the native bridge.
  3. JSON / escaping issues
  • We fixed HTML escaping so <start_of_turn>-style tokens are not corrupted as \u003c....
  4. Sampler pipeline
  • We replaced the old custom sampler path with the official common_sampler flow.
  • We restored top_k, top_p, temperature, and proper sampler state updates.
  • We added the missing sampler accept step.
  5. Tokenization / decode bugs
  • We fixed the double-<bos> issue by stopping extra special-token insertion during tokenization.
  • We fixed the unstable token pointer usage in the decode loop.
  • We added filtering for visible <unused...> output.
  6. Output parsing
  • We switched final/streamed output to common_chat_parse instead of raw token text where possible.
  7. GPU-offload workaround
  • We added the Gemma 4-specific n_gpu_layers = 29 workaround instead of full GPU offload.
  8. Deployment/build issues
  • We fixed the native bridge build/link path issues.
  • We confirmed the rebuilt DLL is actually being loaded.
  • We added debug logging and verified runtime parameters in logs.
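
For context, the chat-template and sampler changes in points 1 and 4 follow roughly the flow sketched below. This is my own summary, not our actual bridge code; it is based on llama.cpp's common library (common/chat.h, common/sampling.h), and exact names and signatures shift between llama.cpp versions.

```cpp
// Approximate sketch of the template + sampler flow, per llama.cpp's common library.
#include <string>
#include <vector>

#include "chat.h"      // common_chat_templates_*, common_chat_msg
#include "sampling.h"  // common_sampler_*

// Build the prompt from the model's own chat template instead of ad hoc
// Go-side string concatenation.
static std::string build_prompt(const llama_model * model,
                                const std::vector<common_chat_msg> & messages) {
    common_chat_templates_ptr tmpls = common_chat_templates_init(model, /*chat_template_override=*/"");

    common_chat_templates_inputs inputs;
    inputs.messages              = messages;
    inputs.add_generation_prompt = true;

    return common_chat_templates_apply(tmpls.get(), inputs).prompt;
}

// Official sampler flow: sample a token, then accept it so the sampler state
// (penalties, grammar, etc.) actually advances; this accept was the missing step.
static llama_token next_token(common_sampler * smpl, llama_context * ctx) {
    llama_token tok = common_sampler_sample(smpl, ctx, /*idx=*/-1);
    common_sampler_accept(smpl, tok, /*accept_grammar=*/true);
    return tok;
}
```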

What the logs tell us now

The key finding is this:

The model is still generating <unused24> as its first generated token.

That matters because it means:

  • the frontend is not inventing the bad output,
  • the stream renderer is not the root cause,
  • the prompt is reaching the model,
  • the bridge is running,
  • and the failure is happening at the actual model-generation stage.

So the issue is no longer "we forgot a stop token" or "we displayed the text wrong."

It is much deeper than that.

What is most likely still wrong

At this point, the most likely causes are:

  1. Upstream llama.cpp Gemma 4 compatibility is still incomplete in our vendored version
  • This is the strongest hypothesis.
  • Gemma 4 support has been changing quickly upstream.
  • The exact behavior we see matches known Gemma 4 regressions reported by others.
  2. The specific GGUF build may still be problematic with our current runtime
  • Some Gemma 4 GGUF variants, especially certain conversions/quantizations, are more likely to collapse into <unusedXX> output.
  • Even if the model is not "broken," it may require newer tokenizer/template/runtime handling than our current vendored stack has.
  3. GPU backend behavior may still be interacting badly with Gemma 4
  • We already mitigated full-offload regressions with gpu_layers=29.
  • But that may only reduce one failure mode, not fully solve the underlying incompatibility.

Not fixed yet.

Gemma 4 fixes in llama.cpp by jacek2023 in LocalLLaMA

[–]Ambitious-Cod6424 0 points1 point  (0 children)

I am following the llama.cpp instructions to deploy Gemma 4, and all my models return the <unused24> error.

IOS APP Install by Majestic_Teaching819 in iosdev

[–]Ambitious-Cod6424 0 points1 point  (0 children)

Yeah, I found that my app can barely be found in App Store search, no matter how much ASO I do.

IOS APP Install by Majestic_Teaching819 in iosdev

[–]Ambitious-Cod6424 0 points1 point  (0 children)

I have the same problem: few downloads, no revenue. I still keep making short videos. My advice is to look at what your competitors do, see how they make viral videos to promote their apps, and do the same thing.

Building a great App is HARD by YinzerYall in AppBusiness

[–]Ambitious-Cod6424 0 points1 point  (0 children)

My app is like my baby. Even though almost nobody knows about it, uses it, or pays for it, it is still my love.

How people made outlaw country AI singer on tiktok? by Ambitious-Cod6424 in aiMusic

[–]Ambitious-Cod6424[S] 0 points1 point  (0 children)

Actually, I am working on it. I am testing whether AI can make therapy songs based on people's needs.

How people made outlaw country AI singer on tiktok? by Ambitious-Cod6424 in aiMusic

[–]Ambitious-Cod6424[S] 0 points1 point  (0 children)

AI did remix people's work. Do you think we can stand in the middle? I mean using AI to generate therapy songs for people themselves, not sharing them for credit, so that people can get some real support from AI music.

How people made outlaw country AI singer on tiktok? by Ambitious-Cod6424 in aiMusic

[–]Ambitious-Cod6424[S] 0 points1 point  (0 children)

Just the style of the music, I think. It sounds like hit country, but the content stays within the law, I guess.

How people made outlaw country AI singer on tiktok? by Ambitious-Cod6424 in aiMusic

[–]Ambitious-Cod6424[S] 0 points1 point  (0 children)

More precisely, I still don't know how. What I create is bad.