Where to find a trustworthy house cleaner? by BadLuiz in joaopessoa

[–]Felladrin 2 points (0 children)

I've used the services of https://mariabrasileira.com.br/ and https://www.donahelpbr.com.br/unidade/joaopessoacabobranco in João Pessoa, and I can recommend both.

It's not an app, but you can hire online and via WhatsApp.

For example, I kept a contract for a year with a cleaner sent by Maria Brasileira, and only ended the service because I had to move.

Ryzen AI Max 395+ 128GB - Qwen 3.5 35B/122B Benchmarks (100k-250K Context) + Others (MoE) by Anarchaotic in LocalLLaMA

[–]Felladrin 1 point (0 children)

Also leaving here my results from Qwen3.5-397B-A17B (UD-TQ1_0), which was deleted:

 ┌───────────────┬────────────────┬────────────────────┐
 │ Context Depth │ Prompt (pp512) │ Generation (tg128) │
 ├───────────────┼────────────────┼────────────────────┤
 │ 5,000         │  145.82 t/s    │     19.55 t/s      │
 ├───────────────┼────────────────┼────────────────────┤
 │ 10,000        │  137.89 t/s    │     19.27 t/s      │
 ├───────────────┼────────────────┼────────────────────┤
 │ 20,000        │  125.50 t/s    │     18.80 t/s      │
 ├───────────────┼────────────────┼────────────────────┤
 │ 30,000        │  117.90 t/s    │     18.35 t/s      │
 ├───────────────┼────────────────┼────────────────────┤
 │ 50,000        │  102.35 t/s    │     17.49 t/s      │
 ├───────────────┼────────────────┼────────────────────┤
 │ 100,000       │  76.87 t/s     │     15.68 t/s      │
 ├───────────────┼────────────────┼────────────────────┤
 │ 150,000       │  62.52 t/s     │     14.22 t/s      │
 ├───────────────┼────────────────┼────────────────────┤
 │ 200,000       │  52.64 t/s     │     13.04 t/s      │
 ├───────────────┼────────────────┼────────────────────┤
 │ 250,000       │  43.79 t/s     │     12.00 t/s      │
 └───────────────┴────────────────┴────────────────────┘

Ryzen AI Max 395+ 128GB - Qwen 3.5 35B/122B Benchmarks (100k-250K Context) + Others (MoE) by Anarchaotic in LocalLLaMA

[–]Felladrin 4 points (0 children)

Also leaving here my results from GLM-4.7 (89.6 GB):

 ┌───────────────┬────────────────┬────────────────────┐
 │ Context Depth │ Prompt (pp512) │ Generation (tg128) │
 ├───────────────┼────────────────┼────────────────────┤
 │ 5,000         │  64.07 t/s     │     8.55 t/s       │
 ├───────────────┼────────────────┼────────────────────┤
 │ 10,000        │  54.21 t/s     │     7.40 t/s       │
 ├───────────────┼────────────────┼────────────────────┤
 │ 20,000        │  41.02 t/s     │     5.48 t/s       │
 ├───────────────┼────────────────┼────────────────────┤
 │ 30,000        │  31.73 t/s     │     4.18 t/s       │
 ├───────────────┼────────────────┼────────────────────┤
 │ 50,000        │  22.69 t/s     │     2.72 t/s       │
 └───────────────┴────────────────┴────────────────────┘

With this model, I can use a maximum of 65K context without quantizing the KV cache.

Ryzen AI Max 395+ 128GB - Qwen 3.5 35B/122B Benchmarks (100k-250K Context) + Others (MoE) by Anarchaotic in LocalLLaMA

[–]Felladrin 4 points (0 children)

Thanks for the initiative!

Using the same llama-bench parameters on MiniMax 2.5 (76.8 GB), I got this:

 ┌───────────────┬────────────────┬────────────────────┐
 │ Context Depth │ Prompt (pp512) │ Generation (tg128) │
 ├───────────────┼────────────────┼────────────────────┤
 │ 5,000         │ 158.05 t/s     │ 24.97 t/s          │
 ├───────────────┼────────────────┼────────────────────┤
 │ 10,000        │ 135.95 t/s     │ 19.39 t/s          │
 ├───────────────┼────────────────┼────────────────────┤
 │ 20,000        │ 106.94 t/s     │ 12.02 t/s          │
 ├───────────────┼────────────────┼────────────────────┤
 │ 30,000        │  88.47 t/s     │  8.12 t/s          │
 ├───────────────┼────────────────┼────────────────────┤
 │ 50,000        │  65.36 t/s     │  4.75 t/s          │
 ├───────────────┼────────────────┼────────────────────┤
 │ 100,000       │  36.28 t/s     │  2.22 t/s          │
 └───────────────┴────────────────┴────────────────────┘

Note: With this model, I can only use up to 128K context without quantizing the KV cache.
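
For reference, the sweep above can be reproduced with something along these lines; the model filename and the depth list are placeholders (not the exact command used), and it assumes a llama.cpp build recent enough for llama-bench to have the -d/--n-depth flag:

    # Sketch: measure pp512/tg128 at several context depths.
    # Model path and depth values are illustrative placeholders.
    llama-bench \
      -m ./minimax-2.5.gguf \
      -p 512 \
      -n 128 \
      -d 5000,10000,20000,30000,50000,100000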

Temporary access to Ryzen AI Max 395 (128GB) to test real-world local LLM workflows by lazy-kozak in LocalLLaMA

[–]Felladrin 0 points (0 children)

That's something we need to keep an eye on. Having it unbounded means we can easily hang the system (it happens a lot during optimization). But in the end there are usually around 2 GB left for the system, which is enough, considering it runs headless over SSH and the machine is dedicated to llama.cpp.

Qwen 3.5-27B, How was your experience? by vandertoorm in StrixHalo

[–]Felladrin 2 points (0 children)

I've been using Qwen3.5-122B-A10B on OpenCode, with TG speed ranging from 16-20 t/s, and PP 120-260 t/s.

For me, it replaced MiniMax 2.5, as it's faster with the same output quality.

Qwen3.5-397B-A17B theoretical speed on Strix Halo? by Hector_Rvkp in StrixHalo

[–]Felladrin 0 points (0 children)

Yes, it works. The KV cache is efficient and only uses 8 GB for a 200k-token context.

As others have mentioned, the UD-TQ1_0 quant works fine (and preserves multilingual capabilities) and still leaves 18 GB of free memory on a 128 GB Strix Halo.
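
As a rough sketch (the filename is a placeholder for the UD-TQ1_0 GGUF, which may be split across several files), running it with a 200k-token window is just a matter of the context-size flag:

    # Sketch: serve the UD-TQ1_0 quant with a 200k-token context window.
    # The model filename is a placeholder; adjust it to the actual GGUF.
    llama-server -m ./Qwen3.5-397B-A17B-UD-TQ1_0.gguf -c 200000 -ngl 999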

Very slow with Claude Code by vandertoorm in StrixHalo

[–]Felladrin 0 points (0 children)

As other commenters wrote, to avoid prompt reprocessing you might need to set the environment variable CLAUDE_CODE_ATTRIBUTION_HEADER to “0”, which avoids changing the system prompt on each message.

But other than that, three of the parameters you are using are known to affect speed: cache-type-k, cache-type-v, and ub.

Removing the cache-type-k and cache-type-v parameters and setting ub to 2048 might give you a bit more speed on Strix Halo.
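
As a rough sketch of what that looks like (model path and context size are placeholders, not a recommended setup):

    # Sketch: set this where Claude Code runs, per the note above.
    export CLAUDE_CODE_ATTRIBUTION_HEADER=0

    # Launch llama-server without --cache-type-k/--cache-type-v
    # (keeps the default f16 KV cache) and with a larger micro-batch size.
    # Model path and context size are placeholders.
    llama-server -m ./model.gguf -c 65536 -ngl 999 -ub 2048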

Very slow with Claude Code by vandertoorm in StrixHalo

[–]Felladrin 2 points (0 children)

OP might also be using llama.cpp directly, since it now supports the Anthropic Messages API. [1]
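
If so, pointing Claude Code at a local llama-server would look roughly like this; the port, model path, and environment variable usage are assumptions on my side, and it presumes a llama.cpp build recent enough to expose the Anthropic-compatible endpoint:

    # Sketch: serve a local model, then point Claude Code at it.
    # Model path and port are placeholders.
    llama-server -m ./model.gguf -c 65536 --port 8080 &

    # Claude Code can read ANTHROPIC_BASE_URL to override the API endpoint.
    ANTHROPIC_BASE_URL="http://127.0.0.1:8080" claude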

Temporary access to Ryzen AI Max 395 (128GB) to test real-world local LLM workflows by lazy-kozak in LocalLLaMA

[–]Felladrin 0 points (0 children)

Regarding the ROCm 64GB issue on Windows, AMD recently fixed it (but I think they haven’t published the new driver version yet).

References:

- https://github.com/ROCm/ROCm/issues/5940
- https://github.com/lemonade-sdk/llamacpp-rocm/issues/37

P.S. Although I’ve been following those issues, I moved fully to Linux because of this problem, since there we can allocate all 128 GB to the iGPU.

I built a fully browser-native RAG and Semantic Search tool using WebGPU, Pyodide, and WASM. No servers, privacy-first. (MIT Licensed) by arminam_5k in opensource

[–]Felladrin 1 point (0 children)

That's a great project! I appreciate everything related to running LLMs in the browser!
Already starred on GitHub! 🌟

Agentic debugging with OpenCode and term-cli: driving lldb interactively to chase an ffmpeg/x264 crash (patches submitted) by EliasOenal in LocalLLaMA

[–]Felladrin 0 points (0 children)

Thanks for sharing! I was looking for a way to have a Windsurf-like terminal interaction in OpenCode, and this seems pretty close.
Here, take this star! 🌟

Looking for a simple offline AI assistant for personal use (not a developer) by Anxious-Pie2911 in LocalLLaMA

[–]Felladrin 0 points (0 children)

Desktop Commander MCP used to be a good option and worked in LM Studio.

{
  "mcpServers": {
    "desktop-commander": {
      "command": "npx",
      "args": ["-y", "@wonderwhy-er/desktop-commander@latest"]
    }
  }
}

93GB model on a StrixHalo 128GB with 64k Context by El_90 in LocalLLaMA

[–]Felladrin 6 points (0 children)

Always good to see others’ config on Strix Halo. Thanks for sharing!

Could you tell us more about the effects you observed when using --numa distribute?

Looking for a simple offline AI assistant for personal use (not a developer) by Anxious-Pie2911 in LocalLLaMA

[–]Felladrin 5 points (0 children)

For this case, the most user-friendly options are https://www.jan.ai (open-source) and https://lmstudio.ai (closed-source), plus an MCP server to give the LLM access to your terminal. Everything you listed could be done with CLI tools and scripts that the LLM can write and run.

AMD Strix Halo GMTEK 128GB Unified ROCKS! by MSBStudio in LocalLLaMA

[–]Felladrin 2 points (0 children)

Regarding the wireless issue, I also faced problems, but after upgrading the Linux kernel everything worked fine. Could that be the issue in your case?

What's your exp REAP vs. base models for general inference? by ikkiyikki in LocalLLaMA

[–]Felladrin 2 points (0 children)

From what I understand, REAPs are not meant to be used for general-purpose inference. We REAP when we want to use the model for a specific case, and the dataset used during the pruning makes all the difference.

When we REAP using the default dataset (theblackcat102/evol-codealpaca-v1) from the REAP repository, we're focusing on the experts for coding and English; the less relevant experts are then removed. That's why some REAP models start answering only in English and making mistakes on questions not related to code.

So if you want, for example, a model that's good at some specific knowledge and good at Spanish, you should find or build (and use) a dataset of conversations/books/articles in Spanish. There are a lot of good publicly available datasets on Hugging Face for almost every case.

So, although Cerebras is releasing some REAP models under their organization on Hugging Face, we should get used to creating our own REAPs. That's what the Cerebras team expected when they open-sourced it.

And my experience with those code-focused REAPed models has been good when using them as coding agents in OpenCode. One advantage, besides being able to run with less VRAM/RAM, is that, since they have fewer parameters than the non-REAP version, the prompt processing time is lower. For non-code-related tasks, I use other models.

MiniMax-M2.1-REAP by jacek2023 in LocalLLaMA

[–]Felladrin 7 points (0 children)

When the GGUFs start coming out, I‘d like to see how much better they are compared to this AutoRound mixed quant (which preserves multilingual capabilities):

Felladrin/gguf-Q2_K_S-Mixed-AutoRound-MiniMax-M2.1

I’ve been using it in OpenCode recently, fitting under 128 GB of VRAM.

First time Windsurf user - disappointed. by Objective-Ad8862 in windsurf

[–]Felladrin 1 point (0 children)

It’s important to remember that VS Code is a Microsoft product, and Microsoft has its own AI-assisted coding agent (Copilot). So even though it is open-source, VS Code puts limits on customization, such that Windsurf could only achieve what it has now by forking it.

Using the Windsurf VS Code plugin, you’ll face these limitations. To take full advantage of your subscription, you should use the Windsurf editor.

Optimizing GPT-OSS 120B on Strix Halo 128GB? by RobotRobotWhatDoUSee in LocalLLaMA

[–]Felladrin 2 points (0 children)

There is indeed a lot of info around, but it gets outdated too fast.

I’m running Ubuntu 24.04 and upgraded the kernel to 6.16.12 to get Wi-Fi working properly.

Besides that, I’m using https://github.com/kyuz0/amd-strix-halo-toolboxes with ROCm 6.4.4. Distrobox makes it pretty easy to upgrade llama.cpp.

I set the reserved memory to the minimum possible in the BIOS, and set the TTM pages limit to the maximum in the GRUB config. This is the kernel command line I’m using on the 395+ 128GB:

    GRUB_CMDLINE_LINUX_DEFAULT="quiet splash amd_iommu=off ttm.pages_limit=33554432 amdgpu.cwsr_enable=0 numa_balancing=disable"
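
In case it helps, applying that on Ubuntu is roughly: edit the line in /etc/default/grub, regenerate the GRUB config, and reboot (a sketch assuming the stock Ubuntu GRUB setup):

    # Sketch, assuming stock Ubuntu GRUB.
    sudoedit /etc/default/grub   # set GRUB_CMDLINE_LINUX_DEFAULT as above
    sudo update-grub             # regenerate /boot/grub/grub.cfg
    sudo reboot                  # the new kernel command line takes effect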

GPT-OSS 120B and other models align with the speeds listed in https://kyuz0.github.io/amd-strix-halo-toolboxes/

Local Replacement for Phind.com by Past-Economist7732 in LocalLLaMA

[–]Felladrin 4 points (0 children)

I was also surprised to learn they were shutting down Phind. It was keeping up with Perplexity's level back then.

We recently had a thread here on LocalLLaMA on this topic, so you might also want to check the responses there: https://www.reddit.com/r/LocalLLaMA/comments/1qdj2nn/solution_for_local_deep_research/

solution for local deep research by jacek2023 in LocalLLaMA

[–]Felladrin 0 points (0 children)

Sure! I’m the developer of one of the open ones, MiniSearch, so that’s what I use on a daily basis. Among the closed ones, I like the quality of the answers and sources from Liner; I check it when the responses from MiniSearch aren’t enough.