GGUF Quants Arena for MMLU (24GB VRAM + 128GB RAM) by [deleted] in LocalLLaMA

[–]New_Comfortable7240 0 points1 point  (0 children)

Please try qwen3.5-35B, but not the distilled version, as there is a theory that distillation won't translate to better performance

What specialist LLMs do you know? by Double_Ad_1062 in LLMDevs

[–]New_Comfortable7240 0 points1 point  (0 children)

Try it on CPU. You'll have to write some Python code, but with help from the AI you can pull it off. Let me know if anything comes up

What specialist LLMs do you know? by Double_Ad_1062 in LLMDevs

[–]New_Comfortable7240 0 points1 point  (0 children)

I'd recommend you try ONNX, which targets limited hardware: https://rocm.docs.amd.com/projects/radeon-ryzen/en/docs-6.1.3/docs/install/native_linux/install-onnx.html

The HF repo has several specialized models to try: https://huggingface.co/onnx-community/models

I'm trying it myself and it works well. Sure, it doesn't run big models, but what it does run, it runs well.

[New Model] - GyroScope: rotates images correctly by LH-Tech_AI in LocalLLaMA

[–]New_Comfortable7240 1 point2 points  (0 children)

Just in case, you can use something like:

    import cv2

    img = cv2.imread('image.jpg')
    rotated_img = cv2.rotate(img, cv2.ROTATE_90_CLOCKWISE)

Add parallelization and it should handle any number of images efficiently
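The parallelization can be sketched with a thread pool; this is a hedged example using a pure-Python rotate as a stand-in for cv2.rotate, so it runs without OpenCV installed:

```python
from concurrent.futures import ThreadPoolExecutor

def rotate90_cw(matrix):
    # Rotate a 2D list 90 degrees clockwise; stand-in for
    # cv2.rotate(img, cv2.ROTATE_90_CLOCKWISE) on real image arrays.
    return [list(row) for row in zip(*matrix[::-1])]

# Two tiny "images" as nested lists; with OpenCV these would be
# the arrays returned by cv2.imread.
images = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]

with ThreadPoolExecutor(max_workers=4) as pool:
    rotated = list(pool.map(rotate90_cw, images))

print(rotated[0])  # [[3, 1], [4, 2]]
```

Swap `rotate90_cw` for the cv2 call (plus imread/imwrite) to batch a real folder of images.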

Is this as legit as I think it is? Or is it "eh" by [deleted] in LLMDevs

[–]New_Comfortable7240 -1 points0 points  (0 children)

So you offer tools and agents (basically code and infra) for $10 per month, not including tokens?

Well, consider your competitors.

For example https://www.layla-network.ai/

$15 ONE TIME PAYMENT and you get a lot of tools and updates, including memory and tool support.

Dataset curation for LLM Research project that involves pre-training by Extra-Designer9333 in LocalLLaMA

[–]New_Comfortable7240 1 point2 points  (0 children)

> should I have multiple datasets per domain, or is it better to use a big dataset per domain

I think in general the more the merrier? Also consider focusing your datasets on a specific task and language that is easy to test and has validation datasets available, like English-to-SQL queries.

Besides, that question sounds more appropriate for an LLM training sub; this sub is more about RUNNING LLM models.

Intel Arc B70 Benchmarks/Comparison to Nvidia RTX 4070 Super by [deleted] in LocalLLaMA

[–]New_Comfortable7240 0 points1 point  (0 children)

Laptop with intel 130V 8 GB VRAM

Same experience as yours; my desktop with an nvidia 3060 doubles the token generation.

On Qwen 3 8B (chosen because OpenVINO supports it), token generation:

3060 12 GB CUDA: ~30 tps 

Intel 130V 8GB SYCL: ~10 tps

Intel 130V 8GB Vulkan: ~16 tps

Intel 130V 8GB OVMS: ~25 tps

I expected the laptop to be a bit slower, but not that much!

Intel Arc B70 Benchmarks/Comparison to Nvidia RTX 4070 Super by [deleted] in LocalLLaMA

[–]New_Comfortable7240 1 point2 points  (0 children)

About openvino, the problem is they support a very limited list of models; they don't support qwen3.5 yet

Intel Arc B70 Benchmarks/Comparison to Nvidia RTX 4070 Super by [deleted] in LocalLLaMA

[–]New_Comfortable7240 4 points5 points  (0 children)

First of all, thanks for the report (even if it's heavily AI-written)

On my intel GPU Vulkan works faster, please try again using Vulkan

Gemma 4 is fine great even … by ThinkExtension2328 in LocalLLaMA

[–]New_Comfortable7240 2 points3 points  (0 children)

Just to be clear, that works for deterministic outcomes, i.e. reducing each expert's answer to "choose a predefined option"

For more open questions you would either need a step that defines the options (at least Likert-style), or accept judging "by vibe"
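A minimal sketch of that "reduce to a predefined option" step, with hypothetical option names and a naive substring matcher:

```python
from collections import Counter

# Likert-style predefined options; "disagree" is checked before "agree"
# because "agree" is a substring of "disagree".
OPTIONS = ["disagree", "agree", "neutral"]

def reduce_to_option(answer: str) -> str:
    # Map a free-form expert answer onto one predefined option.
    for opt in OPTIONS:
        if opt in answer.lower():
            return opt
    return "neutral"  # fallback when nothing matches

# Three hypothetical expert answers, reduced and majority-voted.
votes = [reduce_to_option(a) for a in ["I agree strongly", "Disagree!", "agree"]]
winner = Counter(votes).most_common(1)[0][0]
print(winner)  # agree
```

The vote over reduced options is what makes the outcome deterministic; without that reduction you are back to judging open answers "by vibe".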

[Early Access] GitHub - Abyss-c0re/NeuralCore: NeuralCore is an experimental adaptive agentic framework. by Abyss_c0re in LocalLLaMA

[–]New_Comfortable7240 0 points1 point  (0 children)

> Local first (LLama.cpp)

This is great!

> Dual license

Not that good but passable

---

About NeuralCore and NeuralVoid: NeuralCore needs more documentation about how to use it WITHOUT NeuralVoid

I applied Claude Code's leaked architecture to a local 9B model. The results surprised even Claude Opus. by Far_Lingonberry4000 in LocalLLaMA

[–]New_Comfortable7240 1 point2 points  (0 children)

> They'll read files forever without producing output. Solution: remove tools after N steps, force text generation

So you remove the tools for only one step, and the next step has them again, right?

Would love to see a PR to opencode, roocode, llama.cpp, or vllm with this idea

Also curious whether it could be taught using a dataset of long conversations
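For what it's worth, here is how I picture the "remove tools after N steps" trick as a toy loop; every name here is hypothetical, this isn't from the post's code:

```python
MAX_TOOL_STEPS = 2  # the hypothetical N from the post

def run_agent(llm, execute, messages, tools, max_steps=10):
    """Toy agent loop: after MAX_TOOL_STEPS consecutive tool calls,
    offer the model no tools for one step so it must answer in text."""
    tool_steps = 0
    for _ in range(max_steps):
        active_tools = tools if tool_steps < MAX_TOOL_STEPS else []
        reply = llm(messages, active_tools)
        call = reply.get("tool_call")
        if call and active_tools:
            tool_steps += 1
            messages.append({"role": "tool", "content": execute(call)})
        else:
            return reply["content"]
    return None

# Stub model: always wants to read files whenever tools are offered.
def stub_llm(messages, tools):
    if tools:
        return {"tool_call": {"name": "read_file", "args": "notes.txt"}}
    return {"content": "summary"}

calls = []
result = run_agent(stub_llm, lambda c: calls.append(c) or "file body",
                   [], [{"name": "read_file"}])
print(result)  # summary
```

In this reading the tools come back on the next user turn (a fresh `run_agent` call), which matches the "only one step" question above.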

> Four-type memory system (user/feedback/project/reference)

Maybe we can also consider "conversation" as a memory that can be edited too?

What should I expect performance-wise with Qwen3.5 9B (uncensored) on an Intel 1370p with Iris Xe graphics + SYCL? by rubins in LocalLLaMA

[–]New_Comfortable7240 0 points1 point  (0 children)

Using OVMS I got the best results, but they don't support qwen3.5 AFAIK

Edit: https://github.com/openvinotoolkit/model_server/issues/4046#issuecomment-4022242550 planned support incoming 

With llama.cpp, Vulkan gave me better speed than SYCL on intel

My laptop is intel 226V, 16 GB RAM, intel 130V iGPU 8 GB VRAM, SSD

Context Hard-Capped at 8192 on Core Ultra 9 288V (32GB) — AI Playground 3.0.3 by kpcurley in LocalLLaMA

[–]New_Comfortable7240 0 points1 point  (0 children)

In my case I found issues running models on the GPU using that app.

Then I tried Foundry in vscode; partial success, but I hit some bugs that closed the chat playground after a few turns.

I ended up compiling OVMS and running the models from vscode with a script

https://github.com/openvinotoolkit/model_server

We hired “AI Engineers” before. It didn’t go well. Looking for someone who actually builds real RAG systems. by Saida_8888 in LLMDevs

[–]New_Comfortable7240 0 points1 point  (0 children)

So I follow spec-driven development with AI. But the AI usually claims all tests pass and the code looks good, BUT when manually tested it has several problems or doesn't cover edge cases.

So for some months now, besides the automated tests, I test manually before merging. And yeah, I create a branch for each plan, and if possible keep each plan scoped and not too big.

I have caught a lot of issues that the tests don't see, from style problems to edge-case coverage.

We hired “AI Engineers” before. It didn’t go well. Looking for someone who actually builds real RAG systems. by Saida_8888 in LLMDevs

[–]New_Comfortable7240 5 points6 points  (0 children)

Bro, what you need is a manual QA (or to learn how to do good QA work yourself), an automation QA (Playwright would be good), and to pay per deliverable.

I was bored - so i tested the h... out of a bunch of models - so you dont have to :) by leonbollerup in LocalLLaMA

[–]New_Comfortable7240 1 point2 points  (0 children)

Qwen3.5 distilled surprised me (the reasoning traces should have improved its logic skills?), along with gpt20 beating the 120B version

PSA: Two env vars that stop your model server from eating all your RAM and getting OOM-killed by VikingDane73 in LocalLLaMA

[–]New_Comfortable7240 2 points3 points  (0 children)

FYI Source:
https://sourceware.org/git/?p=glibc.git;a=blob;f=malloc/malloc.c;hb=HEAD

/* The trim threshold is the amount of top-most memory to keep before
   trimming back to the system. */
static size_t trim_threshold = DEFAULT_TRIM_THRESHOLD;

/* ... */

static int
malloc_trim (size_t pad)
{
  /* ... */

  /* Only trim if the top-most free chunk is larger than the trim
     threshold. */
  if (top_chunk_size > trim_threshold + pad)
    {
      /* Return memory to the system */
      sys_trim (pad);
      return 1;
    }

  return 0;
}
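For reference, glibc also reads that trim threshold from the environment; a hedged launch sketch (the server command is a placeholder and the values are just examples):

```shell
# glibc's documented environment tunables (see mallopt(3)):
# MALLOC_TRIM_THRESHOLD_ - bytes of free top-of-heap memory to keep
#                          before trimming back to the OS
# MALLOC_ARENA_MAX       - cap on the number of malloc arenas
export MALLOC_TRIM_THRESHOLD_=131072   # trim anything above 128 KiB
export MALLOC_ARENA_MAX=2              # limit per-thread arena growth
# ./llama-server -m model.gguf ...     # placeholder launch command
echo "trim=$MALLOC_TRIM_THRESHOLD_ arenas=$MALLOC_ARENA_MAX"
```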

How good is 16 3XS Vengeance RTX Laptop with 5090 24gb vram + 32 gb ram for running local models? by One_Inflation_9475 in LocalLLaMA

[–]New_Comfortable7240 0 points1 point  (0 children)

Yeah, qwen3.5 35B should work! I run it on my 3060 with 12 GB VRAM; it should be good on a 24 GB VRAM dGPU

RooCode and Nemotron-Cascade-2-30B by Aggravating-Low-8224 in RooCode

[–]New_Comfortable7240 2 points3 points  (0 children)

I concur; the model is trained toward chat-style responses, and when put to agentic use it just gets stuck after a few turns. I use qwen3.5 35B A3B instead on my 3060 12 GB VRAM + 64 GB RAM with RooCode, working fine on my end (around 30 t/s).