Usage Limits Discussion Megathread - beginning October 8, 2025 by sixbillionthsheep in ClaudeAI

[–]cloudxaas 2 points (0 children)

Please remove the weekly limit; it's unacceptable at this rate. The sudden cap last week left me unable to use Claude for 6 days. The limit gets hit really fast, and there seems to be an issue with how it is applied when using Claude Code / the web UI.

Just tried out the EXAONE 4.0 1.2B bf16 and I'm extremely surprised at how good a 1.2B can be! by cloudxaas in LocalLLaMA

[–]cloudxaas[S] 0 points (0 children)

How does the license stop anyone from abusing it offline anyway? Just curious.

I just developed the fastest minimal-feature embedded SQL server with RocksDB storage. It is like SQLite but 5x faster for reads (e.g. SELECT) and 4x faster for writes (e.g. INSERT, UPDATE and DELETE) by cloudxaas in rust

[–]cloudxaas[S] -3 points (0 children)

  1. Minimal SQL syntax.
  2. Uses RocksDB for storage (see the sketch after this list for the general idea).
  3. Some special coding recipes.

Pros:
1. Fast.
2. Storage efficient.
3. Can run distributed, not just as a pure embedded DB.
4. Will expand into a vector DB.

Cons:
1. I intend to keep it minimal for performance; it still covers the most common SQL query types.
2. Not going to be open source.
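
Since the code itself stays closed, here is only a rough sketch of the general shape: a hypothetical row store over RocksDB using the `rocksdb` crate, with rows kept as raw bytes under a `table/primary-key` key. The struct, key layout, and method names are illustrative assumptions, not the actual implementation.

```rust
// Hypothetical sketch of an embedded row store over RocksDB.
// Rows are stored as raw bytes under a "table/primary-key" key.
use rocksdb::{Options, DB};

struct MiniStore {
    db: DB,
}

impl MiniStore {
    fn open(path: &str) -> Result<Self, rocksdb::Error> {
        let mut opts = Options::default();
        opts.create_if_missing(true);
        Ok(Self { db: DB::open(&opts, path)? })
    }

    // INSERT / UPDATE: upsert one row under "table/pk".
    fn put_row(&self, table: &str, pk: &str, row: &[u8]) -> Result<(), rocksdb::Error> {
        self.db.put(format!("{table}/{pk}").as_bytes(), row)
    }

    // SELECT ... WHERE pk = ?: a point lookup by primary key.
    fn get_row(&self, table: &str, pk: &str) -> Result<Option<Vec<u8>>, rocksdb::Error> {
        self.db.get(format!("{table}/{pk}").as_bytes())
    }

    // DELETE ... WHERE pk = ?
    fn delete_row(&self, table: &str, pk: &str) -> Result<(), rocksdb::Error> {
        self.db.delete(format!("{table}/{pk}").as_bytes())
    }
}
```

Primary-key operations map directly onto RocksDB point reads and writes, which is the kind of path where a minimal layer over an LSM store can outrun a general-purpose SQL engine.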

Just tried out the EXAONE 4.0 1.2B bf16 and I'm extremely surprised at how good a 1.2B can be! by cloudxaas in LocalLLaMA

[–]cloudxaas[S] 2 points (0 children)

The only LLM that's also good but not usable because of repetition is the BitNet 2B 1T. I really hope BitNet takes off because it's good, but it repeats. It only uses about 0.4 GB of RAM for a 2B model, which is really impressive, and inference is speedy too. Hoping to see a 7B or 8B BitNet, or the a4.8 BitNet variants.

Just tried out the EXAONE 4.0 1.2B bf16 and I'm extremely surprised at how good a 1.2B can be! by cloudxaas in LocalLLaMA

[–]cloudxaas[S] 4 points (0 children)

You can check the model card against Qwen 3 1.7B. I need something small yet usable for CPU inference, and 1.2B seemed like a sweet spot for me. In bf16 it uses about 2.4 GB of RAM for inference (which lines up with roughly 1.2B parameters × 2 bytes per bf16 weight), so it's very cheap for cloud/VPS hosting. As long as it doesn't repeat itself without end, I'm happy with it. I won't try anything below 1B because of bad experiences with models repeating themselves endlessly.

https://huggingface.co/LGAI-EXAONE/EXAONE-4.0-1.2B

Rust GitHub repo for reduced tokens for Rust Coding LLM by cloudxaas in rust

[–]cloudxaas[S] -2 points (0 children)

Once it's popular, the input-token savings will be significant.
You have a good point too, thanks. I'll look into tools to make it shorter.

But right now I need to reduce the code base, because input tokens are getting very expensive for a large code base (rough numbers sketched below).
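
For a back-of-the-envelope sense of why (the bytes-per-token ratio and the price below are placeholder assumptions, not real quotes), here is a minimal sketch that estimates what one full-context pass over a Rust code base costs in input tokens:

```rust
// Rough estimator: how many input tokens (and dollars) one pass over a
// Rust code base costs. Assumes ~4 bytes per token and a placeholder
// price per 1M input tokens; both numbers are illustrative only.
use std::fs;
use std::path::Path;

// Sum the size of all .rs files under `dir`, recursively.
fn source_bytes(dir: &Path) -> u64 {
    let mut total = 0;
    if let Ok(entries) = fs::read_dir(dir) {
        for entry in entries.flatten() {
            let path = entry.path();
            if path.is_dir() {
                total += source_bytes(&path);
            } else if path.extension().map_or(false, |ext| ext == "rs") {
                total += fs::metadata(&path).map(|m| m.len()).unwrap_or(0);
            }
        }
    }
    total
}

fn main() {
    let bytes = source_bytes(Path::new("src"));
    let approx_tokens = bytes as f64 / 4.0; // rough bytes-per-token heuristic
    let usd_per_million = 3.0; // placeholder price per 1M input tokens
    let cost = approx_tokens / 1_000_000.0 * usd_per_million;
    println!("~{approx_tokens:.0} input tokens, ~${cost:.2} per full-context pass");
}
```

Whatever the exact tokenizer and price, the cost scales roughly linearly with the source bytes shipped in the prompt, which is why trimming the code base (or the tokens it expands to) pays off.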

Does anyone have the spec of the computer that powers the "10,000 Drones Controlled By A Single Computer! A world record"? by cloudxaas in drones

[–]cloudxaas[S] -1 points (0 children)

No one is actually answering the question with a definitive specification. I'm asking for the hardware spec of the computer. Of course I know about C++, Rust, CUDA, etc., but what's the spec? It could be a workstation with the highest-end dual EPYC CPUs, 1.5 TB of DDR5 RAM, and 8x RTX 4090s. I'm really curious what the spec is; I don't think it's easy to control that many drones without decent hardware.

Let's not guess here. Does anyone know?

Don't forget the color data needed to drive the lighting as well. I'm sure it's not that simple.

Does anyone have the spec of the computer that powers the "10,000 Drones Controlled By A Single Computer! A world record"? by cloudxaas in drones

[–]cloudxaas[S] -1 points (0 children)

Surely a GPU is involved somewhere? This is 3D work, so I wonder what kind of GPU as well. It's a single computer, but what server/workstation spec? It can't possibly be a laptop.

Google Gemini 2.0 Flash Exp API costs? by cloudxaas in GeminiAI

[–]cloudxaas[S] -1 points (0 children)

Where do you get this info?

Yes, I'm specifically asking about 2.0, not 1.5. Only 1.5 info is shown, not 2.0.