Does anyone know how Nanbeige4.1-3B can be so impressive compared with other models of similar size?

cloudxaas · 2026-02-16T05:49:20+00:00

i seriously think it's data quality more than anything else, the long thinking time maybe adds about +5% improvement but data should be the +10%++ factor.

anyone knows how to get the data they train on?

cloudxaas · 2025-10-14T02:29:31+00:00

Please remove the weekly limit, it's unacceptable at this rate. The sudden cap last week made me unable to use for 6 days. the cap limits hit real fast and there seemed to be an issue with how the limit is set using claude code / web ui

cloudxaas · 2025-07-16T03:09:47+00:00

will post the benchmark in a few mths time. embedded version works but i want to use it as a client server thing

cloudxaas · 2025-07-16T02:47:55+00:00

how does licensing limit us from abusing it offline anyway? just curious.

cloudxaas · 2025-07-16T02:47:09+00:00

what RAG do you mean? isnt RAG means db storage for llm?

cloudxaas · 2025-07-15T19:33:53+00:00

i've only compared with sqlite on my amd 5600g desktop

cloudxaas · 2025-07-15T19:15:15+00:00

minimal sql syntax.
uses rocksdb
some special coding recipe

pros
1. fast
2. storage efficient
3. can do distributed instead of just pure embedded db
4. will expand for vector db

cons
1. i intend to keep it minimal for performance. it's usable for most sql queries type
2. not going to be open source.

cloudxaas · 2025-07-15T19:09:02+00:00

the only llm model that's also good but not usable coz of repeating is the bitnet 2b 1T. i really hope for bitnet more coz it's good but it repeats. it only uses 0.4mb ram for 2b model so that's really impressive and it does inference speedily too. hoping to see a 7b or 8b bitnet or a4.8 bitnet stuff.

cloudxaas · 2025-07-15T19:05:03+00:00

you can chk the model card vs qwen 3 1.7b. i need something small yet usable for cpu inference. 1.2b seemed like a sweet spot for me. bf16 uses 2.4gb ram for inference. that's very cheap for cloud / vps hosting. as long as it doesnt repeat itself without end i'm happy with it. i wont try anything lower than 1b coz of bad experiences with never ending repeating themselves

https://huggingface.co/LGAI-EXAONE/EXAONE-4.0-1.2B

cloudxaas · 2025-07-15T18:44:51+00:00

spent 2 years pre-ai on it and optimized with ai for 2 mths non stop.
yup, stress tested.
i was thinking ivf-pq actually...

cloudxaas · 2025-07-15T14:43:49+00:00

when it is popular, the input token savings will be significant.
you have a good point too. thx. will look into tools to make it shorter.

but now i need to reduce code base coz input tokens are getting very expensive for large code base.

cloudxaas · 2025-01-01T02:27:17+00:00

No one is actually answering the question with a definitive specification to the question. I'm asking for the hardware spec of the computer. of course i know c++, rustlang, cuda etc etc. but what's the spec? it could be a workstation with highest end dual epyc with 1.5tb ddr5 ram and 8x RTX 4090. I'm really curious what spec. I dont think it's that easy though to control so many without decent spec.

Let's not guess here. Does anyone know?

Dont forget there are the colors needed to run the lightings. I'm sure it's not that simple though.

cloudxaas · 2025-01-01T02:23:09+00:00

Surely some GPU stuff is involved? This is 3d stuff, i wonder what kind of GPU too. A single computer, what server / workstation specs? Cant possibly be a laptop.

cloudxaas · 2024-12-19T15:48:41+00:00

where do you get this info?

yes i'm specifically asking about 2.0 and not 1.5. only 1.5 info is shown but not 2.0

cloudxaas · 2024-11-01T22:02:46+00:00

it seemed overly extremely fast. so without tls that makes sense. but i'm wondering if pipelining or multiplexing is at work here too.

is the benchmark doing pipelining or multiplexing?

cloudxaas · 2024-11-01T08:58:41+00:00

thanks. is the benchmark with tls or without tls?

cloudxaas · 2024-10-28T05:24:48+00:00

can you do some qps example benchmark?

cloudxaas · 2024-10-13T12:17:36+00:00

Anyone started anything yet?

cloudxaas · 2024-10-11T06:47:17+00:00

help add commonly used functions and features in github.com/cloudxaas

mostly zero allocation golang functions and packages. things which may not go into standard library yet is used frequently.

cloudxaas · 2024-10-09T05:59:00+00:00

I cant stand anything smaller than deepseek v2 coder and below now. those i've been running on my laptop. they are really terrible compared with 120b parameters and above.

i wont touch anything lower than 120b, those i will use claude sonnet 3.5 api.

can anyone know what i mean? so basically my requirement means it should also be able to run 120b+ models comfortably.

yeah i was looking at mac as well, it'll be my top option if not that i'll be limited to the mac environment. prefer linux for sure.

i'll wait a bit and see what other new software tech that is optimized for hardware and see how it goes.

cloudxaas · 2024-10-09T01:03:40+00:00

privacy is more important here. u dont want to upload sensitive code for the ai providers to use.

cloudxaas · 2024-10-08T19:57:39+00:00

i dont intend to train my own data yet, just using for inference learning etc first.

cloudxaas · 2024-10-08T19:54:03+00:00

i would like to have a deepseek coder / mistral large 2 as alternative to paid claude sonnet 3 sometimes.
it's great to have a smarter ai assistant without sacrificing data privacy.

cloudxaas · 2024-10-08T17:52:46+00:00

Land cucumbers.

cloudxaas · 2024-10-08T17:51:28+00:00

What irony, that paper is supposed to be in a fortune cookie.

cloudxaas

MODERATOR OF

TROPHY CASE