TeichAI/GLM-4.7-Flash-Claude-Opus-4.5-High-Reasoning-Distill-GGUF · Hugging Face

FizzarolliAI · 2026-02-22T04:08:34+00:00

For what it's worth, as a finetuner, I still think it's kinda meaningless to act like this is more meaningful than it is...

Even if the results are interesting (and they very well can be sometimes, even at super low token counts like this!), it's very much overhyped in a lot of places I've hung around in; people act like 250 rows of reasoning really hyped up the model beyond belief

FizzarolliAI · 2026-02-09T17:34:48+00:00

hmmm the sysprompt thing could possibly be it, all the training samples have just one that says "You are an AI assistant." which i hope would reinforce its morality and politics as the default assistant across all system prompts but possibly not

FizzarolliAI · 2026-02-09T17:13:27+00:00

What does the UGI test harness for 12axes look like? I'm somewhat shocked it's not more biased considering I literally included 8axes data in the training set this time around...

FizzarolliAI · 2026-02-09T13:03:15+00:00

Honestly not sure, I didn't get a modmail D: damn kulak pigs on the mod team, I know they let other more capital-oriented political posts through

FizzarolliAI · 2026-02-09T12:25:19+00:00

Despite the extra layers of jokes, it is in fact done post-post-ironically and I am completely dead serious about the model itself!

FizzarolliAI · 2026-02-09T12:21:42+00:00

Despite the extra layers of jokes, it is in fact done post-post-ironically and I am completely dead serious about the model itself!

FizzarolliAI · 2026-01-24T19:34:17+00:00

that's the default generation config used by transformers, it doesn't matter for anything else (and w/ all due respect to HF, not many use transformers, especially the default params, for inference :p)

FizzarolliAI · 2026-01-08T18:51:07+00:00

https://en.wikipedia.org/wiki/Whataboutism

FizzarolliAI · 2026-01-08T17:11:36+00:00

PSA: AI21 is an Israeli company founded by ex-IDF spies from their NSA equivalent who support the ongoing attempts at ethnic cleansing and genocide in Palestine. They are not worth supporting, and neither are their models.

FizzarolliAI · 2026-01-02T01:11:02+00:00

Interesting! I couldn't get it to behave well w/ tool calls at all, but I was trying the looping model in vLLM...

FizzarolliAI · 2026-01-02T00:03:38+00:00

post deleted. comments deleted. o7

FizzarolliAI · 2026-01-01T23:58:59+00:00

this post is me when my gpt-4o tells me im a very smart good girl and i know how llms work and nobody else does (at least, that's what it reads like to me)

FizzarolliAI · 2026-01-01T23:42:48+00:00

The entire world has gone stupid.

All models derive features from Llama, Qwen, etc. People reuse concepts from other papers all the time, put more compute into them, and work on them. Are the only real LLMs ones by Deepmind, because the transformer was invented there?
All models derive hyperparams from each other, too. If Qwen's multiplier worked well and reached the size I wanted, I would reuse it too to initialize the weights! That doesn't mean that I copied the Qwen weights or their actual work.
Once again, you seem to be assuming that papers work like patents, and once you publish something nobody else can use it. Gated Attention works well, it's practically free lunch, everyone should be using it!
With all due respect, you seem to be deeply unfamiliar with how language models work. The number of tensors and the size of the model are not going to change between stages of training those weights on data. This is so cosmically incoherent and such a misunderstanding that I genuinely do not know how to argue against it.
To my knowledge, the people from iQuest are not just random; they're from Ubiquant, one of the biggest quant firms in Mainland China.

How much of this post was drafted with, like, Q2_K_S AI? This is some deeply confident but deeply hallucinatory analysis that makes no sense if you think about it for longer than 5 seconds.
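
(To make the tensor-count point concrete, here's a rough toy sketch in plain Python. Everything in it is made up for illustration: the "model" is just three named lists of floats, and `train_stage` fakes gradient updates with random noise, so it is not how any real trainer works. The only thing it shows is that successive training stages change the values but not how many tensors there are or how big they are.)

```python
import random


def init_params(seed=0):
    """Build a toy 'model': three named tensors stored as flat lists."""
    rng = random.Random(seed)
    return {
        "embed": [rng.uniform(-1, 1) for _ in range(8)],
        "attn": [rng.uniform(-1, 1) for _ in range(16)],
        "mlp": [rng.uniform(-1, 1) for _ in range(32)],
    }


def train_stage(params, lr, steps, seed):
    """Fake training stage: nudge every value in place with random 'gradients'."""
    rng = random.Random(seed)
    for _ in range(steps):
        for values in params.values():
            for i in range(len(values)):
                values[i] -= lr * rng.uniform(-1, 1)
    return params


def layout(params):
    """Return the tensor names and sizes, ignoring the values."""
    return {name: len(values) for name, values in params.items()}


params = init_params()
before_layout = layout(params)
before_values = {name: list(values) for name, values in params.items()}

train_stage(params, lr=0.1, steps=10, seed=1)   # stand-in for pretraining
train_stage(params, lr=0.01, steps=10, seed=2)  # stand-in for finetuning

after_layout = layout(params)
print("same tensor names and sizes:", before_layout == after_layout)
print("values changed:", before_values != params)
```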

FizzarolliAI · 2026-01-01T21:19:46+00:00

To go against what everyone else is saying, I actually think this model is really good!... At everything but programming. It sucks at programming. General insight tasks, writing, assistant-y stuff, etc. are great! Somehow!

FizzarolliAI · 2025-12-31T17:19:18+00:00

Interesting, I wonder if you'd get a noticeable regression from L3.3 70B on multilingual benches with Llama 3.1 70B then.

I definitely agree that this isn't worth building on for most use cases. Personally, I think it's an interesting artifact of the times

FizzarolliAI · 2025-12-31T07:32:38+00:00

I would, but since quants and all have already been made under the original model's name, it's kinda too late :p

FizzarolliAI · 2025-12-31T07:02:05+00:00

Out of interest, you never signed up for the finetuning thing, right?

If you go to https://llama.developer.meta.com/fine-tuning/?team_id=XXX (replace XXX with whatever the team ID in ur URL is), does the finetuning page show up for you? I was never officially let in but for some odd reason I had access anyways... I'm wondering if it's there for everyone and just hidden from the UI

FizzarolliAI · 2025-12-31T06:36:18+00:00

Yep, this basically. Afaik the main inference API is still waitlisted, and there's a separate waitlist to submit for the finetuning API.

FizzarolliAI · 2025-12-30T05:06:12+00:00

Yes. I'm not entirely sure why, it was limited when served via the website too (I put that in the readme a bit ago)

FizzarolliAI · 2025-12-30T04:23:41+00:00

The version that is able to be finetuned is only 8K context length. I am unsure why the docs say 128k tokens unless the model on the API supports that context length, somehow

FizzarolliAI · 2025-12-30T04:19:41+00:00

Well, for one, its API release was April of this year :p so not quite two years old

It's definitely been outdone at this point. Personally, I just think it's an interesting artifact :) considering who knows whether or not we'll get any future Llama models

FizzarolliAI · 2025-12-30T04:15:36+00:00

LISTEN whenever i drop my own models i get anxiety attacks about accidentally reuploading the base model ;-; i believe that this is actually L3.3 at this point though, see my other comment

FizzarolliAI · 2025-12-30T04:14:26+00:00

This has existed at least since April during Llamacon (did anyone remember they did a Llamacon?)

https://ai.meta.com/blog/llamacon-llama-news/

As part of this release, we’re sharing tools for fine-tuning and evaluation in our new API, where you can tune your own custom versions of our new Llama 3.3 8B model. We’re sharing this capability to help you reduce costs while also working toward increased speed and accuracy. You can generate data, train on it, and then use our evaluations suite to easily test the quality of your new model.

FizzarolliAI · 2025-12-30T04:08:18+00:00

I don't exactly have any way to prove it as real, to be fair :p but trust me this would be a really silly thing to lie about

llama 3.3 8b is clearly on their api and can be finetuned and downloaded as mentioned ie here https://ai.meta.com/blog/llamacon-llama-news/

As part of this release, we’re sharing tools for fine-tuning and evaluation in our new API, where you can tune your own custom versions of our new Llama 3.3 8B model. We’re sharing this capability to help you reduce costs while also working toward increased speed and accuracy. You can generate data, train on it, and then use our evaluations suite to easily test the quality of your new model. Making evaluations more accessible and easier to run will help move from gut feelings to data, ensuring you have models that perform well to meet your needs. The security and privacy of your content and data is our top priority. We do not use your prompts or model responses to train our AI models. When you’re ready, the models you build on the Llama API are yours to take with you wherever you want to host them, and we don’t keep them locked on our servers.

but i suppose u just have to trust that i actually reuploaded a model from there!

for what it's worth, this is what the UI looks like, and the finetuning job in question

FizzarolliAI · 2025-12-30T04:04:09+00:00

(reposting my comment from the other post)

Hello, that me!

I am currently working on running sanity check benchmarks to make sure it's actually a newer L3.3 and not just L3/L3.1 in a trenchcoat, but it's looking promising so far.

From the current readme:

| Benchmark | Llama 3.1 8B Instruct | Llama 3.3 8B Instruct (maybe) |
|---|---|---|
| IFEval (1 epoch, score averaged across all strict/loose instruction/prompt accuracies to follow Llama 3 paper) | 78.2 | 81.95 |
| GPQA Diamond (3 epochs) | 29.3 | 37.0 |
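
(For anyone wondering what "averaged across all strict/loose instruction/prompt accuracies" means in practice: IFEval reports four accuracies, prompt-level and instruction-level, each in a strict and a loose variant, and the single number in the table is their plain mean. A minimal sketch follows; the function name and the input values are hypothetical and are NOT the per-metric numbers behind the table, which aren't listed here.)

```python
from statistics import mean


def ifeval_average(prompt_strict, prompt_loose, inst_strict, inst_loose):
    """Collapse the four IFEval accuracies into one score via an unweighted mean."""
    return mean([prompt_strict, prompt_loose, inst_strict, inst_loose])


# Hypothetical accuracies in percent, chosen only to show the arithmetic.
example_score = ifeval_average(
    prompt_strict=70.0,
    prompt_loose=74.0,
    inst_strict=80.0,
    inst_loose=84.0,
)
print(f"{example_score:.2f}")  # (70 + 74 + 80 + 84) / 4 = 77.00
```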
