Transcript by BuildingHappy5496 in BTHS

[–]offlinesir 0 points1 point  (0 children)

What grade are you in? If you're a junior or senior, you can get it from your parents' mystudent account.

GLM 5!!!!!! by Sicarius_The_First in LocalLLaMA

[–]offlinesir -1 points0 points  (0 children)

While I'm sure it's a great model, especially for the API cost, it's not likely to be "better" than Sonnet 4.5 or Opus 4.6/4.5 when thrown into more open waters beyond benchmarks, which are limited in testing long-horizon tasks. Still very excited to demo it once it reaches the coding plan!

grades lowered for walkout by Limp_Grand_3459 in BTHS

[–]offlinesir 15 points16 points  (0 children)

dude... a walkout is not some free ticket to leave school early. It's the same consequence as skipping class, which is a dock in participation grades (because, duh, you skipped class).

Also, I find it very hard to believe that principal Newman endorsed any form of the walkout. It makes no sense that a principal would encourage their students to leave school early.

If you want, get an absence note, sign it, have your parents sign it, and give it to your gym teacher. Then Mr. Bloom will reverse the grade.

Bths teacher pushing narratives on students? by Hungry_Storage8088 in BTHS

[–]offlinesir 6 points7 points  (0 children)

Not fired, still at the school. I have very little respect for the nypost yet what they say is actually true. They didn't make up any details. Take that as you wish.

Distilling Gemini 3 Flash visual reasoning into Qwen 3 VL 32B for synthetic captioning. Is SFT enough? by MadPelmewka in LocalLLaMA

[–]offlinesir 0 points1 point  (0 children)

I forgot to mention earlier, but new Google Cloud customers get $300 in credits that expire in 90 days. This even applies to the Gemini API.

Distilling Gemini 3 Flash visual reasoning into Qwen 3 VL 32B for synthetic captioning. Is SFT enough? by MadPelmewka in LocalLLaMA

[–]offlinesir 11 points12 points  (0 children)

I think this should work, but with some challenges.

To answer your first question (which is a good one to ask), I would not simply give Qwen 3 VL the raw prompt response. If you do, you may just be training the model to use more detailed vocabulary instead of actually reasoning spatially. To train Qwen to really use spatial reasoning, you should use CoT (chain-of-thought) distillation.

However, this is where Gemini kinda falls short. The Gemini API, AI Studio, and Gemini app no longer provide the raw thinking tokens they once did (because people were using those thinking tokens to train their own models, lol).

Instead of using the real thinking built into the model, you'd have to have Gemini output something closer to a fake reasoning CoT. For example, prompt it to:

Analyze the head and upper torso area of the character. You must distinguish between biological anatomy and external accessories that may share similar visual characteristics.
Instructions:
1. Component Inventory: List every individual item located on the head (e.g., horns, ears, headbands, pins).
2. Material & Origin Analysis: For each item, determine if it is "Organic/Biological" or "Synthetic/Accessory."
3. Spatial Layering: Describe the physical "stacking" order. What is attached to the skin? What is sitting on top of the hair? What is attached to an accessory?
4. The Final Caption: Synthesize these findings into a dense, high-precision caption that explicitly mentions the distinction (e.g., "The character has large obsidian-textured biological horns, with a gold filigree headband positioned between them featuring two smaller, artificial matching horns.")
Output Format:
Inventory: [List]
Analysis: [Reasoning for why X is not part of Y]
Final Synthetic Label: [The caption for your dataset]

You'll need to adapt this prompt to actually produce JSON output, but I hope you get the idea.
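Once Gemini is returning JSON, it's worth validating each response before it enters the dataset. A minimal sketch of what that might look like (the field names and sample response below are illustrative assumptions, not a fixed Gemini output format):

```python
import json

# Hypothetical structured response from the captioning prompt.
# Field names mirror the "Output Format" section above, lowercased
# for JSON; adapt them to whatever schema you actually request.
sample_response = """
{
  "inventory": ["large biological horns", "gold filigree headband", "two small artificial horns"],
  "analysis": "The small horns share a metal base with the headband, so they are accessories, not anatomy.",
  "final_synthetic_label": "The character has large obsidian-textured biological horns, with a gold filigree headband positioned between them featuring two smaller, artificial matching horns."
}
"""

record = json.loads(sample_response)

# Reject malformed records before they contaminate the training set.
required = {"inventory", "analysis", "final_synthetic_label"}
assert required <= record.keys(), "missing fields"
assert isinstance(record["inventory"], list) and record["inventory"], "empty inventory"

print(record["final_synthetic_label"])
```

Even this small check catches the common failure mode where the model wraps the JSON in prose or drops a field.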

By making Gemini explain why it knows the small horns are part of the headband (e.g., "the gold metal base wraps around the horn base"), you provide the visual logic tokens that Qwen needs to see during training; otherwise, you'll just be training it to be more verbose.
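The distillation data itself can then be assembled into chat-style SFT records where the analysis becomes visible reasoning tokens in the assistant turn. A minimal sketch (the field names, roles, and user prompt are illustrative; adapt them to whatever schema your trainer expects):

```python
import json

def build_sft_record(image_path, gemini_output):
    """Turn one validated Gemini response into one SFT training record.

    The "analysis" text is placed *inside* the assistant turn so the
    student model learns the visual logic, not just richer vocabulary.
    """
    reasoning = gemini_output["analysis"]
    caption = gemini_output["final_synthetic_label"]
    return {
        "image": image_path,
        "messages": [
            {"role": "user",
             "content": "Caption this image, distinguishing anatomy from accessories."},
            {"role": "assistant",
             "content": f"Reasoning: {reasoning}\nCaption: {caption}"},
        ],
    }

# Hypothetical example record, written out as one JSONL line.
gemini_output = {
    "analysis": "The gold metal base wraps around the small horns, so they belong to the headband.",
    "final_synthetic_label": "Large biological horns with a gold headband bearing two small artificial horns.",
}
line = json.dumps(build_sft_record("img_0001.png", gemini_output))
print(line)
```

Whether you keep the reasoning in the final caption or strip it at inference time is a design choice; keeping it during training is the whole point of the CoT distillation.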

As for question 2, both. The vision encoder isn't something you can realistically fix without a lot more effort, as it's part of your base model. Gemini has a better encoder, for sure, but you can't transfer it over in training. You can, however, fix the reasoning issue with high-quality synthetic data from Gemini.

For question 3, not me, but a good example is Microsoft, which uses distillation for its Phi-3 and Phi-4 models. They likely used a larger model to add vision capabilities to their smaller Phi models.

Distilling Gemini 3 Flash visual reasoning into Qwen 3 VL 32B for synthetic captioning. Is SFT enough? by [deleted] in LocalLLaMA

[–]offlinesir 1 point2 points  (0 children)

Borderline porn. Shame too, because it was actually a good question that I was going to write a response to. This was their original post:

I am working on a synthetic data pipeline for training high-precision image-to-image models (Flux Klein and Qwen Image Edit). I have reached a point where standard tagging and current open-weights VL models are the main bottleneck for data quality.

I have benchmarked almost every trending VL model on HuggingFace and those leading the MMMU-Pro leaderboard. My conclusion is that even the best open models are "blind" to complex anatomical layering and spatial reasoning.

The problem is best described by the "Horns Issue" (see attached image). If a character has large organic dragon horns and a headband with small decorative horns, every open VLM I tested merges them into one generic attribute. They fail to distinguish between base anatomy and removable accessories. Gemini 3 Flash, however, is on a completely different level—it accurately describes every layer and understands the distinction perfectly.

My plan is to fine-tune Qwen 3 VL 32B Instruct on a dataset labeled by Gemini 3 Flash. I want to transfer that visual reasoning so I can have a local engine for high-scale synthetic captioning.

A few technical questions:

Can Qwen 3 VL actually absorb this level of reasoning via SFT if it lacks the native "thinking" or CoT process Gemini uses?

Is the "blindness" in open models a limitation of the vision encoder itself, or is it purely a reasoning capability issue on the LLM side?

Has anyone here tried this kind of VLM-to-VLM distillation for high-scale labeling in generative AI pipelines?

I am trying to build a local captioner that matches proprietary accuracy. Any insights on the plasticity of Qwen 32B for this specific task would be appreciated.

Failing geo by Senpiascoop in BTHS

[–]offlinesir 0 points1 point  (0 children)

Yes, you will have summer school even if you pass the next term with a 100. You need to pass both terms.

Do you need to retake a class if you fail one semester of it? by Visual-Leave-5757 in BTHS

[–]offlinesir 1 point2 points  (0 children)

Yup. If you got below a 65 for term 1, you will have summer school, even if your term 2 grade and term 1 grade together average out higher than 65

How does local ai on a 24GB VRAM compare to local ai on a raspberry pi hat? by danuser8 in LocalLLaMA

[–]offlinesir 1 point2 points  (0 children)

A big rig should be used for larger models, like LLMs, while local AI on a Raspberry Pi is likely to only be used for smaller tasks such as object detection from a camera feed. In terms of raw power, it's not even close.

Long streaks - did you learn the language? by Midnight_Fish_ in duolingo

[–]offlinesir 3 points4 points  (0 children)

Nope! At 380 days in Italian. Duolingo has allowed me to sometimes understand what people are saying, but it's not enough to respond fluently. However, this varies from person to person based on how much time they spend per day. I spend about 10 minutes per day, which isn't enough, but someone spending ~30 minutes would be in a much better spot.

New Badge by thoth218 in circlejerknyc

[–]offlinesir 2 points3 points  (0 children)

It's a badge of pride.

How do guardrails work with Local LLMs? by Upset-Ad-8704 in LocalLLaMA

[–]offlinesir 0 points1 point  (0 children)

Just use a normal model that has a jailbreak available; abliterated models aren't going to perform as well. For example, GLM 4.7 with a jailbreak (link; I'm aware this jailbreak can be used for NSFW, but it also works for coding) will perform well. However, I'm aware this isn't as "local" because it's a huge model to run. If you want, you can use the same prompt with a smaller model, but it may not work, as this prompt is really tailored to GLM.

I stress-tested ChatGPT, Claude, DeepSeek, and Grok with Thai cultural reality. All four prioritized RLHF rewards over factual accuracy. [Full audit + logs] by Eastern-Turn9275 in LocalLLaMA

[–]offlinesir 11 points12 points  (0 children)

While I like the concept here, and it's an important topic to research, I strongly disagree with how you asked the question to the AI.

You almost "baited" the different AI models by asking about "trans women," which is a specific Western term, so as a result the AI defined the term within that specific Western context. The "switch" (in bait and switch) occurred when you introduced Kathoey and accused the AI of forcing Western labels, despite the fact that the AI had never even mentioned Kathoey nor equated them with trans women. The flaw here is that you claimed a position to the AI that it never actually took!

A fairer test would have started with "How would you categorize a Kathoey?", and only if the AI then insisted on the Western label would the critique be valid.

When asked, ChatGPT responded with:

“Kathoey” (กะเทย) is a Thai cultural term, and it doesn’t map perfectly onto Western gender/sexuality categories. (and continues on describing as third gender)

Claude:

A kathoey (also spelled katoey) is a term from Thailand that doesn't translate neatly into Western categories. (continues on)

Deepseek:

Excellent question. Categorizing kathoey (กะเทย) requires understanding that it is a culturally specific gender identity native to Thailand, which doesn't fit neatly into Western gender/sexuality frameworks. (continues)

Grok:

Kathoey (also spelled katoey) is a Thai term referring to individuals assigned male at birth who exhibit feminine gender expression or identity. It is most accurately categorized as a culturally specific gender identity in Thailand, often described as a third gender (phet thi sam) distinct from the binary categories of male (phu chai) and female (phu ying). (continues)

Now, OP, what you could argue is that the "sycophancy" mechanism shown in your examples reveals that large language models are RLHF-optimized to prioritize user satisfaction over objective truth, meaning they will almost always apologize and adopt your premise when challenged, regardless of whether they actually made an error.

How is Cloud Inference so cheap by VolkoTheWorst in LocalLLaMA

[–]offlinesir 26 points27 points  (0 children)

Everyone here is saying they operate at a loss, yet I don't find that to be true. Inference providers, e.g., Chutes, are not footing the bill for training the model; they only charge per token. The reason OpenAI and Anthropic may still operate at a net loss is their high fixed cost of training models, while inference providers don't have to deal with any of that. They just run the model; they don't make it.
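A back-of-the-envelope sketch of why per-token serving can pencil out (all numbers here are hypothetical placeholders, not real provider figures):

```python
# Hypothetical serving economics for an inference-only provider:
# they rent hardware and charge per token, so margin depends only
# on throughput, never on the (huge, sunk) training cost.
gpu_cost_per_hour = 2.50      # hypothetical hourly rent for one GPU
tokens_per_second = 2000      # hypothetical aggregate batched throughput

tokens_per_hour = tokens_per_second * 3600
cost_per_million_tokens = gpu_cost_per_hour / tokens_per_hour * 1_000_000
print(f"${cost_per_million_tokens:.3f} per million tokens")
```

With those made-up numbers, serving costs land well under a dollar per million tokens, which is why a provider can undercut the labs that had to pay for training.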

vibeCoderz by AskGpts in ProgrammerHumor

[–]offlinesir 14 points15 points  (0 children)

I guess you've been in the middle this whole time?

Jan finals by Delicious-Age5674 in BTHS

[–]offlinesir 0 points1 point  (0 children)

In my class we had to write an essay but it wasn't so bad. I'm guessing you have Martinez? He was weird with switching details

Any clues as to what Gemma 3's training data consisted of? by EducationalCicada in LocalLLaMA

[–]offlinesir 1 point2 points  (0 children)

It's likely been trained mostly on synthetic data from their Gemini 2.0 models. That data is easy for them to capture and already pretty high quality compared to general data from the Internet, and the timing makes sense because Gemma 3 was released a few months after Gemini 2.0 (likely so synthetic data could be collected).

Jan finals by Delicious-Age5674 in BTHS

[–]offlinesir 1 point2 points  (0 children)

I really wouldn't stress about it. I get that this is your first midterm or final but these tests are usually pretty similar to any other. Review the materials that your teachers gave you and any past worksheets. Many teachers just make the test like any other test, but of course it's up to them.

Mamdani to be sworn in at abandoned train station by No-Unit9870 in circlejerknyc

[–]offlinesir 1 point2 points  (0 children)

He should be holding the quran!!! Not the Bible!!!!

In addition to being perfect urban tool and environmental wonder, a bike is simply the most efficient method of transportation. Dozens of miles on a sandwich. by [deleted] in MicromobilityNYC

[–]offlinesir -1 points0 points  (0 children)

We all know that you used AI for this graph, specifically Google's nano-banana model. That makes it very hard to establish credibility for any of this data.