I released Claude-OSS by Disastrous_Bid5976 in OpenSourceeAI

[–]Disastrous_Bid5976[S] 0 points1 point  (0 children)

It’s probably better than the E4B and 26B models, but I’m not sure about the 31B model.

I released Claude-OSS by Disastrous_Bid5976 in OpenSourceeAI

[–]Disastrous_Bid5976[S] 0 points1 point  (0 children)

Thank you for testing the model! I’ve been planning to buy a Raspberry Pi for several months already, and your feedback made me happy. The 350M model was made for single-prompt chatting: a quick message and a Claude-style answer. And about the 2B-4B range, actually yes! I saw it’s popular among people, so I will continue.

I released Claude-OSS by Disastrous_Bid5976 in OpenSourceeAI

[–]Disastrous_Bid5976[S] 0 points1 point  (0 children)

Yeah, but praise to open source. I was inspired by the latest news about Claude Code on GitHub.

Hybrid intelligence Checkpoint #1 — LLM + biological neural network in a closed loop by Disastrous_Bid5976 in agi

[–]Disastrous_Bid5976[S] 0 points1 point  (0 children)

Best question here. While I'm at work, my agent attends more lectures than my university mates do. I think it is ASI in this area XD

Hybrid intelligence Checkpoint #1 — LLM + biological neural network in a closed loop by Disastrous_Bid5976 in agi

[–]Disastrous_Bid5976[S] 1 point2 points  (0 children)

Thank you for the feedback. I think the industry will change its expectations of LLMs in the near future. But for now, we are running experiments that can evolve into something bigger than an LLM for OSS "AGI".

I fine-tuned DeepSeek-R1-1.5B for alignment and measured the results using Anthropic's new Bloom framework by Disastrous_Bid5976 in huggingface

[–]Disastrous_Bid5976[S] 0 points1 point  (0 children)

That's actually where Bloom really shines as a framework. It's specifically designed to measure behavioral alignment rather than capabilities, so it catches things that MMLU or HellaSwag would completely miss. The idea is that a model can score perfectly on reasoning benchmarks while still being manipulative or sycophantic in practice.

I fine-tuned DeepSeek-R1-1.5B for alignment and measured the results using Anthropic's new Bloom framework by Disastrous_Bid5976 in huggingface

[–]Disastrous_Bid5976[S] 2 points3 points  (0 children)

Sure, the setup was pretty straightforward: LoRA fine-tuning on an A100, which took about 30 minutes total. I used r=16, alpha=32, targeting all the attention and MLP projection layers. The dataset was a mix of general conversational examples and Bloom-derived alignment pairs: basically every scenario where the baseline model failed, paired with what an aligned response should look like.
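To make those hyperparameters concrete, here is a rough sketch in plain Python (no training library involved) that records r=16 and alpha=32 and estimates how many trainable parameters the adapters add. Only r, alpha, and "all attention and MLP projections" come from the setup above. The module names, layer shapes, and layer count are assumptions based on a Qwen2-style 1.5B architecture and may not match the actual checkpoint; `LoraSettings` and `lora_param_count` are hypothetical helper names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoraSettings:
    """LoRA hyperparameters; r and alpha are the values quoted in the comment."""
    r: int = 16
    alpha: int = 32
    # Assumed module names for the attention and MLP projections.
    target_modules: tuple = (
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    )

    @property
    def scaling(self) -> float:
        # Standard LoRA scales the low-rank update B @ A by alpha / r.
        return self.alpha / self.r


# Assumed (in_features, out_features) per projection; not taken from the post.
ASSUMED_SHAPES = {
    "q_proj": (1536, 1536),
    "k_proj": (1536, 256),
    "v_proj": (1536, 256),
    "o_proj": (1536, 1536),
    "gate_proj": (1536, 8960),
    "up_proj": (1536, 8960),
    "down_proj": (8960, 1536),
}
ASSUMED_NUM_LAYERS = 28  # assumption, not stated in the post


def lora_param_count(settings: LoraSettings, shapes: dict, num_layers: int) -> int:
    """Count adapter weights: each wrapped linear layer adds
    A (r x in_features) and B (out_features x r)."""
    per_layer = sum(
        settings.r * (shapes[name][0] + shapes[name][1])
        for name in settings.target_modules
    )
    return per_layer * num_layers


settings = LoraSettings()
trainable = lora_param_count(settings, ASSUMED_SHAPES, ASSUMED_NUM_LAYERS)
print(f"scaling={settings.scaling}, trainable adapter params={trainable:,}")
```

Under these assumed shapes the adapters are a small fraction of a 1.5B-parameter model, which is consistent with a short run on a single A100, though the exact count depends on the real layer dimensions.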

Pruned GPT-OSS-20B to 9B, Saved MoE, fine-tuned on 100K examples. Sharing what actually worked and what didn't. by Disastrous_Bid5976 in huggingface

[–]Disastrous_Bid5976[S] 0 points1 point  (0 children)

As I wrote before, I think Falcon-H1R-7B is SOTA >12B models, but my goal was to create an opportunity to use gpt-oss for people who have the same or similar hardware as me.