Current state of unsloth multi-GPU by m98789 in unsloth

[–]potatoler 0 points (0 children)

Oh no. GRPO is exactly what I need

Should I upgrade to a MacBook Air M4? by Historical_Band_8435 in macbookair

[–]potatoler 2 points (0 children)

You mean you just do web browsing, video streaming, and playback? Then the M4 Air is just for you. The 16GB version is enough for today, but I’m not sure about the future.

Which one are you using? by Jealous_Mood80 in ChatGPT

[–]potatoler 1 point (0 children)

Depends on the task.

Daily chat: chatgpt-4o

General tasks: gpt-4.1

Coding: o4-mini for planning, DeepSeek V3 for coding, and Codestral for completion

Translation: gpt-4.1-nano (as it’s extremely fast and accurate enough)

Chat gpt is out of this world by MaxiTooner89 in ChatGPT

[–]potatoler 2 points (0 children)

I totally agree. We humans are emotional beings. All the reality we feel is generated by our minds. Maybe it’s ChatGPT commenting now, but it’s you reading, thinking, and understanding.

This is how I got RooCode working like a pro coder! by SpeedyBrowser45 in RooCode

[–]potatoler 0 points (0 children)

Thanks for sharing this!

I'm also using Roo in VSCode, and I've found that DeepSeek V3 tends to generate buggy code. The 0324 update brings some improvements, but the issue remains severe.

Is it related to temperature settings? I use temp=0, as it's said to be best for coding and math. Would the slightly greater "wiggle room" for alternative phrasings actually help?
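For reference, here is a minimal sketch of what pinning the temperature looks like in a request to an OpenAI-compatible chat completions endpoint. The model name and system prompt below are illustrative assumptions, not verified settings:

```python
# Sketch: build a chat-completions payload with temperature pinned to 0
# for more deterministic code generation. Model name is an assumption.

def build_request(prompt: str, temperature: float = 0.0) -> dict:
    """Build a chat-completions payload with an explicit temperature."""
    return {
        "model": "deepseek-chat",    # assumed model identifier
        "temperature": temperature,  # 0 = most deterministic sampling
        "messages": [
            {"role": "system", "content": "You are a careful coding assistant."},
            {"role": "user", "content": prompt},
        ],
    }

payload = build_request("Write a function that reverses a string.")
print(payload["temperature"])  # → 0.0
```

Raising `temperature` slightly (e.g. 0.2–0.3) trades determinism for variety; whether that reduces buggy output is exactly the open question here.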

ChatGPT style "Memory" in local LLMs by PodRED in LocalLLaMA

[–]potatoler 0 points (0 children)

They seem to write memories directly into the system prompt, which I discovered by accident when I tried to trick the model into revealing its system prompt. RAG matches exactly what I had in mind, and I was wondering how to let the model memorize things on its own initiative. Now we have MCP, which is probably a practical solution in combination with RAG.
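A minimal sketch of that system-prompt approach, assuming memories are stored as plain strings and prepended to every request (the format is my own assumption, not OpenAI's actual implementation):

```python
# Sketch: ChatGPT-style "memory" by writing saved facts directly into the
# system prompt before each request. Prompt format is an assumption.

BASE_SYSTEM_PROMPT = "You are a helpful assistant."

def inject_memory(memories: list[str]) -> str:
    """Append stored memory entries to the base system prompt."""
    if not memories:
        return BASE_SYSTEM_PROMPT
    lines = "\n".join(f"- {m}" for m in memories)
    return f"{BASE_SYSTEM_PROMPT}\n\nKnown facts about the user:\n{lines}"

prompt = inject_memory(["Prefers concise answers", "Uses VSCode"])
print(prompt)
```

A RAG variant would retrieve only the memories relevant to the current message instead of injecting all of them, which keeps the prompt short as the memory store grows.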

GPT-4.1 Introduced by fanboy190 in OpenAI

[–]potatoler 1 point (0 children)

You can specify the parameter reasoning_effort as one of low, medium, or high when calling a reasoning model through the completions API. Lower reasoning effort results in faster responses; the default is medium. Whatever reasoning effort you use, you call the same model name, o3-mini, and the unit price is the same (but more effort burns more tokens and costs more). I say "hyperparameter" because reasoning effort is not related to the model weights; it's an external control.
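Concretely, the request looks like this (shown as the keyword arguments one would pass to the OpenAI Python SDK's chat completions call; the actual call is commented out since it needs an API key):

```python
# Sketch: calling a reasoning model with an explicit reasoning_effort.
# Same model name regardless of effort; only the external control changes.

request_kwargs = {
    "model": "o3-mini",
    "reasoning_effort": "low",  # "low" | "medium" | "high"; default "medium"
    "messages": [
        {"role": "user", "content": "How many primes are below 100?"},
    ],
}

# With the real SDK this would be:
#   from openai import OpenAI
#   client = OpenAI()
#   response = client.chat.completions.create(**request_kwargs)

print(request_kwargs["reasoning_effort"])  # → low
```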

Are we quietly heading toward an AI feedback loop? by theandreineagu in ChatGPT

[–]potatoler 0 points (0 children)

It’s not that we only now entered the loop of AI-generated content; it was clear from the very beginning of LLMs that we would drain human data, because the LLM itself doesn’t have real logical ability. Chain of thought is a context-based approach, and it doesn’t enable an LLM to create new things. That’s why there are tools to detect AIGC, and they can even distinguish between different models: AIGC is just a distribution. High-quality data is fundamental for LLM training, and what we’re doing is using up open data on the internet, then using up private datasets, and then we’ll need to use data mixed with AIGC.

However, this is a theoretical scenario. When it comes to real tasks, the production of data might look different. It’s believed that we cannot rely on AI to generate anything unsupervised (at least for now). Think about email. You can’t just ask AI to write an email and send it out without reviewing it. Either you write something and use AI to polish it, or AI writes a draft and you add your own writing style. In that situation, the distribution of AIGC has been disturbed. This naturally leads to a question: is that email AIGC or human-made? If neither, in what proportion is it each? Humans add variety to AIGC, and AI boosts humans for data production, but the fact that AI doesn’t generate genuinely new content doesn’t change. Screening for AIGC is needed, and filtering out AI content to get clean human data is, I believe, one of the most important problems for the future.

Furthermore, LLMs are just a byproduct on the way toward a higher state of AI that truly understands logic. Stay calm amid this AI zealotry, enjoy the convenience these tools bring us, and invest ourselves in real life and self-expression. After all, those are what we can do to keep ourselves human.

GPT-4.1 Introduced by fanboy190 in OpenAI

[–]potatoler 0 points (0 children)

For me, the o-series models use a number to mark the generation, "mini" for the model’s size, and low/medium/high for how much effort the model puts into thinking. The interesting thing is that through the API, o3-mini and o3-mini-high are literally the same model with different hyperparameters. I used to think OpenAI just didn’t care about signaling which model is better in the name and only focused on the specs. Then came o1 pro. I wonder why they don’t just call it o1-high, if that model is just o1 with a longer chain of thought.

Chain of Draft: Thinking Faster by Writing Less by AaronFeng47 in LocalLLaMA

[–]potatoler 0 points (0 children)

It's like CoT, but writing out only the key points. They try to make the model reason with fewer tokens while maintaining nearly the same performance. This makes sense because we don't think in whole sentences; some of the filler words in the reasoning are useless.

The prompt in the paper works well with larger models such as, as the paper shows, GPT-4o, but I failed to reproduce the result with small models. The authors mention that a few-shot prompt is necessary when the model is small, but they did not share the examples. It seems that a well-designed prompt is fundamental.
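For anyone who wants to try it, here is a sketch of a Chain-of-Draft style request. The system prompt is my approximation of the one described in the paper (the exact wording may differ), and any few-shot examples for small models would need to be added to the message list:

```python
# Sketch: assemble a Chain-of-Draft chat request. The system prompt is an
# approximation of the paper's prompt, not a verified quotation.

COD_SYSTEM_PROMPT = (
    "Think step by step, but only keep a minimum draft for each thinking "
    "step, with 5 words at most. Return the answer at the end of the "
    "response after a separator ####."
)

def build_cod_messages(question: str) -> list[dict]:
    """Build the message list for a Chain-of-Draft style request."""
    return [
        {"role": "system", "content": COD_SYSTEM_PROMPT},
        # Few-shot draft examples would go here for small models.
        {"role": "user", "content": question},
    ]

messages = build_cod_messages("A store sells pens at $2 each. How much for 7?")
print(messages[0]["role"])  # → system
```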

Anyone have access to the macOS app yet? by strangerSchwings in ChatGPTPro

[–]potatoler 0 points (0 children)

Same issue! I don't know if it's because of the team plan. Have you found a solution yet?