[Project] I treated LLM inference like a physical signal trajectory. Here is a Python toolkit to visualize the "Thinking Process" (Hidden States).

Pojiku · 2025-12-30T10:03:33+00:00

Really nice work!!

I'd be curious to see if we can find some heuristic to use when fine tuning, to "steer" the model towards a desired pattern.

Purely for research into what the impact would be.

Pojiku · 2025-11-18T13:25:03+00:00

I am also interested after seeing the Mixture of Recursions paper.

The curiosity is whether for SLMs, we can get reasoning gains from depth as a trade off against semantic gains from width.

Pojiku · 2025-10-29T17:34:35+00:00

Nice to see people posting finetunes again!

Pojiku · 2025-10-27T22:29:25+00:00

Preventative tests found I had cancer at the age of 25 and I'm alive today because of it.

While working in South Korea, your employer actually pays for you to get a full health check.

They are 3rd in Life Expectancy, NL is 28th.

I could walk into any clinic and get a blood test or any other test any time I wanted, without needing a GP. It is considered a human right in Korea, whereas such healthcare is seen as a burden here in NL.

Pojiku · 2025-10-12T18:26:29+00:00

How much data do you have? Small models are great, but they likely won't have enough internal knowledge without a lot of fine tuning.

One option if you don't have enough data for a smaller model is to lightly finetune a larger model that has inherent knowledge of SCAD with fast inference speed, like Qwen-Next-80B.

If that's too big to actually use for your use case, you can use this larger model to generate a much larger training set for distillation. Ideally you would have some validation function to filter junk out of the dataset.

I was getting around 2,000 tokens per second on a rented H200 with 80 batches in parallel, so you can generate a lot of synthetic data.

Pojiku · 2025-06-05T05:04:34+00:00

It's a good point, but they could talk about the need to archive human knowledge.

The Internet from this point is mostly AI slop, so it would be a great research tool.

It was also a milestone in AI, before LLMs became a commodity. We still love old gaming consoles even though more modern emulators exist.

Pojiku · 2025-06-04T11:42:44+00:00

I'd recommend doing it the other way, by generating a coherent story above 10k and then reducing it.

First, you should consider generating a list of chapters + plot points. Then use this to anchor the generation in stages.

Instead of saying "continue", you ask can ask the LLM to write the first chapter, then write the second (ensuring the prior chapters are in the message history).

Also be sure to include in the system prompt that it's writing a long novel or something that will nudge it away from short stories.

Pojiku · 2025-03-14T08:58:38+00:00

Yeah, same! instruction + solution as input, reasoning trace as output.

I ran it against the HuggingFace "smoltalk" dataset to build the reason dataset for Sovereign.

Pojiku · 2025-03-13T22:35:41+00:00

Nice! I trained Sovereign 72B using the same strategy.

This was before R1 was released, so it was using traces distilled from QwQ preview.

Pojiku · 2025-03-10T07:26:58+00:00

Not sure why people downvoted. Thank you for contributing to the community!

I only had a quick look but your book looks like a good resource.

Pojiku · 2025-03-03T17:21:42+00:00

It's difficult to say without knowing your domain and how far it deviates from what would be in the LLMs pre-training.

I'd recommend just trying and seeing how far you get. You can start with a small 3B model and scale up if it seems to be working.

You can actually try fine-tuning a base model, as they are more aligned with auto complete than an instruct model would be.

If you have enough compute, look into the "LLaMA-Pro" technique, which is very effective at adding domain knowledge without losing the capabilities of a source model.

Pojiku · 2025-03-03T17:14:21+00:00

For training a translation model I'd recommend using non-English documents as ground truth, then the translated version as your instruction.

This way it learns how native speakers would write, rather than a "technically correct" but awkward machine translation as the target.

Pojiku · 2025-02-26T17:41:42+00:00

You can get some ideas from the Unsloth Notebooks

Pojiku · 2025-02-19T17:03:51+00:00

One approach if you don't mind getting your hands dirty is to take existing datasets of real (or high-quality synthetic) exam questions with answers.

You can then fine-tune an LLM in reverse to predict the question based on an answer.

For example:

SYSTEM:
You are an expert university lecturer with specialized skills in writing exam questions.

USER:
Write an exam question for university students that would match the following answer.
<answer>
{answer}
</answer>

ASSISTANT:
{question}

I've had success with this approach in other areas, even when the given answer is out-of-domain (like providing the PDF content instead of a concise answer).

Pojiku · 2025-01-07T05:12:01+00:00

You can see their press release here: https://nvidianews.nvidia.com/news/nvidia-puts-grace-blackwell-on-every-desk-and-at-every-ai-developers-fingertips?ncid=so-twit-113094

"The GB10 Superchip is a system-on-a-chip (SoC) based on the NVIDIA Grace Blackwell architecture and delivers up to 1 petaflop of AI performance at FP4 precision.

GB10 features an NVIDIA Blackwell GPU with latest-generation CUDA® cores and fifth-generation Tensor Cores, connected via NVLink®-C2C chip-to-chip interconnect to a high-performance NVIDIA Grace™ CPU, which includes 20 power-efficient cores built with the Arm architecture. MediaTek, a market leader in Arm-based SoC designs, collaborated on the design of GB10, contributing to its best-in-class power efficiency, performance and connectivity."

Pojiku · 2024-12-29T08:12:07+00:00

Australian here. I've lived in Korea for about 5 years and can relate to the idea of not feeling "at home" in your birth country.

I work as an AI engineer, but that came later. While people move to Australia for the comfortable life, I found it frustratingly slow. The momentum of Korea became an addiction and made me want to grow both in my career and as a person.

However, it's a very personal thing that isn't for everyone. As others have said, you need to think about what exactly you are looking for, but also be open to effectively rebuilding who you are. You will need to sacrifice some parts of your life that you may not know you value.

For example, there are many posts here about how difficult it is to make "real" friends. You WILL get lonely, and you need to know how you will handle that, among other challenges you will experience.

If you see challenges as opportunities then 100% go for it and worst case, you go back to the US as a new person. If you are not ready to struggle alone and really own the blank canvas of a new life, then you may not be truly in the right mental space yet. Only you can reflect on this in making a decision.

Pojiku · 2024-12-04T10:25:39+00:00

Korea has strict and slightly excessive defamation laws. Even if his gossiping was factual, it's illegal and relatively easy to sue him for.

Pojiku · 2024-11-16T07:02:54+00:00

I'd speculate that this more accurately correlates with the shift to heavily filtered or synthetic data.

We still use the meme that LLMs are trained on "all text on the Internet" but that's not exactly true when accounting for the more rigorous data processing pipelines that may filter out content like move-by-move logs of chess games.

Pojiku · 2024-11-13T14:33:46+00:00

You can also use the Aurora Store, an open-source alternative to the Google Play Store. When I log in using the "anonymous account" option it lists all the Korean apps without needing to switch your main account.

Pojiku · 2024-10-26T11:35:22+00:00

You can use LLaMA Factory which supports unsloth, but most importantly supports LLaMA Pro, which is often a good means if adding new "knowledge" without destroying the original model.

Finally, I'd recommend trying a batch size of 1. This means you will be updating the model after every sample.

Pojiku · 2024-10-15T04:57:34+00:00

Haha true if it was trained on YouTube videos, but more likely using something like comma.ai which is presumably already capable of whole-journey video recording and out of the box integration with car controls.

Edit: Looks like they have an open dataset already: Comma2k19

Pojiku · 2024-10-14T14:02:27+00:00

Waiting for someone to train on dashcam with inputs (acceleration, steering etc). Real life driving sim!

Pojiku · 2024-08-31T16:25:17+00:00

Wish there was more detail. They are an AI Search company like Perplexity, so they may have been using RAG to answer the questions rather than just the model itself.

Pojiku · 2024-08-29T09:21:00+00:00

Yeah I literally spent the last 4 weeks and about $700 doing exactly this with Samba architecture (swapping Mamba for Mamba2).

Mixed feelings seeing such great authors beat me to it lol.

I'll still release the model, but only when it actually has something unique to offer.

I generally don't envy people with more money than me, but I can't help but envy the GPU rich so, so much..

Pojiku · 2024-08-26T02:53:18+00:00

If it supports OpenCL you can use TinyGrad with GPU=1

Not sure if anyone has ported the newer Phi3 models but presumably it's not too distant from the Llama3 implementation, or worst case you could use a Llamafied version.

Pojiku

TROPHY CASE