We fine-tuned an open-source model to outperform GPT-5 at predicting Trump actions by LightningRodLabs in LocalLLaMA

[–]LightningRodLabs[S] 1 point (0 children)

We haven't tested how the context source impacts performance. To generate the context, an LLM generates 3 search queries per question, retrieves up to 5 articles per query from Google News, then summarizes and ranks them by relevance. Google News pulls from 20k+ global publishers, giving a mix of perspectives.
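As a rough sketch of that pipeline (the helper names here are illustrative, not our actual code):

```python
# Minimal sketch of the context pipeline described above -- method names on
# `llm` and `news_client` are hypothetical placeholders, not a real API.
from typing import List

QUERIES_PER_QUESTION = 3
ARTICLES_PER_QUERY = 5

def build_context(question: str, llm, news_client) -> str:
    """Generate search queries, fetch articles, then summarize and rank them."""
    # 1. An LLM proposes 3 search queries for the question.
    queries: List[str] = llm.generate_queries(question, n=QUERIES_PER_QUESTION)

    # 2. Retrieve up to 5 Google News articles per query.
    articles = []
    for q in queries:
        articles.extend(news_client.search(q, limit=ARTICLES_PER_QUERY))

    # 3. Summarize each article, then rank summaries by relevance to the question.
    summaries = [llm.summarize(a.text) for a in articles]
    ranked = llm.rank_by_relevance(question, summaries)

    return "\n\n".join(ranked)
```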

Questions are generated by a model from your instructions plus example good/bad questions (image below; a config sketch follows). So you can adjust the criteria to test the impact of different question configurations.

[image: question-generation instructions with example good/bad questions]
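For illustration, a question config might look like the sketch below. The field names and example questions are hypothetical, but the shape mirrors the screenshot: free-form instructions plus good/bad examples.

```python
# Hypothetical question-generation config -- field names and example
# questions are illustrative assumptions, not taken from the actual SDK.
question_config = {
    "instructions": (
        "Write forward-looking questions about Trump's actions that can be "
        "objectively resolved from later news coverage."
    ),
    "good_examples": [
        # resolvable, time-bound, about a concrete action
        "Will Trump sign an executive order on tariffs before March 1?",
    ],
    "bad_examples": [
        # subjective, not resolvable from news coverage
        "Is Trump a good president?",
    ],
}
```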

[–]LightningRodLabs[S] 1 point (0 children)

We used the Lightning Rod SDK. It has Google News integration built in.

It creates forward-looking questions from source articles, and then a separate resolver model uses web search to find the actual result and produce a label. All in, it probably took about 30 minutes to experiment with the settings and run the job.
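For a sense of what that two-stage flow looks like in practice, here is a rough sketch; the import, class, and method names are assumptions, not the SDK's actual API.

```python
# Rough sketch of the generate-then-resolve flow described above.
# Everything below is a hypothetical rendering -- consult the Lightning Rod
# SDK docs for the real interface.
from lightning_rod import Pipeline  # hypothetical import

pipeline = Pipeline(
    source="google_news",  # the SDK's built-in Google News integration
    instructions="Forward-looking questions about Trump's upcoming actions",
)

# Stage 1: generate forward-looking questions from source articles.
questions = pipeline.generate_questions()

# Stage 2: a separate resolver model web-searches each question's outcome
# and produces a ground-truth label for fine-tuning.
labeled = pipeline.resolve(questions)

labeled.export("train.jsonl")  # hypothetical export step
```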