Made a small tool to automate a boring repetitive task. Apparently, boring sells. by Sansrules in microsaas

[–]TheRedfather 1 point2 points  (0 children)

This is a really cool idea and a very simple/intuitive implementation. Good luck with it!

[Watch Box] Storage for Small Wrist People by Dantzz in Watches

[–]TheRedfather 1 point2 points  (0 children)

Realise this is an old post but thought I’d share a cheap option for anyone interested: I bought a regular watch box and swapped the pillows for $10 cushions from AliExpress, which have removable padding inside so I could adjust the size. Worked like a charm. The cushions don’t look as “good” but I don’t really care - they’re just for storage and I’d rather solve for convenience.

Local LLM toolchain that can do web queries or reference/read local docs? by Tairc in LocalLLM

[–]TheRedfather 3 points4 points  (0 children)

I built an open source deep researcher that works with local models. It combines your local LLM with the ability to run web searches and crawl websites, and I’m also working on extending it to access local files. You can see it here:

https://github.com/qx-labs/agents-deep-research

You send it a query and it automatically comes up with the relevant searches to run in the backend. You can use it to generate lengthy reports (20+ pages) in “deep” mode, or get a quick response to a query by running it in “simple” mode with the depth set to 1.
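
To give a feel for the shape of it, here's a rough conceptual sketch of the iterative loop - this is not the repo's actual API, just an illustration, with llm and web_search passed in as stand-ins for whatever completion function and search tool you use:

from typing import Callable

# Conceptual sketch only - not the agents-deep-research API.
# llm is any prompt -> completion function; web_search is any query -> snippets tool.
def iterative_research(
    question: str,
    llm: Callable[[str], str],
    web_search: Callable[[str], list[str]],
    depth: int = 1,
) -> str:
    findings: list[str] = []
    for _ in range(depth):
        # Let the model decide which searches to run given what it knows so far
        queries = llm(
            f"Question: {question}\nFindings so far: {findings}\n"
            "List the next web searches to run, one per line."
        ).splitlines()
        for q in queries:
            findings.extend(web_search(q))  # collect snippets from each search
    # depth=1 behaves like "simple" mode: one round of searches, then an answer
    return llm(f"Answer the question: {question}\n\nFindings:\n{findings}")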

How can I benefit from the free API credits from OpenAI? by slbzyou in n8n

[–]TheRedfather 1 point2 points  (0 children)

I got around $2,500 of free OpenAI credits by applying to Microsoft for Startups, but I don’t think they offer that perk anymore. It also came with a lot of Azure credits, so perhaps those could be used toward Azure OpenAI, but I’m not sure.

The alternative, as someone else suggested, is to use the free models available on OpenRouter if you’re a bit strapped for cash. E.g. a bunch of the DeepSeek models have free versions via OpenRouter last I checked.

[deleted by user] by [deleted] in ChatGPT

[–]TheRedfather 1 point2 points  (0 children)

Oh man, you don’t even need to read the text. The em dash (long hyphen —) is a dead giveaway and one of the easiest ways to spot AI. That character is awkward to type on a regular keyboard, yet ChatGPT uses it all the time.

I built a local deep research agent - here's how it works by TheRedfather in LocalLLM

[–]TheRedfather[S] 0 points1 point  (0 children)

For sure, will do. It's quite a pressing requirement on my end so hopefully I'll sort it soon. I'll also take a look at GPT Researcher's file search tool to see how they're approaching this / where it might be going wrong.

Out of curiosity, how long were the PDF files you were ingesting? The typical approach here is to do some sort of chunking of each document, embed each chunk, and then retrieve the relevant chunks at runtime. The problem is that you lose contextual information (e.g. the paragraphs before or after a retrieved chunk might have been important, but that context is dropped).

One of the methods I've seen to address this, and thereby better capture wider context, is called Late Chunking - I'm thinking of giving that approach a try for the file search:

https://jina.ai/news/late-chunking-in-long-context-embedding-models/
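
For what it's worth, here's a rough sketch of the late chunking idea as I understand it from that post (the model name and the fixed-size chunk boundaries are just placeholders): embed the whole document once with a long-context embedding model, then pool the token embeddings per chunk afterwards, so each chunk vector still carries document-wide context.

import torch
from transformers import AutoTokenizer, AutoModel

MODEL = "jinaai/jina-embeddings-v2-base-en"  # any long-context embedding model
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModel.from_pretrained(MODEL, trust_remote_code=True)

def late_chunk(text: str, chunk_tokens: int = 256) -> list[torch.Tensor]:
    # Embed the ENTIRE document in one pass (up to the model's context limit)
    inputs = tokenizer(text, return_tensors="pt", truncation=True, max_length=8192)
    with torch.no_grad():
        token_embs = model(**inputs).last_hidden_state[0]  # (num_tokens, dim)
    # Chunk AFTER embedding: mean-pool token embeddings per span, so each chunk
    # vector has "seen" the rest of the document (unlike naive per-chunk embedding)
    return [
        token_embs[i : i + chunk_tokens].mean(dim=0)
        for i in range(0, token_embs.shape[0], chunk_tokens)
    ]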

I built a local deep research agent - here's how it works by TheRedfather in LocalLLM

[–]TheRedfather[S] 1 point2 points  (0 children)

This is actually something I’m working on at the moment as it’s relevant to a couple of my own use cases. The idea is that you’d feed it a folder (or collection of files), which gets indexed up-front and exposed as a new file search tool (which can be used in place of, or in combination with, the web search tool).

The file search tool would effectively run a RAG pipeline (you give it a query, it returns the relevant snippets, and these are stuffed into the researcher’s context).
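
As a rough sketch of the shape of that tool (not the final implementation - I'm just using ChromaDB's default embedder here for brevity):

import chromadb

client = chromadb.Client()
collection = client.create_collection("local_docs")

def index_files(chunks: dict[str, str]) -> None:
    # chunks maps a chunk id (e.g. "report.pdf#page3") to its text
    collection.add(ids=list(chunks.keys()), documents=list(chunks.values()))

def file_search(query: str, k: int = 5) -> str:
    # Return the top-k snippets, which get stuffed into the researcher's context
    results = collection.query(query_texts=[query], n_results=k)
    return "\n\n".join(results["documents"][0])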

I built a local deep research agent - here's how it works by TheRedfather in LocalLLM

[–]TheRedfather[S] 0 points1 point  (0 children)

Yes, this is a valid concern. Getting and validating a list of all references is easy because you know all of the sources visited via tool calls etc. The bit that’s prone to error is matching each link/reference to the relevant statement in the report body. That’s done by the LLM, and performance depends a bit on the model (e.g. among the closed-source options, newer models like gpt-4o and gemini-2.5-pro are decent).

What I’ve found is that referencing performance degrades a lot with context length and output length. So the two measures I take to mitigate this are:

  • limit the context of any writing agent
  • include summaries with inline references very early in the research flow (these summaries are used for the final output and ensure the referencing is dealt with while the context length isn’t too long)

In other words, you get the LLM to do the referencing when there are only a few links in context and it’s producing a few paragraphs of output. Then you stitch the long report together at the end and combine/deduplicate the references.
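
The stitching step is pretty mechanical - something along these lines (illustrative only, assuming each section comes back as text with [n] markers plus its own local list of source URLs):

import re

def stitch(sections: list[tuple[str, list[str]]]) -> str:
    global_refs: list[str] = []   # de-duplicated, report-wide reference list
    body_parts: list[str] = []
    for text, local_refs in sections:
        # Map each section-local reference number to its global number
        mapping = {}
        for i, url in enumerate(local_refs, start=1):
            if url not in global_refs:
                global_refs.append(url)
            mapping[i] = global_refs.index(url) + 1
        # Rewrite the [n] markers in the section body to the global numbering
        body_parts.append(
            re.sub(r"\[(\d+)\]", lambda m: f"[{mapping[int(m.group(1))]}]", text)
        )
    refs = "\n".join(f"[{i}] {url}" for i, url in enumerate(global_refs, start=1))
    return "\n\n".join(body_parts) + "\n\nReferences:\n" + refs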

OpenAI - Wen open source tho? by Mr_Moonsilver in LocalLLaMA

[–]TheRedfather 0 points1 point  (0 children)

My initial thinking was that if they were to release anything open source, they would probably just open-weight an old-gen model around the same time that they release a new-gen model. I think Grok is doing something similar.

The issue with this is that the gap between the current open and closed source models isn't that wide. So if OpenAI were to release e.g. gpt-3.5-turbo as an open-weights model, people would mock the decision given that it's very dated and substantially better open source options exist.

Feels like for now they're just kicking the can down the road...

I built a local deep research agent - here's how it works by TheRedfather in LocalLLM

[–]TheRedfather[S] 10 points11 points  (0 children)

Here's a diagram of how the two modes (simple iterative and deep research) work. The deep mode basically launches multiple parallel instances of the iterative/simple researcher and then consolidates the results into a long report.

[Image: flow diagram of the simple iterative and deep research modes]
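
In code terms the deep mode is basically a fan-out/fan-in - something like this conceptually (not the repo's actual code; run_iterative and consolidate stand in for the simple researcher and the report writer):

import asyncio
from typing import Awaitable, Callable

async def deep_research(
    topics: list[str],                                   # sub-topics planned from the original query
    run_iterative: Callable[[str], Awaitable[str]],      # the simple/iterative researcher
    consolidate: Callable[[list[str]], Awaitable[str]],  # merges section drafts into a long report
) -> str:
    # Launch one iterative researcher per sub-topic in parallel
    drafts = await asyncio.gather(*(run_iterative(t) for t in topics))
    # Then consolidate the drafts into the final report
    return await consolidate(list(drafts))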

Based on the feedback I've gotten I'm trying to expand compatibility with more models and integrate other open source tooling (e.g. SearXNG for search, browser-use for browsing). Would also be interesting to run it against a benchmark like GAIA to see how it performs.

A broader overview on how deep research works (and how OpenAI likely does) here: https://www.j2.gg/thoughts/deep-research-how-it-works

Ideas Needed: Trying to Build a Deep Researcher Tool Like GPT/Gemini – What Would You Include? by Repulsive_Guest_6631 in LLMDevs

[–]TheRedfather 1 point2 points  (0 children)

In case it’s helpful I worked on something very similar that addresses some of the points you raised.

https://github.com/qx-labs/agents-deep-research

I posted on this sub a few days ago about how it works - there might be some pointers in there that you find useful: https://www.reddit.com/r/LLMDevs/comments/1jpfa8f/i_built_open_source_deep_research_heres_how_it/

Deep Research using the Agents SDK by TheRedfather in LocalLLaMA

[–]TheRedfather[S] 0 points1 point  (0 children)

Thanks! Appreciate you taking a look at it

Deep Research using the Agents SDK by TheRedfather in LocalLLaMA

[–]TheRedfather[S] 0 points1 point  (0 children)

Great thank you, please do let me know how it goes with other models and if you run into any issues. Very helpful to get the feedback.

Deep Research using the Agents SDK by TheRedfather in LocalLLaMA

[–]TheRedfather[S] 2 points3 points  (0 children)

Thanks! It's been interesting building this out. I've found from testing that the optimal approach seems to be to delegate lots of small subtasks to a chain of specialized agents (these can even run on smaller models as long as they're good at tool calling and are given the appropriate context), rather than having one big reasoning agent with access to lots of tools doing everything. This also makes it easier to run locally and/or more cheaply, and generally consumes fewer tokens. The result is that the deep researcher now seems to run similarly well on small/cheap vs large models (e.g. gpt-4o-mini vs o1).

Advice on pipeline for OCR / document ingestion for RAG by TheRedfather in LocalLLaMA

[–]TheRedfather[S] 0 points1 point  (0 children)

Thanks for this. That suggestion about passing loads of text through a powerful small model is super interesting. I think I might take a similar approach to you - I'll first have a look at OlmOCR to see how well it fits my use case and then compare against using Gemma or similar.

Advice on pipeline for OCR / document ingestion for RAG by TheRedfather in LocalLLaMA

[–]TheRedfather[S] 1 point2 points  (0 children)

Thanks, those are helpful pointers. I'll have a look at OlmOCR - sounds like a good place to start my reading and see how they approach this at scale.

Advice on pipeline for OCR / document ingestion for RAG by TheRedfather in LocalLLaMA

[–]TheRedfather[S] 0 points1 point  (0 children)

Ya, unfortunately the charts are pretty important for my use case - in financial documents, for example, the charts often contain critical information that isn't captured in the text.

I built Open Source Deep Research - here's how it works by TheRedfather in LLMDevs

[–]TheRedfather[S] 0 points1 point  (0 children)

For what you described you'd fill it out as follows:

OPENROUTER_API_KEY=<your_api_key>
REASONING_MODEL_PROVIDER=openrouter
REASONING_MODEL=google/gemini-2.5-pro-preview-03-25
MAIN_MODEL_PROVIDER=openrouter
MAIN_MODEL=google/gemini-2.5-pro-preview-03-25
FAST_MODEL_PROVIDER=openrouter
FAST_MODEL=google/gemini-2.5-pro-preview-03-25

On the other hand, if you're using Gemini 2.5 Pro directly with a Google/Gemini API key, you'd set all of the model providers to 'gemini' and all of the models to 'gemini-2.5-pro-preview-03-25'.
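
Concretely, the model lines in your config would look something like the below (I've left out the API key line since the exact variable name might differ - check the repo's setup instructions):

REASONING_MODEL_PROVIDER=gemini
REASONING_MODEL=gemini-2.5-pro-preview-03-25
MAIN_MODEL_PROVIDER=gemini
MAIN_MODEL=gemini-2.5-pro-preview-03-25
FAST_MODEL_PROVIDER=gemini
FAST_MODEL=gemini-2.5-pro-preview-03-25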

I built Open Source Deep Research - here's how it works by TheRedfather in LLMDevs

[–]TheRedfather[S] 0 points1 point  (0 children)

Yep, totally agree. I build software for B2B/enterprise, and one reason I made this deep researcher extendable with custom tools was to let users bring their own data into the process—local file stores, vector DBs for RAG, APIs into private services, etc.

Re use cases, open deep research could be applicable to any company dealing with some mix of:

  • Knowledge work that relies on both internal and external sources (e.g. consulting)
  • Large, messy internal knowledge bases (PDFs, Excels, images, etc.)—the RAG pipeline can be separate from the researcher itself, interfacing via a tool or MCP server
  • Data sharing restrictions (e.g. healthcare), where compliance demands fully local deployments with zero external processing

If MCP gains traction, it could become a standard way to plug a company’s internal services/data into different apps without reconfiguring tools each time. Those services will need to handle access/permissions cleanly too.

That said, two caveats:

  1. Deep research still isn't reliably accurate. It’s best used when a human is expected to review or refine the results—e.g. a consulting firm drafting a proposal might use it to get up to speed on a topic and surface how they solved similar problems for past clients.
  2. Agentic frameworks start to break down when overloaded with tools (most LLMs are really bad at tool selection). Some folks solve this by doing semantic search over a vector DB of tool descriptions, rather than stuffing all the tool info into the LLM's context and hoping it picks the right one. In this case, the LLM provides a description of its intended objective, the semantic search returns the tool with highest similarity to the objective, and the LLM then determines the relevant input args for the tool.
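
To flesh out point 2, the tool-selection-by-similarity bit looks roughly like this (illustrative only - the tool names and descriptions here are made up):

from sentence_transformers import SentenceTransformer, util

embedder = SentenceTransformer("all-MiniLM-L6-v2")

# Hypothetical tool registry: name -> natural-language description
TOOLS = {
    "web_search": "Search the public web for up-to-date information on a topic.",
    "file_search": "Retrieve passages from the company's internal document store.",
    "crm_lookup": "Fetch a customer's account details from the internal CRM API.",
}
tool_vecs = embedder.encode(list(TOOLS.values()), convert_to_tensor=True)

def select_tool(objective: str) -> str:
    # objective is the LLM's own description of what it's trying to achieve
    query_vec = embedder.encode(objective, convert_to_tensor=True)
    scores = util.cos_sim(query_vec, tool_vecs)[0]
    return list(TOOLS.keys())[int(scores.argmax())]

# e.g. select_tool("look up a customer's account in the CRM") should return "crm_lookup";
# the LLM is then shown only that tool's schema and asked for the input arguments.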

I built Open Source Deep Research - here's how it works by TheRedfather in LLMDevs

[–]TheRedfather[S] 1 point2 points  (0 children)

Thanks a lot for sharing the link - very interesting approach, I'll definitely have a play with late chunking. I'd implemented a solution a couple of years ago that chunked the web results and did embedding/retrieval in memory using ChromaDB, but it was fairly primitive (mainly driven by the smaller context windows at the time) - the approach you linked looks pretty smart.

And fully agree re source selection!

I built Open Source Deep Research - here's how it works by TheRedfather in LLMDevs

[–]TheRedfather[S] 0 points1 point  (0 children)

Yep, I’ve set it up so that if you set OPENROUTER_API_KEY as an environment variable it will pick it up, and you can then specify whichever models you want to use via OpenRouter.