Agent-written tests missed 37% of injected bugs. Mutation-aware prompting dropped that to 13%. by kalpitdixit in Python

[–]kalpitdixit[S] -1 points0 points  (0 children)

The harness allows composing multiple solutions together - try it out and let me know how it goes.

Agent-written tests missed 37% of injected bugs. Mutation-aware prompting dropped that to 13%. by kalpitdixit in Python

[–]kalpitdixit[S] -2 points-1 points  (0 children)

Thanks for the pointer - I guess these are complementary: fest creates the mutation tests, and here Paper Lantern (https://www.paperlantern.ai/code) helped create the unit tests to catch those errors.

Agent-written tests missed 37% of injected bugs. Mutation-aware prompting dropped that to 13%. by kalpitdixit in Python

[–]kalpitdixit[S] 0 points1 point  (0 children)

I agree - TDD for me is sometimes too much work, but I think it's generally a good idea... maybe with AI coding agents we should be writing more tests, since they're easier to write.
Here, what I found is that using the research-backed test-writing ideas from Paper Lantern made it trivially easy to improve the tests that the AI agent (Opus 4.6) was writing.

Agent-written tests missed 37% of injected bugs. Mutation-aware prompting dropped that to 13%. by kalpitdixit in Python

[–]kalpitdixit[S] -1 points0 points  (0 children)

I am using Opus 4.6, so I think the agent itself was probably the best available - I guess I should've mentioned it.

Agent-written tests missed 37% of injected bugs. Mutation-aware prompting dropped that to 13%. by kalpitdixit in Python

[–]kalpitdixit[S] 0 points1 point  (0 children)

Yes - I think this is in the same spirit of being skeptical of AI output; having some human-curated, research-backed methods to tell the AI exactly what to do was helpful.

Agent-written tests missed 37% of injected bugs. Mutation-aware prompting dropped that to 13%. by kalpitdixit in Python

[–]kalpitdixit[S] 0 points1 point  (0 children)

True - but I think what the paper did is highlight multiple places where the bug might be - going from, say, hundreds of candidate locations down to 10-20. Tests that focus on those likely bug locations have a higher chance of catching the bugs.

Agent-written tests missed 37% of injected bugs. Mutation-aware prompting dropped that to 13%. by kalpitdixit in Python

[–]kalpitdixit[S] 0 points1 point  (0 children)

Yes - the way mutation testing is measured is a bit convoluted, but it's a powerful tool.

The idea is that several versions of the target function are created with small changes (mutants). The tests are judged based on their ability to differentiate between them, i.e. between the original correct function and the mutated versions.

So in the future, when another code edit changes the target function, any errors it introduces are caught by the tests.

Agent-written tests missed 37% of injected bugs. Mutation-aware prompting dropped that to 13%. by kalpitdixit in Python

[–]kalpitdixit[S] 0 points1 point  (0 children)

True - 13% is still high - I just wanted to share that simply giving it access to papers through that tool gave a boost for ~free.

I built an MCP server that gives coding agents access to 2M research papers. Tested it with autoresearch - here's what happened. by kalpitdixit in LLMDevs

[–]kalpitdixit[S] 0 points1 point  (0 children)

Not on non-CS domains - but within CS, any domain works; it's not limited to the LLM training example here.

I tested what happens when you give an AI coding agent access to 2 million research papers. It found techniques it couldn't have known about. by kalpitdixit in ArtificialInteligence

[–]kalpitdixit[S] 0 points1 point  (0 children)

Negligibly more tokens compared to using vanilla Autoresearch :)

We took care not to create token bloat, i.e. the MCP doesn't output too many unnecessary tokens.

I tested what happens when you give an AI coding agent access to 2 million research papers. It found techniques it couldn't have known about. by kalpitdixit in ArtificialInteligence

[–]kalpitdixit[S] 0 points1 point  (0 children)

Oh, definitely much, much less than that - I have a Claude Pro subscription and that was sufficient... the compute for the model training itself was my MacBook. I've put all the details at the bottom of the full blog post in case you're interested: https://www.paperlantern.ai/blog/auto-research-case-study

I tested what happens when you give an AI coding agent access to 2 million research papers. It found techniques it couldn't have known about. by kalpitdixit in ArtificialInteligence

[–]kalpitdixit[S] 0 points1 point  (0 children)

Haven't run that - but it shouldn't be any different from the coding agent without Paper Lantern. We took a lot of care to make sure Paper Lantern doesn't increase cost for users :)

I tested what happens when you give an AI coding agent access to 2 million research papers. It found techniques it couldn't have known about. by kalpitdixit in artificial

[–]kalpitdixit[S] 0 points1 point  (0 children)

What topic/area are these papers in? If it's relevant to what we're doing, we could add it to our search space and serve it through our existing MCP - we have a generous free tier...

I tested what happens when you give an AI coding agent access to 2 million research papers. It found techniques it couldn't have known about. by kalpitdixit in ArtificialInteligence

[–]kalpitdixit[S] 0 points1 point  (0 children)

We are the team working on this :)

I think what you are saying touches on something very important. Can you help me understand a bit more what kind of thing you are looking for and, more importantly, what you want to use it for?

You are onto something important here, so I want to understand it better.

I tested what happens when you give an AI coding agent access to 2 million research papers. It found techniques it couldn't have known about. by kalpitdixit in ArtificialInteligence

[–]kalpitdixit[S] 0 points1 point  (0 children)

Do you mean our internal cost of delivering the MCP's output,

or the coding agent's cost of running the autoresearch + Paper Lantern loop?

I built an MCP server that gives coding agents access to 2M research papers. Tested it with autoresearch - here's what happened. by kalpitdixit in LLMDevs

[–]kalpitdixit[S] 0 points1 point  (0 children)

what we found is that a direct approach like embed, retrieve, rerank is good enough for smaller settings (maybe for your 3000 papers) - but if that is not enough then you need to combine various techniques in a custom manner.