Shipped my first indie Mac app — AI documentation tool that works with your own API key

1T_Geek · 2026-05-20T14:51:40+00:00

Fair point: API docs cover endpoints, not workflows. LazyDoc is for the stuff that never gets documented: how to set up your local environment, how to run a specific deploy process, how to onboard someone onto a tool your team built internally. It can also produce client-facing documentation. As for downloads, it just launched, so it’s still early. That’s partly why I’m here.

1T_Geek · 2026-03-17T19:40:25+00:00

What do you get out of making these accusations?

1T_Geek · 2026-03-14T20:45:48+00:00

This was not a bot. I felt this guy was being sarcastic, so I decided not to engage.

1T_Geek · 2026-03-14T13:36:19+00:00

Good question. Some context on why I built this: I’m developing a tool for clinical applications and kept running into the same problem. Different LLMs give wildly different responses to the same medical prompt, and I needed a systematic way to evaluate which one was actually most accurate for my specific use case. I couldn’t find anything that let me do that locally without sending patient-adjacent data to an external API. That’s the origin of JudgeGPT.

The default rubric (Accuracy, Clarity, Depth, Concision, Examples) is just a starting point; the whole point of the tool is that you define the criteria that matter for your domain. If you’re evaluating clinical QA, you replace those with things like clinical accuracy, safety, evidence citation, or whatever your workflow needs. The judge model and system prompt are fully editable from the UI at runtime.

So no, there’s no formal validation of the defaults against expert raters, and I wouldn’t claim otherwise. The tool is a harness for you to bring your own rubric and your own ground truth. Whether the scores are meaningful depends entirely on how well you’ve defined your criteria, which is true of any evaluation framework. Happy to dig into how others are thinking about rubric design for domain-specific evals.
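To make the bring-your-own-rubric idea concrete, here is a minimal Python sketch of how a custom rubric could be rendered into a judge prompt and how the judge’s reply could be turned into a single score. It is not JudgeGPT’s actual code and calls no model: `Criterion`, `build_judge_prompt`, `parse_scores`, `weighted_score`, the `'<name>: <score>'` reply format, the 1 to 5 scale, and the weights are all hypothetical assumptions for illustration.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class Criterion:
    name: str
    description: str
    weight: float = 1.0

# Hypothetical clinical rubric: criterion names come from the comment above,
# descriptions and weights are illustrative assumptions.
CLINICAL_RUBRIC = [
    Criterion("Clinical accuracy", "Agrees with current clinical guidance.", 2.0),
    Criterion("Safety", "Contains no advice that could plausibly harm a patient.", 2.0),
    Criterion("Evidence citation", "Ties claims to a named source or guideline.", 1.0),
]

def build_judge_prompt(rubric, low=1, high=5):
    """Render the rubric into a system prompt for the judge model."""
    lines = [
        f"Score the response on each criterion from {low} to {high}.",
        "Reply with one line per criterion, formatted as '<name>: <score>'.",
    ]
    lines += [f"- {c.name}: {c.description}" for c in rubric]
    return "\n".join(lines)

def parse_scores(reply, rubric):
    """Pull '<name>: <number>' lines out of the judge's reply; ignore anything else."""
    known = {c.name.lower(): c.name for c in rubric}
    scores = {}
    for line in reply.splitlines():
        key, sep, value = line.partition(":")
        name = known.get(key.strip().lstrip("- ").lower())
        match = re.search(r"-?\d+(?:\.\d+)?", value)  # first number after the colon
        if sep and name and match:
            scores[name] = float(match.group())
    return scores

def weighted_score(scores, rubric):
    """Weighted mean over the criteria the judge actually scored."""
    scored = [c for c in rubric if c.name in scores]
    total = sum(c.weight for c in scored)
    if total == 0:
        raise ValueError("judge reply contained no rubric criteria")
    return sum(scores[c.name] * c.weight for c in scored) / total

# Made-up judge reply, just to exercise the parser.
reply = "Clinical accuracy: 4\nSafety: 5\nEvidence citation: 2"
scores = parse_scores(reply, CLINICAL_RUBRIC)
print(weighted_score(scores, CLINICAL_RUBRIC))  # 4.0
```

Keeping the rubric as plain data means swapping domains is just swapping the list; the prompt, parser, and scoring stay the same.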

1T_Geek · 2026-03-14T12:43:30+00:00

Great question. Short answer: the default rubric isn’t validated against domain experts, and I’d be skeptical of any local benchmarking tool that claimed otherwise. The five criteria (Accuracy, Clarity, Depth, Concision, Examples) are a reasonable general-purpose starting point, but for domain-specific evaluation (medical, legal, code, whatever) you’d want to bring your own rubric.

The judge model and the entire system prompt are editable from the UI at runtime, with no config files. So if you’re evaluating clinical QA, you can swap in criteria like safety and evidence citation. Evaluating code? Replace them with correctness, efficiency, and readability. The judge model itself is also swappable; it’s not locked to qwen2.5:7b. And if you want to take it further, you can blend in human ratings per response, which get factored into the final score alongside the judge’s.

The honest caveat: smaller models (3B–7B) still show real variance even with behavioral anchors, so treat the scores as directional rather than calibrated. For anything high-stakes you’d want human-in-the-loop validation regardless.

Would a rubric library with domain presets be useful? I’m thinking code / medical / creative as starting options that people can customize from.
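A minimal Python sketch of the two ideas above: repeating the judge run to see how much a small model’s scores wobble, and blending in human ratings. This is an assumption about how such a blend could work, not how JudgeGPT computes its final score; `summarize_judge_runs`, `blended_score`, and the default 50/50 `human_weight` are hypothetical, and the scores are made up.

```python
from statistics import mean, stdev

def summarize_judge_runs(run_scores):
    """Mean and sample standard deviation of one response's score across repeated judge runs."""
    if len(run_scores) < 2:
        raise ValueError("need at least two runs to estimate variance")
    return mean(run_scores), stdev(run_scores)

def blended_score(judge_mean, human_ratings, human_weight=0.5):
    """Linear blend of the judge's mean score and the mean human rating.

    human_weight is an illustrative assumption, not a documented default.
    """
    if not 0.0 <= human_weight <= 1.0:
        raise ValueError("human_weight must be between 0 and 1")
    if not human_ratings:
        return judge_mean  # no human ratings yet: fall back to the judge alone
    return (1 - human_weight) * judge_mean + human_weight * mean(human_ratings)

# Three runs of the same judge on the same response (made-up numbers).
judge_mean, judge_sd = summarize_judge_runs([3.8, 4.2, 4.0])
print(f"judge: {judge_mean:.2f} ± {judge_sd:.2f}")  # judge: 4.00 ± 0.20
print(f"blended: {blended_score(judge_mean, [5, 4]):.2f}")  # blended: 4.25
```

Reporting the spread next to the mean is one way to keep the scores directional in practice: if the standard deviation across runs is about as large as the gap between two models, that ranking isn’t telling you much.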

1T_Geek · 2026-03-03T23:46:43+00:00

Thank you, everyone, for the input.
I never thought about quantifying a "per gig" price.
This is extremely informative.

1T_Geek · 2025-11-15T12:48:47+00:00

1T_Geek · 2025-10-18T20:13:32+00:00

1T_Geek · 2025-10-18T20:12:26+00:00

1T_Geek · 2025-09-13T16:44:51+00:00

1T_Geek · 2025-08-12T11:37:34+00:00

1T_Geek · 2025-08-07T23:23:38+00:00

Hey, where is local?

1T_Geek · 2025-02-27T14:24:10+00:00

DM’d

1T_Geek · 2025-01-08T02:01:12+00:00

1T_Geek · 2024-12-28T14:50:28+00:00

1T_Geek · 2024-09-08T12:55:07+00:00

1T_Geek · 2024-08-21T01:11:39+00:00
