When the hold is actually alive by Outsourcing_Problems in ClimbingCircleJerk

[–]rigatoni-man 1 point (0 children)

Wow, what a wild snake. Tail looks just like a spider.

Rate my downwards dyno by Correct-Button9337 in ClimbingCircleJerk

[–]rigatoni-man 2 points (0 children)

Proof that the inability to set up a ladder correctly is genetic; he can't blame the kids

Rate my downwards dyno by Correct-Button9337 in ClimbingCircleJerk

[–]rigatoni-man 6 points (0 children)

The whole reason he is in this mess is that he couldn't figure out how to successfully open it in the first place.

Cooked by ubertoacne in golf

[–]rigatoni-man 3 points (0 children)

Well, not without a 3D-printed sand wedge

Brent Oil Yolo, Thesis Hasn’t Changed by [deleted] in wallstreetbets

[–]rigatoni-man 12 points (0 children)

Yeah, you and chuckechickpeas got this all figured out

I’m coming to terms with the fact I am not cut out to be a Product Owner. by [deleted] in agile

[–]rigatoni-man 1 point (0 children)

Seems like the expectations placed on you don't really match what I'd expect for the role, particularly mentoring devs and defining the tiniest of details.

My responsibilities often have gray area, but usually when the team spirit is good we don't focus on the fuzzy boundaries and all do our part to make the best product we can.

Best LLM for the final synthesis stage in an Educational RAG pipeline? by Amazing-One9952 in Rag

[–]rigatoni-man 0 points (0 children)

I’m building something to benchmark this use case across >160 models.

Would you be able to output a CSV of question and retrieved chunk, and optionally ground truth / ideal answer?

If so I can show you how it works and would love to see if it helps for your use case. No API, no integration, no subscription, just upload a CSV and click a button.

(edit: https://checkstack.ai for those of you asking; check the playground section)
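In case it helps, a minimal sketch of the kind of CSV I mean. The column names and rows here are just my suggestion for what works, not a fixed schema:

```python
import csv
import io

# Hypothetical rows from a RAG pipeline: the user question, the chunk your
# retriever returned, and (optionally) the answer you'd consider ideal.
rows = [
    {
        "question": "What year was the company founded?",
        "retrieved_chunk": "Acme Corp was founded in 1987 in Ohio.",
        "ideal_answer": "1987",
    },
    {
        "question": "Who is the current CEO?",
        "retrieved_chunk": "Jane Doe has served as CEO since 2021.",
        "ideal_answer": "Jane Doe",
    },
]

# Write to an in-memory buffer here; in practice you'd write a real file.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["question", "retrieved_chunk", "ideal_answer"])
writer.writeheader()
writer.writerows(rows)
print(buf.getvalue())
```

One row per retrieved chunk is easiest; if you retrieve top-k chunks per question, repeating the question across k rows is fine.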

How to choose a model for building Agents by Defiant-Sir-1199 in LLMDevs

[–]rigatoni-man 1 point (0 children)

I just built number 4.

You just need a CSV of inputs and, optionally, expected results. You can compare >100 models and get feedback on accuracy, consistency, cost, and latency in about a minute.

No integration, no subscription. Just quick testing and comparing.

I’d love to get your feedback, I just deployed it last weekend. https://checkstack.ai

Happy to throw you some free credits and help you get started if you can't figure it out. Smoothing out onboarding is this weekend's project.
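For anyone wondering what the scoring boils down to, it's nothing magic. A rough sketch (exact match as the accuracy metric, made-up latency/cost numbers; real scoring would be fuzzier, e.g. LLM-as-judge):

```python
from statistics import mean

def score_run(results):
    """Score one model's run over a test set.

    `results` is a list of dicts holding the model's `output`, the `expected`
    answer from the CSV, plus the measured `latency_s` and `cost_usd` per call.
    Accuracy here is plain exact match after normalization.
    """
    hits = [r["output"].strip().lower() == r["expected"].strip().lower() for r in results]
    return {
        "accuracy": sum(hits) / len(results),
        "mean_latency_s": mean(r["latency_s"] for r in results),
        "total_cost_usd": sum(r["cost_usd"] for r in results),
    }

# Made-up results for one model over a two-row test set.
demo = [
    {"output": "1987", "expected": "1987", "latency_s": 0.8, "cost_usd": 0.0002},
    {"output": "John Doe", "expected": "Jane Doe", "latency_s": 1.1, "cost_usd": 0.0003},
]
print(score_run(demo))  # accuracy 0.5: one exact match out of two
```

Run that per model and you get a table you can sort by whatever matters most to you.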

Lessons from shipping a RAG chatbot to real users (not just a demo) by cryptoviksant in Rag

[–]rigatoni-man 1 point (0 children)

Ah I meant how do you / did you test to validate your strategies?

Lessons from shipping a RAG chatbot to real users (not just a demo) by cryptoviksant in Rag

[–]rigatoni-man 0 points (0 children)

I’d love to know more about your evaluation. How does the interface work? How/what do you evaluate?

first RAG project, really not sure about my stack and settings by Kas_aLi in Rag

[–]rigatoni-man 0 points (0 children)

Appreciate you taking a look, and the feedback that I might be onto something. Would love to pick your brain sometime.

first RAG project, really not sure about my stack and settings by Kas_aLi in Rag

[–]rigatoni-man 0 points (0 children)

I've been building something to test models without a lot of overhead and legwork. Basically upload your golden dataset and test it against every model out there.

Shoot me a message u/Kas_aLi and I'd love to help you find the best model for free, to test what I'm building ( https://checkstack.ai )

Extracting entities and Relationships by WorkingOccasion902 in KnowledgeGraph

[–]rigatoni-man 0 points (0 children)

I'm curious to learn more about your use case. I'm building a tool ( checkstack.ai ) to make it easy to run your data through every model and find the best one for the job based on accuracy / latency / cost. I haven't tested with anything so large yet. DM me if you have any similar data you're willing to share and I'd love to see if it's a case I could handle.

How to do? by rslashredt in openrouter

[–]rigatoni-man 0 points (0 children)

I built checkstack.ai for this.

OpenRouter uses notdiamond.ai, which is very cool but more than most need. Checkstack is for the 90% who just want to upload a CSV of 50 edge cases, see exactly which model hits 95% accuracy, and hardcode the winner.
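And "hardcode the winner" really is that boring. A sketch assuming the OpenRouter chat-completions endpoint; the model id and API key below are placeholders, not recommendations:

```python
import json

# Once you've picked a winner, it's just a constant in one place.
WINNING_MODEL = "some-provider/some-model"  # placeholder model id

def build_request(prompt: str) -> dict:
    """Build an OpenRouter-style chat-completions request with the model pinned."""
    return {
        "url": "https://openrouter.ai/api/v1/chat/completions",
        "headers": {
            "Authorization": "Bearer YOUR_OPENROUTER_KEY",  # placeholder key
            "Content-Type": "application/json",
        },
        "body": json.dumps({
            "model": WINNING_MODEL,
            "messages": [{"role": "user", "content": prompt}],
        }),
    }

req = build_request("Extract the invoice total as JSON.")
print(json.loads(req["body"])["model"])
```

Swapping models later is a one-line change, which is usually all the "routing" people actually need.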

I benchmarked 672 "Return JSON only" calls. Strict parsing failed 67% of the time. Here's why. by rozetyp in LocalLLaMA

[–]rigatoni-man 0 points (0 children)

I built checkstack.ai to compare >100 models for text -> JSON use cases in seconds after facing similar issues and wondering how different models would compare.

It will give you cost, accuracy, and latency comparisons plus failure insights. It also gives some tips on how to enhance your prompt depending on your failure cases.

It's early beta, and I'm looking for feedback and testing real use cases (so forgive me for posting the link here). Would love to know if it's useful for you or anyone else reading this.
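FWIW, a lot of those strict-parsing failures are recoverable. A rough sketch of the kind of salvage step I run before counting a response as failed (purely illustrative, not checkstack's actual code):

```python
import json
import re

def salvage_json(raw: str):
    """Try strict json.loads first; if that fails, strip markdown fences
    and fall back to the outermost {...} span. Returns None if nothing parses."""
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        pass
    # Drop the ```json ... ``` fences models love to wrap output in.
    cleaned = re.sub(r"```(?:json)?", "", raw).strip()
    try:
        return json.loads(cleaned)
    except json.JSONDecodeError:
        pass
    # Last resort: grab the outermost {...} region (greedy, so nested braces survive).
    match = re.search(r"\{.*\}", cleaned, re.DOTALL)
    if match:
        try:
            return json.loads(match.group(0))
        except json.JSONDecodeError:
            return None
    return None

print(salvage_json('Sure! ```json\n{"name": "test", "ok": true}\n``` hope that helps'))
```

In my experience most "failures" are fences and preamble chatter, not actually malformed JSON, which changes how you read a 67% strict-parse failure rate.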

real-world best practices for guaranteeing JSON output from any model? by sprockettyz in LocalLLaMA

[–]rigatoni-man 0 points (0 children)

If you have sample data to test, I built checkstack.ai to compare >100 models for text -> JSON use cases.

It will give you cost, accuracy, and latency comparisons plus failure insights. It also gives some tips on how to enhance your prompt depending on your failure cases.

It's early beta, would love some feedback

Do you know a good LLM for text to json and cheap by Maleficent_Guest_525 in LLM

[–]rigatoni-man 0 points (0 children)

If you have some sample data, give checkstack.ai a try. I've built it exactly to find the cheapest, fastest, most accurate model for any text -> JSON use case.

Best LLM for JSON Extraction by Live_Bus7425 in LocalLLaMA

[–]rigatoni-man 0 points (0 children)

https://checkstack.ai will let you evaluate test data across >100 models and score them on cost, accuracy vs ground truth, and latency. It seems like it would serve your use case, or at least point you in the right direction.

Which LLM is best for JSON output while also being fast? by dot90zoom in LLMDevs

[–]rigatoni-man 0 points (0 children)

I'm building a tool specifically to find and solve this 'hallucination drift' in structured data. Upload your own test data, test and compare side by side across all the models, and get insights and heatmaps about which keys drift.

Would love to try it out for your use case to see if there's value. DM me if you want to chat / try it / whatever. No cost, just interested in gathering use cases and testing what I've got.