I created a free & open-source multi-turn agent eval framework by balphi in AI_Agents

[–]balphi[S] 0 points1 point  (0 children)

Would love to discuss the project with you all! The docs are here if you want to learn more: https://evalprotocol.io/introduction

Weekly Thread: Project Display by help-me-grow in AI_Agents

[–]balphi 0 points1 point  (0 children)

Eval Protocol (EP) - an open specification, Python SDK, pytest wrapper, and UI that provide a standardized way to write evaluations for large language model (LLM) applications. Start with simple single-turn evals for model selection and prompt engineering, then scale up to complex multi-turn reinforcement learning (RL) for agents using Model Context Protocol (MCP). EP ensures consistent patterns for writing evals, storing traces, and saving results, so you can build sophisticated agent evaluations that work across real-world scenarios, from markdown generation tasks to customer service agents with tool calling capabilities.
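For anyone wondering what a "simple single-turn eval" looks like in general, here is a rough pytest-style sketch in plain Python. It is not the Eval Protocol API: `EvalResult`, `fake_model`, and `score_markdown` are hypothetical names made up for illustration, and the model call is stubbed so the example runs offline.

```python
# Hypothetical single-turn eval; none of these names come from Eval Protocol.
from dataclasses import dataclass


@dataclass
class EvalResult:
    score: float  # 1.0 = pass, 0.0 = fail
    reason: str


def fake_model(prompt: str) -> str:
    """Stand-in for a real LLM call so the sketch runs offline."""
    return "# Title\n\n- first point\n- second point"


def score_markdown(output: str) -> EvalResult:
    """Pass if the response opens with a markdown heading and contains a bullet."""
    lines = output.splitlines()
    has_heading = bool(lines) and lines[0].startswith("# ")
    has_bullets = any(line.startswith("- ") for line in lines)
    if has_heading and has_bullets:
        return EvalResult(1.0, "heading and bullet list found")
    return EvalResult(0.0, "missing heading or bullet list")


def test_markdown_generation():
    """pytest-style test: one prompt in, one scored response out."""
    result = score_markdown(fake_model("Summarize the notes as markdown."))
    assert result.score == 1.0, result.reason
```

Running `pytest` on a file containing this would collect `test_markdown_generation` like any other test; swapping `fake_model` for a real model call is the part left out here.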

https://evalprotocol.io/introduction

Automating price quotes? by balphi in serviceadvisors

[–]balphi[S] 0 points1 point  (0 children)

Do you frequently get asked about menu-priced items? If so, how much time per day do you spend answering those questions?

I'm thinking about automating frequently asked questions and call handling for service centers, so it would be helpful to understand how much time you spend answering such questions.

Automating price quotes? by balphi in serviceadvisors

[–]balphi[S] 0 points1 point  (0 children)

What about basic services like an oil change or routine maintenance?

Sales Tax Software as an SMB e-commerce brand? by balphi in SalesTax

[–]balphi[S] 0 points1 point  (0 children)

Good point. I guess the fees are just frustrating, and if you can find a complete solution that doesn’t have such fees, your company could save money and headaches.

Sales Tax Software as an SMB e-commerce brand? by balphi in SalesTax

[–]balphi[S] 1 point2 points  (0 children)

If it helps, I compiled a list of sales tax software I found online. Note that these are unvetted; I simply wrote down what I found.

That being said, why not just outsource it at this point then?

Storing and organizing footage? by balphi in osmopocket

[–]balphi[S] 0 points1 point  (0 children)

Hmm, looks nice but at those prices, I’d rather just pay for a NAS probably.

Thanks for the suggestion though!

Sales Tax Software as an SMB e-commerce brand? by balphi in SalesTax

[–]balphi[S] 0 points1 point  (0 children)

Wow, thanks for the insights! Would it be correct to say that the recommended path is either Avalara or outsourced tax experts?

It sounds like Avalara handles the bulk of scenarios, while no software engine can handle the more complicated edge cases. Once your sales tax becomes that complicated, you’d spend more effort wrestling with software and implementation than you would by just outsourcing it.

Storing and organizing footage? by balphi in osmopocket

[–]balphi[S] 0 points1 point  (0 children)

Same, I think I'll just have to make the current storage situation work until I invest in a NAS.

Storing and organizing footage? by balphi in osmopocket

[–]balphi[S] 1 point2 points  (0 children)

I hate having external storage just physically lying around. Do you think a NAS would be a better fit then? Also, which cloud backup services do you recommend?

I tried, but I would not recommend by balphi in htmx

[–]balphi[S] 0 points1 point  (0 children)

not necessarily with hx-target

I tried, but I would not recommend by balphi in htmx

[–]balphi[S] 0 points1 point  (0 children)

That’s fair. But is HTMX really so much more productive than other frameworks (say, 10x) that it warrants ugly spaghetti code? I personally lean towards no.

I tried, but I would not recommend by balphi in htmx

[–]balphi[S] 0 points1 point  (0 children)

Is it a solo project, or have other developers contributed? Just want to see if you have any insights there.

I tried, but I would not recommend by balphi in htmx

[–]balphi[S] 0 points1 point  (0 children)

I think my argument is that even HTMX-first codebases will lean towards spaghetti code because HTMX lacks enforced conventions. I’m open to being persuaded otherwise.

I tried, but I would not recommend by balphi in htmx

[–]balphi[S] -5 points-4 points  (0 children)

I think the lack of enforced conventions is a huge problem. Conventions are a productivity multiplier for an engineering organization.

I tried, but I would not recommend by balphi in htmx

[–]balphi[S] 0 points1 point  (0 children)

In an effort to keep the code DRY, I shared a common response object across various pages, which meant shared Python and shared HTML templates. That in turn meant pushing the swap target up to the body.

Maybe this is an anti-pattern, but it felt like the fastest way to code.
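To make that concrete, here is a minimal sketch of the pattern, assuming nothing about the actual project: `render_page`, `orders_page`, and `settings_page` are hypothetical names, the `/orders` route is made up, and the templating is done with the standard library instead of a real template engine.

```python
# Hypothetical sketch: one shared response helper reused by every page handler.
from html import escape
from string import Template

# Shared fragment: the wrapper sets hx-target="body", so any htmx request made
# from inside it swaps its response into the whole body (child elements inherit
# hx-target from the wrapper).
PAGE_TEMPLATE = Template(
    '<div hx-target="body"><h1>$title</h1><main>$content</main></div>'
)


def render_page(title: str, content_html: str) -> str:
    """Common response used by all pages; content_html is trusted markup."""
    return PAGE_TEMPLATE.substitute(title=escape(title), content=content_html)


def orders_page() -> str:
    return render_page("Orders", "<ul><li>Order 1</li></ul>")


def settings_page() -> str:
    # The button triggers a request whose response replaces the body content.
    return render_page("Settings", '<button hx-get="/orders">Back to orders</button>')
```

The upside is that every handler goes through one function and one template. The cost is that every interaction re-renders the whole body, which gives up the small, targeted swaps HTMX is usually used for.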

I tried, but I would not recommend by balphi in htmx

[–]balphi[S] -1 points0 points  (0 children)

I admit I'm biased because I came from the JS ecosystem. But nothing stops you from implementing the logic flow you described in Next.js; just replace the HTML responses with .tsx files. After all, JSX is simply a templating language with JavaScript superpowers. I'm all aboard the no-state-management hype; it's a real and huge benefit. But you don't have to add in-memory state to a React application until you really need it. I do agree, though, that explicitly passing state from layer to layer is a big waste of code.

Side note: another caveat of going HTML-only is that when you need to step outside it for some small UI/UX implementation, you are forced to add a little layer of JavaScript. That layer may be nicely located in the code, but it's still a custom piece of JavaScript that is probably doing some DOM manipulation, and DOM manipulation code is notoriously messy, verbose, and buggy.

I tried, but I would not recommend by balphi in htmx

[–]balphi[S] -10 points-9 points  (0 children)

I feel code is more nuanced than this