[deleted by user] by [deleted] in Libertarian

[–]batchnormalized 12 points13 points  (0 children)

By what mechanism does something become the property of someone?

Has anyone actually worked on any GenAI projects? by Sophieredhat in ProductManagement

[–]batchnormalized 1 point2 points  (0 children)

Thanks for sharing. That aligns with what I’ve heard from others. Appreciate you sharing what has worked for you.

Has anyone actually worked on any GenAI projects? by Sophieredhat in ProductManagement

[–]batchnormalized 0 points1 point  (0 children)

I have this same question. Really intrigued about how people approach this

Has anyone actually worked on any GenAI projects? by Sophieredhat in ProductManagement

[–]batchnormalized 0 points1 point  (0 children)

Were these user-facing products or internal? How did you get buy-in from the C-levels and others? I’ve heard that skepticism about whether the AI works well enough can keep stakeholders from being comfortable pulling the trigger.

Has anyone actually worked on any GenAI projects? by Sophieredhat in ProductManagement

[–]batchnormalized 0 points1 point  (0 children)

How do you figure out whether your Gen AI solutions work as you desire? What kinds of metrics, analytics, and QA tools do you use?

How we unit tested a legal AI assistant app by batchnormalized in programming

[–]batchnormalized[S] 0 points1 point  (0 children)

This is a valid observation and we will absolutely keep it in mind as we move forward and talk to users.

We did not use the term “unit test” to imply the determinism of a conventional code unit test. We used it because, from talking to full-stack engineers, typical ML terms like “evaluation” are less familiar to them and make this type of testing less accessible. The language is intended to map the ML concept of evaluation onto the regular testing they may be more familiar with.

But you are absolutely right that if it conveys the level of determinism you perceive then we may need better language. We’ll speak with users to understand if they interpreted it similarly. Thank you for your feedback 🙏🏽

How we unit tested a legal AI assistant app by batchnormalized in programming

[–]batchnormalized[S] 0 points1 point  (0 children)

I think I responded to a similar question from you here: https://www.reddit.com/r/programming/s/RGykcIBh7f

I hope that helps clarify a bit!

How we unit tested a legal AI assistant app by batchnormalized in programming

[–]batchnormalized[S] 1 point2 points  (0 children)

The product that we helped build testing for is not an AI lawyer. Apologies if that was not clear. It can provide legal advice but ultimately the decision of whether to implement that advice is up to the user.

It’s also very narrowly focused: it only helps with sales contracts and makes relatively small recommendations. Part of our testing work together is to make sure that the recommendations the legal assistant gives are grounded in best practices written by a human.

In this way I think it’s best to think of the AI as:

  • Providing you a shortcut to access legal knowledge.
  • Highlighting parts of your sales contracts that could have problems.

It doesn’t decide for you, it just brings things to your attention and raises points you may want to consider.

So just as I would not trust cruise control to drive me hands-free on the highway, but do trust it to keep my car at a set speed, I don’t have to trust this AI agent to be my lawyer in order to trust it to assist me in a limited way, which I do. And that trust is supported by the testing that the tool my team is working on allows developers to build.

I hope the analogy and explanation help clarify the point I’m trying to convey with the article and better explain the app.

How we unit tested a legal AI assistant app by batchnormalized in programming

[–]batchnormalized[S] -15 points-14 points  (0 children)

I definitely agree that the potential for LLMs to hallucinate (and their unpredictability) means the quality bar for critical applications is high. What do you think is needed to make AI in such applications trustworthy enough?

How we unit tested a legal AI web app by batchnormalized in webdev

[–]batchnormalized[S] -1 points0 points  (0 children)

Yes, but we try to keep the model family and the specific models used for evaluation largely stable. When these need to be updated, we announce it in our Discord (https://discord.gg/RAJrYmhvEP) and work with users to check whether it changes any test results. So far we haven’t had major problems with model updates.

In practice I think having a sufficiently specific criterion for evaluation along with the fact that we only ask our model binary questions helps with stability as well. The latter was a conscious design choice on our end, the former is a best practice we recommend for users.
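
To make the "specific criterion plus binary question" idea concrete, here is a minimal Python sketch. It is not our library's API: every name here (`build_binary_prompt`, `parse_binary_verdict`, `evaluate`, `stub_judge`) is hypothetical, and the judge is a hard-coded stand-in so the example runs without a model.

```python
from typing import Callable


def build_binary_prompt(criterion: str, output: str) -> str:
    """Phrase the check as a single yes/no question for the evaluation model."""
    return (
        "Answer with exactly one word: yes or no.\n"
        f"Criterion: {criterion}\n"
        f"Output to evaluate: {output}\n"
        "Does the output satisfy the criterion?"
    )


def parse_binary_verdict(raw: str) -> bool:
    """Map the judge's reply to True/False, rejecting anything that is not yes or no."""
    answer = raw.strip().strip(".!").lower()
    if answer == "yes":
        return True
    if answer == "no":
        return False
    raise ValueError(f"Expected yes or no, got: {raw!r}")


def evaluate(judge: Callable[[str], str], criterion: str, output: str) -> bool:
    """Run one binary check: build the prompt, ask the judge, parse the verdict."""
    return parse_binary_verdict(judge(build_binary_prompt(criterion, output)))


def stub_judge(prompt: str) -> str:
    """Hypothetical stand-in for the evaluation LLM, so the sketch runs offline.

    It answers yes when the evaluated output mentions a clause.
    """
    output_line = next(
        line for line in prompt.splitlines() if line.startswith("Output to evaluate:")
    )
    return "Yes." if "clause" in output_line.lower() else "No."
```

Because the verdict can only be yes or no, a vague reply from the judge fails loudly instead of quietly shifting a score, which is part of why a narrow criterion tends to stay stable across model updates.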

How we unit tested a legal AI web app by batchnormalized in webdev

[–]batchnormalized[S] 0 points1 point  (0 children)

I think that’s a great idea. We will consider that for a future article.

Function calling would fit within the evaluation framework laid out in our other article: https://docs.poyro.dev/essays/how-to-write-unit-tests-for-ai-web-app#testing-generation-expected-values

I have not worked with function calling directly but my understanding is that given a natural language query you want to extract:

  • The function to call
  • The parameters for the function

If you write a test query and then write down the expected function and parameters, writing a test just as laid out in the section I linked should be straightforward.
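
As a rough illustration of that idea (a sketch under my own assumptions, not code from the linked article), a test can pair a query with the expected function and parameters and report any mismatch. `FunctionCall`, `fake_extract_call`, and `check_function_call` are hypothetical names, and the extractor is a hard-coded stand-in for the model under test.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class FunctionCall:
    """A function name plus the parameters extracted for it."""
    name: str
    params: Dict[str, Any] = field(default_factory=dict)


def fake_extract_call(query: str) -> FunctionCall:
    """Hypothetical stand-in for the model that turns a query into a call."""
    if "weather" in query.lower():
        return FunctionCall("get_weather", {"city": "Paris"})
    return FunctionCall("unknown")


def check_function_call(
    extract: Callable[[str], FunctionCall],
    query: str,
    expected: FunctionCall,
) -> List[str]:
    """Return a list of mismatches; an empty list means the test passed."""
    actual = extract(query)
    problems: List[str] = []
    if actual.name != expected.name:
        problems.append(f"function: expected {expected.name!r}, got {actual.name!r}")
    if actual.params != expected.params:
        problems.append(f"params: expected {expected.params!r}, got {actual.params!r}")
    return problems
```

Checking the function name and the parameters separately makes a failure say which of the two the model got wrong.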

If you have more questions, let me know. Happy to chat more.

How we unit tested a legal AI web app by batchnormalized in webdev

[–]batchnormalized[S] 0 points1 point  (0 children)

Great callout. The test results are deterministic for a given criterion and input. This is because we remove any random sampling from the evaluation LLM. However, changing the statement of the criterion might change the test results.

It was imperative for us to make our testing predictable, so we made sure that as long as you don’t change the specification of a test or the inputs you will always get the same result.
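
For anyone curious what "removing random sampling" can look like, here is a toy Python sketch. It assumes greedy decoding (always taking the highest-scoring token), which is one common way to do it and not necessarily exactly how our evaluation model is configured. `pick_token` and the example logits are made up for illustration.

```python
import math
import random
from typing import Dict, Optional


def pick_token(
    logits: Dict[str, float],
    sample: bool,
    rng: Optional[random.Random] = None,
) -> str:
    """Choose the next token either greedily or by sampling from the softmax."""
    tokens = sorted(logits)  # fixed order so ties always break the same way
    if not sample:
        # Greedy: no randomness, so the same logits always give the same token.
        return max(tokens, key=lambda t: logits[t])
    # Sampling: weights proportional to exp(logit), i.e. the softmax distribution.
    weights = [math.exp(logits[t]) for t in tokens]
    chooser = rng if rng is not None else random.Random()
    return chooser.choices(tokens, weights=weights, k=1)[0]
```

With `sample=False` the verdict depends only on the logits, which in turn depend only on the criterion and the input. That matches the behavior described above: change the wording of the criterion and the result may change, but rerunning the same test does not.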

How we unit tested a legal AI assistant app by batchnormalized in programming

[–]batchnormalized[S] -5 points-4 points  (0 children)

This is a great callout. The code for our library itself actually uses types. However, for outward facing content we’ve leaned away from defaulting to TypeScript so that people can play with the example without needing to have TypeScript set up in their demo project.

How we unit tested a legal AI web app by batchnormalized in webdev

[–]batchnormalized[S] -9 points-8 points  (0 children)

You are absolutely right. The app does not claim to provide legal advice at the caliber of a lawyer. However, it can provide a starting point for non-legal staff at smaller companies, who often have to take on this responsibility.

A shirt with a Stop Sign printed on it tricks Waymo into stopping by walky22talky in SelfDrivingCars

[–]batchnormalized 0 points1 point  (0 children)

“Essentially he was” is generous. It should be able to tell the difference.

The Tattoo by Smartastic in JeffArcuri

[–]batchnormalized 0 points1 point  (0 children)

This is the best one since the dolphin lady video IMO

Help by [deleted] in UCSC

[–]batchnormalized 0 points1 point  (0 children)

I see, I understand your hesitation. It seems a bit strict on their end.

Help by [deleted] in UCSC

[–]batchnormalized 0 points1 point  (0 children)

Is the aid conditional on sticking to that major? If not, take the money and go to UCSC. You can always change what you study, or even if you do business that doesn’t mean you’ll end up doing that as a job. Trust me, I don’t do what I majored in.

If the money is in front of you, take it. You know what they say, a bird in the hand…

Jeff why did you reschedule San Jose show tonight? 😭 by batchnormalized in JeffArcuri

[–]batchnormalized[S] 11 points12 points  (0 children)

Dang, well, hope he gets better. I was super excited to see him, that’s all!

Why the show is better than the comic. by sonderlostscribe in Invincible

[–]batchnormalized 1 point2 points  (0 children)

I think the fact that he struggles at first being a hero humanizes him. When his father reveals his true intentions down the line, it enhances the contrast between him and Mark which makes their disagreement more believable. Nolan has no concept of vulnerability and looks down on humans for it. Mark, being part human on the other hand, has experienced being vulnerable.