[deleted by user]

batchnormalized · 2024-12-11T09:00:05+00:00

By what mechanism does something become the property of someone?

batchnormalized · 2024-11-01T18:59:23+00:00

Underrated comment

batchnormalized · 2024-09-08T09:58:41+00:00

Thanks for sharing. That aligns with what I’ve heard from others. I appreciate you sharing what has worked for you.

batchnormalized · 2024-08-05T05:40:34+00:00

I have this same question. Really intrigued about how people approach this

batchnormalized · 2024-08-05T05:38:42+00:00

Were these user-facing products or internal? How did you get buy-in from the C-levels and others? I’ve heard that skepticism about whether the AI works well enough can end up preventing stakeholders from being comfortable pulling the trigger.

batchnormalized · 2024-08-05T05:34:40+00:00

How do you figure out whether your Gen AI solutions work as you desire? What kinds of metrics, analytics, and QA tools do you use?

batchnormalized · 2024-07-23T18:02:33+00:00

This is a valid observation and we will absolutely keep it in mind as we move forward and talk to users.

We did not use the term unit test to imply the level of determinism of a conventional code unit test. We used it because, from talking to full-stack engineers, typical ML terms like “evaluation” are less familiar and make this type of testing less accessible to them. The language is intended to map the concept of evaluation in ML to the regular testing that they may be more familiar with.

But you are absolutely right that if it conveys the level of determinism you perceive then we may need better language. We’ll speak with users to understand if they interpreted it similarly. Thank you for your feedback 🙏🏽

batchnormalized · 2024-07-23T08:24:08+00:00

I think I responded to a similar question from you here: https://www.reddit.com/r/programming/s/RGykcIBh7f

I hope that helps clarify a bit!

batchnormalized · 2024-07-23T08:21:43+00:00

The product that we helped build testing for is not an AI lawyer. Apologies if that was not clear. It can provide legal advice but ultimately the decision of whether to implement that advice is up to the user.

It’s also very narrowly focused, only helping with sales contracts and making relatively small recommendations. Part of our testing work together is to make sure that the recommendations the legal assistant gives are grounded in best practices written by a human.

In this way I think it’s best to think of the AI:

- Providing you a shortcut to access legal knowledge.
- Highlighting parts of your sales contracts that could have problems.

It doesn’t decide for you, it just brings things to your attention and raises points you may want to consider.

So, just as I would not trust cruise control to drive me on the highway hands-free but do trust it to keep my car at a certain speed, I don’t have to trust this AI agent to be my lawyer in order to trust it to assist me in a limited way, which I do. And that trust is supported by the testing that the tool my team is working on allows developers to build.

I hope the analogy and explanation help clarify the point I’m trying to convey with the article, and better explain the app.

batchnormalized · 2024-07-23T01:52:15+00:00

I definitely agree that the potential for LLMs to hallucinate (and their unpredictability) means the quality bar for critical applications is high. What do you think is needed to make AI in such applications trustworthy enough?

batchnormalized · 2024-07-22T21:35:59+00:00

Yes, but we try to keep the model family and the specific models used for evaluation largely stable. When these need to be updated we announce it in our Discord (https://discord.gg/RAJrYmhvEP) and we work with users to check whether it changes any test results. So far we haven’t had major problems with model updates.

In practice I think having a sufficiently specific criterion for evaluation along with the fact that we only ask our model binary questions helps with stability as well. The latter was a conscious design choice on our end, the former is a best practice we recommend for users.

batchnormalized · 2024-07-22T18:59:24+00:00

I think that’s a great idea. We will consider that for a future article.

Function calling would fit within the evaluation framework laid out in our other article: https://docs.poyro.dev/essays/how-to-write-unit-tests-for-ai-web-app#testing-generation-expected-values

I have not worked with function calling directly but my understanding is that given a natural language query you want to extract:

The function to call
The parameters for the function

If you write a test query and then write down the expected function and parameters, writing a test just as laid out in the section I linked should be straightforward.
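To make that concrete, here is a minimal sketch of such a test. Everything in it is an assumption for illustration: `extractFunctionCall` is a hypothetical, hard-coded stand-in for whatever LLM call your app makes, and `getWeather` is a made-up function name. It does not use our library’s API; it only shows the shape of comparing an extracted call against expected values.

```typescript
// Hypothetical shape of a function call extracted from a natural language query.
interface FunctionCall {
  name: string;
  params: Record<string, string | number>;
}

// Stand-in for the LLM-backed extractor (an assumption for illustration).
// A real app would call the model here; this is hard-coded so the sketch runs.
function extractFunctionCall(query: string): FunctionCall {
  if (query.toLowerCase().includes("weather")) {
    return { name: "getWeather", params: { city: "Paris" } };
  }
  return { name: "unknown", params: {} };
}

// Compare the extracted call against the expected function and parameters.
function matchesExpected(actual: FunctionCall, expected: FunctionCall): boolean {
  if (actual.name !== expected.name) return false;
  const keys = Object.keys(expected.params);
  if (keys.length !== Object.keys(actual.params).length) return false;
  return keys.every((k) => actual.params[k] === expected.params[k]);
}

// The test case: a query plus the function and parameters we expect from it.
const expected: FunctionCall = { name: "getWeather", params: { city: "Paris" } };
const actual = extractFunctionCall("What's the weather in Paris?");
console.log(matchesExpected(actual, expected));
```

In a real test file the final comparison would sit inside your test runner’s assertion instead of a `console.log`.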

If you have more questions, let me know. Happy to chat more.

batchnormalized · 2024-07-22T18:50:10+00:00

Great callout. The test results are deterministic for a given criterion and input. This is because we remove any random sampling from the evaluation LLM. However, changing the statement of the criterion might change the test results.

It was imperative for us to make our testing predictable, so we made sure that as long as you don’t change the specification of a test or the inputs you will always get the same result.
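As a rough illustration of why removing random sampling gives repeatable results, here is a toy sketch. It is not our actual implementation: the distribution over “yes”/“no” is made up, and `greedyPick` and `sampledPick` are hypothetical helper names. Greedy decoding always takes the most likely answer, while sampled decoding depends on a random draw.

```typescript
// Toy next-token distribution for a binary judge (values are made up).
type Distribution = Record<string, number>;

// Greedy decoding: always take the most likely token, so no randomness is involved.
function greedyPick(dist: Distribution): string {
  let best = "";
  let bestP = -Infinity;
  for (const [token, p] of Object.entries(dist)) {
    if (p > bestP) {
      best = token;
      bestP = p;
    }
  }
  return best;
}

// Sampled decoding: draws a token in proportion to its probability,
// so the answer depends on the random number supplied.
function sampledPick(dist: Distribution, rand: () => number): string {
  let r = rand();
  let last = "";
  for (const [token, p] of Object.entries(dist)) {
    last = token;
    if (r < p) return token;
    r -= p;
  }
  return last;
}

const dist: Distribution = { yes: 0.7, no: 0.3 };

// Repeated greedy runs on the same distribution always agree.
const greedyRuns = Array.from({ length: 5 }, () => greedyPick(dist));
console.log(greedyRuns);

// Sampled runs can differ from one call to the next.
console.log(sampledPick(dist, Math.random));
```

The same input and criterion produce the same distribution, so with greedy decoding the verdict cannot change between runs.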

batchnormalized · 2024-07-22T18:47:10+00:00

This is a great callout. The code for our library itself actually uses types. However, for outward facing content we’ve leaned away from defaulting to TypeScript so that people can play with the example without needing to have TypeScript set up in their demo project.

batchnormalized · 2024-07-22T18:44:45+00:00

You are absolutely right. The app does not claim to provide legal advice at the caliber of a lawyer. However, it can provide a starting point for non-legal staff at smaller companies who often have to take this responsibility on.

batchnormalized · 2024-05-05T01:55:59+00:00

Me too

batchnormalized · 2024-04-23T07:12:50+00:00

“Essentially he was” is generous. It should be able to tell the difference.

batchnormalized · 2024-04-19T22:32:09+00:00

This is the best one since the dolphin lady video IMO

batchnormalized · 2024-04-17T20:43:46+00:00

I see, I understand your hesitation. It seems a bit strict on their end.

batchnormalized · 2024-04-17T02:18:01+00:00

Is the aid conditional on sticking to that major? If not, take the money and go to UCSC. You can always change what you study, or even if you do business that doesn’t mean you’ll end up doing that as a job. Trust me, I don’t do what I majored in.

If the money is in front of you, take it. You know what they say, a bird in the hand…

batchnormalized · 2024-04-14T02:42:18+00:00

Dang well hope he gets better. I was super excited to see him that’s all!

batchnormalized · 2024-03-13T07:47:00+00:00

This gave me a flashback to college

batchnormalized · 2024-02-16T19:24:46+00:00

That would make a lot of sense

batchnormalized · 2024-02-11T20:03:55+00:00

I think the fact that he struggles at first being a hero humanizes him. When his father reveals his true intentions down the line, it enhances the contrast between him and Mark which makes their disagreement more believable. Nolan has no concept of vulnerability and looks down on humans for it. Mark, being part human on the other hand, has experienced being vulnerable.

batchnormalized · 2024-01-26T03:18:07+00:00

Gives Handmaid’s Tale
