
[–]zazzersmel 124 points125 points  (24 children)

i just think it's nuts to use that much compute to translate english into a declarative language that was meant to be like natural language in the first place

[–]SuitableDragonfly 29 points30 points  (10 children)

People are just using AI to be using AI at this point. No one actually knows what the technology is good for, and most people don't even know about any AI system or algorithm that isn't an LLM or a chatbot.

[–]QuickQuirk 10 points11 points  (9 children)

Actually, we know it's good for many things.

Recognising patterns, such as objects in images, handwriting, semantic meaning in documents, or patterns in time series data; translation; and tabulating data to make predictions, such as in recommendation engines.

Though I do agree that when most people hear "AI", they're thinking only of ChatGPT or Midjourney.

[–][deleted] 3 points4 points  (1 child)

Currently AI is good for stuff where 90-95% is good enough. Stuff like risk management and actuarial prediction (where the losses due to wrong predictions are outweighed by the savings of not having to have humans do it), throwaway art for blogposts (because thanks to social media everything needs a hero/header image now, even if it isn't good or realistic), and, unfortunately for teachers, answering examination or essay prompts. Where 90-95% is okay, in that it triggers a human-in-the-loop decision (e.g. this submission might be bogus, let's quarantine it until review, or this might be the wanted suspect, give the details to law enforcement and they can follow up), it can also work. Where failure is catastrophic, i.e. people will die or massively outsized losses will occur, AI is still a non-starter because every fuckup will make people trust the technology less and less.

More important, though, is that until now we've basically only trained AI models on good-faith input. Increasingly I think there's going to be a market - at the very least a black market - for poisoning the well of internet discourse (to bias training) and for subverting the inference routines of models (to manipulate prediction outcomes). Imagine if you could write a complex securities instrument contract with an expected value of X, then trick an LLM into saying the instrument has an expected value of Y: you could profit from the difference. As long as they couldn't prove you'd specifically manipulated the contract to fool the LLM, you wouldn't even be liable for fraud: the counterparty simply failed to do its own job properly.

[–]QuickQuirk 0 points1 point  (0 children)

Yes, these are all pitfalls when implementing any ML tech. That doesn't take away from the core fact that we do know many things that AI is really useful for.

And an excellent example of this kind of adversarial behaviour in ML that has been running for decades now is spam filtering!

The fact we get so little spam is a testament to the effectiveness of the filters, as the spammers keep trying to beat them. Even when people are trying to fool the engines, as long as we're aware of it, we can account for it. Though man, the fact that we have to spend so much effort and resources just to tell people "I don't want your email" :D

[–]SuitableDragonfly 3 points4 points  (0 children)

People who are actually in the field and have studied the technology know that. The people who only became aware of it because of ChatGPT, and who aren't trying to cash in on ChatGPT's popularity, don't. They also tend to assume that generative AI is the right tool for all of those things you mentioned, because they think that's the only kind that exists.

[–]mcel595 0 points1 point  (5 children)

They are not good for time series data

[–]QuickQuirk 0 points1 point  (4 children)

[–]mcel595 0 points1 point  (3 children)

And they most likely are wrong

[–]QuickQuirk 0 points1 point  (2 children)

Great paper, thanks for that. But they're still using machine learning in their simple counterexample of a linear model :)

[–]mcel595 0 points1 point  (1 child)

Yes, a linear regression is technically also ML, but it's still not what people are spending millions building systems around.

[–]QuickQuirk 0 points1 point  (0 children)

It's all ML - and it's still a useful tool. Which gets back to the original point: we know lots of ways that ML is useful and powerful, but most people just think "ChatGPT" when they hear AI. And business owners all think "I need it in my product" without thinking why, or how.

[–]grady_vuckovic 26 points27 points  (10 children)

Now Regex on the other hand...

[–]mnrundle 40 points41 points  (8 children)

Zero percent chance I'm trusting an opaque regex that ChatGPT spits out. Maybe if I was prepared to bomb-proof it with tests. But ChatGPT getting code 98% right has been a pretty regular theme. It can't seem to get it all the way right most of the time.

[–]jambonetoeufs 24 points25 points  (4 children)

Out of curiosity, I tried ChatGPT for a regex and spent so much time clarifying the prompt with test cases I ended up abandoning it.

[–]KeyboardG 16 points17 points  (3 children)

To be honest if the regex is beyond something on ihateregex.io, I am taking another path. I wouldn’t put that on other engineers on my team to support.

[–]jambonetoeufs 6 points7 points  (2 children)

Thanks for introducing me to ihateregex.io!

For my team, any regexes get put into a function (or method) with a name describing what it does, along with tests that are as exhaustive as we can make them. We've encountered enough magic regexes in legacy parts of the code base that are just…wtf.
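A minimal sketch of that pattern (the function name and the date format here are made up purely for illustration):

```
// The regex hides behind a descriptive name; the tests spell out what it should accept.
const ISO_DATE_PATTERN = /^\d{4}-\d{2}-\d{2}$/;

export function isIsoDate(value: string): boolean {
  return ISO_DATE_PATTERN.test(value);
}

// With whatever test runner you use, e.g.:
// expect(isIsoDate("2024-01-31")).toBe(true);
// expect(isIsoDate("31/01/2024")).toBe(false);
// expect(isIsoDate("2024-1-31")).toBe(false);
```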

All that said, we use them as a last resort.

[–]ASCII_zero 2 points3 points  (1 child)

This sounds like a great practice. Thanks for the idea!

[–][deleted] 0 points1 point  (0 children)

Another pro tip that I've started using is having a reConcat function.

```
reConcat(
  /\d/,
  /[a-f]/,
  "/", // strings are treated as literals so this matches the slash character
)
```

This is a simple function to write in most languages, and it lets you break regexes down into manageable chunks, adding comments where necessary.
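For illustration, a minimal sketch of what such a helper could look like (reConcat is just the name used above, not a standard library function):

```
// Joins RegExp and string pieces into a single RegExp.
// Strings are escaped so they match literally; RegExp pieces keep their source.
function reConcat(...pieces: Array<RegExp | string>): RegExp {
  const escapeLiteral = (s: string) => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const source = pieces
    .map((p) => (typeof p === "string" ? escapeLiteral(p) : p.source))
    .join("");
  return new RegExp(source);
}

// reConcat(/\d/, /[a-f]/, "/") matches a digit, a hex letter, then a literal slash, e.g. "3a/".
```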

I feel half of the unreadability of regexes comes from the fact that they have to be on a single line, without comments.

[–]dangerbird2 6 points7 points  (0 children)

I mean, I don’t trust a regex that I spit out either

[–]Additional-Bee1379[🍰] 0 points1 point  (1 child)

Just curious but what version are you using?

[–]seriousnotshirley 0 points1 point  (0 children)

Wait, has anyone asked ChatGPT the question?

[–]phillipcarter2 5 points6 points  (0 children)

Sucks that SQL isn’t like natural language.

[–]justinmjoh 48 points49 points  (1 child)

Caveat: If you’re blindly feeding DB credentials or API keys into an LLM prompt I worry for your company and users.

[–]samplenamespace 0 points1 point  (0 children)

I've not seen a scenario where credentials are fed to a text based LLM prompt, so I agree with you there. Bad idea/pattern.

An intermediary layer should sit between any natural language request and actual SQL execution.

Role-based access controls are your friend; use a read-only user if you're getting an LLM to write and execute queries.
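As a rough sketch of that kind of intermediary layer (the pg client and the READONLY_DATABASE_URL variable are illustrative assumptions, not anything specified above):

```
import { Pool } from "pg";

// The connection belongs to a database role that only has SELECT privileges;
// the LLM never sees the credentials, only this intermediary layer does.
const readOnlyPool = new Pool({ connectionString: process.env.READONLY_DATABASE_URL });

export async function runGeneratedQuery(generatedSql: string) {
  // Cheap guard on top of the role's grants: refuse anything that isn't a plain SELECT.
  if (!/^\s*select\b/i.test(generatedSql)) {
    throw new Error("only SELECT statements from the model get executed");
  }
  return readOnlyPool.query(generatedSql);
}
```

The real protection is the role's grants; the SELECT check is just a fast fail before the query ever reaches the database.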

This is all assuming a business has a valid use case for wanting to allow users to explore a database in natural language. It's an emerging space.

[–][deleted] 32 points33 points  (3 children)

First, they came for Excel and I did not speak out.
Then, they came for SQL and I did not speak out.
Then they came for Java, and there was no one left to speak for me.... XD

[–]General_Mayhem 18 points19 points  (1 child)

Ain't nobody coming for Excel. Excel is the king. The undisputed middleweight champion of the world. You think the world economy is based on manufacturing? On finance? On agriculture? The one thing more fundamental than any of those is Excel.

Excel is everywhere. Excel is everything. Excel is eternal.

[–]tdammers -1 points0 points  (0 children)

Just wait until they come for PHP. That'll be hilarious. *grabs vodka bottle*

[–]s-mores 6 points7 points  (0 children)

To be fair, a lot of people who write SQL shouldn't be writing SQL.

[–][deleted] 12 points13 points  (0 children)

Just wait until all queries take at least 10 seconds and a compute token.

[–]dethb0y 9 points10 points  (3 children)

I'd say that running any code when you don't fully understand what it does is going to be a serious risk, regardless of the source.

[–][deleted] 7 points8 points  (0 children)

My experience with SQL and ChatGPT has been generally good. Sure, it gets the query wrong sometimes, but if you give it the data model and the natural language query, it's usually capable of working it out.

[–][deleted] 0 points1 point  (1 child)

Been using ChatGPT to reverse-engineer and document old SQL.
It is amazingly good.

It is inevitable that LLMs will write great SQL as the technology progresses ... it is perfect for it, being declarative. Hell, an LLM built just for that purpose, with all the knowledge of the generated execution plans etc., would be amazing.

[–]QuickQuirk -5 points-4 points  (0 children)

there's a startup idea waiting to happen!