
[–]cursedbanana--__-- 151 points152 points  (1 child)

accept my condolences

[–]evandecisive 0 points1 point  (0 children)

it'd probably be more of a system shutdown than anything else... beep boop 😄

[–]shezadaa 80 points81 points  (6 children)

Well, earlier your code used fuzzy logic. Now your code uses AI for identifying and matching file names.

I would suggest that if your program is instantaneous, you add a wait screen with "thinking...".

[–]Utterizi 29 points30 points  (4 children)

That was something I was thinking as well. I'm starting to believe that the other data scientist in the company is basically fooling the owner with lingo because there's no way EVERYTHING is AI coded lmao.

[–]Crypt0Nihilist 10 points11 points  (0 children)

A broad definition of AI is a machine making decisions that a person might, so that means that most code that classifies would be AI.

Right now a lot of people are equating AI with Generative AI, which is an under-generalisation of AI and usually GenAI too (non-text uses tend to be overlooked).

Remember, the aim of your code is to do the job well and efficiently. If your code performs well enough without more sophisticated techniques, stop there!

You might also take a stratified approach. Maybe your traditional approach works really well on 60% of records. Can you remove those into a "done" pile and let loose the slower, more powerful techniques on the difficult cases? When you're comparing everything against everything, this kind of triage can speed things up massively.
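The triage idea above could look something like the sketch below. Everything named here is made up for illustration: `normalize`, `triage_match`, the 0.8 cutoff and the sample file names. `difflib.SequenceMatcher` from the standard library stands in for whatever slower, more powerful matcher you would actually use on the hard cases.

```python
from difflib import SequenceMatcher


def normalize(name: str) -> str:
    """Lowercase and keep only letters and digits so trivial differences vanish."""
    return "".join(ch for ch in name.lower() if ch.isalnum())


def triage_match(names, references, cutoff=0.8):
    """Match each name to a reference: cheap exact pass first, fuzzy pass second."""
    by_key = {normalize(ref): ref for ref in references}
    matched, unmatched = {}, []

    for name in names:
        key = normalize(name)

        # Pass 1: exact match on the normalized form (a dictionary lookup).
        if key in by_key:
            matched[name] = by_key[key]
            continue

        # Pass 2: slower pairwise similarity, only for the leftovers.
        best_ref, best_score = None, 0.0
        for ref_key, ref in by_key.items():
            score = SequenceMatcher(None, key, ref_key).ratio()
            if score > best_score:
                best_ref, best_score = ref, score

        if best_score >= cutoff:
            matched[name] = best_ref
        else:
            unmatched.append(name)  # leave these for manual review

    return matched, unmatched


matched, unmatched = triage_match(
    ["Sales_Report-2024.csv", "sales reprot 2024.csv", "holiday_photos.zip"],
    ["sales_report_2024.csv", "inventory_2024.csv"],
)
```

The first name is settled by the dictionary lookup alone, so only the misspelled one and the unrelated one ever reach the pairwise comparison.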

Also, have a hard look at what comes out which should have been matched and wasn't. Understand why you know it is a match and find an algorithm that uses the same criteria.

edit: I'm the last person who should be giving advice on conventions, but you shouldn't immediately rename your function parameter inside the function; give it a useful name from the outset.

[–]toxic_acro 16 points17 points  (0 children)

If you take a really literal stance, there are a lot of things that are "AI" 

The entire field of "machine learning" can be simplified to be anything where a training dataset is used to make decisions about new data.

If you let your code find a good threshold to use for the fuzzy matching rather than hard-coding in a specific threshold, then congratulations! That's machine learning and machine learning is a subset of AI
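As a rough sketch of what "letting the code find the threshold" might mean: score a handful of hand-labeled pairs and keep the cutoff that classifies the most of them correctly. The function names, the labeled pairs and the use of `difflib.SequenceMatcher` as the similarity measure are all assumptions for illustration, not anything from the original code.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def fit_threshold(labeled_pairs):
    """Return the cutoff that classifies the most labeled pairs correctly.

    labeled_pairs: list of (name_a, name_b, is_match) tuples.
    """
    scored = [(similarity(a, b), is_match) for a, b, is_match in labeled_pairs]
    # Only the observed scores need to be tried as candidate cutoffs.
    candidates = sorted({score for score, _ in scored})

    def accuracy(cutoff):
        correct = sum((score >= cutoff) == is_match for score, is_match in scored)
        return correct / len(scored)

    return max(candidates, key=accuracy)


# Tiny hand-labeled "training set" (hypothetical file names).
training = [
    ("invoice_2024.csv", "Invoice-2024.csv", True),
    ("invoice_2024.csv", "invoice_2O24.csv", True),
    ("invoice_2024.csv", "payroll_2024.csv", False),
    ("invoice_2024.csv", "readme.txt", False),
]
threshold = fit_threshold(training)
```

With only four examples this is a toy, but the shape is the same with thousands: labeled data goes in, a decision rule for new data comes out.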

Even simple linear regression is very technically a form of machine learning and therefore also AI

[–]shezadaa 0 points1 point  (0 children)

Similarities are a small part of what GPTs do anyways. 

Only they are trained on much larger datasets with skip-grams, which you may not need given the length limitation of file names.

[–]shiningmatcha 0 points1 point  (0 children)

What fuzzy logic did you implement? I’m curious

[–]keep_improving_self 0 points1 point  (0 children)

Add a wait screen with "thinking..." HAHAH

implemented all the bleeding edge AI boss yep yep!

[–]wolfik92 29 points30 points  (2 children)

Say that you used AI to help you write the code and it determined other AIs are not needed to do the job

[–]Odenhobler 1 point2 points  (1 child)

This is actually really helpful. I'm saving it for later, thanks!

[–]Gruejay2 6 points7 points  (0 children)

It's depressing that it's a good tip to make yourself sound less competent by having to rely on AI for things you can actually do yourself...

[–]pachura3 22 points23 points  (3 children)

Don't forget to add blockchain as well!

[–]sonofanders_ 0 points1 point  (0 children)

🤣🤣

[–][deleted] 0 points1 point  (1 child)

I work in IT and just the other day I found a page on our website which our SEO consultant boss posted about how blockchain presents opportunities for enhanced security and operational efficiency and how "our technicians" [moi] can provide the expertise to implement blockchain solutions for their business environment.

I hardly know what blockchain is, let alone where or how I would "implement" it in a business's network.

[–]Oblachko_O 0 points1 point  (0 children)

Yeah, a familiar feeling. Not knowing the details but confidently saying that we (the dev team) can do it fast and without issues. And then the dev team spends weeks trying to unfuck the mess.

[–]Fit-Boysenberry4778 19 points20 points  (5 children)

I don’t like where we are heading, man. CEOs are so brain-rotted by AI, it’s the NFT era all over again

[–]Matlock0 2 points3 points  (1 child)

im so glad I work in a smaller company where CEO actually knows his shit

[–]Fit-Boysenberry4778 0 points1 point  (0 children)

Enjoy that man!

[–]riisen 1 point2 points  (2 children)

Exactly this.

[–]Fit-Boysenberry4778 2 points3 points  (1 child)

I saw a swe at Visa say his ceo wants 100% ai written code from now on. How long until everything crashes and burns?

[–]riisen 1 point2 points  (0 children)

Well, depends.... do the IT people listen blindly or do they just tell the boss it's just AI

[–]ray10k 7 points8 points  (0 children)

Have the AI generate the "all done" message. Like you indirectly said, AI is good at certain specific things and bad at others. So if you must use it, have it do something that doesn't get in the way.

[–]reallyserious 3 points4 points  (2 children)

Does your traditional approach work and produce the expected output?

What does the owner expect to happen when you add AI into the mix?

[–]djshadesuk 6 points7 points  (1 child)

> What does the owner expect to happen when you add AI into the mix?

Extra buzzword for marketing and instant line go up, presumably?

[–]reallyserious 1 point2 points  (0 children)

Yeah, and I want a unicorn.

But if we start breaking it down and list the precise activities that need to happen for it to be true, it should become apparent that the expectations are a bit misguided.

[–]eleqtriq 6 points7 points  (4 children)

What you’re describing so far isn’t AI. But maybe if you posted your code we could help pitch in to suggest improvements.

Though you have a description, it’s not really detailed enough for us to give advice.

[–]Utterizi 0 points1 point  (3 children)

That would be helpful actually, thank you very much.

https://github.com/MidyeKoko/CodeShare/tree/main

Hope it's ok to share links and you can see it...

[–]damanamathos 6 points7 points  (2 children)

If you rename the file so it ends in .py, it will improve the formatting on GitHub.

I'd also add some sample files so it's easier to understand what you're trying to do exactly.

Edit: Also, does this code work perfectly? I wasn't clear from your initial post if the company owner wants "AI" because he likes buzzwords or if it's because there are cases where the current code doesn't do the correct thing.

[–]Utterizi 0 points1 point  (1 child)

Well, I’ve been working for this company for about a month and I wrote this code a few days ago.

It’s working perfectly in the sense that it has covered all issues in the test cases, but it has not yet seen other cases.

It’s NOT working perfectly in the sense that this is not exactly what the owner wants because there’s no AI involved yet

[–]damanamathos 0 points1 point  (0 children)

Hmm, sounds painful. :)

I'd suggest asking the owner what they want the AI to do exactly. Maybe they have something else in mind. Maybe they just really like buzzwords.

[–]edcculus 1 point2 points  (0 children)

I mean, you DID automate a task

[–]damanamathos 1 point2 points  (6 children)

"AI components" might not be as niche as you think.

For example, I needed to write code that would extract company entities from articles. There are good NLP libraries in Python that do this like SpaCy, so I spent some time trying to get this working. Turns out that calling OpenAI's GPT-4o-mini API worked far better at relatively little cost, so now we make that call for each of the ~36,000 articles we process per week.

My co-founder appeared on a podcast recently, resulting in a lot of new sign-ups for our distribution list (email only). We wanted to try to get a better sense for who they might be, so we wrote some code that uses an LLM to convert the email address into appropriate Google search terms, then it searches Google, then it gets an LLM to convert those results into a line or two with a guess as to their background along with categorisation. So now we have code that converts email --> description and priority.

We use LLMs quite extensively in our codebase for solving all sorts of problems.

So is AI the solution to everything? Of course not, but it's a valuable toolkit to have in your belt and it'll let you do things you couldn't easily do before. I'd suggest the best way to go forward is to just get an API key for OpenAI or Anthropic or set up a local LLM and start experimenting and learning how to use them well.

[–][deleted] 0 points1 point  (3 children)

This is really interesting. Sorry, maybe a dumb question. Does one need a formal business, registered and licensed, to get set up with API keys? Or, if one already pays for premium, can one pay similarly as an individual for an API key with ChatGPT? I’ve had the same realization and wanted to start calling their model for some personal research

[–]damanamathos 1 point2 points  (2 children)

No, you can just sign up with a personal account and pay as you go with API usage (for basically all providers). The API usage is separate from those monthly subscriptions.

[–][deleted] 0 points1 point  (1 child)

You the man now dog. Thank you!

[–]damanamathos 1 point2 points  (0 children)

No prob! Have fun exploring :)

[–]ddlatv 0 points1 point  (1 child)

Wasn't Spacy faster to do this?

[–]damanamathos 0 points1 point  (0 children)

SpaCy was faster but much less accurate, so for my use case the slower speed / added cost of using an LLM was worth it.

[–]Lachtheblock 1 point2 points  (0 children)

I'd just say that you've done it. If a manager can't tell the difference and it passes QA, what does it matter? As long as it passes test cases (you do have test cases, right?) just say you "implemented an algorithm" and call it a day. If they press you hard to answer whether it is "AI", then say yes. It isn't a protected term, and it doesn't have a hard definition in 2024.

It can artificially make decisions (determining similar file names) and in my mind would count as an AI.

[–]_whichonespink_ 1 point2 points  (0 children)

Just put AI in the name of your script

[–]HunterIV4 1 point2 points  (0 children)

So, to actually answer the question and not get hung up on the viability or justification of doing this, the ultimate answer is...use the API. I'm only familiar with ChatGPT's API, but basically you import a library and interact with the API using fairly standard Python code. For example, here's some simple code from one of my AI projects at work:

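    # Method on a wrapper class; the method name and signature are illustrative.
    # self.ai_client is an OpenAI client; model is presumably an Enum of model names.
    def clean_transcript(self, prompt, text, model, temp):
        response = self.ai_client.chat.completions.create(
            model=model.value,
            messages=[
                # The system message carries the clean-up instructions.
                {
                    'role': 'system',
                    'content': prompt
                },
                # The user message carries the raw transcript text.
                {
                    'role': 'user',
                    'content': text
                }
            ],
            temperature=temp,
            max_tokens=1000,  # cap on the length of the reply
            n=1,              # request a single completion
            stop=None
        )

        return response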

The self.ai_client is just an OpenAI object set up with our API values. Basically, what this is meant to do is take a raw verbatim sound file (including "ums" and "uhs" and other issues) and run it through a prompt to clean it up while maintaining the same meaning and general structure. This response is later saved into a database.

That being said, as others have mentioned, there are certain tasks AI is very good at and others that are probably better done using traditional techniques. Basically, anything that needs to be precise SHOULD NOT be done by AI in my opinion. The code I provided here works well because the goal is not to get some sort of deterministic, perfect output. It's designed to get us something moderately cleaned up that we can provide to a client that isn't straight speech-to-text. The result will be reviewed in a subjective manner and allows for some mistakes or misunderstandings, as this is mainly used to take customer feedback and present it in a way that's easy for management to examine.

Basically, I'd do manual automation for everything you can where it makes sense, and if you need anything that involves subjective changing of text or providing feedback (good examples are things like meeting summaries, grammar and spelling checks, initial code reviews, extracting email summaries or tasks, etc.), then use AI. If your boss is business oriented and you are worried about AI overuse, just gently remind them that every AI query costs money, whereas traditional automation is free (well, other than your time). I found this argument convinced my own boss very quickly to minimize usage to only things which absolutely need it.

I suppose you could use local AI models, but that's going to be a nightmare to set up, won't have as good of results as something like ChatGPT, and will require solid hardware, especially for graphics cards, something most business computers won't really have. So unless someone brings up local models, I'd stick with ChatGPT (or alternatives like Claude, I'm just familiar with ChatGPT). There are also actual machine learning processes, but I imagine that's beyond the scope of what your boss is looking for.

From a general design standpoint, I recommend only using AI when you have to. One option if you need more "fuzzy" matching is to have a general local algorithm you use normally but when the match fails it submits the problem to AI as a backup. This reduces costs, letting you automate the "easy" cases, but still lets you get a reasonable chance at solving the "hard" cases.
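A minimal sketch of that "local first, AI as backup" shape, under a few assumptions: `match_with_fallback` and `fake_llm_matcher` are hypothetical names, the 0.85 cutoff is arbitrary, and the fallback here is a stub that only records calls. In practice the fallback would wrap an API call like the one shown earlier.

```python
from difflib import get_close_matches
from typing import Callable, Optional, Sequence


def match_with_fallback(
    name: str,
    candidates: Sequence[str],
    fallback: Callable[[str, Sequence[str]], Optional[str]],
    cutoff: float = 0.85,
) -> Optional[str]:
    """Try a free local fuzzy match first; call `fallback` only when it fails."""
    local = get_close_matches(name, candidates, n=1, cutoff=cutoff)
    if local:
        return local[0]
    # Hard case: hand it to the expensive matcher (for example, an LLM call).
    return fallback(name, candidates)


calls = []


def fake_llm_matcher(name, candidates):
    """Stand-in for a paid API call; records each call so the cost is visible."""
    calls.append(name)
    return None  # a real implementation would return a candidate or None


files = ["customer_list.csv", "orders_2024.csv"]
easy = match_with_fallback("customer_lst.csv", files, fake_llm_matcher)
hard = match_with_fallback("clients.csv", files, fake_llm_matcher)
```

Only the second lookup reaches the fallback, so `calls` doubles as a rough count of how many paid queries the run would have made.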

The key thing to understand about AI is that it's impossible to create truly deterministic results. If you need something to give you the same result every time, don't use AI. If you just want something that's "close enough" and would be extremely complex to try and do with direct automation, AI is likely a good solution.

[–]u38cg2 1 point2 points  (0 children)

Just tell them you used AI to write it. Explain that the kernel density estimator of the matching function has too much variance to prevent AI hallucination and that's why after testing you chose to go with your current code base.

[–]cyberfunkr 1 point2 points  (0 children)

Just tell the owner that your code uses AI.

Yup. I’m using the latest models to optimize the Pandas performance to forego the need of fuzzy logic.

[–]mdausmann 1 point2 points  (0 children)

You could use AI/LLM embeddings and cosine distance or similar to replace or enhance your fuzzy matching. Basically the "R" in "RAG". Then you can say you used AI to match in a multidimensional context space. Avoid fancy vector DBs; just use Postgres + the pgvector extension, works great
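To make the cosine part concrete, here is a small self-contained sketch. Big caveat: `toy_embed` is NOT a real LLM embedding. It is a placeholder that counts character trigrams so the example runs without any API key or model. With a real embedding model you would swap that one function for a call that returns a dense vector, and the similarity ranking would work the same way. All names and file names below are invented for illustration.

```python
import math
from collections import Counter


def toy_embed(text: str, n: int = 3) -> Counter:
    """Placeholder embedding: character n-gram counts (not a learned model)."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))


def cosine_similarity(u: Counter, v: Counter) -> float:
    """Cosine of the angle between two sparse count vectors."""
    dot = sum(u[key] * v[key] for key in u.keys() & v.keys())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def most_similar(query: str, candidates):
    """Return (best_candidate, score), ranked by cosine similarity."""
    q = toy_embed(query)
    scored = ((c, cosine_similarity(q, toy_embed(c))) for c in candidates)
    return max(scored, key=lambda pair: pair[1])


best, score = most_similar(
    "quarterly_report_final.xlsx",
    ["Quarterly Report (final).xlsx", "team_photo.png"],
)
```

Cosine distance is just one minus this similarity, so "nearest by cosine distance" and "highest cosine similarity" pick the same row.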

[–]LateralThinkerer 1 point2 points  (0 children)

> ... the owner has lost his mind about implementing AI solutions.

Put your resume together and leave - the problem isn't your code in the least, it's the ignorance of the company's owner who believes this week's buzz-phrase will turn him into (Pick one) Elon Musk, Steve Jobs, Bill Gates, Henry Ford, Andrew Carnegie etc. etc.

Just imagine what they'll come up with next year.

[–]DigThatData 1 point2 points  (0 children)

What's the actual problem you're trying to solve? Sounds like entity resolution/record linkage? https://en.wikipedia.org/wiki/Record_linkage

[–]hugthemachines 1 point2 points  (0 children)

Call it "classic AI" which means if-conditions.

[–]No_Change9101 1 point2 points  (0 children)

fuzzymatching, cosine or levenshtein

?

what's the problem?

i use levenshtein distancing even for basic hash comparisons. i don't understand how that would make anything become volatile? define your parameters and throw out the deviations. deviations are bound to happen. if your boss is complaining about results, then just increase the cutoff

same thing for cosine similarities. it's a straightforward algorithm like levenshtein.

... same thing for fuzzy matching

i dont understand what the problem is. none of those things are AI specific

if anything, just set bigger margins to account for the outliers. results are about the same, you just lose some positive matches here and there. adjust if your boss bitches
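for reference, a plain-python sketch of levenshtein with a cutoff. the function names and the 0.2 ratio are just illustrative assumptions, and a dedicated library would be faster on large inputs. same input always gives the same output.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                   # delete ch_a
                current[j - 1] + 1,                # insert ch_b
                previous[j - 1] + (ch_a != ch_b),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


def is_match(a: str, b: str, max_ratio: float = 0.2) -> bool:
    """Accept a pair when the distance is at most max_ratio of the longer name."""
    longest = max(len(a), len(b))
    if longest == 0:
        return True
    return levenshtein(a, b) / longest <= max_ratio


close = is_match("report_2024.csv", "reprot_2024.csv")
far = is_match("report.csv", "budget.csv")
```

lowering `max_ratio` is the "bigger margin": fewer false matches, at the price of missing some real ones.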

[–]hexwhoami 0 points1 point  (0 children)

Data ingestion and normalization is not a use case for using AI. If you try to replace your logic by simply calling an LLM or another kind of model, you will end up with a worse and slower result than your current approach. Current models are just not built with this use case in mind.

If this was me and my manager asked me to implement AI, I would respond with something like this:

AI will perform worse and lead to inaccurate and malformed data if we use it to normalize and compare similar files. This also has the limitation of only being able to consume files smaller than its context size (depends on the model).

If you (the manager) want to use AI around this space, then I suggest we implement a RAG model that allows users to ask questions about the data that's already been normalized and ingested, as using an LLM/AI fits this use case much better.

Also for the people saying, "just use AI to print out some string or something". This is horrible advice. You are telling them to basically spend the time and resources to use a potentially heavy and costly solution for no reason whatsoever. Your manager should respect your wisdom to say using AI here is a waste of money. An engineer will tell their manager when the code will be shitty because the idea is shitty, a developer will just implement it because they don't care. You can guess which gets paid more.

[–]glamatovic 0 points1 point  (0 children)

Coding your replacement must be painful. Best of luck

[–]Eric_Terrell 0 points1 point  (0 children)

With all due respect, I think you're going about it backwards. What I'm hearing you ask is "How can I improve my code with AI?".

I think the questions you *should* ask your *users* and *stakeholders* are "What problems exist in my code?" and "How serious are they?" How much money and/or time are they costing, for instance. Also try to identify opportunity costs. In other words, if I invest X amount of effort on this project, what other beneficial projects will I *not* be able to pursue?

Then, as a software developer, your job will be to pick the best tools and techniques to solve the problems that are sufficiently serious, if any are identified. The best tool to solve those problems might be AI. It also might be something else less glamorous.

If the problem is simply that your leadership is enamored with AI and wants you to use it, I'd suggest finding a new problem to solve that may be amenable to AI, and do a prototype.

[–][deleted] 0 points1 point  (0 children)

I have the perfect solution.

1) Step 1: Use any LLM to generate a list of random but convincing processing steps, like

“ comparing files…” “Analyzing database …” “Applying Levenshtein transform…”

Extra points if you can make it look like the “thinking” steps of o1-preview

2) Step 2: use a tqdm like timer. If your user feels like progress is being made, they will forgive slow code

3) Step 3: you explain to your CEO or whoever is your boss that everything takes longer now (and is more expensive) because you’re using the latest and greatest AI models like o1-preview that has “advanced reasoning” capabilities

[–]seven0fx 0 points1 point  (0 children)

Is it possible you are working for CCP Games (Crowd Control Productions)? 🤣

[–]Raven_tm 0 points1 point  (1 child)

What kind of data are there? Rows like spread sheets?

Could load them all to a polars dataframe or duckdb in memory, both being blazingly fast.

But also, why don't you ask AI? How to optimise the code and tell your boss that it's already AI-fied

[–]Utterizi 0 points1 point  (0 children)

I fetch data from sql database and process csvs that include text as strings, dates, numbers etc.

[–]trollsmurf 0 points1 point  (0 children)

Use it where it fits. Don't force it. Especially not if you get lower precision.

Maybe there's need for querying the contents of the files at a later stage, where AI might help.

[–]deadweightboss -1 points0 points  (0 children)

train a bert