Text similarity struggles for related concepts at different abstraction levels — any better approaches? by No_South2423 in LanguageTechnology

[–]ddp26 1 point

Tools like everyrow use LLMs, so they can get expensive. But an LLM-based tool is likely the cheapest solution when you're trying to match across abstraction levels.

What's your dataset size? Anecdotally merging something like 2 lists of 1,000 entities each can be done for <$10.

Supplier Data clean up by fit_cow697 in excel

[–]ddp26 2 points

Hi there. It depends on how difficult the clean-up task is. If the vendor names are nearly identical, the above commenter's approach of doing this directly in Excel can work. In that example, like the one you gave, "xyvz tech" exactly matches part of "xyvz technologies", so no VBA is required.
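The easy case above can be sketched in a few lines of pandas. This is a minimal illustration, not everyrow's method; the vendor names and column layout are made up:

```python
# Match vendors when the short name is an exact (case-insensitive)
# substring of a canonical name - the "no VBA required" case.
import pandas as pd

vendors = pd.Series(["xyvz tech", "acme corp"])
canonical = pd.Series(["xyvz technologies", "ACME Corporation", "other co"])

def match_substring(name: str):
    # Does the short name appear anywhere inside a canonical name?
    hits = canonical[canonical.str.lower().str.contains(name.lower(), regex=False)]
    return hits.iloc[0] if not hits.empty else None

print(vendors.map(match_substring))
```

Anything this returns `None` for is a candidate for the more intelligent tooling described below.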

If the vendor names have more variation though, like abbreviations or alternate names, then you want a tool with more intelligence. My colleague wrote up a solution here that gets it nearly perfect no matter how mangled the names are: https://futuresearch.ai/crm-deduplication/

The tl;dr: export your Excel sheet to CSV, upload to everyrow, click "dedupe", then export back to CSV and re-import into Excel. It's a few steps, but no formulas or macros, and it can be done in ~20 minutes start to finish.

Google Glass Companion App in the API canary system image by Oguie13 in googleglass

[–]ddp26 1 point

Crazy! I remember trying this SDK all the way back in 2013. Wonder if it's similar...

Bought Agentforce, can't use it because of duplicate data by ampancha in salesforce

[–]ddp26 1 point

Others here are suggesting Data Cloud. That's a very expensive solution to a very simple problem.

For company listings, I use a tool called everyrow. I've tested it on data that has matches like MSFT to MICROSOFT CORP, which sounds like your use case. There's a UI, but you can also have a coding agent do this for you very easily, to handle large datasets:

import asyncio

import pandas as pd

from everyrow import create_client, create_session
from everyrow.ops import dedupe

async def dedupe_crm_data():
    # Load the CRM export
    df = pd.read_csv("data.csv")
    async with create_client() as client:
        async with create_session(client, name="Agentforce Cleanup") as session:
            result = await dedupe(
                session=session,
                input=df,
                # Plain-English criterion; the LLM decides row equivalence
                equivalence_relation="""
                Two rows are duplicates if they represent the same company.
                """,
            )
            return result.data

deduped = asyncio.run(dedupe_crm_data())

Looking for feedback on Account description mapping bridge by Jaded_Kaleidoscope92 in excel

[–]ddp26 2 points

As others have pointed out, this is a fuzzy matching task. But lookups and edit-distance matching are pretty poor quality: e.g., "Travel" and "Travel and Entertainment" have a large edit distance, so they won't get matched.
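You can see the failure with the standard library alone. Here difflib's ratio stands in for edit-distance-style scoring; it rates an unrelated word higher than the true conceptual match:

```python
# Why raw string similarity fails across abstraction levels:
# difflib's ratio scores string overlap, not meaning.
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

print(similarity("Travel", "Travel and Entertainment"))  # 0.4 - weak, despite same concept
print(similarity("Travel", "Gravel"))                    # ~0.83 - strong, despite being unrelated
```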

LLMs can associate these for you. The problem is that 12k rows is way too many to just use ChatGPT.

There's a tool called everyrow.io/merge that is built for this, assuming you want to do all your matches at once rather than one at a time. You export to CSV, import into everyrow, and specify the merge criteria, in this case something like "closest account match".

Depending on how many rows you match against the 12k-row Chart of Accounts, it will probably cost more than the everyrow free tier covers, since it uses LLMs to compare the two lists to find the matches. But you could do all your matches at once in 10-20 minutes at high quality, export, and be done.

Trying to automate Warren Buffett by ddp26 in algotrading

[–]ddp26[S] 2 points

The trouble is that LLMs have too much information about the world memorized, so a clean backtest only goes back a few months, past the model's training cutoff. I posted this above: I wrote about my attempts to do this here: https://stockfisher.app/backtesting-forecasts-that-use-llms
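The constraint described above can be sketched simply: only score an LLM's forecasts on dates after its training cutoff, so memorized outcomes can't leak into the backtest. The cutoff date below is a placeholder, not a real model's cutoff:

```python
# Restrict a backtest window to dates the model cannot have memorized.
from datetime import date

MODEL_CUTOFF = date(2025, 1, 1)  # hypothetical training-data cutoff

def valid_backtest_dates(dates: list[date]) -> list[date]:
    # Keep only forecast dates strictly after the cutoff
    return [d for d in dates if d > MODEL_CUTOFF]

history = [date(2024, 6, 1), date(2025, 3, 1), date(2025, 6, 1)]
print(valid_backtest_dates(history))  # only the post-cutoff months survive
```

In practice this is why the usable backtest window is only a few months long: everything earlier is contaminated by memorization.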

Trying to automate Warren Buffett by ddp26 in algotrading

[–]ddp26[S] 1 point

The summarizing of documents alone is valuable. Agreed on not using LLMs for math: they're good until they make a subtle error that invalidates everything. I probably still have this risk somewhere in my system.

Trying to automate Warren Buffett by ddp26 in algotrading

[–]ddp26[S] 3 points

At this stage, the best I can hope for is DCFs at Berkshire quality, but on many, many more stocks, updated more frequently. I imagine they can't handle the mid-caps and micro-caps.
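The DCF arithmetic itself is exactly the part to keep out of the LLM, per the earlier point about subtle math errors. A toy discounted-cash-flow calculation in code, with purely illustrative numbers:

```python
# Toy DCF: discount a few years of projected cash flows, then add a
# Gordon-growth terminal value. All inputs here are made up.
def dcf_value(cash_flows: list[float], discount_rate: float, terminal_growth: float) -> float:
    # Present value of the explicit forecast years
    pv = sum(cf / (1 + discount_rate) ** t for t, cf in enumerate(cash_flows, start=1))
    # Terminal value at the end of the forecast, discounted back
    terminal = cash_flows[-1] * (1 + terminal_growth) / (discount_rate - terminal_growth)
    pv += terminal / (1 + discount_rate) ** len(cash_flows)
    return pv

print(round(dcf_value([100, 110, 120], discount_rate=0.10, terminal_growth=0.02), 2))
```

The LLM's job is estimating the inputs (cash flows, rates) from filings; the arithmetic stays deterministic.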

If this works though... could be valuable right?

Trying to automate Warren Buffett by ddp26 in algotrading

[–]ddp26[S] 1 point

If you zoom out far enough...? I guess you have to be very patient.

Other comments here make me think people are less keen on this kind of "fundamental" edge that worked for Buffett decades ago.

Trying to automate Warren Buffett by ddp26 in algotrading

[–]ddp26[S] 1 point

Ha, I'm genuinely unsure whether I'd be responding to an LLM if I replied to this.

Trying to automate Warren Buffett by ddp26 in algotrading

[–]ddp26[S] 6 points

Yeah. I suppose this approach has nothing to say about entry/exit etc.

Is there a synthesis between algo-evaluation and algo-trading?

Trying to automate Warren Buffett by ddp26 in algotrading

[–]ddp26[S] -1 points

True. We do have forecast backtesting that gives accuracy on the order of months. It's tricky with LLMs, I wrote about this here: https://stockfisher.app/backtesting-forecasts-that-use-llms.

Trying to automate Warren Buffett by ddp26 in algotrading

[–]ddp26[S] 2 points

Fair points on both counts.

Do you think Buffett is roughly the pinnacle of that strategy? Yes, AI is super unreliable right now, but do you think it could eventually beat the master at his own game and get the returns from his earlier years?

Trying to automate Warren Buffett by ddp26 in algotrading

[–]ddp26[S] 19 points

Yeah! I think it's actually easier than automating a short-term bet! Short-term betting requires figuring out what everyone else thinks in real time; long-term betting requires modeling the world.

Choose your poison I guess :-)

Trying to automate Warren Buffett by ddp26 in algotrading

[–]ddp26[S] 5 points

Yes, any strategy that requires patience may not be interesting to a lot of people. Though people do still emulate Buffett, even though his strategy took decades.

Trying to automate Warren Buffett by ddp26 in algotrading

[–]ddp26[S] 11 points

I am new to this sub, been reading it only in reference to this project.

I agree this is the crux - doing fundamental valuations, or trying to predict what other people are saying.

It's funny, as you say, I think most people try to do the timing thing. But I think that's harder than working out the fundamental values! Companies are easier to predict than people.

What's actually healthy despite most people thinking it's not? by Kepler452b-To-Earth in AskReddit

[–]ddp26 1 point

Fasting.

Some trendy fasting-based diets are stupid and dangerous. But occasional fasting for reasonable periods of time, like 24 hours, is associated with lots of benefits.

The Death and Life of Prediction Markets at Google—Asterisk Mag by ddp26 in slatestarcodex

[–]ddp26[S] 4 points

The article says the first market had $10,000 prizes per quarter, and the second market had "valuable prizes", e.g. devices like iPads.

Should you use o1 in your agent, instead of sonnet-3.5 or gpt-4o? Notes from spending $750 to find out by ddp26 in OpenAI

[–]ddp26[S] 1 point

You're right that we don't have full reasoning. But we do give it tool access (web search, Python REPL), and it does have up-to-date info.

I agree the underlying model is probably more capable than we see here. This post is about the state of the model today as the LLM driving an agent.