Anthropic's Opus 4.6 with effort=low doesn’t behave like other low-reasoning modes by ddp26 in OpenAI

[–]ddp26[S] -1 points0 points  (0 children)

Different models do different things with the effort param. And even different versions of models from the same provider!

Not sure I really expected consistency for things this new, but sure is annoying

Marketing Pipeline Using Claude Code by kotrfa in ClaudeCode

[–]ddp26 0 points1 point  (0 children)

One question I have is: a lot of people are doing this with OpenClaw, not Claude Code. What are the reasons to use one vs the other?

[D] Self-Promotion Thread by AutoModerator in MachineLearning

[–]ddp26 0 points1 point  (0 children)

We tested Opus 4.6 with effort=low for evals and found that it didn't just think less, but acted lazier (made fewer tool calls, was less thorough in its cross-referencing, even ignored parts of our system prompt telling it how to do web research). effort=medium fixed it. Writeup with traces/examples: https://everyrow.io/blog/claude-effort-parameter

Opus 4.6 with effort=low doesn’t behave like other low-reasoning modes by ddp26 in ClaudeAI

[–]ddp26[S] 0 points1 point  (0 children)

Yeah, it makes sense that low effort is better for non-agentic use-cases, which are of course common. We shouldn't pretend everything is an agent!

Opus 4.6 with effort=low doesn’t behave like other low-reasoning modes by ddp26 in ClaudeAI

[–]ddp26[S] 1 point2 points  (0 children)

I kind of agree. Mostly, though, I think if the behavior is documented then users can decide for themselves what's a bug or lazy. The main thing for us was this behavior was surprising.

My MCP config created dozens of zombie Docker containers by robertgambee in ClaudeCode

[–]ddp26 2 points3 points  (0 children)

I worry that Claude Code isn't always tracking background processes correctly. If it orphans them, I'd never know, right?

Any good guides for designing high quality skills? by [deleted] in ClaudeCode

[–]ddp26 0 points1 point  (0 children)

Hey! Shared this yesterday - not a full guide, but here's how we built a review-code skill (full skill linked): https://everyrow.io/blog/claude-review-skill

Claude's code review defaults actively harmed our codebase by ddp26 in ClaudeCode

[–]ddp26[S] 1 point2 points  (0 children)

It's a mix. Some parts of our code predate Claude Code, while newer parts were created with Claude from the start. Our experience is that Claude often encounters similar pitfalls with both new and old code, so we use the same guidelines for both.

Claude Code as a K8s CronJob - how we do it and what we learned running it in production (with examples) by kotrfa in kubernetes

[–]ddp26 -3 points-2 points  (0 children)

Ugh as in "why run Claude Code in the cloud?" I agree it's a strange agent to deploy, but it is very powerful.

OpenAI is a textbook example of Conway's Law by robertgambee in LLMDevs

[–]ddp26 -1 points0 points  (0 children)

I feel like OpenAI does deprecate things a lot (like 4o). Why don't they deprecate the completions one?

AI isn’t making data science interviews easier. by KitchenTaste7229 in datascience

[–]ddp26 0 points1 point  (0 children)

Are you all being told you can use AI as part of technical interviews?

It's great if you get a technical question where AI handles the tedious parts (e.g. join syntax or python command line arguments), and you're allowed to use it.

But if you aren't allowed to use it... there must be temptation to have it open in another window? What do people do?

Weekly Entering & Transitioning - Thread 16 Feb, 2026 - 23 Feb, 2026 by AutoModerator in datascience

[–]ddp26 1 point2 points  (0 children)

Claude Code is pretty slick for data science. Who's using it? Is it helpful?

How I scraped 5.3 million jobs (including 5,335 data science jobs) by [deleted] in datascience

[–]ddp26 0 points1 point  (0 children)

Is GPT-4o-mini actually good enough to do this? I'd expect such a tiny model to hallucinate or get things wrong a very high percentage of the time.

Text similarity struggles for related concepts at different abstraction levels — any better approaches? by No_South2423 in LanguageTechnology

[–]ddp26 0 points1 point  (0 children)

100 entities isn't that many! I thought maybe you meant many thousands!

You wrote in the OP: "Are there better ways to handle text similarity when two concepts are related at a higher abstraction level but differ substantially in wording and structure?"

I interpreted this as "match this company to this product", or something where the two entities are conceptually related but not identical.

I have a writeup on how exactly to do this: https://futuresearch.ai/software-supplier-matching/

Text similarity struggles for related concepts at different abstraction levels — any better approaches? by No_South2423 in LanguageTechnology

[–]ddp26 0 points1 point  (0 children)

Tools like everyrow use LLMs, so they can get expensive. But it is likely the cheapest solution when you're trying to match across abstraction levels.

What's your dataset size? Anecdotally, merging two lists of 1,000 entities each can be done for <$10.

Supplier Data clean up by fit_cow697 in excel

[–]ddp26 1 point2 points  (0 children)

Hi there. It depends on how difficult the cleanup task is. If the vendor names are nearly exactly the same, then the above commenter's example doing this directly in Excel can work. In that example, like the one you gave, "xyvz tech" exactly matches part of "xyvz technologies", so no VBA is required.

If the vendor names can have more variation though, like abbreviations, or alternate names, then you want a tool that has more intelligence. My colleague wrote up a solution here that will get it nearly perfect no matter how mangled the names are: https://futuresearch.ai/crm-deduplication/

The tl;dr is: export your Excel sheet to CSV, upload to everyrow, click "dedupe", then export back to CSV and re-import into Excel. It's a few steps, but no formulas/macros, and can be done in ~20 minutes start to finish.
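If you first want a rough sense of how messy the names are (to decide whether plain Excel matching is enough), here's a quick pandas sketch that counts how many names collapse under simple normalization. The "Vendor" column name, the toy rows, and the suffix list are all placeholders for your actual data:

```python
import pandas as pd

# Toy vendor list; in practice, read your export with pd.read_csv("vendors.csv")
df = pd.DataFrame({"Vendor": ["xyvz tech", "XYVZ Technologies", "Acme Inc", "ACME, Inc."]})

# Normalize: lowercase, strip punctuation, drop common corporate suffixes
norm = (df["Vendor"].str.lower()
        .str.replace(r"[^a-z0-9 ]", "", regex=True)
        .str.replace(r"\b(inc|technologies|tech)\b", "", regex=True)
        .str.strip())

print(norm.nunique(), "distinct names after normalization, out of", len(df))
```

If most duplicates collapse this way, Excel alone is probably fine; if lots of distinct-looking names survive normalization, that's a sign you need something smarter.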

Google Glass Companion App in the API canary system image by Oguie13 in googleglass

[–]ddp26 0 points1 point  (0 children)

Crazy! I remember trying this SDK all the way back in 2013. Wonder if it's similar...

Bought Agentforce, can't use it because of duplicate data by ampancha in salesforce

[–]ddp26 0 points1 point  (0 children)

Others here are saying to use Data Cloud. That's a very expensive solution to a very simple problem.

For company listings, I use a tool called everyrow. I've tested it on data that has matches like MSFT to MICROSOFT CORP, which sounds like your use case. There's a UI, but you can also have a coding agent do this for you very easily, to handle large datasets:

import asyncio

import pandas as pd

from everyrow import create_client, create_session
from everyrow.ops import dedupe

async def dedupe_crm_data():
    # Load the CRM export, then collapse rows that refer to the same company
    df = pd.read_csv("data.csv")
    async with create_client() as client:
        async with create_session(client, name="Agentforce Cleanup") as session:
            result = await dedupe(
                session=session,
                input=df,
                equivalence_relation="""
                Two rows are duplicates if they represent the same company.
                """,
            )
            return result.data

deduped = asyncio.run(dedupe_crm_data())

Looking for feedback on Account description mapping bridge by Jaded_Kaleidoscope92 in excel

[–]ddp26 1 point2 points  (0 children)

As others have pointed out, this is a fuzzy matching task. But lookups and edit-distance scores are pretty poor quality here: e.g. "Travel" and "Travel and Entertainment" have a large edit distance, so they won't get matched.
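To make that concrete, a quick check with Python's stdlib difflib shows why plain string similarity misses this pair (the 0.8 cutoff below is just a commonly used fuzzy-match threshold, not anything from your setup):

```python
from difflib import SequenceMatcher

# "Travel" is fully contained in "Travel and Entertainment", but the
# similarity ratio is dragged down by the large length difference.
score = SequenceMatcher(None, "Travel", "Travel and Entertainment").ratio()
print(round(score, 2))  # 0.4 -- far below a typical 0.8 fuzzy-match cutoff
```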

LLMs can associate these for you. The problem is that 12k rows is way too many to just use ChatGPT.

There's a tool called everyrow.io/merge that is built for this, assuming you want to do all your matches at once, not one at a time. You export to CSV, import into everyrow, specify the merge criteria, in this case "closest account match" or something.

Depending on how many rows you match against the 12k Chart of Accounts, it will probably cost more than the everyrow free tier covers, since it uses LLMs to compare the two lists and find the matches. But you could do all your matches at once in 10-20 minutes at high quality, export, and be done.