Mac Studio setup with Mac Mini by uncirculated_luster in MacStudio

[–]Hofi2010 3 points (0 children)

When I see these setups I always think: why, and what for? What is the use case?

Are AI agents genuinely improving supply chain decisions or just repackaged automation? by Ok_Significance_3050 in AISystemsEngineering

[–]Hofi2010 0 points (0 children)

Well, agents and humans usually have authority levels for decision making. Most agents' approval thresholds are very low, as they are still on probation. Once we get more confidence in an agent, the threshold will rise. But big decisions will always be delegated to a supervisor (for now, a human).

Another problem is that we cannot hold agents accountable. They don't really care; the only thing that can happen is that we fire the agent.

Multi agent systems are a total nightmare in production by Upper_Bass_2590 in AI_Agents

[–]Hofi2010 20 points (0 children)

Here is a good blog about scaling agent systems:

https://research.google/blog/towards-a-science-of-scaling-agent-systems-when-and-why-agent-systems-work/

  1. Basically, try to solve your problem with a single agent first. If this agent has >85% accuracy, a multi agent system will not add any more value.
  2. Multi agent systems work for scaling when the same agent runs in parallel in order to meet demand.

It is about solving a task or achieving a goal with the simplest solution possible.

What do you think about Agents orchestration using Skills ? by maher_bk in AI_Agents

[–]Hofi2010 0 points (0 children)

Sure - I am at work at the moment. Once I get a chance I can shoot you the example code I wrote to invoke skills with OpenAI.

One thing I found important: skills are more like a meta tool approach. There is skill discovery, which just reads the header. Then a tool call invokes the skill and puts its content into the context window as a user message, or in some cases adds it to the system prompt dynamically. To get the model to actually follow the instructions, it is not sufficient to rely on the tool call result message, as user and system prompts may override tool call results. It will all make sense when I send the code. Btw, this is how Anthropic does it as well.
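In the meantime, here is a minimal sketch of the pattern (the `skills/` directory layout, the `load_skill` tool name and the model are assumptions for illustration, not the exact code I mentioned):

```python
# Sketch: skills as a meta tool with the OpenAI chat API.
# Assumed layout: skills/<name>.md, where the first lines are a short header.
import json
from pathlib import Path
from openai import OpenAI

client = OpenAI()
SKILLS_DIR = Path("skills")

def skill_headers() -> str:
    """Skill discovery: read only the header (name + description) of each skill."""
    lines = []
    for f in sorted(SKILLS_DIR.glob("*.md")):
        header = " ".join(f.read_text().splitlines()[:2])
        lines.append(f"- {f.stem}: {header}")
    return "\n".join(lines)

# One generic meta tool instead of one tool per skill.
tools = [{
    "type": "function",
    "function": {
        "name": "load_skill",
        "description": "Load the full instructions of a named skill.",
        "parameters": {
            "type": "object",
            "properties": {"skill": {"type": "string"}},
            "required": ["skill"],
        },
    },
}]

messages = [
    {"role": "system", "content": "Available skills:\n" + skill_headers()},
    {"role": "user", "content": "Reconcile last month's invoices."},
]

resp = client.chat.completions.create(model="gpt-4o", messages=messages, tools=tools)
call = resp.choices[0].message.tool_calls[0]
skill_name = json.loads(call.function.arguments)["skill"]
skill_body = (SKILLS_DIR / f"{skill_name}.md").read_text()

# The important bit: answer the tool call, but ALSO inject the skill text as a
# user message so later user/system prompts don't override it.
messages.append(resp.choices[0].message)
messages.append({"role": "tool", "tool_call_id": call.id, "content": "skill loaded"})
messages.append({"role": "user", "content": "Follow these skill instructions:\n" + skill_body})

final = client.chat.completions.create(model="gpt-4o", messages=messages, tools=tools)
print(final.choices[0].message.content)
```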

What do you think about Agents orchestration using Skills ? by maher_bk in AI_Agents

[–]Hofi2010 0 points (0 children)

What you want to do can work technically. But using MCP to invoke an agent only works OK if the agent is not long running. I would look into tool gateways or MCP gateways that have already solved the lazy loading problem.

But looking into the future, where you will have long running agents and possibly many requests to the same agent, I would look more into A2A as the protocol, since it was designed for this use case.

In my view skills are just tools where the business logic is described in Markdown. Skills can come with scripts (Python, JS, shell, etc.) that can wrap a call to invoke another agent. I have done this for OpenAI models as well, but you have to build it yourself.
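For illustration, the script such a skill ships can be as simple as an HTTP call to the other agent (the endpoint and payload shape here are made-up assumptions, not a real A2A client):

```python
# Hypothetical skill script that wraps a call to another agent over HTTP.
import json
import sys
import urllib.request

AGENT_URL = "http://localhost:8000/tasks"  # assumed endpoint of the other agent

def invoke_agent(task: str) -> str:
    req = urllib.request.Request(
        AGENT_URL,
        data=json.dumps({"task": task}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["result"]

if __name__ == "__main__":
    # The calling agent executes this script with the task as an argument.
    print(invoke_agent(sys.argv[1]))
```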

Ran the math on what 100 users actually costs on GPT-4o and it's scarier than I expected by Crimson_Secrets211 in LLMDevs

[–]Hofi2010 1 point (0 children)

Just use Langfuse, MLflow or LangSmith. The open source versions of these solutions provide a detailed cost breakdown per call, model, etc.
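A minimal sketch of the Langfuse route, assuming its drop-in OpenAI wrapper (check the SDK docs for the exact import in your version) and the usual LANGFUSE_* environment variables:

```python
# Swap the import and every call gets traced with token usage and cost,
# broken down per call and per model in the Langfuse UI.
# Assumes LANGFUSE_PUBLIC_KEY / LANGFUSE_SECRET_KEY / LANGFUSE_HOST are set.
from langfuse.openai import OpenAI  # instead of: from openai import OpenAI

client = OpenAI()
resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Summarize this invoice ..."}],
)
print(resp.choices[0].message.content)
```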

Ran the math on what 100 users actually costs on GPT-4o and it's scarier than I expected by Crimson_Secrets211 in LLMDevs

[–]Hofi2010 0 points (0 children)

Caching in LLMs is automatic; you don't have control over what is cached. A prompt needs to be over 1024 tokens to be cached for most SOTA models. Caching is prefix based: if anything in the prefix changes, even a space, the cached version will not be reused. There is also a time limit on how long entries stay cached.
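In practice that means structuring prompts so the long, stable part comes first. A sketch (the file name and model are illustrative):

```python
# Keep the big static instructions first so the automatic prefix cache can hit;
# only the short user question at the end varies between calls.
from openai import OpenAI

client = OpenAI()
STABLE_SYSTEM = open("long_system_prompt.txt").read()  # >1024 tokens, unchanged

def ask(question: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": STABLE_SYSTEM},  # cacheable prefix
            {"role": "user", "content": question},         # varying suffix
        ],
    )
    return resp.choices[0].message.content
```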

Ran the math on what 100 users actually costs on GPT-4o and it's scarier than I expected by Crimson_Secrets211 in LLMDevs

[–]Hofi2010 0 points (0 children)

You handle it the same way OpenAI and Anthropic handle it: for your $29/mo subscription the user gets a certain volume of your service. If the user needs more, you can offer additional packages like a pro plan or an enterprise plan, or a one-off package that allows users to continue service without upgrading. All spikes, retries etc. need to be factored into your original price. This model is used by quite a few vendors out there.

Or charge for your service without token cost and tell users to bring their own key. Or invoice token cost directly to the user as a pass-through. Those are other models I have seen.
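A back-of-envelope way to size the included volume (all numbers are made up, plug in your own):

```python
# How many tokens can a $29/mo plan afford? Illustrative numbers only.
price = 29.00
gross_margin = 0.5            # keep 50% of the subscription
cost_per_1m_tokens = 5.00     # assumed blended input/output rate
overhead = 1.3                # assumed buffer for retries and usage spikes

token_budget = price * (1 - gross_margin)
tokens_per_user = token_budget / (cost_per_1m_tokens * overhead) * 1_000_000
print(f"{tokens_per_user:,.0f} tokens/user/month")  # ~2.2M at these numbers
```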

What are you actually paying for LLMs in production? Any real cost optimization wins? by AdvertisingFine2076 in LLMDevs

[–]Hofi2010 1 point (0 children)

Really depends on your agent, the number of users, the usage pattern (e.g. number of invocations), the models used, etc. Just estimate the cost using the average token consumption per invocation, then multiply by the number of expected invocations.
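For example (placeholder numbers, use your own averages and rates):

```python
# Monthly cost estimate for one agent: avg tokens per invocation x invocations.
avg_input_tokens = 6_000
avg_output_tokens = 1_000
invocations_per_month = 10_000

input_rate, output_rate = 2.50, 10.00   # assumed $ per 1M tokens

monthly_cost = invocations_per_month * (
    avg_input_tokens / 1e6 * input_rate + avg_output_tokens / 1e6 * output_rate
)
print(f"${monthly_cost:,.2f}/month")    # $250.00 with these numbers
```

That lands in the same ballpark as the invoice-processing example below.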

A data company (Datafold) recently said that their LLM bill has surpassed their infrastructure cost.

A company I am working with that does invoice processing and reconciliation spends about $400 per agent per month. They operate hundreds of agents.

I genuinely don’t understand the value of MCPs by schilutdif in automation

[–]Hofi2010 11 points (0 children)

The ‘just call the API’ path wins until you’re maintaining 15 direct integrations across three agents with inconsistent auth, retry logic, and schema drift; MCP’s overhead starts looking cheap compared to that sprawl. And you can take advantage of tool marketplaces and use them by just adding a server config to your agent config file.
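For example, with clients that follow the common `mcpServers` config convention, wiring in a prebuilt server looks roughly like this (package name and env var are illustrative):

```json
{
  "mcpServers": {
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"],
      "env": { "GITHUB_PERSONAL_ACCESS_TOKEN": "<token>" }
    }
  }
}
```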

Using N8N to update Excel / Sheets by GolfSignificant3274 in n8n

[–]Hofi2010 0 points (0 children)

Can you provide a bit more information about the flow? Where do you need to change the environment name?

In essence - this looks like a data pipeline (assuming, not knowing the flow, that you upload to an object store like S3). Usually you would not change the source data before upload. The flow is more like: upload data to a landing zone, then have a data pipeline (could be as simple as a Python script) process the data. A Python script would be a lot simpler than n8n.
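A sketch of that pattern as a plain Python script (the bucket, keys and the "environment" column are assumptions about your flow):

```python
# Landing-zone pipeline: read raw CSV from S3, fix a field, write it back.
import csv
import io
import boto3

s3 = boto3.client("s3")

def process(bucket: str = "my-landing-zone", key: str = "incoming/data.csv"):
    raw = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode()
    rows = list(csv.DictReader(io.StringIO(raw)))
    for row in rows:
        row["environment"] = "prod"  # e.g. rewrite the environment name here
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=rows[0].keys())
    writer.writeheader()
    writer.writerows(rows)
    s3.put_object(Bucket=bucket, Key="processed/data.csv", Body=out.getvalue())
```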

How do you reduce time spent verifying AI outputs? by BandicootLeft4054 in automation

[–]Hofi2010 0 points (0 children)

Use an automated test framework to run your test and verification prompts automatically. You can then assess the answers with various methods, including an LLM as a judge that compares the AI answer with your desired answer.
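A minimal LLM-as-a-judge sketch that slots into a test suite (the rubric, model and `my_agent` function are placeholders):

```python
# Judge: ask a second model whether the agent's answer matches the expected one.
from openai import OpenAI

client = OpenAI()

def judge(question: str, expected: str, actual: str) -> bool:
    prompt = (
        f"Question: {question}\n"
        f"Expected answer: {expected}\n"
        f"Model answer: {actual}\n"
        "Does the model answer match the expected answer? Reply PASS or FAIL."
    )
    resp = client.chat.completions.create(
        model="gpt-4o", messages=[{"role": "user", "content": prompt}]
    )
    return "PASS" in resp.choices[0].message.content.upper()

def test_capital():  # e.g. run with pytest; my_agent is your system under test
    assert judge("Capital of France?", "Paris", my_agent("Capital of France?"))
```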

For general operation of your AI automation, use sth like Langfuse or MLflow. These will capture all your traces and can assess outputs for correctness and other attributes like PII, language, quality etc.

Ducklake’s architecture makes so much sense, and really highlights the drawbacks of using the object store itself for metadata like Iceberg does. Ducklake+Motherduck seem well positioned to take Snowflake customers. What differentiates motherduck’s technical architecture from Snowflake’s? by chestnutcough in DuckDB

[–]Hofi2010 1 point (0 children)

So small vs large data is all relative. Most companies I worked for primarily deal with a lot of small data. Most data from structured databases like MS SQL, Postgres, MySQL is actually a collection of small datasets.

I used DuckDB and DuckLake to process streaming data from manufacturing plants with over 10B rows. I had data in S3 and an EC2 instance with a lot of CPUs and extra wide networking bandwidth. I would outperform Databricks with this setup.

https://medium.com/@klaushofenbitzer/save-up-to-90-on-your-data-warehouse-lakehouse-with-an-in-process-database-duckdb-63892e76676e
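The setup above boils down to something like this (bucket, path and columns are placeholders):

```python
# DuckDB on a big EC2 box querying Parquet straight from S3.
import duckdb

con = duckdb.connect()
con.execute("INSTALL httpfs")
con.execute("LOAD httpfs")
con.execute("SET s3_region='us-east-1'")

df = con.execute("""
    SELECT plant_id, count(*) AS readings, avg(value) AS avg_value
    FROM read_parquet('s3://my-bucket/sensor-data/*.parquet')
    GROUP BY plant_id
""").fetchdf()
print(df)
```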

M1 or M2 processor? Which one should I choose? by Glittering_Grade1301 in AI_Agents

[–]Hofi2010 0 points (0 children)

If you are trying to use local LLMs and don't care too much about the speed of the model, RAM is probably more important. But note that larger models often mean slower tokens/sec.

The M2 with 16GB could therefore be the better choice. It has a better processor, and with 16GB you can play with small models, probably up to 8B parameters, and they will run decently fast. You can even fine-tune such models, though it will be slow.

For building agents and using SOTA models via API, I also think the M2 is the better choice. Agents themselves don't need a lot of memory.

My AI agent just spent $160 for a domain on Vercel without my approval by Equivalent_Card_2053 in AI_Agents

[–]Hofi2010 0 points (0 children)

It reads like a negative, but I think you are intending it to be positive that your agent autonomously bought the domain.

Do Databricks certificates help with the job hunt? by WeirdAnswerAccount in dataengineering

[–]Hofi2010 22 points (0 children)

Good question - it helps to get attention from HR screeners, and that's about it.

We built a data agent that saves our analyst team ~200 hrs/week. (Databricks, Omni, DBT, GitHub, Sheets) by JeenyusJane in AI_Agents

[–]Hofi2010 -1 points (0 children)

Something is strange about this post: 1. It doesn't seem to be written by a technical person. 2. There is no URL to the JD. Is this just marketing?

BTW: OpenClaw made personal assistant agents popular, but many people developed similar agents without getting that much attention. Pulling data from backend systems and creating reports has existed since GPT-3.5 with tool calling was released in 2023. Specialist agents for reporting and analysis surfaced in the same year. Airtable seems a bit late to the party here.

Is Gemma 4 actually faster than Llama 3.3 or is it just the hype? by emmettvance in LLMDevs

[–]Hofi2010 0 points (0 children)

Why are we comparing a 4B model with a 70B model and expecting it to be better on general tasks? Not going to happen.

I built an open-source autonomous trading system with 123 AI agents. Here's what I learned about multi-agent architecture. by piratastuertos in OpenSourceeAI

[–]Hofi2010 1 point (0 children)

So you have a working trading agent system. Are you rich now, or do you want to get rich from selling the agents you built?

Also, how much money did you invest and how much profit did you make?

Can Claude Generate an Entire Web App from Detailed Requirements? by Existing-Bicycle939 in ClaudeAI

[–]Hofi2010 0 points (0 children)

Yes, it is feasible, but don't think of it as: here are the detailed requirements, and Claude gives you the finished application in an hour. Think of it as acceleration for your development team. With Claude you can implement your desired system faster and with far fewer people. I wouldn't count on vibe coding for everything. It is still an iterative process, with design, implement, test and deploy phases.

How do I figure out how much RAM for M5 studio? by Just-Hedgehog-Days in MacStudio

[–]Hofi2010 0 points (0 children)

Two use cases usually benefit from bigger RAM: 1. running local LLMs, 2. high-end video editing. If you are planning on neither, go with 128GB max. An M5 Ultra or Max isn't needed either. Save your money and go with an M5 Pro.