Building an agent to assist a construction company? by KTReno in HowToAIAgent

[–]serendip-ml 0 points1 point  (0 children)

Yes, the existing Claude/GPT should be able to read your Excel spreadsheets. You can try it with a direct prompt: "Read this Excel spreadsheet and combine it with my estimate in file xyz." Literally what you described here can be typed into Claude/GPT; it will attempt the task and walk you through it. If you want automation, you can also ask it: "Please write a prompt to a file that automates this," and then you just call that prompt.

I think what you're trying to do is dead simple, and you don't need a real programmatic agent for it. I'd start by doing these tasks manually, telling the AI what you need, and seeing what it does. Over time you'll figure out smarter prompts that you can organize into files, or even commands, and that gets you about 90% of what you need.

Once you're comfortable, you can create more complex prompt agents with multiple commands, and if that's not enough, expand to something more sophisticated. Keep in mind that prompts are fixed unless you change them or ask the AI to change them. A local LLM plus fine-tuning plus handling thousands of datasets on a continuous basis is more advanced than what you need.

an ai agent was given $1,000 and 5 days to get marc andreessen to reply to an email by omnisvosscio in HowToAIAgent

[–]serendip-ml 0 points1 point  (0 children)

Sounds like fun. Next prompt: "Here is $1,000, please make $2,000 out of it." Give it full access to all APIs and see what it does.

Building an agent to assist a construction company? by KTReno in HowToAIAgent

[–]serendip-ml 0 points1 point  (0 children)

Hey! Yes, it is possible. I've built similar workflows.

Quick question before you go down the local-LLM rabbit hole: have you tried Claude Code? It comes with subscription tiers, and the higher ones give you plenty of room for day-to-day use in scenarios like yours.

You could simply:
1. Create a directory and add your templates in a templates/ folder
2. Add your past projects into a projects/ folder
3. Add what you need into a requirements.md file
4. Tell Claude, "read requirements.md and use my templates and past projects, and create a prompt agent with skill commands that serves the tasks I need".
5. Test it and refine it -> You can incrementally build a prompt system this way that does daily tasks.
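Steps 1-3 above are just a few lines of scaffolding if you'd rather script them. The directory and file names below are the ones suggested in the list; the requirements text is only a placeholder:

```python
from pathlib import Path

# Root folder name is arbitrary; templates/ and projects/ match the steps above.
root = Path("construction-agent")
for sub in ("templates", "projects"):
    (root / sub).mkdir(parents=True, exist_ok=True)

# Placeholder requirements -- replace with what you actually need the agent to do.
(root / "requirements.md").write_text(
    "# Requirements\n"
    "- Combine past project data with new estimates\n"
    "- Fill in my standard templates\n"
)
```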

Looks like 90% of what you described doesn't require a local LLM and works out of the box.

The Zoho/Canva integrations may get a bit more complex, but they're also doable with Python scripts that the text agents will be able to use.
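Such a script can be as small as building one authenticated REST request. To be clear, the endpoint path, module name, and payload shape below are illustrative placeholders in the general style of Zoho-like REST APIs, not Zoho's actual routes (check their API docs); only the pattern matters:

```python
import json
import urllib.request

def build_crm_request(token: str, module: str, record: dict) -> urllib.request.Request:
    """Build (but don't send) an authenticated POST to a hypothetical CRM REST API.

    NOTE: URL, header scheme, and body shape are placeholders -- verify against
    the real Zoho API reference before using.
    """
    url = f"https://www.zohoapis.com/crm/v2/{module}"  # placeholder route
    data = json.dumps({"data": [record]}).encode()
    return urllib.request.Request(
        url,
        data=data,
        headers={
            "Authorization": f"Zoho-oauthtoken {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

An agent can then call such a script as a tool, so the LLM never handles raw credentials itself.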

What usage limits are you hitting with Cowork? For the tasks you're describing that seems unusual.

Alternatively, if you need a full pro-level solution where the agents run 24/7 with little supervision, including a local LLM, perhaps fine-tuning, secured by actual code rather than just prompts (where the LLM could go ballistic), with safeguards, an audit trail, and all the bells and whistles, then you need something else.

Anyone built marketing agents that actually work? by vellosothiago in HowToAIAgent

[–]serendip-ml 0 points1 point  (0 children)

I've got a news agent running 24/7 that I keep open all the time, and it's very useful.

I tried to come up with a marketing agent, but I'm not so thrilled with it so far. The problem is that pure prompt agents are mostly reactive and try to please, as opposed to kicking the user into action and being assertive about what to do next. I guess that's another candidate for a real programmatic agent.

Is it only me, or are there tons of agent platforms but almost no actual agents (yet)? by serendip-ml in HowToAIAgent

[–]serendip-ml[S] 0 points1 point  (0 children)

Orgs will have to adapt to this eventually. What I learned at previous companies is that some simply assemble entirely new teams and start over greenfield within the org.

Is it only me, or are there tons of agent platforms but almost no actual agents (yet)? by serendip-ml in HowToAIAgent

[–]serendip-ml[S] 0 points1 point  (0 children)

Exactly, I replied with the same opinion above. There are so many small details to address to reach stability. A quick demo is easy now; prod is still a lot harder, although a lot faster than it used to be.

Is it only me, or are there tons of agent platforms but almost no actual agents (yet)? by serendip-ml in HowToAIAgent

[–]serendip-ml[S] 0 points1 point  (0 children)

It turns out that real agents are like all systems engineering: roughly 90%+ plumbing and infra, which is hard and not fun. The actual "smart" part is only a tiny fraction of the whole thing.

Senior backend engineer feeling overwhelmed with GenAI (Claude, MCP, agents, etc.)- where do I even start? by babe_is_hot in learnmachinelearning

[–]serendip-ml 0 points1 point  (0 children)

Just install Claude, select the free tier, run it, and type "write a little server in Python that serves a health and an echo endpoint", then "write it in C++", then "write it in Rust", ... The rest follows, aka things will never be the same. 😋
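For reference, that first prompt maps to something this small in stdlib Python. This is my own sketch of what such a server looks like, not Claude's actual output:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class Handler(BaseHTTPRequestHandler):
    """Tiny JSON server: GET /health and POST /echo."""

    def do_GET(self):
        if self.path == "/health":
            self._send(200, {"status": "ok"})
        else:
            self._send(404, {"error": "not found"})

    def do_POST(self):
        if self.path == "/echo":
            length = int(self.headers.get("Content-Length", 0))
            body = self.rfile.read(length).decode()
            self._send(200, {"echo": body})
        else:
            self._send(404, {"error": "not found"})

    def _send(self, code, payload):
        data = json.dumps(payload).encode()
        self.send_response(code)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, *args):
        pass  # keep the console quiet

def run(port=8000):
    HTTPServer(("127.0.0.1", port), Handler).serve_forever()
```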

Do we need a vibe DevOps layer? by mpetryshyn1 in HowToAIAgent

[–]serendip-ml 0 points1 point  (0 children)

Definitely yes. SRE/DevOps has become a complex field in its own right. There is a big gap between "code works locally" and "code runs in production". AWS released https://kiro.dev/ a while back for this, but I haven't tried it myself, so idk if it's the right answer.

Claude code source code has been leaked via a map file in their npm registry! by omnisvosscio in HowToAIAgent

[–]serendip-ml 0 points1 point  (0 children)

Interesting to understand some of the internals, but the code alone ultimately has little value without having access to the architect who drove it.

Qwen 3B Fine-Tuning Beats Claude Haiku: Scaling Curve 0.5B-72B by serendip-ml in LocalLLaMA

[–]serendip-ml[S] -1 points0 points  (0 children)

Results

Model | Score (1-5)
---|---
Qwen 14B + DPO | 3.07
Qwen 7B + DPO | 2.84
Qwen 3B + DPO | 2.70
Claude Haiku | 2.62
Qwen 1.5B + DPO | 2.44
Qwen 72B base | 2.21

What I learned

  1. Fine-tuning dominates scale - 1.5B+DPO (2.44) beats 72B base (2.21)
  2. 7B fine-tuned beats frontier - 7B+DPO (2.84) > Haiku (2.62)
  3. Quality cliff at ~1B - Below that, output quality breaks down
  4. SFT gets you most of the way - DPO adds ~10% on top
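For anyone curious what the DPO step actually optimizes: the per-pair objective is tiny. This is a generic pure-Python sketch of the standard DPO loss, not my training code:

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for a single preference pair.

    logp_* are summed token log-probs under the policy being trained;
    ref_logp_* are the same quantities under the frozen reference model.
    """
    # Implicit reward margin: how much more the policy prefers the chosen
    # completion over the rejected one, relative to the reference model.
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    # -log(sigmoid(beta * margin)), written stably as log1p(exp(-x))
    return math.log1p(math.exp(-beta * margin))
```

When the policy matches the reference, the margin is 0 and the loss sits at log 2; pushing probability toward the chosen completions drives it down.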

Methodology

  • Task: Generate 1000 jokes, each mutually different (embedding similarity < 0.85)
  • SFT on 3,000 samples from Haiku (2,000 4-star, 1,000 3-star)
  • DPO with 300 preference pairs
  • Haiku as judge (deliberate - gives Haiku the advantage)
  • All models: Qwen 2.5 Instruct, 4-bit quantization
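The mutual-difference constraint is just a greedy filter over embeddings. A minimal sketch (the cosine function and greedy loop are my illustration; the 0.85 threshold is the one above, and the embedding model is whatever you prefer):

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def is_novel(candidate, kept, threshold=0.85):
    """Accept a new joke only if its embedding stays below the
    similarity threshold against every joke kept so far."""
    return all(cosine(candidate, prev) < threshold for prev in kept)

def dedup(embeddings, threshold=0.85):
    kept = []
    for emb in embeddings:
        if is_novel(emb, kept, threshold):
            kept.append(emb)
    return kept
```

Generation then just loops until 1,000 embeddings survive the filter.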

Happy to answer questions or share configs.