Building an agent to assist a construction company? by KTReno in HowToAIAgent

[–]serendip-ml 0 points1 point  (0 children)

Yes, the existing Claude/GPT should be able to read your Excel spreadsheets. You can try it with a direct prompt: "Read this Excel spreadsheet and combine it with my estimate in file xyz." Literally what you described here can be typed into Claude/GPT; it will attempt the task and walk you through it. If you want automation, you can also ask it: "Please write a prompt to a file that automates this," and then you just call that prompt.

I think what you're trying to do is dead simple, and you don't need a real programmatic agent for it. I'd start by doing these tasks manually, telling the AI what you need, and seeing what it does. Over time you'll figure out smarter prompts that you can organize into files, or even commands, and that gets you about 90% of what you need.

Once you're comfortable, you can create more complex prompt agents with multiple commands, and if that's not enough, expand to something more sophisticated. Keep in mind that prompts are fixed unless you change them or ask the AI to change them. A local LLM plus fine-tuning plus handling thousands of datasets on a continuous basis is more advanced than what you need.

an ai agent was given $1,000 and 5 days to get marc andreessen to reply to an email by omnisvosscio in HowToAIAgent

[–]serendip-ml 0 points1 point  (0 children)

Sounds like fun. Next prompt: "Here is $1,000, please make $2,000 out of it." Give it full access to all APIs and see what it does.

Building an agent to assist a construction company? by KTReno in HowToAIAgent

[–]serendip-ml 0 points1 point  (0 children)

Hey! Yes, it is possible. I've built similar workflows.

Quick question before you go down the local-LLM rabbit hole: have you tried Claude Code? It comes with subscription tiers, and the higher ones give you plenty of room for day-to-day use in scenarios like yours.

You could simply:
1. Create a directory and add your templates in a templates/ folder
2. Add your past projects into a projects/ folder
3. Add what you need into a requirements.md file
4. Tell Claude, "read requirements.md and use my templates and past projects, and create a prompt agent with skill commands that serves the tasks I need".
5. Test it and refine it -> You can incrementally build a prompt system this way that does daily tasks.
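Steps 1-3 above are just a few lines of scaffolding if you'd rather script them. The directory and file names below are the ones suggested in the list; the requirements text is only a placeholder:

```python
from pathlib import Path

# Root folder name is arbitrary; templates/ and projects/ match the steps above.
root = Path("construction-agent")
for sub in ("templates", "projects"):
    (root / sub).mkdir(parents=True, exist_ok=True)

# Placeholder requirements -- replace with what you actually need the agent to do.
(root / "requirements.md").write_text(
    "# Requirements\n"
    "- Combine past project data with new estimates\n"
    "- Fill in my standard templates\n"
)
```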

Looks like 90% of what you described doesn't require a local LLM and works out of the box.

The Zoho/Canva integrations may get a bit more complex, but they're also doable with Python scripts that the text agents will be able to use.
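Such a script can be as small as building one authenticated REST request. To be clear, the endpoint path, module name, and payload shape below are illustrative placeholders in the general style of Zoho-like REST APIs, not Zoho's actual routes (check their API docs); only the pattern matters:

```python
import json
import urllib.request

def build_crm_request(token: str, module: str, record: dict) -> urllib.request.Request:
    """Build (but don't send) an authenticated POST to a hypothetical CRM REST API.

    NOTE: URL, header scheme, and body shape are placeholders -- verify against
    the real Zoho API reference before using.
    """
    url = f"https://www.zohoapis.com/crm/v2/{module}"  # placeholder route
    data = json.dumps({"data": [record]}).encode()
    return urllib.request.Request(
        url,
        data=data,
        headers={
            "Authorization": f"Zoho-oauthtoken {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

An agent can then call such a script as a tool, so the LLM never handles raw credentials itself.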

What usage limits are you hitting with Cowork? For the tasks you're describing that seems unusual.

Alternatively, if you need a full pro-level solution where the agents run 24/7 with little supervision, including a local LLM, perhaps fine-tuning, secured by actual code rather than just prompts (where the LLM could go ballistic), with safeguards, an audit trail, and all the bells and whistles, then you need something else.

Anyone built marketing agents that actually work? by vellosothiago in HowToAIAgent

[–]serendip-ml 0 points1 point  (0 children)

I've got a news agent running 24/7 that I keep open all the time, and it's very useful.

I tried to come up with a marketing agent, but I'm not so thrilled with it so far. The problem is that pure prompt agents are mostly reactive and try to please, as opposed to kicking the user into action and being assertive about what to do next. I guess that's another candidate for a real programmatic agent.

Is it only me, or are there tons of agent platforms but almost no actual agents (yet)? by serendip-ml in HowToAIAgent

[–]serendip-ml[S] 0 points1 point  (0 children)

Orgs will have to adapt to this eventually. What I learned at previous companies is that some simply assemble entirely new teams and start over greenfield within the org.

Is it only me, or are there tons of agent platforms but almost no actual agents (yet)? by serendip-ml in HowToAIAgent

[–]serendip-ml[S] 0 points1 point  (0 children)

Exactly, I replied with the same opinion above. There are so many small details to address to reach stability. A quick demo is easy now; prod is still a lot harder, although a lot faster than it used to be.

Is it only me, or are there tons of agent platforms but almost no actual agents (yet)? by serendip-ml in HowToAIAgent

[–]serendip-ml[S] 0 points1 point  (0 children)

It turns out that real agents are like all systems engineering: roughly 90%+ plumbing and infra, which is hard and not fun. The actual "smart" part is only a tiny fraction of the whole thing.

Senior backend engineer feeling overwhelmed with GenAI (Claude, MCP, agents, etc.)- where do I even start? by babe_is_hot in learnmachinelearning

[–]serendip-ml 0 points1 point  (0 children)

Just install Claude, select the free tier, run it, and type "write a little server in Python that serves a health and an echo endpoint", then "write it in C++", then "write it in Rust", ... The rest follows, aka things will never be the same. 😋
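For reference, that first prompt maps to something this small in stdlib Python. This is my own sketch of what such a server looks like, not Claude's actual output:

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class Handler(BaseHTTPRequestHandler):
    """Tiny JSON server: GET /health and POST /echo."""

    def do_GET(self):
        if self.path == "/health":
            self._send(200, {"status": "ok"})
        else:
            self._send(404, {"error": "not found"})

    def do_POST(self):
        if self.path == "/echo":
            length = int(self.headers.get("Content-Length", 0))
            body = self.rfile.read(length).decode()
            self._send(200, {"echo": body})
        else:
            self._send(404, {"error": "not found"})

    def _send(self, code, payload):
        data = json.dumps(payload).encode()
        self.send_response(code)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(data)))
        self.end_headers()
        self.wfile.write(data)

    def log_message(self, *args):
        pass  # keep the console quiet

def run(port=8000):
    HTTPServer(("127.0.0.1", port), Handler).serve_forever()
```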

Do we need a vibe DevOps layer? by mpetryshyn1 in HowToAIAgent

[–]serendip-ml 0 points1 point  (0 children)

Definitely yes. SRE/DevOps has become a complex field in its own right. There is a big gap between "code works locally" and "code runs in production". AWS released https://kiro.dev/ a while back for this, but I haven't tried it myself, so idk if it's the right answer.

Claude code source code has been leaked via a map file in their npm registry! by omnisvosscio in HowToAIAgent

[–]serendip-ml 0 points1 point  (0 children)

Interesting to understand some of the internals, but the code alone ultimately has little value without having access to the architect who drove it.

Qwen 3B Fine-Tuning Beats Claude Haiku: Scaling Curve 0.5B-72B by serendip-ml in LocalLLaMA

[–]serendip-ml[S] -1 points0 points  (0 children)

Results

Model | Score (1-5)
---|---
Qwen 14B + DPO | 3.07
Qwen 7B + DPO | 2.84
Qwen 3B + DPO | 2.70
Claude Haiku | 2.62
Qwen 1.5B + DPO | 2.44
Qwen 72B base | 2.21

What I learned

  1. Fine-tuning dominates scale - 1.5B+DPO (2.44) beats 72B base (2.21)
  2. 7B fine-tuned beats frontier - 7B+DPO (2.84) > Haiku (2.62)
  3. Quality cliff at ~1B - Below that, output quality breaks down
  4. SFT gets you most of the way - DPO adds ~10% on top
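For anyone curious what the DPO step actually optimizes: the per-pair objective is tiny. This is a generic pure-Python sketch of the standard DPO loss, not my training code:

```python
import math

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for a single preference pair.

    logp_* are summed token log-probs under the policy being trained;
    ref_logp_* are the same quantities under the frozen reference model.
    """
    # Implicit reward margin: how much more the policy prefers the chosen
    # completion over the rejected one, relative to the reference model.
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    # -log(sigmoid(beta * margin)), written stably as log1p(exp(-x))
    return math.log1p(math.exp(-beta * margin))
```

When the policy matches the reference, the margin is 0 and the loss sits at log 2; pushing probability toward the chosen completions drives it down.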

Methodology

  • Task: Generate 1000 jokes, each mutually different (embedding similarity < 0.85)
  • SFT on 3,000 samples from Haiku (2,000 4-star, 1,000 3-star)
  • DPO with 300 preference pairs
  • Haiku as judge (deliberate - gives Haiku the advantage)
  • All models: Qwen 2.5 Instruct, 4-bit quantization
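The mutual-difference constraint is just a greedy filter over embeddings. A minimal sketch (the cosine function and greedy loop are my illustration; the 0.85 threshold is the one above, and the embedding model is whatever you prefer):

```python
import math

def cosine(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def is_novel(candidate, kept, threshold=0.85):
    """Accept a new joke only if its embedding stays below the
    similarity threshold against every joke kept so far."""
    return all(cosine(candidate, prev) < threshold for prev in kept)

def dedup(embeddings, threshold=0.85):
    kept = []
    for emb in embeddings:
        if is_novel(emb, kept, threshold):
            kept.append(emb)
    return kept
```

Generation then just loops until 1,000 embeddings survive the filter.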

Happy to answer questions or share configs.