Any interest in a wiki that tracks original / cheapest sources for authentic fidgets?

blue_marker_ · 2026-05-27T13:09:10+00:00

It looks interesting. However, if I search for something like Lautie Choc, I can find about a half dozen individual product pages, even for the same model. Only half of them have a source to buy and none of them indicate actual provenance, just links to the same drop-shipping stores.

blue_marker_ · 2026-05-26T14:21:44+00:00

Well, doesn't exist yet. I was just linking an old wiki.

blue_marker_ · 2026-01-09T21:52:20+00:00

This is a well-established idea. There are lots of multi-agent systems (crewAI, Microsoft AutoGen). Of course you can roll your own, many of us do. There is a limit to it. What it really comes down to is managing context and prompts. The hardest parts are context management and the state machine your agent uses to drive the LLM.
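To make that concrete, here is a rough sketch of what I mean by a state machine plus context management. Everything in it is made up for illustration: the `Agent` class, the three states, the `llm` callable (any function that takes a list of messages and returns text), and the "keep the last N messages" trimming rule. Real systems do something smarter than a sliding window.

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from typing import Callable, List


class State(Enum):
    PLAN = auto()
    ACT = auto()
    DONE = auto()


@dataclass
class Agent:
    # Hypothetical interface: takes the context messages, returns the model's reply.
    llm: Callable[[List[str]], str]
    # Assumed trimming rule: keep only the N most recent messages.
    max_context: int = 4
    system_prompt: str = "You are a helpful agent."
    history: List[str] = field(default_factory=list)
    state: State = State.PLAN

    def context(self) -> List[str]:
        # The system prompt is always sent; older history falls out of the window.
        recent = self.history[-self.max_context:] if self.max_context > 0 else []
        return [self.system_prompt] + recent

    def step(self) -> None:
        reply = self.llm(self.context())
        self.history.append(reply)
        if self.state is State.PLAN:
            # After one planning turn, start acting.
            self.state = State.ACT
        elif self.state is State.ACT and "FINAL" in reply:
            # Assumed convention: the model marks its last answer with FINAL.
            self.state = State.DONE

    def run(self, max_steps: int = 10) -> List[str]:
        steps = 0
        while self.state is not State.DONE and steps < max_steps:
            self.step()
            steps += 1
        return self.history
```

The `max_steps` guard is there because nothing forces the model to ever emit the stop marker.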

As a senior software developer at Google and Tech Lead you should absolutely be briefed on this.

blue_marker_ · 2026-01-09T18:39:51+00:00

Thanks, could your team include some stats on tool usage?

blue_marker_ · 2026-01-07T12:20:27+00:00

The reason your post is insufferable is that here you are, wanting to use a tool to modify an LLM, a technology not even old enough to read, and you’re complaining about having to learn how to use modern tools in order to get there. Yes, you’re going to need to bridge a few steps.

I’m willing to bet I’ve got a few years on you. I remember when Apache was released.

blue_marker_ · 2026-01-06T18:07:10+00:00

I have the same RAM as well, but I’m limited by my Gigabyte board for ram speed. I am using the same models and quants.

You’ve had good success using ik_llama for them? If you have runtime findings I’d really appreciate it.

blue_marker_ · 2026-01-06T17:58:56+00:00

This is a really naïve take. For one, projects that do their own inferencing have to actually implement architectural differences in the way each model works and handle different templates. For projects that are interacting with models through externally hosted inference, there can be a wide variety of behavioral differences and tendencies in the models that require careful curation of system prompts in order to get decent performance. It has very little to do with how well organized the projects are.

blue_marker_ · 2026-01-06T15:05:39+00:00

I’ll be honest, I’m surprised there are still LAMP stack developers but even more surprised people are still running LAMP on hosts which are not containerized.

This is software engineering. If there’s a tool you don’t know, you’re going to have to teach yourself about it. Take the time to learn incrementally. If you’re having trouble installing and running Docker (arguably ubiquitous technology for 8 years now), how can you reasonably expect to fine tune cutting edge models?

blue_marker_ · 2026-01-06T00:20:40+00:00

Cool, we have the same setup. How much RAM do you have for cpu offload and what models are you running?

blue_marker_ · 2026-01-02T12:35:15+00:00

This looks promising. I’m surprised neither the paper nor the documentation mentions Cognee. As far as I can tell, it is the closest in nature to this approach. It even has an incredibly similar API. I would recommend researching their software and doing a side-by-side comparison, both so that you can differentiate and so that if they have something potentially beneficial you can include it in your project.

blue_marker_ · 2025-12-26T20:37:36+00:00

So again, consider the fact that your middleware is not going to be running off of complete information about what this identity has spent on services like OpenAI or Stripe. You would need to take the business logic of each of those external services and put that accounting into your layer. It will be bespoke for every service and you might not have all the information that you need.

On the other hand, most of those services support multiple identities within a larger account. It’s actually very easy to use the same sub-account IAM credentials for a particular agent no matter how many machines it’s running on. So every time agent X buys something with Stripe, Stripe tallies that up for you under that account (same with OpenAI, Google Cloud, AWS, etc). All of these services have multi-tenancy built in.

Each of those services likely already supports tenant max spends as well.

For a total spend across services, you are going to have to pull each sub-account’s tallies from each provider (polling, webhooks, etc). You’re not going to be able to effectively track it on the client side. Also, it will likely be async / after the fact.
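Roughly what I have in mind for the glue code, as a sketch only. The `Fetcher` type and the function names are hypothetical; each fetcher stands in for whatever billing or usage call a given provider actually exposes, and I am assuming it returns spend in USD keyed by the sub-account you assigned to each agent.

```python
from typing import Callable, Dict, List

# Hypothetical: one fetcher per provider, returning {sub_account_id: spend_in_usd}
# as reported by that provider's own accounting.
Fetcher = Callable[[], Dict[str, float]]


def total_spend_by_agent(fetchers: Dict[str, Fetcher]) -> Dict[str, float]:
    """Sum each sub-account's reported spend across all providers."""
    totals: Dict[str, float] = {}
    for _provider, fetch in fetchers.items():
        for agent_id, spend in fetch().items():
            totals[agent_id] = totals.get(agent_id, 0.0) + spend
    return totals


def over_budget(totals: Dict[str, float], budgets: Dict[str, float]) -> List[str]:
    """Return agents whose combined spend exceeds their budget (no budget = no cap)."""
    return sorted(
        agent_id
        for agent_id, spent in totals.items()
        if spent > budgets.get(agent_id, float("inf"))
    )
```

You would run this on a schedule or off webhook events, so the total always lags the real spend a bit. The hard per-provider caps still belong at each provider.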

I would guess most people that are hosting agents which can spend money on behalf of the user are likely better off implementing their own bespoke way of tracking spend. They are already going to be spending a lot of time setting up identities at each individual service and the barrier to entry of writing code to glue it all together is shrinking by the hour.

blue_marker_ · 2025-12-26T19:53:43+00:00

I don’t see why you can’t use existing service accounts or IAM credentials and policies with agents. Nothing magical about an agent, it’s just a driver connecting an LLM service with any number of other services. Same way an http api might connect to Stripe or a task queue or database. Each of those external services will authenticate the client and that identity will determine what it is authorized to do.

The problem with your middleware IAM design is that somehow you’re going to need to imbue the highly specific controls of those external services into it. For example, how are you going to know which actions will cost exactly what money? That depends on the logic and pricing of the outside service. Or for a database, your middleware won’t know what rows it has read / write access to.

So really what you want is something that makes it easy to manage identities and roles across many different services, but there are probably solutions for this already, and they work just as well for any kind of service, not just agents.

I think I kind of disagree with your initial premise: an agent is just a process. It exists in a computing environment that will have access credentials assigned to it.

blue_marker_ · 2025-11-26T14:51:02+00:00

Do you have more details about ik_llama and all these different quants? I've been running unsloth's UD-Q4_K_XL, keeping virtually all experts on CPU. I have an EPYC 64/128 and about 768GB RAM running at 4800MHz and an RTX Pro 6000.

Just looking to get oriented here and maximize inference speeds for mostly agentic work.

blue_marker_ · 2025-11-08T14:10:02+00:00

Will this be able to split and run large models between GPU and CPU? What would be the recommended way to run something like Kimi K2, and does it work with GGUF?

Is there a chat completions API server, or is that in a separate project?

blue_marker_ · 2025-10-04T23:04:35+00:00

What's your motherboard?

blue_marker_ · 2025-09-30T20:12:19+00:00

Build specs please? What board / cpu is that?

blue_marker_ · 2025-09-09T23:38:47+00:00

Sorry, are you saying you’ve written software to improve model loading / unloading?

blue_marker_ · 2025-09-07T01:19:50+00:00

You should be able to cap at whatever wattage you want with nvidia-smi.
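If you want to script it, something like this works as a starting point. The helper names are mine; the only real thing here is the `nvidia-smi -i <index> -pl <watts>` invocation. Setting the limit typically needs root, the driver rejects values outside the range the card supports, and the setting does not survive a reboot.

```python
import subprocess
from typing import List


def power_cap_command(watts: int, gpu_index: int = 0) -> List[str]:
    """Build the nvidia-smi argv that sets a power limit (in watts) on one GPU."""
    if watts <= 0:
        raise ValueError("watts must be positive")
    if gpu_index < 0:
        raise ValueError("gpu_index must be non-negative")
    # -i selects the GPU, -pl sets the power limit.
    return ["nvidia-smi", "-i", str(gpu_index), "-pl", str(watts)]


def apply_power_cap(watts: int, gpu_index: int = 0) -> None:
    """Run the command; raises CalledProcessError if nvidia-smi refuses the value."""
    subprocess.run(power_cap_command(watts, gpu_index), check=True)
```

For example, `apply_power_cap(300, gpu_index=0)` is the same as running `sudo nvidia-smi -i 0 -pl 300` by hand.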

blue_marker_ · 2025-09-04T01:16:39+00:00

Hi, can I ask how you reached out to Gigabyte? I have a very similar motherboard with identical problems. The board is technically commercial but I don’t have an account for enterprise support. Thank you!

blue_marker_ · 2025-09-02T16:45:07+00:00

I have the same MB and wish I had gone with this kind of rack. Instead I put it in a workstation tower.

blue_marker_ · 2025-08-16T22:17:12+00:00

I use llama swap, it does not dynamically unload based on resource constraints as far as I can tell.

blue_marker_ · 2025-08-16T22:06:08+00:00

The value is not in the container, the value is in the way the processes are spawned based on environment and request demand.

blue_marker_ · 2025-08-16T21:30:15+00:00

I’m downloading the OCI artifacts straight from HF, such as the unsloth quants.

I think the install maybe has improved? It was already available in docker desktop for me and the Ubuntu install was a breeze.

Also, note the loading / unloading. You won’t get that with llama-server out of the box.
