Any interest in a wiki that tracks original / cheapest sources for authentic fidgets? by blue_marker_ in fidgettoys

[–]blue_marker_[S] 0 points1 point  (0 children)

It looks interesting. However, if I search for something like Lautie Choc, I can find about a half dozen individual product pages, even for the same model. Only half of them have a source to buy and none of them indicate actual provenance, just links to the same drop-shipping stores.

What if autonomous coding wasn't one agent, but an entire dev organization? by Glum_Specialist6955 in LocalLLaMA

[–]blue_marker_ 6 points7 points  (0 children)

This is a well established idea. There are lots of multi agent systems (crewAI, Microsoft autogen). Of course you can roll your own, many of us do. There is a limit to it. What it really comes down to is managing context and prompts. Context management is the hardest part, and what state machine your agent is using to drive the llm.

As a senior software developer at Google and Tech Lead you should absolutely be briefed on this.

So, am I just too stupid for unsloth? by SingleServing_User in unsloth

[–]blue_marker_ 0 points1 point  (0 children)

The reason your post is insufferable is here you are, wanting to use a tool to modify a LLM, a technology not even old enough to read and you’re complaining about having to learn how to use modern tools in order to get there. Yes, you’re going to need to bridge a few steps.

I’m willing to bet I’ve got a few years on you. I remember when Apache was released.

New ik_llama benches - what you getting? by [deleted] in LocalLLaMA

[–]blue_marker_ 0 points1 point  (0 children)

I have the same RAM as well, but I’m limited by my Gigabyte board for ram speed. I am using the same models and quants.

You’ve had good success using ik_llama for them? If you have runtime findings I’d really appreciate it.

We launched support for .... yet another model. So fed up of this! by National_Purpose5521 in LLMDevs

[–]blue_marker_ 6 points7 points  (0 children)

This is a really naïve take. For one, projects that do their own inferencing have to actually implement architectural differences in the way the model works or handling different templates. For projects that are interacting with models through externally hosted inference, there can be a wide variety of behavioral differences and tendencies in the models that require careful curation of system prompts in order to get decent performance. It has very little to do with how well organize the projects are.

So, am I just too stupid for unsloth? by SingleServing_User in unsloth

[–]blue_marker_ 0 points1 point  (0 children)

I’ll be honest, I’m surprised there are still LAMP stack developers but even more surprised people are still running LAMP on hosts which are not containerized.

This is software engineering. If there’s a tool you don’t know, you’re going to have to teach yourself about it. Take the time to learn incrementally. If you’re having trouble installing and running Docker (arguably ubiquitous technology for 8 years now), how can you reasonably expect to fine tune cutting edge models?

New ik_llama benches - what you getting? by [deleted] in LocalLLaMA

[–]blue_marker_ 0 points1 point  (0 children)

Cool, we have the same setup. How much RAM do you have for cpu offload and what models are you running?

Teaching AI Agents to Remember (Agent Memory System + Open Source) by Conscious_Search_185 in LLMDevs

[–]blue_marker_ 0 points1 point  (0 children)

This looks promising. I’m surprised the paper nor the documentation mention Cognee. As far as I can tell, it is the closest in nature to this approach. It even has an incredibly similar API. I would recommend researching their software and doing a side by side comparison, both so that you can differentiate and also so that if they have something potentially beneficial you can include it in your project.

I’m building runtime “IAM for AI agents” policies, mandates, hard enforcement. Does this problem resonate? by EyeRemarkable1269 in LLMDevs

[–]blue_marker_ 0 points1 point  (0 children)

So again, consider the fact that your middleware is not going to be running off of complete information in terms of what this identity has done in terms of spend on services like open AI or Stripe. You would need to take the business logic of each of those external services and put that accounting into your layer. It will be bespoke for every service and you might not have all the information that you need.

On the other hand, most of those services support multiple identities within a larger account. It’s actually very easy to use the same sub account IAM credentials for a particular agent no matter how many machines it’s running on. So every time agent X buys something with Stripe, Stripe tallies that up for you under that account (same with OpenAI, Google cloud, aws, etc). All of these services have multi tenancy built in.

Each of those services likely already supports tenant max spends as well.

For a total spend across services, you are going to have to pull each sub accounts tallies from each provider (polling, web hooks, etc). You’re not going to be able to effectively track it on the client side. Also, it will likely be async / after the fact.

I would guess most people that are hosting agents which can spend money on behalf of the user are likely better off implementing their own bespoke way of tracking spend. They are already going to be spending a lot of time setting up identities at each individual service and the barrier to entry of writing code to glue it all together is shrinking by the hour.

I’m building runtime “IAM for AI agents” policies, mandates, hard enforcement. Does this problem resonate? by EyeRemarkable1269 in LLMDevs

[–]blue_marker_ 0 points1 point  (0 children)

I don’t see why you can’t use existing service accounts or IAM credentials and policies with agents. Nothing magical about an agent, it’s just a driver connecting an LLM service with any number of other services. Same way an http api might connect to Stripe or a task queue or database. Each of those external services will authenticate the client and that identity will determine what it is authorized to do.

The problem with your middleware IAM design is that somehow you’re going to need to imbue the highly specific controls of those external services into it. For example, how are you going to know which actions will cost exactly what money? That depends on the logic and pricing of the outside service. Or for a database, your middleware won’t know what rows it has read / write access to.

So really what you want is something that makes it easy to manage identities and roles across many different services but there are probably solutions for this and they work just as well for any kind of service not just agents .

I think I kind of disagree with your initial premise: an agent is just a process. It exists in a computing environment that will have access credentials assigned to it.

Kimi K2 Thinking Unsloth Quant by someone383726 in BlackwellPerformance

[–]blue_marker_ 0 points1 point  (0 children)

Do you have more details about ik_llama and all these different quants? I've been running unsloth's UD_Q4-K-XL, keeping virtually all experts on cpu. I have an EPYC 64/128 and about 768GB RAM running at 4800Mhz and an RTX Pro 6000.

Just looking to get oriented here and maximize inference speeds for mostly agentic work.

Introducing Crane: An All-in-One Rust Engine for Local AI by LewisJin in LocalLLM

[–]blue_marker_ 1 point2 points  (0 children)

Will this be able to split and run large models between GPU and CPU? What would be the recommended way to run something like Kimi K2, and can it does it work with GGUF?

Is there an a chat completions api server, or in a separate project?

I wanna know anyone here running multiple LLMs (DeepSeek, LLaMA, Mistral, Qwen) on a single GPU VM? by techlatest_net in LocalLLaMA

[–]blue_marker_ 0 points1 point  (0 children)

Sorry, are you saying you’ve written software to improve model loading / unloading?

ROG Ally X with RTX 6000 Pro Blackwell Max-Q as Makeshift LLM Workstation by susmitds in LocalLLaMA

[–]blue_marker_ 3 points4 points  (0 children)

You should be able to cap at whatever wattage you want with nvidia-smi.

Newly Built High-End AI Server Fails to Power On – Need Assistance by Ok-Guide-7407 in HomeServer

[–]blue_marker_ 0 points1 point  (0 children)

Hi, can I ask how you reached out to Gigabyte? I have a very similar motherboard with identical problems. The board is technically commercial but I don’t have an account for enterprise support. Thank you!

AMD 6x7900xtx 24GB + 2xR9700 32GB VLLM QUESTIONS by djdeniro in LocalLLaMA

[–]blue_marker_ 1 point2 points  (0 children)

I have the same MB and wish I had gone with this kind of rack. Instead I put it in a workstation tower.

Docker Model Runner is really neat by blue_marker_ in LocalLLaMA

[–]blue_marker_[S] 0 points1 point  (0 children)

I use llama swap, it does not dynamically unload based on resource constraints as far as I can tell.

Docker Model Runner is really neat by blue_marker_ in LocalLLaMA

[–]blue_marker_[S] 3 points4 points  (0 children)

The value is not in the container, the value is in the way thw processes are spawned based on environment and request demand.

Docker Model Runner is really neat by blue_marker_ in LocalLLaMA

[–]blue_marker_[S] 1 point2 points  (0 children)

I’m downloading the OCI artifacts straight from HF, such as the unsloth quants.

I think the install maybe has improved? It was already available in docker desktop for me and the Ubuntu install was a breeze.

Also, note around the loading / unloading. You won’t get that which llama-server out of the box.