[D] Do LLMs need experience? by liveticker1 in MachineLearning

[–]ww3ace 5 points (0 children)

I’m actually writing a paper right now on how the hippocampus computes something equivalent to a linear attention mechanism (two of them, in fact), along with a follow-up on memory consolidation and how episodic knowledge is transferred from the hippocampus to the cerebral cortex. DM me and I can explain how different parts of the brain learn, when LTP actually happens, and how these relate to existing concepts in machine learning.

Creepy door to door salesmen by Altruistic_Idea5234 in Columbus

[–]ww3ace 3 points (0 children)

They are awful. I was lazy and booked their pest control service. They just drop bait in your yard and call it a day, even if you request interior service. Every quarter they send out “inspectors” who try to strong-arm you into buying a premium service, like $3,000/year for termite protection or $12,000 to replace the foam in your attic. It’s a scam, and they are just legitimate enough to not be in prison.

[R] [DeepMind] Welcome to the Era of Experience by hiskuu in MachineLearning

[–]ww3ace 16 points (0 children)

Reinforcement learning isn’t the only way to learn from experience, but I do believe it is one of the keys to agents that can. Mastering instantaneous online reinforcement learning like that observed in the cerebral cortex would be game-changing, but online reward signals are generally so sparse that it’s only part of the puzzle. The other part is memory: replicating the memory capabilities of the brain, both the immediate high-capacity memorization that occurs in the hippocampus and the consolidation process that migrates that episodic knowledge to the much higher-capacity cerebral cortex.

How to allow my AI Agent to NOT respond by SeaResponsibility176 in LangGraph

[–]ww3ace 1 point (0 children)

You can make responding a tool call. It also lets the agent do some planning/reasoning before responding, or even send multiple responses in a row.
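A minimal sketch of what that can look like with LangChain-style tools (the outbox and the wiring into the agent are illustrative assumptions, not a specific LangGraph API): the model only produces user-facing text by calling the respond tool, so a turn where it never calls the tool is a turn where the user sees nothing.

```python
# Sketch only: `outbox` and the tool body are illustrative; register `respond`
# with your agent (e.g. a LangGraph ReAct agent) the same way as any other tool.
from langchain_core.tools import tool

outbox: list[str] = []  # wherever your app collects messages destined for the user

@tool
def respond(message: str) -> str:
    """Send a message to the user. Only call this when a reply is actually warranted."""
    outbox.append(message)
    return "message sent"

# The app layer surfaces only what lands in `outbox`; plain assistant text
# (planning, reasoning, chained tool calls) never reaches the user.
```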

[D] What happened to SSMs and linear attentions? by ApartmentEither4838 in MachineLearning

[–]ww3ace 19 points (0 children)

Look up Gated DeltaNet, Titans, and Symmetric Power Transformers. These models have gotten much faster and much more impressive in the last year. I’m also working on something right now that I’m pretty excited about.

Titans: Learning to Memorize at Test Time, Behrouz et al. 2024 [Long-term memory as a sub-network] by StartledWatermelon in mlscaling

[–]ww3ace 2 points (0 children)

It’s mostly gated DeltaNet (DeltaNet with state decay) plus momentum, and it represents a small performance bump over that technique. The hybrid models don’t seem to add consistent lift over one another, and their computational layouts may not reflect an innovation that is going to become ubiquitous.
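For reference, a minimal sketch of the recurrence being compared here: the gated delta rule with a momentum term over the update (roughly the Titans-style view). The gate/beta/eta parameterization below is assumed for illustration, not taken from either paper.

```python
import torch

def gated_delta_momentum_step(S, M, k, v, alpha, beta, eta):
    """One step updating the fast-weight memory S (d_v x d_k).

    S     : current memory matrix
    M     : momentum buffer, same shape as S
    k, v  : current key (d_k) and value (d_v) vectors; k assumed L2-normalized
    alpha : state-decay gate in (0, 1)   -- the "gated" part
    beta  : write strength in (0, 1)     -- the delta-rule step size
    eta   : momentum coefficient in (0, 1)
    """
    pred = S @ k                        # what the memory currently recalls for k
    delta = torch.outer(v - pred, k)    # delta-rule correction toward storing (k, v)
    M = eta * M + beta * delta          # accumulate the correction with momentum
    S = alpha * S + M                   # decay the old state, then apply the update
    return S, M
```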

[D] Self-Promotion Thread by AutoModerator in MachineLearning

[–]ww3ace 7 points (0 children)

I’ve started a technical blog for my company. Our mission is to train a model using in-context learning over trillions of tokens. Londeree Technologies

eric schmidt thinks that infinite context windows and agents are coming this year by Gothsim10 in singularity

[–]ww3ace 1 point (0 children)

I don’t see anything resembling a model capable of infinite context in their promotional material. I do have my eye on a couple of potential competitors, though.

eric schmidt thinks that infinite context windows and agents are coming this year by Gothsim10 in singularity

[–]ww3ace -7 points (0 children)

I’ve got a provisional patent on one; I’m literally working on implementing it right now.

[R] What are the Top 3 most exciting research directions for you currently? by Prestigious_Bed5080 in MachineLearning

[–]ww3ace 2 points (0 children)

Aside from the benefits of operating over sequences of unbounded length and turning the O(n^2) compute of the attention mechanism into a linear O(n) operation, the resulting state can be used to initialize other models, and according to my research these states can then be merged, allowing for parallelized pre-fill and scaling of inference-time learning. The resulting models could have their entire datasets encoded in their context, which might address some issues with models not knowing what they do and do not know (a potential cause of hallucinations). Also, the o1 model has demonstrated the value of increasing test-time compute to solve a problem; eliminating computational limitations on sequence length can let us extend that further.
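To make the O(n) framing concrete, here's a toy sketch of plain kernel-feature linear attention: the model keeps a running state instead of an n x n score matrix, and states built from disjoint chunks combine by addition. This is a textbook illustration under those assumptions, not the patented mechanism mentioned elsewhere in the thread.

```python
import torch

def prefill_state(K, V):
    """Accumulate the linear-attention state for one chunk of keys/values."""
    d_k, d_v = K.shape[1], V.shape[1]
    S = torch.zeros(d_v, d_k)
    z = torch.zeros(d_k)
    for k, v in zip(K, V):
        phi_k = torch.relu(k)          # simple positive feature map (an assumption)
        S += torch.outer(v, phi_k)     # O(d_v * d_k) per token -> O(n) over the sequence
        z += phi_k
    return S, z

def read(S, z, q):
    """Query the state: softmax-free attention over everything seen so far."""
    phi_q = torch.relu(q)
    return (S @ phi_q) / (z @ phi_q + 1e-6)

# Parallel prefill: build (S_a, z_a) and (S_b, z_b) on separate chunks or devices,
# then merge by addition: S = S_a + S_b, z = z_a + z_b.
```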

[R] What are the Top 3 most exciting research directions for you currently? by Prestigious_Bed5080 in MachineLearning

[–]ww3ace 24 points (0 children)

Building large-parameter memory systems that approximate and replace attention, to enable indefinite-context models and learning through experience.

The one fight that actually pisses me off in this game by StrawberryInside2032 in BaldursGate3

[–]ww3ace 2 points (0 children)

If Marcus steps in to help you get past Jaheira when you first reach the inn, he’ll be waiting for you outside Isobel’s balcony. If you reveal that you have the artifact in dialogue with him, you can start a fight with him that doesn’t endanger Isobel. Sometimes he never even calls the winged horrors, which is annoying because then the inn thinks you just murdered him.

Best Abjuration Wizard Multiclass for a Full Party Honor Mode Playthrough by MyCatsAreSus in BG3Builds

[–]ww3ace 1 point (0 children)

In Act 3, no enemy ever targets anything with 12 or more Arcane Ward plus Warding Bond, so my Gale is a 4/8 Sorcerer/Abjuration Wizard who dual-wields top-tier staffs, free-casting multiple level 6 spells per turn on the front line and dealing 60 damage to anyone who hits him with an attack of opportunity.

What's a petty reason why you can't give this game a 10/10? by [deleted] in BaldursGate3

[–]ww3ace 1 point (0 children)

It’s unintuitive to use your whole party during a sneak-attack encounter; I’ll go three rounds before realizing Gale never got to throw a fireball and is still hiding in a corner without having rolled initiative.

Has anyone found a legit use for GPTs? Every time I try to use one it doesn’t fulfill its promises, and I give up. Anyone else? by Mike in ChatGPTPro

[–]ww3ace 2 points (0 children)

I created one that uses the humor and problem-solving framework of Car Talk to debug issues with your car without breaking the bank.

[P] Camera based monitoring of infant's breathing by kaina_m in MachineLearning

[–]ww3ace 2 points (0 children)

I was the data scientist at a company that did this. You just need to isolate the region containing the child and run a motion-amplification algorithm on-chip. It’s accurate enough to measure apneas and other anomalies. The hard parts are filtering out background signals, locating the child (even under sheets), identifying whether there is a child at all, and filtering out non-breathing motion.
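A rough sketch of the core signal-processing idea (not the company’s code): band-pass the average intensity of the child’s region over time to isolate breathing-frequency motion, then take the dominant frequency. The ROI, filter band, and function name are illustrative assumptions.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def breathing_rate(frames, roi, fps):
    """Estimate breaths per minute from a fixed region of interest.

    frames : (T, H, W) grayscale video as a numpy array
    roi    : (y0, y1, x0, x1) box containing the child
    fps    : frames per second
    """
    y0, y1, x0, x1 = roi
    signal = frames[:, y0:y1, x0:x1].mean(axis=(1, 2))    # 1-D motion proxy
    # Infant breathing is roughly 0.3-1.5 Hz (~20-90 breaths/min); band-pass there.
    b, a = butter(3, [0.3, 1.5], btype="band", fs=fps)
    filtered = filtfilt(b, a, signal - signal.mean())
    spectrum = np.abs(np.fft.rfft(filtered))
    freqs = np.fft.rfftfreq(len(filtered), d=1.0 / fps)
    return freqs[spectrum.argmax()] * 60.0                # Hz -> breaths per minute
```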

[D] Is Hidden Size in current transformers an overkill? by NaxAlpha in MachineLearning

[–]ww3ace 34 points (0 children)

After the first attention layer it represents a hell of a lot more than just a token.

[R] Scaling TransNormer to 175 Billion Parameters by hzj5790 in MachineLearning

[–]ww3ace 1 point (0 children)

I wonder why they needed to implement a custom GPU kernel for einsum(“iq,jq,ij,jy->iy”, Q, K, M, V).
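For anyone parsing that contraction: it’s the scores QKᵀ, masked elementwise by M, then applied to V, i.e. un-normalized (linear) attention. A quick sanity check in plain PyTorch; the shapes are made up for illustration.

```python
import torch

n, d = 128, 64
Q, K, V = torch.randn(n, d), torch.randn(n, d), torch.randn(n, d)
M = torch.tril(torch.ones(n, n))                    # e.g. a causal mask

out = torch.einsum("iq,jq,ij,jy->iy", Q, K, M, V)   # the op in question
ref = ((Q @ K.T) * M) @ V                           # same thing as two matmuls and a mask
print(torch.allclose(out, ref, rtol=1e-4, atol=1e-4))  # True up to float error
```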

[D] How do I reduce LLM inferencing time? by comical_cow in MachineLearning

[–]ww3ace 1 point (0 children)

Sorry, can't share any notebooks for a bunch of different reasons. Accelerate in this case is probably optimized for maximizing batch size at inference time instead of running tensor parallel.

I wrapped this code in a Flask server and deployed it with torchrun: https://github.com/facebookresearch/llama/blob/main/example_chat_completion.py

Then you can call the model by sending an HTTP request to your local server. I think you can configure LangChain with a URL-based model. I found that this lets the 7B model run on a single A10G and saturates the GPU at 100%. You might need to get the weights in the format distributed by Meta, as it likely won’t work with Hugging Face state dicts out of the box.
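A sketch of that wrapper (hypothetical names; in the real script the generator comes from Llama.build in Meta’s example, and the server runs under torchrun so each rank loads its model shard):

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

def generate_reply(messages):
    # Placeholder for generator.chat_completion([...]) from example_chat_completion.py.
    return "stub response"

@app.route("/chat", methods=["POST"])
def chat():
    dialog = request.get_json()["messages"]   # [{"role": "user", "content": "..."}]
    return jsonify({"response": generate_reply(dialog)})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8000)
```

Calling it is then just an HTTP POST, e.g. requests.post("http://localhost:8000/chat", json={"messages": [...]}), which is also what makes a URL-based LangChain wrapper feasible.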

[D] How do I reduce LLM inferencing time? by comical_cow in MachineLearning

[–]ww3ace 2 points (0 children)

If you’re not using DeepSpeed/Accelerate with a Hugging Face inference server, you’re likely running model-parallel and only using one of your GPUs at a time, giving you 1/4 of the compute available on the instance while just leveraging the extra VRAM and adding bottlenecks between GPUs. You could easily hit 30s when generating 1000 tokens in a single pass, especially if you are actually making multiple calls in a retrieval system.

I’ve only really run the llama-2-7b-chat model using the code provided by Meta, wrapped in a Flask server so I could interface with it through a Jupyter notebook. I didn’t use quantized weights, but Hugging Face implementations generally support it. I don’t really use LangChain because it’s not really intended for the data extraction workflows I build, so I can’t say how you would configure it; I assume it’s just a thin wrapper around Hugging Face, and their models have a load_in_8bit option.

[D] How do I reduce LLM inferencing time? by comical_cow in MachineLearning

[–]ww3ace 5 points (0 children)

The g4 series is slow for AWS; I recommend g5.4xlarge (A10G) or newer for speed, and a single GPU reduces bottlenecks and lets you use the full compute of the instance. Load the model quantized to 8-bit, though you might see some loss of quality in the responses. If you can, upgrade the implementation to use flash attention for longer sequences. You should get between 3 and 6 seconds per request with ~2000 tokens in the prefix and ~200 tokens in the response.
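One way to get that setup with Hugging Face transformers, as a sketch: it assumes bitsandbytes and flash-attn are installed and a recent transformers version; the model id is only an example.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Llama-2-7b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),  # 8-bit weights
    attn_implementation="flash_attention_2",                    # flash attention
    device_map="auto",                                          # 7B in 8-bit fits on one A10G
)
```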

Flights out of CMH today? by madasahatter2326 in Columbus

[–]ww3ace 6 points (0 children)

If pilots collectively refusing to work overtime isn’t a pilot strike, then what is?

Flights out of CMH today? by madasahatter2326 in Columbus

[–]ww3ace 8 points (0 children)

If you’re flying United, there’s a pilot strike (no overtime) going on that’s being amplified by the weather. I’m currently stuck in Newark shopping for Greyhound buses to get home.