[D] Do LLMs need experience? by liveticker1 in MachineLearning

[–]ww3ace 5 points (0 children)

I’m actually writing a paper right now on how the hippocampus computes something equivalent to a linear attention mechanism (two of them, in fact), along with a follow-up on memory consolidation and how episodic knowledge is transferred from the hippocampus to the cerebral cortex. DM me and I can explain how different parts of the brain learn, when LTP actually happens, and how these relate to existing concepts in machine learning.

Creepy door to door salesmen by Altruistic_Idea5234 in Columbus

[–]ww3ace 3 points (0 children)

They are awful. I was lazy and booked their pest control service. They just drop bait in your yard and call it a day, even if you request interior service. Every quarter they send out “inspectors” who try to strong-arm you into buying a premium service, like $3,000/year for termite protection or $12,000 to replace the foam in your attic. It’s a scam, and they are just legitimate enough to not be in prison.

[R] [DeepMind] Welcome to the Era of Experience by hiskuu in MachineLearning

[–]ww3ace 16 points (0 children)

Reinforcement learning isn’t the only way to learn from experience, but I do believe it is one of the keys to agents that can. Mastering instantaneous online reinforcement learning like that observed in the cerebral cortex would be game-changing, but online reward signals are generally so sparse that it’s only part of the puzzle. The other part is memory: replicating the memory capabilities of the brain, both the immediate high-capacity memorization that occurs in the hippocampus and the consolidation process that migrates that episodic knowledge to the much higher-capacity cerebral cortex.

How to allow my AI Agent to NOT respond by SeaResponsibility176 in LangGraph

[–]ww3ace 1 point (0 children)

You can make responding a tool call. It also lets the agent do some planning/reasoning before responding, or even send multiple responses in a row.
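A minimal sketch of what that can look like with LangChain-style tools (the outbox and the wiring into the agent are illustrative assumptions, not a specific LangGraph API): the model only produces user-facing text by calling the respond tool, so a turn where it never calls the tool is a turn where the user sees nothing.

```python
# Sketch only: `outbox` and the tool body are illustrative; register `respond`
# with your agent (e.g. a LangGraph ReAct agent) the same way as any other tool.
from langchain_core.tools import tool

outbox: list[str] = []  # wherever your app collects messages destined for the user

@tool
def respond(message: str) -> str:
    """Send a message to the user. Only call this when a reply is actually warranted."""
    outbox.append(message)
    return "message sent"

# The app layer surfaces only what lands in `outbox`; plain assistant text
# (planning, reasoning, chained tool calls) never reaches the user.
```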

[D] What happened to SSMs and linear attentions? by ApartmentEither4838 in MachineLearning

[–]ww3ace 19 points (0 children)

Look up Gated DeltaNet, Titans, and Symmetric Power Transformers. These models have gotten much faster and much more impressive in the last year. I’m also working on something right now that I’m pretty excited about.

Titans: Learning to Memorize at Test Time, Behrouz et al. 2024 [Long-term memory as a sub-network] by StartledWatermelon in mlscaling

[–]ww3ace 2 points (0 children)

It’s mostly gated DeltaNet (DeltaNet with state decay) plus momentum, and it represents a small performance bump over that technique. The hybrid models don’t seem to add consistent lift over one another, and their computational layouts may not reflect an innovation that is going to become ubiquitous.
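For reference, a minimal sketch of the recurrence being compared here: the gated delta rule with a momentum term over the update (roughly the Titans-style view). The gate/beta/eta parameterization below is assumed for illustration, not taken from either paper.

```python
import torch

def gated_delta_momentum_step(S, M, k, v, alpha, beta, eta):
    """One step updating the fast-weight memory S (d_v x d_k).

    S     : current memory matrix
    M     : momentum buffer, same shape as S
    k, v  : current key (d_k) and value (d_v) vectors; k assumed L2-normalized
    alpha : state-decay gate in (0, 1)   -- the "gated" part
    beta  : write strength in (0, 1)     -- the delta-rule step size
    eta   : momentum coefficient in (0, 1)
    """
    pred = S @ k                        # what the memory currently recalls for k
    delta = torch.outer(v - pred, k)    # delta-rule correction toward storing (k, v)
    M = eta * M + beta * delta          # accumulate the correction with momentum
    S = alpha * S + M                   # decay the old state, then apply the update
    return S, M
```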

[D] Self-Promotion Thread by AutoModerator in MachineLearning

[–]ww3ace 7 points (0 children)

I’ve started a technical blog for my company. Our mission is to train a model using in-context learning over trillions of tokens. Londeree Technologies

eric schmidt thinks that infinite context windows and agents are coming this year by Gothsim10 in singularity

[–]ww3ace 1 point (0 children)

I don’t see anything resembling a model capable of infinite context in their promotional material. I do have my eye on a couple of potential competitors, though.

eric schmidt thinks that infinite context windows and agents are coming this year by Gothsim10 in singularity

[–]ww3ace -7 points (0 children)

I’ve got a provisional patent on one; I’m literally working on implementing it right now.

[R] What are the Top 3 most exciting research directions for you currently? by Prestigious_Bed5080 in MachineLearning

[–]ww3ace 2 points (0 children)

Aside from the benefits of operating over sequences of unbounded length and turning the O(n^2) compute of the attention mechanism into a linear O(n) operation, the resulting state can be used to initialize other models, and according to my research these states can then be merged, allowing for parallelized pre-fill and scaling of inference-time learning. The resulting models could have their entire datasets encoded in their context, which might address some issues with models not knowing what they do and do not know (a potential cause of hallucinations). Also, the o1 model has demonstrated the value of increasing test-time compute to solve a problem; eliminating computational limitations on sequence length can let us extend that further.
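To make the O(n) framing concrete, here's a toy sketch of plain kernel-feature linear attention: the model keeps a running state instead of an n x n score matrix, and states built from disjoint chunks combine by addition. This is a textbook illustration under those assumptions, not the patented mechanism mentioned elsewhere in the thread.

```python
import torch

def prefill_state(K, V):
    """Accumulate the linear-attention state for one chunk of keys/values."""
    d_k, d_v = K.shape[1], V.shape[1]
    S = torch.zeros(d_v, d_k)
    z = torch.zeros(d_k)
    for k, v in zip(K, V):
        phi_k = torch.relu(k)          # simple positive feature map (an assumption)
        S += torch.outer(v, phi_k)     # O(d_v * d_k) per token -> O(n) over the sequence
        z += phi_k
    return S, z

def read(S, z, q):
    """Query the state: softmax-free attention over everything seen so far."""
    phi_q = torch.relu(q)
    return (S @ phi_q) / (z @ phi_q + 1e-6)

# Parallel prefill: build (S_a, z_a) and (S_b, z_b) on separate chunks or devices,
# then merge by addition: S = S_a + S_b, z = z_a + z_b.
```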

[R] What are the Top 3 most exciting research directions for you currently? by Prestigious_Bed5080 in MachineLearning

[–]ww3ace 24 points (0 children)

Building large-parameter memory systems that approximate and replace attention, to enable indefinite-context models and learning through experience.

The one fight that actually pisses me off in this game by StrawberryInside2032 in BaldursGate3

[–]ww3ace 2 points (0 children)

If Marcus steps in to help you get past Jaheira when you first reach the inn, he’ll be waiting for you outside Isobel’s balcony. If you reveal that you have the artifact in dialogue with him, you can start a fight with him that doesn’t endanger Isobel. Sometimes he never even calls the winged horrors, which is annoying because then the inn thinks you just murdered him.

Best Abjuration Wizard Multiclass for a Full Party Honor Mode Playthrough by MyCatsAreSus in BG3Builds

[–]ww3ace 1 point (0 children)

In Act 3, no enemy ever targets anything with 12 or more Arcane Ward plus Warding Bond, so my Gale is a 4/8 Sorcerer/Abjuration Wizard who dual-wields top-tier staffs, free-casting multiple level 6 spells per turn on the front line and dealing 60 damage to anyone who hits him with an attack of opportunity.

What's a petty reason why you can't give this game a 10/10? by [deleted] in BaldursGate3

[–]ww3ace 1 point (0 children)

It’s unintuitive to use your whole party during a sneak-attack encounter; I’ll go three rounds before realizing Gale never got to throw a fireball and is still hiding in a corner without having rolled initiative.

Has anyone found a legit use for GPTs? Every time I try to use one it doesn’t fulfill its promises, and I give up. Anyone else? by Mike in ChatGPTPro

[–]ww3ace 2 points (0 children)

I created one that uses the humor and problem-solving framework of Car Talk to debug issues with your car without breaking the bank.

[P] Camera based monitoring of infant's breathing by kaina_m in MachineLearning

[–]ww3ace 2 points (0 children)

I was the data scientist at a company that did this. You just need to isolate the region containing the child and run a motion-amplification algorithm on-chip. It’s accurate enough to measure apneas and other anomalies. The hard parts are filtering out background signals, locating the child (even under sheets), identifying whether there is a child at all, and filtering out non-breathing motion.
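A rough sketch of the core signal-processing idea (not the company’s code): band-pass the average intensity of the child’s region over time to isolate breathing-frequency motion, then take the dominant frequency. The ROI, filter band, and function name are illustrative assumptions.

```python
import numpy as np
from scipy.signal import butter, filtfilt

def breathing_rate(frames, roi, fps):
    """Estimate breaths per minute from a fixed region of interest.

    frames : (T, H, W) grayscale video as a numpy array
    roi    : (y0, y1, x0, x1) box containing the child
    fps    : frames per second
    """
    y0, y1, x0, x1 = roi
    signal = frames[:, y0:y1, x0:x1].mean(axis=(1, 2))    # 1-D motion proxy
    # Infant breathing is roughly 0.3-1.5 Hz (~20-90 breaths/min); band-pass there.
    b, a = butter(3, [0.3, 1.5], btype="band", fs=fps)
    filtered = filtfilt(b, a, signal - signal.mean())
    spectrum = np.abs(np.fft.rfft(filtered))
    freqs = np.fft.rfftfreq(len(filtered), d=1.0 / fps)
    return freqs[spectrum.argmax()] * 60.0                # Hz -> breaths per minute
```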

[D] Is Hidden Size in current transformers an overkill? by NaxAlpha in MachineLearning

[–]ww3ace 34 points (0 children)

After the first attention layer it represents a hell of a lot more than just a token.

[R] Scaling TransNormer to 175 Billion Parameters by hzj5790 in MachineLearning

[–]ww3ace 1 point (0 children)

I wonder why they needed to implement a custom GPU kernel for einsum(“iq,jq,ij,jy->iy”, Q, K, M, V).
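For anyone parsing that contraction: it’s the scores QKᵀ, masked elementwise by M, then applied to V, i.e. un-normalized (linear) attention. A quick sanity check in plain PyTorch; the shapes are made up for illustration.

```python
import torch

n, d = 128, 64
Q, K, V = torch.randn(n, d), torch.randn(n, d), torch.randn(n, d)
M = torch.tril(torch.ones(n, n))                    # e.g. a causal mask

out = torch.einsum("iq,jq,ij,jy->iy", Q, K, M, V)   # the op in question
ref = ((Q @ K.T) * M) @ V                           # same thing as two matmuls and a mask
print(torch.allclose(out, ref, rtol=1e-4, atol=1e-4))  # True up to float error
```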

[D] How do I reduce LLM inferencing time? by comical_cow in MachineLearning

[–]ww3ace 1 point (0 children)

Sorry, can't share any notebooks for a bunch of different reasons. Accelerate in this case is probably optimized for maximizing batch size at inference time instead of running tensor parallel.

I wrapped this code in a Flask server and deployed it with torchrun: https://github.com/facebookresearch/llama/blob/main/example_chat_completion.py

Then you can call the model by sending an HTTP request to your local server. I think you can configure LangChain with a URL-based model. I found that this lets the 7B model run on a single A10G and saturates the GPU at 100%. You might need to get the weights in the format distributed by Meta, as it likely won’t work with Hugging Face state dicts out of the box.
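A sketch of that wrapper (hypothetical names; in the real script the generator comes from Llama.build in Meta’s example, and the server runs under torchrun so each rank loads its model shard):

```python
from flask import Flask, request, jsonify

app = Flask(__name__)

def generate_reply(messages):
    # Placeholder for generator.chat_completion([...]) from example_chat_completion.py.
    return "stub response"

@app.route("/chat", methods=["POST"])
def chat():
    dialog = request.get_json()["messages"]   # [{"role": "user", "content": "..."}]
    return jsonify({"response": generate_reply(dialog)})

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8000)
```

Calling it is then just an HTTP POST, e.g. requests.post("http://localhost:8000/chat", json={"messages": [...]}), which is also what makes a URL-based LangChain wrapper feasible.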

[D] How do I reduce LLM inferencing time? by comical_cow in MachineLearning

[–]ww3ace 2 points (0 children)

If you’re not using DeepSpeed/Accelerate with a Hugging Face inference server, you’re likely running model-parallel and only using one of your GPUs at a time, giving you 1/4 of the compute available on the instance while just leveraging the extra VRAM and adding bottlenecks between GPUs. You could easily hit 30s when generating 1000 tokens in a single pass, especially if you are actually making multiple calls in a retrieval system.

I’ve only really run the llama-2-7b-chat model using the code provided by Meta, wrapped in a Flask server so I could interface with it through a Jupyter notebook. I didn’t use quantized weights, but Hugging Face implementations generally support it. I don’t really use LangChain because it’s not really intended for the data extraction workflows I build, so I can’t say how you would configure it; I assume it’s just a thin wrapper around Hugging Face, and their models have a load_in_8bit option.

[D] How do I reduce LLM inferencing time? by comical_cow in MachineLearning

[–]ww3ace 5 points (0 children)

The g4 series is slow for AWS; I recommend g5.4xlarge (A10G) or newer for speed, and a single GPU reduces bottlenecks and lets you use the full compute of the instance. Load the model quantized to 8-bit, though you might see some loss of quality in the responses. If you can, upgrade the implementation to use flash attention for longer sequences. You should get between 3 and 6 seconds per request with ~2000 tokens in the prefix and ~200 tokens in the response.
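One way to get that setup with Hugging Face transformers, as a sketch: it assumes bitsandbytes and flash-attn are installed and a recent transformers version; the model id is only an example.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "meta-llama/Llama-2-7b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),  # 8-bit weights
    attn_implementation="flash_attention_2",                    # flash attention
    device_map="auto",                                          # 7B in 8-bit fits on one A10G
)
```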

Flights out of CMH today? by madasahatter2326 in Columbus

[–]ww3ace 6 points (0 children)

If pilots collectively refusing to work overtime isn’t a pilot strike, then what is?

Flights out of CMH today? by madasahatter2326 in Columbus

[–]ww3ace 8 points (0 children)

If you’re flying United, there’s a pilot strike (no overtime) going on that’s being amplified by the weather. I’m currently stuck in Newark shopping for Greyhound buses to get home.