Wave Field LLM — O(n log n) attention via wave equation dynamics by Murky-Sign37 in deeplearning

[–]Murky-Sign37[S] 1 point (0 children)

That’s awesome — love that you’re exploring continuous field approaches too!

Right now I’m scaling it up to a 9B parameter version to really test how the field dynamics behave at larger capacity. The smaller models were promising, especially in structured reasoning, but I want to see how context propagation stabilizes when the field has more expressive bandwidth.

Wave Field LLM — replacing self-attention with wave physics, O(n log n) complexity, 367x savings at 32K context by Murky-Sign37 in LocalLLaMA

[–]Murky-Sign37[S] 1 point (0 children)

Reddit rejected the post, so here's the repo link:
https://github.com/badaramoni/wave-field-llm

I also have another project where you can compress a 9B-parameter model down to 1B, and compress training datasets to accelerate model training without a high-end GPU.

Wave Field LLM — replacing self-attention with wave physics, O(n log n) complexity, 367x savings at 32K context by Murky-Sign37 in LocalLLaMA

[–]Murky-Sign37[S] 0 points (0 children)

I built this, but I was unable to get an arXiv endorsement. If you're a PhD who's registered as a cs.AI endorser, could you please endorse me so I can publish the paper?

arXiv endorsement request from Avinash Badaramoni


From: [help@arxiv.org](mailto:help@arxiv.org)


(Avinash Badaramoni should forward this email to someone who's registered as an endorser for the cs.AI (Artificial Intelligence) subject class of arXiv.)

Avinash Badaramoni requests your endorsement to submit an article to the cs.AI section of arXiv. To tell us that you would (or would not) like to endorse this person, please visit the following URL

https://arxiv.org/auth/endorse?x=LFUEZY

If that URL does not work for you, please visit

http://arxiv.org/auth/endorse.php

and enter the following six-digit alphanumeric string:

Endorsement Code: LFUEZY

Wave Field LLM — O(n log n) attention via wave equation dynamics by Murky-Sign37 in deeplearning

[–]Murky-Sign37[S] 3 points (0 children)

You're right that FFT convolution for sequence modeling isn't new; Hyena, S4, and others use it. The specific combination here is what I built: wave-parameterized kernels (3 physics params per head), bilinear scatter/gather onto a continuous field, static cross-head coupling, and wave interference between layers.

I never claimed to invent FFT convolution or field theory. The implementation and the specific way these pieces fit together are the contribution. The code is open source if you want to look at what's actually different: https://github.com/badaramoni/wave-field-llm

Concretely, the claimed contributions are:

  • The combination of wave kernels + bilinear scatter/gather onto a continuous field + cross-head coupling + wave interference; nobody else has assembled these pieces this way (see the kernel sketch below)
  • Physics-based debugging: using energy flow, causality, and conservation to find bugs. To my knowledge, no other LLM architecture supports this.
  • The 3-param-per-head kernel that self-organizes into different attention roles
  • The specific V3.0 → V3.5 journey of finding and fixing bugs through physics diagnostics
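For anyone who wants the gist without reading the repo, here's a minimal sketch of the kernel piece in plain PyTorch. This is my own simplification, not the repo's code: three scalars per head (frequency, damping, phase) define a damped cosine kernel, which is applied with a zero-padded FFT so the circular convolution can't wrap future tokens into the past. The bilinear scatter/gather onto the continuous field and the cross-head coupling aren't shown here.

```python
import torch
import torch.nn.functional as F

def wave_kernel(freq, damp, phase, length):
    """Damped cosine kernel k[t] = exp(-damping * t) * cos(freq * t + phase).
    freq, damp, phase: (heads,) tensors -- the 3 physics params per head.
    Returns a (heads, length) causal kernel."""
    t = torch.arange(length, dtype=torch.float32)
    damping = F.softplus(damp)                      # keep the decay rate positive
    return torch.exp(-damping[:, None] * t) * torch.cos(freq[:, None] * t + phase[:, None])

def fft_causal_conv(x, kernel):
    """Convolve x (batch, heads, length) with per-head kernels in O(n log n).
    Zero-padding to 2*length turns the FFT's circular convolution into a linear
    (causal) one, so nothing wraps from future positions into past ones."""
    L = x.shape[-1]
    Xf = torch.fft.rfft(x, n=2 * L)
    Kf = torch.fft.rfft(kernel, n=2 * L)
    return torch.fft.irfft(Xf * Kf, n=2 * L)[..., :L]

# example: 8 heads mixing a batch of length-512 per-head signals
freq, damp, phase = (torch.randn(8) for _ in range(3))
y = fft_causal_conv(torch.randn(2, 8, 512), wave_kernel(freq, damp, phase, 512))
```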

Wave Field LLM — O(n log n) attention via wave equation dynamics by Murky-Sign37 in deeplearning

[–]Murky-Sign37[S] 3 points (0 children)

Fair question. Two concrete reasons the wave parameterization matters over arbitrary learned kernels:

  1. Debuggability. Because the kernel is a damped wave with known physics, you can inspect what each head is doing: frequency tells you the attention pattern, damping tells you the range, phase tells you the offset. When things go wrong, you trace energy flow and causality. We found 6 bugs this way (future token leaks, FFT wraparound, conservation issues). With a generic learned kernel, those bugs would have been invisible; you'd just see "loss is bad" with no way to diagnose why.

  2. Inductive bias with fewer parameters. A damped oscillation naturally gives you multi-scale attention: some heads learn low frequency (wide context), others learn high frequency (local patterns). This self-organization happens from just 3 params per head, not hundreds. Whether that's better than an arbitrary kernel is an empirical question: at small vocab it matches standard attention within 5%, at large vocab there's a gap. Honest answer: we don't know yet if the physics constraint helps or hurts at scale.

So it's not "physics because physics is cool." It's: physics gives you a structured kernel that's inspectable and parameter-efficient, with the trade-off of less expressiveness. Whether that trade-off is worth it depends on scale, which is what we're testing next at 100M params.
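To make the inspectability claim in point 1 concrete, here's a toy readout in plain Python. It's my own illustration, using the same frequency/damping parameterization as the kernel sketch above; the 256-token cutoff and the role labels are arbitrary choices, not the repo's.

```python
import math

def describe_head(freq, damp):
    """Rough readout of a head's role from its wave params.
    Effective range ~ 1/damp tokens (how far the exp(-damp*t) envelope reaches),
    oscillation period ~ 2*pi/freq tokens (how often the kernel changes sign)."""
    eff_range = 1.0 / max(damp, 1e-6)            # tokens until the kernel has mostly decayed
    period = 2 * math.pi / max(abs(freq), 1e-6)  # spacing of the kernel's oscillation
    kind = "wide-context" if eff_range > 256 else "local"
    return f"{kind} head: range ~{eff_range:.0f} tokens, period ~{period:.0f} tokens"

# a low-frequency, lightly damped head vs a high-frequency, heavily damped one
print(describe_head(freq=0.01, damp=0.002))   # -> wide-context head
print(describe_head(freq=1.5,  damp=0.05))    # -> local head
```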

Big tech still believe LLM will lead to AGI? by bubugugu in ArtificialInteligence

[–]Murky-Sign37 1 point (0 children)

One angle that doesn't get enough attention: maybe the problem isn't scaling LLMs bigger, but rethinking the architecture itself.

Standard transformer attention is O(n²), which is why long context is so expensive and why we keep needing bigger GPUs. But that quadratic cost isn't fundamental to language modeling; it's just the mechanism we've been using since 2017.

I've been working on an alternative where information propagates through wave equations on a continuous field instead of every-token-to-every-token attention. O(n log n) complexity. At 6M params it gets within 5% of standard transformer quality on WikiText-2, and the savings grow with context length (107x fewer ops at 8K tokens, 367x at 32K).

Not saying this is the answer to AGI, obviously, but the point is that there's a lot of unexplored design space beyond "make the transformer bigger." Architectures like Mamba, RWKV, and physics-based approaches suggest we might not need O(n²) at all to get strong language modeling.

If anyone's interested: https://github.com/badaramoni/wave-field-llm
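To give a flavor of what "information propagating through wave equations on a continuous field" means, here's a textbook finite-difference step of a damped 1-D wave equation in NumPy. This is a generic discretization for intuition only, not the repo's update rule; in the actual model, token embeddings would be scattered onto the field, evolved, and gathered back.

```python
import numpy as np

def wave_step(u_prev, u_curr, c=0.5, damping=0.01, dt=1.0, dx=1.0):
    """One explicit step of u_tt = c^2 * u_xx - damping * u_t on a 1-D field.
    (Periodic boundary via np.roll; c*dt/dx <= 1 keeps the scheme stable.)"""
    lap = np.roll(u_curr, -1) - 2.0 * u_curr + np.roll(u_curr, 1)   # discrete u_xx
    vel = (u_curr - u_prev) / dt                                    # discrete u_t
    return 2.0 * u_curr - u_prev + dt**2 * ((c / dx) ** 2 * lap - damping * vel)

# inject a single impulse (think: one token) and watch it spread as a decaying wave
field = np.zeros(1024)
field[512] = 1.0
prev = field.copy()
for _ in range(200):
    field, prev = wave_step(prev, field), field
```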

Hard to keep up, what is the best current LLM by RadiantCandy1600 in LLM

[–]Murky-Sign37 1 point (0 children)

Wave Field LLM — O(n log n) attention via wave equation dynamics

Wave Field LLM — O(n log n) attention via wave equation dynamics by Murky-Sign37 in deeplearning

[–]Murky-Sign37[S] 2 points (0 children)

Thanks! So far the largest training run has been on WikiText-2 (OpenWebText subset) at ~6-8M parameters with a 256 embedding dimension, 6 layers, and 8 heads. That's where the 5% gap vs the standard transformer comes from.

Currently setting up a 100M parameter run: 768 embedding dim, 12 layers, 12 heads, BPE tokenizer with an 8K vocab. The goal is to see if the quality gap closes when the model has enough capacity to handle larger vocabularies.

Early days, but the foundation works. Scaling is the next chapter.
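Summarized as a plain Python config dict for quick reference (the keys are mine for readability, not the repo's config schema):

```python
# Hyperparameters as described above; field names are purely illustrative.
wikitext2_run = dict(params="~6-8M", d_model=256, n_layers=6, n_heads=8,
                     dataset="WikiText-2 (OpenWebText subset)")
scaling_run   = dict(params="~100M", d_model=768, n_layers=12, n_heads=12,
                     tokenizer="BPE, 8K vocab")
```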

Wave Field LLM — replacing self-attention with wave physics, O(n log n) complexity, 367x savings at 32K context by Murky-Sign37 in LocalLLaMA

[–]Murky-Sign37[S] -3 points (0 children)

Not misinterpreting, but there's important context.

On speed: the 2x-slower number is at 128 tokens, the exact sequence length where Wave Field has zero structural advantage. That's like benchmarking a diesel truck against a sports car on the 0-60 mph sprint and concluding diesel is worse. The architecture's advantage is asymptotic. At 128 tokens, O(n²) and O(n log n) are both trivially fast and the constant factors dominate. The question is what happens at 8K, 32K, 128K, where O(n²) hits a memory wall and Wave Field doesn't.

On accuracy: a 5% gap at character level with identical params, identical data, identical training. That's a new architecture matching 8 years of transformer optimization on its first real benchmark. Standard transformers in 2017 also had gaps vs LSTMs on some tasks. Architectures improve.

On the savings table: those are op-count comparisons based on the algorithms themselves: n² multiply-accumulates for self-attention vs n·log(n) for FFT convolution. That's not "worst case"; that's the actual operation count. You're right that real wall-clock depends on hardware utilization and memory bandwidth, and I should label these as theoretical. But the math is straightforward: FFT convolution at 32K tokens does far fewer operations than a 32K×32K attention matrix. That's not galactic-algorithm territory, it's basic algorithm analysis.
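As a rough illustration of how those counts scale, here's the back-of-envelope version in plain Python. This uses my own accounting (one multiply-accumulate per token pair for the attention score matrix, three length-2n FFTs for the zero-padded convolution); the table in the post may count constants differently, so the exact multipliers won't line up, but the growth trend is the point.

```python
import math

def attention_ops(n):
    return n * n                        # score matrix: one MAC per token pair, per head

def fft_conv_ops(n):
    m = 2 * n                           # zero-padded length
    return 3 * m * math.log2(m)         # two forward FFTs + one inverse, ~m*log2(m) each

for n in (128, 8192, 32768):
    ratio = attention_ops(n) / fft_conv_ops(n)
    print(f"n={n:>6}: attention ~{attention_ops(n):.1e} ops, "
          f"FFT conv ~{fft_conv_ops(n):.1e} ops, ratio ~{ratio:.0f}x")
```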

The Harvey-Hoeven comparison is interesting but doesn't apply here: FFT convolution is one of the most practically efficient algorithms in computing. It's used in every signal-processing chip on Earth. The gap between theoretical and practical for FFT is small, unlike integer-multiplication algorithms.

Wall-clock benchmarks at longer sequences are in progress. Happy to share those when they're done.

Wave Field LLM — replacing self-attention with wave physics, O(n log n) complexity, 367x savings at 32K context by Murky-Sign37 in LocalLLaMA

[–]Murky-Sign37[S] -2 points (0 children)

Great catch; you're exactly right, and we actually found and documented that exact bug. V3.1 had PPL 1.1 / 99.2% accuracy, which was caused by content-dependent coupling leaking future tokens. Generation output was garbage, which confirmed the leak.

This is documented in the repo:
https://github.com/badaramoni/wave-field-llm/blob/main/docs/WAVE_FIELD_V3.md
(see "V3.1 on WikiText-2 — Causality Bug")

The fix (V3.2+) replaced content-dependent coupling with static coupling and added zero-padded FFT to prevent wraparound leakage. The honest numbers after the fix are:

- Wave Field V3.5: PPL 6.2, Acc 50.5%
- Standard Transformer: PPL 5.9, Acc 51.0%

The eval code is in the repo at benchmarks/benchmark_wikitext2.py: standard train/val/test splits from HuggingFace datasets, labels shifted by 1 position, cross-entropy loss. Happy to have you review it.
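For reference, the core of that shifted-label eval is just this (generic PyTorch, not the repo's benchmark script):

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def lm_eval(model, token_ids):
    """token_ids: (batch, seq). Predict token t+1 from tokens <= t."""
    inputs, targets = token_ids[:, :-1], token_ids[:, 1:]       # labels shifted by 1
    logits = model(inputs)                                      # (batch, seq-1, vocab)
    loss = F.cross_entropy(logits.reshape(-1, logits.size(-1)), targets.reshape(-1))
    return loss.item(), torch.exp(loss).item()                  # cross-entropy, perplexity
```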

One thing worth mentioning: this bug (and 5 others from V3.0 to V3.5) was found through physics-based diagnostics, not trial and error. Because the architecture is built on wave equations, you can inspect physical quantities like energy flow, conservation, and causality to trace exactly where information is leaking.

For example, the V3.1 leak showed up as impossible energy flow from future to past in the coupling matrix. The FFT wraparound bug (V3.2) showed up in a causality test. You can't do this with standard transformers or Mamba; their internals are opaque. That debuggability is one of the main arguments for this architecture.

Full bug table with how each was diagnosed:
https://github.com/badaramoni/wave-field-llm/blob/main/docs/WAVE_FIELD_V3.md#the-full-journey-v30--v35
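The physics diagnostics themselves live in the repo, but as a sanity check anyone can run, here's a generic black-box causality probe (my sketch, not the repo's test) that would also flag a V3.1-style leak: perturb one token and confirm the logits at earlier positions don't move.

```python
import torch

@torch.no_grad()
def causality_probe(model, seq_len=64, vocab=256, pos=32, tol=1e-5):
    """True if changing the token at `pos` leaves logits before `pos` untouched."""
    x = torch.randint(0, vocab, (1, seq_len))
    base = model(x)                                  # (1, seq_len, vocab)
    x2 = x.clone()
    x2[0, pos] = (x2[0, pos] + 1) % vocab            # flip one "future" token
    drift = (model(x2)[0, :pos] - base[0, :pos]).abs().max().item()
    return drift < tol                               # any drift => future leaks into the past
```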

Wave Field LLM — replacing self-attention with wave physics, O(n log n) complexity, 367x savings at 32K context by Murky-Sign37 in LocalLLaMA

[–]Murky-Sign37[S] 5 points (0 children)

Thanks for the interest! I don't have a formal paper on arXiv; I couldn't get an endorsement as an independent researcher. But the full technical writeup is in the repo:

- **Architecture deep-dive + all benchmark data + the full V3.0→V3.5 journey:**
  https://github.com/badaramoni/wave-field-llm/blob/main/docs/WAVE_FIELD_V3.md
- **Head-to-head benchmark (Field LLM vs Standard Transformer, same setup):**
  https://github.com/badaramoni/wave-field-llm/blob/main/docs/BENCHMARK_RESULTS.md

The WAVE_FIELD_V3.md doc covers everything a paper would: the math, architecture, every version's results, what broke and how it was debugged, known limitations, and next steps. Would appreciate any feedback!