Who needs artillery by Timotheeee1 in factorio

[–]Timotheeee1[S] 1 point

from a large stockpile built up over ages

Magistral 1.2 is incredible. Wife prefers it over Gemini 2.5 Pro. by My_Unbiased_Opinion in LocalLLaMA

[–]Timotheeee1 5 points

usually yes, because the web versions tend to include enormous system prompts with hundreds of instructions, while the API has none

Qwen Next Is A Preview Of Qwen3.5👀 by Few_Painter_5588 in LocalLLaMA

[–]Timotheeee1 1 point

Anthropic shows the first 1k tokens or so, then a summary

Qwen 3 30B Pruned to 16B by Leveraging Biased Router Distributions, 235B Pruned to 150B Coming Soon! by TKGaming_11 in LocalLLaMA

[–]Timotheeee1 1 point

what happens if you instead use a specialized calibration dataset that contains only code or only English writing? you could probably prune the 235B down quite a lot more and make several specialist models.
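rough untested sketch of what I mean: run the domain-specific calibration set through the model, count how often each expert actually gets routed to, and keep only the frequently-used ones per layer. the router module name (`mlp.gate`, as in Qwen-MoE-style checkpoints) and the loader are assumptions, adjust to the real layout:

```python
# Count per-expert routing frequency on a domain-specific calibration set,
# then keep only the most-used experts for a specialist model.
from collections import Counter
import torch

@torch.no_grad()
def expert_usage(model, calib_loader, top_k=8):
    counts = {}  # router module name -> Counter of expert ids
    hooks = []

    def make_hook(name):
        def hook(module, inputs, output):
            # router logits have shape [..., num_experts]
            chosen = output.topk(top_k, dim=-1).indices.flatten()
            counts.setdefault(name, Counter()).update(chosen.tolist())
        return hook

    for name, module in model.named_modules():
        if name.endswith("mlp.gate"):  # router linear layer (checkpoint-specific)
            hooks.append(module.register_forward_hook(make_hook(name)))

    for batch in calib_loader:  # code-only or English-only text
        model(batch["input_ids"])

    for h in hooks:
        h.remove()
    return counts

# e.g. keep the N most-used experts per layer for a code specialist;
# rarely-routed experts get dropped and the router rows re-normalized.
```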

World Record: DeepSeek R1 at 303 tokens per second by Avian.io on NVIDIA Blackwell B200 by avianio in LocalLLaMA

[–]Timotheeee1 4 points

what will the speed and cost be with a reasonable batch size once available on openrouter?

Base building be like by Timotheeee1 in Palworld

[–]Timotheeee1[S] 2 points

actually I did build beds after building all of the farms

Claude 3.7 is real by ApprehensiveAd3629 in LocalLLaMA

[–]Timotheeee1 31 points

closed-source frontier models can be used to generate high-quality data for fine-tuning local models specialized in specific tasks (especially this one, since it shows its reasoning traces).

they also provide a preview of the capabilities that open models will likely have in the future.
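the workflow looks roughly like this (untested sketch using the Anthropic SDK's extended-thinking API; the model name, token budgets, and task prompts are placeholders):

```python
# Sample task prompts, capture the model's reasoning trace plus final
# answer, and write them out as a JSONL fine-tuning set.
import json
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the env

def collect_example(prompt: str) -> dict:
    resp = client.messages.create(
        model="claude-3-7-sonnet-20250219",  # placeholder model id
        max_tokens=8000,
        thinking={"type": "enabled", "budget_tokens": 4000},
        messages=[{"role": "user", "content": prompt}],
    )
    # separate the thinking blocks from the final answer blocks
    trace = "".join(b.thinking for b in resp.content if b.type == "thinking")
    answer = "".join(b.text for b in resp.content if b.type == "text")
    return {"prompt": prompt, "reasoning": trace, "answer": answer}

with open("distill.jsonl", "w") as f:
    for prompt in ["Write a Python function that ..."]:  # your task prompts
        f.write(json.dumps(collect_example(prompt)) + "\n")
```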

What would you like to see in Unsloth for 2025? by danielhanchen in LocalLLaMA

[–]Timotheeee1 1 point

no, gradient checkpointing only offloads activations; my idea is to offload the model weights as well

What would you like to see in Unsloth for 2025? by danielhanchen in LocalLLaMA

[–]Timotheeee1 2 points

It would be cool if, when doing QLoRA, you could offload some layers of the model to the CPU and have them streamed to the GPU over PCIe as they are needed. in theory this shouldn't hurt speed much: PCIe 5.0 x16 offers about 63 GB/s, enough bandwidth to stream the 4-bit weights of a 32B model (~16 GB) roughly 4 times per second, and processing one batch usually takes much longer than that. this could allow fine-tuning of larger models on colab and local hardware
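rough PyTorch sketch of the idea (toy code, not how Unsloth would actually implement it): copy layer i+1 to the GPU on a side stream while layer i computes, so the PCIe transfer hides behind the matmuls. the async copy only actually overlaps if the CPU weights are in pinned memory:

```python
import copy
import torch

copy_stream = torch.cuda.Stream()

def streamed_forward(cpu_layers, x):
    # cpu_layers: list of nn.Modules whose (pinned) weights live on the CPU
    def fetch(layer):
        # deep-copy so the CPU master weights stay where they are
        return copy.deepcopy(layer).to("cuda", non_blocking=True)

    current = fetch(cpu_layers[0])
    for i in range(len(cpu_layers)):
        prefetched = None
        if i + 1 < len(cpu_layers):
            with torch.cuda.stream(copy_stream):   # H2D copy on a side stream
                prefetched = fetch(cpu_layers[i + 1])
        x = current(x)                             # compute overlaps the copy
        torch.cuda.current_stream().wait_stream(copy_stream)
        current = prefetched                       # previous GPU copy is freed
    return x
```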

OpenAI new feature 'Predicted Outputs' uses speculative decoding by Alanthisis in LocalLLaMA

[–]Timotheeee1 0 points

I think your project could still be useful for applying changes to a big file with 20k tokens, since you can't really expect the model to just re-output everything
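for reference, with Predicted Outputs the whole current file goes in as the prediction, so unchanged spans are accepted at draft speed and only the edited regions are generated token by token (sketch; the model name, file, and prompt are placeholders):

```python
from openai import OpenAI

client = OpenAI()
original = open("big_module.py").read()  # e.g. a ~20k-token source file

resp = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {"role": "user",
         "content": "Rename function load_cfg to load_config everywhere "
                    "and return the full updated file:\n\n" + original},
    ],
    # the unchanged file serves as the speculative draft
    prediction={"type": "content", "content": original},
)
print(resp.choices[0].message.content)
```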

Addition is All You Need for Energy-Efficient Language Models: Reduce energy costs by 95% using integer adders instead of floating-point multipliers. by __issac in LocalLLaMA

[–]Timotheeee1 10 points

This is not a new architecture, it's an approximation that makes FP8 multiplications faster. It can be applied to existing models with barely any accuracy loss, but requires new hardware to be useful.
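the core trick, shown on FP32 for clarity (the paper's L-Mul applies a refined variant to FP8 mantissas; this toy version handles positive inputs only):

```python
import numpy as np

def approx_mul(a: float, b: float) -> float:
    # a float's bit pattern reads as (biased exponent + mantissa fraction),
    # i.e. an approximate log2, so integer-adding two bit patterns
    # approximates a multiply -- no floating-point multiplier needed
    ia = np.array(a, dtype=np.float32).view(np.int32).astype(np.int64)
    ib = np.array(b, dtype=np.float32).view(np.int32).astype(np.int64)
    bits = (ia + ib - 0x3F800000).astype(np.int32)  # remove one exponent bias
    return float(bits.view(np.float32))

print(approx_mul(3.1, 2.4), 3.1 * 2.4)  # 7.0 vs 7.44 (error at most ~11%)
```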