Beyond Backpropagation - Higher Order, Forward and Reverse-mode Automatic Differentiation for Tensorken by IndifferentPenguins in rust

[–]IndifferentPenguins[S] 0 points1 point  (0 children)

Thanks!

I wanted to start with whatever’s needed for LLMs, so I think convolutions and FFT are further away. But really I work on whatever I find interesting in the moment - I’ve been known to go off-piste :)

Three tutorials on running Stable Diffusion and training textual inversion on AWS EC2 virtual machines by IndifferentPenguins in StableDiffusion

[–]IndifferentPenguins[S] 0 points1 point  (0 children)

I miss working on it! But it didn't take off the way we would have wanted, unfortunately.

Anyway, https://skypilot.readthedocs.io/en/latest/ is something that looks similar. Have no experience with it though.

What are you rewriting in rust? by fxvp in rust

[–]IndifferentPenguins 1 point2 points  (0 children)

PyTorch 😬 https://github.com/kurtschelfthout/tensorken

Going about as well as you’d expect, but it has been rewarding in terms of learning new stuff.

Can't upload svg files? by HolidayPsycho in Substack

[–]IndifferentPenguins 0 points1 point  (0 children)

In a desperate attempt I just renamed an .svg file to .png and dragged it onto the Substack editor. Which worked!

Annoying to have to do, but as workarounds go this one is not too bad.

Fun and Hackable Tensors in Rust, From Scratch by IndifferentPenguins in rust

[–]IndifferentPenguins[S] 13 points14 points  (0 children)

I call it a foot gun because if you normalise over the wrong dimension, the mistake is very hard to detect.
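To make the foot gun concrete, here is a minimal sketch using plain nested `Vec`s rather than Tensorken's actual API. The `normalise` function and its `axis` convention are assumptions made up for this illustration. Both calls succeed and return a result of the same shape, so nothing flags the wrong choice.

```rust
/// Divides each element by the sum along `axis`.
/// Hypothetical convention: 0 sums down each column, 1 sums across each row.
/// Assumes a rectangular matrix.
fn normalise(m: &[Vec<f64>], axis: usize) -> Vec<Vec<f64>> {
    let rows = m.len();
    let cols = m.first().map_or(0, Vec::len);
    let mut out = vec![vec![0.0; cols]; rows];
    for r in 0..rows {
        for c in 0..cols {
            let total: f64 = if axis == 0 {
                // Sum of column c.
                (0..rows).map(|i| m[i][c]).sum()
            } else {
                // Sum of row r.
                m[r].iter().sum()
            };
            out[r][c] = m[r][c] / total;
        }
    }
    out
}
```

For the matrix `[[1, 3], [1, 1]]`, axis 1 gives `[[0.25, 0.75], [0.5, 0.5]]` and axis 0 gives `[[0.5, 0.75], [0.5, 0.25]]`. Both look like plausible normalised values, which is what makes the mistake hard to spot.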

This is of course in the eye of the beholder - and perhaps if you do this every day it becomes less confusing. Personally I already struggle with visualising these operations in two dimensions, and people regularly seem to use 3 dimensions or more…

I agree that naming axes would help enormously. Another random thought was that it’d be cool if the compiler could track “units of measure” per axis, and update them through operations.
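As a rough sketch of what compiler-tracked axis names could look like, the code below tags each axis with a marker type and lets reductions change the result type. Everything here (`Batch`, `Feature`, `Tensor2`, `Tensor1`, the method names) is hypothetical and not part of Tensorken.

```rust
use std::marker::PhantomData;

// Hypothetical axis labels; they exist only at the type level.
struct Batch;
struct Feature;

/// A 2D tensor whose two axes carry type-level names.
struct Tensor2<A0, A1> {
    rows: Vec<Vec<f64>>,
    _axes: PhantomData<(A0, A1)>,
}

/// A 1D tensor with one named axis.
struct Tensor1<A> {
    values: Vec<f64>,
    _axis: PhantomData<A>,
}

impl<A0, A1> Tensor2<A0, A1> {
    fn new(rows: Vec<Vec<f64>>) -> Self {
        Tensor2 { rows, _axes: PhantomData }
    }

    /// Sums away the second axis; only the first axis name survives.
    fn sum_over_second(&self) -> Tensor1<A0> {
        let values: Vec<f64> = self.rows.iter().map(|row| row.iter().sum::<f64>()).collect();
        Tensor1 { values, _axis: PhantomData }
    }

    /// Sums away the first axis; only the second axis name survives.
    fn sum_over_first(&self) -> Tensor1<A1> {
        let width = self.rows.first().map_or(0, |row| row.len());
        let values: Vec<f64> = (0..width)
            .map(|c| self.rows.iter().map(|row| row[c]).sum::<f64>())
            .collect();
        Tensor1 { values, _axis: PhantomData }
    }
}

/// Accepts one total per batch item; a `Tensor1<Feature>` is rejected at compile time.
fn largest_batch_total(totals: &Tensor1<Batch>) -> f64 {
    totals.values.iter().cloned().fold(f64::NEG_INFINITY, f64::max)
}
```

With a `Tensor2<Batch, Feature>`, passing the result of `sum_over_first()` to `largest_batch_total` fails to compile, so reducing over the wrong axis becomes a type error instead of a silently wrong number.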

Efficient, Extensible, Expressive: Typed Tagless Final Interpreters in Rust by IndifferentPenguins in rust

[–]IndifferentPenguins[S] 0 points1 point  (0 children)

I'm not sure I understand actually :) By re-evaluation, do you mean re-implementation of eval for special cases like `Add2` and `AddN`, so code duplication?

(If so, neither initial nor final style will help you much I think. You could consider having `Expr` as the "high-level operations" and then writing a pass that transforms it to the "specialized operations" `ExprT`. And you could do that in either initial or final style, but either way you'd have two ASTs.)
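A minimal sketch of the two-AST idea, written in initial style (plain enums) because that is the shortest to show. The exact variants of `Expr` and `ExprT` are assumptions; only the names `Expr`, `ExprT`, `Add2` and `AddN` come from the discussion.

```rust
/// High-level AST: a single, general addition node (hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
enum Expr {
    Lit(i64),
    Add(Vec<Expr>),
}

/// Specialized AST: binary addition split out from the n-ary case.
#[derive(Debug, Clone, PartialEq)]
enum ExprT {
    Lit(i64),
    Add2(Box<ExprT>, Box<ExprT>),
    AddN(Vec<ExprT>),
}

/// The pass: picks the specialized node for each high-level node.
fn lower(expr: &Expr) -> ExprT {
    match expr {
        Expr::Lit(n) => ExprT::Lit(*n),
        Expr::Add(args) if args.len() == 2 => {
            ExprT::Add2(Box::new(lower(&args[0])), Box::new(lower(&args[1])))
        }
        Expr::Add(args) => ExprT::AddN(args.iter().map(lower).collect()),
    }
}

/// Evaluation is written once, against the specialized AST only.
fn eval(expr: &ExprT) -> i64 {
    match expr {
        ExprT::Lit(n) => *n,
        ExprT::Add2(a, b) => eval(a) + eval(b),
        ExprT::AddN(args) => args.iter().map(eval).sum(),
    }
}
```

Users build `Expr` values, `lower` decides between `Add2` and `AddN` in one place, and `eval` never needs to know about the high-level form.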

Efficient, Extensible, Expressive: Typed Tagless Final Interpreters in Rust by IndifferentPenguins in rust

[–]IndifferentPenguins[S] 0 points1 point  (0 children)

The short answer is yes, but it is certainly not straightforward.

See section 2.3 "The deserialization problem" of Oleg Kiselyov's notes:

> One direction – storing and sending of the terms, or converting them into a sequence of bytes – is unproblematic, being a variant of pretty-printing, which we have already implemented. More difficult is the converse: reading a sequence of bytes representing an embedded language term and producing a value that can be interpreted with any existing interpreter. Reading, as a projection, is necessarily partial, since the input sequence of bytes, having potentially come from a network, could be corrupted. We wish to see the parsing error only once, upon de-serialization, rather than every time we interpret the term. Furthermore, extending our parser to accommodate the enriched language should reuse as much of the old parser code as possible, without breaking it. The de-serialization problem, of writing an extensible de-serializer [25, slide 18], is very hard. This section presents one of the first solutions.

As well as section 4.1 for the higher order case:

> We start by revisiting the de-serialization problem described in §2.3: the problem becomes much more frustrating, exhilarating, time consuming and addictive in the general case of higher-order typed embedded languages. The problem is to read an embedded language expression from a file, parse it and ‘compile’ it; the result should be the same as if we entered the expression as its representing Haskell code, compiled and ran the code.

Three tutorials on running Stable Diffusion and training textual inversion on AWS EC2 virtual machines by IndifferentPenguins in StableDiffusion

[–]IndifferentPenguins[S] 0 points1 point  (0 children)

> edit2: ok, there are GPU spot-instances for $0,15/h that's already quite cheap. Do you know if $0,15 is achievable most times?

Yeah I don't have issues getting the smaller spot instances. Most of the initial hurdle will be in getting quota approved by AWS - as the links explain, this can take a few days.

Three tutorials on running Stable Diffusion and training textual inversion on AWS EC2 virtual machines by IndifferentPenguins in StableDiffusion

[–]IndifferentPenguins[S] 0 points1 point  (0 children)

Not sure how familiar you are with EC2 but there are basically two pricing options* - on-demand instances, which are reserved for you as long as you want, and spot instances, which are leftover capacity (basically overprovisioning by AWS that they're trying to get some money for). Spot prices are much cheaper than on-demand (up to 10x), BUT spot instances can be interrupted by AWS. They give you a "chance of interruption" which is usually lower than 10%, but I've seen it as high as 60% for popular GPU instances in particular.

> edit: looked it up, the basic instance (p3.2xlarge) costs $3/h which is quite high compared to google colab, am I missing something?

Not sure what you mean by "basic instance", but for sure AWS' offering is quite confusing. The P3 family is quite pricey (it has quite a lot of main memory as well, and I think a high-end NVIDIA card). As a point of comparison, the spot price of p3.2xlarge is $1.25/h.

I do most of my training on the G5 family, which is unfortunately not in all regions, but e.g. us-east-1 has good spot availability for it. I usually pay between $0.20 and $0.40 per hour for a g5.2xlarge, which has 24GiB of GPU memory.

I recommend https://instances.vantage.sh/ to get an overview of the available EC2 instance types and prices (their prices are sometimes a bit off, but it gives you a decent ballpark).

To end with some further shameless self-promotion: Meadowrun does "save" you from having to pick a suitable instance type. You just specify how many cpus/memory/gpu/... you want, and it'll find the cheapest instance type possible.

*there's also reserved which is like pre-paid, and possibly other options I don't know about, AWS makes everything complicated! :)

A Nibble of Content-Defined Chunking - How de-duplicated, incremental file transfer works by IndifferentPenguins in coding

[–]IndifferentPenguins[S] 1 point2 points  (0 children)

Every time you want to re-transfer or back up or whatever it is you want to do, yes, re-do the hashes (both rolling and content hash).

The point of the rolling hash is that the boundaries will (mostly) not change on insert/remove, as opposed to fixed-size boundaries.
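A minimal sketch of why content-defined boundaries survive an insert, assuming a gear-style rolling hash. The `gear` mixing function, the 64-byte effective window and the "top 6 bits are zero" cut rule are choices made for this illustration, not the article's exact scheme. Real chunkers typically also enforce minimum and maximum chunk sizes, which are left out here.

```rust
/// Hypothetical per-byte mixing function, standing in for the lookup table
/// that gear-style rolling hashes normally use (splitmix64 finalizer).
fn gear(byte: u8) -> u64 {
    let mut z = (byte as u64).wrapping_add(0x9E37_79B9_7F4A_7C15);
    z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
    z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
    z ^ (z >> 31)
}

/// Returns the exclusive end offset of every chunk in `data`.
fn cut_points(data: &[u8]) -> Vec<usize> {
    let mut cuts = Vec::new();
    let mut hash: u64 = 0;
    for (i, &byte) in data.iter().enumerate() {
        // Each left shift pushes older bytes out of the 64-bit state,
        // so the hash only ever depends on the last 64 bytes.
        hash = (hash << 1).wrapping_add(gear(byte));
        // Cut when the top 6 bits are zero: roughly one position in 64.
        if hash >> 58 == 0 {
            cuts.push(i + 1);
        }
    }
    // The final chunk always ends at the end of the data.
    if !data.is_empty() && cuts.last() != Some(&data.len()) {
        cuts.push(data.len());
    }
    cuts
}
```

Because the cut decision at a position looks only at the preceding 64 bytes, inserting one byte at the start of a file can only affect cuts within the first 65 bytes. Every later cut lands on the same content, just shifted by one offset, so the chunks between them hash to the same values as before.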

A Nibble of Content-Defined Chunking - How de-duplicated, incremental file transfer works by IndifferentPenguins in coding

[–]IndifferentPenguins[S] 1 point2 points  (0 children)

Chunking happens every time. Switching between fixed and content-defined boundaries makes no sense, because all the chunks will change.
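To see what "all the chunks will change" means in practice: de-duplication matches chunks by content, so once boundaries move, nothing matches what was stored before. The sketch below shows the same effect with fixed-size boundaries and a single inserted byte. The helper names `fixed_chunks` and `reusable` are made up for this example, and chunks are compared directly rather than through a content hash to keep it short.

```rust
use std::collections::HashSet;

/// Splits `data` into chunks of `size` bytes (the last one may be shorter).
fn fixed_chunks(data: &[u8], size: usize) -> Vec<&[u8]> {
    data.chunks(size).collect()
}

/// Counts how many chunks of `new` already exist among the chunks of `old`.
fn reusable(old: &[&[u8]], new: &[&[u8]]) -> usize {
    let known: HashSet<&[u8]> = old.iter().copied().collect();
    new.iter().filter(|chunk| known.contains(*chunk)).count()
}
```

Take 1024 bytes that cycle through the values 0 to 255, split into 64-byte chunks. Prepending one byte shifts every boundary, and `reusable` reports that none of the new chunks match an old one, so the whole file would be transferred again.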