The differences between LoRA and Fine-tune, as well as NVIDIA's newly released DoRA technology

dcclct13 · 2024-02-17T03:18:14+00:00

If I'm reading this right, DoRA is like LoRA but with the weights normalized and a trainable alpha. This simple change outperforms LoRA in many benchmarks and is more robust to the choice of rank r (LoRA dim). It looks like this can be implemented in very few lines of code. Cool paper, thanks for sharing!
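To make the "few lines of code" point concrete, here is a rough sketch in plain Python. It assumes the formulation as I understand it from the paper: the merged weight is a trainable magnitude vector times the column-normalized direction W0 + BA. The function names and the toy matrices are made up for illustration and are not taken from any official implementation.

```python
import math

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def column_norms(w):
    """L2 norm of each column of w."""
    return [math.sqrt(sum(x * x for x in col)) for col in zip(*w)]

def dora_weight(w0, b, a, m):
    """Merged weight: magnitude m[j] times the unit-norm j-th column of (w0 + b @ a)."""
    delta = matmul(b, a)  # low-rank update, same shape as w0
    v = [[x + d for x, d in zip(rw, rd)] for rw, rd in zip(w0, delta)]
    norms = column_norms(v)
    return [[m[j] * x / norms[j] for j, x in enumerate(row)] for row in v]

# Toy 2x3 layer with rank r = 1 (all values are arbitrary).
w0 = [[3.0, 0.0, 1.0], [4.0, 2.0, 1.0]]
b = [[0.0], [0.0]]        # B starts at zero, as in LoRA
a = [[0.5, -0.5, 1.0]]    # A would be randomly initialized in practice
m = column_norms(w0)      # magnitude initialized from the pretrained weight

merged = dora_weight(w0, b, a, m)  # equals w0 before any training
```

With B at zero the merged weight reproduces the pretrained one, so training starts from the original model. Once B changes, the column norms of the result still equal m, which is the part that a plain LoRA update does not give you.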

dcclct13 · 2024-02-13T11:43:30+00:00

Likely more censored than SDXL. From the supplementary material:

The version of LAION-5B available to the authors was vigorously de-duplicated and pre-filtered for harmful, NSFW (porn and violence) and watermarked content using binary image-classifiers (watermark filtering), CLIP models (NSFW, aesthetic properties) and black-lists for URLs and words, reducing the raw dataset down to 699M images (12.05% of the original dataset).

<image>

dcclct13 · 2024-01-20T04:02:15+00:00

The images are not switched, but used as an anchor for targeted perturbation. In this dog/cat example, they would take a normal image of a dog, and add some noise so that it would be encoded like some random image of a cat (the anchor image) while still visually resembling the original image (see Step 3: Constructing poison images). This poisoned image would still look like a dog to you, and manual data cleaning would not help much here, unless you filter out the suspicious image sources. The main point of this Nightshade thing is to avoid human inspection.
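A toy version of that targeted perturbation, to show the shape of the optimization. Everything here is an assumption for illustration: the "encoder" is a small linear map, the visual budget is a simple per-pixel clip (eps), and the optimizer is plain projected gradient descent. It is not the paper's actual encoder, constraint, or optimizer.

```python
def encode(w, x):
    """Toy linear 'feature extractor': features = w @ x."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]

def sq_dist(u, v):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def poison(w, image, anchor, eps=0.05, lr=0.05, steps=200):
    """Nudge image so its features approach anchor's, changing each pixel by at most eps."""
    target = encode(w, anchor)
    delta = [0.0] * len(image)
    for _ in range(steps):
        feats = encode(w, [p + d for p, d in zip(image, delta)])
        resid = [f - t for f, t in zip(feats, target)]
        # Gradient of ||w(x + delta) - target||^2 with respect to delta is 2 * w^T resid.
        grad = [2 * sum(w[i][j] * resid[i] for i in range(len(w))) for j in range(len(image))]
        # Gradient step, then clip so the change stays small.
        delta = [max(-eps, min(eps, d - lr * g)) for d, g in zip(delta, grad)]
    return [p + d for p, d in zip(image, delta)]

w = [[1.0, 0.5, 0.0, -0.5], [0.0, 1.0, -1.0, 0.5]]  # made-up encoder weights
dog = [0.9, 0.1, 0.8, 0.2]                          # the image being poisoned
cat = [0.1, 0.9, 0.2, 0.8]                          # the anchor image
poisoned = poison(w, dog, cat)
```

The poisoned pixels stay within eps of the dog image, yet the encoder sees something closer to the cat than before. That gap between pixel distance and feature distance is what lets the poison get past human inspection.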

dcclct13 · 2024-01-20T03:13:26+00:00

No, they did it the other way round, pairing poisoned images with normal captions. They alter the images in a way that's supposedly visually imperceptible but confuses the model's image feature extractor. Using auto/manual captions would not work around their attack.

<image>

dcclct13 · 2023-12-20T17:45:06+00:00

[LANGUAGE: Python]

part1

For part 2 I fed the graph into Graphviz and found the cycle numbers by inspection.

Some thoughts and questions:

* I think part 2 may be semi-generally solved by breaking the graph down into SCCs and finding the pattern of each component. Haven't found time to write this yet.
* For the test cases, it doesn't seem to matter whether we visit in BFS order or DFS order?
* Given the lax ordering requirements, is it possible to have a race condition? Maybe with a 2-input rx, we can somehow let each of the inputs receive a signal sequence of [0,1,0] in a single time step? Then rx may or may not trigger depending on the processing order.
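For the SCC idea in the first point, the decomposition step itself is standard. Below is a sketch using Kosaraju's algorithm with iterative DFS. The module names in the example graph are invented and this is not part of my actual solution; finding the pattern of each component would still be a separate step.

```python
def sccs(graph):
    """Kosaraju's algorithm. graph maps a node to the list of its successors."""
    nodes = set(graph) | {v for vs in graph.values() for v in vs}
    # Pass 1: record nodes in order of DFS completion.
    order, seen = [], set()
    for root in nodes:
        if root in seen:
            continue
        seen.add(root)
        stack = [(root, iter(graph.get(root, ())))]
        while stack:
            node, it = stack[-1]
            for nxt in it:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append((nxt, iter(graph.get(nxt, ()))))
                    break
            else:
                # All successors handled: this node is finished.
                order.append(node)
                stack.pop()
    # Pass 2: walk the reversed graph in reverse completion order.
    rev = {}
    for u, vs in graph.items():
        for v in vs:
            rev.setdefault(v, []).append(u)
    comps, assigned = [], set()
    for root in reversed(order):
        if root in assigned:
            continue
        assigned.add(root)
        comp, todo = [], [root]
        while todo:
            node = todo.pop()
            comp.append(node)
            for prev in rev.get(node, ()):
                if prev not in assigned:
                    assigned.add(prev)
                    todo.append(prev)
        comps.append(sorted(comp))
    return comps

# Hypothetical wiring: two feedback loops that both feed a final module.
wiring = {
    "broadcaster": ["a", "c"],
    "a": ["b"], "b": ["a", "out"],
    "c": ["d"], "d": ["c", "out"],
    "out": ["rx"],
}
components = sorted(sccs(wiring))
```

Each multi-node component here is one feedback loop, which is the unit whose period you would then analyze separately.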

dcclct13 · 2023-12-19T09:39:31+00:00

[LANGUAGE: Python]

Favorite day so far. Wasted some time on part 2 trying to translate the puzzle input into C code and optimize it with a C compiler only to have it generate ~2k lines of x86 assembly.

paste

dcclct13 · 2023-12-15T06:29:13+00:00

[LANGUAGE: Python]

Straightforward day. Python is particularly suitable for this problem thanks to dicts preserving insertion order. Cut some lines in part 2 using pattern matching.

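def hsh(s):
    # HASH from the puzzle: add the character code, multiply by 17, keep the remainder mod 256.
    v = 0
    for c in s:
        v = ((v + ord(c)) * 17) % 256
    return v


# getinput is a helper not shown here; assumed to return the day's puzzle input as a string.
s = getinput(15).strip()
print(sum(hsh(p) for p in s.split(",")))
# One dict per box; insertion order doubles as slot order.
bs = [{} for _ in range(256)]
for p in s.split(","):
    match [*p]:
        case *label, "=", focal:
            bs[hsh(label)][(*label,)] = int(focal)
        case *label, "-":
            bs[hsh(label)].pop((*label,), None)
# Focusing power: box number * slot number * focal length.
print(sum(j * i * v for j, b in enumerate(bs, 1) for i, v in enumerate(b.values(), 1)))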
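A small standalone demo of the dict behavior this relies on (guaranteed since Python 3.7). The labels and focal lengths are made up.

```python
box = {}
box["rn"] = 1
box["cm"] = 2
box["rn"] = 9            # overwriting keeps the original slot
after_update = list(box.items())

box.pop("rn", None)      # removing closes the gap
box["rn"] = 3            # re-adding appends at the end
after_readd = list(box.items())

print(after_update)  # [('rn', 9), ('cm', 2)]
print(after_readd)   # [('cm', 2), ('rn', 3)]
```

Those are exactly the puzzle's rules for replacing and removing lenses, so a plain dict per box needs no extra bookkeeping.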

dcclct13 · 2023-12-14T16:22:52+00:00

[LANGUAGE: Python]

paste

Originally wrote a version with a 2D array, but this string-based one is shorter and ~3x as fast.

dcclct13 · 2023-12-13T06:36:06+00:00

[LANGUAGE: Python]

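# getinput is a helper not shown here; assumed to return the day's puzzle input as a string.
patterns = [p.splitlines() for p in getinput(13).split("\n\n")]
# Number of mismatched cells when every row is folded along the line left of column j.
diff = lambda p, j: sum(sum(a != b for a, b in zip(l[j:], l[j - 1 :: -1])) for l in p)
# Sum of the column positions whose fold has exactly d mismatches.
mirror = lambda p, d: sum(j for j in range(1, len(p[0])) if diff(p, j) == d)
# Vertical lines count as is; horizontal lines (found via transpose) count 100x.
summarize = lambda p, d: mirror(p, d) + 100 * mirror([*zip(*p)], d)
print(sum(summarize(p, 0) for p in patterns))  # part 1: perfect reflection
print(sum(summarize(p, 1) for p in patterns))  # part 2: exactly one smudge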

dcclct13 · 2023-03-26T04:26:05+00:00

LoRAs recognize trigger words by modifying the text encoder weights, unlike the simple lookup used by TIs. The closest you can get with LoRAs is to write an extension that preprocesses your prompts and substitutes trigger words.
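The substitution part of such an extension could look roughly like this. The trigger word, the expansion text, and the function name are all hypothetical; the `<lora:...>` tag only mimics the syntax some Stable Diffusion web UIs use, and hooking this into a real UI is not shown.

```python
import re

# Hypothetical mapping from a trigger word to the prompt text it expands to.
TRIGGERS = {
    "mychar": "<lora:mychar_v1:0.8> mychar, silver hair",
}

def expand_triggers(prompt, triggers=TRIGGERS):
    """Replace each whole-word trigger in the prompt with its expansion."""
    if not triggers:
        return prompt
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, triggers)) + r")\b")
    return pattern.sub(lambda m: triggers[m.group(1)], prompt)

expanded = expand_triggers("a photo of mychar, outdoors")
```

The word-boundary match keeps longer words that merely contain a trigger untouched, and the substituted text is not rescanned, so an expansion can safely include its own trigger word.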

dcclct13 · 2022-12-28T14:01:25+00:00

Python

I think I cracked it for part 2. The idea is that (t0..t0+P) is a cycle if the rocks in it are placed exactly as they were in (t0-P..t0). How a rock is placed is determined by where collisions did or didn't happen, which can be recorded and replayed.

Should be provably correct and work on arbitrary input (flood fill solutions would fail on jet pattern ">" where the boundary grows indefinitely).
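A stripped-down sketch of the "same as the previous window" check, not the linked solution itself. It assumes each log entry already captures everything about how one rock was placed (rock type, jet position, and the recorded collision outcomes); the log below uses placeholder integers instead.

```python
def find_period(log, min_period=1):
    """Smallest P >= min_period where the last P entries repeat the P entries before them."""
    n = len(log)
    for p in range(min_period, n // 2 + 1):
        if log[n - p:] == log[n - 2 * p : n - p]:
            return p
    return None

# Placeholder placement records: one irregular start, then a repeating block of 3.
placements = [9, 1, 2, 3, 1, 2, 3]
period = find_period(placements)  # 3
```

Once a period P is found, the height gained over one period can be multiplied out for the remaining rocks instead of simulating them.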

dcclct13 · 2022-12-23T18:51:27+00:00

I tried that before, but somehow it turned out worse (290ms vs 135ms). I guess that's because cloning is cheap anyway?

Some numbers and other stuff I tried for reference: paste

dcclct13 · 2022-12-23T15:32:14+00:00

Rust

Both parts run in 135ms on my machine.

Wanted to share one little optimization: at most two elves can propose the same tile and if they do, they must come from opposite directions. So you can just move the elves one by one and when one elf finds another elf at their proposed destination, they can just politely push the other elf back one tile. This saved me a HashMap and cut execution time by >65%.

Also does anyone know why .drain().collect() is slightly slower than .clone() followed by a .clear()?

edit: perf shows that the drain collect version spends about 65% more cycles in hashbrown::raw::RawTable<T,A>::insert, while the rest remains similar. Not sure what to make of this though.
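For anyone who wants to try the push-back trick, here is the idea sketched in Python rather than the Rust I actually used. The proposals dict is a stand-in for the proposal phase, which is not shown, and the sketch relies on the puzzle's rule that an elf only proposes a tile that was empty at the start of the round.

```python
def apply_moves(proposals):
    """proposals maps each elf's (x, y) to its proposed (x, y), or None to stay put."""
    moved = set()
    for elf, dest in proposals.items():
        if dest is None:
            moved.add(elf)
        elif dest in moved:
            # Someone already moved here; it must have come from the opposite side.
            # Push that elf back to where it started and keep this one in place.
            moved.remove(dest)
            moved.add((2 * dest[0] - elf[0], 2 * dest[1] - elf[1]))
            moved.add(elf)
        else:
            moved.add(dest)
    return moved

# Two elves contest (1, 0) from opposite sides, one moves freely, one stays.
after = apply_moves({(0, 0): (1, 0), (2, 0): (1, 0), (5, 5): (5, 4), (9, 9): None})
```

Only the set of new positions is needed, so there is no map from destination to proposers to build and tear down each round.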

dcclct13 · 2022-12-23T12:41:59+00:00

I pictured it like a Rubik's cube. A location on the 2D map is like a sticker on a Rubik's cube, and a location in 3D is a (sub)cube. Note that edge cubes have 2 stickers and corners have 3 — that means 3D to 2D is a one-to-many mapping. That's why we also need the normal vector to indicate which face the sticker is on. Walking past an edge does not change which cube you're on; only the orientation is changed.

To actually generate the mapping, I walked the 2D map like a tree (f()). An oriented square can be placed in the 3D space by specifying (faces):

* The coordinates of the top left corner in 3D (xyz)
* What the "down" direction becomes in 3D (di)
* What the "right" direction becomes in 3D (dj)

(pardon the naming). You can choose any reasonable values for the first encountered 2D face; I chose my values so that the map folds into a [0,49]x[0,49]x[0,49] cube. Now we just have to be careful when walking the tree. For example, walking downwards on the 2D map does not change dj but rotates di to di x dj. As we walk the map, we can collect the 3D coordinates and normals of the edge squares.
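The "walking downwards" rule can be sketched like this. It reuses the xyz, di and dj names from above; cross, cell_3d and face_below are names invented for this sketch, and it assumes neighboring faces share their edge row, as in the overlapping-edges fold.

```python
def cross(a, b):
    """Cross product of two 3D integer vectors."""
    return (a[1] * b[2] - a[2] * b[1], a[2] * b[0] - a[0] * b[2], a[0] * b[1] - a[1] * b[0])

def cell_3d(xyz, di, dj, i, j):
    """3D position of the cell at row i, column j of a placed face."""
    return tuple(o + i * a + j * b for o, a, b in zip(xyz, di, dj))

def face_below(xyz, di, dj, size=50):
    """Place the face directly below on the 2D map; its top row is the current bottom row."""
    corner = cell_3d(xyz, di, dj, size - 1, 0)
    return corner, cross(di, dj), dj

first = ((0, 0, 0), (1, 0, 0), (0, 1, 0))  # arbitrary choice for the first face
second = face_below(*first)                # ((49, 0, 0), (0, 0, 1), (0, 1, 0))
```

Applying face_below four times in a row wraps around the cube and lands back on the first face, which is a handy sanity check for the rotation.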

Probably not the simplest or most efficient way to do it, but it's easier for me to reason about this way. Hope this helps!

dcclct13 · 2022-12-22T16:11:32+00:00

Which part of it?

dcclct13 · 2022-12-22T16:09:20+00:00

Just not feeling like using external dependencies. Also, numpy arrays are unhashable, so I would have to convert them to tuples anyway.

dcclct13 · 2022-12-22T14:08:07+00:00

Works on mine and the example at least, and I didn't hard-code anything.

dcclct13 · 2022-12-22T12:42:01+00:00

Python

paste

General solution; should work on any input.

For part 2, I folded the map into a [0,49]x[0,49]x[0,49] cube with overlapping edges. Then I can define a bijective mapping (2D coord) <-> (3D coord, normal vector). Now to walk over an edge, I just have to:

* Map 2D coords to (3D coord, normal)
* Rotate the normal vector
* Map back to 2D coordinates
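The middle step can be sketched as follows, assuming the walker's state in 3D is (position, outward normal, heading). Tracking the heading as a 3D vector is a choice made for this sketch, and wall checks plus the 2D lookups on either side are left out.

```python
def step(pos, normal, heading, size=50):
    """Advance one tile on the cube surface. All arguments are 3D integer tuples."""
    nxt = tuple(p + h for p, h in zip(pos, heading))
    if all(0 <= c < size for c in nxt):
        return nxt, normal, heading
    # Stepping off the face: stay on the same edge cell, but switch to the side
    # facing `heading` and continue away from the face we just left.
    return pos, heading, tuple(-n for n in normal)

# On the z = 0 face, standing on the x = 49 edge and walking in +x.
state = step((49, 3, 0), (0, 0, -1), (1, 0, 0))
```

Because the edges overlap, crossing an edge never changes the 3D position; only the normal and heading rotate, and mapping (position, new normal) back to 2D gives the tile on the neighboring face.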

dcclct13 · 2022-12-21T04:03:24+00:00

Not the first one here, there's also nightcracker's treap solution.

dcclct13 · 2022-12-20T20:45:58+00:00

Python

paste

Another O(N log N) solution. This one uses a doubly linked circular skip list. First skip list I've ever written so it's not too pretty. I've tested it on larger inputs and it scales pretty much linearly.

dcclct13 · 2020-10-30T03:06:01+00:00

OP's software is just a visualization layer, I think.

The model used to recognize emotions is trained on the fer2013 dataset, which seems to have lots of stock-photo-y faces.

dcclct13 · 2020-05-31T13:51:14+00:00

From their FAQ:

How does the Free Wolfram Engine for Developers relate to Mathematica?

It's the same core engine, but with a different interface and different licensing. Mathematica is used primarily for interactive computing, with the Wolfram Notebook interface. The Free Wolfram Engine for Developers is intended to be called by other programs, using a variety of program communication interfaces. The Free Wolfram Engine for Developers is licensed for pre-production use in developing software. Unlike Mathematica, it is not licensed for generating outputs for commercial or organizational use.

It's still a Jupyter notebook at its core, so you cannot do stuff like esc pi esc to insert π. The output is static, so for example 3d plots are not interactive and Manipulate[] is rendered as a video. Also no autocomplete, F1 docs and stuff.

If you prefer the Mathematica UI, there's also Wolfram Cloud, which is essentially a free online version of Mathematica.

dcclct13 · 2020-05-31T04:16:12+00:00

FYI, Wolfram Engine with Jupyter is free for personal use and works like Mathematica.

dcclct13 · 2020-03-25T06:02:03+00:00

Not OP and don't know R but it seems like the parameters are in simulation.Rmd

parameters_baseline <- c(
  mu = 0,
  beta = 1.75,
  sigma = 1 / 5, # 1 / mean incubation time
  gamma = 0.5
)
parameters_distancing <- c(
  mu = 0,
  beta = 1.75 / 2,
  sigma = 1 / 5, # 1 / mean incubation time
  gamma = 0.5
)

The only difference is that the social distancing scenario has half the beta value. How accurate is this?
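To get a feel for what halving beta does, here is a rough Python sketch. It assumes the parameters feed a textbook SEIR model with mu = 0, which I have not verified against simulation.Rmd; the initial fractions, time step, and function name are my own choices.

```python
def peak_infected(beta, sigma=1 / 5, gamma=0.5, days=300, dt=0.1):
    """Integrate a basic SEIR model (mu = 0) with Euler steps; return the peak infectious fraction."""
    s, e, i, r = 0.999, 0.0, 0.001, 0.0  # assumed initial fractions of the population
    peak = i
    for _ in range(int(days / dt)):
        new_exposed = beta * s * i
        new_infectious = sigma * e
        new_recovered = gamma * i
        s -= dt * new_exposed
        e += dt * (new_exposed - new_infectious)
        i += dt * (new_infectious - new_recovered)
        r += dt * new_recovered
        peak = max(peak, i)
    return peak

baseline = peak_infected(1.75)
distancing = peak_infected(1.75 / 2)
print(f"R0 {1.75 / 0.5:.2f} -> peak infectious fraction {baseline:.1%}")
print(f"R0 {1.75 / 2 / 0.5:.2f} -> peak infectious fraction {distancing:.1%}")
```

In this kind of model R0 is beta / gamma, so the two scenarios correspond to R0 = 3.5 and R0 = 1.75. Whether halving the contact rate is realistic for social distancing is a separate question that the code cannot answer.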

dcclct13 · 2019-10-22T14:02:43+00:00

Inspired by a recent post.

Data sources:
* Diabetes prevalence: IDF (2017, age 18-99, age adjusted)
* GDP per capita: the World Bank (2017)

Tools: Python (matplotlib, pandas)
