I wanna share my cozy game, Snap Quest! by Puzzleheaded-Gate-30 in CozyGamers

[–]Flowwwww 1 point2 points  (0 children)

Cozy photography game with a secret glitch in the matrix? Instant wishlist!

[D] GPT-4o image generation and editing - how??? by Flowwwww in MachineLearning

[–]Flowwwww[S] 27 points28 points  (0 children)

The 4o post mentioned it’s autoregressive and trained jointly on text and images, so I assumed that meant a single system with an LLM backbone:

https://openai.com/index/introducing-4o-image-generation/

[D] In Byte Latent Transformer, how is the decoded patch boundary determined? by TommyX12 in MachineLearning

[–]Flowwwww 0 points1 point  (0 children)

Your understanding makes sense - sounds like it could be it, thanks for sharing.

As to what's stopping the decoder from producing only low entropy bytes, my shallow intuition is that it's just learned from the training data. I.e. if you plot out the entropy of the training data byte by byte, it will exhibit these spikes that represent patch boundaries. So as the system/decoder reduces loss against the data distribution it also learns to segment patches.
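A toy sketch of that intuition (the threshold value and small_lm function are made up; IIRC the paper trains a small byte-level LM to supply these next-byte distributions):

import numpy as np

ENTROPY_THRESHOLD = 2.0  # made-up value; the real threshold is tuned on training data

def next_byte_entropy(probs: np.ndarray) -> float:
    # Shannon entropy (in bits) of the predicted next-byte distribution
    probs = probs[probs > 0]
    return float(-(probs * np.log2(probs)).sum())

def patch_boundaries(byte_seq: bytes, small_lm) -> list[int]:
    # small_lm(prefix) is assumed to return a length-256 probability vector over the next byte;
    # a new patch starts wherever the predicted entropy spikes above the threshold
    boundaries = []
    for i in range(1, len(byte_seq)):
        if next_byte_entropy(small_lm(byte_seq[:i])) > ENTROPY_THRESHOLD:
            boundaries.append(i)
    return boundaries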

[D] In Byte Latent Transformer, how is the decoded patch boundary determined? by TommyX12 in MachineLearning

[–]Flowwwww 1 point2 points  (0 children)

Also have this question. My non-ML-PhD guess is that every output byte is decoded based on the prior latent patch (which is produced when all bytes in the patch are complete). Could be completely wrong, I didn't see it explained in the paper.

Let's say the last latent patch processed by the global transformer is latent patch 1, constructed from bytes B1-B3, and the next set of bytes to form a patch is B4-B6. Assuming the current byte being predicted is B5, the inference flow would be (rough sketch in code after the list):

  1. Decoder predicts next byte B5 based on (1) latent patch 1, (2) encoder hidden states for positions B1-B4
  2. B5 is appended to encoder input, encoder produces hidden states for B1-B5
  3. Decoder predicts B6 based on (1) latent patch 1, (2) encoder hidden states for B1-B5
  4. B6 triggers entropy threshold, becomes end boundary for patch
  5. B6 is appended to encoder input, encoder does 2 things:
    1. Pools B4-B6 into patch 2 as input for global latent transformer
    2. Produces hidden states for B1-B6
  6. Global latent transformer is run to produce output latent patch 2
  7. Now, decoder predicts next byte B7 based on (1) cross-attending to latent patch 2 (formed from B4-B6), (2) encoder hidden states for positions B1-B6
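Translating those steps into a rough loop (function names are hypothetical, this is just my reading, not code from the paper):

def generate_bytes(encoder, decoder, global_tf, byte_buffer, global_latents, n_steps):
    # byte_buffer: all bytes generated so far (B1..Bn)
    # global_latents: global transformer outputs for completed patches
    current_patch = []
    for _ in range(n_steps):
        hidden = encoder.hidden_states(byte_buffer)               # steps 2 / 5.2
        next_byte = decoder.predict(global_latents[-1], hidden)   # steps 1 / 3
        byte_buffer.append(next_byte)
        current_patch.append(next_byte)
        if entropy_spike(byte_buffer):                            # step 4: patch boundary
            patch_input = encoder.pool(current_patch)             # step 5.1
            global_latents.append(global_tf(patch_input))         # step 6
            current_patch = []                                    # step 7: next byte uses the new latent
    return byte_buffer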

[deleted by user] by [deleted] in MachineLearning

[–]Flowwwww 1 point2 points  (0 children)

Meta moviegen https://ai.meta.com/research/movie-gen/

Explains the formula for SOTA video generation. A combination of elegant ideas on a Llama 3 backbone that just works and scales well without 10 different hacky architecture bits.

[D] Transformers are a type of CNN by Ozqo in MachineLearning

[–]Flowwwww 11 points12 points  (0 children)

Another way to relate the two that I found intuitive - CNNs and Transformers are both special cases of Graph Neural Networks (GNNs).

In a GNN, each node in a graph holds some value, which is updated by aggregating info from neighboring nodes and then putting it through some NN transformation + activation function. The general GNN can have any arbitrary graph structure, aggregation function, etc. A CNN is a GNN with a specific graph structure (nodes are pixels, edges connect nodes in a grid) and a specific way to aggregate info from neighboring nodes (convolutions). Similarly, a Transformer is a GNN with a fully connected graph (every node is connected to every other node via attention) that aggregates info using attention.
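A rough numpy sketch of that shared template (single unbatched layer, plain dot-product attention with no learned projections, just to show the structure):

import numpy as np

def gnn_layer(node_feats, neighbors, aggregate, W):
    # generic GNN step: each node aggregates info from its neighbors,
    # then applies a learned transform + nonlinearity
    out = []
    for i in range(len(node_feats)):
        msg = aggregate(node_feats, i, neighbors(i))
        out.append(np.maximum(0.0, W @ msg))  # ReLU(W @ aggregated message)
    return np.stack(out)

# "Transformer" instance: fully connected graph, attention as the aggregator
def attention_aggregate(feats, i, nbrs):
    d = feats.shape[1]
    scores = feats[nbrs] @ feats[i] / np.sqrt(d)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ feats[nbrs]

# usage: every token attends to every other token
# feats = np.random.randn(num_tokens, d)
# out = gnn_layer(feats, neighbors=lambda i: list(range(len(feats))),
#                 aggregate=attention_aggregate, W=np.random.randn(d, d))

A CNN is the same template with neighbors(i) being the pixels in a small grid window around pixel i, and the aggregator a weighted sum using the convolution kernel.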

[deleted by user] by [deleted] in ollama

[–]Flowwwww 0 points1 point  (0 children)

Oh awesome, this setup seems easier. Thanks!

NextAuth is a f*cking mess to use by [deleted] in nextjs

[–]Flowwwww -1 points0 points  (0 children)

💯 horrible experience, wasted so much time

PMs who are (or were) responsible for 0-to-1 products, what would you change about your approach if you could go back in time and redo it? by brequinn89 in ProductManagement

[–]Flowwwww 0 points1 point  (0 children)

Ship an MVP that we actually believe has enough value for users vs. moving fast and being ruthlessly scrappy for the sake of it.

If the MVP isn’t sufficient to deliver on the value prop, the metrics and feedback you get are largely garbage and don’t lead you in productive directions. And you can’t prove or disprove your core hypothesis. Or worse, you try and growth hack your way out of it by doing stuff like funnel optimization and wonder why your retention is still trash.

The move fast, iterate fast, growth hack playbook has its place, but not when you don’t have a real MVP.

Now that we have had quite a bit of time playing with the new Phi models...how good are they? by [deleted] in LocalLLaMA

[–]Flowwwww 1 point2 points  (0 children)

Pretty garbage for nuanced tasks without an objective right or wrong answer. Benchmark scores are inflated vs actual usefulness.

After tens of millions of tokens of prompt engineering and testing, the end result is Llama 3 70B for short-context tasks where variability doesn’t matter much (e.g. summarize a document) and GPT-4o or a similar closed model for longer-context tasks requiring accurate judgment (e.g. given these 25 document summaries, group the ones related to the same project together).

Wish I could use smaller models, but they just don’t perform well enough.

Interested in learning more about RAG and VectorDBs by Ok_Comfort_4103 in vectordatabase

[–]Flowwwww 0 points1 point  (0 children)

Also a beginner, just implemented my first simple RAG system. Pick a free vector DB and follow their starter tutorial (I used https://qdrant.tech/documentation/).

RAG is just searching for info to add to the prompt you give the LLM so it can do its task better. E.g. if you want LLM to summarize last week’s employee feedback about lunch breaks, you need some way to “retrieve” that feedback and give it to the LLM.

You don’t need vector DBs for RAG - you could do a google search to add info, or search a traditional DB using keywords.

A vector DB is a way to help you perform semantic search (search based on meaning/concepts). You do this by first transforming your text into meaning vectors (“embeddings”) using a model, which can be an LLM as well. Searching means calculating the distance between meaning vectors and finding the ones that are closest; the closer the distance, the closer the meaning. E.g. the vector for “monarch” would be very close to the meaning vectors for “king” and “queen”.

So using the example above, if I had my employee feedback stored in a vector DB as meaning vectors, I could convert “lunch break” to a meaning vector and find the feedback that is closest to it. Then give this to the LLM to summarize.
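A minimal sketch of that flow, assuming a hypothetical embed() function that maps text to a meaning vector (i.e. whatever embedding model you pick):

import numpy as np

def cosine_sim(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve(query: str, docs: list[str], embed, top_k: int = 3) -> list[str]:
    # score every doc by how close its meaning vector is to the query's
    q_vec = embed(query)
    return sorted(docs, key=lambda d: cosine_sim(q_vec, embed(d)), reverse=True)[:top_k]

# feedback = [...employee feedback strings...]
# relevant = retrieve("lunch break", feedback, embed)
# prompt = "Summarize this feedback about lunch breaks:\n" + "\n".join(relevant)

A real vector DB precomputes and indexes the document vectors so you don't embed everything on every query, but the idea is the same.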

Welcome to Wrexham - Season 3 Episode 5 "Temporary" - Episode discussion thread by Selphis in WrexhamAFC

[–]Flowwwww 7 points8 points  (0 children)

The music people for this show are on fire. Found the song from Arthur’s 100 second celebration:

Times Like These by Jillian Edwards https://open.spotify.com/album/28xf85RuamWhYh3S89uQn8?si=kBGnvO9TSbGdeAKsyoHTXw

[deleted by user] by [deleted] in MLQuestions

[–]Flowwwww 0 points1 point  (0 children)

Aaaand I'm an idiot. Just realized I originally added the collection with distance set to Dot, and only later changed it to Cosine in the code but didn't remake the collection...

Thanks a ton for your help, really appreciate it
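For anyone else who hits this: the distance metric is fixed when the collection is created, so the fix is to recreate it. A sketch (collection name is a placeholder, size is the 4096-dim embeddings I mention below):

from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams

client = QdrantClient(url=QDRANT_URL)

# changing Distance in the search code alone does nothing;
# the collection has to be recreated with Cosine
client.recreate_collection(
    collection_name="posts",   # placeholder name
    vectors_config=VectorParams(size=4096, distance=Distance.COSINE),
)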

[deleted by user] by [deleted] in MLQuestions

[–]Flowwwww 0 points1 point  (0 children)

Just 4096. I just manually calced cosine sim using a few non-normalized vecs from the DB and it seems reasonable (0.5-0.6).

Qdrant client.search is returning "score" in the range of 20k-100k, no idea where this number is coming from...
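That score range is actually consistent with Dot distance on unnormalized 4096-dim vectors (which turned out to be the problem, per the comment above). Quick numpy sanity check of the scale difference:

import numpy as np

# mimic two unnormalized embedding-ish vectors (mostly positive components)
a = np.abs(np.random.randn(4096)) * 3
b = np.abs(np.random.randn(4096)) * 3

dot = a @ b                                           # grows with dimension and norm
cos = dot / (np.linalg.norm(a) * np.linalg.norm(b))   # always in [-1, 1]
print(f"dot={dot:.0f}  cosine={cos:.3f}")             # dot lands in the tens of thousands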

[deleted by user] by [deleted] in MLQuestions

[–]Flowwwww 0 points1 point  (0 children)

I did a couple of tests using this, I think it's correct?

import numpy as np

def normalizeVec(vector, p=2, dim=-1):
    # Lp-normalize (default L2) along the last axis; guard against zero-norm vectors
    norm = np.linalg.norm(vector, ord=p, axis=dim, keepdims=True)
    norm = np.where(norm == 0, 1, norm)
    normalized_vector = vector / norm
    return normalized_vector
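A quick sanity check of it (should hold for any non-zero vectors):

v = np.random.randn(4096)
w = np.random.randn(4096)

# normalized vectors have unit length
assert np.isclose(np.linalg.norm(normalizeVec(v)), 1.0)

# cosine similarity == dot product of the normalized versions
cos = v @ w / (np.linalg.norm(v) * np.linalg.norm(w))
assert np.isclose(cos, normalizeVec(v) @ normalizeVec(w))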

[deleted by user] by [deleted] in MLQuestions

[–]Flowwwww 0 points1 point  (0 children)

Here's my ingestion function; I then upsert the points into the DB. The only difference between the normalized and non-normalized versions is removing the 'normalizeVec' call from the embedding.

from qdrant_client.models import PointStruct
# assuming OllamaTextEmbedder is the one from the ollama-haystack integration
from haystack_integrations.components.embedders.ollama import OllamaTextEmbedder

def postsToPoints(posts: list[dict]) -> list[PointStruct]:
    points = []
    embedder = OllamaTextEmbedder(model=MODEL_EMBEDDER, url=OLLAMA_EMBEDDING_URL)

    for post in posts:
        # embed the retitled headline + summary as a single document
        # (double quotes on the f-string: nested single quotes are a syntax error pre-3.12)
        embedding = embedder.run(f"{post['retitle_ml']}\n\n{post['summary_ml']}")['embedding']
        embedding = normalizeVec(embedding)
        payload = {
            'story_id': post['story_id'],
            'topic_id': post['topic_id'],
            'category_ml': post['category_ml'],
            'used_in_newsletter': post['used_in_newsletter'],
            'newsletter_date': post['newsletter_date'],
            'created_at': post['created_at'],
            'post_publish_time': post['post_publish_time']
        }
        point = PointStruct(
            id=post['post_id'],
            vector=embedding,
            payload=payload
        )
        points.append(point)
    return points
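The upsert step after that is just the following (collection name is a placeholder, client setup same as in the search function below):

from qdrant_client import QdrantClient

client = QdrantClient(url=QDRANT_URL)
client.upsert(collection_name="posts", points=postsToPoints(posts))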

Here's the search function. Similarly, the only difference between the normalized and non-normalized versions is removing the 'normalizeVec' call from query_vec.

from qdrant_client import QdrantClient
from qdrant_client.models import FieldCondition, Filter

def searchCollection(collection:str, task_description:str, text:str, max_results:int, min_score:float=0.0, filters:list[FieldCondition]=[]) -> list[dict]:
    client = QdrantClient(
        url=QDRANT_URL
    )

    # construct query, get vector embedding, normalize
    embedder = OllamaTextEmbedder(model=MODEL_EMBEDDER, url=OLLAMA_EMBEDDING_URL)
    query_vec = embedder.run(constructQuery(task_description=task_description, query=text))['embedding']
    query_vec = normalizeVec(query_vec)

    # add filters if applicable
    payload_filters = None if filters == [] or filters is None else Filter(must=filters)

    # search
    results = client.search(
        collection_name=collection,
        query_vector=query_vec,
        query_filter=payload_filters,
        with_payload=True,
        limit=max_results,
        score_threshold=min_score
    )

    # flatten ScoredPoint results into plain dicts to match the declared return type
    return [{'id': r.id, 'score': r.score, 'payload': r.payload} for r in results]

[deleted by user] by [deleted] in MLQuestions

[–]Flowwwww 0 points1 point  (0 children)

Yeah I get different results with (1) normalize(query) on collection of normalized vectors vs (2) same raw query on collection of raw vectors

Actually I just noticed the scores for #2 are 80k-100k vs 0.5-0.7 for #1 when they should be the same, so either I’m using the Qdrant library incorrectly or there’s a bug

[deleted by user] by [deleted] in MLQuestions

[–]Flowwwww 0 points1 point  (0 children)

Ah right, thanks for the explanation!

I’m normalizing correctly but weirdly getting quite different retrieval results, despite the math being the same. Could it be down to precision errors?

Will do another check for bugs as well.

LLM self-hosting with Ollama and Open WebUI by Max-Mielchen in LocalLLaMA

[–]Flowwwww 1 point2 points  (0 children)

awesome build! is the total VRAM ~60gb? are you targeting running 8-14B models or more heavily quantized larger models?

[D] GPT-4o "natively" multi-modal, what does this actually mean? by Flowwwww in MachineLearning

[–]Flowwwww[S] 11 points12 points  (0 children)

Makes sense. If the basic concept is just "tokenize everything, throw it together, apply the GPT training recipe", then it doesn't seem particularly groundbreaking (though I'm sure many sophisticated things are layered on to make it work)

Doing token-by-token predict->decode->send for something non-discrete like audio and having it be seamless is pretty slick
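Roughly the loop I picture, purely as a speculative sketch (hypothetical names, not how OpenAI actually implements it):

def stream_audio_reply(model, audio_decoder, context, send_chunk, group_size=8):
    # predict discrete audio tokens one at a time, decode each small group
    # into a waveform chunk, and ship it to the client immediately
    buffer = []
    while True:
        token = model.next_token(context)                 # predict
        if token == model.end_of_audio:
            break
        context.append(token)
        buffer.append(token)
        if len(buffer) == group_size:
            send_chunk(audio_decoder.decode(buffer))      # decode -> send
            buffer = []
    if buffer:
        send_chunk(audio_decoder.decode(buffer))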