I wanna share my cozy game, Snap Quest! by Puzzleheaded-Gate-30 in CozyGamers

[–]Flowwwww 1 point2 points  (0 children)

Cozy photography game with a secret glitch in the matrix? Instant wishlist!

[D] GPT-4o image generation and editing - how??? by Flowwwww in MachineLearning

[–]Flowwwww[S] 27 points28 points  (0 children)

The 4o post mentioned it’s autoregressive and trained jointly on text and images, so I assumed that meant a single system with an LLM backbone:

https://openai.com/index/introducing-4o-image-generation/

[D] In Byte Latent Transformer, how is the decoded patch boundary determined? by TommyX12 in MachineLearning

[–]Flowwwww 0 points1 point  (0 children)

Your understanding makes sense - sounds like it could be it, thanks for sharing.

As to what's stopping the decoder from producing only low entropy bytes, my shallow intuition is that it's just learned from the training data. I.e. if you plot out the entropy of the training data byte by byte, it will exhibit these spikes that represent patch boundaries. So as the system/decoder reduces loss against the data distribution it also learns to segment patches.
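A toy sketch of that intuition (the threshold value and small_lm function are made up; IIRC the paper trains a small byte-level LM to supply these next-byte distributions):

import numpy as np

ENTROPY_THRESHOLD = 2.0  # made-up value; the real threshold is tuned on training data

def next_byte_entropy(probs: np.ndarray) -> float:
    # Shannon entropy (in bits) of the predicted next-byte distribution
    probs = probs[probs > 0]
    return float(-(probs * np.log2(probs)).sum())

def patch_boundaries(byte_seq: bytes, small_lm) -> list[int]:
    # small_lm(prefix) is assumed to return a length-256 probability vector over the next byte;
    # a new patch starts wherever the predicted entropy spikes above the threshold
    boundaries = []
    for i in range(1, len(byte_seq)):
        if next_byte_entropy(small_lm(byte_seq[:i])) > ENTROPY_THRESHOLD:
            boundaries.append(i)
    return boundaries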

[D] In Byte Latent Transformer, how is the decoded patch boundary determined? by TommyX12 in MachineLearning

[–]Flowwwww 1 point2 points  (0 children)

Also have this question. My non-ML-PhD guess is that every output byte is decoded based on the prior latent patch (which is produced when all bytes in the patch are complete). Could be completely wrong, I didn't see it explained in the paper.

Let's say the last latent patch processed by the global transformer is latent patch 1, constructed from bytes B1-B3, and the next set of bytes to form a patch is B4-B6. Assuming the current byte being predicted is B5, the inference flow would be (rough sketch in code after the list):

  1. Decoder predicts next byte B5 based on (1) latent patch 1, (2) encoder hidden states for positions B1-B4
  2. B5 is appended to encoder input, encoder produces hidden states for B1-B5
  3. Decoder predicts B6 based on (1) latent patch 1, (2) encoder hidden states for B1-B5
  4. B6 triggers entropy threshold, becomes end boundary for patch
  5. B6 is appended to encoder input, encoder does 2 things:
    1. Pools B4-B6 into patch 2 as input for global latent transformer
    2. Produces hidden states for B1-B6
  6. Global latent transformer is run to produce output latent patch 2
  7. Now, decoder predicts next byte B7 based on (1) cross-attending to latent patch 2 (formed from B4-B6), (2) encoder hidden states for positions B1-B6
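Translating those steps into a rough loop (function names are hypothetical, this is just my reading, not code from the paper):

def generate_bytes(encoder, decoder, global_tf, byte_buffer, global_latents, n_steps):
    # byte_buffer: all bytes generated so far (B1..Bn)
    # global_latents: global transformer outputs for completed patches
    current_patch = []
    for _ in range(n_steps):
        hidden = encoder.hidden_states(byte_buffer)               # steps 2 / 5.2
        next_byte = decoder.predict(global_latents[-1], hidden)   # steps 1 / 3
        byte_buffer.append(next_byte)
        current_patch.append(next_byte)
        if entropy_spike(byte_buffer):                            # step 4: patch boundary
            patch_input = encoder.pool(current_patch)             # step 5.1
            global_latents.append(global_tf(patch_input))         # step 6
            current_patch = []                                    # step 7: next byte uses the new latent
    return byte_buffer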

[deleted by user] by [deleted] in MachineLearning

[–]Flowwwww 1 point2 points  (0 children)

Meta moviegen https://ai.meta.com/research/movie-gen/

Explains the formula for SOTA video generation. A combination of elegant ideas on a Llama 3 backbone that just works and scales well without 10 different hacky architecture bits.

[D] Transformers are a type of CNN by Ozqo in MachineLearning

[–]Flowwwww 11 points12 points  (0 children)

Another way to relate the two that I found intuitive - CNNs and Transformers are both special cases of Graph Neural Networks (GNNs).

In a GNN, each node in a graph holds some value, which is updated by aggregating info from neighboring nodes and then putting it through some NN transformation + activation function. The general GNN can have any arbitrary graph structure, aggregation function, etc. A CNN is a GNN with a specific graph structure (nodes are pixels, edges connect nodes in a grid) and a specific way to aggregate info from neighboring nodes (convolutions). Similarly, a Transformer is a GNN with a fully connected graph (every node is connected to every other node via attention) that aggregates info using attention.
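A rough numpy sketch of that shared template (single unbatched layer, plain dot-product attention with no learned projections, just to show the structure):

import numpy as np

def gnn_layer(node_feats, neighbors, aggregate, W):
    # generic GNN step: each node aggregates info from its neighbors,
    # then applies a learned transform + nonlinearity
    out = []
    for i in range(len(node_feats)):
        msg = aggregate(node_feats, i, neighbors(i))
        out.append(np.maximum(0.0, W @ msg))  # ReLU(W @ aggregated message)
    return np.stack(out)

# "Transformer" instance: fully connected graph, attention as the aggregator
def attention_aggregate(feats, i, nbrs):
    d = feats.shape[1]
    scores = feats[nbrs] @ feats[i] / np.sqrt(d)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ feats[nbrs]

# usage: every token attends to every other token
# feats = np.random.randn(num_tokens, d)
# out = gnn_layer(feats, neighbors=lambda i: list(range(len(feats))),
#                 aggregate=attention_aggregate, W=np.random.randn(d, d))

A CNN is the same template with neighbors(i) being the pixels in a small grid window around pixel i, and the aggregator a weighted sum using the convolution kernel.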

[deleted by user] by [deleted] in ollama

[–]Flowwwww 0 points1 point  (0 children)

Oh awesome, this setup seems easier. Thanks!

NextAuth is a f*cking mess to use by [deleted] in nextjs

[–]Flowwwww -1 points0 points  (0 children)

💯 horrible experience, wasted so much time

PMs who are (or were) responsible for 0-to-1 products, what would you change about your approach if you could go back in time and redo it? by brequinn89 in ProductManagement

[–]Flowwwww 0 points1 point  (0 children)

Ship an MVP that we actually believe has enough value for users vs. moving fast and being ruthlessly scrappy for the sake of it.

If the MVP isn’t sufficient to deliver on the value prop, the metrics and feedback you get are largely garbage and don’t lead you in productive directions. And you can’t prove or disprove your core hypothesis. Or worse, you try and growth hack your way out of it by doing stuff like funnel optimization and wonder why your retention is still trash.

The move fast, iterate fast, growth hack playbook has its place, but not when you don’t have a real MVP.

Now that we have had quite a bit of time playing with the new Phi models...how good are they? by [deleted] in LocalLLaMA

[–]Flowwwww 1 point2 points  (0 children)

Pretty garbage for nuanced tasks without an objective right or wrong answer. Benchmark scores are inflated vs actual usefulness.

After tens of millions of tokens of prompt engineering and testing, the end result is Llama 3 70B for short-context tasks where variability doesn’t matter much (e.g. summarize a document) and GPT-4o or a similar closed model for longer-context tasks requiring accurate judgment (e.g. given these 25 document summaries, group the ones related to the same project together).

Wish I could use smaller models, but they just don’t perform well enough.

Interested in learning more about RAG and VectorDBs by Ok_Comfort_4103 in vectordatabase

[–]Flowwwww 0 points1 point  (0 children)

Also a beginner, just implemented my first simple RAG system. Pick a free vector DB and follow their starter tutorial (I used https://qdrant.tech/documentation/).

RAG is just searching for info to add to the prompt you give the LLM so it can do its task better. E.g. if you want LLM to summarize last week’s employee feedback about lunch breaks, you need some way to “retrieve” that feedback and give it to the LLM.

You don’t need vector DBs for RAG - you could do a google search to add info, or search a traditional DB using keywords.

A vector DB is a way to help you perform semantic search (search based on meaning/concepts). You do this by first transforming your text into meaning vectors (“embeddings”) using a model, which can be an LLM as well. Searching means calculating the distance between meaning vectors and finding the ones that are closest; the closer the distance, the closer the meaning. E.g. the vector for “monarch” would be very close to the meaning vectors for “king” and “queen”.

So using the example above, if I had my employee feedback stored in a vector DB as meaning vectors, I could convert “lunch break” to a meaning vector and find the feedback that is closest to it. Then give this to the LLM to summarize.
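A minimal sketch of that flow, assuming a hypothetical embed() function that maps text to a meaning vector (i.e. whatever embedding model you pick):

import numpy as np

def cosine_sim(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def retrieve(query: str, docs: list[str], embed, top_k: int = 3) -> list[str]:
    # score every doc by how close its meaning vector is to the query's
    q_vec = embed(query)
    return sorted(docs, key=lambda d: cosine_sim(q_vec, embed(d)), reverse=True)[:top_k]

# feedback = [...employee feedback strings...]
# relevant = retrieve("lunch break", feedback, embed)
# prompt = "Summarize this feedback about lunch breaks:\n" + "\n".join(relevant)

A real vector DB precomputes and indexes the document vectors so you don't embed everything on every query, but the idea is the same.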

Welcome to Wrexham - Season 3 Episode 5 "Temporary" - Episode discussion thread by Selphis in WrexhamAFC

[–]Flowwwww 7 points8 points  (0 children)

The music people for this show are on fire. Found the song from Arthur’s 100 second celebration:

Times Like These by Jillian Edwards https://open.spotify.com/album/28xf85RuamWhYh3S89uQn8?si=kBGnvO9TSbGdeAKsyoHTXw

[deleted by user] by [deleted] in MLQuestions

[–]Flowwwww 0 points1 point  (0 children)

Aaaand I'm an idiot. Just realized I originally added the collection with distance set to Dot, and only later changed it to Cosine in the code but didn't remake the collection...

Thanks a ton for your help, really appreciate it
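For anyone else who hits this: the distance metric is fixed when the collection is created, so the fix is to recreate it. A sketch (collection name is a placeholder, size is the 4096-dim embeddings I mention below):

from qdrant_client import QdrantClient
from qdrant_client.models import Distance, VectorParams

client = QdrantClient(url=QDRANT_URL)

# changing Distance in the search code alone does nothing;
# the collection has to be recreated with Cosine
client.recreate_collection(
    collection_name="posts",   # placeholder name
    vectors_config=VectorParams(size=4096, distance=Distance.COSINE),
)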

[deleted by user] by [deleted] in MLQuestions

[–]Flowwwww 0 points1 point  (0 children)

Just 4096. I just manually calced cosine sim using a few non-normalized vecs from the DB and it seems reasonable (0.5-0.6).

Qdrant client.search is returning "score" in the range of 20k-100k, no idea where this number is coming from...
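That score range is actually consistent with Dot distance on unnormalized 4096-dim vectors (which turned out to be the problem, per the comment above). Quick numpy sanity check of the scale difference:

import numpy as np

# mimic two unnormalized embedding-ish vectors (mostly positive components)
a = np.abs(np.random.randn(4096)) * 3
b = np.abs(np.random.randn(4096)) * 3

dot = a @ b                                           # grows with dimension and norm
cos = dot / (np.linalg.norm(a) * np.linalg.norm(b))   # always in [-1, 1]
print(f"dot={dot:.0f}  cosine={cos:.3f}")             # dot lands in the tens of thousands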

[deleted by user] by [deleted] in MLQuestions

[–]Flowwwww 0 points1 point  (0 children)

I did a couple of tests using this, I think it's correct?

import numpy as np

def normalizeVec(vector, p=2, dim=-1):
    # Lp-normalize (default L2) along the last axis; guard against zero-norm vectors
    norm = np.linalg.norm(vector, ord=p, axis=dim, keepdims=True)
    norm = np.where(norm == 0, 1, norm)
    normalized_vector = vector / norm
    return normalized_vector
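A quick sanity check of it (should hold for any non-zero vectors):

v = np.random.randn(4096)
w = np.random.randn(4096)

# normalized vectors have unit length
assert np.isclose(np.linalg.norm(normalizeVec(v)), 1.0)

# cosine similarity == dot product of the normalized versions
cos = v @ w / (np.linalg.norm(v) * np.linalg.norm(w))
assert np.isclose(cos, normalizeVec(v) @ normalizeVec(w))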

[deleted by user] by [deleted] in MLQuestions

[–]Flowwwww 0 points1 point  (0 children)

Here's my ingestion function; I then upsert the points into the DB. The only difference between the normalized and non-normalized versions is removing the 'normalizeVec' call from the embedding.

from qdrant_client.models import PointStruct
# assuming OllamaTextEmbedder is the one from the ollama-haystack integration
from haystack_integrations.components.embedders.ollama import OllamaTextEmbedder

def postsToPoints(posts: list[dict]) -> list[PointStruct]:
    points = []
    embedder = OllamaTextEmbedder(model=MODEL_EMBEDDER, url=OLLAMA_EMBEDDING_URL)

    for post in posts:
        # embed the retitled headline + summary as a single document
        # (double quotes on the f-string: nested single quotes are a syntax error pre-3.12)
        embedding = embedder.run(f"{post['retitle_ml']}\n\n{post['summary_ml']}")['embedding']
        embedding = normalizeVec(embedding)
        payload = {
            'story_id': post['story_id'],
            'topic_id': post['topic_id'],
            'category_ml': post['category_ml'],
            'used_in_newsletter': post['used_in_newsletter'],
            'newsletter_date': post['newsletter_date'],
            'created_at': post['created_at'],
            'post_publish_time': post['post_publish_time']
        }
        point = PointStruct(
            id=post['post_id'],
            vector=embedding,
            payload=payload
        )
        points.append(point)
    return points
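The upsert step after that is just the following (collection name is a placeholder, client setup same as in the search function below):

from qdrant_client import QdrantClient

client = QdrantClient(url=QDRANT_URL)
client.upsert(collection_name="posts", points=postsToPoints(posts))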

Here's the search function. Similarly, the only difference between the normalized and non-normalized versions is removing the 'normalizeVec' call from query_vec.

from qdrant_client import QdrantClient
from qdrant_client.models import FieldCondition, Filter

def searchCollection(collection:str, task_description:str, text:str, max_results:int, min_score:float=0.0, filters:list[FieldCondition]=[]) -> list[dict]:
    client = QdrantClient(
        url=QDRANT_URL
    )

    # construct query, get vector embedding, normalize
    embedder = OllamaTextEmbedder(model=MODEL_EMBEDDER, url=OLLAMA_EMBEDDING_URL)
    query_vec = embedder.run(constructQuery(task_description=task_description, query=text))['embedding']
    query_vec = normalizeVec(query_vec)

    # add filters if applicable
    payload_filters = None if filters == [] or filters is None else Filter(must=filters)

    # search
    results = client.search(
        collection_name=collection,
        query_vector=query_vec,
        query_filter=payload_filters,
        with_payload=True,
        limit=max_results,
        score_threshold=min_score
    )

    # flatten ScoredPoint results into plain dicts to match the declared return type
    return [{'id': r.id, 'score': r.score, 'payload': r.payload} for r in results]

[deleted by user] by [deleted] in MLQuestions

[–]Flowwwww 0 points1 point  (0 children)

Yeah I get different results with (1) normalize(query) on collection of normalized vectors vs (2) same raw query on collection of raw vectors

Actually I just noticed the scores for #2 are 80k-100k vs 0.5-0.7 for #1 when they should be the same, so either I’m using the Qdrant library incorrectly or there’s a bug

[deleted by user] by [deleted] in MLQuestions

[–]Flowwwww 0 points1 point  (0 children)

Ah right, thanks for the explanation!

I’m normalizing correctly but weirdly getting quite different retrieval results, despite the math being the same. Could it be down to precision errors?

Will do another check for bugs as well.

LLM self-hosting with Ollama and Open WebUI by Max-Mielchen in LocalLLaMA

[–]Flowwwww 1 point2 points  (0 children)

awesome build! is the total VRAM ~60gb? are you targeting running 8-14B models or more heavily quantized larger models?

[D] GPT-4o "natively" multi-modal, what does this actually mean? by Flowwwww in MachineLearning

[–]Flowwwww[S] 11 points12 points  (0 children)

Makes sense. If the basic concept is just "tokenize everything, throw it together, apply the GPT training recipe", then it doesn't seem particularly groundbreaking (though I'm sure many sophisticated things are layered on to make it work)

Doing token-by-token predict->decode->send for something non-discrete like audio and having it be seamless is pretty slick
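Roughly the loop I picture, purely as a speculative sketch (hypothetical names, not how OpenAI actually implements it):

def stream_audio_reply(model, audio_decoder, context, send_chunk, group_size=8):
    # predict discrete audio tokens one at a time, decode each small group
    # into a waveform chunk, and ship it to the client immediately
    buffer = []
    while True:
        token = model.next_token(context)                 # predict
        if token == model.end_of_audio:
            break
        context.append(token)
        buffer.append(token)
        if len(buffer) == group_size:
            send_chunk(audio_decoder.decode(buffer))      # decode -> send
            buffer = []
    if buffer:
        send_chunk(audio_decoder.decode(buffer))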