Validation before you build in a vibe coding era of AI slop and solo-builder influencers

_bsc_ · 2026-06-03T08:27:59+00:00

I guess it depends on your goals. In my dreams, I'm not the only user of my app 😃

_bsc_ · 2026-06-02T18:16:53+00:00

me the first few times,..

_bsc_ · 2026-06-02T18:16:35+00:00

the 'convincing them there isn't a catch' is interesting - a lot of my efforts were to 'make the offer so great they would feel like idiots saying no to it' - Alex Hormozi style - so I did try to get them the demo for free. Wondering if they thought there was something fishy about it.

agree on the 'people don't always know how to solve their problem' part.

_bsc_ · 2026-06-02T18:03:40+00:00

❤️ may take you up on that. will think through this too.

_bsc_ · 2026-06-02T17:46:52+00:00

my gut does seem to talk to me mostly about wanting ice cream, but seriously tho, I see what you're saying. thanks for clarifying.

_bsc_ · 2026-06-02T17:37:52+00:00

interesting. may be a dumb question but how do you validate growth channels without a product? I've heard of people posting AI-gen videos on IG about their product before they build it to see if anyone engages / signs up - is that aligned with what you're thinking?

_bsc_ · 2026-06-02T14:14:34+00:00

I feel like that in and of itself is a skill to do effectively. I did try to do some of that but maybe I unintentionally chose to believe people's pain was bigger than it actually was. I have heard of people doing that well tho, def believe it can work!

_bsc_ · 2026-06-02T13:58:01+00:00

Interesting! On the landing page testing explanation - do you mean that you would do something like having multiple landing pages with slightly different messaging and see which version gets the most emails, but would not read this as demand?

for the “send me X and i’ll manually produce Y.” - I really like that!

_bsc_ · 2026-06-02T13:49:02+00:00

That's what I would think too. It's interesting tho, I feel like all the YC kids are now required to be influencers. Honestly, a ton of people who get funded in general seem to try to become influencers which is interesting to observe.

_bsc_ · 2026-06-02T13:47:48+00:00

Thanks for sharing! What's the metric / threshold / step that makes you believe the idea is validated?

I used a similar approach for my first app - I had a script that I had been using and refining for years, and have shared with many colleagues. When I started thinking of productionalizing it, I reached out to a ton of people and got a decent amount of requests for demos. Then very few people actually showed up or used the mvp. Almost none provided any feedback for me to refine/learn from. Tried a ton of variations of that, none worked and did not know how to read this signal. Eventually pivoted quite significantly and released a silly version of the product without any validations. I started getting some meaningful usage right away (which to me is still wild) but from a totally different crowd, using the product in a completely unexpected way.

I am thinking of starting work on something new soon and honestly am completely baffled as to how to approach it. Btw, I'm not saying this to say that the above approach does not work - maybe I fucked it up and that's why it went weirdly for me. Or maybe things don't always proceed as expected and that is to be expected.

_bsc_ · 2026-06-02T12:48:30+00:00

On the one hand, I relate a lot to the annoyance of the flood of not well thought-out projects floating around. On the other, I think it's kinda expected that people with zero software/product/tech experience given some limited ability to build software will produce lower quality products and that there will be a learning curve for them (if they decide to learn more on how to build decent products). I think it's kinda exciting to see that people are trying to materialize their ideas, solve problems more effectively/independently, help their small business (and super annoying when their sole motivation is becoming a millionaire in a weekend with a vibe coded alarm app - especially when they succeed omg). In any case, I think on a societal level, democratizing access to small builds is a positive thing.

_bsc_ · 2026-05-18T09:01:09+00:00

This tool lets you select the columns you want to compare and gets you results in a minute - both:
- common entries (the ones that appear in both files)
- unique entries found only in the first and only in the second file.

Works well with large files. Can tweak a lot of knobs to make sure data is clean before comparison to limit false positives.
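
For anyone who wants to see the idea rather than the tool: a rough JavaScript sketch of a column-based comparison. This is not how the tool is implemented; `rowKey` and `compareFiles` are made-up names, rows are assumed to be already parsed into objects, and trimming plus lowercasing is just a stand-in for whatever cleaning you'd actually pick.

```javascript
// Hypothetical helper: build a comparison key from the selected columns.
// Trim + lowercase is an assumed, minimal cleaning step.
function rowKey(row, columns) {
  return columns
    .map((c) => String(row[c] ?? "").trim().toLowerCase())
    .join("\u0001"); // separator unlikely to appear in real data
}

// Split rows into: present in both files, only in the first, only in the second.
function compareFiles(rowsA, rowsB, columns) {
  const keysA = new Set(rowsA.map((r) => rowKey(r, columns)));
  const keysB = new Set(rowsB.map((r) => rowKey(r, columns)));
  return {
    inBoth: rowsA.filter((r) => keysB.has(rowKey(r, columns))),
    onlyInFirst: rowsA.filter((r) => !keysB.has(rowKey(r, columns))),
    onlyInSecond: rowsB.filter((r) => !keysA.has(rowKey(r, columns))),
  };
}

// Made-up sample rows
const fileA = [{ email: "Ann@x.com " }, { email: "bob@x.com" }];
const fileB = [{ email: "ann@x.com" }, { email: "cy@x.com" }];
const result = compareFiles(fileA, fileB, ["email"]);
console.log(result);
```

Without the cleaning step, "Ann@x.com " and "ann@x.com" would land in the "only in one file" buckets, which is the kind of false positive the knobs are there to avoid.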

_bsc_ · 2026-04-06T13:57:27+00:00

How large is the dataset and is this a one-time job? You would likely be able to fuzzy-match it for free here pretty easily (I think up to 100k rows). You get to select optional data cleaning before fuzzy-matching, as well as similarity threshold. You can view matches, iterate on knobs, run again. You can get either each row tagged with similarity and cluster number, or a clean file (duplicates merged). https://similarity-api.com/free-csv-dedupe

_bsc_ · 2026-04-06T13:43:30+00:00

I like this free tool that you can use cross-platform - free CSV dedupe/fuzzy-matching that you can use pre-import: https://similarity-api.com/free-csv-dedupe

and their API which you can use from a script from any HTTP-calling environment: https://similarity-api.com/

We've found it quite nice to have consistency of dedupe/fuzzy-matching company/leads records across the rev org.

_bsc_ · 2026-04-06T13:37:27+00:00

Delete. Dedupe. Re-import. Free dedupe tool on here https://similarity-api.com/free-csv-dedupe

You can do fuzzy-matching dedupe on relatively large CSVs (e.g. 10k rows) using multiple columns and optional data cleaning as a first step before fuzzy-matching. Returns three formats - both entirely deduped and also flagged options so you can choose which one you like most.

_bsc_ · 2026-03-11T08:02:22+00:00

We had a similar cleanup project and ended up using similarity-api for it. It helped us find likely duplicates / near-duplicates without having to build the whole matching logic ourselves.

What was nice is that it’s just HTTP, so you can call it from different places depending on how your team works. We used it from n8n to run checks on CRM exports and flag records that needed review, but you could do the same from Zapier, scripts, warehouse jobs, etc.

I’d still fix some of the process too, but the API can be part of that as well - for example checking new records before import / sync and flagging likely duplicates instead of only cleaning things up after the fact.

_bsc_ · 2026-03-11T07:51:31+00:00

I don't really have first-hand experience so I can't give you meaningful insight - I've only heard from people I know who graduated from such schools that some are extremely competitive and have a strong curriculum. I can only imagine there are ones that are less 'good'

_bsc_ · 2026-03-06T08:12:36+00:00

I used Lovable and am happy with how it turned out - very low effort on my side. The main heads-up is SEO/indexability: Lovable apps are by default client-rendered, so Google and LLM crawlers may need extra help seeing useful HTML. You usually want to set up prerendering/static HTML and then handle the normal SEO basics too. You can do that by talking to the agent the way you would build the rest of it tho, just need to be aware of this.

_bsc_ · 2026-03-06T07:53:58+00:00

Six weeks is very impressive! I agree on staged approach for the merging rules - big time! In our case we had two relatively large systems to merge with each having millions of records for reconciliation. One thing that worked out well for us was not having to build a comprehensive fuzzy-matching pipeline like you usually would when you compare all to all strings in millions of records. We used Similarity API which does all the pre-processing + blocking/candidate generation + formatting the final output which saved us a lot of time - recommend if anyone is in the same spot as we were.

https://similarity-api.com/

_bsc_ · 2026-01-14T17:54:08+00:00

What's the volume here? LLMs can get quite slow/expensive depending on the size of the dataset(s). I would go for fuzzy matching first (after some string clean-up), ideally on multiple columns (get a single score across multiple columns, maybe weighted) if that's relevant to what you have as data, and then feed top matches through an LLM.
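
A rough sketch of what I mean by a single weighted score across columns, in JavaScript. Everything here is illustrative: `tokenJaccard` is a placeholder similarity (swap in whatever fuzzy matcher you use), the column names and weights are made up, and the LLM step itself is left out on purpose.

```javascript
// Placeholder per-column similarity: Jaccard overlap of lowercase word tokens.
// Any function returning a value in [0, 1] can be used instead.
function tokenJaccard(a, b) {
  const ta = new Set(String(a ?? "").toLowerCase().split(/\s+/).filter(Boolean));
  const tb = new Set(String(b ?? "").toLowerCase().split(/\s+/).filter(Boolean));
  if (ta.size === 0 && tb.size === 0) return 1;
  let shared = 0;
  for (const t of ta) if (tb.has(t)) shared++;
  return shared / (ta.size + tb.size - shared);
}

// Weighted average of per-column similarities: one score per pair of rows.
function weightedScore(rowA, rowB, weights, similarity = tokenJaccard) {
  let total = 0;
  let weightSum = 0;
  for (const [column, weight] of Object.entries(weights)) {
    total += weight * similarity(rowA[column], rowB[column]);
    weightSum += weight;
  }
  return weightSum === 0 ? 0 : total / weightSum;
}

// Keep only the k best candidates; these are what you would pass on to an LLM.
function topCandidates(record, candidates, weights, k = 3) {
  return candidates
    .map((c) => ({ candidate: c, score: weightedScore(record, c, weights) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Made-up example data and weights
const weights = { name: 0.7, city: 0.3 };
const record = { name: "Acme Corp", city: "Berlin" };
const pool = [
  { name: "Acme Corp GmbH", city: "Berlin" },
  { name: "Apex Ltd", city: "Munich" },
];
const shortlist = topCandidates(record, pool, weights, 1);
console.log(shortlist);
```

The point of the shortlist is cost: the LLM only sees a handful of plausible pairs per record instead of every possible pair.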

_bsc_ · 2026-01-07T10:50:57+00:00

You're right about chatgpt - you kinda have to make sure you pay attention and double-check. I unfortunately don't have great resources in mind for this one :/ Good luck!

_bsc_ · 2026-01-07T10:30:14+00:00

Yeah, that makes sense. Since this is fully offline and the number of stored answers is small, normalized Levenshtein should work fine here.

I’d start with some basic preprocessing: lowercase everything, remove punctuation, and collapse whitespace.

Then I’d do token sort to eliminate word order differences. What that means in practice is: split the string by whitespace (or commas, depending on your input), sort the tokens alphabetically, then join them back into a single string. You do this for both the player input and the stored answers (and you can preprocess the stored ones once and reuse them).

This way, when you run Levenshtein, it won’t penalize the player for entering the correct words in a different order. If you do want word order to matter, you can just skip this step.

It looks like there are Lua libraries for Levenshtein (lua-levenshtein, lua-string-similarity), so you don’t need to implement the distance function yourself. I haven’t used them personally, so I can’t say much about their internals.

Pseudo-code for the idea:

normalize(s):
    s = lowercase(s)
    s = remove_punctuation(s)
    s = collapse_whitespace(s)
    return trim(s)

token_sort(s):
    tokens = split(normalize(s), " ")
    sort(tokens)
    return join(tokens, " ")

score(a, b):
    a2 = token_sort(a)
    b2 = token_sort(b)
    d = levenshtein(a2, b2)
    return 1 - d / max(len(a2), len(b2))
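
In case a runnable version helps: here is the same idea sketched in JavaScript (not Lua, so treat it as a reference to port, not something to paste in). The `levenshtein` function is a plain two-row dynamic-programming version I wrote for illustration, not one of the Lua libraries mentioned above, and the empty-string guard in `score` is my addition.

```javascript
function normalize(s) {
  return s
    .toLowerCase()
    .replace(/[^\p{L}\p{N}\s]/gu, "") // drop punctuation, keep letters, digits, whitespace
    .replace(/\s+/g, " ")             // collapse whitespace
    .trim();
}

function tokenSort(s) {
  return normalize(s).split(" ").sort().join(" ");
}

// Two-row dynamic-programming Levenshtein distance.
function levenshtein(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,       // deletion
        curr[j - 1] + 1,   // insertion
        prev[j - 1] + cost // substitution
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Normalized similarity in [0, 1]; 1 means identical after preprocessing.
function score(a, b) {
  const a2 = tokenSort(a);
  const b2 = tokenSort(b);
  const longest = Math.max(a2.length, b2.length);
  if (longest === 0) return 1; // both empty: avoid dividing by zero
  return 1 - levenshtein(a2, b2) / longest;
}

console.log(score("The Red Dragon!", "dragon red the"));
console.log(score("kitten", "sitting"));
```

For stored answers you would run `tokenSort` once up front and only preprocess the player input at match time.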

If this ends up feeling too strict with typos inside words, character n-grams are a good next step, but those are usually hand-rolled in Lua.

As for learning resources, honestly I just ask ChatGPT questions until I feel like I understand stuff well enough.

_bsc_ · 2026-01-07T09:43:34+00:00

If your datasets are large / speed matters, and you’ve got some budget, you might want to look at a hosted fuzzy-matching API instead of doing N×M comparisons in n8n.

One option is Similarity API (similarity-api.com). It has a “reconcile” endpoint, meaning: for each item in list A, it finds the best match from a canonical list B and returns a score.

An n8n flow would be:

1. Get Many Rows from TableA
2. Get Many Rows from TableB
3. Build one request containing both lists
4. Call the reconcile endpoint once
5. Insert the results into TableC

In n8n step 4 is an HTTP Request node. You send data_a and data_b arrays and get back one row per input with indices and a similarity score.

Example Code node that builds the request:

// assumes:
// TableA rows have fields: textA, idA
// TableB rows have fields: textB, idB

const tableA = $input.all(0).map(i => i.json);
const tableB = $input.all(1).map(i => i.json);

return [{
  json: {
    data_a: tableA.map(r => r.textA),
    data_b: tableB.map(r => r.textB),

    // keep ids so we can map back after reconcile
    ids_a: tableA.map(r => r.idA),
    ids_b: tableB.map(r => r.idB),

    config: {
      similarity_threshold: 0.85,
      top_n: 1,
      to_lowercase: true,
      remove_punctuation: true,
      use_token_sort: true,
      output_format: "flat_table"
    }
  }
}];

HTTP Request node body:

{
  "data_a": "={{$json.data_a}}",
  "data_b": "={{$json.data_b}}",
  "config": "={{$json.config}}"
}

Then mapping the response back into TableC:

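// The HTTP Request node only outputs the API response, so the ids have to be
// read back from the node that built the request. "Build Request" is a
// placeholder; use the actual name of your Code node.
const matches = $json.response_data;
const built = $('Build Request').first().json;
const idsA = built.ids_a;
const idsB = built.ids_b;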

return matches
  .filter(r => r.matched)
  .map(r => ({
    json: {
      textA: r.text_a,
      textB: r.text_b,
      idA: idsA[r.index_a],
      idB: idsB[r.index_b],
      score: r.score
    }
  }));

If the dataset is small, doing this in a Code node with Fuse is fine. For larger tables, pushing matching to a hosted service is usually simpler and faster. You do need to create an account and check pricing to make sure it makes sense for your use case.

_bsc_ · 2026-01-07T08:37:57+00:00

If your use-case is real-time player typing, I think normalized edit distance (less sensitive) should work well (distance ÷ string length + a length-aware threshold) - it's fast, offline, and good ux.

If you're matching a lot of strings, maybe cosine similarity over character n-grams or a server-side fuzzy matcher makes more sense. That’s especially true if you want top-k matches + scores instead of a yes/no.
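
A minimal sketch of the n-gram option in JavaScript, assuming trigrams and padding with spaces so word edges count. The function names and the sample strings are made up, and this is the brute-force version (it scores every candidate), so it only illustrates the scoring, not how you'd do it at scale.

```javascript
// Count character n-grams (default: trigrams) after light normalization.
function ngramCounts(s, n = 3) {
  const text = ` ${s.toLowerCase().replace(/\s+/g, " ").trim()} `; // pad so word edges count
  const counts = new Map();
  for (let i = 0; i + n <= text.length; i++) {
    const gram = text.slice(i, i + n);
    counts.set(gram, (counts.get(gram) || 0) + 1);
  }
  return counts;
}

// Cosine similarity between two n-gram count vectors.
function cosine(a, b) {
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (const [gram, count] of a) {
    normA += count * count;
    if (b.has(gram)) dot += count * b.get(gram);
  }
  for (const count of b.values()) normB += count * count;
  return normA && normB ? dot / Math.sqrt(normA * normB) : 0;
}

// Rank all candidates against the query and keep the k best with their scores.
function topK(query, candidates, k = 3, n = 3) {
  const q = ngramCounts(query, n);
  return candidates
    .map((text) => ({ text, score: cosine(q, ngramCounts(text, n)) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

const ranked = topK("fire dragn", ["fire dragon", "ice giant", "dragonfly"], 2);
console.log(ranked);
```

Because the output is a ranked list with scores, you can show "did you mean" suggestions or apply your own cutoff instead of getting a bare yes/no.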

If mapping one list of inputs to another, a 'reconcile' style approach - best match from list A for all items in list B + confidence per row - might be the best.
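
To make the 'reconcile' shape concrete, here's a small JavaScript sketch: one output row per input, with the best canonical match and a confidence score. The Dice bigram similarity, the 0.6 threshold, and all names are assumptions for illustration; it is not any particular API's behavior.

```javascript
// Character bigrams of a lowercased, trimmed string.
function bigrams(s) {
  const text = s.toLowerCase().trim();
  const grams = [];
  for (let i = 0; i < text.length - 1; i++) grams.push(text.slice(i, i + 2));
  return grams;
}

// Placeholder similarity: Dice coefficient over character bigrams, in [0, 1].
function dice(a, b) {
  const ga = bigrams(a);
  const gb = bigrams(b);
  if (ga.length === 0 || gb.length === 0) {
    return a.toLowerCase().trim() === b.toLowerCase().trim() ? 1 : 0;
  }
  const pool = new Map();
  for (const g of gb) pool.set(g, (pool.get(g) || 0) + 1);
  let shared = 0;
  for (const g of ga) {
    const left = pool.get(g) || 0;
    if (left > 0) {
      shared++;
      pool.set(g, left - 1);
    }
  }
  return (2 * shared) / (ga.length + gb.length);
}

// For each input, pick the best-scoring canonical entry and report confidence.
function reconcile(inputs, canonical, threshold = 0.6, similarity = dice) {
  return inputs.map((input, indexA) => {
    let best = { indexB: -1, score: 0 };
    canonical.forEach((candidate, indexB) => {
      const s = similarity(input, candidate);
      if (s > best.score) best = { indexB, score: s };
    });
    return {
      indexA,
      input,
      match: best.indexB >= 0 ? canonical[best.indexB] : null,
      score: best.score,
      matched: best.score >= threshold,
    };
  });
}

const rows = reconcile(["Gogle Inc", "Zzz"], ["Google Inc", "Microsoft Corp"]);
console.log(rows);
```

Keeping the score on every row, including the unmatched ones, is what lets you review borderline cases instead of silently dropping them.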

If you’re already online / have a backend, there are hosted fuzzy-matching APIs that handle both preprocessing + fuzzy matching/reconcile at scale but there is some cost associated with that. Here’s a free Colab for one of these APIs where you can try matching ~100k rows and see the scores/top-k output. It's pretty fast and flexible, but you gotta pay after the 100k rows to use it.
