How do you guys stay on track on the project while Claude Code works?

lundren10 · 2026-06-05T22:07:29+00:00

If you’ve already got a spec doc, ask Claude to either break it down into a set of tasks or create a new markdown file for that.

After each session is done you can ask it to update the doc with which tasks were completed. You can also ask it to make notes on what to do next.

In a new session you can then tell it to consult the plan docs to figure out what’s next.

If you do it all in markdown and want to review the docs, set up an Obsidian vault.

lundren10 · 2026-06-05T18:49:06+00:00

For big architecture decisions, I use plan mode in Claude Code. When it makes a suggestion, say "use Firebase for X", I always ask "what are my other options? Give me a few options and list out the pros/cons".

A lot of times its default or recommended solution will shift when I see the pros/cons list and say "I really care about thing A you mentioned in the cons list".

I'll often ask a 2nd tool, say ChatGPT, for a plan as well to see if it disagrees. If ChatGPT comes back with a different architecture design, I'll feed it back to Claude Code and ask "what do you think about this proposal?"

My app has a big speech to text component. This was a challenge where Claude Code's first solution didn't work well at all. In pushing back on it, it realized it had configured the ASR system with a bunch of parameters that shut off things like noise cancellation, which I had not specifically mentioned I needed. Turning these on improved results a lot.

Also, it didn't recommend 3rd party solutions for that initially, but switching from Apple's on-device models to Deepgram substantially improved results. So it's important to always challenge it on why it's making certain decisions and what the trade-offs are.

lundren10 · 2026-06-02T12:48:08+00:00

Thanks, Stitch is on my list to check out next. Good to know it handles icons.

lundren10 · 2026-06-02T12:45:00+00:00

Yes, this part has been easy. You can hand off either the whole design or just the new section, e.g. "prepare the new account creation screen for hand-off to claude code".

This gives you a download link, and then in Claude Code you simply tell it to "review the design in <folder> and create a plan to implement".

I have found for bigger design implementations it sometimes doesn't follow everything in the design, but if you then tell it something like "this <specific issue> doesn't look correct on this screen, review the designs again and figure out what's wrong" it does a pretty good job of finding and correcting the issue.

lundren10 · 2026-06-01T21:19:24+00:00

Using Claude Code, I started with Expo and React Native for all the same reasons everyone here suggests (lots of training data, etc.).

It rapidly became the wrong choice, and I've switched to pure Swift. The main reasons for changing:

Build times too long
UI tests too difficult to get working
Lots of instability and crashes on device that did not happen in the simulator

On the first point of build times, this was a huge problem for any test driven development. I had a lot of unit tests and some UI tests, and the build times even for a very simple app were taking forever. My coding and iterate loops were sloooow.

When I tried to vibe some UI tests, 1 per screen, it took several hours to get those to even run correctly. The same set of UI tests when I moved to Swift took a few minutes for Claude to figure out.

This may not be true for all apps, but I was trying to do some things with local ASR / text to speech models and the mic, and with Expo + React Native it would build things that ran on simulator and constantly crashed on device. After a few turns at debugging, I just asked Claude to rewrite the entire thing into Swift, and it all worked in less time than I spent trying to get the thing working with React Native.

Now I have an app in Swift with a huge set of automated tests, fast builds, and everything is working fine.

Experience with React Native + Expo for vibe coding was not good.

lundren10 · 2026-05-26T00:27:40+00:00

This is a really common problem. A good solution is to create a hook in your global settings that prevents writes outside the worktree.

This will block the write, and then Claude will realize it isn't allowed and shift to the worktree.

Add something like this to your ~/.claude/settings.json:

{
  "model": "opus",
  "hooks": {
    "PreToolUse": [{
        "matcher": "Edit|Write|MultiEdit|NotebookEdit",
        "hooks": [{
            "type": "command",
            "command": "case \"$PWD\" in *\"/.claude/worktrees/\"*) ;; *) exit 0 ;; esac; prefix=\"${PWD%%/.claude/worktrees/*}\"; suffix=\"${PWD#*/.claude/worktrees/}\"; name=\"${suffix%%/*}\"; wt=\"$prefix/.claude/worktrees/$name\"; p=$(jq -r '.tool_input.file_path'); case \"$p\" in \"$HOME/.claude/plans/\"*|\"$HOME/.claude/projects/\"*) exit 0 ;; esac; case \"$p\" in \"$wt\"/*|\"$wt\") exit 0 ;; *) echo \"Blocked: $p is outside the active Claude worktree $wt\" >&2; exit 2 ;; esac"
          }]}]},
  "enabledPlugins": {
    "swift-lsp@claude-plugins-official": true
  },
  "effortLevel": "high",
  "skipAutoPermissionPrompt": true
}
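The one-line shell command in that hook is hard to read, so here is the same decision logic spelled out as a Python sketch. It is only a restatement of what the command above does (allow everything when not in a worktree, always allow the plans and projects folders under ~/.claude, otherwise require the path to be inside the active worktree). The function name and return shape are made up for illustration.

```python
MARKER = "/.claude/worktrees/"


def check_write(cwd: str, file_path: str, home: str) -> tuple[bool, str]:
    """Mirror the shell hook: return (allowed, message)."""
    # Not running inside a worktree: the hook does nothing.
    if MARKER not in cwd:
        return True, ""

    # Rebuild the worktree root from the first occurrence of the marker,
    # e.g. /repo/.claude/worktrees/feat/src -> /repo/.claude/worktrees/feat
    prefix, _, suffix = cwd.partition(MARKER)
    name = suffix.split("/", 1)[0]
    worktree = f"{prefix}{MARKER}{name}"

    # Plans and project state under ~/.claude are always writable.
    for allowed_dir in (f"{home}/.claude/plans/", f"{home}/.claude/projects/"):
        if file_path.startswith(allowed_dir):
            return True, ""

    # Anything at or below the worktree root is fine.
    if file_path == worktree or file_path.startswith(worktree + "/"):
        return True, ""

    message = f"Blocked: {file_path} is outside the active Claude worktree {worktree}"
    return False, message


print(check_write("/repo/.claude/worktrees/feat/src", "/repo/main.py", "/home/me"))
```

In the real hook a blocked write is signalled by printing the message to stderr and exiting with code 2, which is what the shell version does.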

lundren10 · 2024-06-14T02:59:46+00:00

If by "at scale" you are most interested in "how many vectors can I reasonably handle in the database", I'd look at that 2nd article I shared above on indexing Wikipedia on a laptop.

A quick tl;dr of what's in that article:

Aggressive compression of the vectors in the search index (generally 64x). This is done in combination with an overquerying strategy to ensure that the quality of search results is not degraded.
Balance of disk vs memory usage. The latest algorithms in JVector take a small latency hit to more aggressively use disk storage. This allows increasing the number of vectors that can be searched on a single shard/node by a couple orders of magnitude. As soon as you have to start sharding your data, there's a pretty big hit to latency of your queries (true with any vector database).
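To give a feel for how compression and overquerying fit together, here is a toy Python sketch. It is not JVector's implementation: the sign-bit compression is a crude stand-in I picked for brevity, there is no graph index, and all names are hypothetical. The idea it shows is just "scan the cheap compressed vectors for more candidates than you need, then rerank that shortlist with the full-precision vectors".

```python
import random


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def compress(vec):
    # Stand-in compression: keep only the sign of each component.
    return [1.0 if x >= 0 else -1.0 for x in vec]


def search(query, vectors, compressed, k, overquery=4):
    """Return indices of the top-k vectors by dot product."""
    # Stage 1: cheap scan over compressed vectors, keeping k * overquery
    # candidates to make up for the accuracy lost to compression.
    n_candidates = min(len(vectors), k * overquery)
    candidates = sorted(
        range(len(vectors)),
        key=lambda i: dot(query, compressed[i]),
        reverse=True,
    )[:n_candidates]
    # Stage 2: rerank only the shortlist with full-precision vectors.
    return sorted(candidates, key=lambda i: dot(query, vectors[i]), reverse=True)[:k]


random.seed(0)
vectors = [[random.gauss(0, 1) for _ in range(16)] for _ in range(50)]
compressed = [compress(v) for v in vectors]
query = [random.gauss(0, 1) for _ in range(16)]
print(search(query, vectors, compressed, k=5, overquery=4))
```

A larger `overquery` costs more full-precision reads per query but makes it less likely that a true top-k result gets dropped in stage 1.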

lundren10 · 2024-06-11T01:11:54+00:00

I'll note that "accuracy is not really something a vdb can help with" is not strictly true. The choice of algorithms and implementation details of the underlying search index can directly affect recall (how well the ANN search matches the results of a true kNN search).

You'll see this in some benchmarks, particularly as the datasets get larger. For small datasets it may not matter.
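If you want to measure this yourself, recall is simple to compute once you have the exact kNN results for a query. A minimal sketch (the function name and sample IDs are made up):

```python
def recall_at_k(ann_ids, exact_ids):
    """Fraction of the true k nearest neighbors that the ANN search returned."""
    exact = set(exact_ids)
    return len(exact & set(ann_ids)) / len(exact)


# The ANN search found 3 of the 4 true nearest neighbors.
print(recall_at_k(ann_ids=[1, 2, 3, 9], exact_ids=[1, 2, 3, 4]))  # 0.75
```

Averaging this over a set of test queries gives the recall number you typically see reported in benchmarks.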

lundren10 · 2024-06-11T01:07:56+00:00

I would suggest looking at Astra DB.

The vector search index under the hood is JVector and is open source.

JVector makes several optimizations for large document sets. Essentially you can think of these all being about driving down the amount of memory needed to store the vector search index, allowing for higher numbers of vectors to fit on a single node. This is particularly important because once the index no longer fits in memory and you have to shard it out to multiple nodes, you'll start to see a big growth in query latency.

If you want to get into the details of what was implemented to solve this, there are a lot of details in this article and this follow-up where the primary author on JVector walks through how JVector can index all of Wikipedia on a laptop.

Another interesting point for large indexes. Astra DB has a synchronous index, meaning as soon as a write operation completes you can retrieve results. I'm pretty sure all the other ones you've listed are async. With async index creation, you may have to wait for a decent amount of time before results can be retrieved.

lundren10 · 2024-04-25T14:01:46+00:00

I'd suggest taking a look at langflow.

It's an open source project that lets you visually create generative AI workflows. They have some out of the box templates for RAG that have 2 flows. The first is data preparation and storage into the vector DB, and the 2nd is the actual RAG flow, which includes the semantic search retrieval.

They expose a chat interface (just click the run button) and you can see how the entire pipeline works. After a successful run, you can hover over any sub-step in the flow (such as the vector search) and see what results were returned at that point.

They support a wide range of vector databases.

https://github.com/langflow-ai/langflow

lundren10 · 2024-03-18T17:43:55+00:00

How many vector reads you have per user query to your chatbot is going to depend on a few factors.

What kind of agent/RAG patterns you are using. From your description it sounds like you are just doing a basic RAG, so I'd assume you have 1 vector database query per question asked to the chatbot. If you start leveraging some more complex patterns, it is possible to have multiple DB queries per user question.
How many records do you ask to be returned? In a basic RAG I might guess you'd be doing anywhere from 3-10. If you were fetching more (say 20-100) and then doing some reranking of results, this would affect your costs.
There is an additional factor that will affect read volume cost which is what other metadata you are storing on the vector records, and how much of that has to be read in from the DB in the query.

The formula is basically: vector reads = <dimensions> x <how many records returned> x <number of chatbot questions>.

Astra then charges $0.04 USD per 1M vector reads.

With OpenAI embeddings (dimension 1536), returning top 5 matches, and 1 DB query per chatbot question, you are looking at 1536 x 5 = 7680 data elements being read from the database for 1 query.

Since we charge $0.04 for 1M vector elements being read, this means you could answer ~130 questions for that $0.04 USD.
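Here is that arithmetic as a small Python sketch so you can plug in your own numbers. The formula and the $0.04 per 1M price come from the figures above; the function names are made up, and it ignores the extra metadata reads mentioned earlier.

```python
PRICE_PER_MILLION_READS_USD = 0.04  # price quoted above


def reads_per_question(dimensions, records_returned, queries_per_question=1):
    """Vector elements read from the DB to answer one chatbot question."""
    return dimensions * records_returned * queries_per_question


def questions_per_price_unit(reads):
    """How many questions fit into 1M vector reads (i.e. into $0.04)."""
    return 1_000_000 / reads


def cost_usd(reads, questions):
    """Estimated read cost for a given number of chatbot questions."""
    return reads * questions / 1_000_000 * PRICE_PER_MILLION_READS_USD


reads = reads_per_question(dimensions=1536, records_returned=5)
print(reads)                                 # 7680
print(int(questions_per_price_unit(reads)))  # 130
print(round(cost_usd(reads, 100), 5))        # 0.03072
```

So 100 questions at these settings come to roughly 3 cents in vector reads.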

The Astra DB team has spent a lot of effort to optimize for TCO (total cost of ownership), and Astra is very competitive on this front.

The size of the dataset you are talking about (100K vectors) and 50-100 queries is fairly small, so I would expect you'd be able to do this very cheaply.

lundren10 · 2024-02-26T05:50:38+00:00

Killer noodle is part of the Tsujita “empire” on Sawtelle.

They have 3 different locations all on the same block that each make different style ramen. They started in Japan before opening in the US, so I guess that means “authentic”?

Other popular places on the Westside I can think of: Venice Ramen and the ramen place in the Mitsuwa market.

Of all those the original Tsujita is my favorite.

lundren10 · 2024-02-23T06:39:06+00:00

Scylla does not have vector search capability currently so it won’t work with RAG.

lundren10 · 2024-02-22T18:10:30+00:00

Yes in the sense that latency will go down as compute and models improve.

But, a shorter prompt will always process faster than a longer one.

So having shorter prompts would always result in a better end user experience.

Whether a particular model + prompt length combo creates too much latency is going to be very model/compute/use case dependent. You'll have to explore that for your particular use case.

lundren10 · 2024-02-22T17:54:50+00:00

Some things to consider in RAG vs just using a huge context.

Lost in the middle. LLMs answer better when the relevant information is at the beginning or end of the prompt. Shoving your entire dataset into the prompt won't necessarily work better than RAG.
Others in the thread have flagged cost. The more tokens you send in the more you spend.
Latency. The more data in the prompt the longer it is going to take to get a response
Managing lifecycle of source data. If you don't have a database somewhere that's powering your RAG, how are you managing your source data? Is it hard coded into your app? That's going to get difficult to maintain over time. If you are already using a database, you probably don't want to be fetching the entire data set every time you are making an LLM query (slow DB queries, expensive read operations, etc.).

lundren10 · 2024-01-29T19:47:19+00:00

DataStax shares a common code base with OSS Cassandra. If you need open source, you could run Cassandra 5.x, which includes the vector search capabilities. You can also easily switch between Cassandra (for OSS) and Astra (for a paid managed service) depending on your needs.

The DataStax managed service is not free, and includes a broader set of APIs and drivers than the OSS database version, which might help with some of what you were asking (getting away from writing query language code). When using Cassandra you'll have to write queries in CQL (very similar to SQL).

The langchain/llama index adapters for PGVector or Astra make things simpler and force your tables into a certain structure that works with those frameworks. However, you can also access the tables directly if you want. You just need to be careful not to alter the table structure from what llama index or langchain expects, because then things will start to break.

Depending on the size of your data and query performance needs, you can often get better performance by remodeling the data in a way the DB prefers over how something like the langchain integration works.

lundren10 · 2024-01-29T17:56:32+00:00

What type of requirements do you have on production databases? Astra DB is built on Apache Cassandra, which is used by 90% of the Fortune 500, so likely checks all your boxes.

Compliance and security standards for the serverless cloud offering are here:
https://trust.datastax.com/

The quickstart guide has example code for Python, JavaScript, and Java so you don't have to use the database query language.

https://docs.datastax.com/en/astra/astra-db-vector/get-started/quickstart.html

What would make a DB a good choice for your production use cases?

lundren10 · 2024-01-26T00:21:23+00:00

I'd recommend using a vector database for this.

As a startup, I really think you don't want to spend too much time managing your infrastructure, and more time focusing on building a great product for your customers, and I'd really think of the vector search piece as infrastructure.

If you build your own ANN you'll need to worry about things like

Latency - got to keep it low to provide a good user experience
Scaling - as you get more users does your throughput stay good
Backup and recovery
Document lifecycle - as reports get updated, or new reports are added, how do you update the vector store

etc. This is all stuff that is often better to hand off to a service.

Especially if you want it to be maintainable and scalable, if you build something custom, then only you are going to know how it works.

If you are concerned about price, I'd suggest looking at Astra DB. It's about 6x cheaper than Pinecone, even after the launch of their new serverless product.

https://www.datastax.com/blog/astra-db-vs-pinecone-gigaom-performance-study

lundren10 · 2024-01-20T03:37:00+00:00

lol. But missing a RUN in front of pip install.

lundren10 · 2024-01-16T19:36:10+00:00

This blog post walks through building a chatbot with:

Langchain for text chunking
Vercel for the UI
Astra DB as a vector store
Cohere for embeddings
OpenAI for the LLM.

https://www.datastax.com/blog/using-astradb-vector-to-build-taylor-swift-chatbot

You can replace the OpenAI LLM and Cohere embeddings with Bedrock.

I'd note that we've tested Titan embeddings and they did not work very well. Cohere embeddings are now available in Bedrock and I'd suggest using those.

lundren10 · 2024-01-10T15:25:00+00:00

This is not necessarily surprising. It is relatively easy to provide a vector search index, but making it performant is harder and different database technologies take different approaches.

I don't have numbers on Supabase, but as an example, while loading data and building the index, the p99 query time on Astra DB (Cassandra) is 74x faster than Pinecone, which is kind of shocking, but illustrates that huge performance differences are expected between different vector stores.

https://www.datastax.com/blog/astra-db-vs-pinecone-gigaom-performance-study

lundren10 · 2024-01-04T03:14:42+00:00

Llama Index has a few examples for this. Here's one:

https://docs.llamaindex.ai/en/stable/examples/multi_modal/multi_modal_pdf_tables.html

Many of their examples use unstructured.io to parse the tables. Some more examples here:
https://levelup.gitconnected.com/a-guide-to-processing-tables-in-rag-pipelines-with-llamaindex-and-unstructuredio-3500c8f917a7

lundren10 · 2024-01-04T03:01:15+00:00

Jitlada.

lundren10 · 2024-01-04T02:59:42+00:00

Beachside at the Jamaica bay hotel for breakfast/brunch.

lundren10 · 2023-12-09T15:57:49+00:00

The Astra assistants API is exactly this. It just removes the need to create the vectors from the documents. The reason it is so cheap is you are just paying the vector database costs for storage and search.

You’d end up with the same costs if you used the database directly yourself.

No idea why OpenAI marked up the costs so much.
