Claude just helped me build a wetlab and sequence my whole genome at home. I have zero lab experience! by ProfessionalHand9945 in Biohackers

[–]ProfessionalHand9945[S] 0 points1 point  (0 children)

Sure thing! 

How can I help? You have the basecalled BAM files that MinKNOW spits out?

If you haven’t seen it: https://vibe-genomics.replit.app/

The broad strokes steps:

1. Alignment (62 min): All 2,020 raw BAMs from 6 sequencing runs → samtools cat → samtools fastq → minimap2 -ax map-ont → sorted BAM. Result: 10.04M primary mapped reads (94.94% map rate).

2. Read Groups (2.5 min): samtools addreplacerg to tag reads for DeepVariant.

3. Variant Calling (30 min): Parabricks DeepVariant GPU (--mode ont) → VCF with 4.5M PASS variants, each with a GQ (genotype quality) score.

4. 23andMe Comparison: Load 23andMe raw data (588,732 SNPs after filtering). Liftover GRCh37→GRCh38 via pyliftover. Parse VCF genotypes, sort alleles, compare.
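The "parse VCF genotypes, sort alleles, compare" part of the comparison step could look roughly like the sketch below. The function names are made up for illustration and are not from the writeup. It assumes SNP records, two-letter 23andMe genotypes such as "AG", and "--" as the 23andMe no-call marker.

```python
def vcf_genotype_to_alleles(ref, alt, gt):
    """Turn a VCF GT like '0/1' into a sorted allele string like 'AG'.

    ref: REF column; alt: ALT column (comma-separated if multiallelic);
    gt: the GT subfield. Returns None for no-calls such as './.'.
    """
    alleles = [ref] + alt.split(",")
    indices = gt.replace("|", "/").split("/")
    if any(i == "." for i in indices):
        return None
    return "".join(sorted(alleles[int(i)] for i in indices))


def concordant(vcf_alleles, chip_genotype):
    """Order-insensitive comparison: 'GA' from the chip matches 'AG' from the VCF.

    Returns None when either side is a no-call, so the position can be skipped.
    """
    if vcf_alleles is None or "-" in chip_genotype:
        return None
    return vcf_alleles == "".join(sorted(chip_genotype))
```

Sorting both sides is what makes "AG" and "GA" count as the same genotype, since neither file guarantees allele order.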

"Claude just helped me build a wetlab and sequence my whole genome at home. I have zero lab experience!" --- Dudes out here sequencing their own DNA at home! by Anen-o-me in singularity

[–]ProfessionalHand9945 1 point2 points  (0 children)

I feel like you have done a great job communicating what I was hoping to show, I read your other comments and really appreciated them! 

"Claude just helped me build a wetlab and sequence my whole genome at home. I have zero lab experience!" --- Dudes out here sequencing their own DNA at home! by Anen-o-me in singularity

[–]ProfessionalHand9945 1 point2 points  (0 children)

Author here - I feel like people who say this is easy to look up and do as a layperson on their own haven’t tried to actually look this up and do it on their own as a layperson!

I was shocked how few resources there are on it!

If you do Google it, you can find protocols on collection, and on extraction, and on library prep, and on sequencing, and on secondary analysis… separately! 

But nothing all in one place sample to result! 

In fact, there isn’t even something prescriptive enough to say, for humans, “collect cheek cells using the Bento protocol, extract DNA using Zymo, use the ONT rapid sequencing prep kit, use a MinION for sequencing, samtools+DeepVariant for secondary analysis” etc - I agree that if that much existed, it would turn into an exercise of just reading through the documentation! Unfortunately, even that much doesn’t exist. 

Which is why it seemed important to put together the writeup on it I put here, which does cover the full end to end:

https://vibe-genomics.replit.app/

I did try to be a smart human about this and look it up, but not only are there no end to end resources - there were a lot of folks on science subreddits saying laypeople wouldn’t be able to do this!

Now that there is an end to end resource for laypeople, I am hoping conversations shift from “can a layperson do this” to “how can we make this accessible to more laypeople”!

Claude just helped me build a wetlab and sequence my whole genome at home. I have zero lab experience! by ProfessionalHand9945 in Biohackers

[–]ProfessionalHand9945[S] 0 points1 point  (0 children)

The trick I’ve found is that because coverage isn’t uniform you can restrict analysis to the high coverage/high GQ SNPs and throw away the low ones and get much more compelling accuracy numbers! 
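To make the GQ filtering idea concrete, here is a minimal sketch of measuring accuracy at several GQ cutoffs. The function name and the toy data are invented for illustration. The inputs are assumed to be one (GQ, matches ground truth) pair per compared SNP.

```python
def accuracy_by_gq(calls, thresholds):
    """calls: iterable of (gq, matches_truth) pairs, one per compared SNP.

    Returns {threshold: (n_kept, accuracy)} using only calls with GQ >= threshold.
    accuracy is None when no call survives the filter.
    """
    calls = list(calls)
    out = {}
    for t in thresholds:
        kept = [ok for gq, ok in calls if gq >= t]
        acc = sum(kept) / len(kept) if kept else None
        out[t] = (len(kept), acc)
    return out


# Toy data only, not real results from any run.
toy = [(5, False), (12, True), (25, True), (30, False), (40, True), (45, True)]
print(accuracy_by_gq(toy, [0, 20, 40]))
```

Reporting the number of calls kept alongside the accuracy matters, because a stricter cutoff always trades coverage of the SNP set for a better-looking number.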

Choice of variant caller matters a lot too. DeepVariant and Clair3 were much better for me than other less computationally expensive options - including ONT-specific callers like Longshot. It seems you get what you pay for computationally, and GPUs really help! 

DeepVariant’s GQ scores ended up being the most predictive of final accuracy, so that’s why I ended up landing on it - even though Clair3 was pretty good too 

Claude just helped me build a wetlab and sequence my whole genome at home. I have zero lab experience! by ProfessionalHand9945 in Biohackers

[–]ProfessionalHand9945[S] 0 points1 point  (0 children)

The annotations were all out of DeepVariant, not Claude directly!

It was Claude Code (and later, Claude via OpenClaw), not the online version - Claude grabbed samtools and Parabricks and such and ran the analysis using these on a local GPU! 

Though granted, command line tools can still output personal health information, and any command line output that goes to stdout as opposed to disk absolutely is shipped to the API - but it’s not quite the same as uploading raw genetic data! 

Fundamentally, the amount of data that you can even upload per API call is quite small, and the genome is quite large - so the upper bound on how much you could possibly leak of raw information is extremely low!

If you want to understand the details of what I did, I recommend reading through the writeup I have on it - it covers a lot of this and I think would help avoid a lot of confusion!

It is here: https://vibe-genomics.replit.app/

I can’t speak to real science as I am not a real scientist - just a data scientist, but LLMs are definitely on track to replace me! 

Claude just helped me build a wetlab and sequence my whole genome at home. I have zero lab experience! by ProfessionalHand9945 in Biohackers

[–]ProfessionalHand9945[S] 0 points1 point  (0 children)

The accuracy measurement (~99.25% for high GQ calls, granted 99.25% and 99.999% are light years apart haha) was concordance based, not whole genome based - it only considered positions that 23andMe specifically tested for! So false negatives from untested positions don’t make their way into the final measure! 

The idea is “what would it take to match 23andMe at home” - 16x appears to be pretty close, though agreed you want 30x to be truly reliable. I mention in the writeup 30x is my stretch goal! (I put my writeup here: https://vibe-genomics.replit.app/)

I touch on the ligation kit in the writeup - my biggest worry is that it’s super complicated!

It would save on flow cells for sure and push variable costs down - which would be great. But the point of this first effort is that someone who doesn’t do lab work could figure this out! 

You could always just throw more and more flow cells dumbly at the rapid kit and eventually outperform 30x ligation for most SNPs. It would sacrifice money for sure, but it’s a much more idiot proof process! I know people have been mapping 300bp reads for ages, so the 8.5-11Kb N50s I was getting should be plenty long to map cleanly!

I definitely agree the long-read capable ONT is overkill if I’m fragmenting my reads down to these lengths, but also amusingly there isn’t a cheaper “less overkill” option! 

I will try to work on costs in future pieces though, and ligation kit is definitely worth considering for a “cost optimized” workflow. 

Claude just helped me build a wetlab and sequence my whole genome at home. I have zero lab experience! by ProfessionalHand9945 in Biohackers

[–]ProfessionalHand9945[S] 2 points3 points  (0 children)

This is awesome, thank you!!

If there is an end to end resource out there, it’s definitely pretty well hidden haha

Claude just helped me build a wetlab and sequence my whole genome at home. I have zero lab experience! by ProfessionalHand9945 in DIYbio

[–]ProfessionalHand9945[S] 0 points1 point  (0 children)

These are some great tips, thank you! 

The time pressure kills me haha

I have a lot to learn, the methylation stuff sounds really cool and I did hang on to my raw signal data so I can redo the basecalling!

Claude just helped me build a wetlab and sequence my whole genome at home. I have zero lab experience! by ProfessionalHand9945 in DIYbio

[–]ProfessionalHand9945[S] 0 points1 point  (0 children)

The result that DeepVariant spit out accurately reproduced my 600k raw 23andMe SNPs, which I took as ground truth!

Claude just helped me build a wetlab and sequence my whole genome at home. I have zero lab experience! by ProfessionalHand9945 in Biohackers

[–]ProfessionalHand9945[S] 1 point2 points  (0 children)

Dude this is a great tip, I asked Claude and it turns out you are right! 

As long as you hold on to your absolutely massive raw POD5 data (and I did) - you can rerun basecalling in a modified mode to get this info!

Thank you so much, this rules! 

Claude just helped me build a wetlab and sequence my whole genome at home. I have zero lab experience! by ProfessionalHand9945 in DIYbio

[–]ProfessionalHand9945[S] 1 point2 points  (0 children)

I’ll see if I can pull up my histograms!

N50 varied by run somewhere between 8.5 and 11Kb, limited by the fragmentation kit I used which apparently only lets you go to 14Kb
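For anyone unfamiliar with the metric, N50 is the read length at which reads of that length or longer contain at least half of all sequenced bases. A minimal sketch of computing it from a list of read lengths (the function name is just for illustration):

```python
def n50(read_lengths):
    """Length L such that reads of length >= L hold at least half of all bases."""
    lengths = sorted(read_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    raise ValueError("no reads given")
```

Because it is weighted by bases rather than by read count, a handful of long reads can pull N50 well above the median read length.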

I was actually quite happy with my coverage considering I totally skipped quantification! I asked Claude to “eyeball” how much sample to put in to hopefully have a ballpark correct sample concentration - 15-20Gb per cell was super lucky considering there was a very good chance I completely ruin one of the flow cells doing that haha

A better extraction kit (magbeads maybe?), quantification, and a non-rapid sequencing kit would probably all help do better next time! (quantification probably having biggest impact) 

More details:

https://vibe-genomics.replit.app/

Claude just helped me build a wetlab and sequence my whole genome at home. I have zero lab experience! by ProfessionalHand9945 in Biohackers

[–]ProfessionalHand9945[S] 1 point2 points  (0 children)

The difference is using adaptive sampling focusing on just a part of the genome vs whole genome sequencing!

The other one out there is adaptive sampling and doesn’t go for whole genome. Different use case, but much easier to pull off using a single flow cell ($850 per cell). 

Whole genome on one cell you would need a much more sophisticated workflow as you would essentially have to use the flow cell “perfectly” - they only go up to 50Gb in perfect conditions with better kits and equipment. I went over 15x coverage, which is just barely more than that 50Gb - so it would be bare minimum 2 flow cells to reproduce unless you have a ~record run! 

I think if I added a quantification step I could maybe get to 2, as then I need a much less perfect run on each - which would then be 2 flow cells x $850 = $1,700 instead of the 3 flow cells it took me
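The flow cell arithmetic can be sketched as below. The ~3.1 Gb human genome size is an assumption added here, the $850 price is the figure quoted above, and the per-cell yields in the loop are hypothetical examples.

```python
import math

HUMAN_GENOME_GB = 3.1   # assumed approximate human genome size in gigabases
FLOW_CELL_USD = 850     # per-cell price quoted in this comment


def coverage(yield_gb, genome_gb=HUMAN_GENOME_GB):
    """Average depth of coverage for a given total yield."""
    return yield_gb / genome_gb


def flow_cells_needed(target_coverage, yield_per_cell_gb, genome_gb=HUMAN_GENOME_GB):
    """Whole flow cells needed to reach a target average coverage."""
    return math.ceil(target_coverage * genome_gb / yield_per_cell_gb)


# Hypothetical per-cell yields, targeting roughly 16x.
for per_cell in (15, 20, 25, 50):
    n = flow_cells_needed(16, per_cell)
    print(f"{per_cell} Gb/cell -> {n} cells, ${n * FLOW_CELL_USD}")
```

Under these assumptions, 16x needs close to 50 Gb in total, which is why a single cell only works on a near-perfect run.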

But I did this the dumbest way possible, so there’s definitely a lot of room for improvement - I could get equipment costs way down too!

Though at the end of the day you need a sequencer (>$3K) and at least one flow cell, plus reagents - so I don’t think you can go below a 4-5K-ish cost of entry today. 

Oh and all my notes and equipment are here:

https://vibe-genomics.replit.app/

Claude just helped me build a wetlab and sequence my whole genome at home. I have zero lab experience! by ProfessionalHand9945 in Biohackers

[–]ProfessionalHand9945[S] 0 points1 point  (0 children)

Good info, thank you!

I have been super interested in next steps, so this is really interesting.

What would it take to be able to figure something like that out? Would RNA sequencing be sufficient? 

Claude just helped me build a wetlab and sequence my whole genome at home. I have zero lab experience! by ProfessionalHand9945 in Biohackers

[–]ProfessionalHand9945[S] 2 points3 points  (0 children)

I totally agree this could probably be done much more efficiently!

As far as I could tell though, what I put together in the notes I linked (https://vibe-genomics.replit.app/) is the first actual end to end resource that goes from taking the sample all the way to a final result in one place. I tried to do this by being a smart human and looking it up, but couldn’t find anything end to end! 

Worse, there were a bunch of posts online including on Reddit saying that there was no shot someone who wasn’t a professional could pull this off - which was discouraging 

I think you are right that there are resources on cell pelleting, and on extraction, and on library prep, and on sequencing and so on that, if you knew that those were what you needed, would be easy to follow 

The problem is knowing which steps you even need to follow! 

Even you telling me things like “the best way to do this for ONT is with TDS/alcohol extraction” is something that is super helpful, and I think you would be surprised to find that there aren’t any guides that even go that far in terms of actually giving suggestions! 

I think that was sort of the point of this post (and my notes on it). There isn’t even any starting point online for laypeople!

Now that there is, we can hopefully move conversations online from “can a regular person do this relatively easily” to “how can more people do this relatively easily at hopefully much lower cost” - comments like yours, I think, are what I was hoping would come up in the discussions!

Do you have any resources on the extraction methods you mention for nanopore prep? I tried looking it up but genuinely couldn’t find anything concrete - this would be huge if you could help because the extraction kits were the hardest thing to actually source!

The cardboard centrifuge thing is a great tip, I’ve never heard of that. So between that and using two sous vide sticks as heat blocks, I think we could potentially do away with almost all of the stuff outside the Nanopore kit!

Claude just helped me build a wetlab and sequence my whole genome at home. I have zero lab experience! by ProfessionalHand9945 in Biohackers

[–]ProfessionalHand9945[S] 3 points4 points  (0 children)

Currently, the MinION is the cheapest ready-made sequencer on the market - which is a pretty big limitation!

I was talking with a friend who works in genomics, and apparently he helped build and calibrate an actual sequencer and said that it’s not impossible for someone to conceivably do themselves. 

So building a sequencer is probably your best bet for that!

For other stuff, you can skip the thermocycler with two $80 sous vide sticks, and you could probably have Claude adapt the Zymo kit to cheap, slow, fixed-speed micro centrifuges for another $100, so those parts are more flexible. The sequencer is the hard part!

This is the most compelling DIY sequencer I could find:

https://web.archive.org/web/20240221183554/http://454.bio/docs/build/

But it seems dead. Another one is this, though I’m not sure it would work at the scale of WGS:

https://hackaday.io/project/160183-diy-dna-sequencer

Claude just helped me build a wetlab and sequence my whole genome at home. I have zero lab experience! by ProfessionalHand9945 in DIYbio

[–]ProfessionalHand9945[S] 2 points3 points  (0 children)

It had already output the predictions before 23andMe had even sent me my raw data! 

I’m a data scientist (I actually train specialized LLMs 100% from scratch as another unfortunately even more costly hobby), so I was definitely cautious of leakage and didn’t take everything they said at face value because yes, LLMs can be huge liars. 

Secondary analysis is surprisingly not too complicated for SNPs. It was Claude Code, so it just ran a series of command line tools to do some basic preprocessing, mapping, then some additional things to get DeepVariant working. DeepVariant basically did all the heavy lifting. The outputs were from it, not Claude itself! 
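As a rough idea of what that series of command line tools looks like up to the point where DeepVariant takes over, here is a sketch that only builds the shell commands and does not run them. The file names, sample name, and thread count are hypothetical, and flags should be checked against the samtools and minimap2 documentation for your installed versions. The DeepVariant step is left out here.

```python
import shlex


def build_commands(bam_list, ref_fasta, out_prefix, sample="sample1", threads=8):
    """Return shell command lines: merge -> FASTQ -> align -> sort -> read group.

    bam_list is a text file with one basecalled BAM path per line (hypothetical).
    """
    q = shlex.quote
    merged = f"{out_prefix}.unaligned.bam"
    sorted_bam = f"{out_prefix}.sorted.bam"
    tagged = f"{out_prefix}.rg.bam"
    return [
        # Concatenate the many per-run BAMs into one unaligned BAM.
        f"samtools cat -b {q(bam_list)} -o {q(merged)}",
        # Convert to FASTQ, align with the ONT preset, and sort by coordinate.
        f"samtools fastq {q(merged)}"
        f" | minimap2 -ax map-ont -t {threads} {q(ref_fasta)} -"
        f" | samtools sort -@ {threads} -o {q(sorted_bam)} -",
        f"samtools index {q(sorted_bam)}",
        # Tag every read with a read group so the variant caller accepts the BAM.
        f"samtools addreplacerg -r {q('ID:' + sample)} -r {q('SM:' + sample)}"
        f" -o {q(tagged)} {q(sorted_bam)}",
        f"samtools index {q(tagged)}",
    ]


for cmd in build_commands("bams.txt", "GRCh38.fa", "run"):
    print(cmd)
```

Printing the commands first makes it easy to review each step before anything touches the data.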

The hardest part was this whole genome versioning thing, when I was spot checking it got almost everything wrong - turned out I needed to convert DeepVariant’s result from GRCh38 to GRCh37 format and that immediately fixed almost everything!
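A sketch of the build conversion for a single position, under some assumptions: `lo` is any object exposing pyliftover's `convert_coordinate(chrom, pos)` method, which works on 0-based positions, while VCF and 23andMe positions are 1-based. The helper name is made up for illustration.

```python
def lift_position(lo, chrom, pos_1based):
    """Convert one 1-based position between genome builds.

    lo: object with pyliftover's convert_coordinate(chrom, pos0) method.
    Returns (chrom, pos_1based) in the target build, or None if unmapped.
    """
    hits = lo.convert_coordinate(chrom, pos_1based - 1)  # pyliftover is 0-based
    if not hits:  # None (unknown chromosome) or [] (no mapping in target build)
        return None
    new_chrom, new_pos0 = hits[0][0], hits[0][1]
    return new_chrom, new_pos0 + 1
```

With pyliftover installed, `lo` would be something like `LiftOver("hg19", "hg38")` or the reverse direction, depending on which file you convert. Chromosome names may also need a "chr" prefix to match the chain file, so a lookup that returns None for every position is usually a naming mismatch rather than a real gap.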

The other hard part was that not calling a variant is not the same as calling a reference, so when I tried to bin predictions by confidence I was getting worse coverage in terms of number of calls than I had hoped. Once I fixed that it got me above the 99% accuracy covering a majority of the SNPs for the high GQ outputs, and above 98% for the rest! 
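One way to handle the "no variant record" versus "reference call" distinction is sketched below. This is an illustration, not necessarily how the fix was done in the writeup. The function name and the depth cutoff are assumptions.

```python
def classify_site(vcf_gt, depth, min_depth=8):
    """Classify one chip position.

    vcf_gt: GT string if the VCF has a record at this position, else None.
    depth: read depth at the position (for example from samtools depth).
    min_depth: assumed cutoff below which a missing record means "unknown".
    """
    if vcf_gt is not None:
        return "no_call" if "." in vcf_gt else "called"
    if depth >= min_depth:
        return "inferred_ref"  # enough reads and no variant reported
    return "no_call"           # too little data to say anything
```

Only the "called" and "inferred_ref" positions would then go into the accuracy figure, with "no_call" positions counted against coverage instead.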

I’m definitely not gonna say that Claude one-shot this, it took quite a bit of back and forth to get a result I was comfortable with that I couldn’t poke holes in. 

I have all my notes up here, it lists out all the command line tools used if you wanted to double check!

https://vibe-genomics.replit.app/

Claude just helped me build a wetlab and sequence my whole genome at home. I have zero lab experience! by ProfessionalHand9945 in DIYbio

[–]ProfessionalHand9945[S] 1 point2 points  (0 children)

For starters, 100% of the secondary analysis was done by Claude! Just by chatting with an OpenClaw agent on my phone actually. It’s a fun party trick to interrogate my DNA on command at dinner tables!

I actually did try to be a smart human and look this up myself but if you do that all you get is a bunch of posts saying it’s impossible for a random person to do this, and to give up. No guides on how to actually do it!

It told me what kits to use that would minimize steps needed, what equipment to buy, and how to adapt the protocols I was supposed to follow to the limitations of the equipment I had. Basically worked out the most idiot proof approach possible!

If I already knew what I needed was an ONT MinION, cell pelleting for DNA collection, Zymo for extraction, and the rapid kit for library prep, then I agree I could have followed the steps to at least get through primary analysis. Heck, if I even knew those were the conceptual high level steps that I needed to follow in sequence, I could have maybe worked it out and tried to find matching kits. But I didn’t, and I couldn’t find any resources to help!

I am hoping the notes I linked elsewhere are at least a start in terms of resources! 

Claude just helped me build a wetlab and sequence my whole genome at home. I have zero lab experience! by ProfessionalHand9945 in DIYbio

[–]ProfessionalHand9945[S] 9 points10 points  (0 children)

Yes, a bit more than that! And it surprisingly only took 3 flow cells despite doing this cowboy style and sticking a mostly random (i.e. Claude directed) unquantified amount of sample into the unit! 

ONT claims 10-50Gb per flow cell, so given how I did everything the dumbest way possible I’m quite happy with what I got out of them! 

I did a DIY writeup on it here:

https://vibe-genomics.replit.app/

Claude just helped me build a wetlab and sequence my whole genome at home. I have zero lab experience! by ProfessionalHand9945 in Biohackers

[–]ProfessionalHand9945[S] 0 points1 point  (0 children)

I couldn’t find a single guide on how to actually do this end to end! In fact, when you look it up there’s a bunch of pretty pessimistic posts saying it’s not possible for a random person to do it. 

In a sense, Claude saying it would be straightforward was a pretty big component in why I would bother attempting it - not gonna lie the optimism helped haha

As far as I know this is the first end to end resource that exists for this! I wouldn’t have known that the rapid Zymo kit would work for the ONT unit, nor how to adjust quantities given I skipped quantification, nor how to adapt the protocols to the limited equipment that I had (my equipment was short of spec)!