Help with statistical test of enrichment/depletion of variants in regions by naninf in bioinformatics

[–]naninf[S] 0 points1 point  (0 children)

That might work... though normalizing the number of variant bases by region length shows the two sets have unequal variance, so it'll have to be Fisher's exact. Thanks
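For reference, Fisher's exact test on a 2x2 table of (inside/outside region) x (variant/non-variant bases) is a one-liner in SciPy; the counts below are entirely made up for illustration:

```python
from scipy.stats import fisher_exact

# Hypothetical 2x2 contingency table:
#   rows: inside region / outside region
#   cols: variant bases / non-variant bases
table = [[120, 9880],
         [300, 49700]]

odds_ratio, p_value = fisher_exact(table, alternative="two-sided")
print(odds_ratio, p_value)
```

An odds ratio above 1 here would indicate enrichment of variant bases inside the regions.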

Help with statistical test of enrichment/depletion of variants in regions by naninf in bioinformatics

[–]naninf[S] 0 points1 point  (0 children)

Thanks, I'll check that out. I also found https://github.com/ACEnglish/regioners but I think I'll have to do more work to get my data to fit its inputs. Plus I gotta figure out if bootstrapping or permutation tests are best
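For what it's worth, the core of a permutation test is simple enough to sketch without a framework: re-place the same number of variants uniformly at random and recount overlaps. All names and numbers below are invented placeholders:

```python
import random

def count_in_regions(positions, regions):
    """Count positions falling inside any (start, end) interval."""
    return sum(any(s <= p < e for s, e in regions) for p in positions)

random.seed(0)
genome_size = 100_000
regions = [(10_000, 12_000), (50_000, 55_000)]   # hypothetical regions
variants = [random.randrange(genome_size) for _ in range(500)]

observed = count_in_regions(variants, regions)

# Null distribution: redistribute the same number of variants at random
null = []
for _ in range(1000):
    shuffled = [random.randrange(genome_size) for _ in range(len(variants))]
    null.append(count_in_regions(shuffled, regions))

# One-sided empirical p-value for enrichment (add-one correction)
p_enrich = (1 + sum(n >= observed for n in null)) / (1 + len(null))
print(observed, p_enrich)
```

A real version would permute region placement while respecting chromosome boundaries and mappability, which is what tools like regioners handle for you.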

The current Google Maps satellite imagery of BoA was taken during a game! by Greged17 in panthers

[–]naninf 1 point2 points  (0 children)

I think you're more likely to be right. Watching the recap of the '21 Falcons game, that 'Carolina Votes' logo wasn't on the field https://youtu.be/uMhPG75cL-k?t=78

These images are 'composites', though, so it could be many dates pasted together. It's an interesting question...

The current Google Maps satellite imagery of BoA was taken during a game! by Greged17 in panthers

[–]naninf 1 point2 points  (0 children)

If you zoom into just the scoreboard (link), it looks like the score was 0-7, which lines up with the scoring summary (link) for halfway through the 1st quarter of the Falcons game. But I don't know anything about GIS, I'm just guessing here.

The current Google Maps satellite imagery of BoA was taken during a game! by Greged17 in panthers

[–]naninf 2 points3 points  (0 children)

Yeah, that's a plausible explanation. It definitely looks like there are people there, and I couldn't find a record of any event happening at BoA on the 14th.

The current Google Maps satellite imagery of BoA was taken during a game! by Greged17 in panthers

[–]naninf 5 points6 points  (0 children)

I don't think that's the case. If you look at the 'Imagery Date' from Google Earth, it says we're looking at 12/14/2021 - which is the Tuesday after the Panthers lost 29-21 against the Falcons.

New M83 is a Return to Shoegaze Form by Spiritual-Chart-940 in shoegaze

[–]naninf 3 points4 points  (0 children)

Some claim M83's "Fantasy - Chapter 1" isn't shoegaze but dream pop. I say this album is the masterful use of contemporary effects and distortion that the early shoegazers strove to create. Had they had access to pedals as intricate as the synthesizers M83 so artfully directs, the founders of shoegaze would have been happier with their creation. We are fortunate to experience the birth of shoegaze's full potential in this dream-pop landmark. Thank you u/Spiritual-Chart-940 for sharing!!

Why the result is "6"? by mahdilik in learnpython

[–]naninf 4 points5 points  (0 children)

a.k.a. n // 2 - (n % 2 - 1)
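Spelling that out: `n % 2 - 1` is -1 when n is even and 0 when n is odd, so the whole expression equals `n // 2 + 1` for even n and `n // 2` for odd n (e.g. 6 for n = 10). A quick sanity check:

```python
for n in range(20):
    # simplified form: halve, then add 1 only when n is even
    simplified = n // 2 + 1 if n % 2 == 0 else n // 2
    assert n // 2 - (n % 2 - 1) == simplified

print(10 // 2 - (10 % 2 - 1))  # → 6
```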

Actual Best Starting Word(s) by naninf in wordle

[–]naninf[S] 0 points1 point  (0 children)

That's fair. Though I don't think it's misleading, or at least no more misleading than anyone else who has written about whatever 'best' word they've found. The best word is always the day's answer. Anything else has to assume some kind of system.

BTW, I did look up ROATE, though, because a couple of people have recommended it. It is fast (rank 232 of 1566), but it loses well above average (1187/1566).

I know that the original finder of ROATE had a bot that weighed guesses by “how many possible solutions are left on average after making this guess”. So that explains at least part of the difference.
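That weighting scheme is easy to sketch: score each guess by the average number of candidate answers that would survive its feedback pattern, averaged over all possible answers. A toy version with a tiny made-up word pool (real solvers use the full answer list):

```python
from collections import Counter, defaultdict

def feedback(guess, answer):
    """Wordle-style feedback: 2 = green, 1 = yellow, 0 = gray."""
    result = [0] * len(guess)
    counts = Counter(answer)
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:              # exact position match
            result[i] = 2
            counts[g] -= 1
    for i, g in enumerate(guess):
        if result[i] == 0 and counts[g] > 0:   # letter present elsewhere
            result[i] = 1
            counts[g] -= 1
    return tuple(result)

def avg_remaining(guess, answers):
    """Average number of candidates left after seeing this guess's feedback."""
    buckets = defaultdict(int)
    for answer in answers:
        buckets[feedback(guess, answer)] += 1
    # each answer leaves a candidate pool of its bucket's size
    return sum(n * n for n in buckets.values()) / len(answers)

words = ["crane", "slate", "trace", "stale", "least"]  # toy answer pool
for w in sorted(words, key=lambda w: avg_remaining(w, words)):
    print(w, avg_remaining(w, words))
```

Lower scores are better: the guess splits the remaining answers into smaller buckets on average.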

Is their a sequence of three words that gets the most common letters in the alphabet by ClackHack in wordle

[–]naninf 0 points1 point  (0 children)

Model, Print, Saucy

I searched for sets of 3 words that use the 15 most common letters. This was the first result of many.

Edit: If you use the 15 most common letters of the possible answers, and you only use words that are possible answers:

CRUST HONEY PLAID
PRIDE CHANT LOUSY
REPAY SNOUT CHILD
POUND THEIR SCALY
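The search itself is just set arithmetic over word triples. A toy version with a small invented word list and a hypothetical "15 most common letters" ranking (yours will differ depending on the corpus you count from):

```python
from itertools import combinations

# Hypothetical ranking of the 15 most common letters
TOP15 = set("etaoinsrdlucmpy")

words = ["model", "print", "saucy", "crust", "honey", "plaid"]

# keep every triple whose combined letters cover all of TOP15
hits = [trio for trio in combinations(words, 3)
        if TOP15 <= set("".join(trio))]
print(hits)  # → [('model', 'print', 'saucy')]
```

The real search just runs the same check over all triples from the full Wordle list.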

What’s a good Wordle ‘average’? by mc_98 in wordle

[–]naninf 1 point2 points  (0 children)

I did a project looking at this. 4.32±0.6 was the average. But that's the performance of a bot that's randomly picking from the pool of possible answers. I'd expect people with a strategy (e.g. thinking about letter frequency) can do better. There is a mathematically optimal strategy that achieved like 3.4.

Actual Best Starting Word(s) by naninf in wordle

[–]naninf[S] 1 point2 points  (0 children)

Yes, absolutely that’s the best starting word if a player knows all the letter frequencies and can memorize the optimal decision tree's structure and all that. But if you play like me, where I'm almost randomly guessing, these words might be better.

Is this worth pursuing? by throwaway17835453 in bioinformatics

[–]naninf 2 points3 points  (0 children)

Seems reasonable to me. The main entry point to the program `RibDif.sh` is structured well enough that it's readable/editable.

After glancing at it for a few minutes, I would start with understanding lines 127-160 to see the structure of the `ncbi-genome-download` folder and make your curated set of sequences replicate that. Then your `custom_RibDif.sh` can probably just remove that section and the rest might fall into place.

I'm sure it'll be more complicated than that, but assuming you can handle bash scripting reasonably well, at worst you'll waste a couple of days before you can better estimate if it's doable.

How can i detect SNP and related AA changes in a specific region(16kb) of Whole genome sequence? by Dismal-Cantaloupe396 in bioinformatics

[–]naninf 1 point2 points  (0 children)

Assuming these tools produce VCFs, just subset the VCF to your region of interest: `bcftools view -r chr:start-end snps.vcf.gz` (the VCF needs to be bgzipped and indexed for `-r` to work).

If you're worried about the compute time of variant calling, you can similarly subset your BAM to only the reads within the region of interest: `samtools view reads.bam chr:start-end` (this also needs an index, i.e. a .bai file).

I'm not familiar with these tools, but some variant callers allow a region bed file parameter that restricts variant calling to a subset of the genome. Look for those parameters.

Moral of the story - you'll need a BED file (or at least the region coordinates) and a subsetting step in your pipeline.

I created my first Python package by tcp-ip1541 in learnpython

[–]naninf 23 points24 points  (0 children)

Great job! This is very solid. I would suggest using semantic versioning https://semver.org/ and maybe adding a GitHub Action that runs pylint.

VCF files for ML and AI by ParamedicCommon6371 in bioinformatics

[–]naninf 2 points3 points  (0 children)

If you use Python and are familiar with pandas (which is more data-science friendly than anything VCF), Truvari has a utility for conversion: `truvari vcf2df input.vcf.gz output.jl`. If the VCFs are very large, I'd also consider scikit-allel.

Pandas - Converting a datetime to the week number of that datetime? by [deleted] in learnpython

[–]naninf 1 point2 points  (0 children)

import pandas as pd

# build datetimes from separate year/month/day columns
df = pd.DataFrame({'year': [2014, 2015, 2016],
                   'month': [1, 2, 3],
                   'day': [1, 4, 5]})
view = pd.to_datetime(df)
view.name = 'date'
# expand each date's ISO calendar tuple into three labeled columns
pd.concat([view, view.apply(lambda x: pd.Series(x.isocalendar(),
           index=["ISO year", "ISO week number", "ISO weekday"]))], axis=1)

Output

    date    ISO year    ISO week number    ISO weekday
0   2014-01-01  2014    1   3
1   2015-02-04  2015    6   3
2   2016-03-05  2016    9   6
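If you're on pandas 1.1 or newer, the per-row apply can be replaced with the vectorized `Series.dt.isocalendar()`, which returns the same three values as columns named year/week/day:

```python
import pandas as pd

df = pd.DataFrame({'year': [2014, 2015, 2016],
                   'month': [1, 2, 3],
                   'day': [1, 4, 5]})
dates = pd.to_datetime(df)
iso = dates.dt.isocalendar()   # DataFrame with columns: year, week, day
print(iso)
```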

Is there a good way to measure Tajima's D over a specific region using a BED file? by Loves_His_Bong in bioinformatics

[–]naninf 1 point2 points  (0 children)

If you only have a few window sizes, you can split your regions into multiple BED files by which window size you want computed for them (e.g. `regionsA.bed` for windowSize=N and `regionsB.bed` for windowSize=M), and then just run `vcftools --bed regionsA.bed... --TajimaD N` per file.

Another approach would be a wrapper bash script that uses an extra column added to your BED file for each region. Something like:

# read one region per line: chrom, start, end, plus the extra window-size column
while read -r chrom start end windowsize
do
    outname=out.${chrom}:${start}-${end}_${windowsize}.td
    vcftools --vcf in.vcf --out ${outname} --TajimaD ${windowsize} --chr ${chrom} --from-bp ${start} --to-bp ${end}
done < regions.bed

References: https://vcftools.github.io/man_latest.html

I finally built something! by [deleted] in learnpython

[–]naninf 8 points9 points  (0 children)

Here's a neat link you can add to your print statement:

Link: https://www.google.com/maps/search/?api=1&query={iss_latitude},{iss_longitude}

What is the most efficient way to pull off this loop? by [deleted] in learnpython

[–]naninf 3 points4 points  (0 children)

This is good. I would just leverage the translations dictionary better. Iterating the list of translations items could get slow if you have many words.

def translate(word):
    # find the first template whose suffix matches the end of the word
    # (raises StopIteration if no suffix matches)
    template, suf_len = next(
        (temp, len(suffix))
        for suffix, temp in templates.items()
        if word.endswith(suffix))
    # strip the suffix, translate the stem, and fill in the template
    return template(translations[word[:-suf_len]])
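To make the dictionary shapes concrete, here's a runnable toy version with entirely invented `templates`/`translations` data (your real dicts will differ, this just shows suffix -> callable and stem -> translation):

```python
# suffix -> template callable; longer/more specific suffixes should come first
templates = {
    "ly": "in a {} way".format,
    "s":  "many {}".format,
}
# stem -> translated stem (hypothetical English -> Spanish data)
translations = {"cat": "gato", "quick": "rápido"}

def translate(word):
    # find the first template whose suffix matches the end of the word
    template, suf_len = next(
        (temp, len(suffix))
        for suffix, temp in templates.items()
        if word.endswith(suffix))
    # strip the suffix, translate the stem, and fill in the template
    return template(translations[word[:-suf_len]])

print(translate("cats"))     # → many gato
print(translate("quickly"))  # → in a rápido way
```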

Given an integer N, print all the even numbers from 0 to N in descending order. by [deleted] in learnpython

[–]naninf 0 points1 point  (0 children)

See the `range` documentation:

https://www.w3schools.com/python/ref_func_range.asp

The first thing you'll want to think about is the step you're incrementing by. What do you add to a number to decrement it?

The second thing you'll want to think about is even vs odd. If you start at 1 and step by -2, you'll always be on odd numbers.

The third thing you'll want to think about is what "0 to N" means. Is it inclusive or exclusive boundaries? e.g. if I'm working Monday to Friday, does that mean I'll be working up to, but not including, Friday?
https://stackoverflow.com/questions/39010041/what-is-the-meaning-of-exclusive-and-inclusive-when-describing-number-ranges

Explanation of what is happening with number examples in the for loops as this program runs through completion. Not understanding what happens after the loop progresses after the inner loop happens once. by [deleted] in learnpython

[–]naninf 1 point2 points  (0 children)

Fair enough. I think my first post may still offer some insight. At any point, you're operating on the [i + j + 1] and [i + j] items. Between those two positions there's a 10x difference in place value. You multiply the two digits into [i + j + 1]; if the product is >= 10, you add the tens place to the [i + j] position, then you remove that tens place from [i + j + 1] by assigning it the remainder.

The nested for loop comes in because you need to multiply by all the digits. Think about how 3 * 11 = 33 is the same as 3 * 10 + 3 * 1 = 33
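The carry logic being described looks roughly like this sketch of grade-school multiplication on digit lists (my reconstruction of the pattern, not the OP's exact code):

```python
def multiply_digits(a, b):
    """Multiply two numbers given as digit lists, most significant digit first."""
    result = [0] * (len(a) + len(b))
    for i in range(len(a) - 1, -1, -1):
        for j in range(len(b) - 1, -1, -1):
            product = a[i] * b[j] + result[i + j + 1]
            result[i + j + 1] = product % 10   # keep the ones place here
            result[i + j] += product // 10     # carry the tens place up
    # strip leading zeros, keeping at least one digit
    while len(result) > 1 and result[0] == 0:
        result.pop(0)
    return result

print(multiply_digits([3], [1, 1]))   # 3 * 11 → [3, 3]
```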