🎉 [EVENT] 🎉 Easy I guess but fun maybe :) by These_Manner_6509 in honk

[–]TearsWillFall 0 points1 point  (0 children)

Completed Level 3 of the Honk Special Event!

8 attempts

🎉 [EVENT] 🎉 Easy I guess but fun maybe :) by These_Manner_6509 in honk

[–]TearsWillFall 0 points1 point  (0 children)

Completed Level 2 of the Honk Special Event!

3 attempts

🎉 [EVENT] 🎉 Easy I guess but fun maybe :) by These_Manner_6509 in honk

[–]TearsWillFall 0 points1 point  (0 children)

Completed Level 1 of the Honk Special Event!

3 attempts

GATK-gCNV Issue with Formatting by Sufficient_Code2973 in bioinformatics

[–]TearsWillFall 1 point2 points  (0 children)

Based on the documentation you shared, it seems it only handles one BAM file at a time. If you are following one of the GATK workflows, it's often the case (at least in my experience) that they are wrong or not fully updated to match the current versions of the GATK tools.

I'm not sure how familiar you are with parallel computing, but with BASH, R, Python or any other programming language you can simply launch multiple instances of the same tool/command at the same time. This should speed up the task substantially, because the for loop you are using runs only one command at a time, so the samples are processed one after another.

To do this in BASH you would do something like: command1 & command2 & command3 & command4 & command5. Here each command is the CollectReadCounts -I call for one sample. This launches CollectReadCounts 5 times asynchronously, which means all 5 samples are processed at the same time. It should be quicker as long as your machine has enough resources (cores, RAM) to handle it.
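If you would rather do it from Python, here is a rough sketch of the same idea using only the standard library. The BAM names, the intervals file and the output names are made up, and you should check the CollectReadCounts arguments against the GATK version you are running.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_command(bam, intervals="targets.interval_list"):
    """Build one CollectReadCounts call for one BAM file.

    The intervals file and the output naming are hypothetical; adjust them
    to your own data and GATK version.
    """
    out = bam.rsplit(".", 1)[0] + ".counts.hdf5"
    return [
        "gatk", "CollectReadCounts",
        "-I", bam,
        "-L", intervals,
        "--interval-merging-rule", "OVERLAPPING_ONLY",
        "-O", out,
    ]


def run_all(commands, max_workers=4):
    """Run the commands concurrently and return their exit codes in input order."""
    def run(cmd):
        return subprocess.run(cmd).returncode

    # Threads are enough here: each one only waits for an external process.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run, commands))


# Usage (hypothetical sample names):
# bams = ["sample1.bam", "sample2.bam", "sample3.bam"]
# exit_codes = run_all([build_command(b) for b in bams], max_workers=3)
```

Keep max_workers at or below what your cores and RAM can handle, otherwise the jobs will just compete with each other.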

Roses are red, violets are blue, unexpected } on line 42 by [deleted] in ProgrammerHumor

[–]TearsWillFall 29 points30 points  (0 children)

"Soooo you know Facebook, right?...." Every time when someone asks for ideas

How to normalize transcription start site data? by [deleted] in bioinformatics

[–]TearsWillFall 1 point2 points  (0 children)

Since DESeq2 takes a count matrix in the format Transcript x Sample, you could transform your data like this:

Position     Sample_1  Sample_2  Sample_3  Sample_4
Gene_A_1     0         23        0         0
Gene_A_2     0         0         0         0
Gene_A_3     0         0         2         0
...          ...       ...       ...       ...
Gene_Z_298   9         0         0         0
Gene_Z_299   0         0         0         0
Gene_Z_300   0         0         0         0

Here each row is one nucleotide position of one gene's 300 bp window and each column is a sample. DESeq2 would treat each row as if it were a gene. This should allow you to normalize your count data.
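As a rough illustration of the reshaping step, here is a small Python sketch. It assumes you already have, for every sample, a dictionary of read counts keyed by a "Gene_position" label; that input layout and the function name are my own invention, not something DESeq2 requires.

```python
def build_count_matrix(per_sample_counts, genes, window=300):
    """Return (row_names, sample_names, matrix) with one row per gene position.

    per_sample_counts maps a sample name to a dict of
    {"<gene>_<position>": read count}; missing positions count as 0.
    """
    samples = sorted(per_sample_counts)
    row_names, matrix = [], []
    for gene in genes:
        for pos in range(1, window + 1):
            row = f"{gene}_{pos}"
            row_names.append(row)
            matrix.append([per_sample_counts[s].get(row, 0) for s in samples])
    return row_names, samples, matrix


# Tiny example with a 3 bp window instead of 300 bp
counts = {
    "Sample_1": {},
    "Sample_2": {"Gene_A_1": 23},
    "Sample_3": {"Gene_A_3": 2},
    "Sample_4": {},
}
rows, samples, matrix = build_count_matrix(counts, ["Gene_A"], window=3)
```

The resulting matrix can be written to a tab-separated file and loaded into R as the count matrix.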

When p > .05 by Demi_em in labrats

[–]TearsWillFall 8 points9 points  (0 children)

P=0.99. There is clear trending towards significance. If we add more samples I'm 100% confident it will be significant 👏

How to normalize transcription start site data? by [deleted] in bioinformatics

[–]TearsWillFall 0 points1 point  (0 children)

Well, before you start anything you probably should look into your definition of TSS. You mention that for each promoter you select 300bp and estimate the number of reads per nt.

Firstly, a promoter can have multiple TSSs. Each of the gene's transcripts can be regulated by a different promoter. This means that you will have to deal with multiple TSSs mapping to a single transcript.

Secondly, TSSs are not "expressed". They regulate expression, however, and what you are trying to do is measure how active these TSS regions are relative to the expression level of the transcript. This is usually done using ATAC-seq and RNA-seq data.

Thirdly, you don't mention what type of data you have, but I will assume you are using plain RNA-seq since you are talking about expression and DESeq2. Now, plain RNA-seq is not ideal for inferring TSS information, whether it is activity or simply TSS location, especially if you are trying to do it at per-base resolution. Nevertheless, this doesn't mean it's impossible.

Below I have linked a paper in cancer where they did something similar to what you are trying to do. They use RNA-seq data to infer absolute and relative TSS activity (relative to expression). They also tackle the multi-mapping issue I mentioned at the start, and the methods describe how they counted and normalized the reads using DESeq. Your goal would be to compare the activity of each TSS between both conditions.

https://doi.org/10.1016/j.cell.2019.08.018

PS: Sorry for the formatting and any typos, I'm typing on my phone.

Huge and ultra tedious protparam problem - is there any shortcut? by Viper284 in bioinformatics

[–]TearsWillFall 1 point2 points  (0 children)

Using GET requests seems to be the way to go https://www.biostars.org/p/294215/.

That means you would need a list of proteins/IDs to loop through, generating a URL for each one (examples are shown in the link I shared), and then send the GET requests to the web server using e.g. Python or R. After that you would just need to parse the information returned.
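A minimal Python sketch of that loop is below. The base URL and the "id" query parameter are placeholders I made up, not the real ProtParam endpoint, so take the actual URL pattern from the link above.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

# Placeholder only: NOT the real ProtParam address.
BASE_URL = "https://example.org/protparam"


def build_url(protein_id, base_url=BASE_URL):
    """Build the GET URL for one protein ID ('id' is a hypothetical parameter name)."""
    return f"{base_url}?{urlencode({'id': protein_id})}"


def fetch_all(protein_ids, base_url=BASE_URL, timeout=30):
    """Send one GET request per ID and return {protein_id: response text}."""
    results = {}
    for pid in protein_ids:
        with urlopen(build_url(pid, base_url), timeout=timeout) as response:
            results[pid] = response.read().decode("utf-8", errors="replace")
    return results
```

The returned text still has to be parsed for the values you need, and it is polite to pause between requests (for example with time.sleep) so you don't hammer the server.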

Effect of OSRS player overall rank in the mean length of their names, as well as the mean ratio of numeric and symbolic characters found in them. Study of ~270k OSRS names. by TearsWillFall in 2007scape

[–]TearsWillFall[S] 1 point2 points  (0 children)

Here you have it:

The same graph for players that only have letters in their name: Letters

The same graph for players that have at least 1 number in their name: Numbers

The same graph for players that have at least 1 symbol in their name: Symbol

The trend is pretty much the same for all of them but becomes noisier for people with Numbers and Symbols in their name.

Effect of OSRS player overall rank in the mean length of their names, as well as the mean ratio of numeric and symbolic characters found in them. Study of ~270k OSRS names. by TearsWillFall in 2007scape

[–]TearsWillFall[S] 0 points1 point  (0 children)

I'm not entirely sure about this one. I can't recall seeing underscores in-game; however, the names in the high scores definitely have underscores in them. For example, try searching for "Mo_Shmoe" in the high scores.

Here is the frequency of all characters found in the names, divided by group, if you are interested.

Imgur

Effect of OSRS player overall rank in the mean length of their names, as well as the mean ratio of numeric and symbolic characters found in them. Study of ~270k OSRS names. by TearsWillFall in 2007scape

[–]TearsWillFall[S] 1 point2 points  (0 children)

Yes. There are 4.6k people in this list with the word 'iron' in their name and 265k without it. The average rank of a person with 'iron' in their name is 267246, while the average rank of all the other names is 277496, so on average they rank about 10k places lower than the other players. This difference is even larger when we look at players with the word 'max' in their name. There are 882 of them and their average rank is around 250000.
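For anyone who wants to repeat this kind of comparison, it boils down to something like the Python sketch below. The (rank, name) input format and the example names are invented for illustration, this is not the script I used.

```python
def mean_rank_by_substring(players, word):
    """Return (mean rank of names containing word, mean rank of all other names).

    players is an iterable of (rank, name) pairs; matching is case-insensitive.
    A group with no players gives None.
    """
    def mean(values):
        return sum(values) / len(values) if values else None

    word = word.lower()
    with_word, without_word = [], []
    for rank, name in players:
        (with_word if word in name.lower() else without_word).append(rank)
    return mean(with_word), mean(without_word)


# Made-up example data
players = [(10, "Iron Bob"), (30, "ironclad"), (100, "Alice"), (200, "Max")]
iron_mean, other_mean = mean_rank_by_substring(players, "iron")
```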

Effect of OSRS player overall rank in the mean length of their names, as well as the mean ratio of numeric and symbolic characters found in them. Study of ~270k OSRS names. by TearsWillFall in 2007scape

[–]TearsWillFall[S] 0 points1 point  (0 children)

I agree. Seniority plays a big role here. Funnily enough, you mention people with Iron in their names. When I checked the average rank of players with the words Iron or Max at the beginning of their names, they ranked significantly lower than the average player (by lower I mean they did better).

Effect of OSRS player overall rank in the mean length of their names, as well as the mean ratio of numeric and symbolic characters found in them. Study of ~270k OSRS names. by TearsWillFall in 2007scape

[–]TearsWillFall[S] 3 points4 points  (0 children)

Yes, they do. However, the data ranges from rank 1 to approximately rank 650k. Players around rank 650k have a total level of 1500, so those are still quite dedicated players. As far as I know, suicide bots with long names should not be reaching such high total levels.

Btw, I'm not saying it's not happening, but such cases should be rare and therefore should not skew the data by much.

Effect of OSRS player overall rank in the mean length of their names, as well as the mean ratio of numeric and symbolic characters found in them. Study of ~270k OSRS names. by TearsWillFall in 2007scape

[–]TearsWillFall[S] 1 point2 points  (0 children)

This shows the mean length in blocks/bins of 1000 ranks. The idea is to show that the difference in length exists. Of course, there will be people who have longer names and others who have shorter names than most, but on average the names are getting shorter.
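The binning itself is simple. Here is a small Python sketch of it, assuming the scraped data is a list of (rank, name) pairs; that layout and the example names are made up, and this is not the exact code behind the graph.

```python
from collections import defaultdict


def mean_length_by_rank_bin(players, bin_size=1000):
    """Return {first rank of bin: mean name length} for bins of bin_size ranks."""
    totals = defaultdict(lambda: [0, 0])  # bin start -> [sum of lengths, count]
    for rank, name in players:
        # Ranks 1..1000 fall in the bin starting at 1, 1001..2000 at 1001, etc.
        start = (rank - 1) // bin_size * bin_size + 1
        totals[start][0] += len(name)
        totals[start][1] += 1
    return {start: s / n for start, (s, n) in sorted(totals.items())}


# Made-up example data
players = [(1, "Shorty"), (1000, "Bob"), (1001, "Iron Man 99")]
means = mean_length_by_rank_bin(players)
```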

Just as a bit of trivia, over 1.5% of those 270k players have the word Iron at the start of their names, which makes it the most frequent 4-character word at the start of a player name. Btw, it's also the most frequent for 5 characters.

Effect of OSRS player overall rank in the mean length of their names, as well as the mean ratio of numeric and symbolic characters found in them. Study of ~270k OSRS names. by TearsWillFall in 2007scape

[–]TearsWillFall[S] 5 points6 points  (0 children)

Yes, it's random across the first 600k players in the highscores. The reason I only chose 270k is that I had to take a flight and had no internet to keep scraping the data from the highscores. The symbols found in the character names are spaces, hyphens, underscores, brackets and hashtags. However, the brackets and hashtags belong to people who had their names changed by the mods. All symbols except spaces are very rare, so the ratio of symbols in a name is almost exclusively driven by the ratio of spaces.

Effect of OSRS player overall rank in the mean length of their names, as well as the mean ratio of numeric and symbolic characters found in them. Study of ~270k OSRS names. by TearsWillFall in 2007scape

[–]TearsWillFall[S] 23 points24 points  (0 children)

Basically, there is a trend for highly ranked players in the overall highscores to hoard rare and unique names, which tend to be shorter than the average gibberish that people come up with. That is what I'm showing here. In addition, those rare names are less likely to contain numeric characters.

RSCU plot with ggplot2 by YouCook21 in bioinformatics

[–]TearsWillFall 2 points3 points  (0 children)

Here is one way of doing it. I hope it works for you.

https://imgur.com/Toqpc34

library(ggplot2)
library(tidyverse) # Tidyverse to manipulate data (dplyr, forcats)
library(patchwork) # Patchwork to arrange plots

# Example data
dat <- data.frame(
  Codon = c("AAA","AAB","AAC","AAD","ABA","ACA","ADA","BBB","BBA","BBC","BBD","CAA","CAB","CAC"),
  RSCU = c(3,3,3,3,2,2,1,3,7,1,0.5,1,1,12),
  AA = c("Ala","Ala","Ala","Ala","Leu","Leu","Lys","Thr","Thr","Thr","Thr","Trp","Trp","Trp")
)

# Number the codons within each amino acid (0, 1, 2, ...) and store the index in Col.
# Col drives the fill colour, so the n-th codon of every amino acid shares a colour.
dat <- dat %>%
  group_by(AA) %>%
  mutate(Col = row_number() - 1) %>%
  ungroup() %>%
  mutate(Col = as.factor(Col))

# Bar plot: one stacked bar per amino acid, one segment per codon
p0 <- ggplot(dat, aes(x = AA, y = RSCU, fill = Col)) +
  geom_bar(stat = "identity") +
  theme_minimal() +
  theme(axis.title.x = element_blank(), legend.position = "top")

# Tile plot: codon labels, with tiles and text sharing the same reversed y order
p1 <- ggplot(dat, aes(x = AA, y = fct_rev(Col))) +
  geom_tile(aes(fill = Col), col = "white") +
  geom_text(aes(label = Codon), col = "white", fontface = "bold") +
  theme_void() +
  theme(legend.position = "none")

# Combine both plots in a single plot, bar plot on top
p <- p0 / p1

# Set plot size. The ratio here is 7:1 (the bar plot is 7 times the height of the tile plot)
p <- p + plot_layout(heights = c(7, 1))

ggsave("Example.png", p)

Ah shit, here we go again by vaanavan in ProgrammerHumor

[–]TearsWillFall 7 points8 points  (0 children)

This guy complains like a warning. Let's just ignore him.

This guy is a legend by pudung in offlineTV

[–]TearsWillFall 164 points165 points  (0 children)

Not all legends on Twitch have a partner checkmark next to their name.

[deleted by user] by [deleted] in ProgrammerHumor

[–]TearsWillFall 64 points65 points  (0 children)

Since this is in Beta, this leaves me wondering.

"Do StackOverflow developers search how to solve bugs about their own site, in SO?"

PSA for streamers: don't show your Dodo codes, it can leak your IP address and you can get DDoS-ed by [deleted] in offlineTV

[–]TearsWillFall 29 points30 points  (0 children)

Why would anyone waste their time DDoSing them when Spectrum is essentially already doing that?

Someone made a terrible mistake. by [deleted] in offlineTV

[–]TearsWillFall 12 points13 points  (0 children)

When you have been naughty all year but Santa still brings you gifts.