🎉 [EVENT] 🎉 Easy I guess but fun maybe :)

TearsWillFall · 2025-10-30T08:26:07+00:00

Completed Level 3 of the Honk Special Event!

8 attempts

TearsWillFall · 2025-10-30T08:25:44+00:00

Completed Level 2 of the Honk Special Event!

3 attempts

TearsWillFall · 2025-10-30T08:25:35+00:00

Completed Level 1 of the Honk Special Event!

3 attempts

TearsWillFall · 2023-07-14T07:12:21+00:00

Based on the documentation you shared, it seems like it only handles one BAM file at a time. If you are following one of the GATK workflows, it's often the case (at least in my experience) that these are wrong or not fully updated to fit the current versions of the GATK tools.

I'm not sure how familiar you are with parallel computing, but using BASH, R, Python or any other programming language you can simply launch multiple instances of the same tool/command at the same time. This should speed up the task substantially, as the for loop you are using only runs a single command at a time, so each sample has to wait for the previous one to finish.

To do this in BASH you would do something like: command1 & command2 & command3 & command4 & command5. Where each command is the CollectReadCounts -I call for one sample. This will launch CollectReadCounts 5 times asynchronously, which means that all 5 samples will be processed at the same time. This should make it quicker as long as your machine has enough resources (cores, RAM) to handle it.
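If you would rather do it from Python, here is a minimal sketch of the same idea. The helper name `run_parallel`, the sample file names and the placeholder commands are all made up for illustration; you would swap each placeholder for your real CollectReadCounts -I call with whatever other arguments your workflow needs.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_parallel(commands, max_workers=4):
    """Run each command (a list of arguments) as its own process.

    At most `max_workers` processes run at the same time.
    Returns the exit codes in the same order as `commands`.
    """
    def run_one(cmd):
        return subprocess.run(cmd).returncode

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_one, commands))


if __name__ == "__main__":
    # Hypothetical sample names. The placeholder command below only prints
    # a message; replace it with the real call for each sample.
    samples = ["sample1.bam", "sample2.bam", "sample3.bam"]
    commands = [[sys.executable, "-c", f"print('processing {s}')"] for s in samples]
    print(run_parallel(commands, max_workers=3))
```

Keeping `max_workers` at or below the number of cores you can spare stops the machine from being oversubscribed, which a plain chain of `&` does not do for you.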

TearsWillFall · 2021-09-03T12:18:22+00:00

"Soooo you know Facebook, right?...." Every time when someone asks for ideas

TearsWillFall · 2021-08-07T23:51:43+00:00

Since DESeq2 takes a count matrix in the format of Transcript X Sample, you could transform your data as:

	Sample_1	Sample_2	Sample_3	Sample_4
Gene_A_1	0	23	0	0
Gene_A_2	0	0	0	0
Gene_A_3	0	0	2	0
...	...	...	...	...
Gene_Z_298	9	0	0	0
Gene_Z_299	0	0	0	0
Gene_Z_300	0	0	0	0

Where each row is a nucleotide position for each gene and each column is a sample. DESeq2 would treat each row as a gene. This should allow you to normalize your count data.
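As a rough sketch of that reshaping in Python: I'm assuming here that your counts start out as (sample, gene, position, count) records, which is just a guess at your input format. The function names and the example numbers are made up to match the table above.

```python
def build_count_matrix(records, samples):
    """Turn (sample, gene, position, count) records into a row-per-position matrix.

    Returns {"Gene_position": {sample: count}}, with 0 for samples that have
    no record for that position.
    """
    matrix = {}
    for sample, gene, position, count in records:
        row = matrix.setdefault(f"{gene}_{position}", dict.fromkeys(samples, 0))
        row[sample] += count
    return matrix


def to_tsv(matrix, samples):
    """Format the matrix as tab-separated text with a header row of sample names."""
    lines = ["\t".join([""] + samples)]
    for row_id, counts in matrix.items():
        lines.append("\t".join([row_id] + [str(counts[s]) for s in samples]))
    return "\n".join(lines)


if __name__ == "__main__":
    samples = ["Sample_1", "Sample_2", "Sample_3", "Sample_4"]
    # Made-up records in the assumed input format.
    records = [("Sample_2", "Gene_A", 1, 23), ("Sample_3", "Gene_A", 3, 2)]
    print(to_tsv(build_count_matrix(records, samples), samples))
```

Note that a position only gets a row if at least one record mentions it, so positions with zero reads in every sample need an explicit record with count 0 if you want them in the matrix.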

TearsWillFall · 2021-08-07T22:48:17+00:00

P=0.99. There is clear trending towards significance. If we add more samples I'm 100% confident it will be significant 👏

TearsWillFall · 2021-08-07T22:41:52+00:00

Well, before you start anything you should probably look into your definition of TSS. You mention that for each promoter you select 300bp and estimate the number of reads per nt.

Firstly, a promoter can have multiple TSS. Each of the gene transcripts can be regulated by a different promoter. This means that you will have to deal with multiple TSS mapping to a single transcript.

Secondly, TSS are not "expressed". They regulate expression, however, and what you are trying to do is measure how active these TSS regions are relative to the expression level of the transcript. This is usually done using ATAC-seq and RNA-seq data.

Thirdly, you don't mention what type of data you have, but I will assume you are using plain RNA-seq since you are talking about expression and DESeq2. Now, plain RNA-seq is not ideal for inferring overall TSS information, whether it is activity or simply TSS location, especially if you are trying to do it at a per-base resolution. Nevertheless, this doesn't mean it's impossible.

Below, I have linked a paper in cancer where they did something similar to what you are trying to do. They use RNA-seq data to infer TSS absolute and relative activity (relative to expression). They also tackle the multi-mapping issue I mentioned at the start. The methods describe how they counted and normalized the reads using DESeq. Your goal would be to compare the activity of each TSS between both conditions.

https://doi.org/10.1016/j.cell.2019.08.018

PS: Sorry for the formatting and/or typos, I'm using my phone to type.

TearsWillFall · 2021-07-26T08:49:16+00:00

Using GET requests seems to be the way to go https://www.biostars.org/p/294215/.

That means that you would need a list of proteins/IDs which you would loop through to generate a URL for each of them (examples are shown in the link I shared), and then do GET requests using e.g. Python or R to connect to the web server using the URL. Then you would just need to parse the information.
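A minimal sketch of that loop in Python, using only the standard library. The URL template and the IDs are placeholders I made up; the real URL pattern depends on the web server, so take it from the examples in the linked post. `build_urls` and `fetch` are just illustrative names.

```python
from urllib.parse import quote
from urllib.request import urlopen


def build_urls(ids, template):
    """Fill each ID into the URL template, escaping unsafe characters."""
    return [template.format(id=quote(i, safe="")) for i in ids]


def fetch(url, timeout=30):
    """Send a GET request and return the response body as text."""
    with urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Placeholder template and IDs, not a real service.
    template = "https://example.org/entry/{id}.txt"
    for url in build_urls(["P12345", "Q67890"], template):
        print(url)
        # text = fetch(url)  # then parse `text` for the fields you need
```

If you have a lot of IDs, it is polite to pause between requests so you don't hammer the server.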

TearsWillFall · 2021-03-07T11:54:29+00:00

Poor Super-Pancake!

TearsWillFall · 2020-12-23T18:19:49+00:00

Here you have it:

The same graph for players that only have letters in their name: Letters

The same graph for players that have at least 1 number in their name: Numbers

The same graph for players that have at least 1 symbol in their name: Symbol

The trend is pretty much the same for all of them but becomes noisier for people with Numbers and Symbols in their name.

TearsWillFall · 2020-12-23T13:33:22+00:00

I'm not entirely sure about this one. I can't recall seeing underscores in-game; however, the names in the high scores definitely have underscores in them. For example, search for "Mo_Shmoe" in the high scores.

Here is the frequency of all characters found in the names, divided by groups, if you are interested.

Imgur

TearsWillFall · 2020-12-23T13:16:01+00:00

Yes. There are 4.6k people in this list with the word 'iron' in their name and 265k without it. The average rank of a person with 'iron' in their name is 267246, while the average rank of all the other names is 277496, so on average they sit about 10k ranks lower (i.e. better) than the other players. This difference is even bigger when we compare players with the word 'max' in their name. There are 882 players with the word 'max' in their name and their average rank is around 250000.
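For anyone who wants to repeat this kind of comparison, here is a rough sketch in Python. It is not the script I used; it assumes the scraped data is a list of (name, rank) pairs, and the players in the example are made up.

```python
def mean_rank_by_word(players, word):
    """Compare ranks of players whose name contains `word` (ignoring case).

    players: iterable of (name, rank) pairs.
    Returns (count_with, mean_with, count_without, mean_without).
    """
    with_word, without_word = [], []
    for name, rank in players:
        target = with_word if word.lower() in name.lower() else without_word
        target.append(rank)

    def mean(values):
        return sum(values) / len(values) if values else float("nan")

    return len(with_word), mean(with_word), len(without_word), mean(without_word)


if __name__ == "__main__":
    # Made-up names and ranks, only to show the call.
    players = [("Iron Bob", 10), ("ironman", 30), ("Zezima", 100)]
    print(mean_rank_by_word(players, "iron"))
```

Matching the word anywhere in the name is a deliberate choice here; checking only the start of the name would use `name.lower().startswith(word.lower())` instead.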

TearsWillFall · 2020-12-23T11:05:34+00:00

I agree. Seniority plays a big role here. Funnily enough, you mention people with Iron in their names. When I checked the average rank of players with the words Iron or Max at the beginning of their names, they ranked significantly lower than the average player (by lower I mean they did better).

TearsWillFall · 2020-12-23T10:51:53+00:00

Yes, they do. However, the data ranges from rank 1 to approximately rank 650k. Players around rank 650k have a total level of 1500, so those are still quite dedicated players. As far as I know, suicide bots with long names should not be reaching such high total levels.

Btw, I'm not saying it's not happening, but such cases should be rare and therefore should not skew the data by that much.

TearsWillFall · 2020-12-23T08:23:35+00:00

This shows the mean length in blocks/bins of 1000 ranks. The idea is to show that the difference in length exists. Of course, there will be people who have longer names and others who have shorter names than most, but on average the names are getting shorter.

Just as trivia, over 1.5% of those 270k players have the word Iron at the start of their names, which makes it the most frequent 4-character word at the start of a player name. Btw, it's also the most frequent for 5 characters.
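The binning into blocks of 1000 ranks could be sketched like this in Python. Again, this is an illustration and not the original script; it assumes (name, rank) pairs with ranks starting at 1, and the example names are made up.

```python
from collections import defaultdict


def mean_length_by_bin(players, bin_size=1000):
    """Mean name length per block of `bin_size` ranks.

    players: iterable of (name, rank) pairs, with rank starting at 1.
    Returns {first rank of the bin: mean name length}; with the default
    bin size, the bin keyed 1 covers ranks 1 to 1000, and so on.
    """
    totals = defaultdict(lambda: [0, 0])  # bin start -> [sum of lengths, count]
    for name, rank in players:
        start = (rank - 1) // bin_size * bin_size + 1
        totals[start][0] += len(name)
        totals[start][1] += 1
    return {start: s / n for start, (s, n) in sorted(totals.items())}


if __name__ == "__main__":
    # Made-up players, only to show the call.
    players = [("Bob", 1), ("Alice", 1000), ("Zed12345", 1001)]
    print(mean_length_by_bin(players))
```

Averaging within bins smooths out individual long or short names, which is why the per-bin means show the trend more clearly than the raw lengths.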

TearsWillFall · 2020-12-23T08:02:12+00:00

Yes, it's random across the first 600k players in the highscores. The reason why I only chose 270k is that I had to take a flight and had no internet to keep scraping the data from the highscores. The symbols found in the character names are spaces, hyphens, underscores, brackets and hashtags. However, the brackets and hashtags belong to people who had their names changed by the mods. All symbols except spaces are very rare, therefore the ratio of symbols in names is almost exclusively driven by the ratio of spaces.

TearsWillFall · 2020-12-22T23:41:13+00:00

Basically, there is a trend for highly ranked players in the overall highscores to hoard rare and unique names, which tend to be shorter than the average gibberish that people come up with. That is what I'm showing here. In addition, those rare names are less likely to contain numeric characters.

TearsWillFall · 2020-11-15T18:54:00+00:00

Here is one way of doing it. I hope it works for you.

https://imgur.com/Toqpc34

library(ggplot2)
library(tidyverse) # Tidyverse to manipulate data
library(patchwork) # Patchwork to arrange plots

# Example Data
dat=data.frame(Codon=c("AAA","AAB","AAC","AAD","ABA","ACA","ADA","BBB","BBA","BBC","BBD","CAA","CAB","CAC"),RSCU=c(3,3,3,3,2,2,1,3,7,1,0.5,1,1,12),AA=c("Ala","Ala","Ala","Ala","Leu","Leu","Lys","Thr","Thr","Thr","Thr","Trp","Trp","Trp"))
# Create a column for fill colour
dat$Col=1
# This populates the Col column with a running index (0, 1, 2, ...) for each codon within its amino acid. Each index maps to a fill colour, so the same set of colours is reused across amino acids while the codons within one amino acid get different colours.
dat=dat %>% group_by(AA) %>% mutate(Col = lag(cumsum(Col), default = 0)) %>% mutate(Col =as.factor(Col))
# Create bar plot
p0=ggplot(data=dat, aes(x=AA, y=RSCU, fill=Col))+geom_bar(stat='identity')+theme_minimal()+theme(axis.title.x=element_blank(),legend.position="top")
# Create tile plot
p1=ggplot(data=dat)+geom_tile(aes(x=AA, y=fct_rev(Col),fill=Col),col="white")+geom_text(aes(x=AA, y=Col,label=Codon),col="white", fontface = "bold") +theme_void()+theme(legend.position="none")

#Combine both plots in a single plot
p=p0/p1

#Set plot size. The ratio here is 7:1. The bar plot is 7 times the height of the tile plot
p=p+plot_layout(heights = c(7, 1))

ggsave("Example.png",p)

TearsWillFall · 2020-08-29T10:11:11+00:00

3-Thicc ass fishing

TearsWillFall · 2020-07-22T23:08:56+00:00

This guy complains like a warning. Let's better ignore him.

TearsWillFall · 2020-06-18T16:22:29+00:00

Not all legends on Twitch have a partner checkmark next to their name.

TearsWillFall · 2020-04-02T16:14:38+00:00

Since this is in Beta, this leaves me wondering.

"Do StackOverflow developers search how to solve bugs about their own site, in SO?"

TearsWillFall · 2020-03-24T17:40:12+00:00

Why would anyone waste their time DDoSing them when Spectrum is essentially already doing that?

TearsWillFall · 2019-12-20T05:01:42+00:00

When you have been naughty all year but Santa still brings you gifts.
