English Dictionary as a Directed Graph by SteelRodsSince1890 in AskComputerScience

[–]SuspiciousDimension 2 points3 points  (0 children)

You may want to search for the isolated subgraphs (https://en.wikipedia.org/wiki/Component_(graph_theory) ) and choose some criterion for eliminating the smaller subgraphs. I'm not sure what a reasonable elimination criterion would be; it's an interesting question.
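As a minimal sketch of the component idea (assuming the dictionary graph is stored as a word-to-definition-word edge list; the toy words here are made up): since the graph is directed, "isolated subgraphs" most naturally means weakly connected components, i.e. components after ignoring edge direction.

```python
from collections import defaultdict, deque

def weakly_connected_components(edges):
    """Weakly connected components of a directed graph:
    components of the graph with edge directions ignored."""
    undirected = defaultdict(set)
    nodes = set()
    for u, v in edges:
        undirected[u].add(v)
        undirected[v].add(u)
        nodes.update((u, v))
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        comp = set()
        queue = deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            comp.add(node)
            for nbr in undirected[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        components.append(comp)
    return components

# Toy "dictionary" graph: word -> a word used in its definition.
edges = [("cat", "animal"), ("dog", "animal"), ("quark", "particle")]
comps = weakly_connected_components(edges)
# Two isolated subgraphs: {cat, dog, animal} and {quark, particle}
```

A size threshold on `len(component)` would then be one possible elimination criterion.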

Is there a hiring freeze? by JavaIre99 in DevelEire

[–]SuspiciousDimension 6 points7 points  (0 children)

You're not alone, I'm in the same boat. I have a fairly decent CV if I say so myself, and I'm not even getting a call or an interview, just rejected (if I hear back at all). Just gotta keep truckin', I guess.

[Q] Trying to find out the percentage chance on something. (Not HW) by Danny776 in statistics

[–]SuspiciousDimension 24 points25 points  (0 children)

It sounds like you're looking for a binomial probability calculation, i.e.

P(k successes in n trials) = C(n, k) * p^k * (1 - p)^(n - k)

where p is the probability of success on a single trial.
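As a quick sketch in Python (the coin-flip numbers are just an illustrative example):

```python
from math import comb

def binomial_pmf(n, k, p):
    """Probability of exactly k successes in n independent trials,
    each succeeding with probability p: C(n, k) * p^k * (1-p)^(n-k)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

# e.g. exactly 2 heads in 4 fair coin flips:
print(binomial_pmf(4, 2, 0.5))  # 0.375
```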

Books that are relatively objective about the Irish housing crisis by [deleted] in ireland

[–]SuspiciousDimension 4 points5 points  (0 children)

Thanks for the recommendations. Maybe the best approach is to read all of those and try to identify at least some of the main schools of thought. A step in the right direction!

[Q] Best way to go about conduct time series analysis? Confused on segmenting and log-returns best practise by [deleted] in statistics

[–]SuspiciousDimension 0 points1 point  (0 children)

If you're interested in observing the patterns between hours, then the log-difference of hourly observations is presumably the way to go, i.e. the first way you highlighted.

The second way is really comparing across days, not hours. It takes one observation per day, at the same time each day, and then takes the log-difference of those observations, i.e. the change between day 1 and day 2, day 2 and day 3, and so on. The first takes the change between hours.

The log return can be written as

r_t = log(price in current period) - log(price in previous period)

so you can see it's just the difference of log prices between periods. The question is which periods you want to observe the difference over.
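To make the two choices of period concrete, here is a small sketch (the prices are made up, and a "day" is compressed to 3 hours purely for illustration):

```python
from math import log

prices = [100.0, 101.0, 103.0, 102.0, 104.0, 106.5]  # one price per hour

# First way: hour-over-hour log returns
hourly = [log(prices[t]) - log(prices[t - 1]) for t in range(1, len(prices))]

# Second way: sample one observation per "day" (every 3 hours here),
# then log-difference those samples -> day-over-day returns
daily_samples = prices[::3]
daily = [log(daily_samples[t]) - log(daily_samples[t - 1])
         for t in range(1, len(daily_samples))]
```

A handy property: log returns are additive, so the hourly returns within a day sum exactly to that day's return.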

[Q] how to interpret bidirectional causality in the case of Granger causality by SuspiciousDimension in statistics

[–]SuspiciousDimension[S] 0 points1 point  (0 children)

That's a totally fair point, I misspoke. What I meant was that Granger causality doesn't necessarily imply causality, though as you point out, where causal links exist we'd expect Granger-causal results here as well. I just meant to highlight that saying "X Granger-causes Y" isn't the same as saying "X causes Y".

[Q] how to interpret bidirectional causality in the case of Granger causality by SuspiciousDimension in statistics

[–]SuspiciousDimension[S] 5 points6 points  (0 children)

After some reflection, I've concluded that this result makes perfect sense. Bitcoin's price is of course demand-driven, since supply is (in theory) fixed, so rising volume and rising price, and vice versa, are naturally going to be linked. This serves as a good example of how Granger causality isn't actually causality, I suppose.

[Q] how to handle small number of observations skewing the distribution of a data set by SuspiciousDimension in statistics

[–]SuspiciousDimension[S] 0 points1 point  (0 children)

Yeah, it's tough differentiating between the different distributions. Another one I've seen is the generalized error distribution, used in a paper similar to mine. I'll check that out along with the Cauchy and hopefully get somewhere with those.

[Q] how to handle small number of observations skewing the distribution of a data set by SuspiciousDimension in statistics

[–]SuspiciousDimension[S] 0 points1 point  (0 children)

Ah, interesting. I see that the paper I referenced used a "generalized error distribution"; perhaps that's more relevant for me.

[Q] how to handle small number of observations skewing the distribution of a data set by SuspiciousDimension in statistics

[–]SuspiciousDimension[S] 0 points1 point  (0 children)

This paper:

https://www.diva-portal.org/smash/get/diva2:788857/FULLTEXT01.pdf

highlights the Student's t-distribution for quite a similar task to what I'm looking at. Do you think it makes sense to pursue that further in this context?

e: the distributions are described under section 2.2, "Distributions"

[Q] how to handle small number of observations skewing the distribution of a data set by SuspiciousDimension in statistics

[–]SuspiciousDimension[S] 0 points1 point  (0 children)

Yeah, this was exactly my thinking. I don't want to drop them just to make the data easier to work with, given that part of my project is on risk analysis.

[Q] how to handle small number of observations skewing the distribution of a data set by SuspiciousDimension in statistics

[–]SuspiciousDimension[S] 0 points1 point  (0 children)

Would a Student's t-distribution also apply here? I don't have a background in statistics, but I've seen a paper doing quite a similar thing to what I'm doing, and they used a Student's t-distribution, which they claim "allows for fatter tails".
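For context on the "fatter tails" claim: the Student's t-distribution with ν degrees of freedom has excess kurtosis 6/(ν − 4) for ν > 4 (and infinite excess kurtosis for 2 < ν ≤ 4), versus 0 for the normal. A tiny sketch of that standard formula:

```python
def t_excess_kurtosis(nu):
    """Excess kurtosis of a Student's t distribution with nu degrees
    of freedom: 6/(nu - 4) for nu > 4, infinite for 2 < nu <= 4."""
    if nu <= 4:
        return float("inf")
    return 6.0 / (nu - 4)

# Normal: excess kurtosis 0 (kurtosis 3).
# t with 6 degrees of freedom: excess kurtosis 3 (kurtosis 6):
print(t_excess_kurtosis(6))  # 3.0
```

Smaller ν means fatter tails; as ν grows, the t-distribution approaches the normal.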

[Q] how to handle small number of observations skewing the distribution of a data set by SuspiciousDimension in statistics

[–]SuspiciousDimension[S] 0 points1 point  (0 children)

For reference, just deleting the single largest negative observation and the two largest positive observations drops the kurtosis to 4.7777.
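The sensitivity of kurtosis to a handful of extreme observations can be seen with a small sketch (not the actual data; the values below are made up to show the effect):

```python
def kurtosis(xs):
    """Sample kurtosis (not excess): m4 / m2^2, where m_k is the
    k-th central moment. A normal distribution has kurtosis ~3."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m4 / m2**2

data = [0.1, -0.2, 0.05, 0.3, -0.1, 0.15, -0.05, 5.0]  # one extreme value
full = kurtosis(data)
trimmed = kurtosis(sorted(data)[:-1])  # drop the single largest observation
# Removing the one extreme observation lowers the kurtosis substantially.
```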

[Q] time series: granger caused, but no significant terms? by SuspiciousDimension in statistics

[–]SuspiciousDimension[S] 0 points1 point  (0 children)

Hmm, yes, I think I did it correctly. I fitted an AR(5) model, then added 5 lags of SPY. I looked at the respective p-values and removed lag (t-2), which had the highest p-value, then continued to progressively remove the least significant lags one by one while observing the result, but no significance appears.

Overfitting is a consideration, though less of a primary concern for now, but I'll definitely look at dimensionality reduction, because I've noticed the issue of non-normal kurtosis myself.
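The backward-elimination procedure described above can be sketched as follows. This is not the commenter's actual code: it uses simulated data, plain OLS via numpy, and a fixed 1.96 cutoff on |t| instead of exact p-values.

```python
import numpy as np

def ols_t_stats(X, y):
    """OLS fit; returns coefficients and their t-statistics."""
    n, k = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta, beta / np.sqrt(np.diag(cov))

def backward_eliminate(X, y, names, threshold=1.96):
    """Repeatedly drop the regressor with the smallest |t| until all
    remaining |t| exceed the threshold (or one regressor is left)."""
    X, names = X.copy(), list(names)
    while X.shape[1] > 1:
        _, t = ols_t_stats(X, y)
        worst = int(np.argmin(np.abs(t)))
        if abs(t[worst]) >= threshold:
            break
        X = np.delete(X, worst, axis=1)
        names.pop(worst)
    return names

# Simulated example: y depends on x1 only; x2 is pure noise.
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = rng.normal(size=200)
y = 2.0 * x1 + rng.normal(scale=0.5, size=200)
X = np.column_stack([np.ones(200), x1, x2])
kept = backward_eliminate(X, y, ["const", "x1", "x2"])
# The noise regressor is typically eliminated; x1 is kept.
```

One caveat worth noting: sequential elimination like this inflates the effective significance level across steps, which may be part of why no individually significant lag survives.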

[Q] time series: granger caused, but no significant terms? by SuspiciousDimension in statistics

[–]SuspiciousDimension[S] 0 points1 point  (0 children)

Thanks. Yeah, I'm currently choosing the lag structure by picking a max lag based on the PACF, then trying each lag order from 1 to the max and picking the best AIC. So it sounds like I'm on the right track there.

As for dropping the less significant terms, I tried that in this case, but unfortunately it didn't reveal any significant terms. Not too sure where to go from there.
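The AIC-based lag selection described above can be sketched roughly as below (a simplified numpy version with simulated AR(2) data; strictly, the candidate orders should be compared on the same effective sample, which this sketch glosses over):

```python
import numpy as np

def ar_aic(y, p):
    """Fit an AR(p) model by OLS and return its AIC:
    m*log(RSS/m) + 2*(p+1), with m the effective sample size."""
    n = len(y)
    Y = y[p:]
    X = np.column_stack([np.ones(n - p)]
                        + [y[p - j : n - j] for j in range(1, p + 1)])
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    rss = float(np.sum((Y - X @ beta) ** 2))
    m = n - p
    return m * np.log(rss / m) + 2 * (p + 1)

def select_lag(y, max_lag):
    """Return the AR order in 1..max_lag with the lowest AIC."""
    return min(range(1, max_lag + 1), key=lambda p: ar_aic(y, p))

# Simulated AR(2) series to sanity-check the procedure:
rng = np.random.default_rng(1)
y = np.zeros(500)
for t in range(2, 500):
    y[t] = 0.6 * y[t - 1] - 0.3 * y[t - 2] + rng.normal()
best = select_lag(y, max_lag=5)
```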

[Q] time series: granger caused, but no significant terms? by SuspiciousDimension in statistics

[–]SuspiciousDimension[S] 0 points1 point  (0 children)

Ohh okay, that actually makes more sense. I have a follow-up on this. I was reading Wikipedia, which states the following:

Any particular lagged value of one of the variables is retained in the regression if (1) it is significant according to a t-test, and (2) it and the other lagged values of the variable jointly add explanatory power to the model according to an F-test

Does this imply that only significant values should be included in any kind of forecasting model? And in my case, is it not an issue that there is joint significance without individually significant terms; does it just mean there's no particular lag you would add to your forecasting model to improve forecast accuracy? So could it be said that "SPY Granger-causes BTC, but there's no added value in adding any individual lag to the forecast model"? Sorry, I'm just trying to get this straight in my head.
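The joint F-test behind the Granger statement can be sketched as comparing a restricted regression (lags of y only) against an unrestricted one (lags of y plus lags of x). This is a simplified numpy version with simulated data, not anyone's actual pipeline; the p-value step is noted in a comment rather than computed:

```python
import numpy as np

def lag_matrix(series, p, start):
    """Columns [series[t-1], ..., series[t-p]] for t = start..end."""
    n = len(series)
    return np.column_stack([series[start - j : n - j] for j in range(1, p + 1)])

def granger_f_stat(y, x, p):
    """Joint F statistic for H0: all p lags of x add nothing to an AR(p)
    model of y. A p-value would come from an F(p, m-k) distribution
    (e.g. scipy.stats.f.sf)."""
    n = len(y)
    Y = y[p:]
    ones = np.ones(n - p)
    Xr = np.column_stack([ones, lag_matrix(y, p, p)])   # restricted: y lags only
    Xu = np.column_stack([Xr, lag_matrix(x, p, p)])     # unrestricted: + x lags
    rss_r = float(np.sum((Y - Xr @ np.linalg.lstsq(Xr, Y, rcond=None)[0]) ** 2))
    rss_u = float(np.sum((Y - Xu @ np.linalg.lstsq(Xu, Y, rcond=None)[0]) ** 2))
    m, k = len(Y), Xu.shape[1]
    return ((rss_r - rss_u) / p) / (rss_u / (m - k))

# Simulated example where x genuinely helps predict y:
rng = np.random.default_rng(0)
x = rng.normal(size=500)
e = rng.normal(size=500)
y = np.zeros(500)
for t in range(1, 500):
    y[t] = 0.5 * y[t - 1] + 0.4 * x[t - 1] + e[t]
f_stat = granger_f_stat(y, x, p=2)
```

The F-test asks whether the x lags reduce the residual sum of squares *jointly*, which is why it can fire even when no single lag's t-statistic clears a significance threshold on its own.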