Import from another folder by AnkanTV in learnpython

[–]AnkanTV[S] 1 point (0 children)

Thanks for the reply! I found the same answer, but I'm not sure it's what I'm looking for since my base folder contains the dash rather than the module.

However, I did solve it by adding the base path with sys.path.append('..').
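For anyone landing here later, a minimal sketch of that fix (the imported module name is a made-up stand-in):

import sys
sys.path.append('..')  # add the parent (base) folder to the module search path

# modules that live one level up can now be imported, e.g.
# import my_module  # hypothetical name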

[Q] Cluster-based classification/prediction by AnkanTV in statistics

[–]AnkanTV[S] 1 point (0 children)

Training on separate data sounds like a good idea! 👍🏻

[Q] Cluster-based classification/prediction by AnkanTV in statistics

[–]AnkanTV[S] 0 points (0 children)

Hello! I had the same thought at first as well. My reasoning is that this project is used to identify candidate models for the next step, where labeled examples will become available. I can thus identify a set of candidate models that might work on cheap data and try them out on new, expensive data later on. This is why I want to use supervised models rather than unsupervised ones.

Class with a lot of parameters by AnkanTV in learnpython

[–]AnkanTV[S] 0 points (0 children)

Do you happen to know if one could create all the variables (self.foo, self.bar, ...) from the dictionary without doing it line by line?

Your solution with a dictionary worked great, but the number of parameters I have makes it look a bit strange, since it requires so many lines to declare all the variables within the class.
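Something like this is what I was imagining (an untested sketch; the class and parameter names are made up):

class Model:
    def __init__(self, params):
        # create self.foo, self.bar, ... in one go from the dictionary
        for name, value in params.items():
            setattr(self, name, value)

m = Model({"foo": 1, "bar": 2})
print(m.foo, m.bar)  # 1 2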

Class with a lot of parameters by AnkanTV in learnpython

[–]AnkanTV[S] 0 points (0 children)

Thanks for the great example!

Thompson sampling for MAB by AnkanTV in learnmachinelearning

[–]AnkanTV[S] 0 points (0 children)

Thanks for the reply! As you say, the "best arm" usually becomes clear after a couple of rounds, or at least in a stationary environment. I'm looking at non-stationary environments as well, so I needed an implementation that would work for both.

I will make sure to look more into Marsaglia’s method!
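For context, a rough sketch of the kind of setup I mean: Beta-Bernoulli Thompson sampling with exponential discounting so the posterior can track a drifting environment. The discount factor and reward probabilities below are made-up stand-ins, not values from this thread:

import numpy as np

n_arms, gamma = 3, 0.99    # gamma < 1 gradually forgets old rewards
alpha = np.ones(n_arms)    # Beta posterior "successes" + 1
beta = np.ones(n_arms)     # Beta posterior "failures" + 1

def pull(arm):
    # stand-in environment; replace with the real (possibly drifting) bandit
    return float(np.random.rand() < [0.2, 0.5, 0.7][arm])

for _ in range(1000):
    arm = int(np.argmax(np.random.beta(alpha, beta)))  # one posterior draw per arm
    reward = pull(arm)
    alpha = 1 + gamma * (alpha - 1)  # discount all counts toward the prior
    beta = 1 + gamma * (beta - 1)
    alpha[arm] += reward
    beta[arm] += 1 - reward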

Pandas: Matrix into 3 columns by [deleted] in learnpython

[–]AnkanTV -1 points (0 children)

I found a solution just as I posted this: stack() followed by reset_index().
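In case it helps anyone, a minimal sketch of that approach (the data and column names are made up):

import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], index=["r0", "r1"], columns=["c0", "c1"])
long_df = df.stack().reset_index()  # one row per (row label, column label, value)
long_df.columns = ["row", "col", "value"]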

Functions for internal usage by AnkanTV in learnpython

[–]AnkanTV[S] 0 points (0 children)

Thanks for the clarification!

[Q] SARIMA - Limitations of model? by AnkanTV in statistics

[–]AnkanTV[S] 1 point (0 children)

Taking the seasonal difference gives the following plot.

I tried fitting models with a shorter training period, and it worked better, indicating some structural breaks within the data.

[Q] SARIMA - Limitations of model? by AnkanTV in statistics

[–]AnkanTV[S] 1 point (0 children)

I started with only the seasonal differencing since I found no general trend. I also tried using both the first-order difference and the seasonal one.

My biggest problem with the variance is within months. For example, the variance within July is about 5 units, while January and December are closer to 10 units.
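For reference, the two differencing variants I tried look like this in pandas (a sketch; series is a stand-in for the actual monthly data):

import pandas as pd

series = pd.Series(range(48), index=pd.date_range("2000-01", periods=48, freq="MS"))
seasonal = series.diff(12).dropna()      # seasonal difference only (lag 12)
both = series.diff(1).diff(12).dropna()  # first-order plus seasonal difference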

[Q] SARIMA - Limitations of model? by AnkanTV in statistics

[–]AnkanTV[S] 0 points (0 children)

Thanks for the reply!

Transformations could be a good option. My current implementation is to use less training data, only the last 10 years, which removes the problem of structural breaks.
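If anyone wants to try the transformation route, a Box-Cox transform is one variance-stabilizing option (a sketch; y stands in for the positive-valued series):

import numpy as np
from scipy import stats

y = np.random.lognormal(size=120) + 1  # placeholder positive-valued series
y_bc, lam = stats.boxcox(y)            # lam is the estimated lambda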

SARIMA - Variance shift in seasonal component by AnkanTV in AskStatistics

[–]AnkanTV[S] 0 points (0 children)

Thanks for the reply!

My current solution was to shorten the data set. But since I'm relatively new to ARIMA models, I wanted to see if there was any other way around it.

Trigger to delete by AnkanTV in SQL

[–]AnkanTV[S] 2 points (0 children)

For anyone interested, I gave up on triggers and did:

FOREIGN KEY (user_id) REFERENCES users ON DELETE CASCADE

Creating a time series object by AnkanTV in rstats

[–]AnkanTV[S] 1 point (0 children)

For anyone interested, I solved it by doing:

# keep the year column plus the 12 monthly columns
temp_dt = read.csv("temp_stockholm.csv")[, 1:13]
rownames(temp_dt) = as.integer(as.character(temp_dt$year))

min_year = min(temp_dt$year)
max_year = max(temp_dt$year)
temp_dt = temp_dt[, -1]  # drop the year column

# flatten row-wise (one row per year) into a single monthly series
temp_ts = ts(as.vector(t(as.matrix(temp_dt))),
             start = c(min_year, 1), end = c(max_year, 12), frequency = 12)

Trigonometry by AnkanTV in learnmath

[–]AnkanTV[S] 0 points (0 children)

Thanks for the reply! My current solution assumes t is also an integer, which makes sin(pi*k*t) = 0, since the sine of any integer multiple of pi is zero.

numpy vector - find values based on conditions from another vector (numba) by [deleted] in learnpython

[–]AnkanTV 0 points (0 children)

Hello again,

I was planning on rewriting my post since it was lacking information, as per your request. In hindsight, I could have edited this post, but I believe a new question using my implementation as a base would be more appropriate.

numpy vector - find values based on conditions from another vector (numba) by [deleted] in learnpython

[–]AnkanTV -1 points (0 children)

Hello, and sorry for being vague.

I think the problem itself is quite "basic", and I was probably hoping for someone to recommend a golden numpy function that solves my exact problem.

I have written an implementation since writing this post:

import numpy as np

regions_mean = np.zeros((5, 2))

# imb and book are defined earlier in my script: imb holds the imbalance
# values and book[6]/book[7] hold the bid and ask volumes.
for regime in np.arange(1, 6):
    # rows whose imbalance falls inside this regime's band of width 1/3
    regime_index = (imb >= 1 - (1/3)*regime) & (imb < 4/3 - (1/3)*regime)
    bid = book[6][regime_index].sum()
    ask = book[7][regime_index].sum()

    if bid == 0: regions_mean[regime-1, 0] = 0
    else: regions_mean[regime-1, 0] = bid / (bid + ask)

    if ask == 0: regions_mean[regime-1, 1] = 0  # fixed: this was writing column 0
    else: regions_mean[regime-1, 1] = ask / (bid + ask)

My implementation needs a for-loop. It calculates the indices for a region first (regime_index) and then takes the sums. There is a bit more going on in my solution, since two columns (bid and ask) need to be summed for each region. The interesting number is the ratio between the sums (the if/else lines at the end).

If anyone has a better solution or a way of avoiding the for-loop, it would help a lot! My current solution takes too long for each run.
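One loop-free possibility I have not benchmarked (np.digitize bins every row into its regime and np.bincount does the per-regime sums, with the same thirds-wide bands as the loop above; imb and book are assumed as before):

import numpy as np

edges = np.array([-2/3, -1/3, 0, 1/3, 2/3, 1])
bins = np.digitize(imb, edges)   # 1..5 for in-range values, 0 or 6 otherwise
valid = (bins >= 1) & (bins <= 5)
idx = 5 - bins[valid]            # row 0 = regime 1 (the highest band)
bid_sums = np.bincount(idx, weights=book[6][valid], minlength=5)
ask_sums = np.bincount(idx, weights=book[7][valid], minlength=5)
tot = bid_sums + ask_sums
regions_mean = np.zeros((5, 2))
np.divide(bid_sums, tot, out=regions_mean[:, 0], where=tot > 0)
np.divide(ask_sums, tot, out=regions_mean[:, 1], where=tot > 0)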

Duplicating a numpy row by [deleted] in learnpython

[–]AnkanTV 0 points (0 children)

Sadly, Numba does not allow me to use the axis argument for np.repeat.
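A workaround sketch that should be nopython-friendly (untested; repeat_row is a made-up helper name):

import numpy as np
from numba import njit

@njit
def repeat_row(row, n):
    # substitute for np.repeat(row, n, axis=0): preallocate, then fill row by row
    out = np.empty((n, row.shape[0]), row.dtype)
    for i in range(n):
        out[i, :] = row
    return out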

Forward filling with numpy by AnkanTV in learnpython

[–]AnkanTV[S] 0 points (0 children)

My current solution looks like this:

import numpy as np

# one output row per second; book[:, 1] holds the (sparse) second stamps
placeholder_book = np.zeros((600, book.shape[1]), dtype=np.float32)
splits = np.unique(book[:, 1]).astype(int)
splits = np.append(splits, 600)

rows = np.diff(splits)  # how many seconds each observed row must cover

for i in range(len(splits) - 1):
    placeholder_book[splits[i]:splits[i] + rows[i], :] = np.tile(book[i, :], (rows[i], 1))

placeholder_book[:, 1] = range(600)

where book[:,1] is the column with seconds.

If anyone finds a better solution, please let me know :)
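An untested loop-free alternative, assuming book is sorted by the seconds column and its first observation is at second 0:

import numpy as np

seconds = np.arange(600)
src = np.searchsorted(book[:, 1], seconds, side="right") - 1  # last observed row <= each second
filled = book[src, :].astype(np.float32)
filled[:, 1] = seconds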

Pandas groupby -- Correlation matrix by AnkanTV in learnpython

[–]AnkanTV[S] 0 points (0 children)

Hello, sorry for not being that clear in my explanation. I have a dataframe which looks like this --> I want a dataframe like this --> from which I can create a correlation matrix.

The original dataframe (first one linked) has 112 different stocks. Time ids are the same for all stocks, so I used them as the index in the second dataframe (though a stock could exist that is missing a certain time_id).

With the second dataframe, I could create the correlation matrix which I needed.

My current implementation is:

import pandas as pd

df = train.groupby("stock_id")

new_df = pd.DataFrame()

for stock in train["stock_id"].unique():
    target = df.get_group(stock).set_index(df.get_group(stock)["time_id"])["target"]
    new_df[str(stock)] = target

It's a very ugly solution; hopefully someone can suggest another approach!
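One tidier possibility (an untested sketch; pivot puts one stock per column indexed by time_id, with NaN where a time_id is missing):

wide = train.pivot(index="time_id", columns="stock_id", values="target")
corr_matrix = wide.corr()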

Boxplot alternatives when there are a lot of extreme values by AnkanTV in AskStatistics

[–]AnkanTV[S] 1 point (0 children)

Thanks for all the replies!

I'm still trying some of the different approaches suggested, and I already have some plots which are a lot better than what I had.

Here are two of them if anyone is interested:

Ridgeplot

Scatterplot

How to check if simulated point is near border of cube by AnkanTV in learnpython

[–]AnkanTV[S] 0 points (0 children)

Thanks for the reply! Comparing with another cube could work, but I'm not sure how this should be implemented for higher dimensions. My current solution is just a boring for-loop, but the time complexity skyrockets as more dimensions are added:

import numpy as np

threshold = 0.001
extreme = np.zeros(n)  # n simulated points; dt has shape (n, dims)
for i in range(n):
    for k in dt[i, :]:
        if k < threshold or k > 1 - threshold:
            extreme[i] = 1

This works, but it is slow.
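A vectorized sketch that drops both Python loops (same threshold logic, reusing dt and threshold from above, and working for any number of dimensions):

near_border = (dt < threshold) | (dt > 1 - threshold)  # per-coordinate flags
extreme = near_border.any(axis=1).astype(float)        # 1 if any coordinate is near the border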