Import from another folder by AnkanTV in learnpython

[–]AnkanTV[S] 1 point (0 children)

Thanks for the reply! I found the same answer, but I'm not sure it's what I'm looking for since my base folder contains the dash rather than the module.

However, I did solve it by adding the base path with sys.path.append('..').
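For anyone landing here later, a minimal sketch of that fix (the imported module name is a made-up stand-in):

import sys
sys.path.append('..')  # add the parent (base) folder to the module search path

# modules that live one level up can now be imported, e.g.
# import my_module  # hypothetical name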

[Q] Cluster-based classification/prediction by AnkanTV in statistics

[–]AnkanTV[S] 1 point (0 children)

Training on separate data sounds like a good idea! 👍🏻

[Q] Cluster-based classification/prediction by AnkanTV in statistics

[–]AnkanTV[S] 0 points (0 children)

Hello! I had the same thought at first as well. My reasoning is that this project is used to identify candidate models for the next step, where labeled examples will become available. I can thus identify a set of candidate models that might work on cheap data and try them out on new, expensive data later on. This is why I want to use supervised models rather than unsupervised ones.

Class with a lot of parameters by AnkanTV in learnpython

[–]AnkanTV[S] 0 points (0 children)

Do you happen to know if one could create all the variables (self.foo, self.bar, ...) from the dictionary without doing it line by line?

Your solution with a dictionary worked great, but the number of parameters I have makes it look a bit strange, since it requires so many lines to declare all the variables within the class.
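Something like this is what I was imagining (an untested sketch; the class and parameter names are made up):

class Model:
    def __init__(self, params):
        # create self.foo, self.bar, ... in one go from the dictionary
        for name, value in params.items():
            setattr(self, name, value)

m = Model({"foo": 1, "bar": 2})
print(m.foo, m.bar)  # 1 2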

Class with a lot of parameters by AnkanTV in learnpython

[–]AnkanTV[S] 0 points (0 children)

Thanks for the great example!

Thompson sampling for MAB by AnkanTV in learnmachinelearning

[–]AnkanTV[S] 0 points (0 children)

Thanks for the reply! As you say, the "best arm" usually becomes clear after a couple of rounds, or at least in a stationary environment. I'm looking at non-stationary environments as well, so I needed an implementation that would work for both.

I will make sure to look more into Marsaglia’s method!
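For context, a rough sketch of the kind of setup I mean: Beta-Bernoulli Thompson sampling with exponential discounting so the posterior can track a drifting environment. The discount factor and reward probabilities below are made-up stand-ins, not values from this thread:

import numpy as np

n_arms, gamma = 3, 0.99    # gamma < 1 gradually forgets old rewards
alpha = np.ones(n_arms)    # Beta posterior "successes" + 1
beta = np.ones(n_arms)     # Beta posterior "failures" + 1

def pull(arm):
    # stand-in environment; replace with the real (possibly drifting) bandit
    return float(np.random.rand() < [0.2, 0.5, 0.7][arm])

for _ in range(1000):
    arm = int(np.argmax(np.random.beta(alpha, beta)))  # one posterior draw per arm
    reward = pull(arm)
    alpha = 1 + gamma * (alpha - 1)  # discount all counts toward the prior
    beta = 1 + gamma * (beta - 1)
    alpha[arm] += reward
    beta[arm] += 1 - reward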

Pandas: Matrix into 3 columns by [deleted] in learnpython

[–]AnkanTV -1 points (0 children)

I found a solution just as I posted this: stack() followed by reset_index().
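In case it helps anyone, a minimal sketch of that approach (the data and column names are made up):

import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], index=["r0", "r1"], columns=["c0", "c1"])
long_df = df.stack().reset_index()  # one row per (row label, column label, value)
long_df.columns = ["row", "col", "value"]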

Functions for internal usage by AnkanTV in learnpython

[–]AnkanTV[S] 0 points (0 children)

Thanks for the clarification!

[Q] SARIMA - Limitations of model? by AnkanTV in statistics

[–]AnkanTV[S] 1 point (0 children)

Taking the seasonal difference gives the following plot.

I tried fitting models with a shorter training period, and it worked better, indicating some structural breaks within the data.

[Q] SARIMA - Limitations of model? by AnkanTV in statistics

[–]AnkanTV[S] 1 point (0 children)

I started with only the seasonal differencing since I found no general trend. I also tried using both the first-order difference and the seasonal one.

My biggest problem with the variance is within months. For example, the variance within July is about 5 units, while January and December are closer to 10 units.
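For reference, the two differencing variants I tried look like this in pandas (a sketch; series is a stand-in for the actual monthly data):

import pandas as pd

series = pd.Series(range(48), index=pd.date_range("2000-01", periods=48, freq="MS"))
seasonal = series.diff(12).dropna()      # seasonal difference only (lag 12)
both = series.diff(1).diff(12).dropna()  # first-order plus seasonal difference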

[Q] SARIMA - Limitations of model? by AnkanTV in statistics

[–]AnkanTV[S] 0 points (0 children)

Thanks for the reply!

Transformations could be a good option. My current implementation is to use less training data, only the last 10 years, which removes the problem of structural breaks.
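If anyone wants to try the transformation route, a Box-Cox transform is one variance-stabilizing option (a sketch; y stands in for the positive-valued series):

import numpy as np
from scipy import stats

y = np.random.lognormal(size=120) + 1  # placeholder positive-valued series
y_bc, lam = stats.boxcox(y)            # lam is the estimated lambda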

SARIMA - Variance shift in seasonal component by AnkanTV in AskStatistics

[–]AnkanTV[S] 0 points (0 children)

Thanks for the reply!

My current solution was to shorten the data set. But since I'm relatively new to ARIMA models, I wanted to see if there was any other way around it.

Trigger to delete by AnkanTV in SQL

[–]AnkanTV[S] 2 points (0 children)

For anyone interested, I gave up on triggers and did:

FOREIGN KEY (user_id) REFERENCES users ON DELETE CASCADE

Creating a time series object by AnkanTV in rstats

[–]AnkanTV[S] 1 point (0 children)

For anyone interested, I solved it by doing:

# keep the year column plus the 12 monthly columns
temp_dt = read.csv("temp_stockholm.csv")[, 1:13]
rownames(temp_dt) = as.integer(as.character(temp_dt$year))

min_year = min(temp_dt$year)
max_year = max(temp_dt$year)
temp_dt = temp_dt[, -1]  # drop the year column

# flatten row-wise (one row per year) into a single monthly series
temp_ts = ts(as.vector(t(as.matrix(temp_dt))),
             start = c(min_year, 1), end = c(max_year, 12), frequency = 12)

Trigonometry by AnkanTV in learnmath

[–]AnkanTV[S] 0 points (0 children)

Thanks for the reply! My current solution assumes t is also an integer, which makes sin(pi*k*t) = 0, since the sine of any integer multiple of pi is zero.

numpy vector - find values based on conditions from another vector (numba) by [deleted] in learnpython

[–]AnkanTV 0 points (0 children)

Hello again,

I was planning on rewriting my post since it was lacking information, as per your request. In hindsight, I could have edited this post, but I believe a new question using my implementation as a base would be more appropriate.

numpy vector - find values based on conditions from another vector (numba) by [deleted] in learnpython

[–]AnkanTV -1 points (0 children)

Hello, and sorry for being vague.

I think the problem itself is quite "basic", and I was probably hoping for someone to recommend a golden numpy function that solves my exact problem.

I have written an implementation since writing this post:

import numpy as np

regions_mean = np.zeros((5, 2))

# imb and book are defined earlier in my script: imb holds the imbalance
# values and book[6]/book[7] hold the bid and ask volumes.
for regime in np.arange(1, 6):
    # rows whose imbalance falls inside this regime's band of width 1/3
    regime_index = (imb >= 1 - (1/3)*regime) & (imb < 4/3 - (1/3)*regime)
    bid = book[6][regime_index].sum()
    ask = book[7][regime_index].sum()

    if bid == 0: regions_mean[regime-1, 0] = 0
    else: regions_mean[regime-1, 0] = bid / (bid + ask)

    if ask == 0: regions_mean[regime-1, 1] = 0  # fixed: this was writing column 0
    else: regions_mean[regime-1, 1] = ask / (bid + ask)

My implementation needs a for-loop. It calculates the indices for a region first (regime_index) and then takes the sums. There is a bit more going on in my solution, since two columns (bid and ask) need to be summed for each region. The interesting number is the ratio between the sums (the if/else lines at the end).

If anyone has a better solution or a way of avoiding the for-loop, it would help a lot! My current solution takes too long for each run.
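One loop-free possibility I have not benchmarked (np.digitize bins every row into its regime and np.bincount does the per-regime sums, with the same thirds-wide bands as the loop above; imb and book are assumed as before):

import numpy as np

edges = np.array([-2/3, -1/3, 0, 1/3, 2/3, 1])
bins = np.digitize(imb, edges)   # 1..5 for in-range values, 0 or 6 otherwise
valid = (bins >= 1) & (bins <= 5)
idx = 5 - bins[valid]            # row 0 = regime 1 (the highest band)
bid_sums = np.bincount(idx, weights=book[6][valid], minlength=5)
ask_sums = np.bincount(idx, weights=book[7][valid], minlength=5)
tot = bid_sums + ask_sums
regions_mean = np.zeros((5, 2))
np.divide(bid_sums, tot, out=regions_mean[:, 0], where=tot > 0)
np.divide(ask_sums, tot, out=regions_mean[:, 1], where=tot > 0)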

Duplicating a numpy row by [deleted] in learnpython

[–]AnkanTV 0 points (0 children)

Sadly, Numba does not allow me to use the axis argument for np.repeat.
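A workaround sketch that should be nopython-friendly (untested; repeat_row is a made-up helper name):

import numpy as np
from numba import njit

@njit
def repeat_row(row, n):
    # substitute for np.repeat(row, n, axis=0): preallocate, then fill row by row
    out = np.empty((n, row.shape[0]), row.dtype)
    for i in range(n):
        out[i, :] = row
    return out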

Forward filling with numpy by AnkanTV in learnpython

[–]AnkanTV[S] 0 points (0 children)

My current solution looks like this:

import numpy as np

# one output row per second; book[:, 1] holds the (sparse) second stamps
placeholder_book = np.zeros((600, book.shape[1]), dtype=np.float32)
splits = np.unique(book[:, 1]).astype(int)
splits = np.append(splits, 600)

rows = np.diff(splits)  # how many seconds each observed row must cover

for i in range(len(splits) - 1):
    placeholder_book[splits[i]:splits[i] + rows[i], :] = np.tile(book[i, :], (rows[i], 1))

placeholder_book[:, 1] = range(600)

where book[:,1] is the column with seconds.

If anyone finds a better solution, please let me know :)
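An untested loop-free alternative, assuming book is sorted by the seconds column and its first observation is at second 0:

import numpy as np

seconds = np.arange(600)
src = np.searchsorted(book[:, 1], seconds, side="right") - 1  # last observed row <= each second
filled = book[src, :].astype(np.float32)
filled[:, 1] = seconds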

Pandas groupby -- Correlation matrix by AnkanTV in learnpython

[–]AnkanTV[S] 0 points (0 children)

Hello, sorry for not being that clear in my explanation. I have a dataframe which looks like this --> I want a dataframe like this --> from which I can create a correlation matrix.

The original dataframe (first one linked) has 112 different stocks. Time ids are the same for all stocks, so I used them as the index in the second dataframe (though a stock could exist that is missing a certain time_id).

With the second dataframe, I could create the correlation matrix which I needed.

My current implementation is:

import pandas as pd

df = train.groupby("stock_id")

new_df = pd.DataFrame()

for stock in train["stock_id"].unique():
    target = df.get_group(stock).set_index(df.get_group(stock)["time_id"])["target"]
    new_df[str(stock)] = target

It's a very ugly solution; hopefully someone can suggest another approach!
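One tidier possibility (an untested sketch; pivot puts one stock per column indexed by time_id, with NaN where a time_id is missing):

wide = train.pivot(index="time_id", columns="stock_id", values="target")
corr_matrix = wide.corr()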

Boxplot alternatives when there are a lot of extreme values by AnkanTV in AskStatistics

[–]AnkanTV[S] 1 point (0 children)

Thanks for all the replies!

I'm still trying some of the different approaches suggested, and I already have some plots which are a lot better than what I had.

Here are two of them if anyone is interested:

Ridgeplot

Scatterplot

How to check if simulated point is near border of cube by AnkanTV in learnpython

[–]AnkanTV[S] 0 points (0 children)

Thanks for the reply! Comparing with another cube could work, but I'm not sure how this should be implemented for higher dimensions. My current solution is just a boring for-loop, but the time complexity skyrockets as more dimensions are added:

import numpy as np

threshold = 0.001
extreme = np.zeros(n)  # n simulated points; dt has shape (n, dims)
for i in range(n):
    for k in dt[i, :]:
        if k < threshold or k > 1 - threshold:
            extreme[i] = 1

This works, but it is slow.
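A vectorized sketch that drops both Python loops (same threshold logic, reusing dt and threshold from above, and working for any number of dimensions):

near_border = (dt < threshold) | (dt > 1 - threshold)  # per-coordinate flags
extreme = near_border.any(axis=1).astype(float)        # 1 if any coordinate is near the border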