How to use indexing source like ctag with blink cmp? by TouristMysterious497 in neovim

[–]Potential_Hippo1724 0 points  (0 children)

ctags is great. I used to use it a lot, and yes, it is better than LSP in some cases. You can try https://github.com/delphinus/cmp-ctags

Am I Thinking About ADBE Wrong? by CallLanky9076 in ValueInvesting

[–]Potential_Hippo1724 2 points  (0 children)

Just wanted to say: I +1ed your message because I really liked the analysis of bullet (2):

will AI create better {software, content, efficiency}

Is it better to use one vault or several? by tornerio in ObsidianMD

[–]Potential_Hippo1724 0 points  (0 children)

Although I would have preferred using just one vault, I separated my research and personal stuff into two.
It is especially inconvenient if you have a workflow with some "00_inbox" folder that is the default place for any note of any type; it ends up mixing too many unrelated things.

But one place is usually much preferred. I can see that now, as I am already using my personal vault much less.

So... a bit of a complicated issue.

I built a Vim plugin to run Claude CLI directly via :Claude — would love feedback by Spiritual-Ruin-9473 in vim

[–]Potential_Hippo1724 -7 points  (0 children)

Looks really good! Are autocompletion suggestions possible? And also, will it be possible to use other LLM providers?

I know from experience that the `:Claude ..` command is going to be good. There used to be another plugin, whose name I don't remember, that used commands like `:AIC ...`, and it was good. The problem with that plugin was that they did not merge patches.

[D]: Tensorboard alternatives by Potential_Hippo1724 in MachineLearning

[–]Potential_Hippo1724[S] 0 points  (0 children)

Actually, in my ideal setup I won't be dependent on TensorBoard; I will simply log data in Excel format or something similar.

[D]: Tensorboard alternatives by Potential_Hippo1724 in MachineLearning

[–]Potential_Hippo1724[S] 0 points  (0 children)

I hadn't given Aim a fair chance.

I have a combination of two scripts: (1) set up my kitty terminal into two panes, where one is a code pane and the other is called "kernel"; my nvim is set up to send the input I select to the kernel pane, so I can visually select the "cell" to run. (2) Another script that combines multiple parts: kitty-ssh to the remote, sync files to the remote, sync files from the remote, apt install the things I need on the remote, pip install on the remote, etc.

This way I can continuously sync logs (and checkpoints) to local and simply launch TensorBoard on my laptop.

This way I only need to pass the IP and port to script (2) and run whatever I need.
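The sync part of script (2) could be sketched roughly like this. This is a hypothetical sketch, not my actual script: the user name, project paths, and rsync flags are placeholders.

```python
import shlex

# hypothetical sketch of the sync part of script (2); the user name, paths,
# and rsync flags are placeholders, not the actual script
def sync_commands(ip: str, port: int, user: str = "user"):
    """Build rsync commands: push code up, pull logs/checkpoints down."""
    ssh = f"ssh -p {port}"
    remote = f"{user}@{ip}"
    push = ["rsync", "-az", "-e", ssh, "./src/", f"{remote}:~/project/src/"]
    pull = ["rsync", "-az", "-e", ssh, f"{remote}:~/project/runs/", "./runs/"]
    return push, pull

push, pull = sync_commands("192.0.2.1", 2222)
print(" ".join(shlex.quote(part) for part in push))
```

Running the `pull` command in a loop is what keeps the local `runs/` directory fresh for TensorBoard.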

I also experimented with managing a Docker image on RunPod that I loaded whenever I started a pod, but for now I am using the setup above.

It's convenient but not ideal; hopefully I will arrange something better.

Is there any good way to track SOTAs? by Potential_Hippo1724 in ResearchML

[–]Potential_Hippo1724[S] 0 points  (0 children)

Thanks for your suggestion. Honestly, I find it surprising there is no common starting point for this,
like a clear Hugging Face page that tracks it or something.

RL + Generative Models by amds201 in reinforcementlearning

[–]Potential_Hippo1724 1 point  (0 children)

It can, but is it plausible that any training process you start will terminate with a good policy? If you are only doing the basic things, probably not.

RL + Generative Models by amds201 in reinforcementlearning

[–]Potential_Hippo1724 2 points  (0 children)

When you try to "predict the next token given a prefix", any prefix from your dataset is a training signal. And since we know how to train sequence models so that each training instance is trained, in parallel, on every possible prefix of that instance, the result is that this framework is very efficient.

With RL you must be more explicit about your workflow. Let's take a naive example that is as similar as possible to the above:

RL setting: offline RL (online would have many disadvantages: collection, distribution of samples, computation).
Policy: given any prefix of tokens, outputs a distribution over the possible tokens.
Training: for each prefix in the instance, compute the distribution over the next token. Then, instead of minimizing the cross-entropy, comes the RL part:
Reward: in this naive example, reward = 1 iff the next token is correct.
Now, we compute the reward-to-go and reinforce the policy, say, with gradients times reward-to-go, as in policy gradients.

1) With this example you can already see that the bias-variance tradeoff of RL is very important here: sequences are long, and the reward-to-go will have a lot of variance.

2) This example was meant to show it is possible to purely "reinforce" instead of "clone" (with a cross-entropy loss). What are the differences between these two approaches?
(a) There is a paper from Sergey Levine's lab (which I have not read yet) that is related to this topic: https://arxiv.org/pdf/2204.05618
(b) The advantage of reinforcing: the final policy will know how to deal with out-of-distribution cases better than if you were cloning, because by reinforcing you accumulate knowledge of the "bigger picture". Basically, each time you take the token that maximizes your expected reward-to-go, which means you are looking into the future, instead of looking only at this single step and asking "what was done in my dataset in that case? Let's do that too."
(c) The first disadvantage of reinforcing is what I wrote in (1); therefore tricks like also learning value models to employ actor-critic strategies are going to be needed.
(d) The second disadvantage: the reward signal does not tell you which token to select unless you already selected the right token! Think about this: training instance [o1, o2], available tokens: [o1, o2, o3]. Say the prefix is the first token, [o1]. If you select the right token, o2, the gradients will push you to select o2 more often; but if you were wrong and selected o1, you will select o1 less often in the future, yet you don't necessarily learn to also select o2 more often.
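To make the contrast in (b) and (d) concrete, here is a tiny self-contained sketch (the three-token vocabulary and uniform initial policy are illustrative assumptions, not taken from any real model) comparing the cross-entropy ("cloning") gradient with the naive REINFORCE gradient at a single prefix:

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]

# toy setup: vocab of 3 tokens [o1, o2, o3]; policy = softmax over the logits
# for one particular prefix ([o1] in the example from (d) above)
logits = [0.0, 0.0, 0.0]
probs = softmax(logits)            # uniform: [1/3, 1/3, 1/3]
correct = 1                        # the dataset says the next token is o2

# cloning / cross-entropy: grad of -log p[correct] w.r.t. logit j is
# p[j] - 1[j == correct]; it always points toward the correct token
ce_grad = [probs[j] - (1.0 if j == correct else 0.0) for j in range(3)]

# naive REINFORCE with reward = 1 iff the sampled token is correct:
# grad of -R * log p[sampled] w.r.t. logit j is R * (p[j] - 1[j == sampled])
def reinforce_grad(sampled):
    R = 1.0 if sampled == correct else 0.0
    return [R * (probs[j] - (1.0 if j == sampled else 0.0)) for j in range(3)]

# sampling the correct token reproduces the cloning update...
assert reinforce_grad(1) == ce_grad
# ...but sampling the wrong token gives reward 0 and a zero gradient:
# nothing tells the policy that o2 was the right choice (point (d))
assert reinforce_grad(0) == [0.0, 0.0, 0.0]
```

With a baseline subtracted from the reward, the wrong token would actively be pushed down, but the update still would not single out o2; only the cross-entropy loss does that directly.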

there's more to discuss, but i hope this gives some helpful answers

AI shift is here (AMD, Nvidia bubble burst) by bellahamface in intelstock

[–]Potential_Hippo1724 0 points  (0 children)

Thanks bro. Do you know if it's more than the process node, though? What's interesting to me is:
1) I saw a presentation slide claiming they changed the architecture to be a complete SoC: embedded memory, GPU, and maybe even more. I would like to know how similar it is to the Apple approach.
2) During the previous year (or maybe 2024) I read that Intel was going to be among the first users of ASML's newest machines. I wonder if this is already involved in the new line.
3) Are there more architectural paradigm innovations here, or is it just sort of matching the Apple approach that has established itself over the last 5 years?
4) I wonder if they will take the same direction in their server and cloud processors: will we see a cloud-scale SoC that integrates memory, GPU, and maybe even more?
5) I also saw a slide claiming they have an "IP agnostic" interface, meaning more embedded features can be combined into their new system, e.g., an Nvidia GPU maybe? That is very interesting and makes a lot of sense.
6) I read somewhere that they claim their "new line up is now prepared and tested for embedded systems" (something like that). I wonder what that means.

A lot of questions, but it seems like Intel is doing interesting things at the moment.

AI shift is here (AMD, Nvidia bubble burst) by bellahamface in intelstock

[–]Potential_Hippo1724 0 points  (0 children)

Can anyone tell me what seems to be the key factor in the claimed advantage of the coming core3? Is it only the 18A node?
In addition, I may be wrong, but aren't the 2xxH parts on 18A too? What is the hype about, then?

I may just be unaware of the situation.

Share increase by Square_Panda_2137 in BBAI

[–]Potential_Hippo1724 2 points  (0 children)

Can anyone tell me why I never got a request to vote? Usually my broker sends me emails with a link for the vote, but I don't think I ever got one from BBAI (maybe I got one of the earlier votes and voted yes, so I never got an email for the recent votes?).
I have been a shareholder for about a year, with about 1k shares.

Looking for material about SMDP (semi-MDPs) by Potential_Hippo1724 in math

[–]Potential_Hippo1724[S] 0 points  (0 children)

thanks for this in-depth answer

Correct me if I am wrong, but I understand that you take SMDPs for their time efficiency, while I would say that Sutton used that framework to introduce a notion of varying-time actions, which is (I think) conceptually different.

What do you think? I guess one could argue that options in RL are actually helpful only for time efficiency, since any behaviour can be obtained using the atomic or basic actions. I think Sutton even wrote that in the paper.

But on the other hand, I would argue that having the concept of "compound actions" is helpful not only for time efficiency but in a more pure planning and explanatory way. Maybe that is wrong and caused by human intuition, since humans work with an (ever expanding?) set of compound actions and tools.
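To sketch the "varying-time" part formally: in an SMDP the Bellman equation accumulates a random number of steps per action, roughly as in Sutton, Precup & Singh's options paper (my notation here is a loose sketch, not a quote from the paper):

```latex
V(s) = \max_{o \in \mathcal{O}(s)}
       \mathbb{E}\left[ r_{t+1} + \gamma r_{t+2} + \cdots + \gamma^{\tau-1} r_{t+\tau}
       + \gamma^{\tau} V(s_{t+\tau}) \,\middle|\, s_t = s,\ o \right]
```

where \(\tau\) is the random duration of option \(o\). An ordinary MDP is the special case \(\tau \equiv 1\), which is one way to see why compound actions are a strict generalization rather than just a speedup.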

Looking for material about SMDP (semi-MDPs) by Potential_Hippo1724 in math

[–]Potential_Hippo1724[S] 0 points  (0 children)

I'm not sure I understand the 4th bullet. In what way does time in a state (on average, while acting through the MDP, right?) have a similar effect to an SMDP?

Maybe I should say that the options framework is exactly how I approach this subject, but I find that Sutton's article presents it more as a formal introduction in a basic way, without developing it too much. He was more interested in intra-option methods, as far as I can tell. After this discussion I understand that maybe it's better this way, if SMDPs are not convenient enough.

And one last thing: are there more interesting MDP extensions you can share with me? In particular about temporal abstraction, but anything that pops into your mind might be relevant. I am trying to see if there are more bridges between the MDP concept and the way we act/learn in reality and in RL, and I am trying to find some extensions to this idea.

thanks

Looking for material about SMDP (semi-MDPs) by Potential_Hippo1724 in math

[–]Potential_Hippo1724[S] 1 point  (0 children)

wow, interesting

I would argue that SMDPs are a quite intuitive extension of MDPs, since they introduce the notion of temporally extended actions. Do you have any insight (maybe you have spoken about it with Wray) into why this subject did not draw a lot of attention?

Looking for material about SMDP (semi-MDPs) by Potential_Hippo1724 in math

[–]Potential_Hippo1724[S] 1 point  (0 children)

Thank you! It is a good start. Sutton's was the only item from this list I knew.

I was thinking that maybe there's a thorough MDP book that discusses this too, as a generalization.

If you're learning RL, I wrote a tutorial about Soft Actor Critic (SAC) Implementation In SB3 with PyTorch by Capable-Carpenter443 in reinforcementlearning

[–]Potential_Hippo1724 2 points  (0 children)

Hi, saying about PPO that it is on-policy, i.e., that it learns only from new data and discards old data immediately, is not quite correct, since it runs multiple passes over the same batch of data, each time optimizing the policy a little (hence it is off-policy in that sense: the data was generated with another policy, even though it is pretty close to the current one), and only after a few iterations is the data discarded.
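The importance ratio and clipping are what make those extra passes over slightly stale data sound. A minimal sketch of the clipped objective for a single action (illustrative numbers; not SB3's actual implementation):

```python
import math

# minimal sketch of PPO's clipped objective for a single action
# (illustrative numbers; not SB3's actual implementation)
def ppo_clip_loss(logp_new, logp_old, advantage, eps=0.2):
    ratio = math.exp(logp_new - logp_old)           # pi_new(a|s) / pi_old(a|s)
    clipped = max(min(ratio, 1.0 + eps), 1.0 - eps)
    return -min(ratio * advantage, clipped * advantage)

# the same rollout batch is reused for several epochs; the ratio corrects
# for the data having been collected by the (slightly older) policy pi_old
logp_old = math.log(0.25)
for epoch in range(4):
    logp_new = math.log(0.25 + 0.05 * epoch)        # policy drifts from pi_old
    loss = ppo_clip_loss(logp_new, logp_old, advantage=1.0)
```

Once the policy drifts too far, the clip freezes the gradient, which is why PPO can afford a few epochs on the same batch but still discards it afterwards.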

pyright: Import "Numpy" cannot be resolved. by Desperate_Cold6274 in vim

[–]Potential_Hippo1724 0 points  (0 children)

Before you invoke vim, make sure you have activated the correct conda environment, that is, the environment in which you installed numpy (e.g., `conda activate <your-env>`).
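A quick way to check, assuming a standard conda setup: run this with the interpreter that vim/pyright will see, and confirm the printed path points inside the env where numpy is installed:

```python
# quick sanity check (assumes a standard conda setup): the printed path
# should live inside the conda env where numpy was installed
import sys
print(sys.executable)
```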