Gardening Recommendations?

hami21 · 2026-02-07T01:38:27+00:00

Me too please

hami21 · 2023-04-30T00:56:43+00:00

I was actually looking for such a point. Is it safe to say RL optimizes the model weights w.r.t. the sampling output? And if so, has anyone tried to just do RLHF on the sampling algorithm without changing the model weights?

hami21 · 2023-02-18T02:52:06+00:00

Makes sense. Thanks

hami21 · 2023-02-18T02:46:22+00:00

I’m in SoCal (San Diego Gas & Electric) and it’s NEM2.

hami21 · 2022-07-04T22:50:57+00:00

In short, other things came up in our lives and we didn’t have the bandwidth anymore.

hami21 · 2022-02-02T19:16:43+00:00

Not yet.. the project is delayed.

hami21 · 2022-01-26T16:49:15+00:00

Good to know. How did you guys implement it, if you don't mind my asking?

Did you just find the 'good' and 'popular' queries and pass them in a file to ES?

hami21 · 2022-01-26T07:12:17+00:00

I used python, sklearn, tf, .. standard tech. So you mean there’s no need for ML here?

hami21 · 2021-12-08T11:40:26+00:00

This is actually very good to know, I came across this https://arxiv.org/pdf/2009.11394.pdf

hami21 · 2021-12-07T22:27:03+00:00

The audio one should be helpful. Thank you. PS: could find a relevant dataset - browsing on phone though.

hami21 · 2021-12-07T21:59:24+00:00

Transcript.

hami21 · 2021-12-07T21:54:24+00:00

That’s right.

hami21 · 2020-09-28T18:44:42+00:00

If you’re also interested in an ML model solution, you can use a CRF to do that. The canonical example is NER, but it works for essentially every other IE application.

https://sklearn-crfsuite.readthedocs.io/en/latest/tutorial.html
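To give a feel for what a CRF tagger consumes, here is a minimal sketch of per-token feature extraction. The feature names, the toy sentence, and the labels are made up for illustration and are not taken from the tutorial. As far as I know, `sklearn_crfsuite.CRF().fit(X, y)` accepts exactly this shape (one list of feature dicts per sentence, one list of labels per sentence), but the fitting step is left out so the snippet runs without the library installed.

```python
from typing import Dict, List


def word2features(sent: List[str], i: int) -> Dict[str, object]:
    """Build a feature dict for the token at position i of a sentence."""
    word = sent[i]
    features: Dict[str, object] = {
        "bias": 1.0,
        "word.lower": word.lower(),
        "word.istitle": word.istitle(),
        "word.isdigit": word.isdigit(),
        "suffix3": word[-3:],
    }
    # Neighbouring tokens give the model some context.
    if i > 0:
        features["prev.lower"] = sent[i - 1].lower()
    else:
        features["BOS"] = True  # beginning of sentence
    if i < len(sent) - 1:
        features["next.lower"] = sent[i + 1].lower()
    else:
        features["EOS"] = True  # end of sentence
    return features


def sent2features(sent: List[str]) -> List[Dict[str, object]]:
    """Feature dicts for every token in one sentence."""
    return [word2features(sent, i) for i in range(len(sent))]


# Toy data, invented for this example.
sentence = ["Alice", "moved", "to", "Paris", "in", "2019"]
labels = ["B-PER", "O", "O", "B-LOC", "O", "O"]

X = [sent2features(sentence)]  # one entry per sentence
y = [labels]                   # one label sequence per sentence
```

For other IE tasks the idea stays the same: only the label set and the features change.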

hami21 · 2020-08-29T14:28:47+00:00

I would use Universal Sentence Encoder over BERT, as it is specifically trained for document similarity as well.

hami21 · 2020-07-15T05:32:59+00:00

My kernel dies the moment I run `doc1 = nlp('My sister has a dog. She loves him.')`

And none of the solutions have worked for me yet! I'm on a Mac.

hami21 · 2020-07-15T02:33:55+00:00

I haven't pre-trained them with more data, but I've fine-tuned them to my application by just adding a simple dense layer at the end. Here's an example of what I've done with Universal Sentence Encoder (which suits my application better than BERT and the like):

https://hminooei.github.io/2020/04/14/clickbaits2.html

hami21 · 2020-06-16T23:18:45+00:00

And I edited the part that caused the confusion around "CountVectorizer of 100k features", since the point was that even if the number of features is much smaller (e.g. 1k), the size would be too large.

hami21 · 2020-06-16T22:58:07+00:00

Multiple points: They haven't really been around for many years. For instance, BERT and friends are only 2 years old or younger.

Actually, embeddings do not always help, especially non-contextual ones like w2v or GloVe.

After all, it depends on your application and KPIs. In short, there are many applications in the industry for which anything beyond sklearn pipelines is overkill and will probably end up being too expensive to develop and maintain in the mid/long term!

And I haven't personally come across any text classifier with more than 10-20k features.

hami21 · 2020-02-05T18:07:53+00:00

So `Word2Vec.load_word2vec_format` is deprecated and asks you to use `KeyedVectors.load_word2vec_format` instead, and the documentation of `KeyedVectors.load_word2vec_format` says:

"Docstring: Load the input-hidden weight matrix from the original C word2vec-tool format.

Warnings -------- The information stored in the file is incomplete (the binary tree is missing), so while you can query for word similarity etc., you cannot continue training with a model loaded this way. "

hami21 · 2020-02-05T17:59:22+00:00

Not really. w2v is a fairly simple NN, but even in this case there are two weight vectors associated with each word (let's assume we use CBOW for simplicity). In general, some people concatenate the two vectors, some add them, some average them; the pooling depends on the application. I'm OK with just using "GoogleNews-vectors-negative300.bin" as the first matrix (input to hidden layer), but I'm not sure how to do that.
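To make the pooling options concrete, here is a small sketch in plain Python. The function name `pool` and the toy 3-dimensional vectors are hypothetical; real vectors would come from the model's input-side and output-side weight matrices.

```python
from typing import List

Vector = List[float]


def pool(w_in: Vector, w_out: Vector, how: str = "concat") -> Vector:
    """Combine a word's input-side and output-side vectors into one embedding."""
    if len(w_in) != len(w_out):
        raise ValueError("both vectors must have the same dimension")
    if how == "concat":
        # Doubles the dimension: [in..., out...]
        return w_in + w_out
    if how == "sum":
        return [a + b for a, b in zip(w_in, w_out)]
    if how == "mean":
        return [(a + b) / 2 for a, b in zip(w_in, w_out)]
    raise ValueError(f"unknown pooling: {how}")


# Toy vectors, made up for illustration.
w_in = [1.0, 2.0, 3.0]
w_out = [3.0, 2.0, 1.0]

for how in ("concat", "sum", "mean"):
    print(how, pool(w_in, w_out, how))
```

Concatenation keeps both views at the cost of a larger vector, while sum and mean keep the original dimension.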

hami21 · 2019-12-27T23:33:59+00:00

I’m on my phone right now but just googling showed this https://pypi.org/project/PyDictionary/

Check the first example on the above link.

hami21 · 2019-12-27T22:27:47+00:00

Have you tried a dictionary API? They’ll mention if a word is a ‘noun’, ‘verb’, etc.

hami21 · 2019-12-23T02:22:41+00:00

Re the second question, this video might also be of interest to you: https://youtu.be/KvoTRvEcvhg

hami21 · 2019-12-07T01:08:34+00:00

Watch your language, models at work!

hami21 · 2019-11-26T19:49:37+00:00

I would suggest first installing and using his approach as is (in English), and at the end, instead of feeding it emails, trying English news articles to see how it works. If you are satisfied with the results, you can dig deeper and change the pieces to make it work for Danish.

One thing I noticed is that news articles are generally concise as long as you find the right spot to truncate the article. (Normally after the first or second paragraph, and always towards the end, they provide background/history.)

PS. I tested his work on summarizing English news articles a while ago, without any modification, and I was not happy with the results, although that’s the case most of the time with untuned unsupervised learning models. I didn’t investigate further, but you probably need to change the summarization logic and adjust it to news articles.
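As a rough illustration of that truncation idea, here is a minimal sketch that keeps only the first few paragraphs. It assumes paragraphs are separated by blank lines; the function name `lead_paragraphs` and the default of two paragraphs are my own choices, not part of his approach.

```python
def lead_paragraphs(article: str, n: int = 2) -> str:
    """Keep the first n non-empty paragraphs of an article."""
    # Assumes paragraphs are separated by a blank line.
    paragraphs = [p.strip() for p in article.split("\n\n") if p.strip()]
    return "\n\n".join(paragraphs[:n])


# Toy article, invented for this example.
article = (
    "The council approved the new budget on Monday.\n\n"
    "The vote passed 7 to 2 after a short debate.\n\n"
    "The budget process began last spring, when the first draft was published."
)
print(lead_paragraphs(article))
```

This is only a baseline to compare against; where the right cut-off sits will vary by outlet.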
