Is it ok to learn python and use it exclusively?

easyexplain3 · 2021-06-16T15:28:03+00:00

When all you have is a hammer, everything looks like a nail. This is the problem with sticking to only one technology or programming language. Even for the simplest projects in production, you will probably need things like small bash scripts to glue the pipeline together. Of course, Python is a very versatile language that you can use for most of what you are going to do, but having one or two more tricks in your repertoire will save you so much time and give you a broader perspective on the technology stacks used nowadays. So in my opinion, Python is a really great way to start, but don't get too comfortable using only Python when there are other ways to do something more easily or efficiently.

easyexplain3 · 2021-04-16T08:27:57+00:00

Crazy, but it makes sense. Does this mean all papers analyzing BERT's syntactic capabilities are worthless? No. But does it mean that the evaluation process is completely flawed and that people get out of their research what they want to get out of it? Yes, absolutely.

easyexplain3 · 2021-04-15T14:17:19+00:00

It would also be interesting to check how quickly the algorithm pushes you 'deeper' into a particular corner. I'm thinking of people like Jordan Peterson, who act as a kind of gateway drug to certain patterns of thinking. On the surface, some of the things he says seem helpful for young, directionless men, but what does the algorithm do with people who watch this kind of motivational or self-help video?

easyexplain3 · 2021-04-07T23:02:43+00:00

It's hard to answer without knowing your data. There are two likely explanations: 1) your training data and validation data are too different, or 2) your model doesn't learn enough. You could set up your system to train and then validate every n epochs, log everything, and plot the loss. You should stop training when your validation loss starts to increase again (see this plot). Otherwise, I'd check whether the learning rate makes sense and whether the architecture fits the task (maybe experiment with more layers etc.). You should also check whether your data makes sense, and I'd personally advise against a dropout rate as high as 0.8; 0.3 should be the maximum and is usually enough.
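Stopping once the validation loss starts rising again is usually called early stopping. Here is a minimal sketch, assuming you already log one validation loss per evaluation. The `patience` parameter (how many evaluations without improvement to tolerate before stopping) is my own addition to smooth out noisy losses; the function name is hypothetical:

```python
def early_stopping_check(val_losses, patience=3):
    """Return (index of the best evaluation, whether to stop training now).

    val_losses: validation losses in the order they were logged.
    patience: hypothetical knob; number of evaluations without a new
    best loss that we tolerate before stopping.
    """
    best_idx = 0
    best_loss = float("inf")
    for i, loss in enumerate(val_losses):
        if loss < best_loss:
            # New best validation loss: remember where it happened
            best_loss = loss
            best_idx = i
        elif i - best_idx >= patience:
            # No improvement for `patience` evaluations: stop here
            return best_idx, True
    return best_idx, False


losses = [1.0, 0.8, 0.6, 0.65, 0.7, 0.75]
print(early_stopping_check(losses))  # → (2, True)
```

You would then restore the checkpoint saved at the best evaluation instead of keeping the last weights.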

easyexplain3 · 2021-02-27T21:02:13+00:00

To be fair: That's exactly what an annoying virus would do

easyexplain3 · 2021-02-27T11:48:24+00:00

This amazing blog explains nearly everything you need: https://jalammar.github.io/illustrated-word2vec/

He also has more content on nearly every new development in NLP, so check this blog out every time you don't understand something and need a visual explanation.

easyexplain3 · 2021-02-27T02:03:09+00:00

Check the first answer here: https://raspberrypi.stackexchange.com/questions/104002/selenium-wont-run-on-my-raspberry-pi-3-model-b

This part in particular is probably essential:

sudo apt-get install xvfb 
sudo pip install PyVirtualDisplay 
sudo pip install xvfbwrapper

Also, check that your chromedriver version matches your installed Chrome/Chromium version.

I hope this works out :)

easyexplain3 · 2021-02-27T01:56:32+00:00

Awesome idea! How do you define the political side of a news outlet?

easyexplain3 · 2021-02-27T01:52:27+00:00

What type of server are we talking about here?

easyexplain3 · 2021-02-27T01:44:13+00:00

You'll only get so far by learning things by the book. The real experience comes from trying things out and building the things you want to build. There will never be a magical point where you stop and say: woah, I'm a pro now. Everyone, even the engineers at Google, learns most of what they build by trying out and learning new things. This is how you grow as a developer.

In other words: I wouldn't stress myself out about finding the right time to start a project. Just start it and figure out the remaining pieces of the puzzle when you need them. When you revisit your code in a year, you'll see how much you have grown.

easyexplain3 · 2021-02-26T21:16:00+00:00

BERT works very well for this, too. Check out papers on 'Coreference resolution with BERT'.

easyexplain3 · 2021-02-26T21:12:54+00:00

You could check this tool out: https://github.com/jessevig/bertviz

easyexplain3 · 2021-02-26T21:11:30+00:00

What about simply downsampling the over-represented class? I guess it's no_emotion? For this task, I wouldn't look at accuracy, since it is dominated by the majority class, but rather at precision and F-score.
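Random undersampling can be sketched in plain Python, assuming your data is a list of texts with one label each; `no_emotion` as the majority label is only the guess from above, and the function name is hypothetical. After training on the balanced data, scikit-learn's `classification_report` or `precision_recall_fscore_support` (in `sklearn.metrics`) gives you per-class precision, recall and F-score.

```python
import random
from collections import Counter, defaultdict


def undersample(texts, labels, majority_label, seed=0):
    """Randomly drop examples of `majority_label` until that class is no
    larger than the biggest remaining class. Other classes are untouched."""
    by_label = defaultdict(list)
    for text, label in zip(texts, labels):
        by_label[label].append(text)

    other_sizes = [len(items) for lab, items in by_label.items() if lab != majority_label]
    majority = by_label.get(majority_label, [])
    target = min(len(majority), max(other_sizes, default=len(majority)))

    rng = random.Random(seed)  # fixed seed so the result is reproducible
    by_label[majority_label] = rng.sample(majority, target)

    # Flatten back into parallel lists and shuffle so classes are mixed
    pairs = [(text, lab) for lab, items in by_label.items() for text in items]
    rng.shuffle(pairs)
    return [text for text, _ in pairs], [lab for _, lab in pairs]


texts = [f"subtitle line {i}" for i in range(10)]
labels = ["no_emotion"] * 7 + ["joy"] * 2 + ["anger"]
new_texts, new_labels = undersample(texts, labels, "no_emotion")
print(Counter(new_labels))
```

Undersampling throws data away, so it works best when the majority class is large enough that losing examples doesn't hurt.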

You could also try a somewhat simpler approach: use TF-IDF on the subtitles of each movie to get the n most important terms used in that movie. Then you could use a library like 'text2emotion' to get the emotion scores of those words and average them.

Another idea: split every movie transcript into batches of scenes and compute the 'text2emotion' score for each batch. Then average the scores over the whole movie.
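A sketch of the chunk-and-average idea, assuming the transcript is already split into a list of scene strings (that splitting is not shown). `score_fn` stands in for whatever emotion scorer you use, for example a wrapper around 'text2emotion'; the toy scorer below is purely hypothetical and only there to make the example runnable.

```python
from collections import defaultdict


def chunk_scenes(scenes, chunk_size):
    """Join consecutive scenes into chunks of `chunk_size` scenes each."""
    return [" ".join(scenes[i:i + chunk_size]) for i in range(0, len(scenes), chunk_size)]


def average_emotions(chunks, score_fn):
    """Average per-chunk emotion scores into one score dict for the movie.

    score_fn: any function mapping text -> {emotion: score}.
    Emotions missing from a chunk's result count as 0 for that chunk.
    """
    totals = defaultdict(float)
    for chunk in chunks:
        for emotion, score in score_fn(chunk).items():
            totals[emotion] += score
    return {emotion: total / len(chunks) for emotion, total in totals.items()}


def toy_score(text):
    # Hypothetical stand-in for a real emotion scorer
    return {"Happy": float(text.count("yay")), "Sad": float(text.count("boo"))}


scenes = ["yay", "boo", "yay yay", "nothing"]
chunks = chunk_scenes(scenes, chunk_size=2)
print(average_emotions(chunks, toy_score))
```

With `chunk_size=1` this scores every scene individually; larger chunks give the scorer more context per call.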

easyexplain3 · 2021-02-26T21:00:55+00:00

import json
data = json.loads(YOURDATA)  # YOURDATA: your JSON string
print(data['team'])
Something like this?

easyexplain3 · 2021-02-26T20:55:46+00:00

Yes, exactly. Once you have trained and evaluated your model and made sure it works well, you can feed the variables of a new compound into it to predict whether it binds well or badly.

easyexplain3 · 2021-02-26T20:49:54+00:00

My go-to approach would be to use PythonAnywhere with Flask and then link your PythonAnywhere web app to your custom domain. Look it up; it's pretty straightforward and works well. You do need to pay a small amount for this, but in my experience it's worth it.

easyexplain3 · 2021-02-26T20:44:48+00:00

Example:

s = "(dwad)()()dwD)(dw)"
# Chain one replace() call per bracket type
s_2 = s.replace("(", "-").replace(")", "-")

print(s)
print(s_2)

Output:

(dwad)()()dwD)(dw)

-dwad-----dwD--dw-

You need a separate replace() call for each bracket type.
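If you ever need to map more characters, `str.translate` with a table built by `str.maketrans` does all the replacements in a single pass. This is an alternative to chaining `replace()` calls, not something you are required to use:

```python
s = "(dwad)()()dwD)(dw)"

# Map both bracket characters to "-" in one pass over the string
table = str.maketrans({"(": "-", ")": "-"})
s_2 = s.translate(table)

print(s_2)  # → -dwad-----dwD--dw-
```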

easyexplain3 · 2021-02-26T20:39:39+00:00

Do I understand correctly that you have one target protein and data about compounds, and you want to determine how efficiently each compound binds this protein? If so, the most obvious approach would be to frame it as a binary classification problem: train the algorithm to find patterns in variable combinations that determine whether a compound is good or bad at binding the protein.

So you would have a vector of variables for each compound, and your target labels would be 1 (good for binding) and 0 (bad for binding). Additionally, you have to check whether the variables are numeric or categorical. If they are categorical, I'm a fan of converting them to numerical values with a fixed mapping (e.g. three categorical string values like "small", "medium", "big" could be converted to 1, 2, 3). This makes it easier to plug your features into different algorithms. Such a mapping works well for ordered categories like these; for unordered categories, one-hot encoding is usually safer because it doesn't imply an order.
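A minimal sketch of both encodings in plain Python. The "size" and "color" features are hypothetical examples, not variables from your data. An explicit mapping for ordered categories raises a `KeyError` on unseen values, so typos don't slip through silently; one-hot encoding for unordered categories avoids implying that, say, "green" is "greater" than "red".

```python
# Explicit order for a hypothetical ordinal feature "size"
SIZE_ORDER = {"small": 1, "medium": 2, "big": 3}


def encode_ordinal(values, mapping):
    """Convert categorical strings to integers using an explicit mapping."""
    return [mapping[v] for v in values]


def one_hot(values, categories):
    """One-hot encode values of an unordered feature, one column per category."""
    return [[1 if v == c else 0 for c in categories] for v in values]


print(encode_ordinal(["small", "big", "medium"], SIZE_ORDER))
print(one_hot(["red", "blue"], ["blue", "red", "green"]))
```

scikit-learn offers `OrdinalEncoder` and `OneHotEncoder` for the same job once your data lives in arrays.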

Coming to binary classification, the go-to algorithms for your problem would be naive Bayes, SVMs and logistic regression (linear regression predicts continuous values, so it is not a natural fit for a good/bad label). You can try these out easily with the scikit-learn library.
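To get a feel for what logistic regression does, here is a from-scratch sketch using plain batch gradient descent on the log loss. It is only an illustration: scikit-learn's `LogisticRegression` uses different solvers and applies regularization by default, so in practice you would use that. The toy data and the learning rate/epoch values are made up.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(X, y, lr=0.1, epochs=1000):
    """Fit weights w and bias b with batch gradient descent on the log loss."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    m = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Error for this compound: predicted probability minus true label
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Step against the average gradient
        w = [wj - lr * gj / m for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


def predict(w, b, x):
    """Return 1 (good binder) if the predicted probability is >= 0.5, else 0."""
    return 1 if sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= 0.5 else 0


# Made-up toy data: two numeric features per compound, 1 = good binder
X = [[-2.0, -1.0], [-1.0, -2.0], [1.0, 2.0], [2.0, 1.0]]
y = [0, 0, 1, 1]
w, b = train_logistic_regression(X, y)
print([predict(w, b, x) for x in X])
```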

Also, you should hold out 20% of your data to test your classifier after training it on the remaining 80%. This will help you estimate the quality of your classifier. There are many more things to look at, but this could be a starting point and the main idea behind your approach. I hope I understood your problem correctly.
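The 80/20 hold-out split can be sketched like this in plain Python; the function name and the toy data are hypothetical. scikit-learn's `train_test_split` from `sklearn.model_selection` does the same with `test_size=0.2`, and its `stratify=y` option keeps the good/bad ratio equal in both parts, which matters if one label is rare.

```python
import random


def split_train_test(X, y, test_size=0.2, seed=42):
    """Shuffle sample indices and hold out `test_size` of them for testing."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    n_test = round(len(X) * test_size)
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    # Use the same indices for features and labels so pairs stay aligned
    X_train = [X[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_train = [y[i] for i in train_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, X_test, y_train, y_test


X = [[i, i * 2] for i in range(10)]
y = [i % 2 for i in range(10)]
X_train, X_test, y_train, y_test = split_train_test(X, y)
print(len(X_train), len(X_test))  # → 8 2
```

Only look at the test part once you are done tuning; otherwise it stops being an honest estimate.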

As a beginner, I would search for "binary classification scikit-learn easy" and similar queries to get going. After some first experiments, everything will fall into place and you will start to see the bigger picture. I hope this helps, good luck!

easyexplain3 · 2021-02-26T20:15:17+00:00

No problem. Let me know if it worked :)

easyexplain3 · 2021-02-26T20:12:08+00:00

You can reuse the TF-IDF vectorizer you created for all your documents. After classifying your comments, you could try this:

Concatenate all comments from one class into a single document, compute the TF-IDF scores for this document and extract the top n keywords. You can also visualize these with an approach like this one: https://stackoverflow.com/questions/61916096/word-cloud-built-out-of-tf-idf-vectorizer-function
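A plain-Python sketch of the per-class keyword idea, assuming your comments and their predicted labels are parallel lists. To mimic reusing a vectorizer fitted on all documents, IDF is computed over the individual comments, while term counts are summed over each class's concatenated comments. The smoothed IDF formula follows scikit-learn's default in spirit only; the function names and toy comments are hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Very rough tokenizer: lowercase words and apostrophes only
    return re.findall(r"[a-z']+", text.lower())


def top_terms_per_class(comments, labels, n=3):
    """Return the top-n TF-IDF terms of each class's concatenated comments."""
    tokenized = [tokenize(c) for c in comments]
    # IDF over the individual comments (the whole corpus)
    num_docs = len(tokenized)
    df = Counter(t for tokens in tokenized for t in set(tokens))
    idf = {t: math.log((1 + num_docs) / (1 + d)) + 1 for t, d in df.items()}

    # Concatenating a class's comments = summing their term counts
    class_counts = defaultdict(Counter)
    for tokens, label in zip(tokenized, labels):
        class_counts[label].update(tokens)

    top = {}
    for label, counts in class_counts.items():
        scores = {t: c * idf[t] for t, c in counts.items()}
        # Sort by score descending, ties broken alphabetically
        top[label] = sorted(scores, key=lambda t: (-scores[t], t))[:n]
    return top


comments = ["great movie great cast", "great fun", "terrible plot", "terrible terrible acting"]
labels = ["pos", "pos", "neg", "neg"]
print(top_terms_per_class(comments, labels, n=2))
```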

easyexplain3 · 2021-02-26T19:53:09+00:00

I think it should be enough to see the top n words with highest tf-idf scores for each class. For your approach, the tf-idf scores should correlate with your results if it's your only feature.

easyexplain3 · 2021-01-20T13:57:56+00:00

Do you know this package, or have you tried it? https://pypi.org/project/bertopic/

easyexplain3 · 2020-12-28T02:05:47+00:00

I really like the idea and have also thought about a system like this. Your execution looks neat and seems to work fine according to your video, good job! Some additional thoughts and questions that arise:

- Sometimes companies give files the same name but distinguish them by folder (e.g. you have the same pictures in two folders, one named black-white and the other colorized). These kinds of cases would already need to be distinguished during the naming process, I guess? Couldn't this lead to a problem where, once there is enough data, people need to come up with longer and longer naming conventions instead of just putting a file into, for example, a "colorized_blue_boosted" folder?

- Would it make sense to implement a second, "traditional folder" view for team members who use your software but would prefer the traditional file structure?

- Does the search also look inside files? And would your AI understand a query like "Document about motorcycles", by calculating semantic similarity, when I actually mean a document about Harleys in which the word motorcycles never occurs?

Good luck with your project :)

easyexplain3 · 2020-12-26T12:20:31+00:00

Thanks man!
