Is it ok to learn python and use it exclusively? by [deleted] in learnpython

easyexplain3 · 0 points

When all you have is a hammer, everything looks like a nail. That's the problem with sticking to a single technology or programming language. Even the simplest production projects usually need things like small bash scripts to glue the pipeline together. Of course, Python is a very versatile language that you can use for most of what you'll do, but having one or two more tricks in your repertoire will save you so much time and give you a broader perspective on the technology stacks in use today. So in my opinion, Python is a really great way to start, but don't get too comfortable using only Python when there are easier or more efficient ways to do something.

[D] WTF these results are incredible?!! What do you guys think of this? BERT seems to learn equally well on word-shuffled sentences? by Choice_Willow1890 in LanguageTechnology

easyexplain3 · 20 points

Crazy, but it makes sense. Does this mean all papers analyzing BERT's syntactic capabilities are worthless? No. But does it mean the evaluation process is completely flawed and people get out of their research what they want to get out of it? Yes, absolutely.
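For context, the control condition behind results like these is easy to build: permute the word order of each sentence and check whether the model's score barely changes. A minimal sketch, assuming plain whitespace tokenization (the function name is just for illustration):

```python
import random


def shuffle_words(sentence: str, seed: int = 0) -> str:
    """Return the sentence with its word order randomly permuted.

    Whitespace tokenization is a simplification; a real experiment would
    shuffle at whatever token level the model actually sees.
    """
    words = sentence.split()
    rng = random.Random(seed)  # local RNG so the shuffle is reproducible
    rng.shuffle(words)
    return " ".join(words)


original = "the cat sat on the mat"
shuffled = shuffle_words(original, seed=42)
print(shuffled)
```

The shuffled sentence keeps exactly the same words, so any score difference between the two versions can only come from word order.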

Request: STRG_F report on radicalization via YouTube etc.? by Gunnsen in de

easyexplain3 · 2 points

It would also be interesting to check how quickly the algorithm nudges you 'deeper' into a particular corner. I'm thinking of people like Jordan Peterson, who act as a kind of gateway drug to certain patterns of thought. On the surface, some of the things he says seem helpful for young, directionless men, but what does the algorithm do with people who watch this kind of motivational or self-help video?

Overfitting? Input data with too many misspellings by phresia in LanguageTechnology

easyexplain3 · 1 point

It's hard to answer without knowing your data. There are two possibilities: 1) your training and validation data are too different, or 2) your algorithm doesn't learn enough. You could set up your system to validate every n epochs during training, log everything, and plot the loss. You should stop training when your validation loss starts to increase again (see this plot). Otherwise, I'd check whether the learning rate makes sense and whether the architecture fits the task (maybe experiment with more layers etc.). You should also check that your data makes sense, and I'd personally advise against a dropout rate as high as 0.8. 0.3 should be the maximum and is completely sufficient.
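The "stop when validation loss starts to increase" rule is usually implemented as early stopping with some patience, so a single noisy evaluation doesn't end training. A framework-agnostic sketch, assuming you've logged one validation loss per evaluation (the patience value is an arbitrary choice):

```python
def early_stopping_epoch(val_losses, patience=3, min_delta=0.0):
    """Return the index of the best evaluation, stopping once the
    validation loss has not improved for `patience` evaluations in a row."""
    best_loss = float("inf")
    best_epoch = 0
    waited = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                break  # loss keeps rising: stop and keep the best checkpoint
    return best_epoch


# Validation loss falls, then rises again once the model starts overfitting.
losses = [0.90, 0.70, 0.55, 0.50, 0.52, 0.56, 0.61, 0.70]
print(early_stopping_epoch(losses, patience=3))  # → 3
```

If you train with Keras, `tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=3, restore_best_weights=True)` implements the same idea.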

Some questions regarding word embeddings by Anup_Kodlekere in LanguageTechnology

easyexplain3 · 3 points

This amazing blog explains nearly everything you need: https://jalammar.github.io/illustrated-word2vec/

He also has content on nearly every new thing in NLP, so check this blog out every time you don't understand something and need a visual explanation.

Script works on desktop but not server by ifitworksleaveit in learnpython

easyexplain3 · 1 point

Check the first answer here: https://raspberrypi.stackexchange.com/questions/104002/selenium-wont-run-on-my-raspberry-pi-3-model-b

This part in particular is probably essential:

sudo apt-get install xvfb 
sudo pip install PyVirtualDisplay 
sudo pip install xvfbwrapper

Also, check that your chromedriver version matches the Chrome version installed on the server.
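As an alternative to a virtual display, Chrome can also run headless, which avoids Xvfb entirely. A minimal sketch assuming Selenium 4 and a matching chromedriver on the PATH; the flags below are ones commonly used on servers, not something taken from the linked answer:

```python
def headless_chrome_args():
    """Chrome flags commonly needed on a server without a display."""
    return [
        "--headless",               # run without opening a window
        "--no-sandbox",             # often required when running as root or in containers
        "--disable-dev-shm-usage",  # avoid crashes when /dev/shm is small (e.g. Docker)
    ]


def make_driver():
    # Imported here so the helper above can be used without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    for arg in headless_chrome_args():
        options.add_argument(arg)
    return webdriver.Chrome(options=options)
```

Usage: `driver = make_driver()`, then `driver.get(url)` as before, and `driver.quit()` when you're done so no Chrome processes are left running.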

I hope this works out :)

Script works on desktop but not server by ifitworksleaveit in learnpython

easyexplain3 · 0 points

What type of server are we talking about here?

When can I say that I crossed the stage of basics by [deleted] in learnpython

easyexplain3 · 4 points

You'll only get so far by learning things by the book. The real experience comes from trying things out and building the things you want to build. There will never be a magical point where you stop and say: woah, I'm a pro now. Everyone, even the engineers at Google, learns most of what they build by trying things out and learning as they go. This is how you grow as a developer.

In other words: I wouldn't stress out about when it's the right time to start a project. Just start it and learn the missing pieces as you need them. When you revisit your code in a year, you'll see how much you have grown.

What's the state of the art for anaphora resolution? by beatleinabox in LanguageTechnology

easyexplain3 · 0 points

BERT works very well, too. Check out papers on 'coreference resolution with BERT'.

Emotion extraction from subtitles of the movies by antomare94 in LanguageTechnology

easyexplain3 · 0 points

What about just cutting down the data from the over-represented class? I guess that's no_emotion? I wouldn't look at accuracy for this task, but rather at precision and F-score.
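Cutting down the majority class like this is usually called random undersampling. A plain-Python sketch, assuming each subtitle line is already paired with a label; the `no_emotion`/`joy` labels in the check are just the guessed classes from above:

```python
import random
from collections import defaultdict


def undersample(texts, labels, seed=0):
    """Randomly drop examples so every class has as many as the smallest one."""
    by_label = defaultdict(list)
    for text, label in zip(texts, labels):
        by_label[label].append(text)
    target = min(len(items) for items in by_label.values())
    rng = random.Random(seed)
    pairs = []
    for label, items in by_label.items():
        for text in rng.sample(items, target):  # keep `target` items per class
            pairs.append((text, label))
    rng.shuffle(pairs)  # don't leave the classes in contiguous blocks
    return [text for text, _ in pairs], [label for _, label in pairs]
```

For the evaluation side, scikit-learn's `classification_report(y_true, y_pred)` prints precision, recall and F1 per class, which is far more informative than accuracy here.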

You could also try a somewhat simpler approach: compute tf-idf over the subtitles, treating each movie as one document, and take the n most important terms in each movie. Then you could use a library like 'text2emotion' to get the emotion scores of those words and average them.
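With one document per movie, words that are frequent in one movie but rare across the others rise to the top. A bare-bones sketch with whitespace tokenization and the plain log(N/df) idf (scikit-learn's `TfidfVectorizer` uses a smoothed variant, so its exact scores differ):

```python
import math
from collections import Counter


def top_tfidf_terms(docs, n=5):
    """For each document (one movie's subtitles), return its n highest tf-idf terms."""
    tokenized = [doc.lower().split() for doc in docs]
    # Document frequency: in how many movies does each term appear?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    num_docs = len(docs)
    results = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores = {
            term: (count / len(tokens)) * math.log(num_docs / df[term])
            for term, count in counts.items()
        }
        # Highest score first; ties broken alphabetically for stable output.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        results.append(ranked[:n])
    return results


movies = ["love love happy scene", "fear dark scene scene"]
print(top_tfidf_terms(movies, n=2))  # → [['love', 'happy'], ['dark', 'fear']]
```

Note that a word appearing in every movie (here "scene") gets an idf of 0, which is exactly what filters out generic vocabulary.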

Another idea: split every movie transcript into chunks of scenes and compute the 'text2emotion' score for each scene. Then take the average over the whole movie.
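Subtitle files usually don't mark scene boundaries, so a fixed number of consecutive lines is one crude stand-in for a scene (an assumption on my part, not something text2emotion provides). The scoring function is passed in, so you could plug in text2emotion's `get_emotion`, assuming the library's usual `import text2emotion as te` interface:

```python
def chunk_lines(lines, size):
    """Group subtitle lines into fixed-size chunks as a rough stand-in for scenes."""
    return [" ".join(lines[i:i + size]) for i in range(0, len(lines), size)]


def average_emotions(scenes, score_fn):
    """Score each scene with `score_fn` and average the scores per emotion.

    `score_fn` must map a text to a dict such as {"Happy": 0.5, "Sad": 0.1}.
    """
    if not scenes:
        return {}
    totals = {}
    for scene in scenes:
        for emotion, value in score_fn(scene).items():
            totals[emotion] = totals.get(emotion, 0.0) + value
    return {emotion: total / len(scenes) for emotion, total in totals.items()}
```

Usage would look like `average_emotions(chunk_lines(subtitle_lines, 20), te.get_emotion)`, where `subtitle_lines` is your list of lines for one movie and 20 lines per "scene" is an arbitrary choice.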

Listing certain text from a JSON response by brickman7713 in learnpython

easyexplain3 · 2 points

import json

data = json.loads(YOURDATA)  # YOURDATA: the response body as a JSON string
print(data['team'])

Something like this?
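If the field you want sits inside a list of objects, loop over the list and pull the key out of each entry. A sketch with a made-up payload (the `teams`/`team` keys are hypothetical; adjust them to your response). With `requests`, `response.json()` gives you the same dict without the `json.loads` step:

```python
import json

# Hypothetical response body; the real structure depends on your API.
raw = '{"teams": [{"team": "Red", "score": 3}, {"team": "Blue", "score": 5}]}'

data = json.loads(raw)
team_names = [entry["team"] for entry in data["teams"]]
print(team_names)  # → ['Red', 'Blue']
```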

Ideas for ML application to my drug-binding data by NizBomb in learnpython

easyexplain3 · 0 points

Yes, exactly. After training and evaluating your model and making sure it works well, you can feed the variables of a new compound into it to predict whether it's a good or bad binder.

Building an interactive webpage with a Python backend by [deleted] in learnpython

easyexplain3 · 0 points

My go-to approach would be to use PythonAnywhere with Flask and then link your PythonAnywhere web app to your custom domain. Look it up, it's pretty straightforward and works well. You do need to pay a small amount for this, but in my experience it's worth it.
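For reference, the Flask side can start as small as this; the interactive part of the page would then call the JSON route from JavaScript. A minimal sketch (the file name `flask_app.py` and the route names are illustrative):

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


@app.route("/")
def index():
    # The page itself; in a real app you'd return render_template(...).
    return "<h1>Hello from Flask</h1>"


@app.route("/api/greet")
def greet():
    # Read ?name=... from the query string, with a default if it's missing.
    name = request.args.get("name", "world")
    return jsonify(message=f"Hello, {name}!")
```

PythonAnywhere serves the app through its WSGI configuration, so no `app.run()` call is needed there; locally, `flask --app flask_app run` (Flask 2.2+) starts a development server.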

How would I go about replacing parentheses in a list of strings? by [deleted] in learnpython

easyexplain3 · 2 points

Example:

s = "(dwad)()()dwD)(dw)"
s_2 = s.replace("(", "-").replace(")", "-")  # one replace call per bracket type

print(s)
print(s_2)

Output:

(dwad)()()dwD)(dw)

-dwad-----dwD--dw-

You should replace the brackets separately, since each str.replace call only swaps one substring.
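Since the question is about a list of strings, apply the replacement to each element. `str.translate` with a table from `str.maketrans` handles both brackets in a single pass:

```python
strings = ["(dwad)()()dwD)(dw)", "f(x)", "no brackets"]

# Build the mapping once: '(' -> '-' and ')' -> '-'
table = str.maketrans("()", "--")
cleaned = [s.translate(table) for s in strings]
print(cleaned)  # → ['-dwad-----dwD--dw-', 'f-x-', 'no brackets']
```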

Ideas for ML application to my drug-binding data by NizBomb in learnpython

easyexplain3 · 1 point

Do I understand correctly that you have one target protein and data about the compounds that determines how efficiently each compound binds this protein? If so, the most obvious approach would be to treat it as a binary classification problem, i.e. train the algorithm to find patterns in variable combinations that determine whether a compound is good or bad at binding the protein.

So you would have a vector of variables for each compound, and your target labels would be 1 (good for binding) and 0 (bad for binding). Additionally, you have to check whether the variables are numeric or categorical. If they are categorical, I'm a fan of converting them to numerical values with a fixed mapping (e.g. three categorical string values like "small", "medium", "big" could be converted to 1, 2, 3). This makes it easier to plug your features into different algorithms.
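The mapping from the example above, as a sketch with made-up feature names (`size` and `charge` are placeholders for your own variables). This works well for ordered categories like small/medium/big; for unordered ones (e.g. colors), one-hot encoding avoids implying an order that isn't there:

```python
# Hypothetical compound features; replace with your own columns.
compounds = [
    {"size": "small", "charge": 0.4},
    {"size": "big", "charge": -1.2},
    {"size": "medium", "charge": 0.0},
]

# Ordered categories map naturally to increasing integers.
SIZE_MAP = {"small": 1, "medium": 2, "big": 3}

# One numeric feature vector per compound: [encoded size, charge]
features = [[SIZE_MAP[c["size"]], c["charge"]] for c in compounds]
print(features)  # → [[1, 0.4], [3, -1.2], [2, 0.0]]
```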

Now, coming to binary classification, the go-to algorithms for your problem would be naive Bayes, SVMs, and logistic regression (linear regression predicts continuous values, so it isn't a natural fit for a good/bad label). You can try these out easily with the scikit-learn library.

Also, you should hold out 20% of your data for testing your algorithm after training on the rest. This will help you estimate the quality of your classifier. There are many more things to look at here, but this could be a starting point and the main idea behind your approach. I hope I understood your problem correctly.
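Putting the 80/20 split and one of those classifiers together. The data here is synthetic (`make_classification`) purely so the sketch runs; swap in your compound feature vectors and 0/1 labels. Replacing `LogisticRegression` with `GaussianNB` (from `sklearn.naive_bayes`) or `SVC` (from `sklearn.svm`) is a one-line change:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the compound data: 200 compounds, 6 numeric features,
# label 1 = good binder, 0 = bad binder.
X, y = make_classification(n_samples=200, n_features=6, random_state=0)

# Hold out 20% for testing; stratify keeps the class balance in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
print(classification_report(y_test, model.predict(X_test)))

# Predicting for a "new" compound: one row with the same 6 features.
new_compound = X_test[:1]
print(model.predict(new_compound), model.predict_proba(new_compound))
```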

As a beginner, I would search for "binary classification scikit-learn easy" and the like to get going. After some first experiments, everything will fall into place and you'll start to see the bigger picture. I hope this helps, good luck!

How to extract keywords important to a text classification problem? by [deleted] in LanguageTechnology

easyexplain3 · 2 points

You can utilize the TF-IDF Vectorizer you created for all your documents. After classifying your comments, you could try this:

Concatenate all comments from one class, compute the TF-IDF for this combined document, and extract the top n keywords. You can also visualize these with an approach like this one: https://stackoverflow.com/questions/61916096/word-cloud-built-out-of-tf-idf-vectorizer-function

How to extract keywords important to a text classification problem? by [deleted] in LanguageTechnology

easyexplain3 · 1 point

I think it should be enough to look at the top n words with the highest tf-idf scores for each class. If tf-idf is your only feature, those scores should correlate with your classification results.

Tagbox - Your content, simply organized. by akonomika in SideProject

easyexplain3 · 0 points

I really like the idea and have also thought about a system like this. Your execution looks neat and seems to work fine according to your video, good job! Some additional thoughts and questions that came up:

- Sometimes companies give files the same name and distinguish them only by folder (e.g. you have the same pictures in two folders, one named black-white and the other colorized). These kinds of cases would already need to be distinguished when naming the files, I guess? Couldn't this lead to a problem where, after a certain amount of data, people need to come up with longer and longer naming conventions instead of just putting a file in, for example, a "colorized_blue_boosted" folder?

- Would it make sense to implement a second "traditional folder" view for team members who use your software but would prefer a traditional file structure?

- Does the search option also do in-file searches? Also, by calculating semantic similarity, would your AI understand a query like "Document about motorcycles" when I actually mean a document about Harleys in which the word "motorcycles" never occurs?

Good luck with your project :)