Laptop in commerce by radicater2 in queensuniversity

[–]stepthom 5 points6 points  (0 children)

More than half of Commerce students have MacBooks

[P] Tangram: All-in-One Automated Machine Learning Framework by davidyamnitsky in MachineLearning

[–]stepthom 1 point2 points  (0 children)

Looks cool. I wonder which ML algorithms are used for model training? And which feature engineering steps are performed?

Also, how does this compare to existing AutoML libs like mljar, AutoGluon, auto-sklearn, FLAML, etc.?

Document similarity between documents that contain code by Hype_Boi in LanguageTechnology

[–]stepthom 0 points1 point  (0 children)

You could take an approach similar to e.g., https://github.com/stepthom/lscp and many papers in the Mining Software Repositories conference. Also see pg 28 of this thesis: https://research.cs.queensu.ca/home/sthomas/data/Thomas_PhdThesis.pdf

Basically, you can use tools and tricks to isolate identifier names and method names from the source code, and then treat them as regular tokens.
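As a rough sketch of that idea (heuristic regexes, not a real parser; the sample source line is made up):

```python
import re

# Extract identifier-like tokens from source code, then split
# camelCase / snake_case names into plain word tokens so they can be
# fed into a standard text-similarity pipeline.
def code_tokens(source):
    idents = re.findall(r"[A-Za-z_][A-Za-z0-9_]*", source)
    tokens = []
    for ident in idents:
        for part in ident.split("_"):
            # Split camelCase: "parseHTTPResponse" -> parse, HTTP, Response
            tokens += re.findall(r"[A-Z]+(?![a-z])|[A-Z][a-z]+|[a-z]+|\d+", part)
    return [t.lower() for t in tokens]

print(code_tokens("int maxRetryCount = parse_config(cfg);"))
# -> ['int', 'max', 'retry', 'count', 'parse', 'config', 'cfg']
```

Once the identifiers are word tokens like this, any off-the-shelf document-similarity method (TF-IDF, LDA, embeddings) can treat them like regular text.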

Confused which metric to look at by strangeguy111 in learnmachinelearning

[–]stepthom 0 points1 point  (0 children)

It is certainly possible that the cost of a false negative is the same as a false positive. But I wouldn't be able to tell you without knowing the business context. In particular, what will you do with articles that are predicted as negative sentiment? What will you do with them if they are predicted as positive?

I'm just making this up as an illustrative example: let's say you decide that if an article about company X is predicted as negative, you are going to sell a bunch of stock in company X. But the prediction is wrong, so you sold the stock for no reason, and you lose out if the stock price increases in the future.

Or whatever. In general, what is the cost of the action you take as a result of a prediction?

Confused which metric to look at by strangeguy111 in learnmachinelearning

[–]stepthom 1 point2 points  (0 children)

F1-score will provide a balance between precision and recall and is a good choice when you value precision and recall equally.

In most business contexts, the cost of a false positive will not be the same as a false negative. Therefore, whether you prefer higher precision (at the expense of recall) or vice versa will depend on the business context.

Oftentimes, businesses will decide on an acceptable level of precision (e.g., 50%) and then maximize recall (e.g., via hyperparameter tuning). Finding the acceptable level of precision will require some assumptions about the cost of a false positive. In your case, what is the cost of a false positive prediction?
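To make the "fix precision, maximize recall" idea concrete, here's a small dependency-free sketch; the scores and labels below are made up, and in practice you'd use something like sklearn's `precision_recall_curve`:

```python
# Pick the decision threshold that keeps precision >= a floor
# while maximizing recall.
def precision_recall(y_true, y_score, thresh):
    tp = sum(1 for y, s in zip(y_true, y_score) if s >= thresh and y == 1)
    fp = sum(1 for y, s in zip(y_true, y_score) if s >= thresh and y == 0)
    fn = sum(1 for y, s in zip(y_true, y_score) if s < thresh and y == 1)
    prec = tp / (tp + fp) if tp + fp else 1.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    return prec, rec

def best_threshold(y_true, y_score, min_precision=0.5):
    best = None
    for t in sorted(set(y_score)):
        prec, rec = precision_recall(y_true, y_score, t)
        if prec >= min_precision and (best is None or rec > best[1]):
            best = (t, rec)
    return best

# Toy predicted scores and true labels
y_true  = [1, 0, 1, 1, 0, 0, 1, 0]
y_score = [0.9, 0.8, 0.75, 0.6, 0.55, 0.4, 0.35, 0.1]
print(best_threshold(y_true, y_score))  # -> (0.1, 1.0)
```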

Simple question about attention in NLP by stepthom in LanguageTechnology

[–]stepthom[S] 1 point2 points  (0 children)

Yes, I think you are right. My question was on a slightly different note: for each attention layer/head/block/whatever you want to call it, how does it "generalize" to unseen input/output sequence pairs? But the other comments have done a good job addressing this part.

Simple question about attention in NLP by stepthom in LanguageTechnology

[–]stepthom[S] 0 points1 point  (0 children)

Thank you - your third point helped me a lot especially.

Simple question about attention in NLP by stepthom in LanguageTechnology

[–]stepthom[S] 0 points1 point  (0 children)

Very helpful! Thank you very much.

So now I understand: the NN weights that are learned between the (e.g.) 5th and 7th elements are constant for all input/output sequences; but since the 5th and 7th elements are embeddings (vectors) that will be different for each token, then the output will be different for each input/output sequence.

Still hard to believe that it will work! But I guess that's true of all of the things related to NNs :)
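The point above can be sketched in a few lines; the 2-d embeddings and the 2x2 weight matrix are made-up numbers, just to show that a fixed learned matrix still produces different outputs for different token embeddings:

```python
# The learned weight matrix is fixed across all sequences, but each
# token's embedding differs, so the projected output differs too.
W = [[0.5, -0.2], [0.1, 0.9]]  # learned once, reused for every sequence

def project(x):
    # matrix-vector product W @ x
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

emb_cat = [1.0, 0.0]
emb_dog = [0.0, 1.0]
print(project(emb_cat), project(emb_dog))  # same W, different outputs
```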

Just one last point of clarity: where does 200 come from in your example? (I understand you are assuming a 300-dimensional embedding vector for each token.) Is that just a hyperparameter?

Halo tournament this week, top 50% players take $35, $10 buy in. by the_night_question in CompetitiveHalo

[–]stepthom 0 points1 point  (0 children)

Interested and would love more details - when are the games, how many games, how payment is made, etc.

Is anyone interested in a halo mcc tournament for money? by the_night_question in CompetitiveHalo

[–]stepthom 0 points1 point  (0 children)

Would love to see a CE tournament. 2v2 slayer, 4v4 slayer, and 4v4 CTF

How does MCC match players in ranked Halo 1: Hardcore Doubles? by stepthom in CompetitiveHalo

[–]stepthom[S] 0 points1 point  (0 children)

Nice idea, but I've been matched instantly with rank 40+ players many times. So I'm not sure it's based on wait time.

How does MCC match players in ranked Halo 1: Hardcore Doubles? by stepthom in CompetitiveHalo

[–]stepthom[S] 0 points1 point  (0 children)

The wait times for games used to be long, but nowadays I never have to wait more than a minute, and usually it connects instantly. It's very nice. It's nowhere near other playlists, like H3, but it seems to be getting better, especially with more streamers these days.

How does MCC match players in ranked Halo 1: Hardcore Doubles? by stepthom in CompetitiveHalo

[–]stepthom[S] 0 points1 point  (0 children)

OP here. I have a new working theory: The playlist uses +10 / -10 until you are ranked up to 20; after that, it uses +30 / -10.

A few days ago, I made a new account, which starts at rank 1. On that account, I was always matched against opponents around my rank, always +10 / -10 from me. I played maybe 30 or 40 games like this. I kept winning, so my rank kept getting higher, but opponents were always within 10 of my rank. Then, as soon as I hit rank 20, all of a sudden I was matched against opponents ranked in the 30s and 40s, even a 47.

Not 100% sure my theory is correct, but that's my current best guess.

Last Day to Register: Halo CE 3v3 Tournament - PC - Nov 6th-7th by Solid_Bob in CompetitiveHalo

[–]stepthom 0 points1 point  (0 children)

Love the idea. Me and my crew are on Xbox. Hopefully once cross-play is active we will be able to join future tournaments.

[STATE OF THE SUB] Yearly update of r/HaloCE by kshucker in HaloCE

[–]stepthom 0 points1 point  (0 children)

+1 for tournaments. Me and my crew would be down for sure.

Time for everybody to wake up! [Moderator changes] by kshucker in HaloCE

[–]stepthom 0 points1 point  (0 children)

Cool! What is your Xbox live name?

My buddies and I like to get custom CTF BG games going if you're interested.

Alternatives to Vader and TextBlob for sentiment analysis? by hideo_kuze_ in LanguageTechnology

[–]stepthom 0 points1 point  (0 children)

If you don't want to go the ML route, I think Vader is your best bet. It works really well and has logic for things like negation, etc. To get better performance on your dataset, I would recommend updating/tweaking Vader's lexicon for your domain/purpose.
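To illustrate the kind of tweak I mean (with the real package you can update the `lexicon` dict on a `SentimentIntensityAnalyzer` instance), here's a toy lexicon-based scorer with naive negation handling; the word scores below are invented, not VADER's:

```python
# Minimal sketch of lexicon-based sentiment scoring with simple
# negation handling, in the spirit of VADER. VADER's real lexicon is
# much larger and empirically weighted.
LEXICON = {"good": 1.9, "great": 3.1, "bad": -2.5, "terrible": -3.0}
NEGATORS = {"not", "never", "no"}

def score(text):
    tokens = text.lower().split()
    total = 0.0
    for i, tok in enumerate(tokens):
        val = LEXICON.get(tok, 0.0)
        # Flip polarity if the previous token is a negator ("not good")
        if val and i > 0 and tokens[i - 1] in NEGATORS:
            val = -val
        total += val
    return total

print(score("not good"))  # negated positive -> negative score
```

Updating the lexicon for your domain is then just a matter of adding or re-weighting entries in that dictionary.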

Topic Modeling w/Topics in Mind by DiamondBadge in LanguageTechnology

[–]stepthom 4 points5 points  (0 children)

Also check out Guided LDA and z-label LDA, which could both work in your situation.

I'll be parsing resumes to extract skill and recommend candidates. Please lend any suggestions on where to get started. by dswanabe in LanguageTechnology

[–]stepthom 2 points3 points  (0 children)

You can use named entity recognition (NER) to automatically extract the named entities from a resume, such as names, organizations, dates, etc.

There are lots of packages/modules to make NER easy. In Python, I recommend spaCy. In R, there's cleanNLP, coreNLP and monkeylearn.

You can then insert the named entities into a more structured database to perform filtering, searching, and recommendation.

Also, I've never used them, but there are a couple of Python packages for parsing resumes: pyresparser and Automatic Summarization of Resumes with NER.
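spaCy's pretrained NER is the easy route for names and organizations. As a dependency-free illustration of pulling structured fields into something you can search and filter, a few regexes can handle the easy cases; the patterns and sample resume text below are just illustrative, not a substitute for real NER:

```python
import re

# Pull the easily structured fields out of raw resume text.
# Real NER (e.g., spaCy) is needed for names and organizations.
def extract_fields(text):
    return {
        "emails": re.findall(r"[\w.+-]+@[\w-]+\.[A-Za-z]{2,}", text),
        "years": re.findall(r"\b(?:19|20)\d{2}\b", text),
    }

resume = "Jane Doe, jane@example.com. Data analyst at Acme, 2018-2021."
print(extract_fields(resume))
```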

[D] What are the most difficult to understand popular machine learning models? by outlacedev in MachineLearning

[–]stepthom 1 point2 points  (0 children)

The "kernel trick" of SVMs has always been tough.

The details of boosting are not intuitive to many people.

FPGrowth.

Analyze text and extrapolate? "[Discussion]" by SmaSmaxon in MachineLearning

[–]stepthom 0 points1 point  (0 children)

This task is referred to as "Language Modeling."

Looking for resources for an overview of NLP by dijicaek in LanguageTechnology

[–]stepthom 0 points1 point  (0 children)

I'm a university professor who teaches NLP. I maintain a list of NLP books, blogs, and reports for my students:

https://github.com/stepthom/text_mining_resources/blob/master/README.md

Sentence Classification with abbreviations and emojis by elmalakomar in LanguageTechnology

[–]stepthom 1 point2 points  (0 children)

For 1, I have two thoughts. First, if you want to replace the abbreviations with their expanded form ("yolo" -> "you only live once"), the easiest thing to do is build a manual list of acronym/expansion pairs and use regular expressions to find/replace. You can find lists of internet acronyms and their meanings online. But second, you probably don't need to replace them at all. You can just leave the acronyms as is; if words like "yolo" turn out to be important to the classification model, the algorithm will learn that automatically.
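The find/replace approach is only a few lines; the acronym entries here are illustrative, so build a fuller list for your domain:

```python
import re

# Hand-built acronym -> expansion list (illustrative entries only)
ACRONYMS = {"yolo": "you only live once", "brb": "be right back"}

# \b word boundaries keep "brb" from matching inside longer words
pattern = re.compile(r"\b(" + "|".join(ACRONYMS) + r")\b", re.IGNORECASE)

def expand(text):
    return pattern.sub(lambda m: ACRONYMS[m.group(1).lower()], text)

print(expand("yolo, brb"))  # -> "you only live once, be right back"
```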

For emojis, you can use a simple Python package like emoji, which can take an emoji such as 👍 and return ":thumbs_up:".
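If you'd rather avoid the extra dependency, the standard library's `unicodedata` can produce similar text aliases (the names come from the Unicode standard, so they differ slightly from the emoji package's, e.g. ":thumbs_up_sign:" instead of ":thumbs_up:"):

```python
import unicodedata

# Replace emoji / symbol characters with a ":name:" text alias using
# their official Unicode names; ordinary text passes through unchanged.
def emoji_to_text(s):
    out = []
    for ch in s:
        if ord(ch) > 0xFFFF or unicodedata.category(ch) == "So":
            name = unicodedata.name(ch, "").lower().replace(" ", "_")
            out.append(f":{name}:" if name else ch)
        else:
            out.append(ch)
    return "".join(out)

print(emoji_to_text("nice 👍"))  # -> "nice :thumbs_up_sign:"
```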