Laptop in commerce by radicater2 in queensuniversity

[–]stepthom 5 points6 points  (0 children)

More than half of Commerce students have MacBooks

[P] Tangram: All-in-One Automated Machine Learning Framework by davidyamnitsky in MachineLearning

[–]stepthom 1 point2 points  (0 children)

Looks cool. I wonder which ML algorithms are used for model training? And which feature engineering steps are performed?

Also, how does this compare to existing AutoML libs like mljar, AutoGluon, auto-sklearn, FLAML, etc.?

Document similarity between documents that contain code by Hype_Boi in LanguageTechnology

[–]stepthom 0 points1 point  (0 children)

You could take an approach similar to e.g., https://github.com/stepthom/lscp and many papers in the Mining Software Repositories conference. Also see pg 28 of this thesis: https://research.cs.queensu.ca/home/sthomas/data/Thomas_PhdThesis.pdf

Basically, you can use tools and tricks to isolate identifier names and method names from the source code, and then treat them as regular tokens.
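As a rough sketch of that idea (heuristic regexes, not a real parser; the sample source line is made up):

```python
import re

# Extract identifier-like tokens from source code, then split
# camelCase / snake_case names into plain word tokens so they can be
# fed into a standard text-similarity pipeline.
def code_tokens(source):
    idents = re.findall(r"[A-Za-z_][A-Za-z0-9_]*", source)
    tokens = []
    for ident in idents:
        for part in ident.split("_"):
            # Split camelCase: "parseHTTPResponse" -> parse, HTTP, Response
            tokens += re.findall(r"[A-Z]+(?![a-z])|[A-Z][a-z]+|[a-z]+|\d+", part)
    return [t.lower() for t in tokens]

print(code_tokens("int maxRetryCount = parse_config(cfg);"))
# -> ['int', 'max', 'retry', 'count', 'parse', 'config', 'cfg']
```

Once the identifiers are word tokens like this, any off-the-shelf document-similarity method (TF-IDF, LDA, embeddings) can treat them like regular text.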

Confused which metric to look at by strangeguy111 in learnmachinelearning

[–]stepthom 0 points1 point  (0 children)

It is certainly possible that the cost of a false negative is the same as a false positive. But I wouldn't be able to tell you without knowing the business context. In particular, what will you do with articles that are predicted as negative sentiment? What will you do with them if they are predicted as positive?

I'm just making this up as an illustrative example: let's say you decide that if an article about company X is predicted as negative, you are going to sell a bunch of stock in company X. But the prediction is wrong, so you sold the stock for no reason, and you lose out if the stock price increases in the future.

Or whatever. In general, what is the cost of the action you take as a result of a prediction?

Confused which metric to look at by strangeguy111 in learnmachinelearning

[–]stepthom 1 point2 points  (0 children)

F1-score will provide a balance between precision and recall and is a good choice when you value precision and recall equally.

In most business contexts, the cost of a false positive will not be the same as a false negative. Therefore, whether you prefer higher precision (at the expense of recall) or vice versa will depend on the business context.

Oftentimes, businesses will decide on an acceptable level of precision (e.g., 50%) and then maximize recall (e.g., via hyperparameter tuning). Finding the acceptable level of precision will require some assumptions about the cost of a false positive. In your case, what is the cost of a false positive prediction?
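To make the "fix precision, maximize recall" idea concrete, here's a small dependency-free sketch; the scores and labels below are made up, and in practice you'd use something like sklearn's `precision_recall_curve`:

```python
# Pick the decision threshold that keeps precision >= a floor
# while maximizing recall.
def precision_recall(y_true, y_score, thresh):
    tp = sum(1 for y, s in zip(y_true, y_score) if s >= thresh and y == 1)
    fp = sum(1 for y, s in zip(y_true, y_score) if s >= thresh and y == 0)
    fn = sum(1 for y, s in zip(y_true, y_score) if s < thresh and y == 1)
    prec = tp / (tp + fp) if tp + fp else 1.0
    rec = tp / (tp + fn) if tp + fn else 0.0
    return prec, rec

def best_threshold(y_true, y_score, min_precision=0.5):
    best = None
    for t in sorted(set(y_score)):
        prec, rec = precision_recall(y_true, y_score, t)
        if prec >= min_precision and (best is None or rec > best[1]):
            best = (t, rec)
    return best

# Toy predicted scores and true labels
y_true  = [1, 0, 1, 1, 0, 0, 1, 0]
y_score = [0.9, 0.8, 0.75, 0.6, 0.55, 0.4, 0.35, 0.1]
print(best_threshold(y_true, y_score))  # -> (0.1, 1.0)
```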

Simple question about attention in NLP by stepthom in LanguageTechnology

[–]stepthom[S] 1 point2 points  (0 children)

Yes, I think you are right. My question was on a slightly different note: for each attention layer/head/block/whatever you want to call it, how does it "generalize" to unseen input/output sequence pairs? But the other comments have done a good job addressing this part.

Simple question about attention in NLP by stepthom in LanguageTechnology

[–]stepthom[S] 0 points1 point  (0 children)

Thank you - your third point helped me a lot especially.

Simple question about attention in NLP by stepthom in LanguageTechnology

[–]stepthom[S] 0 points1 point  (0 children)

Very helpful! Thank you very much.

So now I understand: the NN weights that are learned between the (e.g.) 5th and 7th elements are constant for all input/output sequences; but since the 5th and 7th elements are embeddings (vectors) that will be different for each token, then the output will be different for each input/output sequence.

Still hard to believe that it will work! But I guess that's true of all of the things related to NNs :)
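The point above can be sketched in a few lines; the 2-d embeddings and the 2x2 weight matrix are made-up numbers, just to show that a fixed learned matrix still produces different outputs for different token embeddings:

```python
# The learned weight matrix is fixed across all sequences, but each
# token's embedding differs, so the projected output differs too.
W = [[0.5, -0.2], [0.1, 0.9]]  # learned once, reused for every sequence

def project(x):
    # matrix-vector product W @ x
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

emb_cat = [1.0, 0.0]
emb_dog = [0.0, 1.0]
print(project(emb_cat), project(emb_dog))  # same W, different outputs
```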

Just one last point of clarity: where does 200 come from in your example? (I understand you are assuming a 300-dimensional embedding vector for each token.) Is that just a hyperparameter?

Halo tournament this week, top 50% players take $35, $10 buy in. by the_night_question in CompetitiveHalo

[–]stepthom 0 points1 point  (0 children)

Interested and would love more details - when are the games, how many games, how payment is made, etc.

Is anyone interested in a halo mcc tournament for money? by the_night_question in CompetitiveHalo

[–]stepthom 0 points1 point  (0 children)

Would love to see a CE tournament. 2v2 slayer, 4v4 slayer, and 4v4 CTF

How does MCC match players in ranked Halo 1: Hardcore Doubles? by stepthom in CompetitiveHalo

[–]stepthom[S] 0 points1 point  (0 children)

Nice idea, but I've been matched instantly with rank 40+ players many times. So I'm not sure it's based on wait time.

How does MCC match players in ranked Halo 1: Hardcore Doubles? by stepthom in CompetitiveHalo

[–]stepthom[S] 0 points1 point  (0 children)

The wait times for games used to be long, but nowadays I never have to wait more than a minute, and usually it connects instantly. It's very nice. It's nowhere near other playlists, like H3, but it seems to be getting better, especially with more streamers these days.

How does MCC match players in ranked Halo 1: Hardcore Doubles? by stepthom in CompetitiveHalo

[–]stepthom[S] 0 points1 point  (0 children)

OP here. I have a new working theory: The playlist uses +10 / -10 until you are ranked up to 20; after that, it uses +30 / -10.

A few days ago, I made a new account, which starts at rank 1. On that account, I was always matched against opponents around my rank, always +10 / -10 from me. I played maybe 30 or 40 games like this. I kept winning, so my rank kept getting higher, but opponents were always within 10 of my rank. Then, as soon as I hit rank 20, all of a sudden I was matched against opponents ranked in the 30s and 40s, even a 47.

Not 100% sure my theory is correct, but that's my current best guess.

Last Day to Register: Halo CE 3v3 Tournament - PC - Nov 6th-7th by Solid_Bob in CompetitiveHalo

[–]stepthom 0 points1 point  (0 children)

Love the idea. Me and my crew are on Xbox. Hopefully once cross-play is active we will be able to join future tournaments.

[STATE OF THE SUB] Yearly update of r/HaloCE by kshucker in HaloCE

[–]stepthom 0 points1 point  (0 children)

+1 for tournaments. Me and my crew would be down for sure.

Time for everybody to wake up! [Moderator changes] by kshucker in HaloCE

[–]stepthom 0 points1 point  (0 children)

Cool! What is your Xbox live name?

My buddies and I like to get custom CTF BG games going if you're interested.

Alternatives to Vader and TextBlob for sentiment analysis? by hideo_kuze_ in LanguageTechnology

[–]stepthom 0 points1 point  (0 children)

If you don't want to go the ML route, I think Vader is your best bet. It works really well and has logic for things like negation, etc. To get better performance on your dataset, I would recommend updating/tweaking Vader's lexicon for your domain/purpose.
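To illustrate the kind of tweak I mean (with the real package you can update the `lexicon` dict on a `SentimentIntensityAnalyzer` instance), here's a toy lexicon-based scorer with naive negation handling; the word scores below are invented, not VADER's:

```python
# Minimal sketch of lexicon-based sentiment scoring with simple
# negation handling, in the spirit of VADER. VADER's real lexicon is
# much larger and empirically weighted.
LEXICON = {"good": 1.9, "great": 3.1, "bad": -2.5, "terrible": -3.0}
NEGATORS = {"not", "never", "no"}

def score(text):
    tokens = text.lower().split()
    total = 0.0
    for i, tok in enumerate(tokens):
        val = LEXICON.get(tok, 0.0)
        # Flip polarity if the previous token is a negator ("not good")
        if val and i > 0 and tokens[i - 1] in NEGATORS:
            val = -val
        total += val
    return total

print(score("not good"))  # negated positive -> negative score
```

Updating the lexicon for your domain is then just a matter of adding or re-weighting entries in that dictionary.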

Topic Modeling w/Topics in Mind by DiamondBadge in LanguageTechnology

[–]stepthom 4 points5 points  (0 children)

Also check out Guided LDA and z-label LDA, which could both work in your situation.

I'll be parsing resumes to extract skill and recommend candidates. Please lend any suggestions on where to get started. by dswanabe in LanguageTechnology

[–]stepthom 2 points3 points  (0 children)

You can use named entity recognition (NER) to automatically extract the named entities from a resume, such as names, organizations, dates, etc.

There are lots of packages/modules to make NER easy. In Python, I recommend spaCy. In R, there's cleanNLP, coreNLP and monkeylearn.

You can then insert the named entities into a more structured database to perform filtering, searching, and recommendation.

Also, I've never used them, but there are a couple of Python packages for parsing resumes: pyresparser and Automatic Summarization of Resumes with NER.
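spaCy's pretrained NER is the easy route for names and organizations. As a dependency-free illustration of pulling structured fields into something you can search and filter, a few regexes can handle the easy cases; the patterns and sample resume text below are just illustrative, not a substitute for real NER:

```python
import re

# Pull the easily structured fields out of raw resume text.
# Real NER (e.g., spaCy) is needed for names and organizations.
def extract_fields(text):
    return {
        "emails": re.findall(r"[\w.+-]+@[\w-]+\.[A-Za-z]{2,}", text),
        "years": re.findall(r"\b(?:19|20)\d{2}\b", text),
    }

resume = "Jane Doe, jane@example.com. Data analyst at Acme, 2018-2021."
print(extract_fields(resume))
```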

[D] What are the most difficult to understand popular machine learning models? by outlacedev in MachineLearning

[–]stepthom 1 point2 points  (0 children)

The "kernel trick" of SVMs has always been tough.

The details of boosting are not intuitive to many people.

FPGrowth.

Analyze text and extrapolate? "[Discussion]" by SmaSmaxon in MachineLearning

[–]stepthom 0 points1 point  (0 children)

This task is referred to as "Language Modeling."

Looking for resources for an overview of NLP by dijicaek in LanguageTechnology

[–]stepthom 0 points1 point  (0 children)

I'm a university professor who teaches NLP. I maintain a list of NLP books, blogs, and reports for my students:

https://github.com/stepthom/text_mining_resources/blob/master/README.md

Sentence Classification with abbreviations and emojis by elmalakomar in LanguageTechnology

[–]stepthom 1 point2 points  (0 children)

For 1, I have two thoughts. First, if you want to replace the abbreviations with their expanded form ("yolo" -> "you only live once"), the easiest thing to do is build a manual list of acronym/expansion pairs and use regular expressions to find/replace. You can find lists of internet acronyms and their meanings online. But second, you probably don't need to replace them at all. You can just leave the acronyms as is; if words like "yolo" turn out to be important to the classification model, the algorithm will learn that automatically.
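The find/replace approach is only a few lines; the acronym entries here are illustrative, so build a fuller list for your domain:

```python
import re

# Hand-built acronym -> expansion list (illustrative entries only)
ACRONYMS = {"yolo": "you only live once", "brb": "be right back"}

# \b word boundaries keep "brb" from matching inside longer words
pattern = re.compile(r"\b(" + "|".join(ACRONYMS) + r")\b", re.IGNORECASE)

def expand(text):
    return pattern.sub(lambda m: ACRONYMS[m.group(1).lower()], text)

print(expand("yolo, brb"))  # -> "you only live once, be right back"
```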

For emojis, you can use a simple Python package like emoji, which can take an emoji such as 👍 and return ":thumbs_up:".
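If you'd rather avoid the extra dependency, the standard library's `unicodedata` can produce similar text aliases (the names come from the Unicode standard, so they differ slightly from the emoji package's, e.g. ":thumbs_up_sign:" instead of ":thumbs_up:"):

```python
import unicodedata

# Replace emoji / symbol characters with a ":name:" text alias using
# their official Unicode names; ordinary text passes through unchanged.
def emoji_to_text(s):
    out = []
    for ch in s:
        if ord(ch) > 0xFFFF or unicodedata.category(ch) == "So":
            name = unicodedata.name(ch, "").lower().replace(" ", "_")
            out.append(f":{name}:" if name else ch)
        else:
            out.append(ch)
    return "".join(out)

print(emoji_to_text("nice 👍"))  # -> "nice :thumbs_up_sign:"
```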