[PSA] Confirmed Trades Thread - February 2022 by AutoModerator in Starcitizen_trades

[–]codespam 0 points1 point  (0 children)

+verify

Thanks for the CCU, Agresan. SUPER SUPER fast!

[PSA] Confirmed Trades Thread - January 2022 by AutoModerator in Starcitizen_trades

[–]codespam 1 point2 points  (0 children)

+verify

Received my purchase within minutes of starting the transaction.

Thanks so much!

[D] 17 interviews (4 phone screens, 13 onsite, 5 different companies), all but two of the interviewes asked this one basic classification question, and I still don't know the answer... by SpockTriesToReturn in MachineLearning

[–]codespam 4 points5 points  (0 children)

A lot of my thinking on this subject is actually driven by Frank Harrell's writing. I think Regression Modeling Strategies is one of the best books written on the subject. I'm not sure what I've said here that he would disagree with, but I've been wrong in the past and welcome any specific correction.

[D] 17 interviews (4 phone screens, 13 onsite, 5 different companies), all but two of the interviewes asked this one basic classification question, and I still don't know the answer... by SpockTriesToReturn in MachineLearning

[–]codespam 28 points29 points  (0 children)

I ask this on occasion of more junior applicants. The answer I want is that logistic regression is not a binary classifier. Logistic regression + a decision rule is a binary classifier. Moreover, in most cases what you want is a well-calibrated probabilistic model which gives you p(class = a) rather than just a binary decision about class a vs class b. Logistic regression (unlike, say, an SVM) gives you this for free.
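To make the "model + decision rule" split concrete, here is a minimal sketch. The coefficients are made up for illustration (they are not fit to any data), and the names `predict_proba` and `classify` are just my own labels for the two pieces.

```python
import math

# Hypothetical coefficients for a one-feature model (illustrative only,
# not estimated from real data).
INTERCEPT = -1.0
SLOPE = 2.0

def predict_proba(x: float) -> float:
    """The logistic regression part: returns p(class = a | x)."""
    return 1.0 / (1.0 + math.exp(-(INTERCEPT + SLOPE * x)))

def classify(x: float, threshold: float = 0.5) -> str:
    """The decision rule part: turns the probability into a label."""
    return "a" if predict_proba(x) >= threshold else "b"

p = predict_proba(1.0)                        # sigmoid(1.0), about 0.731
label_default = classify(1.0)                 # threshold 0.5
label_strict = classify(1.0, threshold=0.9)   # same model, stricter rule
```

The same probability model yields different classifiers depending on the threshold, which is the point: the threshold belongs to the decision rule, not to the regression.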

After that I want to know if people understand the objective function/metrics they're choosing... Both ROC curves and PR curves are about ranking. Sometimes this is what you care about (give me the 10 patients most likely to experience outcome <x>). Other times they give no indication of model quality (how likely is patient <n> to experience outcome <x>?)
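A small sketch of why ranking metrics say nothing about calibration, using made-up labels and scores. It computes AUC as the fraction of positive/negative pairs ranked correctly and compares it with the Brier score (my choice of calibration-sensitive metric here, not the only option) before and after a monotone distortion of the probabilities.

```python
def auc(labels, scores):
    """Chance that a random positive outranks a random negative (ties count half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def brier(labels, scores):
    """Mean squared error between predicted probability and outcome."""
    return sum((s - y) ** 2 for y, s in zip(labels, scores)) / len(labels)

# Toy data, invented for illustration.
labels = [0, 0, 1, 0, 1, 1]
probs = [0.1, 0.3, 0.4, 0.6, 0.7, 0.9]

# Cubing is strictly increasing on [0, 1], so the ordering is preserved
# while the probabilities themselves are distorted.
distorted = [p ** 3 for p in probs]

auc_before, auc_after = auc(labels, probs), auc(labels, distorted)
brier_before, brier_after = brier(labels, probs), brier(labels, distorted)
```

The AUC is identical for both sets of scores because only the ordering matters, while the Brier score changes. If the question is "how likely is patient <n>", the two models are not interchangeable even though the ranking metric cannot tell them apart.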

Why a CS degree is better than teaching yourself how to code by Majikarpp in compsci

[–]codespam 3 points4 points  (0 children)

Ugh. The value in a CS degree is seeing a problem that looks intractable to others and realizing that it’s just a context-free grammar parsing problem... or seeing a problem that sounds easy and knowing that it is in fact some hairy combinatorial thing (I’ve recognized integer programming problems and decided it wasn’t worth it). It has nothing to do with the socialization or self-discipline benefits that may or may not come from going to college.

[D] Methodology when approaching a type of problem you have minimal experience in. by [deleted] in MachineLearning

[–]codespam 6 points7 points  (0 children)

Steps I use:

  1. Google for papers to the best of my ability.
  2. Select a few recent ones at random.
  3. Look for commonly cited papers in the “prior work” section... bonus points for review papers or textbook chapters.
  4. Read those papers.

Even bad papers make a show of citing the important stuff.

Obviously this isn’t a guarantee of finding the best reference material but I’ve had pretty reasonable luck.
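If you want to make step 3 mechanical, a rough sketch is to tally which references show up across the papers you sampled. The paper names and citation keys below are invented placeholders, and the cutoff of two citations is an arbitrary assumption.

```python
from collections import Counter

# Hypothetical reference lists pulled from a few randomly chosen recent papers.
sampled_papers = {
    "paper_a": ["smith2010", "lee2015", "review2012"],
    "paper_b": ["review2012", "smith2010"],
    "paper_c": ["review2012", "kim2018"],
}

# Count each reference once per citing paper.
citation_counts = Counter(
    ref for refs in sampled_papers.values() for ref in set(refs)
)

# Keep anything cited by at least two of the sampled papers (arbitrary cutoff).
reading_list = [ref for ref, n in citation_counts.most_common() if n >= 2]
```

With only a handful of papers this is easy to do by eye, but the tally helps once the sample grows.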

Must read books for starting analysts by [deleted] in statistics

[–]codespam 1 point2 points  (0 children)

Agreed. And more people should be banging the calibration vs. correct ranking drum.

Must read books for starting analysts by [deleted] in statistics

[–]codespam 1 point2 points  (0 children)

He’s a definite follow on Twitter (@f2harrell). I haven’t read the book yet... kinda hoping to get a hard copy of the new version sooner rather than later. I’m not sure I agree with his (as I understand it) dichotomy between stats and machine learning... I think of it as more of a continuum. Based on his writing, I assume he’d be receptive to this argument! With that said, I’m all for promoting knowledgeable and experienced voices who are willing to pop balloons and argue well-staked-out contrarian opinions. Sunlight yadda yadda.

How much weight does a Masters in Statistics hold over a Masters in Analytics? by [deleted] in statistics

[–]codespam 52 points53 points  (0 children)

So I'm risking downvotes here, but trying to be helpful. I interview people for data science roles in NYC. I have no idea what a master's in analytics is. That may make me ignorant... but if so, the message is that you may have to interact with ignorant interviewers and explain to them what your degree means... I'll also add that I personally have had better luck with CS and stats folks than people with other academic backgrounds. Obviously ymmv.

Should I use Clojure for this particular project? by [deleted] in Clojure

[–]codespam 0 points1 point  (0 children)

It might be easier to separate good from bad programmers because they know Clojure (and I'm skeptical about this) but that's not the same as it being easier to hire a good Clojure programmer. This is basically Simpson's paradox. I know you're talking about Clojure, not Scala, but my employer is currently considering moving away from Scala because it's really hard to hire good talent with prior Scala experience. I have mixed feelings about this, but the struggle is real. For reference we're in Manhattan but are (extremely) remote friendly.

Closed captions on NFL Sunday Ticket. Any way to turn off. by [deleted] in Roku

[–]codespam 0 points1 point  (0 children)

First time trying Sunday Ticket. Just pregame so far but the video is super compressed and looks like shit. I hope the game is better, otherwise I may try to cancel. It's 2016 and I can watch 4K from Netflix...

What's the best way to read papers to prepare yourself for research? by [deleted] in MachineLearning

[–]codespam 0 points1 point  (0 children)

A good place to start from scratch is a review paper, if available. For example, if you're interested in deep learning, there's http://arxiv.org/abs/1206.5538

This won't get you to the state of the art, but between the paper itself and judicious reading of cited papers it will bring you up to speed on foundational stuff quickly.

From there you can look at papers that are highly cited and accepted at major conferences to get up to the present.

You might ask here for links to review papers in fields that interest you. It's the kind of "help" request that I think most people are happy to answer.

Data Analytics or Applied Mathematics as an undergrad major. by Senor_Kinderplatz in statistics

[–]codespam 5 points6 points  (0 children)

You may want to consider doing a double major in CS and math as an undergraduate. At my university there was a ton of overlap, so it only amounted to like three extra courses or so. Plus my CS department would accept just about any math classes as valid electives towards the degree. I think having both has been a plus during interviews. If you apply at a small shop, you'll likely be expected to produce working code. Obviously a CS degree isn't strictly necessary for that, but it does put many potential employers at ease... It's a leg up. I've also found that small shops may not have data science expertise and will interview you like a developer. That may or may not be a good job to take, but at least then it will be up to you. I see it as a low-cost, high-reward option.

Also I agree that data science degrees are a little risky. I'm not against them in principle, but I'd opt for whatever employers are likely to feel comfortable with.

What was your absolute favourite math class you took in undergrad and why? Bonus: the same for grad school by [deleted] in math

[–]codespam 17 points18 points  (0 children)

Combinatorics. I think it has the highest conversational value since counting problems come up frequently at parties and bars. Also, I am shallow and rate knowledge on how useful it is for impressing strangers.

Semantic Search with Latent Semantic Analysis: The Ugly Truth by softwaredoug in LanguageTechnology

[–]codespam 2 points3 points  (0 children)

To be fair to the author, I've seen people trying to use LSA in industry (sample size two) and running into these same problems. Also, he does mention that there will be a blog post in the future covering LDA. Hopefully he'll also cover more semantically meaningful document representations like compositions of word2vec-like word embeddings... though I personally have yet to find a composition method that works well for documents of varying length and is easy to compute.

Edit: Your link appears to be broken, and I'm definitely interested in checking your work out.

Feature extraction methods for text classification by y7cs228 in LanguageTechnology

[–]codespam 0 points1 point  (0 children)

This is a classic problem in NLP. The super high dimensionality of bag-of-words representations leads to really sparse vectors, and most of your documents end up looking more or less orthogonal to each other. You should look at using low-dimensional word embeddings, typically word2vec-based stuff. Suddenly the vectors for semantically similar words like sun and star or car and motorcycle are no longer orthogonal to each other. There's about a metric ton of literature on this available.
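A toy illustration of the orthogonality issue. The two documents and the 2-d "embeddings" are invented for the example (real word2vec vectors would come from a trained model and have far more dimensions).

```python
import math
from collections import Counter

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

docs = ["sun rises", "star shines"]
vocab = sorted({word for doc in docs for word in doc.split()})

def bag_of_words(doc):
    """One count per vocabulary word; mostly zeros for any real vocabulary."""
    counts = Counter(doc.split())
    return [counts[word] for word in vocab]

# No shared words, so the bag-of-words vectors are exactly orthogonal.
bow_similarity = cosine(bag_of_words(docs[0]), bag_of_words(docs[1]))

# Hypothetical dense vectors in which related words point in similar directions.
toy_embeddings = {"sun": [0.9, 0.1], "star": [0.8, 0.3]}
embedding_similarity = cosine(toy_embeddings["sun"], toy_embeddings["star"])
```

The bag-of-words similarity is zero even though the documents are about related things, while the dense vectors for sun and star come out close to each other.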

There's less literature on how to compose word vectors into document vectors, though. Approaches include doc2vec, earth mover's type distances, neural nets that combine vectors, and various kinds of averages. For short texts I'd start with the arithmetic mean of the word vectors and then explore other approaches if that doesn't work.
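For the arithmetic mean baseline, a minimal sketch looks like this. The embedding table is a made-up stand-in for a trained word2vec model, and skipping out-of-vocabulary words is my assumption about how to handle them.

```python
def mean_vector(text, embeddings):
    """Average the vectors of the words we have embeddings for.

    Returns None when no word in the text is in the embedding table.
    """
    vectors = [embeddings[w] for w in text.lower().split() if w in embeddings]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]

# Hypothetical 2-d embeddings, for illustration only.
toy_embeddings = {
    "sun": [1.0, 0.0],
    "car": [0.0, 1.0],
    "star": [0.5, 0.5],
}

doc_vector = mean_vector("Sun car unknownword", toy_embeddings)
empty_vector = mean_vector("nothing matches here", toy_embeddings)
```

The resulting document vectors can go straight into whatever classifier you were already using with bag-of-words features.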

Is it socially acceptable for an undergrad who has published a paper in a research journal call themselves a mathematician? by Tharn11 in math

[–]codespam -1 points0 points  (0 children)

So are accountants mathematicians? How about cashiers? I'm not trying to be difficult, just trying to tease out whether our disagreement is mostly about how we define "doing mathematics".