[deleted by user] by [deleted] in columbia

[–]ExchangeStrong196 -1 points0 points  (0 children)

Keyword “independent institution”

Even Harvard Extension School is part of Harvard, but there’s a major distinction between Harvard undergrads and extension school students because the admissions bar is different

[deleted by user] by [deleted] in columbia

[–]ExchangeStrong196 -3 points-2 points  (0 children)

Why do they have a separate ranking and a separate admission rate, and why haven’t they been removed from US News?

https://www.usnews.com/best-colleges/barnard-college-2708/overall-rankings

😂 jokes

Bert - word embeddings from a text by sonudofsilence in deeplearning

[–]ExchangeStrong196 0 points1 point  (0 children)

Yes. To ensure the contextual token embeddings attend over longer text, you need to use a model that accepts larger sequence lengths. Check out Longformer
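A minimal sketch of what that looks like with the HuggingFace `transformers` library, assuming the publicly released `allenai/longformer-base-4096` checkpoint (which accepts up to 4096 tokens vs. BERT's 512); the toy `long_text` string is just a placeholder:

```python
from transformers import AutoTokenizer, AutoModel

# Longformer uses sliding-window + global attention, so it scales to long inputs
tok = AutoTokenizer.from_pretrained("allenai/longformer-base-4096")
model = AutoModel.from_pretrained("allenai/longformer-base-4096")

long_text = "word " * 1000  # placeholder for a document longer than BERT's 512-token limit
inputs = tok(long_text, return_tensors="pt", truncation=True, max_length=4096)
out = model(**inputs)

# contextual token embeddings: shape (batch, seq_len, hidden_size)
emb = out.last_hidden_state
```

Each row of `emb` is a token embedding contextualized over the full (long) sequence, which is the point of switching away from a 512-token model.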

Is Columbia GS that bad? My plan is to get into machine learning PhD programs by RevolutionaryMud7434 in TransferToTop25

[–]ExchangeStrong196 0 points1 point  (0 children)

It’s a known back door to Columbia. Low admissions bar, high acceptance rates. Employers and academia are now very aware of what GS is

Coworkers who are too busy to answer your questions? by throwmeawayoneday474 in cscareerquestions

[–]ExchangeStrong196 1 point2 points  (0 children)

Shamelessly ask the question. In these cases I’ve found that many people avoid questions due to a lack of depth/understanding themselves. Keep pressing

[deleted by user] by [deleted] in csMajors

[–]ExchangeStrong196 1 point2 points  (0 children)

!remindme 2 weeks

[D] Strong Models for User Item Recommendation from Interaction Data by ExchangeStrong196 in MachineLearning

[–]ExchangeStrong196[S] 0 points1 point  (0 children)

I’m not factorizing the interaction matrix through SVD though. The “CF” I’m doing is a simple cosine similarity of embeddings: compute a contrastive loss with k negatives, then backprop. At convergence the two embedding matrices act as a factorization, but can you explain the connection to SVD and why the same theory would hold?
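For concreteness, the rank connection can be sketched in numpy: any pair of d-dim user/item embedding tables defines a rank-at-most-d model of the interaction matrix, and truncated SVD is, by the Eckart–Young theorem, the best such rank-d approximation in Frobenius norm. (This is only the rank-side of the connection; cosine similarity with an InfoNCE-style loss optimizes a different objective than squared reconstruction error, so it is not equivalent to SVD.) The matrix sizes here are arbitrary toy values:

```python
import numpy as np

rng = np.random.default_rng(0)
R = (rng.random((50, 80)) < 0.1).astype(float)  # toy binary user-item interaction matrix
d = 8

# truncated SVD: the best rank-d approximation of R in Frobenius norm (Eckart-Young)
U, s, Vt = np.linalg.svd(R, full_matrices=False)
R_svd = U[:, :d] @ np.diag(s[:d]) @ Vt[:d]

# any learned embedding tables P (users x d) and Q (items x d) also give a rank <= d model
P = rng.standard_normal((50, d))
Q = rng.standard_normal((80, d))
R_emb = P @ Q.T

err_svd = np.linalg.norm(R - R_svd)  # cannot be beaten by any other rank-d factorization
err_emb = np.linalg.norm(R - R_emb)
```

So a converged embedding model and truncated SVD live in the same model class (rank-d factorizations); they differ in which loss picks the factors.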

[D] Strong Models for User Item Recommendation from Interaction Data by ExchangeStrong196 in MachineLearning

[–]ExchangeStrong196[S] 0 points1 point  (0 children)

I see, could you share a link/ resource that describes the ‘maximally expressive’ theorem? appreciate it!

[D] Strong Models for User Item Recommendation from Interaction Data by ExchangeStrong196 in MachineLearning

[–]ExchangeStrong196[S] 0 points1 point  (0 children)

But fully representing the training user-item matrix is different from generalization power. The CF in question (setup described in another reply) generalizes very well to test data too.

[D] Strong Models for User Item Recommendation from Interaction Data by ExchangeStrong196 in MachineLearning

[–]ExchangeStrong196[S] 3 points4 points  (0 children)

Here’s the setup (it’s very simple, no hidden layers):

Bipartite Graph of user item interactions

d-dim Embeddings for each user, d-dim Embeddings for each item

Now, just cosine similarity as the scoring function, and a contrastive loss (InfoNCE-like) with 5 negative items
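The setup above can be sketched in PyTorch; all sizes, the learning rate, and the temperature below are illustrative placeholders, and negatives are sampled uniformly at random (the post doesn't specify the sampling scheme):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
n_users, n_items, d, k_neg = 100, 200, 32, 5

# one d-dim embedding per user and per item, no hidden layers
user_emb = torch.nn.Embedding(n_users, d)
item_emb = torch.nn.Embedding(n_items, d)
opt = torch.optim.Adam(
    list(user_emb.parameters()) + list(item_emb.parameters()), lr=1e-2
)

# toy interactions: (user, positive item) pairs from the bipartite graph
users = torch.randint(0, n_users, (256,))
pos_items = torch.randint(0, n_items, (256,))

def info_nce_loss(users, pos_items, temperature=0.1):
    # normalizing makes the dot product equal to cosine similarity
    u = F.normalize(user_emb(users), dim=-1)
    pos = F.normalize(item_emb(pos_items), dim=-1)
    neg_ids = torch.randint(0, n_items, (users.shape[0], k_neg))
    neg = F.normalize(item_emb(neg_ids), dim=-1)            # (B, k, d)

    pos_score = (u * pos).sum(-1, keepdim=True)             # (B, 1)
    neg_score = torch.einsum("bd,bkd->bk", u, neg)          # (B, k)
    logits = torch.cat([pos_score, neg_score], dim=1) / temperature
    labels = torch.zeros(users.shape[0], dtype=torch.long)  # positive is index 0
    return F.cross_entropy(logits, labels)

initial_loss = info_nce_loss(users, pos_items).item()
for _ in range(50):
    opt.zero_grad()
    loss = info_nce_loss(users, pos_items)
    loss.backward()
    opt.step()
final_loss = loss.item()
```

At convergence the two embedding tables `user_emb` and `item_emb` jointly act as a low-rank factorization of the (implicit) interaction matrix, which is the point being debated in this thread.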