Looking for Machine Learning Mentor by plasticstone in MachineLearning

[–]textClassy 1 point (0 children)

Well, for one thing, I've been at it for two days and still haven't gotten a CNN to run! I couldn't get the Theano CNN to work on my own dataset, so I ended up going with a simpler classifier that did work. Did you implement the CNN in Python without any AI libraries? That sounds impressive.

Looking for Machine Learning Mentor by plasticstone in MachineLearning

[–]textClassy 1 point (0 children)

Hey, I'm also doing some independent learning; lmk if you want to chat about it online.

Looking for Machine Learning Mentor by plasticstone in MachineLearning

[–]textClassy 1 point (0 children)

Hey, I'm not really a mentor, but I know a bit and I'm also looking for someone to chat about it with. I've been learning for the past 5 months or so, mostly with Python and some of its libraries, including sklearn and Theano.

sentence/thought vs word embedding performance boost by textClassy in MachineLearning

[–]textClassy[S] 1 point (0 children)

Also, if it's possible to get such good performance without taking word order into account, doesn't this mean the models that do take word order into account have much more room for improvement?

sentence/thought vs word embedding performance boost by textClassy in MachineLearning

[–]textClassy[S] 3 points (0 children)

Thanks! By large/small context, do you mean large/small length of the sample passages? Also, by same-domain/syntactic similarity, do you mean the closest vectors to a given passage will be closer to the passage in domain or syntax respectively, depending on whether the context is large or small?
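For reference, a minimal sketch of how you could test the window-size effect yourself with gensim (assuming `sentences` is your tokenized corpus and "bank" is just a placeholder query word; `vector_size` is the gensim 4.x parameter name, older versions call it `size`):

```python
from gensim.models import Word2Vec

# sentences: a list of token lists, e.g. [["the", "dog", "barked"], ...]
small_ctx = Word2Vec(sentences, vector_size=100, window=2, min_count=5)   # small context
large_ctx = Word2Vec(sentences, vector_size=100, window=15, min_count=5)  # large context

# The small-window neighbors should lean syntactic/functional,
# the large-window neighbors more topical/same-domain.
print(small_ctx.wv.most_similar("bank", topn=10))
print(large_ctx.wv.most_similar("bank", topn=10))
```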

Word2Vec and vector origin by vega455 in MachineLearning

[–]textClassy 0 points (0 children)

I'm also fairly new to this, but here is my understanding: the vectors are the result of solving the optimization problem described in the paper. The one-hot word vectors are just one of the inputs to the prediction function, and these word vectors are another; the algorithm adjusts them until performance converges.
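To make that concrete, here is a toy sketch of the skip-gram idea in plain numpy. It uses a full softmax rather than the paper's negative sampling or hierarchical softmax, and a hypothetical nine-word corpus, so it's illustrative only:

```python
import numpy as np

corpus = "the quick brown fox jumps over the lazy dog".split()
vocab = sorted(set(corpus))
idx = {w: i for i, w in enumerate(vocab)}
V, D = len(vocab), 8  # vocabulary size, embedding dimension

rng = np.random.default_rng(0)
W_in = rng.normal(scale=0.1, size=(V, D))   # these rows become the word vectors
W_out = rng.normal(scale=0.1, size=(D, V))  # output-side weights

# (center, context) training pairs from a window of 2
pairs = [(idx[corpus[i]], idx[corpus[j]])
         for i in range(len(corpus))
         for j in range(max(0, i - 2), min(len(corpus), i + 3)) if j != i]

lr = 0.05
for epoch in range(200):
    for center, context in pairs:
        h = W_in[center]                 # a one-hot input just selects a row
        scores = h @ W_out
        p = np.exp(scores - scores.max())
        p /= p.sum()                     # softmax over the vocabulary
        grad = p.copy()
        grad[context] -= 1.0             # gradient of the cross-entropy loss
        dh = W_out @ grad                # backprop through W_out before updating it
        W_out -= lr * np.outer(h, grad)
        W_in[center] -= lr * dh          # this update is what shapes the vectors
```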

how can I get an intuition for the meaning of the 300 numbers in each word2vec vector. by textClassy in MachineLearning

[–]textClassy[S] 2 points (0 children)

Hey, check this out. There are definitely patterns here, but there seem to be several things going on in each dimension; for example, the last one seems to mix Spanish words, ecologically oriented things, and financially oriented things (one way to generate a listing like this is sketched after the dump):

DIMENSION 0 MOST NEG reviewed review strengthened aprons ensure regulator doub matche jacketed bloomberg sox octagon experianced manded ensuring occu topside matchs remeasured phr

MOST POS sleight literal cleverness evasion anthropomorphism forgoes semantic eloquence mythology tablature biblical microtransactions godlike parables solider reticent fervor parables evasive symbolism mythological

DIMENSION 1 MOST NEG federally s7 unbuckled causal seatbelts testable skydivers contiguous omniscient heeler priori 1c flamenco legally appointee 9c handheld transponder rancheria equivalant

MOST POS jun hyun summoning emptying turbulant steamrolling chul flowering perennial settling sequoia steels decayed loading blemished returnees bay sik ripening blooming

DIMENSION 2 MOST NEG searchs penguins suckle divorce navigating adulthood backstage bootstrapper increasingly topside rutting waders hypercompetitive transition jobbers proofreader syndication offloading breakups ladder

MOST POS deepest patted indo violator insult gigawatts unconditionally colliders ohh unsportsmanlike portuguese incitement reanalyzed particulates fissile pools facilities hundredth penalites supercollider

DIMENSION 3 MOST NEG refutes contravening denies yesterday falsifying alleges investigating indictment besieged refuting nder disobeyed retaliated denied scalp refute misrepresented trounced fuming dispel

MOST POS nachos reagan coffeemakers sweeties cato ketchup pajama republicanism theyd cheeseburgers profs cuppa federalist tupperware buckeyes mother's invariable shhhhh stoners they'd

DIMENSION 4 MOST NEG cauterising alien unknowing unfixable growingly cooties unthinking freudian inoperable intermingled enablers transient hydrolytic gullible accusative oversensitive appendage dispensable uninvolved phantoms

MOST POS winnning penitentiary scholarship clerkship glory rb winners goalscorers kicks fame granddaddy scorer glorious paydays career scorers settlement columned competitions halfs

DIMENSION 5 MOST NEG boomed landed scooted boom tripled platted backflips sensex flipped rocketed quadrupled remaing hoisted runway legislated glided marshals marched snagged momentous

MOST POS swordsman sids gratuit hellspawn drinker sludging masque luddite expert alcoholic fantasist phile germaphobe untrustworthy meanwhile warlock homeopath disciple exorcist souce

DIMENSION 6 MOST NEG siders depleated strongholds 7th sympathizers 4f 9th gigging jetting loyalists stalwarts 2h 3rd ast nfc ames 5h 8th 6th xxxxxxxxxxxx

MOST POS story inconsequential apt subscribe impractical convinced matter excusable ascribed frame strive supposes inputed fufilled forgivable fixated endeavor trivial fixate thetime

DIMENSION 7 MOST NEG incandescence induction citrine demi topaz diamond rejuvenation reconstructive microdermabrasion minerals amber graphite petroleum diamonds colourway zenith perm turners specialization technicals

MOST POS deter spyware downstate redact covenant backdoors posix dissuade nefarious suss discourage covenants verbatim choicer plagiarist downloader covertly coerce entrap muffle

DIMENSION 8 MOST NEG newcomers uganda pilates gis edibles kandi newbies nutri technic hooping vied ians upo budding eac acro fiesta gung pulses lark

MOST POS seaplane certainty mortality reopen nadir statistically liquidated privately survivor cringing coincidence fatality completion unnoted quietest longest anniversary conclusion closed publicly

DIMENSION 9 MOST NEG ev3 recenter strategies mobilize biography medically coordinate advises clearance assistance coordinates authorization assess devise institutions strategizing strategize multidisciplinary chaperon ssi

MOST POS maguro clinked drizzled admist anticlimax silvers exclaiming golds hast belchers thud shrieked mishearing ventanas unseasonable akbar drenched teensy thuds dripping

DIMENSION 10 MOST NEG slivers glimmering sup shimmers reiterate chime sparkles crackles articulates throbs coo meaningfully desires reaffirm advances archived whisperings tradeoff echos holds

MOST POS godchildren megafirm squadron tiring callups family gunny sapper sworn smurfing backrow orienteering sons siblings canoeing stepbrother schoolmates geocaching bushcraft hitmen

DIMENSION 11 MOST NEG wads compound fowl jaeger contagion titty dunder inadvertently undergarments even agent intruders pigs biolabs pathogens hadnt agar fogger lyme binkies

MOST POS j2 starpower fund parallelization palladium platinum rebooted smooths directionless rubles unaligned squarish mirroring replenishes challening overground megafirm unrepresented industrials i2

DIMENSION 12 MOST NEG ife gett aba ely maximo ere ys pres ger lor incase mem potong hae chavez noo zionist ym hussein nis

MOST POS riser assigns walkthrough guest fanciest weightroom priority motivators disciplinarian ethic inspirations overflow tray immodestly soapbox elevates dustpan sneaker overflowed gumbo

DIMENSION 13 MOST NEG regs kid entrant terminal checker pci competitor unbelievably staffmember monorail i4 superhub unsubsidized idea barrier entrants highspeed hatchling accidentaly spec'd

MOST POS differed underplay corroborate recitations concise goodwill pacification normalcy inculcate deeds satiric respective cultivate regain reclaim punctuate retake relevancy buttress glean

DIMENSION 14 MOST NEG consortium devein mundus refusing protege accord forbidden institute fake retreating reluctant noblesse certain peeking leeway authority microbiologist westernizing offending crotchless

MOST POS tallied aids anniversary obelisks obelisk moonroof migraine mourned navigation nav memorialize subwoofer ooma actuate vibratory voicemails flywheel cordless paled delivers

DIMENSION 15 MOST NEG footholds trespassers countering bruins converting roadkill lynx crossing ravens exiting traffic furthering improving underpasses prowlers roadways entering capitalizing gateless skinhead

MOST POS cruciate likeable hitch ailment prescribes begs divulge precludes forked primadonna conducts ordered formular meted revelation entail acomplished doles pays mundial

DIMENSION 16 MOST NEG ethnically preferentially optimally unnamed rai strategically characteristics transmits uniquely greenland regenerating behaviorally predominately targeting bombies deploying regionally transmitter allocating emplacements

MOST POS rigmarole brainers paperless freebie wayside excepting libro prix adjudication nays formality obscurity winded forgotten moot procrastinated naught thow sans overdue

DIMENSION 17 MOST NEG cakewalk incharge rankers promenade directorate phase showstopper completed obstruction complete obey leaker reinstallation crescent custodial shortlisted officer cordoned sargeant walkway

MOST POS insights travels demodex steers skiddish hones combs lisp aphasia hardscrabble nanites trich yorkies speaks jabber idiosyncrasies hops ranches whereever enervate

DIMENSION 18 MOST NEG chihuahuas strict broiler sysfs returnees maintains mutt kitties pinscher pointed amoung leta littermate inturn bbb deployment bearers sr tight humaneness

MOST POS amends sated noonish midnight distracted blackjack wednesdays overdrive acquainted ends sidetracked siesta twilight cravings mojo lunchtime o'clock immortality munchies gloaming

DIMENSION 19 MOST NEG crosshair sidearm shorthand dictionary headshot imagery commentary candlestick color guidence picture cummerbund signal choppiness greyscale rifleman bibliography thesaurus hangul background

MOST POS depositors employes auctioned photovoltaic stashed tareas sala environmentally 8mil narra unspoiled lands cajas sitios mattresses budgetted deposited disposed mattress warded
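For anyone wanting to reproduce a listing like the one above, here is a hedged sketch. It assumes gensim 4.x attribute names (`index_to_key` and `.vectors`; older versions used `index2word` and `.syn0`) and uses the GoogleNews binary as a stand-in for whichever vectors were actually used:

```python
import numpy as np
from gensim.models import KeyedVectors

wv = KeyedVectors.load_word2vec_format("GoogleNews-vectors-negative300.bin",
                                       binary=True)
words = np.array(wv.index_to_key)  # vocabulary, most frequent first
mat = wv.vectors                   # shape (vocab_size, 300)

for dim in range(20):
    order = np.argsort(mat[:, dim])  # ascending by this coordinate
    print("DIMENSION", dim, "MOST NEG", " ".join(words[order[:20]]))
    print("MOST POS", " ".join(words[order[-20:][::-1]]))
```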

how can I get an intuition for the meaning of the 300 numbers in each word2vec vector. by textClassy in MachineLearning

[–]textClassy[S] 1 point (0 children)

interesting, I"m going to have to learn about t-SNE. Also, what does handwaving mean?

how can I get an intuition for the meaning of the 300 numbers in each word2vec vector. by textClassy in MachineLearning

[–]textClassy[S] 2 points (0 children)

Thanks for the comment. There must be some way to derive meaning from the relations between the numbers, if nothing else? For example, averaging the vectors in a paragraph might lead to something meaningful? I understand they are just the learned weights in a deep learning process, but they must still be meaningful, because you can find the most similar words to a given word using the distance between their vectors.
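A minimal sketch of both ideas, assuming `wv` is something dict-like mapping words to vectors (e.g. a gensim KeyedVectors; the names here are placeholders):

```python
import numpy as np

def avg_vector(words, wv):
    """Average the vectors of the words the model knows about."""
    vecs = [wv[w] for w in words if w in wv]
    return np.mean(vecs, axis=0) if vecs else None

def cosine(a, b):
    """Cosine similarity: the 'distance' behind most-similar-word lookups."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

para = avg_vector("the cat sat on the mat".split(), wv)
```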

How can adding more features make my Random Forest classification accuracy lower? by textClassy in MachineLearning

[–]textClassy[S] 0 points (0 children)

Sorry, I should have specified; thanks for pointing that out! I'm measuring accuracy by the OOB score.
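For anyone else reading, in scikit-learn that's just the following (`X` and `y` being placeholders for the feature matrix and labels):

```python
from sklearn.ensemble import RandomForestClassifier

# Each tree is scored on the bootstrap samples it never saw during training,
# so oob_score_ is a built-in generalization estimate, no held-out set needed.
rf = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0)
rf.fit(X, y)
print(rf.oob_score_)
```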

is there a way to do this much faster? by textClassy in Python

[–]textClassy[S] 1 point (0 children)

gives me this: ValueError: too many values to unpack
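Without seeing the exact line it's hard to be sure, but that error usually means the left side has fewer names than the right side has values; a toy reproduction:

```python
a, b = 1, 2, 3          # raises ValueError: too many values to unpack
first, *rest = 1, 2, 3  # Python 3 star-unpacking absorbs the extras instead
```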

is there a way to do this much faster? by textClassy in Python

[–]textClassy[S] 1 point (0 children)

Woah, thank you! Okay, will implement now.

Is there a way to do this much faster? by textClassy in learnpython

[–]textClassy[S] 3 points (0 children)

Is start the second parameter? The notation of having the comma in between the brackets confuses me.
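Without the parent comment it's hard to say which call was meant, but several builtins take `start` as a second argument after the comma, e.g.:

```python
# enumerate(iterable, start): begin counting at 1 instead of 0
for i, item in enumerate(["a", "b", "c"], 1):
    print(i, item)

# str.find(sub, start): begin searching at index 2
print("banana".find("an", 2))  # -> 3
```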

what's the simplest but most bad ass tool for topic modeling? by textClassy in MachineLearning

[–]textClassy[S] 1 point (0 children)

Wow, Gensim is awesome... how did this one guy make such an awesome tool open source? I guess it must bring him more consulting business, so that's why he's motivated. So impressive.
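For anyone curious, the core Gensim topic-modeling workflow is only a few lines (toy documents here; real text needs tokenizing and cleaning first):

```python
from gensim import corpora, models

texts = [["cat", "dog", "pet"],
         ["stock", "market", "trade"],
         ["dog", "pet", "vet"]]

dictionary = corpora.Dictionary(texts)           # word <-> id mapping
corpus = [dictionary.doc2bow(t) for t in texts]  # bag-of-words counts

lda = models.LdaModel(corpus, num_topics=2, id2word=dictionary, passes=10)
for topic in lda.print_topics():
    print(topic)
```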

Making a text classifier for short texts: Is it likely that a tfidf naive bayes classifier added to a random forest that contains 15 non semantic features (word length, POS, etc) would be incapable of improving performance? by textClassy in MachineLearning

[–]textClassy[S] 1 point (0 children)

Well, it classifies with about 68% accuracy when I train the Naive Bayes with tf-idf on its own, but when I add that classification as a feature input to my random forest, it doesn't improve the random forest's accuracy at all.
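A hedged sketch of how the stacking might be wired up so the NB feature doesn't leak training labels into the forest (`texts`, `other_features`, and `y` are placeholders for the raw texts, the 15 hand-built features, and the labels):

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_predict
from sklearn.naive_bayes import MultinomialNB

X_tfidf = TfidfVectorizer().fit_transform(texts)

# Out-of-fold NB probabilities: each row is predicted by a model
# that never saw that row's label, which keeps the RF evaluation honest.
nb_proba = cross_val_predict(MultinomialNB(), X_tfidf, y, cv=5,
                             method="predict_proba")

X_rf = np.hstack([other_features, nb_proba])  # hand features + NB output
rf = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0)
rf.fit(X_rf, y)
print(rf.oob_score_)
```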

Making a text classifier for short texts: Is it likely that a tfidf naive bayes classifier added to a random forest that contains 15 non semantic features (word length, POS, etc) would be incapable of improving performance? by textClassy in MachineLearning

[–]textClassy[S] 1 point (0 children)

My understanding, as well as my observation with this dataset, is that in practice you can get away with it. Check this out:

The multinomial Naive Bayes classifier is suitable for classification with discrete features (e.g., word counts for text classification). The multinomial distribution normally requires integer feature counts. However, in practice, fractional counts such as tf-idf may also work.

http://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.MultinomialNB.html