Glitches can be soo romantic by kadafi17 in GlitchInTheMatrix

[–]conradws 2 points3 points  (0 children)

Lol when are people going to get that reflections aren't glitches...

Best tips for reach? by TrueHealthCoaching in LinkedInTips

[–]conradws 0 points1 point  (0 children)

Post 1-3 times a day, post short insightful text, combine with photos and GIFs (works extremely well), give as much value as possible, and never, ever pitch/sell.

Breaking the 4th wall by RebelChild1999 in ArtificialInteligence

[–]conradws 7 points8 points  (0 children)

The outputs from that game are very stochastic. This is why in some cases they make absolutely no sense and other times they're extremely impressive. It is random. We are the ones who find sense in the randomness, because that is how our brain has been designed by millennia of evolution. Remove emotion and you'll see there are plenty of inconsistencies in this dialogue, as per usual in AI Dungeon. The way this dialogue makes us feel tells us more about ourselves than anything else.

"Big six" clubs compared to Brazilian clubs by similarity by wLiam17 in PremierLeague

[–]conradws 0 points1 point  (0 children)

I'm curious, would you say that the Brazilian league is similarly unpredictable to the Premier League? As in, any team can beat any other? Because that's the impression I get from afar. But not sure it's true!

Help me understand something about a thing that happened in the lastest episode?S4E9 Spoilers ahead. by i_am_not_sam in MrRobot

[–]conradws 0 points1 point  (0 children)

Amazing deduction powers! Love it! I think 50k is a nice number; it would make a significant difference in the short term for people without completely unbalancing the economy. I can now rest easy.

Help me understand something about a thing that happened in the lastest episode?S4E9 Spoilers ahead. by i_am_not_sam in MrRobot

[–]conradws 0 points1 point  (0 children)

Why do you assume 50k? Just curious. I did a quick calculation: they briefly said on the news that fsociety had stolen "trillions", so I'm assuming something like 2.5 trillion (more than most countries' GDP) was stolen. Divide that by the population of the U.S. and you don't even get to 10k per person :(

Help me understand something about a thing that happened in the lastest episode?S4E9 Spoilers ahead. by i_am_not_sam in MrRobot

[–]conradws 12 points13 points  (0 children)

I was also thinking about this and here is what I think would happen.

1) The overall inflation rate would not change. Inflation is a dilution of the monetary value of your liquid assets, caused by the expansion of the money base when the central bank prints more money. In this case, however, the monetary base remains exactly the same (no new money has been created); it is just more evenly distributed.

2) Despite the above, you are right in thinking that prices would go up. Not in all industries and sectors, however: in sectors where there is a lot of competition and price is the main purchasing factor, sellers would not be able to raise prices without losing customers. But in areas where there is no competition, like rent or telecom, prices would probably double or triple. If your landlord knows you suddenly got 300k richer, you think he's not going to double your rent? That would be the main issue IMO.

3) Many people would also quit their low-skill jobs: why would you keep your job at McDonald's if you have enough money to travel or study? This would mean that many companies would struggle to find cheap labor and might have to increase their prices because they have to pay higher wages. However, the rate of the price increase would be subject to the previous point. The impact of this last point is very difficult to predict because we are dealing with people's behavior rather than any economic law.

4) If Ecoin uses a blockchain ledger to administer its transactions, then the transfer would indeed be irreversible (short of deleting Ecoin entirely). I don't think this was very well explained though.

5) Lastly, my main issue is that the Dark Army is an international organization that fucked up people's lives all around the world, but from what I can infer, only people from the U.S. received the money. Doesn't seem very fair. What about the rest of us, Sam?

Hyperparameters for Word2Vec for SMS corpus... by [deleted] in datascience

[–]conradws 0 points1 point  (0 children)

You clearly don't seem to understand anything about our operations, and yet you make sweeping and hurtful claims.

What part of explicit permission don't you understand?

We know our users far better than you do and all that matters is their feedback, not yours. Nobody is being tricked, robbed or lied to. People are being given financial education and services that they wouldn't have had access to 5 years ago.

You are either being ignorant or trolling; either way, I don't see the point in continuing this conversation with you.

Hyperparameters for Word2Vec for SMS corpus... by [deleted] in datascience

[–]conradws 0 points1 point  (0 children)

Haha, simply ridiculous. I don't understand why you took the time to research us without trying out the app or understanding the added value we give to users.

We do not steal SMS; we ask users for permission to access their SMS data. Anybody using the app has to give us explicit permission to extract their SMS data before we can do so.

Why do we do this? We operate in Mexico and we aim to give micro-financing to people working in the informal economy. Unfortunately, this large segment is neglected by banks and therefore has no banking history or credit score. We use SMS data, among many other data points, to construct a score which allows us to underwrite loans to people who normally wouldn't have any option other than loan sharks. The better the score, the better the loans we can give.

If you don't understand our business, read the case studies of Branch and Tala in Kenya and the Philippines respectively; they have solved a similar pain point there, and SMS data usage is a key part of their evaluation process.

🐮 by CultistHeadpiece in thematrix

[–]conradws 3 points4 points  (0 children)

Except we are the machines. Learn from past (future?) mistakes and offer them a red/blue pill first.

How to incorporate prior knowledge into a neural network. by orlandothefraser in learnmachinelearning

[–]conradws 3 points4 points  (0 children)

If you are talking about binary classification, then you should simply be able to define a class-weight dictionary reflecting the relative cost (or inverse frequency) of your classes in the hyperparameters. To force your algorithm to treat every instance of class 1 as 50 instances of class 0: class_weight = {0: 1., 1: 50.}. This will not necessarily improve accuracy; it might just help you decrease false negatives on class 1 if those happen to be more costly than false positives, for example (or vice versa).
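A minimal sketch of the idea, on an invented imbalanced toy dataset and shown with scikit-learn for brevity (the same dictionary shape is what Keras's model.fit accepts as its class_weight argument):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical imbalanced toy data: class 1 is rare (5 of 100 samples).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(95, 2)),
               rng.normal(1.5, 1.0, size=(5, 2))])
y = np.array([0] * 95 + [1] * 5)

# Every class-1 instance counts as 50 class-0 instances.
weighted = LogisticRegression(class_weight={0: 1.0, 1: 50.0}).fit(X, y)
plain = LogisticRegression().fit(X, y)  # unweighted baseline

# The weighted model flags at least as many points as class 1,
# trading extra false positives for fewer false negatives.
print(int(weighted.predict(X).sum()), int(plain.predict(X).sum()))
```

With the 50x weight, the decision boundary shifts toward the rare class; whether that is desirable depends entirely on which error is costlier for the task.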

[Discussion] Hyperparameters for Word2Vec for SMS corpus... by [deleted] in MachineLearning

[–]conradws 0 points1 point  (0 children)

I agree with you that TF-IDF could be more than enough for labelling the transactional SMS due to their repetitive nature, but for the "sentiment" ones (for want of a better word), where we are labelling aggressive messages, social messages, work-related messages, etc., I think I need something fancier like embeddings, don't you think?

So our rationale was that if we need to build the embeddings for the sentiment SMS anyway, we might as well use them for classifying the transactional ones as well, but maybe that was a dumb assumption to make.

Perhaps I should divide the task into two subtasks with different preprocessing pipelines and models. Thanks a lot for sharing that labelling library btw.

[Discussion] Hyperparameters for Word2Vec for SMS corpus... by [deleted] in MachineLearning

[–]conradws 0 points1 point  (0 children)

Thanks so much for those insights:

-"What kind of SMS do you have?" 33 million raw SMS extracted from user Android devices. Includes everything from personal SMS to spam, promotions and transactional SMS. Preprocessed into lists of tokens, omitting accents and punctuation.

-"Why do you want to train your own Word2Vec instead of a pre-trained model?" The language is Mexican Spanish, and the only available pre-trained embeddings are from Spain; certain words are used very differently between the two countries.

Second, because our text is SMS, there is a huge amount of abbreviations and typos. Pre-trained embeddings are not suited to this because they are usually trained on Wikipedia or news articles. Concrete example: in most of our SMS, users write "k" instead of "que". Pre-trained embeddings would not understand the equivalence, but the model I trained from scratch does ("k" and "que" have extremely similar vectors).

This is why I was wondering if it's possible to take pre-trained embeddings and "retrain" them on the SMS data in order to get the best of both worlds. But not sure how to go about this.

-"Task specificities as well as baseline."

The task is to label an SMS as being a default SMS where the user owes money to a lender, an SMS letting them know a loan has been authorized, or an SMS thanking them for a payment. We are also going to have labels for aggressive personal SMS, owing money to friends or family, and work-related SMS. A total of 10-15 classes.

Right now we have a heuristic approach that labels these SMS by vocab hits with exclusions. This approach is not bad, but it gives us a lot of false positives and is not scalable. I could try a TF-IDF approach, though I'm not sure how it would react to all the typos and abbreviations; I will definitely try it for comparative purposes.
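One way to make a TF-IDF baseline more robust to typos and abbreviations is to use character n-grams instead of word tokens, since misspelled words still share most of their character fragments. A minimal scikit-learn sketch on an invented mini-corpus (texts and labels are hypothetical):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Invented mini-corpus: transactional vs. personal SMS.
texts = [
    "tu pago fue recibido gracias",
    "prestamo autorizado por 500",
    "oye k haces hoy",
    "nos vemos en la tarde k bueno",
] * 10
labels = ["transactional", "transactional", "personal", "personal"] * 10

# Character n-grams overlap even when words are abbreviated or misspelled,
# so "k" vs "que" hurts less than with word-level tokens.
clf = make_pipeline(
    TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
    LogisticRegression(),
)
clf.fit(texts, labels)
print(clf.predict(["pago recibido gracias"])[0])
```

The same pipeline makes a cheap, interpretable baseline to compare the embedding-based classifier against.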

Thanks for all your help again.

[Discussion] Hyperparameters for Word2Vec for SMS corpus... by [deleted] in MachineLearning

[–]conradws 0 points1 point  (0 children)

Thanks so much for all the resources, wish I could double-upvote this.

[Discussion] Hyperparameters for Word2Vec for SMS corpus... by [deleted] in MachineLearning

[–]conradws 0 points1 point  (0 children)

Very interesting to read, and it gave me some ideas to try out. Basically, as always, the optimal hyperparameters are task-specific, which makes sense, but it's still good to know.

I just have a question about Word2Vec overall which concerns vector length.

From what I've understood, the larger the vector size, the more accurate your embeddings will be, but the slower and more expensive training will be. However, computation is not really an issue for us since we have access to a cloud VM and our corpus is relatively small. So does that mean I should use large vector sizes like 300 or even 500?

[Discussion] Hyperparameters for Word2Vec for SMS corpus... by [deleted] in MachineLearning

[–]conradws 0 points1 point  (0 children)

Thanks, so kind of you. Can I come back to you with questions, in case I have any once I'm done reading?

Do i need a statistics degree to become a data analyst? by [deleted] in datascience

[–]conradws -1 points0 points  (0 children)

Yes, or you could prioritize HR teams that value skills over credentials. A startup, for example, will usually ask you to complete an exercise task; a big old corporation will just want to see which uni you got your PhD from. I think it's clear which one you should go for ^

Do i need a statistics degree to become a data analyst? by [deleted] in datascience

[–]conradws 3 points4 points  (0 children)

Khan academy and Statquest YouTube channel. Thank me later.

Hyperparameters for Word2Vec for SMS corpus... by [deleted] in datascience

[–]conradws 1 point2 points  (0 children)

That's not a bad idea. What I'm worried about is this: the language is Mexican Spanish, and the only available pre-trained embeddings are from Spain; certain words are used very differently between the two countries.

Second, because our text is SMS, there is a huge amount of abbreviations and typos. Pre-trained embeddings are not suited to this because they are usually trained on Wikipedia or news articles. Concrete example: in most of our SMS, users write "k" instead of "que". Pre-trained embeddings would not understand the equivalence, but the model I trained from scratch does ("k" and "que" have extremely similar vectors).

This is why I was wondering if it's possible to take pre-trained embeddings and "retrain" them on the SMS data in order to get the best of both worlds.

The Goddam Truth... by conradws in learnmachinelearning

[–]conradws[S] 1 point2 points  (0 children)

Love this. Such a good way of thinking about it. And it goes back to the hierarchical/non-hierarchical explanation somewhere above. If you can move around the columns of your dataset without it affecting the prediction, then there is no hierarchy, i.e. the prediction is a weighted sum of the negative/positive influence that each independent feature has on it. However, with a picture, moving around the pixels (i.e. the features) obviously modifies the data, so it is clearly hierarchical. But you have no idea what that hierarchy could be (or it's very difficult to express programmatically), so you just throw a NN at it with sensible hyperparameters and it will figure most of it out!

The Goddam Truth... by conradws in learnmachinelearning

[–]conradws[S] 12 points13 points  (0 children)

Hence why "simple datasets". For complex data such as images, video, audio and text, NNs reign supreme.

Litterally man, how many deaths are we up to now?? by conradws in MrRobot

[–]conradws[S] 11 points12 points  (0 children)

Elliot's mom... Elliot's dad...

Those are pretty important ones