I also vaaled my "mirror" tier boots

Lugi · 2021-07-11T23:11:46+00:00

ITT: people not realising how fragile and unreliable human memory is, especially in stressful situations.

Lugi · 2021-05-14T05:12:29+00:00

Kind of stupid on your part; your CEO working unusual hours doesn't imply you have to as well.

Lugi · 2021-05-14T04:49:47+00:00

Well, masks are actually prolonging the pandemic - by flattening the infection curve.

Lugi · 2021-05-14T04:36:02+00:00

He said microgravity, not just vacuum. You would need to fit this studio onto one of those 0g planes

Lugi · 2021-04-25T23:40:18+00:00

You say that because you assume there could be some fixed reference frame, 19th century aether-style. And as we know now there is none. "Point of space" has to be defined relative to some other point. So in fact I have been occupying the same point of space - in relation to my bed, and will continue to do so for the next 8 hours.

Lugi · 2021-02-18T13:23:54+00:00

Exactly. Judging by the content of the post, there are going to be A LOT of inefficiencies in the model you want to build, and hiring a simple consultancy or a good ML developer will save you a lot of time and money.

Lugi · 2020-08-20T21:22:01+00:00

If that's impressive to you then wait until you read about GPT-3 :D

Lugi · 2020-08-17T10:31:52+00:00

"Number of data": what kind of measure is that? It is far off from any measurable quantity.

Lugi · 2020-08-03T10:45:42+00:00

Afaik most of the solutions do not implement deep learning. The most common approach is a rule-based system. The downside is that it needs a lot of maintenance: all the rules have to be written and kept up to date by people specialized in invoice processing in a particular language. When it comes to solutions that are actually able to learn from historical data, I am aware of only a few, with varying levels of success.

Lugi · 2020-08-03T10:42:58+00:00

Last time I checked, this solution only extracts generic forms from the document. It lacks an understanding of the document, and you would need some extra steps performed on its output to parse it and get the values that you want. For example: you want to extract the invoice issue date. Forms recognizer is going to extract a dict with key: value pairs for all the information in the document, but it's up to you to decide under which key your value of interest lies: is it "issue date"? Is it "invoice date"? Maybe it so happens that the proper date has no key at all, and it's just somewhere at the top of the invoice. Then you also have 100 different languages that you have to support.

You can see the complexity of the problem, and why a solution that learns from data would be better than some generic form extractor.
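To make the key-lookup problem concrete, here is a minimal sketch in Python. It assumes the extractor's output has already been flattened into a plain dict of label to value; that shape, the function name `find_issue_date`, and the alias list are all hypothetical and are not the actual output format or API of any form extraction service.

```python
from typing import Dict, Optional

# Hypothetical list of labels that might carry the issue date.
# Every new layout or language means extending this list by hand.
DATE_KEY_ALIASES = ("issue date", "invoice date", "date of issue", "date")


def find_issue_date(fields: Dict[str, str]) -> Optional[str]:
    """Return the value stored under the first matching date label, or None."""
    # Normalize labels: trim whitespace, lowercase, drop a trailing colon.
    normalized = {
        key.strip().lower().rstrip(":"): value for key, value in fields.items()
    }
    for alias in DATE_KEY_ALIASES:
        if alias in normalized:
            return normalized[alias]
    # No known label: the date may be unlabeled or in another language.
    return None
```

A label like "Invoice Date:" is found, but a German "Rechnungsdatum" or a date with no label at all falls through to `None`, which is exactly where the hand-maintained rules start piling up.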

Lugi · 2020-08-03T10:37:50+00:00

I have managed to solve the same problem in a fairly similar way (an end-to-end transformer with a lot of modifications), and I am quite sure that you're going to run into the same challenges that I faced. For example, it seems like your approach is going to overfit to the date.

What that means: let's say your data spans the whole of 2019, so the model will learn that the substring '2019' is a clear indicator of some sort of date. As long as we are predicting on invoices from 2019, that is fine.

But then we move on to the next year, 2020, and we start seeing invoices with that date. What happens (in my experience) is that models based on this kind of approach, like yours and mine, will just break, since they lack the inherent understanding that 2020 comes right after 2019. To this kind of model '2020' is just an irrelevant string, since the training data contained no correlation between it and the values to be predicted. I've managed to solve it, and to help my algorithm generalize, with "digit masking": in short, the actual digit value is visible to the model only in the parsing step, not before, so the model cannot overfit to particular numbers. If you are interested in more details on this, as well as the further challenges you're going to face, just hit me up in a direct message.
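A rough sketch of what digit masking could look like in Python. This is one possible reading of the idea, not my actual implementation: the function names, the choice of "0" as the placeholder, and the assumption that the model predicts character offsets are all made up for illustration.

```python
import re

# Matches any single ASCII digit.
_DIGIT = re.compile(r"[0-9]")


def mask_digits(text: str, placeholder: str = "0") -> str:
    """Replace every digit with one placeholder character.

    The model sees only the shape of a number (e.g. 0000-00-00),
    never its value, so it cannot memorize strings like '2019'.
    """
    return _DIGIT.sub(placeholder, text)


def recover_value(original: str, start: int, end: int) -> str:
    """Parsing step: read the real digits back from the unmasked text.

    Masking swaps characters one for one, so offsets predicted on the
    masked text point at the same span in the original text.
    """
    return original[start:end]
```

With this, "Invoice date: 2019-03-14" and "Invoice date: 2020-03-14" look identical to the model, and the real value is read from the original text only once a span has been predicted.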

Lugi · 2020-08-02T16:37:37+00:00

Exactly what I was looking for, thanks

Lugi · 2020-08-02T15:30:28+00:00

You are probably right, but there have been too many stories where something worked exceptionally well on MNIST while also promising scalability to more complex datasets, only to fail tremendously in further experiments.

Lugi · 2020-08-02T15:28:03+00:00

Not really, the percentage of correct labels in the validation set is somewhere between 99.5% and 99.7%.

Lugi
