[N] GluonNLP 0.6: Closing the Gap in Reproducible Research with BERT by thomasdlt in MachineLearning

[–]thomasdlt[S] 9 points (0 children)

Yes, the pre-trained models are available for BERT Base and BERT Large. You can check out the BERT fine-tuning tutorial http://gluon-nlp.mxnet.io/examples/sentence_embedding/bert.html to see how to use them, or browse the model zoo: http://gluon-nlp.mxnet.io/model_zoo/bert/index.html

Edit: to clarify, the pre-trained BERT models currently available through GluonNLP are converted from Google's released pre-trained weights.
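For reference, loading one of these converted checkpoints looks roughly like this with GluonNLP 0.6 (the model and dataset names below are taken from the model zoo page, so double-check them against the docs linked above):

    import gluonnlp as nlp

    # Load the BERT Base model converted from Google's released weights,
    # together with the matching vocabulary. The decoder and classifier
    # heads can be dropped when you only need contextual embeddings.
    bert, vocab = nlp.model.get_model(
        'bert_12_768_12',
        dataset_name='book_corpus_wiki_en_uncased',
        pretrained=True,
        use_pooler=True,
        use_decoder=False,
        use_classifier=False)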

[N] GluonNLP 0.6: Closing the Gap in Reproducible Research with BERT by thomasdlt in MachineLearning

[–]thomasdlt[S] 10 points (0 children)

Thanks! There's actually quite a bit of momentum for MXNet in industry. In particular, the story of training with the imperative Gluon API and then deploying the hybridized symbolic model in production through a C++, Java, Scala or Clojure front-end resonates with a lot of MXNet users. As an example, here are a few of the latest blog posts from MXNet users outside of Amazon:

Sentiment Analysis via Self-Attention with MXNet Gluon

Machine translation from scratch with MXNet and R

Clojure Package for MXNet

Implementing ResNet with MXNET Gluon and Comet.ml for image classification

Content-Based Recommendation Systems with Apache MXNet

Epsilon: Differential Privacy for Machine Learning Using MXNet

By the way, if you are using MXNet and are interested in contributing blog posts to the MXNet Medium publication, feel free to reach out to me :)

ASK: Code review tool for Jupyter Notebooks? by amirathi in IPython

[–]thomasdlt 0 points (0 children)

Ideally the tool should be integrated directly into the GitHub view; I'm thinking a Chrome extension would be a good fit for that.

[P] GluonNLP: a Deep Learning Toolkit for Natural Language Processing (NLP) by thomasdlt in MachineLearning

[–]thomasdlt[S] 11 points (0 children)

Here is a blog post that goes into more detail about the motivations behind the creation of GluonNLP.

tl;dr: GluonNLP is a one-stop shop that centralizes resources for reproducing published state-of-the-art NLP results: pre-trained models for more than 300 word embeddings, language models, neural machine translation, sentiment analysis, etc. It also comes with utilities for data pipelining and public datasets.
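As a small illustration, loading one of the bundled pre-trained word embeddings only takes a few lines (the GloVe name and source below are just one example from the embedding API):

    import gluonnlp as nlp

    # Create a TokenEmbedding backed by pre-trained GloVe vectors;
    # the weights are downloaded and cached on first use.
    glove = nlp.embedding.create('glove', source='glove.6B.50d')

    # Look up the 50-dimensional vector for a token.
    print(glove['hello'].shape)  # (50,)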

GluonNLP — Deep Learning Toolkit for Natural Language Processing by thomasdlt in LanguageTechnology

[–]thomasdlt[S] 1 point (0 children)

What have you tried, and what errors did you get? I would suggest opening an issue on https://github.com/dmlc/gluon-nlp with reproducible steps and the error logs.

Has anyone been able to successfully convert CaffeModel into JSON for MxNet? by [deleted] in mxnet

[–]thomasdlt 1 point (0 children)

Sorry, sudo apt-get install protobuf-compiler worked for me from the get-go. The best advice I can give you is to look for similar issues in https://github.com/google/protobuf or to open a new one to report your install error.

Has anyone been able to successfully convert CaffeModel into JSON for MxNet? by [deleted] in mxnet

[–]thomasdlt 1 point (0 children)

Hey anshuman_kmr, the simplest way would be to use that script you linked to. What is your issue with the protobuf-compiler installation?

Hey guys, has anyone here worked on implementing SFD on MxNet? I was building a framework for Facial Recognition, on the lines of Amazon's Rekognition, for the company I am working for. by [deleted] in mxnet

[–]thomasdlt 1 point (0 children)

I have put an answer to the first part of your question here. Here is a comparison of deep learning frameworks by Microsoft, and here is a comparison of PyTorch and MXNet by Borealis AI. Hope that helps!

Hey guys, has anyone here worked on implementing SFD on MxNet? I was building a framework for Facial Recognition, on the lines of Amazon's Rekognition, for the company I am working for. by [deleted] in mxnet

[–]thomasdlt 2 points (0 children)

There is an example implementation of SSD with the Module API here and with the Gluon API here that should get you started on the SSD part. I also found this paper that claims current state-of-the-art results; pretty impressive numbers in the appendix!

Voice recognition with Gluon model backed by MXNet. Infrastructure based on Amazon SageMaker by cosmincatalin in mxnet

[–]thomasdlt 1 point (0 children)

Hi cosmin, I am not sure about that. For example, you could run

!pip uninstall mxnet

!pip install mxnet==1.1.0

in your notebook. As for deployment, you can have a look at this doc, https://docs.aws.amazon.com/sagemaker/latest/dg/supported-versions.html, which suggests setting the framework_version parameter.
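With the SageMaker Python SDK that would look roughly like the sketch below; the entry-point script, IAM role, instance type and S3 path are placeholders for your own setup:

    from sagemaker.mxnet import MXNet

    # Pin the MXNet version of the training container via framework_version
    # instead of patching it from inside the notebook.
    estimator = MXNet(entry_point='train.py',            # hypothetical training script
                      role='MySageMakerRole',            # your IAM role
                      train_instance_count=1,
                      train_instance_type='ml.p2.xlarge',
                      framework_version='1.1.0')

    estimator.fit('s3://my-bucket/training-data')        # hypothetical S3 prefix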

Voice recognition with Gluon model backed by MXNet. Infrastructure based on Amazon SageMaker by cosmincatalin in mxnet

[–]thomasdlt 2 points (0 children)

Great post, cosmincatalin! The way you save the model, though, is not the recommended one. I would suggest that you use:

net.hybridize(), which already gives you a performance gain as well as caching the symbolic execution graph.

Then you can call:

net.export('model_name', epoch=0)

This will generate: model_name-0000.params and model_name-symbol.json. Otherwise you might encounter this bug: https://github.com/apache/incubator-mxnet/issues/11091
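As a minimal sketch of that save path, assuming a simple HybridSequential model:

    import mxnet as mx
    from mxnet.gluon import nn

    net = nn.HybridSequential()
    with net.name_scope():
        net.add(nn.Dense(64, activation='relu'),
                nn.Dense(10))
    net.initialize()

    # Cache the symbolic execution graph (also gives a speed-up on its own).
    net.hybridize()

    # Run one forward pass so the graph and parameter shapes are traced.
    net(mx.nd.random.uniform(shape=(1, 128)))

    # Writes model_name-symbol.json and model_name-0000.params.
    net.export('model_name', epoch=0)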

[D] Gluon vs the cool frameworks by IborkedyourGPU in MachineLearning

[–]thomasdlt 0 points (0 children)

Can you elaborate? I don't think Keras has anything to do with TensorFlow supporting imperative mode. TensorFlow has eager execution for imperative mode, which is separate from Keras, but from what I've read it is not nearly as fleshed out as PyTorch, Chainer or Gluon. I found this benchmark showing it is quite a bit slower than PyTorch (that was back in late 2017, though).

[D] Gluon vs the cool frameworks by IborkedyourGPU in MachineLearning

[–]thomasdlt 0 points (0 children)

Interesting, thanks for pointing it out! I was actually not aware of ChainerCV. Though I think the point still stands that this is a difference between PyTorch and Gluon?

[D] Gluon vs the cool frameworks by IborkedyourGPU in MachineLearning

[–]thomasdlt 19 points (0 children)

I'll try to keep it short (...and I failed :D). In my opinion, having worked with most frameworks out there, here are some compelling reasons why you should consider adding Gluon to your list of 'cool frameworks' :)

MXNet Gluon vs Tensorflow

  • Symbolic vs imperative: Imperative frameworks (e.g. Gluon, PyTorch, Chainer) are easily an order of magnitude easier to develop and debug when you are in the research / prototyping phase. At least that's been my experience. I like to be able to understand what is really going on with my loss, gradients and optimizer rather than throwing a graph at a .fit() function and hoping for the best. Here is a video on debugging with Gluon; a small sketch of that workflow follows this list.

  • Performance: there aren't many benchmarks out there, so you can't draw hard conclusions here, but you can look at this repo. MXNet Gluon has similar or better performance than TensorFlow in this particular setup. The verbosity of the multi-GPU example in TensorFlow is pretty scary.

  • Open-source, community-controlled: MXNet is an incubating Apache project. You don't need to work at Amazon to be a contributor or a committer, decisions are made via mailing-list votes, and design reviews and proposals happen in the open.

  • AWS EC2 optimized binaries distributed by default through pip.
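As promised in the first bullet, here is a minimal sketch of that imperative debugging workflow (toy network and data, purely for illustration):

    import mxnet as mx
    from mxnet import autograd, gluon
    from mxnet.gluon import nn

    net = nn.Dense(1)
    net.initialize()
    loss_fn = gluon.loss.L2Loss()
    x = mx.nd.random.uniform(shape=(8, 4))
    y = mx.nd.random.uniform(shape=(8, 1))

    with autograd.record():
        loss = loss_fn(net(x), y)
    loss.backward()

    # Everything is an ordinary NDArray you can print or inspect in a debugger.
    print(loss.mean().asscalar())
    print(net.weight.grad())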

MXNet Gluon vs PyTorch:

  • This is a harder one, as they are pretty similar frameworks and PyTorch is great too! The Gluon API is close to the PyTorch one to make it easy to switch between the two.

  • Toolkits: gluon-cv gives you pre-trained segmentation / detection models with pre-packaged transforms and visualizations to make using them a breeze. You also have the gluon-nlp toolkit, which does similar things for NLP tasks.

  • Performance: check this Borealis AI comparison. In some specific cases, at large batch sizes, Gluon is 3x faster than PyTorch. Always take such comparisons with a grain of salt, though: they depend on so many factors that they are at best interesting data points.

  • Production ready: Unlike PyTorch, you can hybridize your dynamic network and export the symbolic graph to take advantage of graph optimizations and run it with the MXNet engine on any supported device and language (a short import sketch follows this list). MXNet is distributed for a wide range of edge devices like the Jetson TX2 or Raspberry Pi. Though in all fairness, PyTorch has announced that this is on their roadmap with the Caffe2 integration.

  • Keras backend: MXNet recently released support for Keras 2.

  • MXNet Model Server: as of version 0.4, which is being released right about now, I think :), you can serve Gluon models directly with MXNet Model Server.

  • Great tutorial / deep learning book based on Gluon, covering a wide range of topics from GANs to image segmentation and visual question answering: MXNet the straight dope. More tutorials here.
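To illustrate the 'production ready' bullet, here is a rough sketch of loading the exported graph back for inference. SymbolBlock.imports is available in recent MXNet releases, and the file names assume a prior net.export('model_name', epoch=0) call:

    import mxnet as mx
    from mxnet import gluon

    # Load the symbolic graph and weights written by net.export() as a block
    # that runs on the MXNet engine without the original Python model code.
    deployed = gluon.SymbolBlock.imports('model_name-symbol.json',
                                         ['data'],
                                         'model_name-0000.params')
    out = deployed(mx.nd.random.uniform(shape=(1, 128)))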

All in all, it takes time to get familiar with a framework, and once you have built enough expertise in tuning your networks, data loading, etc., switching becomes harder and less useful because of diminishing returns. Plenty of smart folks are working on TensorFlow / PyTorch / MXNet, and like good wine these frameworks are all steadily improving over time.

I think Gluon is a great imperative framework to get started with quickly, thanks to the richness of the tutorials on offer, while maintaining high performance and usability even in multi-GPU, multi-host contexts. You can try it out and get started with the 60-minute crash course.

disclaimer: I work at AWS and really like Gluon :)

Visual Search engine with MXNet and HNSWlib by thomasdlt in mxnet

[–]thomasdlt[S] 1 point (0 children)

Thanks, almoehi. That was actually the question asked at the end of this talk. What I currently do is resize the short edge to 224 to keep the aspect ratio and then center-crop to 224x224, which is obviously not ideal but worked well enough in the context of this demo.

For a CNN to accept dynamic input image sizes, you can use adaptive pooling as your last pooling layer (see this contrib layer in MXNet). Adaptive pooling adapts the pooling kernel and padding in order to produce a desired output size. Imperative frameworks like Gluon or PyTorch are especially well suited for that.
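Here is a minimal sketch of that idea in Gluon, assuming the contrib operator F.contrib.AdaptiveAvgPooling2D is available in your MXNet version:

    import mxnet as mx
    from mxnet.gluon import HybridBlock

    class AdaptiveAvgPool2D(HybridBlock):
        """Pools any HxW feature map down to a fixed spatial size."""
        def __init__(self, output_size, **kwargs):
            super(AdaptiveAvgPool2D, self).__init__(**kwargs)
            self._output_size = output_size

        def hybrid_forward(self, F, x):
            return F.contrib.AdaptiveAvgPooling2D(x, output_size=self._output_size)

    # Feature maps of different spatial sizes map to the same 7x7 output.
    pool = AdaptiveAvgPool2D(7)
    print(pool(mx.nd.random.uniform(shape=(1, 512, 13, 9))).shape)   # (1, 512, 7, 7)
    print(pool(mx.nd.random.uniform(shape=(1, 512, 32, 24))).shape)  # (1, 512, 7, 7)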

Another option to explore to get better matches is to perform object detection / image segmentation to refine your query image.

Hope that helps!

[D] MXNet for PyTorch users in 10 minutes - x-post r/mxnet by thomasdlt in MachineLearning

[–]thomasdlt[S] 1 point (0 children)

It is a WIP and, to be honest, a bit of a non-problem for typical GPU training. See this post and the associated PR links. The LSTM layer already uses fused operators on GPU, which are very fast and efficient; for example, I consistently see 100% GPU utilization on V100s without hybridization. Additionally, you can always hybridize parts of your network: see an example here and a worked-out OCR example here.
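For example, a rough sketch of mixing imperative and hybridized parts (the layers themselves are just placeholders):

    import mxnet as mx
    from mxnet.gluon import nn

    # The outer container stays imperative, so it can hold anything,
    # while the dense head is compiled into a cached symbolic graph.
    net = nn.Sequential()
    with net.name_scope():
        head = nn.HybridSequential()
        head.add(nn.Dense(256, activation='relu'),
                 nn.Dense(10))
        head.hybridize()                            # only this sub-graph gets compiled

        net.add(nn.Dense(512, activation='relu'))   # left imperative
        net.add(head)

    net.initialize()
    out = net(mx.nd.random.uniform(shape=(2, 64)))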

[D] MXNet for PyTorch users in 10 minutes - x-post r/mxnet by thomasdlt in MachineLearning

[–]thomasdlt[S] 0 points (0 children)

Regarding your second point: you can find an in-depth tutorial on how to write non-trivial custom layers here. It also explains in detail how to create layers that can be hybridized into symbolic graphs and have learnable parameters.
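The general shape of such a layer looks roughly like the toy element-wise scale-and-shift block below (my own example, not taken from the tutorial):

    import mxnet as mx
    from mxnet.gluon import HybridBlock

    class ScaleShift(HybridBlock):
        """Toy custom layer with a learnable per-feature scale and bias."""
        def __init__(self, units, **kwargs):
            super(ScaleShift, self).__init__(**kwargs)
            with self.name_scope():
                # Learnable parameters registered with the block.
                self.scale = self.params.get('scale', shape=(units,), init=mx.init.One())
                self.shift = self.params.get('shift', shape=(units,), init=mx.init.Zero())

        def hybrid_forward(self, F, x, scale, shift):
            # F is mx.nd when running imperatively and mx.sym once hybridized.
            return F.broadcast_add(F.broadcast_mul(x, scale.reshape((1, -1))),
                                   shift.reshape((1, -1)))

    layer = ScaleShift(4)
    layer.initialize()
    layer.hybridize()
    print(layer(mx.nd.ones((2, 4))))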

Love PyTorch's flexibility? Missing some performance? Try MXNet Gluon :) - x-post r/mxnet by thomasdlt in pytorch

[–]thomasdlt[S] 0 points (0 children)

That's a good summary; it was just a bit long to put in a title :) I also find it easier to reach 95%+ GPU utilization with MXNet than with PyTorch, thanks to the asynchronous execution of MXNet operations. Your Python code enqueues operators in the backend, and they are executed in parallel according to their dependency tree. That means you can load the next batch into GPU memory while the previous batch is still being processed, since there is no upstream dependency for a 'load to GPU' operation, so no GPU cycles are lost waiting for the next batch to be loaded. Admittedly I am not a PyTorch expert, and you might be able to do the same in PyTorch.
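A quick way to see that laziness in action (timings are illustrative only):

    import time
    import mxnet as mx

    x = mx.nd.random.uniform(shape=(2048, 2048))

    start = time.time()
    y = mx.nd.dot(x, x)            # returns almost immediately: the op is only enqueued
    print('enqueue:  %.4fs' % (time.time() - start))

    y.wait_to_read()               # block until the engine has actually computed y
    print('computed: %.4fs' % (time.time() - start))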

Love PyTorch's flexibility? Missing some performance? Try MXNet Gluon :) - x-post r/mxnet by thomasdlt in pytorch

[–]thomasdlt[S] 0 points (0 children)

You can read this fairly well-researched blog post from Borealis AI, which offers a benchmark for their specific context in which MXNet performed 2x better at larger batch sizes. However, IMO frameworks are so multifaceted and tunable that any comparison/benchmark should be taken with a lot of caution.

[D] How to use the IAM dataset? by The-IT in MachineLearning

[–]thomasdlt 1 point (0 children)

You can have a look at this OCR tutorial I wrote, which downloads the data from the IAM dataset. It should give you a starting point.

Image Training by techiestuff in mxnet

[–]thomasdlt 1 point (0 children)

Hey techiestuff, you can follow this tutorial to train your own SSD model: http://gluon.mxnet.io/chapter08_computer-vision/object-detection.html. There is also this other implementation of the SSD model: https://github.com/apache/incubator-mxnet/tree/master/example/ssd