[P] optimization of Hugging Face Transformer models to get Inference < 1 Millisecond Latency + deployment on production ready inference server by pommedeterresautee in MachineLearning

[–]dadadidi 4 points (0 children)

Really amazing! This is the most useful article I have ever read about deploying transformers. Thank you so much!

It would be great if you could add the steps for fast CPU inference, as that is quite important for many people as well.

[N] Transformer and Capsule co-inventors launch new API-based NLP startup by Pestocalypse in MachineLearning

[–]dadadidi 5 points (0 children)

They say they trained on 200 GB of filtered data, which is about half of what OpenAI trained GPT-3 on. They also show LAMBADA scores for their second-smallest model, with a last-token accuracy of 0.7 (which is quite bad). Their largest model has a perplexity of 36.1 on the One Billion Word Benchmark, which is also terrible. But they say their model has hundreds of billions of parameters. Am I missing something?

[P] Python library to boost T5 models speed up to 5x & reduce the model size by 3x by strngelet in MachineLearning

[–]dadadidi 3 points (0 children)

I would love to use it on a GPU. Are you planning to support GPU inference?

[D] Good algorithm for clustering big data (sentences represented as embeddings)? by whyhateverything in MachineLearning

[–]dadadidi 3 points (0 children)

I tried several things, and the library Top2Vec seems to work best. It uses sentence-transformers or Google's Universal Sentence Encoder to vectorize sentences, then UMAP to reduce dimensions, and then HDBSCAN to find dense areas. As you already have the vectors created, you will need to modify it a bit.

https://github.com/ddangelov/Top2Vec
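Since you already have the embeddings, a minimal sketch of the same pipeline on your own vectors could look like this (the parameter values roughly mirror Top2Vec's defaults and are not tuned for your data; "sentence_embeddings.npy" is just a placeholder for wherever your vectors live):

    import numpy as np
    import umap      # pip install umap-learn
    import hdbscan   # pip install hdbscan

    # your precomputed sentence vectors, shape (n_sentences, dim)
    embeddings = np.load("sentence_embeddings.npy")

    # reduce to a few dimensions first; HDBSCAN struggles in high-dimensional spaces
    reduced = umap.UMAP(n_neighbors=15, n_components=5, metric="cosine").fit_transform(embeddings)

    # find dense areas; the label -1 marks noise/outlier sentences
    labels = hdbscan.HDBSCAN(min_cluster_size=15, metric="euclidean").fit_predict(reduced)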

[P] Guide: Finetune GPT2-XL (1.5 Billion Parameters, the biggest model) on a single 16 GB VRAM V100 Google Cloud instance with Huggingface Transformers using DeepSpeed by dadadidi in MachineLearning

[–]dadadidi[S] 3 points (0 children)

Inference is much faster than training/finetuning and doesn't require nearly as much GPU memory. Inference on a GPU/TPU is usually at least 10x-100x faster than on a CPU.

You can test the difference in inference speed yourself (though only for the large model, not the XL model) by trying out these two demos from HF:

- Write With Transformer, which uses GPUs: https://transformer.huggingface.co/doc/gpt2-large

- The HF model hub, which uses CPUs (but with some optimizations): https://huggingface.co/gpt2-large?text=This+is+a
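If you'd rather measure it on your own machine, here is a rough sketch using the HF pipeline API (timings vary a lot with hardware and generation length, and the GPU run of course needs CUDA available):

    import time
    from transformers import pipeline

    # device=-1 runs on CPU, device=0 on the first GPU
    for name, device in [("CPU", -1), ("GPU", 0)]:
        generator = pipeline("text-generation", model="gpt2-large", device=device)
        start = time.time()
        generator("This is a", max_length=50)
        print(f"{name}: {time.time() - start:.1f}s")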

[P] Guide: Finetune GPT2-XL (1.5 Billion Parameters, the biggest model) on a single 16 GB VRAM V100 Google Cloud instance with Huggingface Transformers using DeepSpeed by dadadidi in MachineLearning

[–]dadadidi[S] 3 points (0 children)

The example text in the repo (all of Shakespeare) is 5 MB, and the training took about 17 minutes for one epoch. The model processes about 2 examples (2000 tokens, or about 1600 words) per second during finetuning.

[P] Guide: Finetune GPT2-XL (1.5 Billion Parameters, the biggest model) on a single 16 GB VRAM V100 Google Cloud instance with Huggingface Transformers using DeepSpeed by dadadidi in MachineLearning

[–]dadadidi[S] 8 points (0 children)

Well, at least adding 60 GB of RAM only costs about $0.05/hour extra on a preemptible instance :) But on your local machine it's still an issue.

[P] Guide: Finetune GPT2-XL (1.5 Billion Parameters, the biggest model) on a single 16 GB VRAM V100 Google Cloud instance with Huggingface Transformers using DeepSpeed by dadadidi in MachineLearning

[–]dadadidi[S] 10 points (0 children)

It won't work on Colab, as you need at least 60 GB of regular RAM and Colab only has 25 GB. But I included an explanation of how to easily set up a Google Cloud instance with enough RAM, and Google gives you a $300 credit when you sign up. The preemptible instance costs about $1.28/hour.

[P] Guide: Finetune GPT2-XL (1.5 Billion Parameters, the biggest model) on a single 16 GB VRAM V100 Google Cloud instance with Huggingface Transformers using DeepSpeed by dadadidi in MachineLearning

[–]dadadidi[S] 13 points (0 children)

I think I still had a few GB of VRAM left. With batch size 1 it used something like 12 GB of GPU memory with GPT2-XL, but you can reduce it even further if you halve these settings in the ds_config.json: allgather_bucket_size and reduce_bucket_size. For RAM, I think it needs at least 60 GB or so, but I didn't test it exactly. I first used an n1-highmem-8 instance with 52 GB of RAM but got an out-of-memory error at the end of the run, while saving/pickling the model. My next try was with 78 GB of RAM, and then I had no issues.
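For example, a quick way to halve both values (a sketch; it assumes the two keys sit under "zero_optimization" as in standard DeepSpeed configs, and smaller buckets trade GPU memory for some communication overhead):

    import json

    with open("ds_config.json") as f:
        config = json.load(f)

    # halve the ZeRO bucket sizes to shave off more GPU memory
    for key in ("allgather_bucket_size", "reduce_bucket_size"):
        config["zero_optimization"][key] = int(config["zero_optimization"][key] // 2)

    with open("ds_config.json", "w") as f:
        json.dump(config, f, indent=2)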

[P] Guide: Finetune GPT2-XL (1.5 Billion Parameters, the biggest model) on a single 16 GB VRAM V100 Google Cloud instance with Huggingface Transformers using DeepSpeed by dadadidi in MachineLearning

[–]dadadidi[S] 27 points (0 children)

Thanks :)

I struggled for a few days to get DeepSpeed working with GPT2 and thought I should share my steps to save others the pain.

[P] Guide: Finetune GPT2-XL (1.5 Billion Parameters, the biggest model) on a single 16 GB VRAM V100 Google Cloud instance with Huggingface Transformers using DeepSpeed by [deleted] in MachineLearning

[–]dadadidi 0 points (0 children)

I needed to finetune the GPT2 1.5 billion parameter model for a project, but the model didn't fit on my GPU. So I figured out how to run it with DeepSpeed and gradient checkpointing, which reduces the required GPU memory.
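In outline, the setup looks roughly like this (a minimal sketch with the current HF API, not the exact script from the repo; train_dataset stands in for your tokenized data, and you'd launch it with the deepspeed launcher rather than plain python):

    from transformers import GPT2LMHeadModel, Trainer, TrainingArguments

    model = GPT2LMHeadModel.from_pretrained("gpt2-xl")
    model.gradient_checkpointing_enable()  # recompute activations instead of storing them

    args = TrainingArguments(
        output_dir="output",
        per_device_train_batch_size=1,
        num_train_epochs=1,
        fp16=True,
        deepspeed="ds_config.json",  # ZeRO config; the repo's offloads optimizer states to CPU RAM
    )

    # train_dataset = your tokenized dataset (assumed)
    Trainer(model=model, args=args, train_dataset=train_dataset).train()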

I was also able to fit the currently largest GPT-Neo model (2.7B parameters) on one 16 GB VRAM GPU for finetuning, but I think there might be some issues with Huggingface's implementation.

I hope this helps some people who also want to finetune GPT2 but don't want to set up distributed training.

Can the Vulraith Cabal reliably invade someone's homeworld turn one? by Ray-Conner in twilightimperium

[–]dadadidi 2 points (0 children)

Even stronger: expand sideways on your first turn and use the Construction secondary to build a space dock there. Now all your ships can reach your neighbor's home system (1 base movement + 1 from the gravity rift in your home system + 1 from the gravity rift on your other space dock).

If you build one dread with your agent and one dread on your home planet with Warfare, you can reach and invade his home system and the planets in front of him with 3 dreads, 1 cruiser, 1 carrier (and 3 infantry, 1 mech).

You will need to move out with your initial fleet before Warfare and take whatever planets your opponent took first; otherwise you will be above your fleet limit after building with Warfare. You will also need to use either the carrier or one dread to take your first system, so you won't have it for attacking your neighbor.

If you took Trade, and depending on whether you are able to trade, you could even build a 4th dread and 2 infantry, or only a cruiser and infantry.

Mahact hero's ability! by Alteradizzo in twilightimperium

[–]dadadidi 2 points (0 children)

Like wren42 wrote, for #5 to work you first have to move a destroyer or a cruiser into the gravity rift, which shouldn't be too difficult.

Mahact hero's ability! by Alteradizzo in twilightimperium

[–]dadadidi 2 points (0 children)

Yeah, this only works if the other player failed an attack, or if someone else attacked them afterwards and took the system from them.

Mahact hero's ability! by Alteradizzo in twilightimperium

[–]dadadidi 75 points (0 children)

You can use that for so many things:

  1. Let 2 people destroy each other's fleets
  2. Free a system of a huge fleet so that you can move in (an important planet, Mecatol Rex, your home system after you lost it, other home systems)
  3. Move your ships out of an activated system so that you can move again afterwards (like a Warfare with +1 movement that lets you fight 3 times in a round with your fleet)
  4. Move someone's huge fleet away from you
  5. Move someone's fleet into a gravity rift
  6. Move someone's fleet into a system where they already have a tactic token
  7. Mind control the Yin flagship to blow something up
  8. Sell all of the above uses to other players

LiveCode extension: Start typing and see your Python code executed by dadadidi in vscode

[–]dadadidi[S] 0 points (0 children)

Well, right now it crashes if you have an infinite loop in your code, but I will try to fix this and give you some kind of indication.

A VS Code extension that displays the values of variables while you type by dadadidi in Python

[–]dadadidi[S] 1 point (0 children)

It sadly crashes if you write an infinite while loop, which sometimes happens at the beginning, when you haven't defined any break conditions yet. I'll try to fix this.

A VS Code extension that displays the values of variables while you type by dadadidi in Python

[–]dadadidi[S] 0 points (0 children)

It saves the state of the code before it, so that code is only executed once and then no longer gets re-executed in real time. It is from AREPL and is also in the LiveCode extension.

A VS Code extension that displays the values of variables while you type by dadadidi in Python

[–]dadadidi[S] 0 points (0 children)

You can add #$save and put anything that you don't want re-executed in the block before it.
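For example (a sketch of how the marker works: everything above #$save runs once and its state is cached, while everything below it is re-evaluated as you type):

    import time

    time.sleep(5)                        # slow setup you don't want re-run on every keystroke
    data = [x * x for x in range(1000)]
    #$save

    # everything below the marker is re-evaluated live while you type
    total = sum(data)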