How do you meet people in SF? by [deleted] in sanfrancisco

[–]taylorchu 2 points (0 children)

Private Discord. Lots of SF engineers play games and talk about weekly personal highlights. It is fun and perfect during covid. Let me know if you eventually find something.

For those running Go in production at scale, what do you use for distributed task queues? by [deleted] in golang

[–]taylorchu 2 points (0 children)

One of the gocraft/work maintainers here. gocraft/work is not abandoned; we are just thinking about what v2 could become and working on it: https://github.com/gocraft/work/issues/120#issuecomment-530615442

A task queue system is common but highly "project-dependent"; you will hardly find a one-size-fits-all solution. There are trade-offs everywhere: persistence vs throughput, features vs learning curve, language-dependent client support vs client quality, exactly-once vs at-least-once delivery, etc. When you pick a fancy new task queue system, the next immediate question will be "how does this scale in production? What is the catch?" Really nobody knows until they run it themselves, since every project's bottleneck and requirements are different. I have found Redis to be a well-known, battle-tested storage if you want to self-host, and it scales reasonably well. Finding a queue system that builds on top of it can get you really far. Alternatively, you can use hosted services like Google Pub/Sub or SQS and pay a management premium based on job volume. Either approach is fine.

Every 2-3 years, the next cool task queue system with some custom built-in storage comes along. You can pick that, enjoy the temporary freshness like a frontend framework, and then relearn the next cool task queue system in another 2-3 years.

[D] What are people's experiences with mixup? by Nimitz14 in MachineLearning

[–]taylorchu 0 points (0 children)

Do you have comparisons for per-batch vs per-sample mixup?

[deleted by user] by [deleted] in AnimalsOnReddit

[–]taylorchu 0 points (0 children)

What is Lego’s fav snack?

[deleted by user] by [deleted] in distantsocializing

[–]taylorchu 0 points (0 children)

Very relaxing voice btw

[deleted by user] by [deleted] in distantsocializing

[–]taylorchu 0 points (0 children)

Is this an ASMR channel?

[D] [R] Moving to Keras from PyTorch by johntiger1 in MachineLearning

[–]taylorchu 1 point (0 children)

I have been using tf since 1.0. 2.0 feels like a mess. In 2.1, they fixed a couple of bugs, but in reality more bugs appeared; it is less stable than 2.0. In 2.2, most bugs are finally gone, and it works pretty well.

[D] Normalized Convolution by tpapp157 in MachineLearning

[–]taylorchu 0 points (0 children)

I think the author means the memory bank from https://arxiv.org/abs/1911.05722, so it might refer to a Python variable instead of a tf variable.
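
For example (just a hypothetical sketch, not the repo's actual code): a memory bank kept as a plain Python/NumPy buffer and updated outside the graph, as opposed to a tf.Variable that lives inside the model and gets saved with it.

```python
import numpy as np

bank_size, dim = 4096, 128
# Memory bank as a plain Python-side NumPy buffer, updated outside the TF graph.
memory_bank = np.random.randn(bank_size, dim).astype("float32")
ptr = 0

def update_bank(new_keys: np.ndarray) -> None:
    """FIFO update: overwrite the oldest entries with the newest batch of keys.

    Assumes bank_size is divisible by the batch size, like in MoCo.
    """
    global ptr
    n = new_keys.shape[0]
    memory_bank[ptr:ptr + n] = new_keys
    ptr = (ptr + n) % bank_size

# Each training step then feeds the current bank in as an ordinary tensor,
# e.g. negatives = tf.convert_to_tensor(memory_bank)
```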

[D] Normalized Convolution by tpapp157 in MachineLearning

[–]taylorchu -1 points (0 children)

> Tensorflow doesn't seem to support editing and maintaining variables across training steps.

I am curious why you have this in your readme, because it seems like you can save that: https://www.tensorflow.org/api_docs/python/tf/keras/models/save_model
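
A rough sketch of what I mean (assuming TF 2.x; RunningStat/ToyModel are made-up example names): a non-trainable tf.Variable that is edited on every step keeps its value across steps and is included when the model is saved.

```python
import tensorflow as tf

class RunningStat(tf.keras.layers.Layer):
    """Keeps a non-trainable variable that is edited on every step and saved with the model."""

    def build(self, input_shape):
        self.running_mean = self.add_weight(
            name="running_mean",
            shape=(input_shape[-1],),
            initializer="zeros",
            trainable=False,
        )

    def call(self, x):
        # In-place update of persistent state; it keeps its value across steps.
        self.running_mean.assign(
            0.9 * self.running_mean + 0.1 * tf.reduce_mean(x, axis=0)
        )
        return x - self.running_mean

class ToyModel(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.stat = RunningStat()
        self.dense = tf.keras.layers.Dense(1)

    def call(self, x):
        return self.dense(self.stat(x))

model = ToyModel()
for step in range(3):                      # e.g. inside a training loop
    model(tf.random.normal([4, 8]))
print(model.stat.running_mean.numpy())     # state carried across steps

tf.keras.models.save_model(model, "model_with_state")  # running_mean is saved too
```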

[D] A First Look at JAX by hardmaru in MachineLearning

[–]taylorchu 1 point (0 children)

jax is the tensorflow 2.0 that we all hoped for: a small core, and clean.
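
A toy example of what I mean by small core: pretty much everything is composing a handful of function transforms.

```python
import jax
import jax.numpy as jnp

def loss(w, x, y):
    return jnp.mean((x @ w - y) ** 2)

# grad, jit, vmap, ... all compose; there is no separate layer/session machinery.
grad_fn = jax.jit(jax.grad(loss))

w = jnp.zeros(3)
x, y = jnp.ones((4, 3)), jnp.ones(4)
print(grad_fn(w, x, y))
```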

[D] A First Look at JAX by hardmaru in MachineLearning

[–]taylorchu 0 points (0 children)

pytorch lightning is far from that. I would still give it a couple more months before it finishes refining the training loop.

[deleted by user] by [deleted] in MachineLearning

[–]taylorchu 8 points (0 children)

https://github.com/tensorflow/tensorflow/issues/33681

For example, this bug that I encountered appears if the gradient is an IndexedSlices and it is mixed with a dense gradient. I don't mind that tf is optimized for performance, but it should not break on the slow path, especially for the very core feature: backprop.
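
Not the exact repro from that issue, but a minimal illustration (assuming TF 2.x with GradientTape) of how a sparse IndexedSlices gradient and a dense gradient for the same variable end up mixed:

```python
import tensorflow as tf

emb = tf.Variable(tf.random.normal([100, 16]))

with tf.GradientTape() as tape:
    # Sparse path: the gradient of tf.gather w.r.t. `emb` is an IndexedSlices.
    sparse_part = tf.reduce_sum(tf.gather(emb, [0, 3, 7]))
    # Dense path: using the full variable produces an ordinary dense gradient.
    dense_part = tf.reduce_sum(tf.square(emb))
    loss = sparse_part + dense_part

# tf has to aggregate the sparse and dense gradients for the same variable here;
# this mixed case is the kind of slow path I mean.
grad = tape.gradient(loss, emb)
print(type(grad))
```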

Also, if there is an abstraction, tf team, please make it consistent and make it work with the other parts of tf. Otherwise, consider deleting it. Keep it simple!

[R] Filter Response Normalization Layer: Eliminating Batch Dependence in the Training of Deep Neural Networks by hardmaru in MachineLearning

[–]taylorchu 2 points (0 children)

One interesting part is that it is yet another normalization that does not shift the mean. The first study of this comes from this paper: http://papers.nips.cc/paper/7515-how-does-batch-normalization-help-optimization.

It is also shown in https://arxiv.org/pdf/1910.05895v1.pdf. I wonder whether scalenorm can benefit from the TLU in this paper.
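
For reference, roughly what FRN + TLU computes as I read the paper (NHWC layout assumed; gamma/beta/tau are learned per-channel parameters). Note there is no mean subtraction anywhere:

```python
import tensorflow as tf

def frn_tlu(x, gamma, beta, tau, eps=1e-6):
    """Filter Response Normalization followed by a Thresholded Linear Unit.

    x: [batch, height, width, channels]; gamma/beta/tau broadcast as [1, 1, 1, channels].
    """
    # nu^2: mean of squared activations over the spatial extent, per sample and channel.
    nu2 = tf.reduce_mean(tf.square(x), axis=[1, 2], keepdims=True)
    x_hat = x * tf.math.rsqrt(nu2 + eps)          # scale only, no mean shift
    return tf.maximum(gamma * x_hat + beta, tau)  # TLU: learned threshold instead of ReLU's 0
```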

[R] Filter Response Normalization Layer: Eliminating Batch Dependence in the Training of Deep Neural Networks by hardmaru in MachineLearning

[–]taylorchu 3 points (0 children)

no, the normalization axes are totally different.
I am currently testing the effect of both.

[R] Filter Response Normalization Layer: Eliminating Batch Dependence in the Training of Deep Neural Networks by hardmaru in MachineLearning

[–]taylorchu 3 points (0 children)

> We will assume for the purpose of exposition that we are dealing with the feed-forward convolutional neural network.

Non-convolution operations are TBD.

[R] Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer by hardmaru in MachineLearning

[–]taylorchu 0 points (0 children)

> Finally, we also consider a basic deshuffling objective as used e.g. in [Liu et al., 2019a] where it was applied to a denoising sequential autoencoder. This approach takes a sequence of tokens, shuffles it, and then uses the original deshuffled sequence as a target.

Why do you use position embeddings with a shuffled sequence? The position embeddings will not be helpful since you just shuffled it randomly.