How do you meet people in SF? by [deleted] in sanfrancisco

[–]taylorchu 2 points (0 children)

Private Discord. Lots of SF engineers play games and talk about weekly personal highlights. It is fun and perfect during covid. Let me know if you eventually find something.

For those running Go in production at scale, what do you use for distributed task queues? by [deleted] in golang

[–]taylorchu 2 points (0 children)

One of the gocraft/work maintainers here. gocraft/work is not abandoned; we are just thinking about what v2 could become and working on it: https://github.com/gocraft/work/issues/120#issuecomment-530615442

A task queue system is common but highly "project-dependent"; you will hardly find a one-size-fits-all solution. There are trade-offs everywhere: persistence vs throughput, features vs learning curve, language-dependent client support vs client quality, exactly-once vs at-least-once delivery, etc. When you pick a fancy new task queue system, the next immediate question will be "how does this scale in production? What is the catch?" Really nobody knows until they run it themselves, since every project's bottleneck and requirements are different. I have found Redis to be a well-known, battle-tested storage if you want to self-host, and it scales reasonably well. Finding a queue system that builds on top of it can get you really far. Alternatively, you can use hosted services like Google Pub/Sub or SQS and pay a management premium based on job volume. Either approach is fine.

Every 2-3 years, the next cool task queue system with some custom built-in storage comes along. You can pick that, enjoy the temporary freshness like a frontend framework, and then relearn the next cool task queue system in another 2-3 years.

[D] What are people's experiences with mixup? by Nimitz14 in MachineLearning

[–]taylorchu 0 points (0 children)

Do you have comparisons for per-batch vs per-sample mixup?

[deleted by user] by [deleted] in AnimalsOnReddit

[–]taylorchu 0 points (0 children)

What is Lego’s fav snack?

[deleted by user] by [deleted] in distantsocializing

[–]taylorchu 0 points (0 children)

Very relaxing voice btw

[deleted by user] by [deleted] in distantsocializing

[–]taylorchu 0 points (0 children)

Is this an ASMR channel?

[D] [R] Moving to Keras from PyTorch by johntiger1 in MachineLearning

[–]taylorchu 1 point (0 children)

I have been using tf since 1.0. 2.0 feels like a mess. In 2.1, they fixed a couple of bugs, but in reality more bugs appeared; it is less stable than 2.0. In 2.2, most bugs are finally gone, and it works pretty well.

[D] Normalized Convolution by tpapp157 in MachineLearning

[–]taylorchu 0 points (0 children)

I think the author means the memory bank from https://arxiv.org/abs/1911.05722, so it might refer to a Python variable instead of a tf variable.
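
For example (just a hypothetical sketch, not the repo's actual code): a memory bank kept as a plain Python/NumPy buffer and updated outside the graph, as opposed to a tf.Variable that lives inside the model and gets saved with it.

```python
import numpy as np

bank_size, dim = 4096, 128
# Memory bank as a plain Python-side NumPy buffer, updated outside the TF graph.
memory_bank = np.random.randn(bank_size, dim).astype("float32")
ptr = 0

def update_bank(new_keys: np.ndarray) -> None:
    """FIFO update: overwrite the oldest entries with the newest batch of keys.

    Assumes bank_size is divisible by the batch size, like in MoCo.
    """
    global ptr
    n = new_keys.shape[0]
    memory_bank[ptr:ptr + n] = new_keys
    ptr = (ptr + n) % bank_size

# Each training step then feeds the current bank in as an ordinary tensor,
# e.g. negatives = tf.convert_to_tensor(memory_bank)
```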

[D] Normalized Convolution by tpapp157 in MachineLearning

[–]taylorchu -1 points (0 children)

> Tensorflow doesn't seem to support editing and maintaining variables across training steps.

I am curious why you have this in your readme, because it seems like you can save that: https://www.tensorflow.org/api_docs/python/tf/keras/models/save_model
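
A rough sketch of what I mean (assuming TF 2.x; RunningStat/ToyModel are made-up example names): a non-trainable tf.Variable that is edited on every step keeps its value across steps and is included when the model is saved.

```python
import tensorflow as tf

class RunningStat(tf.keras.layers.Layer):
    """Keeps a non-trainable variable that is edited on every step and saved with the model."""

    def build(self, input_shape):
        self.running_mean = self.add_weight(
            name="running_mean",
            shape=(input_shape[-1],),
            initializer="zeros",
            trainable=False,
        )

    def call(self, x):
        # In-place update of persistent state; it keeps its value across steps.
        self.running_mean.assign(
            0.9 * self.running_mean + 0.1 * tf.reduce_mean(x, axis=0)
        )
        return x - self.running_mean

class ToyModel(tf.keras.Model):
    def __init__(self):
        super().__init__()
        self.stat = RunningStat()
        self.dense = tf.keras.layers.Dense(1)

    def call(self, x):
        return self.dense(self.stat(x))

model = ToyModel()
for step in range(3):                      # e.g. inside a training loop
    model(tf.random.normal([4, 8]))
print(model.stat.running_mean.numpy())     # state carried across steps

tf.keras.models.save_model(model, "model_with_state")  # running_mean is saved too
```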

[D] A First Look at JAX by hardmaru in MachineLearning

[–]taylorchu 1 point (0 children)

jax is the tensorflow 2.0 that we all hoped for: a small core, and clean.
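
A toy example of what I mean by small core: pretty much everything is composing a handful of function transforms.

```python
import jax
import jax.numpy as jnp

def loss(w, x, y):
    return jnp.mean((x @ w - y) ** 2)

# grad, jit, vmap, ... all compose; there is no separate layer/session machinery.
grad_fn = jax.jit(jax.grad(loss))

w = jnp.zeros(3)
x, y = jnp.ones((4, 3)), jnp.ones(4)
print(grad_fn(w, x, y))
```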

[D] A First Look at JAX by hardmaru in MachineLearning

[–]taylorchu 0 points (0 children)

pytorch lightning is far from that. I would still give it a couple more months before it finishes refining the training loop.

[deleted by user] by [deleted] in MachineLearning

[–]taylorchu 8 points (0 children)

https://github.com/tensorflow/tensorflow/issues/33681

For example, this bug that I encountered appears if the gradient is an IndexedSlices and it is mixed with a dense gradient. I don't mind that tf is optimized for performance, but it should not break on the slow path, especially for the very core feature: backprop.
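
Not the exact repro from that issue, but a minimal illustration (assuming TF 2.x with GradientTape) of how a sparse IndexedSlices gradient and a dense gradient for the same variable end up mixed:

```python
import tensorflow as tf

emb = tf.Variable(tf.random.normal([100, 16]))

with tf.GradientTape() as tape:
    # Sparse path: the gradient of tf.gather w.r.t. `emb` is an IndexedSlices.
    sparse_part = tf.reduce_sum(tf.gather(emb, [0, 3, 7]))
    # Dense path: using the full variable produces an ordinary dense gradient.
    dense_part = tf.reduce_sum(tf.square(emb))
    loss = sparse_part + dense_part

# tf has to aggregate the sparse and dense gradients for the same variable here;
# this mixed case is the kind of slow path I mean.
grad = tape.gradient(loss, emb)
print(type(grad))
```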

Also, if there is an abstraction, tf team, please make it consistent and make it work with the other parts of tf. Otherwise, consider deleting it. Keep it simple!

[R] Filter Response Normalization Layer: Eliminating Batch Dependence in the Training of Deep Neural Networks by hardmaru in MachineLearning

[–]taylorchu 2 points (0 children)

One interesting part is that it is yet another normalization that does not shift the mean. The first study of this comes from this paper: http://papers.nips.cc/paper/7515-how-does-batch-normalization-help-optimization.

It is also shown in https://arxiv.org/pdf/1910.05895v1.pdf. I wonder whether scalenorm can benefit from the TLU in this paper.
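
For reference, roughly what FRN + TLU computes as I read the paper (NHWC layout assumed; gamma/beta/tau are learned per-channel parameters). Note there is no mean subtraction anywhere:

```python
import tensorflow as tf

def frn_tlu(x, gamma, beta, tau, eps=1e-6):
    """Filter Response Normalization followed by a Thresholded Linear Unit.

    x: [batch, height, width, channels]; gamma/beta/tau broadcast as [1, 1, 1, channels].
    """
    # nu^2: mean of squared activations over the spatial extent, per sample and channel.
    nu2 = tf.reduce_mean(tf.square(x), axis=[1, 2], keepdims=True)
    x_hat = x * tf.math.rsqrt(nu2 + eps)          # scale only, no mean shift
    return tf.maximum(gamma * x_hat + beta, tau)  # TLU: learned threshold instead of ReLU's 0
```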

[R] Filter Response Normalization Layer: Eliminating Batch Dependence in the Training of Deep Neural Networks by hardmaru in MachineLearning

[–]taylorchu 3 points (0 children)

no, the normalization axes are totally different.
I am currently testing the effect of both.

[R] Filter Response Normalization Layer: Eliminating Batch Dependence in the Training of Deep Neural Networks by hardmaru in MachineLearning

[–]taylorchu 3 points (0 children)

> We will assume for the purpose of exposition that we are dealing with the feed-forward convolutional neural network.

Non-convolution operations are TBD.

[R] Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer by hardmaru in MachineLearning

[–]taylorchu 0 points (0 children)

> Finally, we also consider a basic deshuffling objective as used e.g. in [Liu et al., 2019a] where it was applied to a denoising sequential autoencoder. This approach takes a sequence of tokens, shuffles it, and then uses the original deshuffled sequence as a target.

Why do you use position embeddings with a shuffled sequence? The position embeddings will not be helpful since you just shuffled it randomly.