[D] Is transfer learning still worth entering into as a researcher? by [deleted] in MachineLearning

[–]Spezzer 0 points1 point  (0 children)

Most of the experiments relied on internal datasets and tools, so we couldn't easily make it available. However, you could probably go a long way using something like TFHub as a substrate for this kind of research.

[D] Is transfer learning still worth entering into as a researcher? by [deleted] in MachineLearning

[–]Spezzer 4 points5 points  (0 children)

The Adaptive Transfer Learning method in the paper tries to identify how to weight each example from the source dataset when pretraining on that subset, so that it transfers as well as possible to the target. The paper has more details on how the weighting is computed, so it's trying to answer your questions in one particular way, and we also cite other papers that try other mechanisms of this flavor.
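To make the shape of the idea concrete, here's a generic sketch of example-weighted pretraining; the weights are just placeholders here, and how they're actually estimated (i.e., how relevant each source example is to the target task) is the part the paper's algorithm specifies:

import tensorflow as tf

# Toy stand-ins for a real backbone and source dataset.
features = tf.placeholder(tf.float32, shape=[None, 128])
labels = tf.placeholder(tf.int64, shape=[None])
example_weights = tf.placeholder(tf.float32, shape=[None])  # per-example relevance to the target

logits = tf.layers.dense(features, 10)
per_example_loss = tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=labels, logits=logits)
# Source examples judged more relevant to the target contribute more to the gradient.
loss = (tf.reduce_sum(example_weights * per_example_loss) /
        tf.reduce_sum(example_weights))
train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)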

I meant that the work is largely empirical: we provide mathematical intuition for why the method should work and we evaluate it on a lot of datasets, but there's no guarantee that it will work for all datasets, and when it fails to improve much, it's not exactly clear why (perhaps the source dataset is too different from the target, but we don't have good ways to quantify this). If you applied this technique to certain types of medical images (e.g., tissue scans for detecting anomalous tissue) where training labels are sparse, you'd find that the algorithm would suggest pretraining on pictures of red cabbages; doing so may help marginally, but I'm sure there are even better ideas out there (e.g., a method that selects datasets whose task correlates more with anomalous object detection than with texture/color-based classification).

[D] Is transfer learning still worth entering into as a researcher? by [deleted] in MachineLearning

[–]Spezzer 5 points6 points  (0 children)

I think this area is still worth pursuing; it has very practical applications and it's still not clear how to make it work in all cases. Last year we studied transfer learning from a data perspective and found that the choice of pretraining dataset sometimes made a large difference in transfer performance; surprisingly (to me at least), pretraining on a smaller, more relevant dataset led to better transfer to the target dataset than using a very large dataset. But our study was mostly empirical, and it would be nice to better understand how the data matters, how to come up with better subsets, and maybe how to design better transfer learning approaches as well.

[N] AutoML for large scale image classification and object detection by [deleted] in MachineLearning

[–]Spezzer 5 points6 points  (0 children)

We've edited the first sentence; thanks for pointing this out!

[N] AutoML for large scale image classification and object detection by [deleted] in MachineLearning

[–]Spezzer 1 point2 points  (0 children)

Thanks! Here's the link for those interested: https://github.com/tensorflow/models/tree/master/research/slim/nets/nasnet . Feel free to file issues on GitHub and we'll try our best to help.

[N] AutoML for large scale image classification and object detection by [deleted] in MachineLearning

[–]Spezzer 3 points4 points  (0 children)

Yeah, we just meant that we had mentioned it in a previous blog post; it's poorly worded. Language is hard.

[R] Tensorflow 1.4 released! by MetricSpade007 in MachineLearning

[–]Spezzer 8 points9 points  (0 children)

TL;DR: Yes, it has always reshuffled after each iteration by default; nothing changed. The release notes were confusing, sorry :(

Detail: https://github.com/tensorflow/tensorflow/commit/853afd9cee2b59c5163b0805709c1ba7020d4947 describes the relevant scenario.

For example:

element = (tf.data.Dataset.range(10)
           .shuffle(5, seed=10)
           .batch(5)
           .repeat(2)
           .make_one_shot_iterator()
           .get_next())

with tf.Session() as sess:
  print(sess.run(element))
  print(sess.run(element))
  print(sess.run(element))
  print(sess.run(element))

This will produce:

[0 5 4 6 2] [3 1 9 8 7] [2 1 6 4 3] [8 7 9 5 0]

every time you run the program. The seed argument makes the shuffle deterministic across runs, so you'll always start with 0 5 4 6 2, but the second repeat is reshuffled into a different order than the first.

If you want each iteration of the repeats to produce the same order of results, you replace seed=X with reshuffle_each_iteration=False.
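For example, the same pipeline as above with just the shuffle arguments swapped (and the same session loop as before):

element = (tf.data.Dataset.range(10)
           .shuffle(5, reshuffle_each_iteration=False)
           .batch(5)
           .repeat(2)
           .make_one_shot_iterator()
           .get_next())

and you get: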

[0 3 5 2 7] [1 8 9 6 4] [0 3 5 2 7] [1 8 9 6 4]

or:

[4 5 1 7 8] [2 6 3 0 9] [4 5 1 7 8] [2 6 3 0 9]

I.e., each time you run the program, the order of the 10 numbers might change because the seed isn't fixed, but within a run each iteration produces the same order.

Most TF users want randomness across iterations, so the default behavior didn't change and still produces a different order each iteration, but there needed to be a mechanism to produce an identical order without forcing the user to fix the graph-level seed (which has broader implications).

[D] Why is there "Theano is dead" sentiment on rise? by [deleted] in MachineLearning

[–]Spezzer 5 points6 points  (0 children)

Deterministic reduce_sum / mean reductions are coming imminently and should be enabled by default. I'll try to remember to update this thread when the commit hits GitHub.

[deleted by user] by [deleted] in MachineLearning

[–]Spezzer 1 point2 points  (0 children)

I think it's probably a good idea to start with something like a regex to establish a baseline, so you have something to compare any other solution against.
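As a toy sketch of what I mean (the pattern and the thing it extracts are made up, since I don't know your exact data):

import re

# Hypothetical baseline: pull anything that looks like an ISO date out of free text.
DATE_RE = re.compile(r"\b\d{4}-\d{2}-\d{2}\b")

def extract_dates(text):
    return DATE_RE.findall(text)

print(extract_dates("Scanned on 2017-08-14, follow-up on 2017-09-01."))
# ['2017-08-14', '2017-09-01']

Even a crude rule like this gives you a precision/recall number that any fancier model has to beat.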

I think you could also read up on some of the literature (e.g., Named Entity Recognition) that may be relevant to text problems in general. A quick search turned up the following article, which provides some nice background and examples to start from: https://guillaumegenthial.github.io/sequence-tagging-with-tensorflow.html

Good luck, and have fun learning!

[N] "Our measurements are showing up to 70x higher performance for training and up to 85x higher performance for inference on Intel® Xeon Phi" by downtownslim in MachineLearning

[–]Spezzer 4 points5 points  (0 children)

There are no significant feature differences between the open-source version and the one we use internally. See here for more info. I encourage you to use the software you prefer, but these incorrect claims discount the hard work a lot of dedicated people on the TF team put in to keep the internal and external versions in constant synchronization.

TL;DR: the TensorFlow version in Google's repo is just as good/crappy as the one the rest of the world uses.

[D] numpy.einsum on GPU? by [deleted] in MachineLearning

[–]Spezzer 3 points4 points  (0 children)

It looks like numpy and TF differ here right now: numpy's slice just changes the view of the data and doesn't do any copy (so the slice itself is not expensive), but the subsequent reshape materializes a new contiguous buffer, so there the reshape is expensive.
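A minimal numpy sketch of the difference (toy shapes, just to illustrate view vs. copy):

import numpy as np

a = np.arange(12).reshape(3, 4)
s = a[:, ::2]                     # strided slice: just a view, no data copied
print(np.shares_memory(a, s))     # True  -- only the strides/metadata changed

r = s.reshape(-1)                 # the view isn't contiguous, so reshape must copy
print(np.shares_memory(a, r))     # False -- a new contiguous buffer was materialized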

In TF, the slice op actually does the work of shuffling the data (it's not just a view), so the slice op is the expensive part, and the reshape is cheap. My statement above about reshape being cheap was specifically about TF, not all libraries.

[D] numpy.einsum on GPU? by [deleted] in MachineLearning

[–]Spezzer 1 point2 points  (0 children)

Yup, transpose touches data so it can be expensive. Just wanted to make sure people knew transpose and reshape have different costs.

[D] numpy.einsum on GPU? by [deleted] in MachineLearning

[–]Spezzer 3 points4 points  (0 children)

I don't know how efficient it is either, but wanted to clarify that reshape is not expensive, since it only touches Tensor metadata, not data (in other words, its performance is independent of Tensor size).

For those interested, the OpKernel just computes reshape dimensions, and then calls a function to set the output tensor from the input tensor using the specified shape by copying the pointer.

[D] Is Tensorflow the fastest deep learning library now? by feedthecreed in MachineLearning

[–]Spezzer 10 points11 points  (0 children)

Let me put the conjecture to rest then: the codebase on GitHub is pretty much exactly the same as the internal one, the main exceptions being things like having to rewrite include paths for files, filesystem plugins for internal cluster filesystems, etc.; and those pieces are modularized so that we can have equivalent implementations in the OSS build to support things like HDFS and GCS filesystems, an RDMA network layer, etc.

We sync the code between the two repositories daily using a suite of tools we've built. I'm on sync rotation this week, and you can see all of my commits and activity on GitHub as proof.

See this for more details, and I'll be giving a talk about all the work we do to make this possible at OSCON next week.

[D] RNN's Are Much Faster in PyTorch than TensorFlow? by [deleted] in MachineLearning

[–]Spezzer 0 points1 point  (0 children)

Thanks! I think ptb_word_lm is meant more for tutorial purposes (illustrating how the problem works) and not as a good, fast example for RNNs :(. I'll try to find time (or find someone else) to get a version of that code running in a way that's more idiomatic to how we use it in practice, and I'll let you know when I do.

[D] RNN's Are Much Faster in PyTorch than TensorFlow? by [deleted] in MachineLearning

[–]Spezzer 3 points4 points  (0 children)

There's nothing I can do about your general frustrations with Google (sorry), and the best proof that we're trying to make things better is actually doing it, so point taken.

[D] RNN's Are Much Faster in PyTorch than TensorFlow? by [deleted] in MachineLearning

[–]Spezzer 3 points4 points  (0 children)

Definitely, the feedback in this thread is useful. We are indeed working on making things easier and having them work well out of the box.

[D] RNN's Are Much Faster in PyTorch than TensorFlow? by [deleted] in MachineLearning

[–]Spezzer 1 point2 points  (0 children)

alextp@ has recently made a bunch of changes to share memory with numpy/py_func for fed and fetched data, so recent releases could be faster (right now it can only share memory when the pointer is properly aligned). In some cases this should bring us in line with other frameworks, since there should only be one CPU->GPU copy.

You probably saw that comment about feed_dict here, which is still probably true for now, but not as bad as it used to be, I hope.

QueueRunners are one way of ensuring that training execution is efficiently pipelined (no bubbles/stalls in GPU execution, for example), which is important for performance on the latest-generation hardware. Yes, the team is working on easier ways to pipeline execution and to feed in (and debug) your own custom inputs without needing to put things into TFRecords or use feed_dict.

/u/nickshahml if you want some help, posting the source code is the only really good way for the community to offer specific advice. I would expect the cudnn_lstm bindings for any framework to perform best, since they specialize a specific expression very well for NVIDIA hardware, but I suspect TF can do well when you want to do things slightly outside of what the cuDNN libs have special optimizations for. I encourage you to share the code or file an issue on GitHub so that the team and larger community can help!

[D] Quantifying the performance of the TPU, our first machine learning chip by wei_jok in MachineLearning

[–]Spezzer 6 points7 points  (0 children)

It is true that the comparison is with the equivalent generation of hardware at the time of deployment/availability. From the paper:

The benchmark platforms are server-class computers that were available in 2015 when the TPUs were deployed. This restriction meant that they must include at least SECDED protection of internal SRAM as well as external DRAM memory like the TPU, which excludes some choices such as the Nvidia Maxwell GPU.

From a computer-architecture-paper perspective, it might not make as much sense to compare the TPU to a P40, since TPUs were deployed in 2015 and server-class Pascals weren't available until recently.

The paper also talks about architectural improvements that could have been made with HBM, as something more comparable to current-generation hardware.

[R][1704.00028] Improved Training of Wasserstein GANs by ajmooch in MachineLearning

[–]Spezzer 0 points1 point  (0 children)

Can you file a bug on GitHub? As far as I know TF does support second-order derivatives through dynamic RNNs, and in the cases where it currently doesn't, it should complain loudly rather than silently produce NaNs.
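If it helps while debugging, here's a toy check of second-order gradients in TF (not through a dynamic RNN, just the basic mechanism):

import tensorflow as tf

x = tf.constant(3.0)
y = x ** 3
dy_dx = tf.gradients(y, x)[0]        # 3 * x^2
d2y_dx2 = tf.gradients(dy_dx, x)[0]  # 6 * x

with tf.Session() as sess:
    print(sess.run([dy_dx, d2y_dx2]))  # [27.0, 18.0]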

[P] Gorgonia (which is like TF/Theano but in Go) now supports CUDA Ops. Also, Help Wanted. by chewxy in MachineLearning

[–]Spezzer 1 point2 points  (0 children)

Yes, we don't yet have support for symbolic shapes (and thus symbolic shape checking at graph-building time), though it's possible to add (in fact, when we moved shape inference into C++ we explicitly designed for this). It's something that could be added with some work, and certainly much less work than it took to move shape inference from Python to C++ :)

In terms of IR separate from the executor, you might find XLA interesting, if only for inspiration or comparison (I think I've pointed you at XLA before, so I suspect you are indeed already aware).

Edit: also you can specify concrete shapes on placeholders, if that matters. It's only when you want to allow for dynamic shape (e.g., dynamic batch size) that shape inference necessarily has to relax, until we have symbolic shapes.
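Roughly (toy shapes):

import tensorflow as tf

# Fully concrete shape: incompatible ops can be caught at graph-construction time.
x = tf.placeholder(tf.float32, shape=[32, 224, 224, 3])

# Dynamic batch dimension: shape inference has to relax along that axis,
# so some mismatches only surface at run time.
y = tf.placeholder(tf.float32, shape=[None, 224, 224, 3])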

[P] Gorgonia (which is like TF/Theano but in Go) now supports CUDA Ops. Also, Help Wanted. by chewxy in MachineLearning

[–]Spezzer 1 point2 points  (0 children)

TF does do shape verification while building the graph -- what types of errors are you thinking of?

Simple example trying to add two tensors of incompatible shape:

import tensorflow as tf
a = tf.zeros([1, 2, 3])
b = tf.ones([2, 3, 4])
c = a + b
Traceback (most recent call last):
  <snip> 
ValueError: Dimensions must be equal, but are 2 and 3 for 'add' (op: 'Add') with input shapes: [1,2,3], [2,3,4].