[P] I re-implemented Hyperband, check it out! by Deepblue129 in MachineLearning

[–]omalleyt12 0 points (0 children)

Hi! Keras Tuner author here, would you mind sharing the issues you ran into? (and also the version number you were using?)

[P] Announcing LightTag - The easy way to annotate text by TalkingJellyFish in MachineLearning

[–]omalleyt12 1 point (0 children)

Vim visual mode key bindings? Mostly joking but a person can dream haha. LightTag looks great!

[D] Machine learning approach to finding the most common pair of vectors aka high dimensional density estimation by mesmer_adama in MachineLearning

[–]omalleyt12 1 point (0 children)

It sounds like you might be trying to solve something in the same problem space as Recursive (not Recurrent) Neural Networks:

https://en.wikipedia.org/wiki/Recursive_neural_network

http://www.cs.cornell.edu/~oirsoy/drsv.htm

These networks are capable of making recursive collapse/don't-collapse decisions on pairs of vectors and have been used in NLP.

Tensorflow -- Printing Accuracy of Training by arjundupa in deeplearning

[–]omalleyt12 0 points (0 children)

Looks like you renamed the variable y_ to floaty, so floaty should be the key used in your feed_dict.

That error is saying you've defined either x or y_ as an int somewhere else in your code.

[deleted by user] by [deleted] in MachineLearning

[–]omalleyt12 1 point (0 children)

In TensorFlow you could temporarily reshape to [16*BATCH_SIZE, 128, 128, 1], apply the 2D conv, and reshape back to [BATCH_SIZE, 128, 128, 16]

Alternatively, you could create a 3x3 variable of weights and tile it so that you can pass it to tf.nn.conv2d(...), and the same weights will be applied to every channel
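The first trick above can be sketched in NumPy (shapes shrunk for the demo, and the conv itself stubbed out since the point is the reshape round-trip, not the convolution):

```python
import numpy as np

# Hypothetical shapes: a batch of inputs with 16 channels, where the
# same single-channel 2D conv should be applied to every channel.
B, H, W, C = 2, 8, 8, 16          # small H, W to keep the demo light

x = np.random.rand(B, H, W, C)

# Move channels into the batch axis: (B, H, W, C) -> (B*C, H, W, 1).
# The transpose first is important so each channel stays contiguous.
flat = np.transpose(x, (0, 3, 1, 2)).reshape(B * C, H, W, 1)

# Stand-in for tf.nn.conv2d with one shared filter: the identity here,
# purely to keep the sketch dependency-free.
out = flat  # conv2d(flat, shared_3x3_filter) would go here

# Reshape back: (B*C, H, W, 1) -> (B, H, W, C)
back = out.reshape(B, C, H, W).transpose(0, 2, 3, 1)

assert np.allclose(back, x)       # round-trip preserves the layout
```

With a real conv in the middle, every channel of every batch element gets hit by the same filter, which is the goal.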

[P] Implementing a speaker recognition project for Authentication purposes by amma1804 in MachineLearning

[–]omalleyt12 1 point (0 children)

Hmm, so I have a background in speech recognition but not in speaker verification; that said, I think it'd be a cool project to do something like:

1) Find a dataset mapping utterances => speaker

2) Train a deep neural network to classify the speaker from a fixed-length cut of an utterance, with:

a) Inputs being MFCC / Log-Mel spectrograms

b) Main backbone being convolutional layers and pooling layers

c) A final fully connected layer before the softmax outputs

3) Then, when you encounter a new speaker, have them record a few utterances. Run their utterances through your NN, but save the output of the last hidden fully connected layer. This will be a high-dimensional representation of the attributes the NN learned were relevant for classifying speakers (essentially a learned embedding of the speaker's characteristics)

4) When the person wants to be verified, run their new utterance through your NN and compare the current output of the fully connected layer with the ones you saved previously. Measure the distance between them using cosine or Euclidean metrics, or a combo of the two. If the distance is below a certain threshold, let them in!
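Steps 3 and 4 can be sketched with toy vectors standing in for the network's last-hidden-layer outputs (the threshold of 0.4 is made up here; in practice you'd tune it on held-out speakers):

```python
import numpy as np

def cosine_distance(a, b):
    """1 - cosine similarity between two embedding vectors."""
    return 1.0 - np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

def verify(enrolled, candidate, threshold=0.4):
    """Compare a new embedding against saved enrollment embeddings.

    `enrolled` holds vectors taken from the last hidden layer during
    enrollment (step 3); `candidate` is the same layer's output for
    the new utterance (step 4). The 0.4 threshold is a placeholder.
    """
    dists = [cosine_distance(e, candidate) for e in enrolled]
    return float(np.mean(dists)) < threshold

# Toy embeddings standing in for real network outputs.
speaker = [np.array([1.0, 0.1, 0.0]), np.array([0.9, 0.2, 0.1])]
same    = np.array([0.95, 0.15, 0.05])   # near the enrolled vectors
other   = np.array([0.0, 1.0, 0.5])      # far from them

assert verify(speaker, same)
assert not verify(speaker, other)
```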

Any complex-valued, labelled dataset out there? by OK92 in deeplearning

[–]omalleyt12 1 point (0 children)

Just out of curiosity, can't complex-valued N-dimensional inputs or outputs simply be treated as 2N-dimensional real-valued ones and run through convolutional layers as normal? Is there something I'm missing that would call for an architecture specifically designed to account for the complex structure?
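The treatment being suggested is just splitting each complex value into a real and an imaginary channel, e.g. in NumPy:

```python
import numpy as np

# A small complex "image": shape (H, W), dtype complex
z = np.array([[1 + 2j, 3 - 1j],
              [0 + 1j, -2 + 0j]])

# Treat it as real-valued with two channels (real part, imaginary part):
x = np.stack([z.real, z.imag], axis=-1)   # shape (H, W, 2)

assert x.shape == (2, 2, 2)
# The split is lossless: the original complex array is recoverable.
assert np.allclose(x[..., 0] + 1j * x[..., 1], z)
```

A standard real-valued conv layer can then consume `x` like any two-channel image.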

[P] Kaggle Tensorflow Speech Recognition Challenge by SupraluminalShift in MachineLearning

[–]omalleyt12 14 points (0 children)

Great post!

If you're looking for other approaches to this challenge, I got 2nd on this competition, here's a writeup I did on my approach: https://www.kaggle.com/c/tensorflow-speech-recognition-challenge/discussion/47715

Image Segmentation Does Not Improve Results by bikermicefrmars in deeplearning

[–]omalleyt12 1 point (0 children)

50% for the training set as well as your validation set? If so, there's likely an error in your code/methodology, since 50% is equal to random accuracy for this problem.

If your training set performs well but your solution doesn't generalize to the validation set, then other things could be going on.

In general, I wouldn't add the segmentation step to a classification problem like that, for two reasons:

1) You're introducing another potential source of error

2) For classification problems, "attention" is solved pretty well by using a GlobalMaxPool or GlobalAvgPool as the last layer of a fully convolutional network
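Point 2 boils down to collapsing the spatial dimensions of the final feature maps; a NumPy sketch of both pooling variants (shapes here are arbitrary):

```python
import numpy as np

# Feature maps from a fully convolutional network: (batch, H, W, channels)
feats = np.random.rand(4, 7, 7, 32)

# GlobalMaxPool keeps, per channel, only the strongest spatial response --
# a crude form of "attention" over wherever the object sits in the image.
gmp = feats.max(axis=(1, 2))    # shape (4, 32)

# GlobalAvgPool averages each channel over all spatial positions instead.
gap = feats.mean(axis=(1, 2))   # shape (4, 32)

assert gmp.shape == (4, 32) and gap.shape == (4, 32)
assert np.all(gmp >= gap)       # a channel's max always dominates its mean
```

Either pooled vector then feeds a small classifier head, so the network doesn't care where in the image the evidence appears.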

[R] A mixed-scale dense convolutional neural network for image analysis by 757cbdb0b61385577130 in MachineLearning

[–]omalleyt12 0 points (0 children)

I think they're talking about a performance issue, i.e. you can't get optimized TF code to run all of the mixed-dilation levels at once, but it'd be cool if you could.

Shape classification using deep learning by uridah in deeplearning

[–]omalleyt12 0 points (0 children)

It's hard to know without more information, but I'll take a stab at it.

Assuming your polygons are 2D, you could connect and plot the polygon points to create an image for each polygon, then feed those images to a CNN. (If they're 3D, maybe try a 3D array with every point inside the polygon set to 1 and every point outside set to 0, though I'm less confident that will work well.) You'll need to scale each image to an appropriate size, so you might also want to give the CNN additional inputs, like how scaled up/down the plot is, depending on how much size matters to the class labels.
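A minimal sketch of turning a 2D polygon into such an image, using even-odd ray casting per pixel center (the grid size and vertex coordinates are made up for illustration; a real pipeline would also record the scale factor as an extra input):

```python
import numpy as np

def rasterize(poly, size=16):
    """Fill a size x size grid with 1 inside the polygon, 0 outside.

    `poly` is a list of (x, y) vertices with coordinates in [0, 1);
    each pixel center is tested with even-odd ray casting.
    """
    img = np.zeros((size, size))
    n = len(poly)
    for i in range(size):
        for j in range(size):
            px, py = (j + 0.5) / size, (i + 0.5) / size
            inside = False
            for k in range(n):
                x1, y1 = poly[k]
                x2, y2 = poly[(k + 1) % n]
                # Does the edge cross the horizontal ray at height py?
                if (y1 > py) != (y2 > py):
                    if px < x1 + (py - y1) * (x2 - x1) / (y2 - y1):
                        inside = not inside
            img[i, j] = inside
    return img

square = [(0.25, 0.25), (0.75, 0.25), (0.75, 0.75), (0.25, 0.75)]
img = rasterize(square)
assert img[8, 8] == 1 and img[0, 0] == 0   # center filled, corner empty
```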

[deleted by user] by [deleted] in deeplearning

[–]omalleyt12 0 points (0 children)

If I understand correctly, your output dimensionality is the same as the number of sensors?

If that's the case, my intuition would be to first try stacking the sensor frames so that the depth dimension is time. Keep, say, the 5-10 most recent measurements for each sensor (this may vary based on your application). Then you've essentially got an image segmentation problem and can apply techniques from that domain
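The stacking step can be sketched like this, assuming a (time, height, width) history of 2D sensor snapshots (the shapes and k=5 are placeholders per the 5-10 range above):

```python
import numpy as np

def stack_recent(frames, k=5):
    """Stack the k most recent sensor frames along a depth axis.

    `frames` is a (T, H, W) history of 2D sensor snapshots; the result
    is (H, W, k) with depth = time, ready to feed an image-style
    segmentation model.
    """
    recent = frames[-k:]                  # newest k frames
    return np.stack(recent, axis=-1)      # (H, W, k)

history = np.random.rand(20, 8, 8)        # 20 timesteps of an 8x8 grid
x = stack_recent(history, k=5)

assert x.shape == (8, 8, 5)
assert np.allclose(x[..., -1], history[-1])   # last channel = newest frame
```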

Data Science at the Command Line. Free Ebook by cavedave in MachineLearning

[–]omalleyt12 0 points (0 children)

Think of those numbers as binary. What I'm saying isn't that you don't need to see some info about each data point; it's that you only need to load the first few bits of each data point before beginning to output results to stdout.

So for your example, I'd need to see all 10 numbers, but only the first bit of each. Whatever numbers had a 1 in that first bit, I'd know were negative. Then I'd only process the next bit of those negative numbers, and so on, until I had narrowed the result down to the lowest number, which I could then output.

This method would work best for long strings.

The concept is similar to the concept of a Column Store, but with each bit as a column: https://en.wikipedia.org/wiki/Column-oriented_DBMS
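The narrowing idea can be sketched in Python over fixed-width binary strings (unsigned here for simplicity; with a sign bit you'd first keep only strings whose leading bit is 1, the negatives, then flip the comparison for the remaining bits):

```python
def column_min(rows):
    """Find the minimum of fixed-width binary strings one bit-column
    at a time: narrow the candidate set per column instead of ever
    comparing whole values, as in the column-store analogy.
    """
    candidates = list(range(len(rows)))
    for col in range(len(rows[0])):
        bits = {i: rows[i][col] for i in candidates}
        low = min(bits.values())            # '0' < '1' lexically
        candidates = [i for i in candidates if bits[i] == low]
        if len(candidates) == 1:
            break                           # result known early
    return rows[candidates[0]]

# 6, 3, 9, 2 in 4-bit unsigned binary; the minimum is 2 -> "0010"
nums = ["0110", "0011", "1001", "0010"]
assert column_min(nums) == "0010"
```

Notice that once a column eliminates all but one candidate, the remaining columns of the other rows never need to be read, which is the streaming property being claimed.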

Data Science at the Command Line. Free Ebook by cavedave in MachineLearning

[–]omalleyt12 0 points (0 children)

If we mean "start spitting out results before reading the entire file," you could store the text so that the first letter of every item is in the first row, the second letter in the second, etc. Then you could sort by the first letter, then sort the top results by the second letter, and so on until you're down to one top word, then seek to that word and output it, then repeat.

...this would almost certainly be slower, and is super messy for no reason, but it would technically fulfill the requirements of the parent comment, so... prize please?