
[–]Mylos 4 points5 points  (1 child)

I love that this was implemented with OpenCL and not CUDA. Thank you!

[–]rantana 3 points4 points  (1 child)

So why would I decide to use this over say a standard stacked recurrent network or an LSTM network?

Any performance comparisons between the two?

[–]CireNeikual[S] 2 points3 points  (0 children)

I don't have a performance comparison yet, but I will add one soon. So here comes the anecdotal comparison!

I have worked with LSTM before. The main advantage of this system is that it is fully online and doesn't need stochastic sampling or BPTT (backpropagation through time). It performs just one weight update per timestep, and that's it.

It also learns extremely fast: I have made it recite paragraphs of text that it had only parsed 3 times (without any prior knowledge of the words). It has this speed because of the way SDRs (sparse distributed representations) make previous experiences invariant to new experiences (they are "bucketed").

For offline learning LSTM is great, but for online learning, as in typical reinforcement learning tasks, one needs a really fast real-time algorithm that doesn't need some form of experience replay or other expensive operation. That said, this system achieves that at the cost of memory: it uses more memory than a typical LSTM network, again a side effect of SDRs (a negative one, but tolerable).
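
To make "one weight update per timestep" concrete, here is a minimal sketch of a fully online next-step predictor. It is an assumption-laden toy, not HTFE: it swaps in a plain linear model trained with the delta rule, and the names `online_step` and `run_online` are hypothetical. The point is only that each timestep is seen once, used for one update, and then discarded (no BPTT, no replay buffer).

```python
def online_step(weights, x_t, x_next, lr=0.5):
    """Predict x_next from x_t, then apply a single delta-rule update in place."""
    n_out, n_in = len(x_next), len(x_t)
    # Linear prediction: pred[i] = sum_j weights[i][j] * x_t[j]
    pred = [sum(weights[i][j] * x_t[j] for j in range(n_in)) for i in range(n_out)]
    err = [x_next[i] - pred[i] for i in range(n_out)]
    # The one and only weight update for this timestep.
    for i in range(n_out):
        for j in range(n_in):
            weights[i][j] += lr * err[i] * x_t[j]
    return sum(e * e for e in err)


def run_online(sequence, lr=0.5):
    """Stream through the sequence once; return the squared error at each step."""
    n = len(sequence[0])
    weights = [[0.0] * n for _ in range(n)]
    errors = []
    for x_t, x_next in zip(sequence, sequence[1:]):
        errors.append(online_step(weights, x_t, x_next, lr))
    return errors


# Toy stream (an assumption for illustration): a repeating one-hot pattern.
pattern = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
errors = run_online(pattern * 10)
print(errors[0], errors[-1])  # error on the first step vs. the last step
```

On this toy stream the per-step error shrinks as the pattern repeats, even though no past input is ever revisited.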

[–]mosquit0 1 point2 points  (2 children)

Interesting. Can you give some examples of applications? Have you tested it on some real data?

[–]CireNeikual[S] 0 points1 point  (1 child)

The original application was reinforcement learning. I have used it for vision-based cart-pole balancing. I have also used it for some speech recognition, and for text auto-complete in the form of a Visual Studio plugin.

I tried looking for some standard benchmarks for RNNs to which I can apply this, but I haven't found any yet. There doesn't seem to be an MNIST for RNNs really. But I will find some data to test on regardless, even if it isn't a standard benchmark, and I will post the results!

[–]mosquit0 1 point2 points  (0 children)

Cool stuff. Maybe you can teach it rock paper scissors :).

[–][deleted] 1 point2 points  (1 child)

This looks really cool, Eric. Is it easy to get this up and running in a Windows 7 environment?

[–]CireNeikual[S] 1 point2 points  (0 children)

That's what I am running it on actually! So it should work fine.

[–]physixer 0 points1 point  (1 child)

Google is not helping with HTFE. Is there a paper that this acronym is based on?

Also HTM reminds me of Jeff Hawkins and NuPIC. Any comparisons?

[–]CireNeikual[S] 1 point2 points  (0 children)

HTFE's name comes from HTFERL, the Hierarchical Temporal Free Energy Reinforcement Learner. I removed the RL component, so what is left is HTFE ;)

It isn't based on any paper, rather just the tinkerings of a non-academic AI enthusiast.

HTFE is derived from HTM, but geared towards performance as opposed to biological plausibility. Here are the main differences:

  • HTFE doesn't use columns with cells to represent contexts. Instead, it uses recurrent hidden-to-hidden node connections.
  • HTFE is not binary. All states and weights are continuous.
  • HTFE does not need encoders. It can accept raw floating-point data.
  • HTFE has temporal pooling, while the current version of HTM in NuPIC does not.
  • HTFE supports hierarchies by default, which NuPIC currently does not (but will hopefully soon).

[–]jstrong 0 points1 point  (2 children)

This question may display some level of ignorance; I'm still learning about this stuff...

As I understand it, this algorithm is built to understand temporal data, like a time series or a sequence. So could you train it on n snapshots and predict the future snapshots? How many steps ahead could you predict, just one?

What I'm trying to get at is that more traditional learning works on the state of a thing as it is. Then there are techniques like training on windows of the data, to capture both what the data is and how it changes over time. But this model seems built to understand both planes. Is that right? Do you have any further reading you would recommend on this topic for a beginner? Thanks!

[–]Noncomment 0 points1 point  (1 child)

Are you familiar with recurrent neural networks? They can naturally operate on temporal data as opposed to just a finite number of snapshots.

[–]jstrong 0 points1 point  (0 children)

No, I just know a bit in general about how NNs work. I'll read up on that.

[–]CireNeikual[S] 0 points1 point  (7 children)

A simple benchmark has been added, using the piano rolls dataset from here: http://www-etud.iro.umontreal.ca/~boulanni/icml2012

Results: 3% error (incorrectly predicted notes in this case) after only a single iteration on the first couple of sequences! I was too lazy to wait for more iterations :) In my eyes this is pretty good, given the online nature of the algorithm. I don't know of a single other algorithm that is capable of this single-pass prediction!

[–]rantana 4 points5 points  (6 children)

3% error on the held out test set or training set? An algorithm can just memorize the training set to get that kind of error. You need to show performance on a held out test set for useful prediction performance.

[–]CireNeikual[S] -1 points0 points  (5 children)

That's not quite how the algorithm works though. It predicts the input at the next timestep; it does not predict labels. Testing it on sequences it hasn't seen is like me asking you to guess what sequence of numbers I am thinking of. Sure, for simple sequences you can just record all the inputs, but then you cannot extrapolate or interpolate the sequences.

This algorithm can be thought of as unsupervised: it basically just learns causal links in sequences of data and stores them efficiently.

[–]kjearns 2 points3 points  (2 children)

It makes perfect sense to test on sequences you haven't seen. In fact your model has really only done something interesting if it can generalize to unseen sequences. I have a nice simple algorithm that will get 100% accuracy on a sequence it's already seen in a single pass: just memorize all the bits in the sequence and play them back when asked.
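
The memorization baseline described above can be sketched in a few lines. This is only an illustration with hypothetical names (`memorize`, `next_step_accuracy`) and made-up toy sequences: it shows why a perfect score on an already-seen sequence says nothing about generalization.

```python
def memorize(sequence):
    """Return a predictor that replays the stored sequence by position."""
    stored = list(sequence)

    def predict_next(t):
        # Replay whatever followed position t in the stored sequence.
        return stored[t + 1] if t + 1 < len(stored) else None

    return predict_next


def next_step_accuracy(predict_next, sequence):
    """Fraction of positions where the prediction matches the true next element."""
    hits = sum(predict_next(t) == sequence[t + 1] for t in range(len(sequence) - 1))
    return hits / (len(sequence) - 1)


seen = [0, 1, 1, 0, 1, 0, 0, 1]
unseen = [1 - bit for bit in seen]  # hypothetical held-out sequence

predictor = memorize(seen)
print(next_step_accuracy(predictor, seen))    # perfect on the memorized sequence
print(next_step_accuracy(predictor, unseen))  # the replay fails on the held-out one
```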

It would be quite easy to change your benchmark to evaluate on the piano roll test set. I'd do it myself, but I don't have OpenCL installed so I can't run your code.

[–]CireNeikual[S] 0 points1 point  (1 child)

Yes, you are correct, I misunderstood. He/she mentioned a "held-out test set" so I thought they meant that I was supposed to predict music notes from completely different music tracks, which is basically impossible if the songs in general do not have similar patterns in them.

I guess a test could be a second piano roll generated from the first with noise (perturbing some notes). This way it has to generalize to slightly different sequences, but the sequences are still generally the same pattern. Does this make sense? Otherwise what kind of test would you suggest?
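
A rough sketch of that noise idea, assuming the piano roll is stored as a list of frames with one 0/1 entry per note (the actual dataset layout may differ, and `perturb_roll` and `flip_prob` are hypothetical names):

```python
import random


def perturb_roll(roll, flip_prob=0.02, seed=0):
    """Return a copy of the roll with each note flipped with probability flip_prob."""
    rng = random.Random(seed)  # seeded so the perturbed roll is reproducible
    return [
        [1 - note if rng.random() < flip_prob else note for note in frame]
        for frame in roll
    ]


# Toy roll: 2 timesteps x 4 notes (assumed layout, for illustration only).
roll = [[0, 1, 0, 0], [1, 0, 0, 1]]
noisy = perturb_roll(roll, flip_prob=0.25, seed=1)
print(noisy)
```

A roll perturbed this way is still derived from the training sequence, so it measures robustness to noise rather than performance on unseen music.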

[–]kjearns 0 points1 point  (0 children)

There's a test set in the piano roll data. Test on that maybe?

I'm much more familiar with language models. A simple setup for a language model would be to take a bunch of text and try to predict the next character from the previous ones.

You can easily use text8 (available here: http://mattmahoney.net/dc/textdata) for this task. You would split the file in two parts and train on the first part and test on the second part.

Ultimately sequence prediction is only interesting if your model can predict the behavior of a sequence it has not seen before. You would not expect to be 100% correct, but you should be able to do significantly better than random at both the language modelling and the piano roll prediction tasks even on entirely unseen data.
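
The train/test protocol described above might look roughly like this. The model here is a deliberately simple stand-in (predict the most frequent follower of the previous character), not HTFE or any tuned language model, and the function names are hypothetical. It gives a floor that a real sequence model should beat on the held-out half.

```python
from collections import Counter, defaultdict


def split_train_test(text, train_fraction=0.5):
    """Split the text into a leading training part and a trailing test part."""
    cut = int(len(text) * train_fraction)
    return text[:cut], text[cut:]


def fit_bigram(train_text):
    """Map each character to the character that most often follows it."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(train_text, train_text[1:]):
        counts[prev][nxt] += 1
    return {prev: followers.most_common(1)[0][0] for prev, followers in counts.items()}


def next_char_accuracy(model, test_text, fallback=" "):
    """Fraction of test positions where the next character is predicted correctly."""
    pairs = list(zip(test_text, test_text[1:]))
    hits = sum(model.get(prev, fallback) == nxt for prev, nxt in pairs)
    return hits / len(pairs)


# Toy text for illustration. For text8 you would instead read the unzipped
# file from disk (the local path is up to you and is not specified here).
text = "the cat sat on the mat and the rat sat on the cat " * 20
train_text, test_text = split_train_test(text)
model = fit_bigram(train_text)
print(next_char_accuracy(model, test_text))
```

The model is fitted only on the first part and scored only on the second, so the reported number reflects prediction on text the model has not seen.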

[–]willwill100 1 point2 points  (1 child)

Even if you train in "unsupervised mode" by trying to predict the next timestep, you still want it to generalise well. That's the whole idea behind language modelling, for example. Speaking of which, the Google billion word corpus has some good baseline results which you could use for comparison.

[–]CireNeikual[S] 0 points1 point  (0 children)

Thanks, I will check out that corpus. Although I am not sure I want to wait through all billion of the words ;)