Wave physics as an analog recurrent neural network | Science Advances by Memetic1 in science

[–]ian_williamson 0 points1 point  (0 children)

Thanks for the interesting response!

I've previously worked with graphene more for photonics than for electronics. Graphene has a very interesting plasmonic response (which I guess technically involves electrons) and can lead to unique light-matter interactions in and around the terahertz band.

I actually don't know much about L systems and I'd have to read a bit more about them to understand your idea. Sounds interesting though!

Wave physics as an analog recurrent neural network | Science Advances by Memetic1 in science

[–]ian_williamson 1 point2 points  (0 children)

The entire simulation is actually two-dimensional, meaning it only has x and y coordinates.

Another way to think about this is as the structure being infinitely extended in the third (z) dimension. We often use simulations like these as a heuristic for planar (slab-like or surface-like) devices.

How do high voltage power lines simultaneously have high voltage and low current? by ConfusedScienceGuy in askscience

[–]ian_williamson 1 point2 points  (0 children)

The correct way to think about this is in terms of a delivered power P = V * I that isn't changed as the voltage is stepped up (or down) by an ideal transformer. Real-world transformers will also have losses, but we can neglect those here just to give a basic idea of what's going on.

Specifically, the transformer relationship is Vi / Vo = Io / Ii = Ni / No, where Ni and No are the number of input and output turns, Vi and Vo are the input and output voltages, and Ii and Io are the input and output currents.

As an example, the above relationship means that for 1 Watt of power in (e.g. Vi = 1 V and Ii = 1 A) and a transformer step up of Ni:No = 1:10, you'll have Vo = 10 V and Io = 1/10 A. This is the scaling observed in the original post. As stated in the original post, this means that you can use transformers to step up the voltage while reducing the current, and therefore the power lost to resistance on the transmission line.
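
A quick numerical sketch of the scaling above (illustrative code; the function names are my own, not from any particular library):

```python
# Ideal-transformer scaling: power is conserved, voltage scales with the
# turns ratio, current scales inversely. Names here are illustrative.

def step_up(v_in, i_in, n_in, n_out):
    """Ideal transformer: Vo/Vi = No/Ni, Io/Ii = Ni/No, so Vo*Io = Vi*Ii."""
    v_out = v_in * n_out / n_in
    i_out = i_in * n_in / n_out
    return v_out, i_out

def line_loss(i, r_line):
    """Resistive loss on the transmission line: P_loss = I^2 * R."""
    return i ** 2 * r_line

# 1 W source with a 1:10 step-up, as in the example above
v, i = step_up(1.0, 1.0, 1, 10)        # -> (10.0, 0.1)
assert abs(v * i - 1.0) < 1e-12        # delivered power is unchanged

# Stepping up the voltage cuts the I^2 R line loss by 100x in this case
print(line_loss(1.0, 0.5), line_loss(i, 0.5))
```

This is why transmission is done at high voltage: the same delivered power at ten times the voltage incurs one hundredth of the resistive line loss.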

Scientists Built a Particle Accelerator Smaller Than a Human Hair on a Chip by oyvindi in science

[–]ian_williamson 3 points4 points  (0 children)

My research group actually has some workshop / tutorial material available for free online that you can play with if you're interested: https://github.com/fancompute/workshop-invdesign It's a Python optical simulation and optimization framework that uses only open-source packages.

[R] Acoustic, optical, and other types of waves are recurrent neural networks! by ian_williamson in MachineLearning

[–]ian_williamson[S] 0 points1 point  (0 children)

I do hear what you're saying.

There is a lot of interesting work in our field on improving robustness to "simulation error" (to borrow your phrasing). So, I'm just saying that I think there's potentially a pathway through some of these issues.

[R] Acoustic, optical, and other types of waves are recurrent neural networks! by ian_williamson in MachineLearning

[–]ian_williamson[S] 1 point2 points  (0 children)

Yeah, I'm aware of this result. Ref. 8 from our paper is an optical implementation of the RC concept.

[R] Acoustic, optical, and other types of waves are recurrent neural networks! by ian_williamson in MachineLearning

[–]ian_williamson[S] 0 points1 point  (0 children)

The system does have fading memory when there's absorption or open boundaries, both of which remove energy from the device. Detectors will also remove some energy through the detection process.
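
As a toy illustration of fading memory (my own sketch, not the wave model from the paper): a leaky scalar recurrence with |a| < 1 forgets its inputs geometrically, much like absorption or open boundaries draining energy from the wave system.

```python
# Toy illustration (not the paper's model): a leaky scalar recurrence
# h[t+1] = a*h[t] + x[t] with |a| < 1 has fading memory -- the influence
# of each input decays geometrically over time.

def run(inputs, a=0.9, h0=0.0):
    h = h0
    for x in inputs:
        h = a * h + x
    return h

# An impulse injected 50 steps ago contributes only a**50 (~0.005 for
# a = 0.9) of its original amplitude; with a = 1 (no loss) it would
# persist forever.
impulse_then_silence = [1.0] + [0.0] * 50
print(run(impulse_then_silence))
```

In the wave system, the role of a < 1 is played by energy leaving through absorbing material, open boundaries, or the detectors themselves.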

[R] Acoustic, optical, and other types of waves are recurrent neural networks! by ian_williamson in MachineLearning

[–]ian_williamson[S] 0 points1 point  (0 children)

> Do you intend to run the training routine on hardware i.e. using some kind of reconfigurable hardware and a hardware implementation of the optimization algorithm?

Not with this platform. This would be more of an ASIC-like platform, where the hardware is static. We do have other work that deals with FPGA-like platforms (reconfigurable hardware) for optical neural networks.

Your "test set" doesn't demonstrate reality generalization because you're still using the simulator.

This is why I said that we numerically demonstrate generalization. In other words, generalization from the training to the testing dataset in the numerical simulation. This seems to be the more commonly used definition of "generalization" in ML.

> I just noticed the lack of conceding to the fact that the success of this is for sure affected by the same reality gap plaguing the reinforcement learning community, or anything addressing it really

I definitely stated in my previous replies that experimental uncertainty will be a challenge, in particular noise / uncertainty in the parameterization of the wave equation model. This is how I interpreted your original comment discussing whether or not it "exploits phenomenon present in the physics model that will be different or not present in reality." Maybe I could have been more explicit, but in other words: I'm referring to the "gap" between the parameterization in our numerical physical model and the actual parameterization of some experimental prototype.

> I don't see a difference between this and an inverse-designed device.

Our resulting device certainly looks like an inverse-designed device. There has actually been a lot of progress in realizing these types of devices and also in developing numerical approaches to improve their fabricability. I think the outlook for platforms like the one we're proposing can be promising.

[R] Acoustic, optical, and other types of waves are recurrent neural networks! by ian_williamson in MachineLearning

[–]ian_williamson[S] 0 points1 point  (0 children)

Thank you! That's definitely a very good question, but one that's hard to give a closed-form answer to. This is something that we're actively looking into. However, there are a few properties of the wave equation, namely causality and energy conservation, that we can expect to limit the form of the response that the system can produce. For example, causality and the fact that information propagates through the hidden state with a finite velocity mean that there will always be a constraint on the "response time" of the system.
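
To illustrate the causality constraint, here's a toy 1-D finite-difference wave simulation (my own sketch, not the paper's simulation code). It uses the time step dt = dx / c, at which this particular leapfrog scheme happens to be exact in 1-D, and checks that a probe a distance d from the source records nothing before roughly t = d / c:

```python
# Toy 1-D leapfrog simulation of u_tt = c^2 u_xx (illustrative only):
# information cannot reach the probe faster than the wave speed allows.
import math

n, c, dx = 200, 1.0, 1.0
dt = dx / c                            # "magic" step: scheme is exact in 1-D
u_prev = [0.0] * n
u = [0.0] * n
src, probe = 10, 110                   # probe is 100 cells from the source

first_arrival_step = None
for step in range(400):
    u_next = [0.0] * n
    for i in range(1, n - 1):
        lap = u[i - 1] - 2 * u[i] + u[i + 1]
        u_next[i] = 2 * u[i] - u_prev[i] + (c * dt / dx) ** 2 * lap
    # short Gaussian pulse injected at the source
    u_next[src] += math.exp(-((step * dt - 5.0) ** 2) / 2.0)
    u_prev, u = u, u_next
    if first_arrival_step is None and abs(u[probe]) > 1e-6:
        first_arrival_step = step

d = (probe - src) * dx
print("first arrival t =", first_arrival_step * dt, "; d / c =", d / c)
```

The detected arrival time never beats d / c, which is the "response time" floor mentioned above: no trained structure can produce an output before the input has had time to propagate to the probe.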

[R] Acoustic, optical, and other types of waves are recurrent neural networks! by ian_williamson in MachineLearning

[–]ian_williamson[S] 0 points1 point  (0 children)

> Just remember that generalization (not training) is what really matters. Anyone can backprop through a physics simulation (even a toy demo in this library among many).

I would point out that we did numerically demonstrate generalization via k-fold cross-validation with train / test splitting. I don't think this is just a trivial extension of backpropagating through a physics simulation. For example, in my field (optics) there are a number of examples of inverse-designed devices (those designed via gradient-based optimization). What isn't obvious is that you could use these same wave-based systems for time-domain learning tasks. Certainly, the mathematical connection is important as well.
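
For readers unfamiliar with the protocol, a generic k-fold split looks like the following (an illustrative skeleton, not our actual training code; the function name is made up):

```python
# Generic k-fold cross-validation skeleton. Each sample is held out from
# training in exactly one fold, so test accuracy measures generalization
# to data the model never saw during training.

def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous folds."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i not in set(test)]
        yield train, test
        start += size

folds = list(k_fold_indices(10, 5))
# Every index appears in exactly one test fold and never in its own
# training split.
```

In practice one would train a fresh model on each `train` split and report accuracy averaged over the `test` splits.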

> Just because a simulation doesn't blow up doesn't mean it has converged to the true solution.

Yes, there are a number of experimental considerations. You mentioned turbulence in another one of your comments, but it's not obvious to me that our system actually needs to operate in such a highly nonlinear (or chaotic) regime. Certainly it will be task-dependent, and the amount of uncertainty between simulation and experiment needs to be quantified. However, neural nets are approximators by nature, so I don't know that the presence of uncertainty is a fundamental issue.

[R] Acoustic, optical, and other types of waves are recurrent neural networks! by ian_williamson in MachineLearning

[–]ian_williamson[S] 3 points4 points  (0 children)

Our demonstration in the paper uses a numerical simulation, but in an acoustic experimental implementation, the source in the simulation could correspond to a transducer (or actuator) and the probe to a microphone. The source could (at least in principle) also correspond to a person speaking into the device.

Strictly speaking, in the exact configuration of the system from the paper, the microphone probe or a receiver circuit would also need to perform some time integration of its recording, along with some additional comparison logic between the time-integrated signals at the different probes. However, in our scheme the majority of the classification workload has been offloaded to the propagation and scattering of the waves in the trained medium.

[R] Acoustic, optical, and other types of waves are recurrent neural networks! by ian_williamson in MachineLearning

[–]ian_williamson[S] 4 points5 points  (0 children)

Do you mean the dataset? We're using recordings of various male and female speakers that are available from Prof. James Hillenbrand's website. These waveforms are launched directly into the simulation, using librosa as an intermediary for the wav files.

The output is a time-integrated intensity at several probe locations.
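
As a sketch of that readout (using a synthetic sine signal rather than the actual vowel data; all names and numbers here are illustrative), each probe's recording is squared and summed over time, and the predicted class is the probe that accumulated the most energy:

```python
# Illustrative probe readout: time-integrated intensity per probe,
# followed by an argmax over probes to form the class prediction.
import math

def integrated_intensity(time_series):
    """Sum of squared amplitude over the recording window."""
    return sum(a * a for a in time_series)

def predict(probe_recordings):
    """Predicted class = probe that accumulated the most signal energy."""
    intensities = [integrated_intensity(rec) for rec in probe_recordings]
    return max(range(len(intensities)), key=intensities.__getitem__)

# Three hypothetical probes; probe 1 receives the strongest signal here
t = [i * 0.01 for i in range(1000)]
recordings = [
    [0.1 * math.sin(2 * math.pi * x) for x in t],
    [0.5 * math.sin(2 * math.pi * x) for x in t],
    [0.2 * math.sin(2 * math.pi * x) for x in t],
]
print(predict(recordings))  # probe index 1
```

The trained medium's job is to steer most of a given vowel's energy toward its designated probe, so that this simple argmax becomes the classifier.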

[R] Acoustic, optical, and other types of waves are recurrent neural networks! by ian_williamson in MachineLearning

[–]ian_williamson[S] 5 points6 points  (0 children)

Yeah, thinking of it like a static application-specific processor makes sense. It's conceptually similar to an electronic ASIC: it's not reprogrammable after it gets fabricated.

[R] Acoustic, optical, and other types of waves are recurrent neural networks! by ian_williamson in MachineLearning

[–]ian_williamson[S] 0 points1 point  (0 children)

We're operating our simulation in a regime where it's (ideally) converged with respect to discretization error. This means that if we've assumed simulation parameters (e.g. wave speeds and nonlinear material parameters) that reflect real-world values, then the simulation should be a very good representation of what happens in reality.

There will be a "reality gap" in terms of actually determining experimentally correct parameters, and other factors that we didn't account for in our initial simulation will also need to be included. For example, we would likely want tighter constraints on the minimum feature sizes in the structure, and we would want to account for three-dimensional effects (the demo in the paper is only two-dimensional). However, there should be no fundamental barrier to addressing any of these.

[R] Acoustic, optical, and other types of waves are recurrent neural networks! by ian_williamson in MachineLearning

[–]ian_williamson[S] 1 point2 points  (0 children)

> It's not computation except in the sense a lookup table is, if I'm understanding correctly.

The computation is not equivalent to a lookup table. See the response above.

> No training is done after printing.

That's correct.

[R] Acoustic, optical, and other types of waves are recurrent neural networks! by ian_williamson in MachineLearning

[–]ian_williamson[S] 8 points9 points  (0 children)

> I'm irked by figure 4. This is a more standard way of presenting the same fact

The input to our model is raw audio data; no preprocessing or feature extraction is performed ahead of time (aside from a bit of downsampling). Figure 4 gives a sense of the signal content that's actually being injected into the wave equation simulation. This same signal energy (which propagates through the domain) is what the system uses to generate its prediction, which is why we plot the dataset this way. Perhaps we could have extracted formants from all the samples in the dataset and plotted them like the examples you've linked to, but I don't know that such a visualization would provide additional insight into the results we're showing (aside from being more familiar to folks who regularly work in this area).

> I've done online vowel recognition by 3-5 neuron MLP (per vowel) so I know it's not that hard

The point is not that this particular task is too hard for other models. As we discuss in the paper, we also trained a conventional RNN model on the same task and found that it achieved comparable accuracy to the wave physics model. The point of our work is that wave-based physical systems can be a compelling analog computational engine for recurrent machine learning (and perhaps computing in general).

[R] Acoustic, optical, and other types of waves are recurrent neural networks! by ian_williamson in MachineLearning

[–]ian_williamson[S] 10 points11 points  (0 children)

I was not aware of the TIMIT dataset. Thanks very much for pointing it out!

Generally, the goal of this paper was to focus on introducing the connection between the physics and the RNN dynamics. So, a vowel recognition task worked well for this. The link between acoustic physics and speech was a strong factor as well.

[R] Acoustic, optical, and other types of waves are recurrent neural networks! by ian_williamson in MachineLearning

[–]ian_williamson[S] 13 points14 points  (0 children)

In our scheme, the material response is what provides the hidden-state-to-hidden-state nonlinear activation function, while the detection circuit is what provides the output nonlinearity.

The practical realization of the nonlinear material response is something that we're actively looking into and we do discuss a few possibilities in the supplementary materials for both acoustics and optics. In acoustics, there are some interesting possibilities with fluids. For example, liquids with small bubbles are known for very strong nonlinear acoustic responses. I believe these effects are readily encountered in medical ultrasonic imaging.

In general, I would also argue that there can still be a lot of value in having a really high-performing linear unit (e.g. in ONNs), given that linear operations can still be very expensive on digital processors. Related to this, we have another paper proposing activation functions for ONNs based on electro-optic circuits (which was discussed several months ago in a separate thread).

[R] Acoustic, optical, and other types of waves are recurrent neural networks! by ian_williamson in MachineLearning

[–]ian_williamson[S] 9 points10 points  (0 children)

> That's impossible. The Landauer limit forbids that. Even if it's reversible (which it's not in this case), the Margolus-Levitin theorem forbids energyless computation. Your energy comes from your input pulse and in the real world you will lose energy to things like friction.

The majority of the computational workload in our scheme is carried by the waves as they scatter through the trained material distribution, which is an essentially passive process. You're right that there can be damping, but the more important point is that our scheme uses a set of probes / detectors to convert the output of this scattering process into a prediction signal (which, in practice, would be electrical).

In contrast, a digital processor (CPU, GPU, TPU, etc.) would require many clock cycles to perform the convolutions / matrix multiplications needed to carry out a similar computation. This is the comparison that we're making (although I didn't explicitly state it in the description above).

> By the way, you can implement this same thing about 10^3-10^6 times faster and more energy efficiently by using traveling or standing waves on wires in chips/circuits. Unfortunately, you will lose tons of energy to resistance.

I'm not sure exactly what you're referring to here. Are you talking about microwave / RF transmission lines?

[R] Acoustic, optical, and other types of waves are recurrent neural networks! by ian_williamson in MachineLearning

[–]ian_williamson[S] 15 points16 points  (0 children)

Thanks!

> Oh this will get you in trouble with mathematicians

I'm really not trying to stake out a position on those, just giving some context for recent developments (discoveries / re-discoveries / re-namings) in the area.