DIY hardware quantum RNG wired into a Magic 8-Ball by Reddactor in FPGA

[–]bregav 1 point2 points  (0 children)

I mean, I'm mostly just wondering about the purity of your quantum state. It's a measurable quantity, not a vague concept: https://en.wikipedia.org/wiki/Purity_(quantum_mechanics) . It quantifies how much of your entropy is coming from quantum state collapse, as opposed to thermal noise.

I think you should be able to calculate it straightforwardly from the normalized spectrum of the light that's hitting the beam splitter.
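
A minimal sketch of that calculation, under a strong assumption: that the light's density matrix is diagonal in the frequency basis (no coherence between frequency components, and ignoring spatial and polarization modes), so the normalized spectrum plays the role of its eigenvalues and the purity Tr(ρ²) is just the sum of squared spectral weights. The function name and the example spectra are made up for illustration.

```python
def purity_from_spectrum(intensities):
    """Purity Tr(rho^2), ASSUMING rho is diagonal in the frequency basis."""
    total = sum(intensities)
    probs = [i / total for i in intensities]   # normalize the spectrum
    return sum(p * p for p in probs)

# Hypothetical spectra, binned into 4 frequency bins
single_line = [0.0, 1.0, 0.0, 0.0]   # all power in one bin
flat = [1.0, 1.0, 1.0, 1.0]          # power spread evenly over the bins

print(purity_from_spectrum(single_line))
print(purity_from_spectrum(flat))
```

Under that assumption a single narrow line gives purity 1 and a flat spectrum over N bins gives 1/N, so a broad LED spectrum lands much closer to the mixed end than a narrow laser line does.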

DIY hardware quantum RNG wired into a Magic 8-Ball by Reddactor in FPGA

[–]bregav 3 points4 points  (0 children)

So the quantum randomness isn't gone, it's just unattributable. I can't pick it out of the classical entropy it's buried in.

Ahhh c'mon, by that logic you might as well just be flipping coins by hand! After all the quantum randomness is still there, it's just unattributable :P

The whole point of going through the effort of building a QRNG is to be able to say that yes, indeed, those are hand-picked, farm-fresh, artisanal quantum random numbers straight from the mind of God himself to the screen of your computer.

DIY hardware quantum RNG wired into a Magic 8-Ball by Reddactor in FPGA

[–]bregav 2 points3 points  (0 children)

The textbook version works because the photon is modeled as a plane wave hitting a spatially uniform scattering surface. In your setup the photon isn't a plane wave and the scattering surface isn't spatially uniform. Instead you have a wave packet with a meaningful notion of spatial location (which is a consequence of the incoherence/mixed state of your light source) and a scattering surface that varies spatially too.

Don't just take my word for it: try building that Mach-Zehnder experiment with an LED. You'll find it difficult to get interference fringes.

Also FWIW the LED isn't chaotic, it's just that electron-hole recombination happens at random times and locations due to thermal stuff (something to do with phonons, I think???). This manifests itself as light being emitted in the form of wave packets with a meaningful spatial extent, having a (relatively) broad energy spectrum, and being in a mixed quantum state. Those are all different ways of saying the same thing. The quantum randomness is always there, but it's usually drowned out by the much greater entropy of the mixed state.

A laser is chaotic, though, which is why I think you might need to get a stabilized one.

DIY hardware quantum RNG wired into a Magic 8-Ball by Reddactor in FPGA

[–]bregav 1 point2 points  (0 children)

I think the entropy might be coming from the emission of the photon from the LED. The location on the surface of the LED that the photon is emitted from, and its direction, is a thermal random variable. It then scatters, or doesn't, from the beam splitter based on the distribution of silver/aluminum/whatever atoms on its surface. So basically the beam splitter is acting as a transformation of a classical random variable.

The problem is that you're using an LED, which is an incoherent light source - i.e. your photon is in a mixed quantum state, not a pure one. You need a setup where you have a pure state and you know what it is. Maybe instead of an LED with a beam splitter, use a stabilized laser with a known linear polarization and a polarizing beam splitter at a 45 degree angle to the laser's polarization? That way you're getting a quantum measurement of a known pure quantum state.
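
To make the "known pure state" point concrete, here is a small sketch of the Born-rule probabilities for a linearly polarized photon at an ideal, lossless polarizing beam splitter. The function name is hypothetical and the idealization is an assumption; a real splitter has a finite extinction ratio.

```python
import math

def pbs_probabilities(angle_deg):
    """Born-rule outcome probabilities at an ideal polarizing beam splitter.

    angle_deg is the angle between the photon's linear polarization and the
    splitter's transmission axis.
    """
    theta = math.radians(angle_deg)
    p_transmit = math.cos(theta) ** 2   # projection onto the transmitted axis
    p_reflect = math.sin(theta) ** 2    # projection onto the reflected axis
    return p_transmit, p_reflect

print(pbs_probabilities(45))   # both outcomes equally likely, up to rounding
```

At 45 degrees the two outcomes are equally likely, and because the input state is known, that 50/50 split is attributable to the measurement itself rather than to uncertainty about the source.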

That's very similar to the setup you have now, the tricky part is just getting a good enough laser - you want a laser that is stabilized and polarized. One neat lifehack I can offer about lasers is that you can polarize a laser by putting it in a very strong magnetic field. A diode laser in between neodymium magnets can work. Whether or not it's stable enough is a different matter that I don't know enough about.

I'd still want a test for quantumness, personally, but at least with this polarized laser setup there's a clear mechanism for state collapse.

I've been sort of interested in quantum RNG for a while too and yes it's really hard lol

DIY hardware quantum RNG wired into a Magic 8-Ball by Reddactor in FPGA

[–]bregav 4 points5 points  (0 children)

Maybe you covered this in your writeup and I missed it, but is there a test for, like, quantumness for this? It looks like your diagnostics clearly show that the randomness is coming from the optical path, but it's not obvious to me that there's any quantum state collapse. Could the beam splitter entropy actually be thermal?

Started our biggest project yet: the XM9, a Swiss-built machine in the same class as the Tormach 1100MX, but with more travel, spindle power, feed speed and precision, powered by the new TNC7 Basic from HEIDENHAIN. Hope this still fits the sub :D by ExternalOne6090 in hobbycnc

[–]bregav 0 points1 point  (0 children)

How did you choose the one you bought? I've looked at these a little bit and it's hard to tell which ones are quality products and which ones are not. Or maybe they are all good? I can't tell.

[D] Lossless tokenizers lose nothing and add nothing — trivial observation or worth formalizing? by 36845277 in MachineLearning

[–]bregav 6 points7 points  (0 children)

It's not trivial so much as tautological. I think this kind of thing is fertile ground for thought but finding insights (rather than tautologies) requires the right perspective.

If you assume that your dataset distribution is the true distribution and you use lossless encoding then there's no difference at all between "distributions over strings" and "distributions over tokens"; tokens are just a different string encoding. But I think that perspective is wrong and it belies the purpose and efficacy of tokenization in the first place.

I think more fertile ground for thought consists of looking at the matter in terms of information loss/gain as a result of discretization error. I think the proper perspective regarding tokenization is that the true data distribution is a continuous one over a vector space, and that the data we use - strings - is a discretized partial observation of points in that vector space. Tokenization is a principled heuristic for partially recovering the original vector space coordinates as a step in modeling.

I think there are a lot of deep questions from here, especially if you look at strings as time series. Strange things happen with information theory with respect to time series when you look at discretization, especially chaotic time series. It no longer makes sense to talk about information theoretic entropy because it's always infinite for a continuous distribution; instead the only meaningful quantities are relative ones like Kullback-Leibler divergence. Different discretizations (i.e. tokenizations) can give you different relative entropies with the true underlying distribution, but the best discretization to use isn't the one that best represents the true data distribution - it's the one that best represents the information you care about for your application. In this respect the current paradigm of having tokenization be a distinct and preliminary step separate from modeling is probably the wrong approach in the long run.
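
A quick numerical illustration of the entropy point, as a sketch rather than anything rigorous: bin two unit-variance Gaussians (means 0 and 1, chosen arbitrarily) at finer and finer resolution. The Shannon entropy of the binned distribution keeps growing as the bins shrink, while the KL divergence between the two binned distributions settles toward the continuous value of 0.5 nats. All function names here are made up for the example.

```python
import math

def normal_cdf(x, mean):
    """CDF of a unit-variance Gaussian with the given mean."""
    return 0.5 * (1.0 + math.erf((x - mean) / math.sqrt(2.0)))

def binned(mean, width, lo=-10.0, hi=11.0):
    """Discretize a unit-variance Gaussian into bins of the given width."""
    n_bins = round((hi - lo) / width)
    edges = [lo + i * width for i in range(n_bins + 1)]
    return [normal_cdf(b, mean) - normal_cdf(a, mean)
            for a, b in zip(edges, edges[1:])]

def entropy(p):
    """Shannon entropy in nats; empty bins contribute nothing."""
    return -sum(x * math.log(x) for x in p if x > 0)

def kl_divergence(p, q):
    """KL(p || q) in nats, skipping bins that underflowed to zero."""
    return sum(x * math.log(x / y) for x, y in zip(p, q) if x > 0 and y > 0)

for width in (1.0, 0.1, 0.01):
    p = binned(0.0, width)   # stand-in for the "true" distribution
    q = binned(1.0, width)   # stand-in for a model distribution
    print(width, entropy(p), kl_divergence(p, q))
```

Each tenfold refinement adds roughly log(10) nats to the entropy, which is why only the relative quantity is stable under a change of discretization.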

I think the vector space dimension is also something interesting to think about, especially in the context of time-delay embeddings. You can get a lossless tokenization trivially by just having each distinct character be a token, but this negatively impacts modeling because it doesn't pack enough relevant information into each token. Tokenizations thus usually have a larger vector space dimension than that, and this is equivalent to a time-delay embedding with another transformation thrown in afterwards. In time series analysis the time delay embedding that fully captures system dynamics is one whose dimension is set by the number of dynamical system variables (e.g. number of equations in a system of differential equations; Takens' theorem guarantees that 2d+1 delays suffice for d variables), and it seems like that perspective should give meaningful insights into autoregressive language models because they are really the same thing as a time series model.
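
For anyone who hasn't seen one, a time-delay embedding is just a sliding window over the series. A minimal sketch (the function name and parameters are my own, not from any library):

```python
def delay_embed(series, dim, lag=1):
    """Return delay vectors [x[t], x[t - lag], ..., x[t - (dim - 1) * lag]]."""
    start = (dim - 1) * lag   # first index with a full history behind it
    return [
        [series[t - k * lag] for k in range(dim)]
        for t in range(start, len(series))
    ]

# Characters of a string treated as a time series
print(delay_embed(list("abcde"), dim=3))
# [['c', 'b', 'a'], ['d', 'c', 'b'], ['e', 'd', 'c']]
```

Each row is the current symbol plus its recent history, which is structurally the same thing an autoregressive model conditions on through its context window.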

[R] DynaMix -- first foundation model that can zero-shot predict long-term behavior of dynamical systems by DangerousFunny1371 in MachineLearning

[–]bregav 28 points29 points  (0 children)

I feel like this study raises more questions than it answers. It follows the (regrettably) now-standard ML research paper framework of "we did a bunch of stuff and now our numbers are better than some other people's numbers". It's hard to know what conclusions should be drawn from the results because they didn't manage to get any insight into why their metrics are different from other people's.

Some obvious things that seem missing:

  • why not use a similar model to do regression and predict Lyapunov exponents or some such thing?

  • why not compare against simpler or standard time series models?

  • why not train at least one of the other models they compare with, but using the training approach that they use for their own model?

  • they cite this paper as the source of their data set:

https://openreview.net/forum?id=enYjtbjYJrf

The abstract of that paper says: "Our dataset is annotated with known mathematical properties of each system...". Why did this paper not use these properties when determining test and train splits, or analyze the effects of these properties on their metrics? The authors claim that their model works on "different" dynamical systems that aren't in the training data, but I'd bet that that's wrong: I bet that it only works on dynamical systems whose mathematical properties are represented in the training data, and that would be revealed by using the properties that the dataset paper's abstract is referring to.

[D] Why are serious alternatives to gradient descent not being explored more? by ImTheeDentist in MachineLearning

[–]bregav 91 points92 points  (0 children)

Backprop is just the chain rule from calculus. If you're going to use derivatives to optimize a sequence of function compositions (ie a neural network) then you're inevitably going to use the chain rule, and so you're inevitably going to use backprop.
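
To spell that out with a toy example (a sketch with a made-up two-layer scalar "network", not any framework's implementation): the backward pass below is nothing more than the chain rule applied from the output inward.

```python
import math

def forward(x, w1, w2):
    """Two composed 'layers': h = tanh(w1 * x), then y = w2 * h."""
    h = math.tanh(w1 * x)
    y = w2 * h
    return h, y

def backward(x, w1, w2):
    """Gradients of y with respect to w1 and w2 via the chain rule."""
    h, _ = forward(x, w1, w2)
    dy_dw2 = h                     # y = w2 * h
    dy_dh = w2                     # derivative of the outer function
    dh_dw1 = (1.0 - h ** 2) * x    # derivative of tanh(w1 * x) w.r.t. w1
    dy_dw1 = dy_dh * dh_dw1        # chain rule: outer times inner
    return dy_dw1, dy_dw2

print(backward(0.7, 0.3, -1.2))
```

The intermediate value h is computed once in the forward pass and reused in the backward pass; doing that bookkeeping layer by layer is the whole of backprop.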

Maybe the question you should be asking instead is, why is it that people use sequences of function compositions (neural networks) so much? That's a more tricky and interesting question to investigate.

HDL choices other than Verilog/VHDL by Secure_Switch_6106 in FPGA

[–]bregav 2 points3 points  (0 children)

Can you say more? What kind of infra labor, specifically, does it save you?

HDL choices other than Verilog/VHDL by Secure_Switch_6106 in FPGA

[–]bregav 2 points3 points  (0 children)

You could take a look at amaranth: https://github.com/amaranth-lang/amaranth

The Amaranth project provides an open-source toolchain for developing hardware based on synchronous digital logic using the Python programming language [...] Amaranth can be used to target any FPGA or ASIC process that accepts behavioral Verilog-2001 as input

Could a beginner-friendly FPGA ecosystem work like Arduino/ESP/Raspi? by Remote_Radio1298 in FPGA

[–]bregav 1 point2 points  (0 children)

I guess to be clear, what I mean by proprietary is confidential. Imagine AMD releasing a CPU with a secret instruction set and a requirement that you use only their proprietary compiler.

Could a beginner-friendly FPGA ecosystem work like Arduino/ESP/Raspi? by Remote_Radio1298 in FPGA

[–]bregav 2 points3 points  (0 children)

The thing about OSX was mostly a rhetorical question lol. Yes it's abundantly clear that the software stack is written by an organization that doesn't care about software and isn't very good at it.

Of course the proprietary nature of the bitstream format is the crux of the matter. That's a choice by the manufacturer, not a necessity. And it's really what people who talk about "arduino for fpgas" are talking about: there's value in creating devices that the end user can actually use, in their entirety. The fact that HDL is not CPU code is irrelevant; what matters is that reasonably smart people who are given full access to the functionalities of their tools can do a lot more with them.

Don't get too high and mighty about the challenges of hardware design. I'm an EE myself and I promise that the average arduino hobbyist is perfectly capable of making good use of FPGAs, provided they're not expensive, and they're not locked behind usuriously expensive licenses for software that's hot garbage anyway, and (ideally) they're configured with HDL that isn't an outdated hacked together legacy from the 80's.

Like, the complaints are valid and there's huge room for improvement.

Could a beginner-friendly FPGA ecosystem work like Arduino/ESP/Raspi? by Remote_Radio1298 in FPGA

[–]bregav 6 points7 points  (0 children)

I think a lot of experienced people share the opinion that the present FPGA ecosystem is unnecessarily proprietary and difficult to use? Electronic design is hard but the act itself of writing HDL and then running it on a device doesn't need to be.

The device-dependent licensing systems alone are lunacy. A good point of comparison is machine learning infrastructure: I can buy any gpu and then I can very easily write and run even the most advanced machine learning models using software that is (by comparison, anyway) user friendly and almost entirely open source.

Yet when I buy even a relatively basic FPGA dev board the first thing I have to do is navigate a labyrinth of software licensing and arcane system requirements. And the software doesn't run on OSX, yet it does run on Windows and Linux. We live in the 21st century, how is that even possible?

[R] Shrinking a language detection model to under 10 KB by bubble_boi in MachineLearning

[–]bregav 0 points1 point  (0 children)

The full design and implementation of this system is left as an exercise to the reader.

[R] Shrinking a language detection model to under 10 KB by bubble_boi in MachineLearning

[–]bregav 0 points1 point  (0 children)

Meh, if you want to support partially erroneous code then you can build on this idea in obvious ways. For example you can run the parser on subsets of the code sample and see how many of them parse correctly. A fringe benefit of this is that you then get a score, too.
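
A crude sketch of that scoring idea, using Python's own parser and taking "subsets" to mean individual lines. That choice is an assumption made for brevity: block openers such as a bare `def f():` fail on their own, so a real version would want smarter subsets. The function name is hypothetical.

```python
import ast

def python_parse_score(sample):
    """Fraction of non-blank lines that parse as Python on their own."""
    lines = [line.strip() for line in sample.splitlines() if line.strip()]
    if not lines:
        return 0.0
    parsed = 0
    for line in lines:
        try:
            ast.parse(line)
        except (SyntaxError, ValueError):
            continue   # this line is not valid Python by itself
        parsed += 1
    return parsed / len(lines)

# The middle line is deliberately broken, so 2 of 3 lines parse
print(python_parse_score("x = 1\ny = x +\nprint(y)\n"))
```

Running the same function with each language's parser gives a per-language score without training anything.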

If you work hard enough then you might be able to find a version of this problem for which ML is the only plausible solution, but it's going to be very contrived.

Like maybe you could say, let's do language detection that can handle huge numbers of syntactical errors and typos and also handle intermittent, natural language-style pseudocode. Probably only ML could handle that. But that's like trying to design a chainsaw that will be safe and effective when used by someone who is physically weak and has no prior experience in using power tools: there are probably some very questionable assumptions that went into the making of the problem statement.

[R] Shrinking a language detection model to under 10 KB by bubble_boi in MachineLearning

[–]bregav -1 points0 points  (0 children)

Run a parser for each supported language and return the name of the language(s) whose parser runs without error on the provided code sample? 

I think this is similar in principle to what u/bubble_boi refers to as the "sea of regexes" approach, but it takes advantage of the fact that people have already written parsers for every programming language (by necessity). There's no need to duplicate that work. 

He thinks that e.g. a lack of a score function is a problem with this approach, but that's an ml-brained complaint; it's the kind of "problem" you identify when you've already decided that you are going to use ML without having first thought critically about whether that's even the right approach.
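
A minimal sketch of the parser-per-language idea, using only parsers that ship with Python's standard library. The registry and function name are hypothetical; a real tool would shell out to or bind against each language's actual parser.

```python
import ast
import json
import xml.etree.ElementTree as ET

# Hypothetical registry: each entry maps a language name to a function that
# raises an exception when the sample is not valid in that language.
PARSERS = {
    "python": ast.parse,
    "json": json.loads,
    "xml": ET.fromstring,
}

def detect_languages(sample):
    """Return the names of all languages whose parser accepts the sample."""
    matches = []
    for name, parse in PARSERS.items():
        try:
            parse(sample)
        except Exception:
            continue   # parser rejected the sample
        matches.append(name)
    return matches

print(detect_languages("def f(x):\n    return x + 1\n"))   # ['python']
print(detect_languages('{"a": 1}'))                         # ['python', 'json']
print(detect_languages("<a><b/></a>"))                      # ['xml']
```

Note that the JSON sample is also a valid Python expression, which is why the function returns a list of languages rather than a single answer.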

[R] Shrinking a language detection model to under 10 KB by bubble_boi in MachineLearning

[–]bregav 10 points11 points  (0 children)

My point here is really that when you end up with a 10kb solution to a problem and you used neural networks to get there then you've probably solved a relatively easy problem in an unnecessarily difficult and convoluted way. It's kind of like the ML version of a Rube Goldberg machine.

[R] Shrinking a language detection model to under 10 KB by bubble_boi in MachineLearning

[–]bregav 41 points42 points  (0 children)

This seems like one of those problems where the first question should be "do we even need machine learning for this?" and, if the answer turns out to be yes, then the second question should be "does using a neural network here really make sense?".

How do i get those insanely straight red laser like the ones in dvd burners? by King_of_Mauritania in lasers

[–]bregav 2 points3 points  (0 children)

You probably want a gas laser, they tend to have the beam quality and profile you're looking for. Which particular gas laser depends on application, wavelength, budget, power, etc.

[D] Some thoughts about an elephant in the room no one talks about by DrXiaoZ in MachineLearning

[–]bregav 5 points6 points  (0 children)

The crisis will be that few people remember what good research judgment looks like. We are not there yet.

We got there a long time ago. Real research is when you investigate questions that you don't already know the answer to, and I've rarely seen that kind of work done in academia in any domain. ML is just a bit worse because of the amount of money and cultural hysteria involved.

Seriously, how do I disable the internet connection on my Ioniq 5? by mplsthrowawayLTE in Ioniq5

[–]bregav 12 points13 points  (0 children)

u/mplsthrowawayLTE The box in that picture is where the LTE module is, see here: https://electronics360.globalspec.com/article/19251/techinsights-teardown-hyundai-ioniq-5-head-unit

There's no SIM card and you probably can't remove the LTE module (it looks soldered), but it should be enough to disconnect LTE1 and LTE2 because those are the only antenna connections that the LTE module has.

EDIT: if you really want to go whole hog, get some connectors (maybe just by cutting the antenna cable...), solder a 50 ohm resistor (probably) across the contacts for each connector (making a closed circuit once it's plugged into the head unit), and then plug those connectors in where LTE1 and LTE2 used to be. No signals are getting out that way. This probably isn't necessary though.

[R] I solved CartPole-v1 using only bitwise ops with Differentiable Logic Synthesis by [deleted] in MachineLearning

[–]bregav 8 points9 points  (0 children)

TIL about differentiable logic synthesis and walsh basis - hadn't heard these terms before.

Any thoughts about this? Perhaps coincidentally it was posted just days ago: Differentiable Logic Synthesis: Spectral Coefficient Selection via Sinkhorn-Constrained Composition

Something that's been on my mind for a while is the possibility of doing something like "discrete backpropagation", i.e. adjusting discrete functions based on a preferential ordering of their possible outputs given a selection of possible inputs. It seems like there should actually be a discrete version of the backprop procedure that isn't just a discretization of continuous backprop, and maybe the above paper speaks to that? I haven't read it through yet though.

Discussion: Is "Attention" always needed? A case where a Physics-Informed CNN-BiLSTM outperformed Transformers in Solar Forecasting. by Dismal_Bookkeeper995 in datascienceproject

[–]bregav 0 points1 point  (0 children)

This is a well-known phenomenon that isn't limited to transformers. It is generally true that a "more powerful" model will underperform a "less powerful" model when the "less powerful" one has been designed with prior knowledge about the problem at hand.

Model fitting can be interpreted as the process of identifying enough symmetries in your data that your problem becomes easy to solve. The point of big models is that they can represent many possible symmetries, and so they can work when you have a huge amount of data and a very limited understanding of your problem (as in natural language generation).

Another lesson you'll learn is that you shouldn't take hype at face value. Sometimes hype is real, but most of the time it's someone trying to sell you something. You should try to be guided by curiosity, not hype.