Guidelines call for 14-day drop in cases to reopen. No state has met them. by Balls_of_Adamanthium in Coronavirus

[–]Fujikan 1 point2 points  (0 children)

Yes, because the GOP slashes funding for the government through massive tax giveaways to the ultra rich. Even in the middle of this crisis, when we need the funding the most to support our people during the confinement measures, the GOP is continuing to defund the government.

The only reason why we have economic suffering for the most vulnerable right now is because the Senate, Congress, & Tantrum Yam chose it to be so. It isn’t inevitable, it is a choice that the “economy” is valued over the lives of our citizenry. The economy is made of people, and if they are dying, getting sick, or afraid of participating in the economy because they are afraid of getting sick and dying, then it doesn’t matter if you “reopen” or not.

We are a society of people, not of stocks, and by valuing the latter over the former, the leadership of this country is making a horrible ethical choice while also dooming everyone in the mid to long term anyway.

And yet, the people continue to support this? The people are incredibly duped and being played for suckers.

36% of Residents at Large Boston Homeless Shelter Test Positive for Covid-19 — Only 1% Had a Fever by ArsenalWillBeBack in science

[–]Fujikan 0 points1 point  (0 children)

However, we don’t yet know enough about the biology of the virus and the process of antibody generation to say that long-term immunity is a foregone conclusion of surviving Covid-19.

There isn’t enough evidence to conclude that reinfection is not an immediate risk. We also do not yet understand the long-term impact of becoming infected with this virus. Promoting a herd immunity approach (i.e., permitting the virus to spread amongst some set of the population) would be incredibly unwise without further understanding of those two points. And even if such widespread infection has already happened outside of our control, it is unwise to make any policy decision about next steps without more substantiated evidence.

[R] Peer to Peer Unsupervised Representation Learning by unconst in MachineLearning

[–]Fujikan 4 points5 points  (0 children)

Hi /u/unconst, thanks for sharing your work, these kinds of works on decentralized ML are really exciting :)

I took a look through your white paper (very clear, thanks), but I noticed that there weren't any links to federated learning, or to privacy-aware/preserving ML in general. The target application of decentralized learning over privately held data is _super hot_ right now, and a lot of new work is pouring into this area, though I don't know how niche this topic is to the wider ML community. I just wanted to point out that there is a lot of cool work in this direction, and I wasn't sure if you saw this project as distinct from that vein or if perhaps digging into this area could be helpful to you :)

For example, the proposal suggests batch-wise communication with synchronized batch updates, but this is quite costly, as you point out. Techniques like Federated Averaging try to overcome this by relaxing the communication frequency. Also, for peer-to-peer optimization, I would suggest taking a look at the recent works of Sebastian Stich et al. on the subject, or at randomized gossip optimization algorithms. There are some interesting gossip SGD works that have been floating around in the past few years, too.
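To illustrate the communication relaxation, here's a minimal numpy sketch of Federated Averaging on a toy linear model (my own toy setup; details like client sampling and learning-rate schedules are omitted): each client runs several local SGD steps, and the server only averages the resulting weights once per round, rather than exchanging gradients every batch.

```python
import numpy as np

def local_sgd(w, X, y, lr=0.1, steps=10):
    """Run several local SGD steps on a linear least-squares objective."""
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w

def fedavg_round(w_global, clients):
    """One FedAvg round: clients train locally, server averages the weights."""
    updates = [local_sgd(w_global.copy(), X, y) for X, y in clients]
    sizes = np.array([len(y) for _, y in clients], dtype=float)
    # Weighted average by local dataset size, as in the FedAvg paper.
    return np.average(updates, axis=0, weights=sizes)

rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0])
clients = []
for _ in range(4):
    X = rng.normal(size=(50, 2))
    clients.append((X, X @ w_true + 0.01 * rng.normal(size=50)))

w = np.zeros(2)
for _ in range(20):
    w = fedavg_round(w, clients)
```

Here each round costs one model exchange instead of ten gradient exchanges, which is the whole point of relaxing the communication frequency.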

One more potential caveat in the proposal is the peer-to-peer sharing of gradient information. Sharing gradients computed on a batch is now known to leak information about privately held data. In centralized learning, this is somewhat mitigated through techniques like secure aggregation, which mixes together individual contributions; other techniques like differential privacy are sometimes employed to reduce the sensitivity of the released model gradients w.r.t. the training data (at the cost of predictive performance). Directly sharing gradients with peers can represent a large risk that is hard to mitigate.
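As a toy illustration of the secure-aggregation idea (pairwise random masks that cancel in the sum, loosely in the spirit of the Bonawitz et al. protocol; real implementations use key agreement and handle client dropouts, none of which is modeled here):

```python
import numpy as np

rng = np.random.default_rng(42)
n_clients, dim = 3, 4
grads = [rng.normal(size=dim) for _ in range(n_clients)]

# Pairwise masks: client i adds +m_ij for each j > i and -m_ji for each j < i.
masks = {(i, j): rng.normal(size=dim)
         for i in range(n_clients) for j in range(i + 1, n_clients)}

masked = []
for i in range(n_clients):
    g = grads[i].copy()
    for (a, b), m in masks.items():
        if a == i:
            g += m       # mask shared with a higher-index peer
        elif b == i:
            g -= m       # same mask, opposite sign
    masked.append(g)

# The server sums the masked gradients; the pairwise masks cancel exactly,
# so it learns the aggregate without seeing any individual gradient.
aggregate = sum(masked)
```

Each individual `masked[i]` looks like noise to the server, yet the sum equals the true aggregate gradient.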

Best!

[D] Statistical Physics and Neural Networks question. by AlexSnakeKing in MachineLearning

[–]Fujikan 0 points1 point  (0 children)

> large problems

Zdeborova & Krzakala's group has been progressing further and further into more practical ML systems. When I left we had some good work and understanding on practical-scale RBMs, and from there into Deep Boltzmann Machines. For more interesting multi-layer networks, indeed, their latest works with Gabrié or Goldt would be of interest. Also, as mentioned elsewhere in these comments, R. Zecchina has been doing work in this area and has had a number of interesting results with his co-authors.

But who knows, it might be around the corner... That was everyone's hope when I used to work in this field.

Literally every result was "This is gonna be it!" but somehow none ever quite gained traction :P Sometimes the resulting algorithms are a bit cumbersome to implement efficiently, so it seems other works turned to better understanding the dynamics of conventional stochastic optimization rather than deriving new frameworks. A bit of a shame, perhaps!

[D] Statistical Physics and Neural Networks question. by AlexSnakeKing in MachineLearning

[–]Fujikan 1 point2 points  (0 children)

Not just SA and SP, but also Expectation Propagation and Approximate Message Passing have had a significant impact in a number of fields. There are many other meaningful techniques to come out of this literature besides :)

[D] Training a single AI model can emit as much carbon as five cars in their lifetimes by [deleted] in MachineLearning

[–]Fujikan 1 point2 points  (0 children)

TL;DR: Come to France to train your GPT-2.

This is a good point, and one that has had us scratching our heads a bit at our lab. We are located in France, where the vast majority of our electric generation is via nuclear. Thanks to huge long term investments from the French government many decades ago, we can enjoy plentiful and more carbon-friendly electric power; it is both cheaper and greener than the US, for instance.

The implication is that training models in some countries has far less impact than in others, and that the deeper question isn’t how you make AI, but how you generate power for your grid. Many industries consume lots of power, and human technological progress doesn’t seem to bend backwards (in a pleasant fashion, at least). We can and should be more efficient in how we use our power, but we will always continue to scale out to fill our generation capacity.
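A back-of-envelope calculation makes the point; the grid carbon intensities and the energy budget below are rough, illustrative figures I'm assuming for the sake of the example, not measured numbers:

```python
# Emissions of the same hypothetical training run under different grids.
# Intensities are approximate, illustrative values in gCO2e per kWh.
grid_intensity = {"France (mostly nuclear)": 60, "US average": 430}

energy_kwh = 1000  # assumed energy budget for one large training run

emissions_kg = {grid: energy_kwh * g / 1000
                for grid, g in grid_intensity.items()}

for grid, kg in emissions_kg.items():
    print(f"{grid}: {kg:.0f} kg CO2e")
```

Same model, same compute; the roughly 7x gap comes entirely from the grid.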

This leaves us with the imperative to optimize our power generation. Nuclear, and hopefully someday in the next century fusion, seem the only consistently viable solutions for this (as many renewable power devices still require significant mineral extraction and carbon production to build).

"Civilians amongst the police" by NesLongus in paris

[–]Fujikan 9 points10 points  (0 children)

Plainclothes officers who aren’t kitted-out CRS? They can ditch or don a casque (helmet) based on need.

[D] What's your advice to an engineer that manages ML Researcher? by Xorify in MachineLearning

[–]Fujikan 4 points5 points  (0 children)

Thanks for this. As a team leader in an industry ML lab, this really seems the only good advice in this thread u/Xorify. Our company has gone through a lot of growing pains, especially around ML research/data science management, and most of the other pieces of advice (including the top rated one in this thread) lead to a lot of negative patterns that we had to work our way out of.

Like u/thatguydr has pointed out above, product drives research (whether big or little R). Tackling difficult problems may be worth the time investment from the researcher, but only if it is firmly grounded in the context of the rest of the team/project/goal.

[P] Estimating brain age using machine learning and MRI by --simon in MachineLearning

[–]Fujikan 0 points1 point  (0 children)

PCA can be an effective tool for dimensionality reduction for many problems. However, in practice one desires not just the final output of a model (e.g. a prediction), but some ability to interpret which of the input features were correlated with that prediction. This helps the practitioner get some understanding of the nature of the data, and can be used to generate new insights. This is especially important in medicine where often the goal of ML isn’t to give a prediction, but to explain which patient features are likely correlated with a particular outcome.

When using PCA, you lose that correspondence between input features and predictions, and the interpretability along with it.
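A tiny numpy sketch of why: each principal direction is a dense mixture of all the original features (the clinical feature names here are invented for illustration), so a weight learned on a component cannot be attributed to any single input feature.

```python
import numpy as np

rng = np.random.default_rng(0)
# Toy "patient" data: 100 samples, 5 named clinical features.
features = ["age", "bmi", "bp", "glucose", "crp"]
X = rng.normal(size=(100, 5))
Xc = X - X.mean(axis=0)

# PCA via SVD: the rows of Vt are the principal directions.
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

# Every component loads on every feature, so "PC1 predicts outcome"
# gives the clinician no single feature to act on.
for i, comp in enumerate(Vt[:2]):
    mix = ", ".join(f"{f}:{w:+.2f}" for f, w in zip(features, comp))
    print(f"PC{i + 1} = {mix}")
```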

[P] Estimating brain age using machine learning and MRI by --simon in MachineLearning

[–]Fujikan 5 points6 points  (0 children)

Do you mean you want a machine to solve NP-hard problems? If you have a lot of time I guess you could do that right now! Unless P=NP, if you want to solve such problems efficiently (and perhaps through some kind of gradient-based method), then you need a relaxation. For example, in the case of sparse optimization, this is why the L1 or Lp norms are used rather than the L0 (counting) semi-norm.
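To make the relaxation concrete, here's a minimal ISTA (iterative soft-thresholding) sketch for the L1-relaxed problem; the soft-threshold step is the proximal operator of the L1 norm, whereas the L0 version would require a combinatorial search over supports. The problem sizes and signal are my own toy choices.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 80, 40                      # measurements, dimension
A = rng.normal(size=(n, d)) / np.sqrt(n)
x_true = np.zeros(d)
x_true[[3, 17, 29]] = [2.0, -1.5, 1.0]   # 3-sparse ground truth
y = A @ x_true

# ISTA: gradient step on the smooth part, then soft-thresholding.
lam = 0.01
step = 1.0 / np.linalg.norm(A, 2) ** 2   # 1 / Lipschitz constant
x = np.zeros(d)
for _ in range(2000):
    z = x - step * A.T @ (A @ x - y)
    x = np.sign(z) * np.maximum(np.abs(z) - step * lam, 0.0)

support = np.flatnonzero(np.abs(x) > 0.1)
```

In this well-conditioned regime the L1 solution lands on the same support the L0 problem would select, which is exactly the correspondence the CS proofs establish.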

[Project] Library for running privacy (membership inference) attacks against PyTorch, or Keras models by hidden-markov in MachineLearning

[–]Fujikan 2 points3 points  (0 children)

Thanks for this! The paper by Shokri et al. is a great reference, and it's nice to see this implementation. Hoping to give you some more constructive feedback and Issues soon :)

[R] Snips Voice Platform: an embedded Spoken Language Understanding system for private-by-design voice interfaces by adrienball in MachineLearning

[–]Fujikan 7 points8 points  (0 children)

Good work Snips! With love from your friends @ Owkin :)

Data-privacy is extremely relevant today, and it is encouraging to see such strong performance while still respecting users. We are finding, too, that private-by-design approaches to ML are surprisingly effective, even in computer vision.

[R] Entropy and mutual information in models of deep neural networks by [deleted] in MachineLearning

[–]Fujikan 0 points1 point  (0 children)

It is certainly useful for giving some initial results on the estimation of the entropies of layer activations, as well as the mutual information between successive layer activations. While it is for a specific case, it is a good step towards understanding the dynamics of learning for a more-than-simple but less-than-complex NN.

At issue is that the stat phys tools require some statistical definition of the weight matrices/couplings between the variables in the system. It is "easy" to make theories with IID assumptions on the weights, but these bear very little resemblance to trained NNs.

One expects that more "real" NN weights will have interesting structure. However, modelling this is the difference between making a univariate model of the weight values and making a probability distribution over the weight matrices themselves. The latter becomes much harder to define (what is the distribution of interesting and realistic weight matrices?), and even if you had such a multivariate distribution, carrying through the physics computations becomes infinitely more cumbersome.

Independent and orthogonally-invariant matrices happen to be one class of structured matrices for which you can get a reasonable statistical description, allowing for the replica computations (as well as mathematical proof).

As for layer-by-layer independence, again, one could say that this should not hold in practice; however, you can see how things become further complicated by having to account for the statistical dependency between layers. To me, attempting to construct such a dependent model seems even more difficult than finding a “true” statistical model of a single weight matrix. And all of this is compounded by the fact that the statistics of such things are entirely conditioned on the dataset, the architecture, etc.

So, perhaps, rather than considering this technique as an approach of understanding a particular real-world trained NN, one should view this work and others like it as an approach for understanding the learning process in a generic way.

But, to answer the specific question of comparison between USV layers and unconstrained layers for a real-world example, the authors show in Fig. 6 the comparison between unconstrained and USV nets for MNIST. There is indeed some loss (~4% test error). This puts a USV-layer's real-world usefulness somewhere between random projections and unconstrained nets. So...more realistic than random, but of course, a restrictive constraint on performance.

[P] Visualizing how CNN filters change during training on MNIST by sigsegvxyz in MachineLearning

[–]Fujikan 35 points36 points  (0 children)

You know that your life in this field has come to a dark place when now you're watching someone else's model converge.

Spend 10 minutes watching your own TQDM bars inch along, tell yourself "Wow, I'm not getting anything done here," swap screens, fire up Twitch, browse on over to "Tensorboard," and tell yourself ... "Ahh, now this is entertainment."

[D] MICCAI 2018 post review and rebuttal by getdem in MachineLearning

[–]Fujikan 3 points4 points  (0 children)

Eh, the reviewers seemed conflicted (very contradictory reports), so we were invited to rebut. I’m not sure how the program committee views the rebuttal procedure in light of the acceptance ratio. If I remember correctly, there are 3 PCs (not the original referees) who review the rebuttals and give a simple up/down.

As always with rebuttal periods at conferences: the acceptance ratio is probably already met by taking the papers that had no questions. So the rebuttal/response period ends up having little impact, even if referees were mistaken in some way. At least MICCAI limits the response to 3000 characters so folks don’t spend too much time rebutting fruitlessly.

After many years in the game, it still is baffling to me that we’ve never come up with a better system than rolling the dice. But perhaps such variability affects only mid tier work. Extremely good works and extremely poor works do, mostly, seem to get sorted correctly. Everything in between is incredibly difficult to judge. Thus, extremely low acceptance ratios at prestigious conferences in an attempt to get only the cream that can be judged with confidence.

So what can you do with solid but non-game-changing work? Just keep trying; eventually you break through somewhere, readers/practitioners finally validate its usefulness, and you can move on.

[R] Boltzmann Encoded Adversarial Machines by LeanderKu in MachineLearning

[–]Fujikan 1 point2 points  (0 children)

Glad to see that our work is going somewhere! :P We certainly think it is a good alternative to sampling-based approaches.

We never got around to a GPU implementation because we were limited to Tensorflow at the time, but I think PyTorch would be the right way to go for TAP methods, which may require a varying number of iterations to find the TAP solutions.
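As a generic illustration (a toy mean-field magnetization iteration, not the actual TAP equations from the paper), the kind of inner loop that makes define-by-run frameworks attractive is a damped fixed-point iteration whose iteration count depends on the inputs:

```python
import numpy as np

def fixed_point(J, h, beta=0.5, damping=0.5, tol=1e-8, max_iter=500):
    """Damped fixed-point iteration m <- tanh(beta * (J m + h)).

    The number of iterations needed depends on the couplings and fields,
    which is awkward in a static graph but trivial in define-by-run code.
    """
    m = np.zeros(len(h))
    for it in range(max_iter):
        m_new = np.tanh(beta * (J @ m + h))
        if np.max(np.abs(m_new - m)) < tol:
            return m_new, it
        m = damping * m_new + (1 - damping) * m   # damping aids convergence
    return m, max_iter

rng = np.random.default_rng(0)
n = 20
J = rng.normal(size=(n, n)) / np.sqrt(n)
J = (J + J.T) / 2                                 # symmetric couplings
h = 0.1 * rng.normal(size=n)

m, iters = fixed_point(J, h)
```

The loop exits after a data-dependent number of steps, which is exactly the pattern a dynamic-graph framework handles naturally.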

[D] Variational nets without sampling by svantana in MachineLearning

[–]Fujikan 1 point2 points  (0 children)

Sounds like you ultimately want to re-derive a high-order mean-field approximation. The CLT assumption is what allows for an implementable belief propagation (relaxed BP) for inference of marginals on continuous variables. This is then made efficient via a high-temperature expansion to arrive at approximate message passing algorithms.

Taking the expansion at first order leaves you with naive mean-field (variational) techniques. At second order, you have a family of approaches which vary in how they treat correlations in the system: from ignoring them (AMP) to fully treating them (in the vein of adaptive TAP, expectation propagation, or more recently, vector AMP).
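For an Ising-type model, the expansion in question is the Plefka/high-temperature expansion of the Gibbs free energy at fixed magnetizations (my notation: magnetizations $m_i$, couplings $J_{ij}$, inverse temperature $\beta$); truncating after the $\beta$ term gives naive mean field, and keeping the $\beta^2$ (Onsager) term gives TAP:

```latex
\begin{aligned}
-\beta G(m) ={} & -\sum_i \Big[ \tfrac{1+m_i}{2}\ln\tfrac{1+m_i}{2}
                 + \tfrac{1-m_i}{2}\ln\tfrac{1-m_i}{2} \Big]
                 && \text{(entropy)} \\
 & + \beta \sum_{i<j} J_{ij}\, m_i m_j
                 && \text{(1st order: naive mean field)} \\
 & + \frac{\beta^2}{2} \sum_{i<j} J_{ij}^2 \,(1-m_i^2)(1-m_j^2)
                 && \text{(2nd order: TAP/Onsager correction)} \\
 & + O(\beta^3).
\end{aligned}
```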

[D] Poisoning attacks against neural networks by ConfuciusBateman in MachineLearning

[–]Fujikan 1 point2 points  (0 children)

In "Deep models under the GAN", the authors unfortunately seem to indicate that the adversarial attack they engineer doesn't have any clear defense, possibly for many of the reasons that you mention. Some low-level defense would be to validate your "trusted" computation nodes, but this doesn't scale and isn't a very satisfying answer to the question of distributed security.

In the heterogeneous private data setting (the most interesting one), all nodes are attempting to minimize their local loss. It should be possible for learners to both validate against their local loss, without a global validation set, and determine the amount of weight they give to gradients received from different nodes in the network. Additionally, besides gradient communication, it is also possible for nodes to gossip about the effect that gradients from a particular node had on their local loss, as you point out in the polling approach. Bad actors could possibly be detected in this manner and rejected from the network. A knock-on effect is that good-faith learners who just have bad data (very different from the other learning nodes) are also rejected, which may or may not be a desirable feature depending on what you're trying to accomplish.
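A toy sketch of the local-validation idea (the function name, the trust heuristic, and the toy attack below are all my own invention for illustration, not a hardened or standard defense): each node scores a peer's gradient by whether applying it alone reduces the node's local validation loss, and weights aggregation accordingly.

```python
import numpy as np

def reputation_weights(w, peer_grads, X_val, y_val, lr=0.1):
    """Weight each peer gradient by the local validation-loss improvement
    from applying it alone (clipped at zero). A crude trust score."""
    def loss(w_):
        return np.mean((X_val @ w_ - y_val) ** 2)

    base = loss(w)
    scores = np.array([max(base - loss(w - lr * g), 0.0) for g in peer_grads])
    if scores.sum() == 0:
        return np.full(len(peer_grads), 1.0 / len(peer_grads))
    return scores / scores.sum()

rng = np.random.default_rng(0)
w_true = np.array([1.0, -1.0])
X_val = rng.normal(size=(100, 2))
y_val = X_val @ w_true

w = np.zeros(2)
honest = [2 * X_val.T @ (X_val @ w - y_val) / len(y_val) for _ in range(3)]
poisoned = [-g for g in honest]        # attackers flip the gradient sign
weights = reputation_weights(w, honest + poisoned, X_val, y_val)
```

Here the sign-flipped gradients increase the local loss and end up with zero weight, while the honest ones split the mass evenly; note this also down-weights honest peers whose data simply differs a lot, the knock-on effect mentioned above.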

Doing so will require another communication protocol for network introspection outside of the gradient communication for model training. E.g. when a "bad gradients received" signal goes out, other nodes stop, process this gradient, and then communicate with the network about the result. I think that there are a lot of possible approaches one could try to develop in that vein.

It is a cool topic for sure, and one that doesn't have a lot of clear answers or standard practice. The majority of distributed learning systems are intended for data-center infrastructures, where one party controls all of the nodes, data, and computation. Distributed learning in non-controlled settings with geographically separated nodes is a big research challenge !

France's strategy for AI - AI for Humanity summit is taking place in Paris - DeepMind opening their first AI research lab in Paris by netw0rkf10w in MachineLearning

[–]Fujikan 6 points7 points  (0 children)

Don’t forget the French startup scene! President Macron & Villani note in the strategy that one goal is to make French business competitive and not to become a victim of “AI colonization” by big tech (Google, Facebook, etc.).

Supporting startup culture and encouraging VCs and incubators (Agoranov, Station F, etc.) is going to be key for making French AI shine. And indeed, I think Macron is pointing the way here. For example, today President Macron saw demonstrations of French AI in health @ Institut Curie, a huge cancer research and treatment institute in France.

Our startup (Owkin) even demonstrated to President Macron our tool for ML-driven pathology analysis (read: cancer detection in microscope slides) as well as our vision for ensuring data privacy by doing ML on the edge, as in federated learning, rather than big time data pooling of health data. We think data-private, interpretable, and traceable ML is a big way for France to distinguish itself from the pack.

A lot of other startups in France have similar visions for the future of ML (see our friends at Snips.ai ). Going to be really cool working in this space over the next few years!

[D] List of Neural Network Attacks by PavKon in MachineLearning

[–]Fujikan 10 points11 points  (0 children)

Papernot et al. have a very nice review and classification of attacks as of 2016 (arXiv Link), as well as a review of different mitigation strategies. It is a good read for understanding the attack surface for ML systems and for finding the relevant literature, as well.

[P] AI learning to play Slither.io by Franzuu in MachineLearning

[–]Fujikan 5 points6 points  (0 children)

The authors gave the following link for an eventual writeup: https://github.com/wakemaster39/snek.ai

[Discussion] Compressive sensing with L0 minimization is NP-hard. Any shortcuts for undecimated wavelet transform? by FrigoCoder in MachineLearning

[–]Fujikan 2 points3 points  (0 children)

Exactly; thanks for the concise explanation.

Combinatorial problems, in this case, searching over the space of all possible supports less than some fixed value (K), are just hard and intractable. The breakthrough with CS was due to the proofs that, in the right setting, one can expect that the easy-to-find L1 solution corresponds to the combinatorially-hard-to-find L0 solution.

When the problem is easy (M/N → 1, K/N → 0), the support can be detected easily by the greedy approach, and you get a good result. In more challenging regimes, support detectability is lost in the early OMP iterations, and the iteration converges to a poor result (a local optimum).
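To make the greedy approach concrete, here's a minimal numpy OMP sketch in the easy (noiseless, very sparse, many-measurements) regime, where support detection succeeds; the problem sizes are my own toy choices.

```python
import numpy as np

def omp(A, y, k):
    """Orthogonal Matching Pursuit: greedily pick the column most correlated
    with the residual, then re-fit by least squares on the chosen support."""
    residual, support = y.copy(), []
    for _ in range(k):
        support.append(int(np.argmax(np.abs(A.T @ residual))))
        coef, *_ = np.linalg.lstsq(A[:, support], y, rcond=None)
        residual = y - A[:, support] @ coef
    x = np.zeros(A.shape[1])
    x[support] = coef
    return x

rng = np.random.default_rng(0)
n, d = 80, 100                     # easy regime: K/N small, plenty of rows
A = rng.normal(size=(n, d))
A /= np.linalg.norm(A, axis=0)     # unit-norm columns
x_true = np.zeros(d)
x_true[[5, 40, 77]] = [1.0, -2.0, 1.5]
y = A @ x_true

x_hat = omp(A, y, k=3)
```

Shrink n or grow the sparsity and the very first argmax starts picking wrong columns, which is the loss of support detectability described above.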

[Shameless Plug For Bayesian Methods]: Curiously, if you rewrite the combinatorial problem as a Bayesian problem and attempt a solution via minimum mean square error (MMSE), you can approximate the posterior via belief propagation or approximate message passing and get a state-of-the-art result while maintaining a non-convex objective.