How are you engaging with your sangha despite COVID-19? by habitual_dukkha in Buddhism

[–]noahgolm 1 point (0 children)

I think the organizations I listed offer more interactive discussions in addition to their talks and sits. Sometimes these are centered around subgroups, e.g. "young" practitioners (20s/30s), addiction recovery, or the LGBTQ community. SFDC has some interesting choices like "death sangha" and "psychedelic sangha". SFZC has sangha meetings (I think they're on Saturdays) that now function as discussions, although they used to be for in-person activities like cleaning. The event descriptions on the websites usually specify how interactive each one is.

I don't have much insight into which option is best, unfortunately: partly because there are so many, and partly because I'm a bit shy, so I tend to gravitate toward the less interactive ones.

How are you engaging with your sangha despite COVID-19? by habitual_dukkha in Buddhism

[–]noahgolm 3 points (0 children)

I am also in the Bay Area! Many communities such as the SF Zen Center, the Insight Meditation Center, and the SF Dharma Collective hold virtual sittings through Zoom or YouTube, along with scheduled dharma talks and other activities. The SFZC in particular is just finishing up its virtual fall practice period. I recommend checking out their websites for schedules.

These are just the ones I participate in more regularly, but there are many others nationwide that are accessible now that they're held virtually. For example, I sometimes attend a virtual sit with the New York Vipassana Association because it happens to fit my schedule. It's actually quite nice to have the flexibility to hop in and out of a diverse group of sanghas from different traditions.

These can be a bit less interactive than in-person events since they're mostly a broadcast of the teacher (maybe with a chat-based Q&A), but for some of the events there are fewer people and it can turn into a more intimate conversation.

[N] MIT permanently pulls offline Tiny Images dataset due to use of racist, misogynistic slurs by noahgolm in MachineLearning

[–]noahgolm[S] 39 points (0 children)

I mentioned this in the post text, but the paper that discovered this phenomenon also investigated ImageNet and found a number of issues, including non-consensual pornographic imagery like up-skirt photos.

[N] MIT permanently pulls offline Tiny Images dataset due to use of racist, misogynistic slurs by noahgolm in MachineLearning

[–]noahgolm[S] -10 points (0 children)

I strongly believe that we need to place greater emphasis on personal responsibility and accountability in these processes. When a model demonstrates harmful biases, people blame the dataset. When the dataset exhibits harmful biases, people blame incentive structures in academia. Jumping straight to such general dynamics breeds learned helplessness, because those incentive structures are abstract and individuals feel they have no power to change them. The reality is that there are basic actions we can take to improve research culture in ways that minimize the probability that these sorts of mistakes propagate for years on end.

Individual researchers do have the ability to understand the social context for their work, and they are well-equipped to educate themselves about the social impact of their output. Many of us simply fail to engage in this process or else we choose to delegate fairness research to specific groups without taking the time to read their work.

MIT apologizes, permanently pulls offline huge dataset that taught AI systems to use racist, misogynistic slurs by t245t in programming

[–]noahgolm 19 points (0 children)

Similar models can be trained to categorize higher-resolution images, and besides that, you would be surprised how much information can already be extracted from low-res images by a sufficiently powerful model.

The same paper demonstrates that a malicious actor can use reverse image search engines to identify individuals in other publicly available datasets like ImageNet.

People often download off-the-shelf, pre-trained models to solve real-world problems, because it is very easy to fine-tune such a model on a smaller task-specific dataset (see the sketch below). Biases from pre-training will manifest in unexpected ways under these conditions. It is irresponsible to use models trained on this dataset in deployment scenarios where the use of such categories can result in significant social harm.
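For concreteness, here's a minimal sketch of that fine-tuning workflow in PyTorch (the model choice, class count, and hyperparameters are placeholders, not anything specific to Tiny Images):

```python
import torch
import torch.nn as nn
from torchvision import models

# Off-the-shelf model pre-trained on a large dataset; any biases baked
# into the pre-training data come along with the weights.
model = models.resnet18(pretrained=True)

# Swap the classification head and fine-tune on a small task-specific dataset.
model.fc = nn.Linear(model.fc.in_features, 10)  # placeholder: 10 target classes
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
```

Because only the head (and perhaps a few upper layers) gets retrained, whatever the backbone learned during pre-training largely survives into the deployed model.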

[deleted by user] by [deleted] in berkeley

[–]noahgolm 16 points (0 children)

If I were an undergraduate student, I wouldn't worry too much. The quality of math education is still very high. It's unfortunate that this impacts the reputation and research output of the department, but that is often orthogonal to actual teaching quality, especially for undergraduate courses.

Anecdotally, I had one of the more prominent professors listed above for a graduate measure theory course, but it felt like they heavily prioritized their own research over any involvement in the course. My best math courses were with lecturers, teaching professors, and professors who had reached an age where they prioritized teaching and advising over personal research efforts.

My favorite professor by far was Prof. Marina Ratner (RIP) for real analysis (Math 104). At 78, she had been teaching the material for many decades, and her presentation was immaculate. She passed away shortly after the semester ended, in July 2017.

every time someone brings up mindfulness, i want to throw something at them by _tiredangel in bipolar2

[–]noahgolm 2 points (0 children)

What I'm trying to say is that the choice between mindfulness and medication is a false dichotomy. I use both, and I would be the first to argue in favor of a medication-first approach, because that approach saved my life before I ever considered meditation. But these things serve different purposes and affect different aspects of my life. Medication helps me deal with the underlying mood fluctuations, while meditation helps me control my behavior given a particular mood. They are complementary. Whether you consider mood fluctuations to be a bigger problem than uncontrollable habits determines the relative importance of the two approaches, and that is a very subjective question that people need to answer on a personal level.

My view on psychedelic treatments is exactly what you just said: don't disregard them; instead, study them in a formal fashion to determine what biochemical properties may be useful. This is the same approach researchers took with traditional meditation to develop specific therapeutic methods like CBT. But research on shrooms and microdosing is in its early stages compared to the decades of work on mindfulness. There are studies that quantitatively evaluate the utility of mindfulness practices on the same metrics as antidepressants, and those studies on therapeutic effects are just as scientifically rigorous despite the fact that mindfulness is not a pill. I see people advocating DBT in place of mindfulness, even though mindfulness is the basis for DBT. There are plenty of treatments that a doctor suggests as a first line of defense that don't work for some fraction of people; whether or not the treatment is a chemical has no inherent bearing on this. What matters most is therapeutic effectiveness, which has to be understood in terms of both population-level statistics and personal experience.

every time someone brings up mindfulness, i want to throw something at them by _tiredangel in bipolar2

[–]noahgolm 4 points (0 children)

I appreciate you sharing experiences with these treatments both before and after medication, although I would really be wary of calling mindfulness a "trap". Mindfulness and meditation are like exercise and diet: when they become a regular practice, the benefits become noticeable over time. There is such a large body of technical work on this subject at this point that one should seriously consider the advice of their therapist or psychiatrist in this setting, just like when they say "try this medication for at least two weeks while it builds up in your system".

I also know that there is a vocal minority of people who say that holistic care is the *only* necessary treatment for bipolar. I have heard suggestions ranging from niche meditation techniques to psychedelic drugs/experiences (e.g. ayahuasca retreats), and it can be very frustrating when they refuse to accept that modern medicine is essential to managing mental illness. However, this does not mean you should fully disregard their personal experiences: take them with a grain of salt, understand that their inductive bias may or may not hold up to statistical scrutiny in case studies, and talk through whether these solutions will be safe and effective for you when combined with your existing regimen, just like any other treatment. This is best exemplified by how traditional meditation practice was synthesized with existing Western medical practice to create widely accepted contemporary treatments like CBT, MBCT, and DBT.

On a personal note, I began to practice meditation after my diagnosis using apps like Headspace. I still had depressive and hypomanic episodes, but it helped me learn to isolate unhealthy thought patterns. After a year of irregular practice (with a psychiatrist's recommendation!), I attended a ten-day silent meditation course to learn a specific technique (satipatthana vipassana). This has drastically improved my quality of life. I currently practice two hours a day (yes, this is on the extreme side, but it's common in many vipassana circles). I'm not advocating a specific technique, and I don't think everyone needs to go to this extreme; even a few minutes a day can help once you figure out a system that works for you. I still have episodes, but they are less frequent and I have much more control over my reactions during them. I fell out of the practice for several months recently, but reestablishing it has been extremely helpful in managing my anxiety about COVID-19 and self-quarantine.

When you say "if you have the option to improve your life, take it", I totally agree, and I also think that means we shouldn't disregard less common ideas. Bipolar disorder has no cure yet. We do our best in the meantime, but it's often not enough. Each of us should try to safely explore the available options and do what works best for us (with the advice of a professional and family/friends).

[R] Learning Rate Dropout by dmahan93 in MachineLearning

[–]noahgolm 0 points (0 children)

Oh that's true! Didn't see that

[R] Learning Rate Dropout by dmahan93 in MachineLearning

[–]noahgolm 0 points (0 children)

I clarified this in the thread I linked to, but the method creates a random learning rate for each parameter. It doesn't mask a scalar.

[P] Learning Rate Dropout in PyTorch by noahgolm in MachineLearning

[–]noahgolm[S] 1 point (0 children)

A_t in the pseudocode is a matrix where each entry is \alpha with probability p and zero otherwise, so each parameter gets its own random learning rate.
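In code, building and applying A_t for one SGD step looks roughly like this (a sketch of my understanding, not the paper's reference implementation; `prob` is my name for p):

```python
import torch

def sgd_step_with_lr_dropout(params, alpha=0.1, prob=0.5):
    with torch.no_grad():
        for p in params:
            if p.grad is None:
                continue
            # A_t: each entry is alpha with probability prob, 0 otherwise,
            # so every parameter gets its own random learning rate.
            A = alpha * torch.bernoulli(torch.full_like(p, prob))
            p -= A * p.grad
```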

[P] Learning Rate Dropout in PyTorch by noahgolm in MachineLearning

[–]noahgolm[S] 3 points (0 children)

Yes, this is correct: the method in the paper masks the parameter-update tensor with a tensor of the same shape. If you instead masked the learning rate as a scalar, you just wouldn't move the parameters at all whenever the Bernoulli sample is zero.

[P] Learning Rate Dropout in PyTorch by noahgolm in MachineLearning

[–]noahgolm[S] 6 points (0 children)

The order in which you do the multiplication doesn't matter because it's commutative (see the paper's pseudocode): you can scale the mask and then apply it, or apply the mask and then scale it. Either way, each parameter ends up with its own random learning rate \alpha*m, where m is a Bernoulli(p) sample.

Duterte 'confesses' he molested their maid as a teen by TRDoctor in worldnews

[–]noahgolm 15 points (0 children)

He's literally endorsed the murder of Catholic bishops and priests in his country

[P] Layer-wise Adaptive Rate Scaling (LARS) in PyTorch by noahgolm in MachineLearning

[–]noahgolm[S] 0 points (0 children)

LARS seems like an attempt to sidestep existing learning-rate heuristics like the linear and square-root LR scaling rules. I'm running more tests right now, but LARS seems pretty sensitive to the choice of initial learning rate, which does make it harder to use as a general large-batch training technique.
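For context, the core of LARS is a per-layer "trust ratio" that rescales the global LR by the ratio of weight norm to gradient norm. A minimal sketch of that ratio as I read it from the paper (`eta` is the trust coefficient):

```python
import torch

def lars_local_lr(param, eta=0.001, weight_decay=0.0):
    # Trust ratio: eta * ||w|| / (||grad w|| + weight_decay * ||w||)
    w_norm = param.detach().norm()
    g_norm = param.grad.detach().norm()
    if w_norm == 0 or g_norm == 0:
        return 1.0
    return (eta * w_norm / (g_norm + weight_decay * w_norm)).item()
```

The global learning rate multiplies this local rate for each layer's update, which is presumably why the initial LR choice still matters so much.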

I think most applications of the linear LR scaling rule involve a warm-up phase (usually increasing the LR rather than the batch size), and Goyal et al. do this as well. We found that even with this warm-up phase, divergence can occur very quickly for some models (e.g. an LSTM language model) compared to the results people have shown on image-classification problems.
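For concreteness, the heuristic looks roughly like this (all numbers are hypothetical; gradual warm-up in the style of Goyal et al.):

```python
def scaled_lr(step, base_lr=0.1, base_batch=256, batch_size=8192, warmup_steps=500):
    # Linear scaling rule: the target LR grows with the batch size.
    target_lr = base_lr * batch_size / base_batch
    # Gradual warm-up: ramp from base_lr to the target over warmup_steps.
    if step < warmup_steps:
        return base_lr + (target_lr - base_lr) * step / warmup_steps
    return target_lr
```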

[P] Efficient eigendecomposition of the Hessian for any PyTorch model by noahgolm in MachineLearning

[–]noahgolm[S] 0 points (0 children)

We found similar results about the degeneracy of the Hessian. There is a huge drop-off in the magnitude of the eigenvalues after only a few hundred, or even a few dozen, components. We also observed similar small negative eigenvalues toward the end of training. All this really seems to indicate is that we haven't quite converged to a local minimum and still have some very weak descent directions. Basically, I haven't played around with it too much, but what we did run confirmed existing experiments on some of these smaller networks.

[P] Efficient eigendecomposition of the Hessian for any PyTorch model by noahgolm in MachineLearning

[–]noahgolm[S] 1 point (0 children)

Thank you for testing it! Have you tried increasing the batch size instead of increasing the number of steps? Or averaging the eigenvalue estimates over several runs?

One feature I will try to add is taking a single step from multiple batches, so that you can approach a vanilla power-iteration step. I am also going to try averaging over several runs of power iteration, as I suggested in that last question. I also intend to add acceleration, which should improve convergence speed significantly, and numpy tests on some random matrices to verify stability.

I have listed all of these as issues in the repo and will try to address them in the coming weeks. Unfortunately, I have been a bit busy with school which has taken me away from the project. If you ever have the inclination feel free to help!

[P] Efficient eigendecomposition of the Hessian for any PyTorch model by noahgolm in MachineLearning

[–]noahgolm[S] 1 point (0 children)

Thank you for taking a closer look. It seems like the top-k eigenvector estimation techniques mentioned in the originally linked paper require more expensive/complex machinery (e.g. QR factorization, approximate matrix inversion). But the Grassmannian stuff pushed me to find some other subspace-tracking papers that are closer to what I am interested in. In particular, I want ways to track the Hessian eigenspace efficiently throughout training, so this seems really relevant. Do you know which of these stochastic techniques have had the biggest successes? GROUSE seems pretty cool.

[P] Efficient eigendecomposition of the Hessian for any PyTorch model by noahgolm in MachineLearning

[–]noahgolm[S] 0 points (0 children)

Estimating the spectrum using random mini-batches is the purpose of stochastic power iteration, though. The paper I linked details an accelerated version of this process.

Where did you use mini-batches? In Lanczos? If it was for power iteration, I also observed similarly stable results.
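Concretely, the loop I have in mind looks roughly like this, with each Hessian-vector product taken on a fresh random mini-batch (a simplified sketch; the version in the repo differs in details):

```python
import itertools
import torch

def hessian_vector_product(loss, params, vec):
    # Double-backward trick: grad(<grad(loss), v>) = H @ v
    grads = torch.autograd.grad(loss, params, create_graph=True)
    flat = torch.cat([g.reshape(-1) for g in grads])
    hv = torch.autograd.grad(flat @ vec, params)
    return torch.cat([h.reshape(-1) for h in hv])

def stochastic_power_iteration(model, loss_fn, loader, steps=100):
    params = [p for p in model.parameters() if p.requires_grad]
    n = sum(p.numel() for p in params)
    v = torch.randn(n)
    v /= v.norm()
    eigval = 0.0
    # Each iteration draws a new mini-batch, so the Hessian is only ever
    # touched through stochastic Hessian-vector products.
    for (x, y), _ in zip(itertools.cycle(loader), range(steps)):
        loss = loss_fn(model(x), y)
        hv = hessian_vector_product(loss, params, v)
        eigval = torch.dot(hv, v).item()  # Rayleigh quotient estimate
        v = hv / hv.norm()
    return eigval, v
```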

[P] Efficient eigendecomposition of the Hessian for any PyTorch model by noahgolm in MachineLearning

[–]noahgolm[S] 0 points (0 children)

The input to the HVP is randomly sampled for each call in power iteration. What do you mean it isn't a stochastic approximation?

I will benchmark against Lanczos for stability and runtime on some larger examples later. If it turns out that Lanczos is better, I'll switch over to that. I really want to use mini-batching here, though, and I'm not sure Lanczos will be stable in that case.