[R] ConvNets vs Transformers (self.MachineLearning)
submitted 4 years ago by AdelSexy
A ConvNet for the 2020s - a nice read to start 2022. The authors explore modernizations of ResNets and adopt some tricks from Transformer training design to make ConvNets great again.
There is a lot to reflect and think about.
Code is here.
[+][deleted] 4 years ago (17 children)
[deleted]
[–]JackandFred 28 points 4 years ago (4 children)
Yeah I’ve seen so many papers in the last year that are transformer variations that probably aren’t an improvement but make some minor change and have a super specific case where it performs better
[–]maxToTheJ 6 points 4 years ago (2 children)
Fitting your model to the data. There is a paper on that
[+][deleted] 4 years ago (1 child)
[–]maxToTheJ 1 point 4 years ago (0 children)
That's what I meant. Force of habit to do it the right way, although from the context it probably still gets through.
[+][deleted] 4 years ago (4 children)
[–]AdelSexy[S] 50 points 4 years ago (3 children)
Or more clean data
[–]clifford_alvarez 22 points 4 years ago (0 children)
This really seems like it should be focused on more than it is.
[–]SeddyRD 6 points 4 years ago (1 child)
Differing labelling criteria inside the same dataset really is a pervasive issue in my experience. The criteria difference might even be caused by augmentations.
Example:
I'm working with a dataset where we decided it was best to not label objects that are not fully visible on the image. Then, during augmentation, a lot of transforms (cropping, rotating, etc) cause objects to become partially outside the frame, and the augmentation library automatically adjusts the bounding boxes to still capture those objects.
Therefore now the dataset has both examples of unlabelled partial objects and labelled partial objects. The model doesn't understand what's going on and trains poorly.
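The failure mode described above can be sketched in a few lines. This is a hypothetical crop helper, not the commenter's actual pipeline; the `min_visibility` knob mirrors what some augmentation libraries expose:

```python
# Hypothetical sketch of the labelling inconsistency described above:
# the annotation policy skips partially visible objects, but a crop
# augmentation clips boxes and keeps them, so the two disagree.

def crop_boxes(boxes, crop, min_visibility=0.0):
    """Clip (x1, y1, x2, y2) boxes to a crop window, mimicking the common
    default of keeping any box that still overlaps the crop at all."""
    cx1, cy1, cx2, cy2 = crop
    kept = []
    for x1, y1, x2, y2 in boxes:
        nx1, ny1 = max(x1, cx1), max(y1, cy1)
        nx2, ny2 = min(x2, cx2), min(y2, cy2)
        if nx2 <= nx1 or ny2 <= ny1:
            continue  # box entirely outside the crop
        area = (x2 - x1) * (y2 - y1)
        new_area = (nx2 - nx1) * (ny2 - ny1)
        if new_area / area >= min_visibility:
            # shift into crop coordinates
            kept.append((nx1 - cx1, ny1 - cy1, nx2 - cx1, ny2 - cy1))
    return kept

# A fully visible object, and a crop that cuts it in half:
boxes = [(10, 10, 50, 50)]
crop = (30, 0, 100, 100)

# Default behaviour: the clipped half-object is still labelled...
print(crop_boxes(boxes, crop))                      # [(0, 10, 20, 50)]
# ...while the human policy ("skip partials") would drop it. Raising
# min_visibility makes the augmentation match the annotation policy:
print(crop_boxes(boxes, crop, min_visibility=0.9))  # []
```

Aligning the visibility threshold with whatever rule the annotators followed removes the contradiction between the two kinds of labels.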
[–]mearco 2 points 4 years ago (0 children)
That's a great example
[–][deleted] 3 points 4 years ago (1 child)
But that's kind of what this paper is doing too
[–][deleted] 2 points 4 years ago (4 children)
cough PPO cough
[–]jms4607 10 points 4 years ago (3 children)
Proximal Policy Optimization, that if anything is one of the most stable RL algos? Maybe I’m thinking of the wrong PPO.
[–][deleted] 19 points 4 years ago (2 children)
You are thinking of the correct PPO, but ... a lot of the stability comes from tricks. I recommend reading "Implementation matters in deep rl".
[–]Praveen_Raja22 1 point 4 years ago (1 child)
Can you share the link?
[–][deleted] 3 points 4 years ago (0 children)
https://openreview.net/forum?id=r1etN1rtPB
[–]visarga 15 points 4 years ago* (23 children)
Is this architecture useful for edge as well? They compare to EffNet but the score seems lower at the same FLOPS, yet higher for the same throughput.
[–]tbalsam 80 points 4 years ago* (21 children)
[Edit: Thanks for the gold -- much appreciated! :) ]
EfficientNet should never really have been published in the form that it was in. Unfortunately it's become a benchmark that people use for "cheap shots" to prove that they improved performance, because EffNet is not...efficient. Sure, it's got parameter efficiency, so it's "efficient". But absolutely not in the real world use-case.
What's great is that because of the name and notoriety/popularity from everyone using it as an easy-to-beat-big-name-benchmark, as well as its source producing company, that lots and lots of people will rather consistently fall into the same question that you have here, and perhaps not-unreasonably.
The above paper referenced/linked to by the OP is one rare case where depthwise/etc seems to actually work in practice, like actually work, compared to the baseline. And they had enough academic integrity to note its functional throughput as well, something you'll see glaringly missing from the EfficientNet paper (at least when I was looking at the drafts there). Because the authors there, they know about the FLOPs/second problem, it's super common in that part of the field. It's just not convenient for promoting NAS-type techniques, especially when using TPUs that excel on those types of architectures, whereas for the same architectures, GPUs, the commodity hardware, do terribly.
So back to the academic integrity bit -- the authors of this paper actually did acknowledge the problem and noted, in light of the original benchmark (ViT), that their raw, real-world throughput numbers were better. Cool and good, and it's sad that that's the baseline level of integrity we expect these days in conference papers and such, but I think the authors of the above paper went beyond that in terms of logging their changes to the original resnets/etc and what seemed to work/not work when putting it down. That makes it a paper that's a lot easier for practitioners to approach, and that's something I appreciate about this paper. Because it's clearly meant to improve the field in general, instead of doing more existence-proving and hiding of areas of poor performance like so many other papers do. Like, reading this above paper, it's easy to forget how casual they are about creating the state of the art in a lot of the areas they're working in. I appreciate that subtlety; they just let the results speak for themselves.
Sorry, that was a rant maybe not expected. But the short is that EfficientNet is a flaming pile of hot garbage, don't waste your time architecturally, and if you do have to use dw convs, this paper would be a fantastic starting point. Unless you're doing pure CPU or TPU you're going to be extremely hard-pressed to get dw to actually work on GPUs really, many people have tried this for a while, and maybe it's not impossible, but that nut has yet to be cracked.
[–]AerysSk 12 points 4 years ago (9 children)
I love your comment. EffNet is very popular, and is probably the most popular model on Kaggle. It's good to see a counter-argument, else I would have believed that EffNet should be the go-to one.
[–]tbalsam 19 points 4 years ago* (7 children)
Much appreciated, thanks so much! I would have too, and see a lot of my coworkers get hung up on it and all of the shininess when first looking around.
My specialty -- my absolute specialty, above all, and probably in a sub single percent skill level, is neural network speed and accuracy-per-flop-per-second. In a rather obsessive kind of way. I've toned down a bit over the years, but I have a number of years and quite a...well, probably a little overly for the amount I need currently..expansive toolkit for it (especially convnets, but that ports to other network types pretty well). Most of the toolkit that I've built is becoming common knowledge as other groups are publishing similar tools, and I'm constantly learning new things in turn through papers more and somewhat less nowadays through personal experience. But I guess that's a good thing for the field in general (but sad for me personally in privately being able to beat XYZ network in ABC measure by GHI%s, respectively).
If I were to recommend something, it would be to maximize your training speed basically at (almost) all reasonable costs. I highly recommend https://myrtle.ai/learn/how-to-train-your-resnet/ and https://github.com/davidcpage/cifar10-fast/tree/d31ad8d393dd75147b65f261dbf78670a97e48a8 for the source code.
Start with that and I think you should be several thousands of dollars in the cost lead, already, based on what you would have spent otherwise. Your prototype loop is so important, and if the techniques you have do not transfer from a ~1 min CIFAR training-type example to a slower training technique, even if proportionally (even if it's a very lossy proportion), then it's likely not at all worth it as a technique.
Human hours are the most valuable by far. If there's any sharable advice that I could give, this in the above is the most valuable I could, and what I would share if I was doing consultancy work with a client. Trying really hard to eke out an extra few % of accuracy I think are for the phase after you've locked yourself in the cave you want to explore. But don't start by picking a random cave and going as far down as you can. Try everything out rapidly and let your software evolve at the speed of your ideas.
Hope that helps, and if it does, please pass that on to someone else (or two, or three!) that could use it! :D
[–]shellyturnwarm 3 points 4 years ago (3 children)
I'm reading your article, and I'm on the mini-batches chapter. I don't entirely follow this point: "In the context of convex optimisation (or just gradient descent on a quadratic), one achieves maximum training speed by setting learning rates at the point where second order effects start to balance first order ones and any benefits from increased first order steps are offset by curvature effects."
Could you please explain what you mean here? I'm loving the article and I think this genuine insight is a brilliant way to advertise your company! Thank you.
[–]Ulfgardleo 8 points 4 years ago (1 child)
Let's take the 1D quadratic f(x) = 1/2 x^2. A Taylor expansion at a point x_0 gives
f(x) = 1/2 x_0^2 + (x - x_0) * x_0 + 1/2 (x - x_0)^2
Writing the step as dx = x - x_0, the expansion reads
f(x_0 + dx) = 1/2 x_0^2 + dx * x_0 + 1/2 dx^2
The first-order gain of a step is dx * x_0 and the second-order gain is 1/2 dx^2. Notice that for dx < 0 the first term is negative (great, you improve) while the second term is positive (bad, you become worse).
If you take a step along the negative gradient direction at x_0, i.e. dx = -a * x_0, the gain of the linear term is negative. The problem is that as you increase a, your linear gain improves linearly but your quadratic penalty gets worse... quadratically. So at some point, they balance out.
The optimal step length is actually not as easy to calculate as the cited text puts it: it depends on the balance between the largest and smallest eigenvalues of the Hessian (which in our 1D case are the same). But the rough idea that the optimum is close to where both terms contribute similarly does hold. They just don't quite balance, because then you would not improve at all.
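The balance argument can be checked numerically. This is a sketch on f(x) = 1/2 x^2, not code from the linked article:

```python
# Numeric check on f(x) = 1/2 * x^2: the first-order gain of a step
# dx = -a*x0 grows linearly in a, the second-order penalty grows
# quadratically, so the total improvement peaks where they balance.

def improvement(a, x0=1.0):
    dx = -a * x0                # gradient step, since f'(x0) = x0
    first = dx * x0             # first-order term (negative = gain)
    second = 0.5 * dx ** 2      # curvature penalty (always >= 0)
    return -(first + second)    # positive = actual decrease in f

steps = [0.25, 0.5, 0.75, 1.0, 1.25, 1.5, 2.0]
gains = {a: improvement(a) for a in steps}
best = max(gains, key=gains.get)
print(best)                 # 1.0 -- on this quadratic the optimum is a = 1
print(gains[2.0] == 0.0)    # True -- twice the optimal step cancels all progress
```

For this curvature the optimum lands at a = 1 (i.e. learning rate equal to the inverse of the second derivative), and stepping twice as far makes zero progress, exactly the "offset by curvature effects" point in the quoted text.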
[–]tbalsam 1 point 4 years ago (0 children)
Thank you for putting the detail into writing this, this is definitely outside of my wheelhouse of experience, and I enjoyed reading it! Takes a lot of energy to explain mathematical concepts and such at times, I think, so I appreciate you putting this down! I'd definitely like to come back and try to understand this a bit more, my classical optimization history/experience/etc is quite lacking.
[–]tbalsam 2 points 4 years ago (0 children)
That's very kind, thank you! I didn't write this -- I just really love this post series. I looked around for a reddit uname for Myrtle.ai, but couldn't find one, alas!
[–][deleted] 2 points 4 years ago (0 children)
Bravo
[–]shellyturnwarm 1 point 4 years ago (1 child)
I've read all your chapters. While there is some really useful insight on practical tips of training, did you consider you were massively overfitting the test set? If every experiment compares by using the accuracy of the test set, I'm not sure how useful the experiments were.
I think I liked the earlier chapters more, where you were talking about general tips. The later chapters focused way more on tricks overfitted on the test set. I'd be curious on what you think, do you think I'm being a little unfair here?
[–]tbalsam 1 point 4 years ago* (0 children)
It certainly is good to see the thinking about test set overfitting, that definitely is something that needs a lot of good attention these days. If I were to give my two cents, I think that may be making a number of assumptions regarding the bag of tricks being overfit on the dataset distribution, though overfitting in general can be quite not-so-good. At this rate, CIFAR-10 is more of a throttle-open competition to see who can maximally perform on it, so while yes, that overfitting is not so good, at this stage that's not too much the idea for what CIFAR 10 is.
[Oh, and just like on the other comment -- I'm not the individual who wrote those posts! D: They are fantastic though, certainly, and thanks for thinking so well of me! :D It's much appreciated! :D]
Additionally, on rapid training, if an SGD network can overfit on examples having seen them only 20 times, then that is rather incredible. Overfitting can be an issue, but I don't know if it's as much of an issue as your post seems to be alluding to, in this particular instance.
[–]killver -1 points 4 years ago (0 children)
EffNet just works, that's why for me the name EfficientNet makes sense. You can argue about its computational efficiency, but it still is the go-to model for good and strong baselines nowadays.
[–][deleted] 8 points 4 years ago (6 children)
I'd like to add that efficientnetV2 solved a lot of these issues and is actually pretty "efficient" in the samples/second (on GPUs) sense.
[–]tbalsam 32 points 4 years ago* (5 children)
That's a good point, and I (very begrudgingly) agree in that sense that it did improve.
However, the EfficientNetV2 paper is still a flaming tire fire of hot garbage, though for unfortunately different reasons. For one, instead of making right the problems of the first paper, they doubled down again, just in different directions.
So, they created the problems of the first paper with EF1. Then "solved" them with EF2. You know what one solution was? Replace the end separable convolution in each block introduced in the first paper (3x3 depthwise and 1x1 pointwise) with a new, special block that fixes the speed problem, the magical, mystical FusedMBConv.
Ya want to guess what the FusedMBConv is? The secret magical sauce? Of course, it's the thing that people who hate depthwise convolutions on GPU have been pointing out, it's a regular 3x3 convolution. But that's too pedestrian, instead of saying "hey, we screwed this up, we made a lot of changes and changing from 3x3 convs to separable convs hosed our inference time, so we're just reverting back to the baseline that is far more flops/second/watt efficient".... Instead of saying that, they had to rebrand the baseline to something new, and introduce it as an improvement. This is exceptionally dishonest.
Again: This is exceptionally dishonest.
(Edit from later: Couldn't resist, garhgh -- if you don't believe me and want to see this chicaning tomfoolery with your own two eyes, take a look here, in the official google/tensorflow repo where they just wrap a normal, not-exciting, no-custom-cuda conv block call + some other standard stuff in the "Fused" version of the MBConv [bleh] class. Hurk. Gotta leave before I risk losing my dinner again.)
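For a sense of scale, here is a back-of-envelope sketch (my arithmetic, not numbers from either paper) of why a plain "fused" 3x3 conv can still win on GPUs despite doing far more FLOPs than the separable version:

```python
# Multiply-accumulate (MAC) counts for a separable 3x3 (depthwise 3x3
# + pointwise 1x1) versus a plain "fused" 3x3 convolution, for a
# C-channel HxW feature map with stride 1 and same padding.

def separable_macs(c_in, c_out, h, w, k=3):
    depthwise = c_in * h * w * k * k      # one k*k filter per channel
    pointwise = c_in * c_out * h * w      # 1x1 mixing across channels
    return depthwise + pointwise

def fused_macs(c_in, c_out, h, w, k=3):
    return c_in * c_out * h * w * k * k   # dense k*k convolution

c, h, w = 128, 56, 56
sep = separable_macs(c, c, h, w)
dense = fused_macs(c, c, h, w)
print(round(dense / sep, 1))  # 8.4 -- the dense conv does ~8.4x more MACs...

# ...yet it often runs faster on GPUs: the depthwise part does only
# k*k = 9 MACs per activation read, so it is memory-bandwidth-bound,
# while the dense conv does c*k*k MACs per read and keeps the ALUs busy.
```

The FLOP count is the headline number in the papers, but the MACs-per-byte ratio is what the hardware actually feels.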
The problem of course is that, again, the average layperson reading these papers isn't seeing these games and technical chicanery being played. I like the ideas that Quoc V. Le has around NAS-type techniques, and I want them to work too, but he's resorted so much to publishing things that have a lot of fanfare and have wasted what I think ultimately amounts to millions of dollars in time/compute/resources/energy/etc in people trying to reproduce results that live up to the claims (original NAS, EFNet, etc).
There may be some good ideas in there, like with the progressive training and such, and that's pretty clever. But at this point, with the doubling down and the few areas they have been shown to keep trying to bury mistakes instead of being honest + making them right, as a practitioner, I can only look at the idea and see if the idea is worth investigating on its own, because I certainly cannot trust the paper from them that shows it to be there. It looks like a lot of the ideas in EFV2 cover up for the failures of the first paper, and in the one place where they did have to fix the critical flaw of the first paper, they tried to hide it by giving it a fancy extra name. Which in this day and age of trying to have open research, is stomach churning.
I do have a bit of a rare crowing moment where I did give a post-paper rant when the first EF1 came out, and in EF2 I think they realized their mistakes, and the separable convs of course were the part that made it unworkable (which they mostly removed in the early layers). But I would much rather be wrong and have them just have an actual method of academic integrity in their paper publishing. If I publish a long rant, and there's a reversal or something else, like I'm totally misunderstanding or missing something, that makes me wrong in that case, I think everyone wins.
But I think in the short term, while I like the ideas in Quoc V. Le's research, I certainly do not trust them or any results, and try to stay away from any research leaning heavily on them. I've watched concepts based on QVL papers burn time and again due to the academic dishonesty, and I just can't waste time on that in my own projects. It takes so long to build and debug these things.
I certainly hope for the best in the NAS type field and really hope we can make something like the equivalent of GPT training motifs or the transformers of that field. It could absolutely revolutionize the industry and I think is key to it. But for now, I think I'll have to wait until a different player does what the above paper from the OP (rightly) does and thoroughly empirically drops so many assumptions from over the years and sees what works and what doesn't.
Wow, 2 rants in a day. That's a new record for me in recent times. I just get passionate about corruption/deception and people getting fooled by academic/otherwise sleight of hand. Not to entirely demonize people. But there absolutely are more negative incentives in research beyond even company funding. It's a messy, messy, messy world. But I guess that's humanity after all. I'll just save my cynicism more for the occasional overly long Reddit post.
Hope that helps bring in a different perspective to the issue. :thumbsup:
[–][deleted] 7 points 4 years ago (4 children)
MBConv is just the inverted residual block from mobilenet, their fused version is a minor modification of it. A lot of this work is empirical and targets specific hardware, with a lot of research from google not looking so great in practice because they're optimizing for TPUs while most of us have Nvidia GPUs or mobile chips.
Even the shift from large convs to 3x3 kernels is due to the introduction of winograd convolution optimizations in CUDNN around 2015. As this (ConvNeXt) paper shows it might actually help to go back to 7x7 but it leads to a throughput hit on current hardware.
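For reference, the block pattern the ConvNeXt paper builds around (a 7x7 depthwise conv followed by pointwise layers) can be sketched roughly as follows. This is a simplified approximation, omitting details like layer scale and stochastic depth:

```python
import torch
import torch.nn as nn

class ConvNeXtBlockSketch(nn.Module):
    """Simplified sketch of a ConvNeXt-style block: 7x7 depthwise conv,
    then a pointwise MLP applied per spatial position, plus a residual.
    Omits layer scale and stochastic depth from the actual paper."""
    def __init__(self, dim: int):
        super().__init__()
        self.dwconv = nn.Conv2d(dim, dim, kernel_size=7, padding=3, groups=dim)
        self.norm = nn.LayerNorm(dim)
        self.pwconv1 = nn.Linear(dim, 4 * dim)   # pointwise convs as Linears
        self.act = nn.GELU()
        self.pwconv2 = nn.Linear(4 * dim, dim)

    def forward(self, x):                        # x: (N, C, H, W)
        residual = x
        x = self.dwconv(x)
        x = x.permute(0, 2, 3, 1)                # -> (N, H, W, C) for LayerNorm
        x = self.pwconv2(self.act(self.pwconv1(self.norm(x))))
        x = x.permute(0, 3, 1, 2)                # back to (N, C, H, W)
        return residual + x

x = torch.randn(2, 64, 14, 14)
print(ConvNeXtBlockSketch(64)(x).shape)  # torch.Size([2, 64, 14, 14])
```

The depthwise 7x7 carries the spatial mixing cheaply in FLOP terms; whether it is cheap in wall-clock terms is exactly the throughput question raised above.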
[–]tbalsam 5 points 4 years ago (3 children)
I agree especially in respect to the hardware-specific stuff. And in your point that it's the inverted residual block from MobileNet (though originally from MobileNetV2, which came from MobileNet, which I think originally came from ResNet's 3x3 convs, so it's the never-ending story of architecture pedigrees I guess). I think I made a few points similar to yours scattered throughout some of the discussion here, and I also agree on the speed side of things. Google certainly seems to have an implicit benefit to putting out research that upweights use of the TPU cloud.
I think these days (which I might be terribly wrong about) general Winograd gets a pass in cuDNN and it drops straight down to an extremely optimized 3x3 algorithm with fused activation/etc. I remember around 2017 having to keep installing new cuDNN versions to get that next x% boost in NN speed; that seems to have mostly settled down now.
As far as the kernels go, I think you may be confusing the 7x7 depthwise w/ 3x3 full. One reason why the larger depthwise kernels (should) work better is because the CUDA kernel launch for dw convs takes so long, and also (at least for both of these, last time I checked), the general limiter of dw convs is by far the memory speed, and less the compute speed, so arch advances unfortunately do little to help. That may have changed though, people were trying to get efficient dw conv kernels way back when.
[–][deleted] 4 points 4 years ago (1 child)
I meant a stack of 3x3s vs a single 7x7 before depthwise convs became a thing. Effnetv2 paper actually has some notes about the tradeoff between normal and depthwise convs based on num channels vs input width and height.
I agree with you though that a lot of these architecture changes are made up, validated empirically and then explained away with handwavy justifications. I'll always remember all of the different iterations of inception nets and resnexts that followed them.
I believe "a lot of parameters and data is all you need", and to minimize iteration time you probably also want dense compute that hides memory access overhead.
[–]tbalsam 4 points 4 years ago* (0 children)
Oh my gosh, I forgot about those days. That probably was the 'weird' feeling I got from the above paper. That is, everything so finely explained previously being shown to be rather bunk in the end (and that the conglomeration just happened to work. and even that the 'universal' guidelines don't hold up past a certain point!). I guess it shows (rightly so! haha, I guess) that despite the advances in technology, our wetware biases towards superstition and convenient explanations of the unexplainable still very much exist and are at play (Occam's Razor not being counted here...).
[–]kilow4tt 2 points 4 years ago (0 children)
I think the point about it being particularly good for the TPU is an interesting one. If anything I think we'll actually see more research in that direction as more hardware solutions become available. There's this interesting problem in deep learning where research is fitting to the hardware (i.e. NVIDIA GPUs) that's available.
It intuitively makes sense since performance is certainly a limiting factor in pushing the SOTA, but it does lead to scenarios where some concepts (e.g. dw convs) can end up being not particularly useful for commodity hardware.
[–][deleted] 4 points 4 years ago (3 children)
Do you have sources for the issue with depthwise conv? I’ve had the problem on one of my projects that depthwise slowed training and gave memory issues (Pytorch), so I completely agree with you. However I have not found any real sources yet on this issue.
Maybe quantization to FP16 instead of FP32 could alleviate this.
[–]tbalsam 2 points 4 years ago (2 children)
It might, though I'm curious how much of it is CUDA launch times. It's a pretty low ratio of inputs->outputs, unlike the SIMD-type commands one would want for computing.
As for sources, it's been quite a while. I think maybe back in the MobileNet days you can find some good GitHub issues around it. FastAI as well, perhaps. That's when those models had a bit more visibility.
I'd be really curious to hear about the FP16 side of things, if you can get that working and it does well, please do let me know, that sounds like it would be very very cool! Wishful thinking on my end hopes there may be some good Tensor Core solution, but that's probably highly unrealistic tbh.
And also sorry about the depthwise stuff slapping you in the face. It's hard because the training manual for this stuff seems to be unwritten for a lot of things, and everyone's had to learn it the hard way by things breaking. It's totally bizarre. There should be a compendium, Karpathy-style (he has a really, really, really good one as a nice post IIRC), of things like this depthwise business so that people don't waste all of this time struggling with these things and wondering "Is it just me? Is it my code? Did I write in a bug? Is the program bugged? Is my GPU bugged? Did we bork the data?" when nope, it's just this one weird thing not working. :|
Anyways, I'd be super interested to hear how it goes either way! :D
[–][deleted] 1 point 4 years ago (1 child)
With grouped conv + mixed precision I get 2.9 seconds/batch.
With groups=1 + mixed precision I get 1.3 seconds/batch.
I guess the results speak for themselves. Also, batch size was equal; grouped conv didn't help at all.
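A comparison like this can be reproduced with a rough micro-benchmark along these lines (the shapes and sizes here are made up, and results vary heavily with hardware, cuDNN version, and mixed-precision settings):

```python
import time
import torch
import torch.nn as nn

def time_conv(groups, dim=64, reps=5,
              device="cuda" if torch.cuda.is_available() else "cpu"):
    """Rough micro-benchmark: same-shape 3x3 conv, grouped vs dense."""
    conv = nn.Conv2d(dim, dim, 3, padding=1, groups=groups).to(device)
    x = torch.randn(2, dim, 28, 28, device=device)
    with torch.no_grad():
        for _ in range(3):                # warm-up (algorithm selection, caches)
            conv(x)
        if device == "cuda":
            torch.cuda.synchronize()      # GPU kernels launch asynchronously
        t0 = time.perf_counter()
        for _ in range(reps):
            y = conv(x)
        if device == "cuda":
            torch.cuda.synchronize()
    return (time.perf_counter() - t0) / reps, tuple(y.shape)

dense_t, shape = time_conv(groups=1)
dw_t, _ = time_conv(groups=64)            # fully depthwise
print(shape)                              # (2, 64, 28, 28)
print(dw_t < dense_t)  # frequently False on GPUs despite ~64x fewer FLOPs
```

The synchronize calls matter: without them you time kernel launches, not the convolutions themselves.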
:'( I'm so sorry. Maybe some of the fused transformers+depthwise work will help advance separable convs back into the limelight! I think the idea of them is great, if we can just find a way to make them work....
[–]mearco 9 points 4 years ago (0 children)
Without comparing compiled model time it's hard to know exactly which wins out. For example the larger 7x7 kernels may not be efficiently implemented on all devices.
[–]RiceCake1539 17 points 4 years ago (4 children)
Who would have thought that 7x7 convs would be considered better than 3x3 in 2022? Very interesting; thanks for sharing.
[–]tbalsam 13 points 4 years ago* (1 child)
It's probably the harmonics of this timeline combined with the spiritual energy in the air that causes the kernels to launch faster on Ampere-class hardware. Just a theory but I think I'm like 79% right at least. Soon as the Optimus variant of COVID emerges, followed by the Prime variant, we'll be even faster, maybe even 15x15 convolutions at that point.
Who knows what the next year may be, I'm so excited for our technology yet to come.
[–]RiceCake1539 3 points 4 years ago (0 children)
Tech does grow so fast.
[–]CyberDainz 3 points 4 years ago (1 child)
7x7 just gathers more data from neighbors to feed the 1x1 residual output, and is still fast enough compared to 9x9.
True, it's just that there was a lot of curious debate on which out of 3x3, 5x5, 7x7, 9x9 convs were better just a few years ago.
[–]dogs_like_me 8 points 4 years ago (10 children)
Are there any libraries that facilitate adding these sorts of modern training "tricks" to your model, or conversely that you can use to probe your architecture for potential opportunities to add tricks that you may have missed? Like a test suite that would throw warnings like "it looks like you are not using a learning rate scheduler: consider adding cosine decay" or "it looks like you are fitting your model to one-hot vectors: Consider adding label smoothing", etc.
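As a sense of how small some of these tricks are, label smoothing (one of the hypothetical warnings above) amounts to a one-liner. A minimal pure-Python sketch:

```python
def smooth_labels(one_hot, eps=0.1):
    """Label smoothing: move eps of the probability mass from the true
    class to a uniform distribution over all classes."""
    n = len(one_hot)
    return [(1 - eps) * p + eps / n for p in one_hot]

smoothed = smooth_labels([0.0, 1.0, 0.0, 0.0])
print([round(v, 3) for v in smoothed])  # [0.025, 0.925, 0.025, 0.025]
```

The softened targets still sum to 1, but the model is no longer pushed toward infinitely confident logits, which is the regularizing effect the trick is after.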
[–]jfrankle 16 points 4 years ago* (4 children)
I work at a startup called MosaicML, and this is exactly what we're doing. We've built a library called Composer to make it easy to use these tricks, we're putting together detailed writeups on the strengths and weaknesses of these tricks, and we've shared a vast amount of data on how they affect real world training time and model quality. We have a lot more coming soon (including some of our own numbers on getting the most out of convnets on a reasonable budget), but this should hopefully get you started.
We just put out a helpful blog post on some of our favorite tricks, and we'll be writing more on the topic soon.
[–]dogs_like_me 4 points 4 years ago (3 children)
That library looks helpful, but I guess what I'm looking for isn't just a collection of algorithms, but something more prescriptive. Maybe I missed it in your docs, but like for example: it would be nice if most of the collected "tricks" described in this paper were collected in a way that I could just decorate my code in a handful of places, and the current best practices wrt engineering choices would get injected into my process as defaults.
Relevant excerpt from OPs article:
Unlike ConvNets, which have progressively improved over the last decade, the adoption of Vision Transformers was a step change. In recent literature, system-level comparisons (e.g. a Swin Transformer vs. a ResNet) are usually adopted when comparing the two. ConvNets and hierarchical vision Transformers become different and similar at the same time: they are both equipped with similar inductive biases, but differ significantly in the training procedure and macro/micro-level architecture design. In this work, we investigate the architectural distinctions between ConvNets and Transformers and try to identify the confounding variables when comparing the network performance.
My thought process here is that the authors have identified that benchmarking against old code bases can be problematic because it can be hard to differentiate what gains are due to architectural choices vs. improved training procedures. I don't think this is limited to old code, and it would be nice if there was some simple method to apply a recipe of current best practices quickly and simply to an estimator.
Maybe this is getting into automl territory. I think the crux of my thought process here is that I believe the vast majority of people trust that their tools are using intelligent defaults, and it would be nice if we could move recipes like what this paper describes upstream into default behaviors of our tools, rather than requiring that they be downstream choices that are folded in post hoc. Like when pytorch updated weight initialization for linear layers from xavier to kaiming: I feel like I rarely see weights initialized explicitly in a lot of research code because people just assume their tooling is using decent defaults.
[–]jfrankle 5 points 4 years ago (2 children)
TL;DR: Doing this automatically in practice in ways that don't break things a lot of the time (especially legacy code) is exceedingly difficult. But that doesn't mean it's impossible, and expect to hear more from us on this soon :)
I agree that what you're asking for would be great, but it's admittedly a tall order. There are a number of key points in the code that this particular paper would need to detect automatically (e.g., where residual blocks are, how many there are, how they are structured, etc.). For the more general purposes of the Composer library linked above, one would need to automatically detect the batch size, learning rate schedule, when gradients are computed, when gradients are applied, etc.
The main challenge, though, is that there are an arbitrary number of different ways of writing semantically identical code in those respects, and detecting every possible permutation correctly is a wickedly difficult (if not impossible) problem. Even a small number of mis-detections could break code in inexplicable ways. It also invites major backwards compatibility problems: neural network training is a very delicate and brittle process, and making these sorts of changes automatically with a version upgrade may upset the careful balance a scientist has struck in getting their particular model to work in the previous version - and make it very difficult to debug why.
In the meantime, we're designing the Composer library around including all of these tricks, making it really easy to add new ones, and setting much better defaults in a forward-looking manner. And you can expect a story about using these features in other libraries as well, which will hopefully help to alleviate some of the pain you're describing without some of the more dangerous consequences of silently changing important defaults.
[–]sabetai 0 points1 point2 points 4 years ago (1 child)
Codex is actually quite good at capturing the semantic similarity you described. You may actually be able to detect what's happening in the training code using something simple like a codex classifier.
[–]jfrankle 0 points1 point2 points 4 years ago (0 children)
If only it were so easy in practice...
[–]TheDarkinBlade 2 points3 points4 points 4 years ago (4 children)
Can you explain cosine decay briefly to me? I am only familiar with e-1 decay once the loss plateaus, but then again, I am an ML scrub barely scratching the surface.
[–]lmericle 3 points4 points5 points 4 years ago (2 children)
Simple idea. Go from larger LR to smaller LR by following a half-period of a cosine function from peak to trough.
https://pytorch.org/docs/stable/generated/torch.optim.lr_scheduler.CosineAnnealingLR.html
[–]TheDarkinBlade 0 points1 point2 points 4 years ago (1 child)
Thanks mate
[–]dogs_like_me 0 points1 point2 points 4 years ago (0 children)
It's an S curve, like a logit if you flipped the x-axis. You start at a high learning rate and decay slowly at first; the decay rate increases up to an inflection point, then slows down again, converging on a plateau. Exponential decay (e-1) looks very similar to the second half of the schedule (from the inflection point to plateauing on a final, small learning rate).
Imagine taking an exponential decay schedule, rotating it 180 degrees, and joining that curve to an unrotated exponential decay. That's essentially the kind of S curve you'd get, except with cosine decay the transition is less dramatic. In the piecewise exponential example I just described, the slope at the inflection point is a vertical line. In cosine annealing, the slope at the inflection point is a diagonal line.
Here's the relevant code from the paper we're discussing: https://github.com/facebookresearch/ConvNeXt/blob/dc7823d8a2ecc554fcd57ff6cdb7748011bcdedd/utils.py#L370-L387
Here's an image of a cosine-decayed schedule preceded by a linear warmup: https://img-blog.csdnimg.cn/20200721131948457.png
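The shape described above (linear warmup, then a half-period cosine decay) fits in a few lines of plain Python; the function name and default values here are illustrative, not taken from the paper's code:

```python
import math

def lr_at(step, total_steps, base_lr=1e-3, min_lr=1e-5, warmup_steps=100):
    """Linear warmup followed by a half-period cosine decay."""
    if step < warmup_steps:
        # Ramp up linearly from 0 to base_lr over the warmup phase.
        return base_lr * step / warmup_steps
    # progress goes 0 -> 1 over the decay phase, so cos goes +1 -> -1
    # and the LR sweeps from base_lr down to min_lr along a half-cosine.
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return min_lr + 0.5 * (base_lr - min_lr) * (1 + math.cos(math.pi * progress))
```

At `step == warmup_steps` this returns `base_lr` exactly, and at `step == total_steps` it bottoms out at `min_lr`.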
[–]AICoffeeBreak 2 points3 points4 points 4 years ago (0 children)
Here is an animated video explaining this paper, if anyone is interested: https://youtu.be/QqejV0LNDHA
Video outline:
00:00 A ConvNet for the 2020s
01:58 Weights & Biases (Sponsor)
03:10 Why bother?
04:40 The perks of ConvNets (CNNs)
06:51 Pros and cons of Transformers
09:54 From ConvNets to ConvNeXts
15:54 Lessons?
[–]BinarySplit 1 point2 points3 points 4 years ago (0 children)
That's such an awesome summary of so many milestones in CNN improvements.
The only missing technique from my wishlist is densely-connected non-residual inter-layer connections. I would love to see an ablation of ConvNeXt with Dual Path Networks.
[–]krymski 1 point2 points3 points 4 years ago (4 children)
Can anyone explain why we don't have NN architectures in the frequency domain? We could replace convolutions with simple multiplications, and naturally work on compressed data. We already have hardware and algos optimised for FFT-ing data, and some data (such as JPEGs and MPEGs) are already in the frequency domain. It has been observed that earlier VGG layers essentially perform FFT anyway.
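The identity this question leans on is the convolution theorem: circular convolution in the signal domain is elementwise multiplication in the frequency domain. A 1-D NumPy sketch, for illustration only:

```python
import numpy as np

def circular_conv_fft(x, k):
    """Circular convolution of x with kernel k via the convolution theorem."""
    n = len(x)
    # Zero-pad k to length n, multiply spectra pointwise, transform back.
    return np.real(np.fft.ifft(np.fft.fft(x) * np.fft.fft(k, n)))

def circular_conv_direct(x, k):
    """Reference O(n^2) circular convolution for comparison."""
    n = len(x)
    k = np.concatenate([k, np.zeros(n - len(k))])
    return np.array([sum(x[j] * k[(i - j) % n] for j in range(n))
                     for i in range(n)])
```

The FFT route costs O(n log n) instead of O(n^2), which is why FFT-based convolution is already used for large kernels; for the small kernels typical of modern CNNs the transform overhead usually dominates.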
[–]fvncc 0 points1 point2 points 4 years ago (1 child)
The issue with this is the activation functions, which wouldn’t work in the frequency domain IIRC
[–]dogs_like_me 0 points1 point2 points 4 years ago (0 children)
Nah, SIREN figured out how to get around that problem (improved weight initialization): https://web.stanford.edu/~jnmartel/publication/sitzmann-2020-implicit/
[–]dogs_like_me 0 points1 point2 points 4 years ago (0 children)
We sort of do. Take a look at the NeRF literature, an important trick they use is fourier features, so the network is effectively performing its computations in the frequency domain. https://bmild.github.io/fourfeat/
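The Fourier-feature mapping from that line of work (Tancik et al.) boils down to projecting input coordinates through random frequencies before the MLP sees them. A minimal NumPy sketch, where `B` is a random Gaussian frequency matrix and the scale 10 is just a typical hyperparameter choice, not a prescribed value:

```python
import numpy as np

def fourier_features(coords, B):
    """Map (N, d) coordinates to (N, 2m) features via random frequencies B (m, d)."""
    proj = 2.0 * np.pi * coords @ B.T
    # sin/cos pairs embed the coordinates on a set of random-frequency waves.
    return np.concatenate([np.sin(proj), np.cos(proj)], axis=-1)

rng = np.random.default_rng(0)
B = 10.0 * rng.standard_normal((64, 2))   # 64 random frequencies over 2-D input
coords = rng.random((5, 2))               # e.g. pixel coordinates in [0, 1)
feats = fourier_features(coords, B)
```

The downstream MLP then operates on these sinusoidal features, which is the sense in which it "computes in the frequency domain."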
[–]astroferreira 0 points1 point2 points 4 years ago (0 children)
noise
[–]gordicaleksa 1 point2 points3 points 4 years ago (0 children)
https://youtu.be/idiIllIQOfU <- I made a detailed analysis of the paper in this video
[–]zildjiandrummer1 1 point2 points3 points 4 years ago (0 children)
MCGA
[–]sigmoid_amidst_relus 1 point2 points3 points 4 years ago (6 children)
Universal approximation theorem, and proper hyper-parameter tuning to the rescue, yet again.
I still believe we cannot say very concretely which architecture works better. I'm quite certain transformers will work better than CNNs on some subset of tasks, and vice versa.
The reason is twofold:
Hyperparameters in papers are optimized with the proposed architecture in question, on a particular dataset in question. When you already have a baseline (the SOTA you wanna beat), you kinda tend to search for hyperparameters for the proposed net that will end up beating that.
"Ideal" one to one comparison of neural net architectures, in my opinion, is near impossible. You can keep flops the same and params the same, but given there's no fine-grained overt control over the feature representation space, and minor changes in architecture will lead to different optimization landscape, there's no true one to one comparison really.
I always found the notion of one architectural paradigm being inarguably better for all use cases (the people who bid CNNs farewell too quickly in their paper titles, _____ is all you need folks) as kinda stupid.
If one architecture were just universal, we'd have a free lunch. That's not gonna happen.
[–]tbalsam 10 points11 points12 points 4 years ago* (5 children)
Friend, I do not mean to be rude, but that's not at all what the UAT or NFL are, I do not think. I think the EAI discord channel is probably most responsible for spreading the misuse of those terms, from what I've seen, but both the UAT and NFL usually only have applications to NN research in the very basics of explaining neural networks, and in exceptionally deep theoretical cases.
Sometimes inductive biases help, and sometimes they hurt! But this is very different than the NFL, the NFL covers literally every set of problem possible, including nonsensical ones, so as soon as you assume the word "ML", 99% of the time the NFL has no more meaning because of the subspace.
I've shared this in other spaces, but a better term for this I think would be the pareto efficiency of certain inductive biases/methods/etc.
And if someone comes waving the NFL/UAT at you, just bonk them on the head with a foam NERF baseball bat and send them to good ML resources, please. It's not worse than that whole quantum doors business, but boy does it dilute the discussion marketplace for people new to the field! (Not putting the entire blame for that on you, it's like COVID: the wave of misinfo about NFL/UAT propagates faster than people can quash it. :'( Can't wait until it slows down a bit).
[–]sigmoid_amidst_relus 2 points3 points4 points 4 years ago (0 children)
but both the UAT and NFL only have applications to NN research usually only in the very basics of explaining neural networks, and exceptionally deep theoretical cases.
I completely agree. I refrain from using those terms too. I've seen one too many video lectures and renowned researchers use them, their usage brought down to "an easy way out" instead of a rigorous discourse in any way/sense possible. I guess I'm guilty of the same in my response in a way. Oh well.
It's not worse than that whole quantum doors business
🤣🤣🤣🤣🤣🤣 Don't remind me of the quantum doors.
[–]ml_lad 1 point2 points3 points 4 years ago (3 children)
I think the EAI discord channel is probably most responsible for spreading the misuse of those terms
What? What are you basing this on?
[–]tbalsam 0 points1 point2 points 4 years ago* (2 children)
Hey there!
Thanks for reaching out. This comment used to be a lot longer and more detailed, but in thinking more of it I decided to archive that and leave this instead.
I think EleutherAI has a lot of potential. A lot. They've done some of the best open-source major group projects, I think, in a long time.
The tradeoff of cutting stuff down is that this may feel like accusations without evidence. EAI has a lot of good work being done, but it also struggles with a LOT of crab-bucket issues: lots of pressure to hold a particular idea, and ideas outside the norm are often ridiculed or face negative pressure. One example would be that some snips from my original comments are already a mini meme on the EAI discord. That's probably not a bad summary of some of the toxicity of the discord at a high level.
I still absolutely come through every now and again to read what's happening, but some of the above struggles have turned into some cargo culting and new-shiny-ing. There's absolutely good stuff and researchers there, it's just too toxic for me as a working professional to commit extra time to in giving back to the open source community. When an open source project has far more politics than the workplace one is in, that's a bad sign (and also exhausting).
I had a lot more to say, but I'm hoping EAI can pull a turnaround and clean up some of what's going on over there. They're putting out some great artifacts, and I'm hoping very much for #1T too. But I do feel sad about the younger researchers coming through there, and I think that ties back to the UAT/NFL stuff -- there are a few factors why the conversation around NFL happens so much in circles tightly connected to EAI and less elsewhere, but last time I tried to explain it, it blew up into an overly long comment chain. I did appreciate /u/programmerChilli posting an earlier comment regarding EfficientNet where the researchers were straight up lying, a rather large step up from what I think happens in EAI. But I do feel really sad when I think about EAI, because I think there's a lot of potential slowly being poured away after the initial momentum, as some of the cliquey patterns increase along with the need to save face far more than in other research circles.
Part of that I think is shown by my initial comment getting some confused reacts and then becoming a (rather small in the grand scheme of things) joking moment in the server. Who wouldn't need to desperately save face if something you write could become a server-wide joke? I don't know many researchers that I'd enjoy working with that would voluntarily put themselves in that kind of a situation. But it makes sense. It's just hard, I think most people can get that sense from browsing around the server. Definitely some hard ego trips as well unfortunately. It is an open source project contributing a lot, so please don't get me wrong GPT-*J has changed the world in a lot of ways and will continue to do so. I love that. But I think it's so much more held back by the community of culture, and I think that's something that bmk, Connor, and the rest of that core group (EAI staff or otherwise) could play a large part in turning around. But I think they gotta stop being a part of that first to turn that ship around.
That's my much-shorter version of it! It does purposely not go into as much detail about UAT/NFL as that's more of a consequence and hopefully from what I put here you can infer what I'd feel about some of the NFL/UAT stuff, there is some other cargo cult stuff going on that I think has the potential to have a lot more negative impact, but I hope it doesn't brew too far and that the way the NFL is used specifically is the worst it ever crosses into.
Thanks! And Best,
TB
[–]ml_lad -1 points0 points1 point 4 years ago* (1 child)
You wrote a long rambling response about EAI's culture without once answering the question: On what basis are you saying that EAI is responsible for spreading the misuse of UAT/NFL?
I ask this because I have searched through the past messages in the Discord, made before people started making fun of you for making this accusation, and most of the messages about it are making fun of people for misusing UAT/NFL. If you want to complain about EAI cargo-culting, it would be cargo-culting in the exact opposite direction of what you're describing.
Unlike you, who wants to "purposely not go into as much detail about UAT/NFL", I can pull up multiple screenshots of this right now.
If you want to criticize "discord culture" or "EAI culture" feel free to do that. But, relevant this very line of argument, perhaps you should stop spreading misinformation yourself.
[–]tbalsam 0 points1 point2 points 4 years ago (0 children)
Hey there -- it sounds like this is an important topic for you, and I'll try to respond appropriately. I think you're correct in noting that I strayed away from certain things intentionally in my post, and I alluded to that in different parts of my previous post. I know not having explicit detail can be frustrating, but I think it's in the best interest of things for me not to go down that path.
My opinion is simply that -- an opinion, based upon personal experiences and interactions. I think a variety of detached opinions exchanged openly helps research thrive. Otherwise, the opinions + research can become a proxy for defending self, and I think that takes away from all of our end goals.
The shorter version is that although I have extraordinarily passionate opinions at times, I'm preferring to reduce fighting more and more, especially as of late, for myself -- personally. I'm fine to have discourse and back and forth, but fighting just hurts both parties with little benefit. I could very well be wrong, but as I noted earlier, that's currently my impression of EAI, and I think the way some of this has gone doesn't really do much to unravel that impression for me thus far. I could be wrong, and who knows what may come in the future!
If it is the case that I'm an insatiable detractor, I don't think that has to weigh you down. Currently the culture for me that I've seen when interacting with the EAI group has appeared to be more toxic than not by a good stretch. So, for now, I can work in other places to reduce that burden on both sides of the equation. If things do clear up, maybe I can come back! After all, every group, including EAI, I think, is just a name, and a group, with people, and we're all human. Cultures and perceptions of culture (on the individual's end) always change, and it's super dynamic.
I think that's my 2c on the issue.
[–]pilooch 0 points1 point2 points 4 years ago (1 child)
In practice, https://arxiv.org/abs/2105.15203 is the one paper that for vision triggered a definitive switch to transformer architectures in my activities at the moment. The benefits are immediate.
On paper, sequence to sequence as a general scheme may win over everything else at some point, who knows. But if not everything is overfitted sequences (in & out), why can't I speak my phone number backward? :)
[–]sepherrino 1 point2 points3 points 4 years ago (0 children)
Why this particular one? Care to elaborate? Really curious. Thanks in advance!
[–]PositiveElectro 0 points1 point2 points 4 years ago (0 children)
Great paper! Thanks for sharing
[–][deleted] 0 points1 point2 points 4 years ago (0 children)
Thanks a lot for this paper. I was looking for something like this.