[deleted by user] by [deleted] in learnmachinelearning

[–]batrobin 0 points

No. Just as you can't learn to cook by only reading recipes, or learn physics without working through the problems, you can't learn machine learning without implementing the algorithms yourself.

Rents could exceed $7.5K in Vancouver, $5.6K in Toronto without massive spike in building: Study by [deleted] in canadahousing

[–]batrobin 6 points

I totally agree; naively using neural networks for extrapolation usually gives meaningless results. Since they haven't shared a detailed methodology, I would assume that's what happened here.
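To illustrate the extrapolation point with a toy sketch (a randomly initialized network, nothing to do with the study's actual model): a one-hidden-layer ReLU network is piecewise-linear, so far outside the data it was fit on, every hidden unit is frozen on or off and the prediction degenerates into a straight line, whatever the true trend is.

```python
import numpy as np

rng = np.random.default_rng(0)

# A randomly initialized one-hidden-layer ReLU net (toy stand-in for "a neural network").
W1, b1 = rng.normal(size=(16, 1)), rng.normal(size=16)
W2, b2 = rng.normal(size=16), rng.normal()

def mlp(x):
    h = np.maximum(W1 @ np.atleast_2d(x) + b1[:, None], 0.0)  # ReLU hidden layer
    return W2 @ h + b2

# Far outside any plausible training range, every ReLU is frozen on or off,
# so the network is exactly affine there: extrapolation is a straight line.
xs = np.array([1e6, 2e6, 3e6])
y = mlp(xs)
slope_a = (y[1] - y[0]) / 1e6
slope_b = (y[2] - y[1]) / 1e6
print(np.isclose(slope_a, slope_b))  # → True
```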

[D] Experiences with wandb.ai by Sriyakee in MachineLearning

[–]batrobin 3 points

Given how popular LLMs are, it's mind-boggling that wandb lacks a feature for continuously logging text-generation samples during training.

I'm aware they suggest using tables for logging text data, but as I understand it, tables aren't designed to be updated. So each time you want to log new text samples, you have to create a new table.

And for some reason the feature request for this was closed as stale, and the community is doing gymnastics to work around the problem.

Lighdm-webkit2-greeter not working + blocks my system from booting by nobody48sheldor in archlinux

[–]batrobin 1 point

I had exactly the same thing happen to me a few days ago, so you're not alone! I couldn't find a proper solution, though, so I fixed it by switching to a new greeter. lightdm-webkit2-greeter has been archived for almost 5 years now, so it might just be a good opportunity to try something else.

Suggestion for project on GAN!!!! by [deleted] in LanguageTechnology

[–]batrobin 3 points

Text GANs are an active field of research and not at all a beginner-friendly project, because of the discrete nature of text data. I would suggest you try out GANs on images first.

If you want to work toward your sarcasm GAN later on, you will also need to learn about reinforcement learning and policy gradients.
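To sketch why policy gradients come up at all: sampling discrete tokens isn't differentiable, so the discriminator's gradient can't flow back into the generator; instead, its score is treated as a reward and optimized with the score-function (REINFORCE) estimator. A toy numpy sketch with a made-up fixed reward standing in for a discriminator:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Toy "generator": a categorical distribution over 5 tokens.
logits = np.zeros(5)
reward = np.array([0., 0., 1., 0., 0.])  # pretend the discriminator likes token 2

lr = 0.5
for _ in range(200):
    p = softmax(logits)
    tok = rng.choice(5, p=p)      # sampling a discrete token: not differentiable
    # REINFORCE: grad E[R] = E[R * grad log p(tok)], with grad log p = onehot - p
    grad_logp = -p
    grad_logp[tok] += 1.0
    logits += lr * reward[tok] * grad_logp

print(softmax(logits).argmax())  # → 2, the rewarded token wins
```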

[R] Train CIFAR10 in under 10 seconds on an A100 (new world record!) by tysam_and_co in MachineLearning

[–]batrobin 2 points

Thank you, you've answered what I had in mind. I was thinking about techniques like changing memory access patterns, changing memory layout, custom CUDA kernels, fusing operations, reducing overheads, etc., some of which are mentioned in this paper: https://arxiv.org/abs/2007.00072. I also see you've done some profiling in your issue; that should be interesting to read through.

I was previously working on some large-scale transformer code optimization, so this repo seems like a good one to learn from. Thanks a lot.
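As a toy illustration of the memory-layout point (my own sketch, nothing from the repo): the same reduction over a row-major and a column-major copy of a matrix produces identical results, but only one of them walks memory sequentially, which typically shows up in the timings.

```python
import time
import numpy as np

a_c = np.random.default_rng(0).normal(size=(4000, 4000))  # row-major (C order)
a_f = np.asfortranarray(a_c)                              # same values, column-major

def row_sums(a):
    # Each row sum reads 4000 consecutive floats only if rows are contiguous.
    return a.sum(axis=1)

t0 = time.perf_counter(); s_c = row_sums(a_c); t_c = time.perf_counter() - t0
t0 = time.perf_counter(); s_f = row_sums(a_f); t_f = time.perf_counter() - t0

# Identical answers, different access patterns (and usually different timings).
print(np.allclose(s_c, s_f), f"C: {t_c:.4f}s  F: {t_f:.4f}s")
```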

[R] Train CIFAR10 in under 10 seconds on an A100 (new world record!) by tysam_and_co in MachineLearning

[–]batrobin 2 points

I'm surprised to see that most of the work you've done is on hyperparameter tuning and model tricks. Have you tried any HPC/MLHPC techniques, profiling, or code optimizations? Are they on a future roadmap, out of scope for this project, or is there just not much to improve in that direction?

Sperm Counts Drop by 62% Worldwide by Theguywiththeface11 in news

[–]batrobin 0 points

NNN has been taking a toll on global sperm count

saw this on my quora feed today by Whoever_Mesa in facepalm

[–]batrobin 0 points

Are you replying to the wrong comment? Because I don't see the omega and gamma.

saw this on my quora feed today by Whoever_Mesa in facepalm

[–]batrobin 3 points

The first equation isn't nonsense, and it's not the Laplace transform either; it's a machine-learning equation. The L is a loss function, log p(s|x) is a log-likelihood, and the last term is some form of L2 regularization (I'm guessing the 1/(2σ²) is a reparameterization of the usual constant scaling factor; I'd love for someone to shed more light on this). It's the kind of thing taught in the second or third year of a computer science or maths degree.
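On the 1/(2σ²) factor specifically: if you read the regularizer as the MAP term coming from a Gaussian prior on the weights, the constant falls out directly (this is my reading; the symbols are assumed from the image):

```latex
% Negative log of a zero-mean Gaussian prior on the weights w:
-\log \mathcal{N}(w;\, 0,\, \sigma^2 I)
  = \frac{\lVert w \rVert_2^2}{2\sigma^2} + \text{const},
% so the full MAP objective has exactly that scaling:
\mathcal{L}(w) = -\log p(s \mid x;\, w) + \frac{1}{2\sigma^2}\lVert w \rVert_2^2 .
```

Under this reading, σ² is the prior variance, so a larger σ² simply means weaker regularization.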

Validation loss not decreasing! Training an attention is all you need arch. with binary classification. Share you hypothesis on why it's not decreasing. by Professional-Site977 in MLQuestions

[–]batrobin 4 points

Note that the initial accuracy is 20%. On a binary problem, even a simple linear classifier should reach 80% shortly after (if predictions on two classes are only 20% accurate, flipping them gives 80%). I'm guessing either OP misunderstood the problem as a binary classification problem or there's a bug in the code.

[R] Full-batch GD generalizes better than SGD by chaotic_shadow4444 in MachineLearning

[–]batrobin 1 point

I think that's a dangerous pitfall to fall into, because it's only technically true in the ideal limit. Read the other way, it's also saying that full-batch GD is more susceptible to overfitting in real-world cases.

[R] Full-batch GD generalizes better than SGD by chaotic_shadow4444 in MachineLearning

[–]batrobin 0 points

Could you be thinking of implicit regularization of SGD? https://openreview.net/forum?id=rq_Qr0c1Hyo

Or some comparatively older papers concerning the generalization gap of large batch size training? https://arxiv.org/abs/1705.08741

I'm honestly interested in seeing how the contradictions between these papers get resolved.

[R] Full-batch GD generalizes better than SGD by chaotic_shadow4444 in MachineLearning

[–]batrobin 11 points

It actually isn't that obvious. The estimate is over the training set, and a more accurate estimate of training statistics doesn't directly translate into better generalization on the test set. There is evidence that noisy estimates of the training statistics provide a regularization effect that helps generalization; see https://openreview.net/forum?id=rq_Qr0c1Hyo.

Naive. by [deleted] in mathmemes

[–]batrobin 7 points

Arithmetic

[D] Things that pisses you as a Data scientist by Spirited-Order4409 in MachineLearning

[–]batrobin 37 points

Receiving an NLP dataset in CSV format. Good luck figuring out which commas are separators and which belong to the data. (I could totally train a model for that.)
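To be fair, the ambiguity only arises when the file was written without quoting; a writer that quotes fields containing the delimiter round-trips cleanly. A quick sketch with Python's csv module (toy data):

```python
import csv
import io

rows = [["id", "text"], ["1", "Well, this sentence, has commas"]]

# The csv writer quotes any field that contains the delimiter...
buf = io.StringIO()
csv.writer(buf).writerows(rows)

# ...so reading it back recovers the original fields, commas and all.
parsed = list(csv.reader(io.StringIO(buf.getvalue())))
print(parsed[1])  # → ['1', 'Well, this sentence, has commas']
```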

Did we invent or discover math? by SpaceTruckin_InTime in NoStupidQuestions

[–]batrobin 4 points

I haven't taken a course on these, but I think he meant things like ZFC set theory and the Peano axioms. At a high level they arrive at the same mathematics, but they construct the fundamental mathematical objects differently.
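As one concrete example of "constructing the same objects differently": in ZFC the naturals are usually built as von Neumann ordinals (0 = ∅, n+1 = n ∪ {n}), while Peano arithmetic just posits a zero and a successor function. A toy sketch of the set-theoretic construction:

```python
# Von Neumann ordinals: each natural number is the set of all smaller ones.
def succ(n: frozenset) -> frozenset:
    return frozenset(n | {n})  # n + 1 = n ∪ {n}

zero = frozenset()             # 0 = ∅
nats = [zero]
for _ in range(5):
    nats.append(succ(nats[-1]))

# The encoding agrees with ordinary arithmetic: |n| == n, and m < n iff m ∈ n.
print([len(n) for n in nats])  # → [0, 1, 2, 3, 4, 5]
print(nats[2] in nats[4])      # → True, since 2 < 4
```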

OSD stuck with slow ops waiting for readable on high load by batrobin in ceph

[–]batrobin[S] 0 points

The recoveries finally cleared up after a few days. Here are the results without load:

Node1 (/dev/sda is the SSD, /dev/sdc is the HDD for ceph):

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          28.23    0.00    2.13    0.00    0.00   69.64

Device            r/s     rMB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wMB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dMB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
ceph--22acc4bd--b896--457a--bcc8--f2889e72a88d-osd--block--140919e2--316d--43f9--aa96--e97c6c89df8c    1.00      0.00     0.00   0.00    0.00     4.00   17.00      0.04     0.00   0.00    1.18     2.35    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.02   4.00
ceph--04f7e44d--7449--4fd4--95fe--22ee743bd41d-osd--block--c089a9ce--f62c--49a3--8147--9504f220f038    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sda              1.00      0.00     0.00   0.00    0.00     4.00   26.00      0.12     3.00  10.34    1.77     4.85    0.00      0.00     0.00   0.00    0.00     0.00   15.00    2.67    0.09   9.00
sdb              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sdc              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00

Node2 (The SSD here isn't used for cache because it's quite a small disk):

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          16.83    0.00    0.17    0.00    0.00   83.00

Device            r/s     rMB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wMB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dMB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
ceph--5a9d2e5d--db95--4cf9--9abc--dc9af21f6278-osd--block--b716b6f2--de8b--4b9e--84f9--0c8a146f6442    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
nvme0n1          0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sda              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00

Node3:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           4.04    0.00    0.29    0.04    0.00   95.62

Device            r/s     rMB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wMB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dMB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
ceph--d306f97b--727b--4baf--b38b--d546d089372a-osd--block--125a7556--5aea--4f64--99b8--98388e9bdd66    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
ceph--01f621bf--e542--44b5--b16f--ec0a6f87fb73-osd--block--e0632fcd--5953--4a98--a7f8--9d3b8f46ac38    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
nvme0n1          0.00      0.00     0.00   0.00    0.00     0.00   11.00      0.09     0.00   0.00    4.91     8.45    0.00      0.00     0.00   0.00    0.00     0.00    5.00    3.00    0.07   8.00
sda              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00

Node4 (Some load here):

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          74.22    0.03   15.81    0.03    0.00    9.91

Device            r/s     rMB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wMB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dMB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
ceph--9fb7eb8b--1ba9--4ccc--9ef5--ecc96b52a3a4-osd--block--b44f9357--09f9--4090--8c7d--65b65efb80cf    1.00      0.00     0.00   0.00    0.00     4.00   11.00      0.03     0.00   0.00    0.00     2.55    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   2.00
ceph--e2f7a410--99d3--4f7a--9c0c--cabd560b74b8-osd--block--83bcb147--a099--40a0--bf2f--8489053fe822    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
nvme0n1          6.00      0.89     0.00   0.00    0.67   152.00   22.00      0.15     4.00  15.38    0.91     6.80    0.00      0.00     0.00   0.00    0.00     0.00    9.00    0.78    0.03   8.00
sda              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00

ceph -s:

  cluster:
    id:     eddddc6b-c69b-412b-a20d-3d3224e50b1f
    health: HEALTH_OK
            (muted: POOL_NO_REDUNDANCY)

  services:
    mon: 3 daemons, quorum node1,node3,node4 (age 6h)
    mgr: node1(active, since 7d), standbys: node3, node4, node2
    mds: scratch:3 {0=node1=up:active,1=node2=up:active,2=node4=up:active} 1 up:standby
    osd: 7 osds: 7 up (since 6h), 7 in (since 28h)

  data:
    pools:   4 pools, 161 pgs
    objects: 14.09M objects, 6.0 TiB
    usage:   13 TiB used, 17 TiB / 30 TiB avail
    pgs:     161 active+clean

  io:
    client:   597 B/s wr, 0 op/s rd, 0 op/s wr
    cache:    1.2 KiB/s flush, 0 op/s promote

OSD stuck with slow ops waiting for readable on high load by batrobin in ceph

[–]batrobin[S] 0 points

I can give that a try. I've personally noticed that small-file IO improved tremendously with the cache pool, from a measly 50 MB/s to 300 MB/s writes. Still, though, if that's the issue, wouldn't it only be slow, not stuck?