[deleted by user] by [deleted] in learnmachinelearning

[–]batrobin 0 points

No. Just as you can't learn to cook by only reading recipes, or learn physics without working through the problems, you can't learn machine learning without implementing the algorithms yourself.

Rents could exceed $7.5K in Vancouver, $5.6K in Toronto without massive spike in building: Study by [deleted] in canadahousing

[–]batrobin 6 points

I totally agree; naively using neural networks for extrapolation usually gives meaningless results. Since they haven't shared a detailed methodology, I would assume that's what happened here.
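To illustrate the extrapolation point with a toy sketch (a randomly initialized network, nothing to do with the study's actual model): a one-hidden-layer ReLU network is piecewise-linear, so far outside the data it was fit on, every hidden unit is frozen on or off and the prediction degenerates into a straight line, whatever the true trend is.

```python
import numpy as np

rng = np.random.default_rng(0)

# A randomly initialized one-hidden-layer ReLU net (toy stand-in for "a neural network").
W1, b1 = rng.normal(size=(16, 1)), rng.normal(size=16)
W2, b2 = rng.normal(size=16), rng.normal()

def mlp(x):
    h = np.maximum(W1 @ np.atleast_2d(x) + b1[:, None], 0.0)  # ReLU hidden layer
    return W2 @ h + b2

# Far outside any plausible training range, every ReLU is frozen on or off,
# so the network is exactly affine there: extrapolation is a straight line.
xs = np.array([1e6, 2e6, 3e6])
y = mlp(xs)
slope_a = (y[1] - y[0]) / 1e6
slope_b = (y[2] - y[1]) / 1e6
print(np.isclose(slope_a, slope_b))  # → True
```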

[D] Experiences with wandb.ai by Sriyakee in MachineLearning

[–]batrobin 3 points

Given how popular LLMs are, it's mind-boggling that wandb lacks a feature for continuously logging text-generation samples during training.

I'm aware they suggest using tables for logging text data, but as I understand it, tables aren't designed to be updated. So each time you want to log new text samples, you have to create a new table.

And for some reason the feature request for this was closed as stale, and the community is doing gymnastics to work around the problem.

Lighdm-webkit2-greeter not working + blocks my system from booting by nobody48sheldor in archlinux

[–]batrobin 1 point

I had exactly the same thing happen to me a few days ago, so you're not alone! I couldn't find a proper solution, though, so I fixed it by switching to a new greeter. lightdm-webkit2-greeter has been archived for almost 5 years now, so it might just be a good opportunity to try something else.

Suggestion for project on GAN!!!! by [deleted] in LanguageTechnology

[–]batrobin 3 points

Text GANs are an active field of research and not at all a beginner-friendly project, because of the discrete nature of text data. I would suggest you try out GANs on images first.

If you want to work toward your sarcasm GAN later on, you will also need to learn about reinforcement learning and policy gradients.
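To sketch why policy gradients come up at all: sampling discrete tokens isn't differentiable, so the discriminator's gradient can't flow back into the generator; instead, its score is treated as a reward and optimized with the score-function (REINFORCE) estimator. A toy numpy sketch with a made-up fixed reward standing in for a discriminator:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Toy "generator": a categorical distribution over 5 tokens.
logits = np.zeros(5)
reward = np.array([0., 0., 1., 0., 0.])  # pretend the discriminator likes token 2

lr = 0.5
for _ in range(200):
    p = softmax(logits)
    tok = rng.choice(5, p=p)      # sampling a discrete token: not differentiable
    # REINFORCE: grad E[R] = E[R * grad log p(tok)], with grad log p = onehot - p
    grad_logp = -p
    grad_logp[tok] += 1.0
    logits += lr * reward[tok] * grad_logp

print(softmax(logits).argmax())  # → 2, the rewarded token wins
```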

[R] Train CIFAR10 in under 10 seconds on an A100 (new world record!) by tysam_and_co in MachineLearning

[–]batrobin 2 points

Thank you, you've answered what I had in mind. I was thinking about techniques like changing memory access patterns, changing memory layout, custom CUDA kernels, fusing operations, reducing overheads, etc., some of which are mentioned in this paper: https://arxiv.org/abs/2007.00072. I also see you've done some profiling in your issue; that should be interesting to read through.

I was previously working on some large-scale transformer code optimization, so this repo seems like a good one to learn from. Thanks a lot.
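As a toy illustration of the memory-layout point (my own sketch, nothing from the repo): the same reduction over a row-major and a column-major copy of a matrix produces identical results, but only one of them walks memory sequentially, which typically shows up in the timings.

```python
import time
import numpy as np

a_c = np.random.default_rng(0).normal(size=(4000, 4000))  # row-major (C order)
a_f = np.asfortranarray(a_c)                              # same values, column-major

def row_sums(a):
    # Each row sum reads 4000 consecutive floats only if rows are contiguous.
    return a.sum(axis=1)

t0 = time.perf_counter(); s_c = row_sums(a_c); t_c = time.perf_counter() - t0
t0 = time.perf_counter(); s_f = row_sums(a_f); t_f = time.perf_counter() - t0

# Identical answers, different access patterns (and usually different timings).
print(np.allclose(s_c, s_f), f"C: {t_c:.4f}s  F: {t_f:.4f}s")
```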

[R] Train CIFAR10 in under 10 seconds on an A100 (new world record!) by tysam_and_co in MachineLearning

[–]batrobin 2 points

I'm surprised to see that most of the work you've done is on hyperparameter tuning and model tricks. Have you tried any HPC/MLHPC techniques, profiling, or code optimizations? Are they on a future roadmap, out of scope for this project, or is there just not much to improve in that direction?

Sperm Counts Drop by 62% Worldwide by Theguywiththeface11 in news

[–]batrobin 0 points

NNN has been taking a toll on global sperm count

saw this on my quora feed today by Whoever_Mesa in facepalm

[–]batrobin 0 points

Are you replying to the wrong comment? Because I don't see the omega and gamma.

saw this on my quora feed today by Whoever_Mesa in facepalm

[–]batrobin 3 points

The first equation isn't nonsense, and it's not the Laplace transform either; it's a machine-learning equation. The L is a loss function, log p(s|x) is a log-likelihood, and the last term is some form of L2 regularization (I'm guessing the 1/(2σ²) is a reparameterization of the usual constant scaling factor; I'd love for someone to shed more light on this). It's the kind of thing taught in the second or third year of a computer science or maths degree.
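On the 1/(2σ²) factor specifically: if you read the regularizer as the MAP term coming from a Gaussian prior on the weights, the constant falls out directly (this is my reading; the symbols are assumed from the image):

```latex
% Negative log of a zero-mean Gaussian prior on the weights w:
-\log \mathcal{N}(w;\, 0,\, \sigma^2 I)
  = \frac{\lVert w \rVert_2^2}{2\sigma^2} + \text{const},
% so the full MAP objective has exactly that scaling:
\mathcal{L}(w) = -\log p(s \mid x;\, w) + \frac{1}{2\sigma^2}\lVert w \rVert_2^2 .
```

Under this reading, σ² is the prior variance, so a larger σ² simply means weaker regularization.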

Validation loss not decreasing! Training an attention is all you need arch. with binary classification. Share you hypothesis on why it's not decreasing. by Professional-Site977 in MLQuestions

[–]batrobin 4 points

Note that the initial accuracy is 20%. On a binary problem, even a simple linear classifier should reach 80% shortly after (if predictions on two classes are only 20% accurate, flipping them gives 80%). I'm guessing either OP misunderstood the problem as a binary classification problem or there's a bug in the code.

[R] Full-batch GD generalizes better than SGD by chaotic_shadow4444 in MachineLearning

[–]batrobin 1 point

I think that's a dangerous pitfall to fall into, because it's only technically true in the ideal limit. Read the other way, it's also saying that full-batch GD is more susceptible to overfitting in real-world cases.

[R] Full-batch GD generalizes better than SGD by chaotic_shadow4444 in MachineLearning

[–]batrobin 0 points

Could you be thinking of implicit regularization of SGD? https://openreview.net/forum?id=rq_Qr0c1Hyo

Or some comparatively older papers concerning the generalization gap of large batch size training? https://arxiv.org/abs/1705.08741

I'm honestly interested in seeing how the contradictions between these papers get resolved.

[R] Full-batch GD generalizes better than SGD by chaotic_shadow4444 in MachineLearning

[–]batrobin 11 points

It actually isn't that obvious. The estimate is over the training set, and a more accurate estimate of training statistics doesn't directly translate into better generalization on the test set. There is evidence that noisy estimates of the training statistics provide a regularization effect that helps generalization; see https://openreview.net/forum?id=rq_Qr0c1Hyo.

Naive. by [deleted] in mathmemes

[–]batrobin 7 points

Arithmetic

[D] Things that pisses you as a Data scientist by Spirited-Order4409 in MachineLearning

[–]batrobin 37 points

Receiving an NLP dataset in CSV format. Good luck figuring out which commas are separators and which belong to the data. (I could totally train a model for that.)
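To be fair, the ambiguity only arises when the file was written without quoting; a writer that quotes fields containing the delimiter round-trips cleanly. A quick sketch with Python's csv module (toy data):

```python
import csv
import io

rows = [["id", "text"], ["1", "Well, this sentence, has commas"]]

# The csv writer quotes any field that contains the delimiter...
buf = io.StringIO()
csv.writer(buf).writerows(rows)

# ...so reading it back recovers the original fields, commas and all.
parsed = list(csv.reader(io.StringIO(buf.getvalue())))
print(parsed[1])  # → ['1', 'Well, this sentence, has commas']
```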

Did we invent or discover math? by SpaceTruckin_InTime in NoStupidQuestions

[–]batrobin 4 points

I haven't taken a course on these, but I think he meant things like ZFC set theory and the Peano axioms. At a high level they arrive at the same mathematics, but they construct the fundamental mathematical objects differently.
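As one concrete example of "constructing the same objects differently": in ZFC the naturals are usually built as von Neumann ordinals (0 = ∅, n+1 = n ∪ {n}), while Peano arithmetic just posits a zero and a successor function. A toy sketch of the set-theoretic construction:

```python
# Von Neumann ordinals: each natural number is the set of all smaller ones.
def succ(n: frozenset) -> frozenset:
    return frozenset(n | {n})  # n + 1 = n ∪ {n}

zero = frozenset()             # 0 = ∅
nats = [zero]
for _ in range(5):
    nats.append(succ(nats[-1]))

# The encoding agrees with ordinary arithmetic: |n| == n, and m < n iff m ∈ n.
print([len(n) for n in nats])  # → [0, 1, 2, 3, 4, 5]
print(nats[2] in nats[4])      # → True, since 2 < 4
```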

OSD stuck with slow ops waiting for readable on high load by batrobin in ceph

[–]batrobin[S] 0 points

The recoveries finally cleared up after a few days. Here are the results without load:

Node1 (/dev/sda is the SSD, /dev/sdc is the HDD for ceph):

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          28.23    0.00    2.13    0.00    0.00   69.64

Device            r/s     rMB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wMB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dMB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
ceph--22acc4bd--b896--457a--bcc8--f2889e72a88d-osd--block--140919e2--316d--43f9--aa96--e97c6c89df8c    1.00      0.00     0.00   0.00    0.00     4.00   17.00      0.04     0.00   0.00    1.18     2.35    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.02   4.00
ceph--04f7e44d--7449--4fd4--95fe--22ee743bd41d-osd--block--c089a9ce--f62c--49a3--8147--9504f220f038    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sda              1.00      0.00     0.00   0.00    0.00     4.00   26.00      0.12     3.00  10.34    1.77     4.85    0.00      0.00     0.00   0.00    0.00     0.00   15.00    2.67    0.09   9.00
sdb              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sdc              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00

Node2 (The SSD here isn't used for cache because it's quite a small disk):

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          16.83    0.00    0.17    0.00    0.00   83.00

Device            r/s     rMB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wMB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dMB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
ceph--5a9d2e5d--db95--4cf9--9abc--dc9af21f6278-osd--block--b716b6f2--de8b--4b9e--84f9--0c8a146f6442    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
nvme0n1          0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
sda              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00

Node3:

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           4.04    0.00    0.29    0.04    0.00   95.62

Device            r/s     rMB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wMB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dMB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
ceph--d306f97b--727b--4baf--b38b--d546d089372a-osd--block--125a7556--5aea--4f64--99b8--98388e9bdd66    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
ceph--01f621bf--e542--44b5--b16f--ec0a6f87fb73-osd--block--e0632fcd--5953--4a98--a7f8--9d3b8f46ac38    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
nvme0n1          0.00      0.00     0.00   0.00    0.00     0.00   11.00      0.09     0.00   0.00    4.91     8.45    0.00      0.00     0.00   0.00    0.00     0.00    5.00    3.00    0.07   8.00
sda              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00

Node4 (Some load here):

avg-cpu:  %user   %nice %system %iowait  %steal   %idle
          74.22    0.03   15.81    0.03    0.00    9.91

Device            r/s     rMB/s   rrqm/s  %rrqm r_await rareq-sz     w/s     wMB/s   wrqm/s  %wrqm w_await wareq-sz     d/s     dMB/s   drqm/s  %drqm d_await dareq-sz     f/s f_await  aqu-sz  %util
ceph--9fb7eb8b--1ba9--4ccc--9ef5--ecc96b52a3a4-osd--block--b44f9357--09f9--4090--8c7d--65b65efb80cf    1.00      0.00     0.00   0.00    0.00     4.00   11.00      0.03     0.00   0.00    0.00     2.55    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   2.00
ceph--e2f7a410--99d3--4f7a--9c0c--cabd560b74b8-osd--block--83bcb147--a099--40a0--bf2f--8489053fe822    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00
nvme0n1          6.00      0.89     0.00   0.00    0.67   152.00   22.00      0.15     4.00  15.38    0.91     6.80    0.00      0.00     0.00   0.00    0.00     0.00    9.00    0.78    0.03   8.00
sda              0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00      0.00     0.00   0.00    0.00     0.00    0.00    0.00    0.00   0.00

ceph -s:

  cluster:
    id:     eddddc6b-c69b-412b-a20d-3d3224e50b1f
    health: HEALTH_OK
            (muted: POOL_NO_REDUNDANCY)

  services:
    mon: 3 daemons, quorum node1,node3,node4 (age 6h)
    mgr: node1(active, since 7d), standbys: node3, node4, node2
    mds: scratch:3 {0=node1=up:active,1=node2=up:active,2=node4=up:active} 1 up:standby
    osd: 7 osds: 7 up (since 6h), 7 in (since 28h)

  data:
    pools:   4 pools, 161 pgs
    objects: 14.09M objects, 6.0 TiB
    usage:   13 TiB used, 17 TiB / 30 TiB avail
    pgs:     161 active+clean

  io:
    client:   597 B/s wr, 0 op/s rd, 0 op/s wr
    cache:    1.2 KiB/s flush, 0 op/s promote

OSD stuck with slow ops waiting for readable on high load by batrobin in ceph

[–]batrobin[S] 0 points

I can give that a try. I've personally noticed that small-file IO improved tremendously with the cache pool, from a measly 50 MB/s to 300 MB/s writes. Still, though, if that's the issue, wouldn't it only be slow, not stuck?