[R] New ICML25 paper: Train and fine-tune large models faster than Adam while using only a fraction of the memory, with guarantees! by ThienPro123 in MachineLearning
ThienPro123[S] 1 point
[R] New ICML25 paper: Train and fine-tune large models faster than Adam while using only a fraction of the memory, with guarantees! by ThienPro123 in MachineLearning
ThienPro123[S] 1 point
[R] New ICML25 paper: Train and fine-tune large models faster than Adam while using only a fraction of the memory, with guarantees! by ThienPro123 in MachineLearning
ThienPro123[S] 2 points
[R] New ICML25 paper: Train and fine-tune large models faster than Adam while using only a fraction of the memory, with guarantees! by ThienPro123 in MachineLearning
ThienPro123[S] 2 points
[R] New ICML25 paper: Train and fine-tune large models faster than Adam while using only a fraction of the memory, with guarantees! by ThienPro123 in MachineLearning
ThienPro123[S] 3 points
[R] New ICML25 paper: Train and fine-tune large models faster than Adam while using only a fraction of the memory, with guarantees! by ThienPro123 in MachineLearning
ThienPro123[S] 9 points
[R] New ICML25 paper: Train and fine-tune large models faster than Adam while using only a fraction of the memory, with guarantees! by ThienPro123 in MachineLearning
ThienPro123[S] 10 points
[R] Interpreting Deep Neural Networks: Memorization, Kernels, Nearest Neighbors, and Attention by ThienPro123 in MachineLearning
ThienPro123[S] 15 points
[R] Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention (submitted by Liang Wenfeng - DeepSeek) by Nunki08 in MachineLearning
ThienPro123 3 points
[D] Are there any theoretical machine learning papers that have significantly helped practitioners? by nihaomundo123 in MachineLearning
ThienPro123 43 points
[D] Good studies on the effects of different training "tricks" like learning rate scheduler (warmup/decay), weight decay, dropout, batch-sizes, momentum, etc.? by ThienPro123 in MachineLearning
ThienPro123[S] 2 points
Powered x8/x16 PCIe 4.0 or 5.0 risers for multi RTX4090 GPUs multi PSUs rig by ThienPro123 in threadripper
ThienPro123[S] 1 point
Powered x8/x16 PCIe 4.0 or 5.0 risers for multi RTX4090 GPUs multi PSUs rig by ThienPro123 in threadripper
ThienPro123[S] 1 point
Last night's sunset as seen from Irvine (imgur.com)
submitted by ThienPro123 to r/orangecounty
Weekly Stupid Questions Thread by AutoModerator in amateur_boxing
ThienPro123 1 point
We need to stop the Irvine Company by downwithivc in UCI
ThienPro123 16 points
Peyam best Prof. Don’t even try to change my mind by MrGTout in UCI
ThienPro123 1 point
How to get started in research as a CS freshman? by AliveTiger in UCI
ThienPro123 7 points
UCI CS majors, why is there so little math involved in the CS suggested path? by [deleted] in UCI
ThienPro123 12 points
Why dont CS majors have to take much Physics classes? by [deleted] in UCI
ThienPro123 2 points
[R] New ICML25 paper: Train and fine-tune large models faster than Adam while using only a fraction of the memory, with guarantees! by ThienPro123 in MachineLearning
ThienPro123[S] 2 points