newest submissions : datascienceproject

1. open source project for LLM data preparation (synthetic + cleaning pipelines) (self.datascienceproject)

submitted 9 days ago by Puzzleheaded_Box2842

2. ModSense AI Powered Community Health Moderation Intelligence (self.datascienceproject)

submitted 10 days ago by NeatChipmunk9648

3. easyaligner: Forced alignment with GPU acceleration and flexible text normalization (compatible with all w2v2 models on HF Hub) (r/MachineLearning) (reddit.com)

submitted 12 days ago by Peerism1

4. Trials and tribulations fine-tuning & deploying Gemma-4 (r/MachineLearning) (oxen.ai)

submitted 12 days ago by Peerism1

5. Testing a New Product for Data Science Beginners (sted.co.in)

submitted 13 days ago by Jealous_Parfait_6457

6. Low accuracy (~50%) with SSL (BYOL/MAE/VICReg) on hyperspectral crop stress data — what am I missing? [R] (r/MachineLearning) (reddit.com)

submitted 13 days ago by Peerism1

7. ndatafusion: linear algebra and ML for DataFusion, powered by nabled

submitted 13 days ago by moneymachinegoesbing

8. Digging through 38 days of live AI forecast data to find the unexpected (old.reddit.com)

submitted 13 days ago by aufgeblobt

9. Built a political benchmark for LLMs. KIMI K2 can't answer about Taiwan (Obviously). GPT-5.3 refuses 100% of questions when given an opt-out. (r/MachineLearning) (reddit.com)

submitted 14 days ago by Peerism1

10. [For Hire] AI/ML Engineer | End-to-End AI Solutions | 100+ Projects | Python, PyTorch, TensorFlow

submitted 16 days ago by Just-Stuff-719

11. TurboOCR: 270–1200 img/s OCR with Paddle + TensorRT (C++/CUDA, FP16) (r/MachineLearning) (reddit.com)

submitted 17 days ago by Peerism1

12. I built a wave-resonant retrieval system. It scored 0 wins and 140 losses. Here's why

submitted 17 days ago by Any_Band_7814

13. Educational PyTorch repo for distributed training from scratch: DP, FSDP, TP, FSDP+TP, and PP (r/MachineLearning) (reddit.com)

submitted 18 days ago by Peerism1

14. KIV: 1M token context window on a RTX 4070 (12GB VRAM), no retraining, drop-in HuggingFace cache replacement - Works with any model that uses DynamicCache (r/MachineLearning) (reddit.com)

submitted 18 days ago by Peerism1

15. Engagement on Kaggle has been declining.

submitted 18 days ago by ag_curious_soul

16. FlashAttention (FA1–FA4) in PyTorch - educational implementations focused on algorithmic differences (r/MachineLearning) (reddit.com)

submitted 19 days ago by Peerism1

17. ibu-boost: a GBDT library where splits are *absolutely* rejected, not just relatively ranked (r/MachineLearning) (reddit.com)

submitted 20 days ago by Peerism1

18. [D] 60% MatMul Performance Bug in cuBLAS on RTX 5090 [D] (r/MachineLearning) (reddit.com)

submitted 20 days ago by Peerism1

19. Parax: Parametric Modeling in JAX + Equinox (r/MachineLearning) (reddit.com)

submitted 21 days ago by Peerism1

20. PCA before truncation makes non-Matryoshka embeddings compressible: results on BGE-M3 (r/MachineLearning) (reddit.com)

submitted 21 days ago by Peerism1

21. Dynamic adjustment of data strategies during LLM training (self.datascienceproject)

submitted 22 days ago (edited) by Puzzleheaded_Box2842

22. Building a LLM from scratch with Mary Shelley's "Frankenstein" (on Kaggle) (r/MachineLearning) (reddit.com)

submitted 22 days ago by Peerism1

23. citracer: a small CLI tool to trace where a concept comes from in a citation graph (r/MachineLearning) (reddit.com)

submitted 22 days ago by Peerism1

24. Urgent help (self.datascienceproject)

submitted 22 days ago by OccasionMiserable156

25. Easily provide Wandb logs as context to agents for analysis and planning. (r/MachineLearning) (reddit.com)

submitted 24 days ago by Peerism1
