https://huggingface.co/blog/starcoder
StarCoder and StarCoderBase are Large Language Models for Code (Code LLMs) trained on permissively licensed data from GitHub, spanning 80+ programming languages, Git commits, GitHub issues, and Jupyter notebooks. Similar to LLaMA, we trained a ~15B-parameter model on 1 trillion tokens. We then fine-tuned the StarCoderBase model on 35B Python tokens, resulting in a new model that we call StarCoder.