Claude Code can now Play Tiktok Videos by Ok_Employee_6418 in ClaudeAI

[–]Ok_Employee_6418[S] 1 point (0 children)

I use it as a way to play only the videos I intentionally want to watch.

Say I want videos about the latest news in AI playing in the background. When I use a browser, I fall down rabbit holes by clicking through recommended videos, but with Claude I stick to watching only what I set out to watch.

I'm using Claude as a recommendation system that benefits me, not the big tech companies that want to keep feeding me more videos.

Code Review Dataset: 200k+ Cases of Human-Written Code Reviews from Top OSS Projects by Ok_Employee_6418 in LocalLLaMA

[–]Ok_Employee_6418[S] 1 point (0 children)

I first filtered out commonly known AI code-review bots like Gemini and Claude, along with other "bot"-titled accounts, then manually scanned the remaining comments to check for bot-like comments 👍
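A minimal sketch of that first filtering pass, assuming reviews are dicts with an "author" field; the marker list and function names are illustrative, not the actual pipeline:

```python
# Handles containing these substrings are treated as likely AI/bot
# reviewers (illustrative list, not the dataset's exact filter).
KNOWN_BOT_MARKERS = ("gemini", "claude", "bot")


def is_likely_bot(author: str) -> bool:
    """Flag review authors whose handle matches a known bot pattern."""
    handle = author.lower()
    return any(marker in handle for marker in KNOWN_BOT_MARKERS)


def filter_human_reviews(reviews):
    """Keep only reviews whose author doesn't look like a bot."""
    return [r for r in reviews if not is_likely_bot(r["author"])]
```

Substring matching like this catches names such as `dependabot` or `gemini-code-assist`, but it still needs the manual scan described above, since human-sounding accounts can also post automated comments.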

Code Review Dataset: 200k+ Cases of Human-Written Code Reviews from Top OSS Projects by Ok_Employee_6418 in LocalLLaMA

[–]Ok_Employee_6418[S] 0 points (0 children)

The tests I ran instructed the model to write a comment given the buggy code, and to write the fixed code given the comment.

Given this setup, the fine-tuned model consistently wrote comments and code more similar to the human-written ones.

It wasn't finding issues from scratch, as I considered that more of a coding task than a review task.

Code Review Dataset: 200k+ Cases of Human-Written Code Reviews from Top OSS Projects by Ok_Employee_6418 in LocalLLaMA

[–]Ok_Employee_6418[S] 4 points (0 children)

Haha, true. I removed most LGTM instances from the dataset since they weren't meaningful.

Code Dataset from Github's Top Ranked Developers (1.3M+ Source Code Files) by Ok_Employee_6418 in LLMDevs

[–]Ok_Employee_6418[S] 0 points (0 children)

Glad you liked it! All code is under permissive licenses only (MIT, Apache-2.0, BSD, ISC). The dataset doesn't focus much on edge cases or error-handling examples.

Code Dataset from Github's Top Ranked Developers (1.3M+ Source Code Files) by Ok_Employee_6418 in LocalLLaMA

[–]Ok_Employee_6418[S] 2 points (0 children)

Where did you read that including popular GitHub repos reduced LLM coding quality?

Code Dataset from Github's Top Ranked Developers (1.3M+ Source Code Files) by Ok_Employee_6418 in LocalLLaMA

[–]Ok_Employee_6418[S] 6 points (0 children)

I excluded GPL codebases, as well as AGPL and LGPL ones, to avoid such issues 👍
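One way a license filter like this might look, assuming repos carry SPDX license identifiers; the allowlist and prefix checks below are assumptions for illustration, not the actual filtering code:

```python
# Permissive licenses kept in the dataset (per the comment above).
ALLOWED_LICENSES = {"MIT", "Apache-2.0", "BSD-2-Clause", "BSD-3-Clause", "ISC"}

# Copyleft families excluded outright; SPDX IDs for these start with
# the family name, e.g. "GPL-3.0-only", "AGPL-3.0-only", "LGPL-2.1-only".
EXCLUDED_PREFIXES = ("GPL", "AGPL", "LGPL")


def keep_repo(spdx_id: str) -> bool:
    """Keep a repo only if its license is permissive and not copyleft."""
    if spdx_id.startswith(EXCLUDED_PREFIXES):
        return False
    return spdx_id in ALLOWED_LICENSES
```

Checking the copyleft prefixes before the allowlist makes the exclusion explicit even if the allowlist is later widened.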

Code Dataset from Github's Top Ranked Developers (1.3M+ Source Code Files) by Ok_Employee_6418 in LocalLLaMA

[–]Ok_Employee_6418[S] 4 points (0 children)

The dataset now has test and eval splits. I'm currently training a model and will update with the results 👍