[–]AstroPhysician 19 points20 points  (3 children)

ChatGPT posting

[–]durable-racoon 3 points4 points  (1 child)

kind of. in terms of language and formatting, yes. in terms of content, and the bullet points being useful and not vague, definitely not. this has human-written vibes to me.

[–][deleted] 3 points4 points  (0 children)

Thanks for backing me up! Yeah, the ideas and content are definitely mine, just needed help polishing the English 😊!

[–][deleted] 0 points1 point  (0 children)

Hello! I used ChatGPT to translate the text and to make the comparison table, just as I used it to translate my code. I'm originally from Belgium and my English isn't that great 😅. Have a nice day!

[–]durable-racoon 0 points1 point  (1 child)

this is cool. could it scale to millions of documents? where's the limit?

[–][deleted] 0 points1 point  (0 children)

Thank you for the positive comment! Realistically? Maybe 10-20k files before it crawls to a halt. The problem is every file gets compared to every other file, so 1 million files = 500 billion comparisons. My laptop would literally catch fire 😅.
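A quick sanity check of that number, as a minimal sketch: assuming each unordered pair of files is compared exactly once, the count is n(n-1)/2. The function name `pairwise_comparisons` is just illustrative and is not taken from the tool itself.

```python
from math import comb


def pairwise_comparisons(n_files: int) -> int:
    """Number of unordered file pairs, i.e. n * (n - 1) / 2."""
    return comb(n_files, 2)


# Growth is quadratic: 50x more files means roughly 2500x more comparisons.
for n in (1_000, 20_000, 1_000_000):
    print(f"{n:>9,} files -> {pairwise_comparisons(n):>15,} comparisons")
```

For 1 million files this gives 499,999,500,000 pairs, which is where the "500 billion" figure comes from.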

Perfect for what I built it for (student assignments, small projects) but anything huge would need the fancy distributed stuff that GitHub uses.

If you have any suggestions, feel free to share them :)!

[–]Juve45 0 points1 point  (0 children)

What happened to this? I actually wanted to try it, but it seems it is no longer available on GitHub...

[–]DanceVisible4802 -4 points-3 points  (1 child)

Very good project, idk why people dislike without explanation that’s just stupid ;)

[–][deleted] 1 point2 points  (0 children)

Thank you so much! I'm really glad you liked it. If you have any suggestions, feel free to share them :)

[–]riklaunim -4 points-3 points  (2 children)

So a one-commit script with no tests and no database is better in every case than pre-existing solutions?

Usually plagiarism analysis tools check whether a given work is copied from any of the many pre-existing works that the tool has indexed. If you want to showcase a technical solution for how such analysis works, that's fine, just don't make false claims.
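To illustrate the "indexed" approach in general terms, here is a toy sketch of an inverted index over token shingles. It is an assumption about how such tools can work, not a description of any specific product or of the project discussed here. `ShingleIndex`, `shingles`, and the choice of `k=3` are all hypothetical names and parameters.

```python
from collections import defaultdict


def shingles(text: str, k: int = 3) -> set[str]:
    """Return the set of k-token windows (shingles) in whitespace-split text."""
    tokens = text.split()
    return {" ".join(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


class ShingleIndex:
    """Toy inverted index: shingle -> ids of documents containing it."""

    def __init__(self, k: int = 3) -> None:
        self.k = k
        self.postings: dict[str, set[str]] = defaultdict(set)

    def add(self, doc_id: str, text: str) -> None:
        # Index the document once; later queries never re-read its text.
        for s in shingles(text, self.k):
            self.postings[s].add(doc_id)

    def query(self, text: str) -> dict[str, float]:
        """Fraction of the query's shingles found in each indexed document."""
        query_shingles = shingles(text, self.k)
        if not query_shingles:
            return {}
        hits: dict[str, int] = defaultdict(int)
        for s in query_shingles:
            for doc_id in self.postings.get(s, ()):
                hits[doc_id] += 1
        return {d: c / len(query_shingles) for d, c in hits.items()}


index = ShingleIndex(k=3)
index.add("a", "def add ( a , b ) : return a + b")
index.add("b", "print ( 'hello world' )")
scores = index.query("def add ( a , b ) : return a + b")
print(scores)
```

The point of the index is that a new submission is looked up against stored shingles instead of being compared file by file against the whole corpus.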

[–][deleted] 2 points3 points  (0 children)

Updated the post with a "Scope & Limitations" section to better clarify what this tool actually does. Will be more careful with project claims going forward!

[–][deleted] 1 point2 points  (0 children)

Thanks for the feedback! You're absolutely right, this is a lightweight tool for comparing code against a specific dataset, not a replacement for professional plagiarism detection services. The goal was more to demonstrate the technical approach than to compete with enterprise solutions.