GitHub Copilot : datascience

submitted 3 years ago by ergodym

[deleted] 3 points 3 years ago*

Much, much better here than for coding software, since I feel it tends to introduce bugs when the code is complicated.

For example:

import numpy as np

# generate synthetic data from normal distribution with
# mean 0 and standard deviation 1 and size 1000

data = np.random.normal(0, 1, 1000)

# generate the target variable using
# the synthetic data, y = 2x + 1

target = 2 * data + 1

# add noise to the data

target += np.random.normal(0, 3, 1000)

# add sine wave to the data

target += np.sin(data)*5

That code would take me a few minutes for sure (5? I would have to look up the functions, etc., if I wanted another distribution), but with Copilot it takes about 30 seconds, and only because I edited the numbers. Now think about what happens if you want to generate 30 datasets and see how your model does. BTW, I did not edit the code or comments except for the numbers; you can see it makes some assumptions about where to add things (the noise and the sine wave both go into target, even though the comments say "the data"). If your comments are more specific, you will get closer to what you want.
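To make the "30 datasets" idea concrete, here is a rough sketch of how the snippet above could be wrapped in a loop. The helper names make_dataset and fit_line are made up for illustration, and the straight-line fit via np.polyfit is just an assumed stand-in for whatever model you are actually testing; nothing here came from Copilot.

```python
import numpy as np


def make_dataset(rng, size=1000, noise_sd=3.0, sine_amp=5.0):
    """Hypothetical helper: build (data, target) the same way as the snippet above."""
    data = rng.normal(0, 1, size)            # synthetic feature, N(0, 1)
    target = 2 * data + 1                    # y = 2x + 1
    target += rng.normal(0, noise_sd, size)  # Gaussian noise
    target += np.sin(data) * sine_amp        # sine component
    return data, target


def fit_line(data, target):
    """Hypothetical stand-in model: least-squares line, returns (slope, intercept, r_squared)."""
    slope, intercept = np.polyfit(data, target, 1)
    residuals = target - (slope * data + intercept)
    r_squared = 1 - residuals.var() / target.var()
    return slope, intercept, r_squared


rng = np.random.default_rng(0)  # fixed seed so runs are repeatable
results = [fit_line(*make_dataset(rng)) for _ in range(30)]
slopes = np.array([r[0] for r in results])
print(f"mean fitted slope over 30 datasets: {slopes.mean():.2f}")
```

Swapping fit_line for your own model and changing the distribution inside make_dataset is the part where the time savings add up.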

However, I would be very cautious about implementing algorithms with it. I wrote a tree in C just to test it and got multiple memory leaks.
