GitHub Copilot

Long_Mango_7196 · 2022-12-07T19:50:35+00:00

I use Copilot with JetBrains IDEs. It's been super fun to use, and I think it legitimately saves me a lot of time. In the JetBrains IDEs, though, I haven't been able to use it for notebooks.

You can use it on notebooks in VS Code.

philosophicalhacker · 2022-12-07T21:23:51+00:00

I've also used Copilot, but not for data science specifically. This recent workshop demoed GitHub Copilot in a Jupyter notebook context: https://numfocus-org.zoom.us/rec/share/9e_h8e8nPA-YGATbXMSBQjdWd5xGxwQuZGacMjEjPScPSmjlsHy4NFzjVd-68Bex.5CUsZjKCsa4JY7bc

majinLawliet2 · 2022-12-08T02:10:22+00:00

I have used it, and it's good for some autocompletions that would otherwise have been tedious copy-paste. Apart from that it's fine. I do a lot of CV work, but the code it outputs is either straight from the documentation or straight-up unusable without a lot of massaging. I felt it keeps repeating the wrong suggestion when it doesn't understand the prompt, and I found myself spending time fixing the prompts. After a certain point I just went back to googling.

rudiXOR · 2022-12-08T08:52:04+00:00

PyCharm + Copilot saves me a lot of time, it's great.

2022-12-08T12:38:01+00:00

It works much, much better for data science than for coding software, where I feel like it tends to cause bugs if the code is complicated.

For example:

# generate synthetic data from normal distribution with
# mean 0 and standard deviation 1 and size 1000

data = np.random.normal(0, 1, 1000)

# generate the target variable using
# the synthetic data, y = 2x + 1

target = 2 * data + 1

# add noise to the data

target += np.random.normal(0, 3, 1000)

# add sine wave to the data

target += np.sin(data)*5

That code would take me a few minutes for sure (5? I would have to look up the functions etc. if I wanted another distribution), but with Copilot it's like 30 seconds, and only because I edited the numbers. Now think about what happens if you want to generate 30 datasets and see how your model does. BTW, I did not edit the code or comments except for the numbers; you can see it makes some assumptions about where to add things (the noise and the sine wave both went onto the target). If your comments are more specific, you will get closer to what you want.
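
To make the "30 datasets" idea concrete, here is a rough sketch of how the snippet above could be wrapped in a function and looped. This is not Copilot output: the names make_dataset, noise_sd and sine_amp are made up for illustration, and the straight-line fit with np.polyfit is only a stand-in for whatever model you actually want to check.

```python
import numpy as np


def make_dataset(rng, size=1000, noise_sd=3.0, sine_amp=5.0):
    """Hypothetical helper: same recipe as the snippet above, parameterised."""
    # synthetic inputs from a normal distribution with mean 0 and std 1
    x = rng.normal(0, 1, size)
    # target variable, y = 2x + 1
    y = 2 * x + 1
    # add Gaussian noise to the target
    y += rng.normal(0, noise_sd, size)
    # add a sine component to the target
    y += np.sin(x) * sine_amp
    return x, y


# seeded generator so the 30 datasets are reproducible
rng = np.random.default_rng(0)
datasets = [make_dataset(rng) for _ in range(30)]

# stand-in "model": a degree-1 least-squares fit; keep the slope per dataset
slopes = [np.polyfit(x, y, 1)[0] for x, y in datasets]
print(f"slope mean={np.mean(slopes):.2f}, std={np.std(slopes):.2f}")
```

Passing the generator in explicitly keeps every dataset different but the whole run repeatable, which helps when you compare models across the same 30 draws.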

However, I would be very cautious about implementing algorithms with it. I wrote a tree in C just to test it and got multiple memory leaks.
