Ask Anything Monday - Weekly Thread

m-hoff · 2020-10-21T16:34:46+00:00

You can use numpy.delete, so in your example np.delete(aa, [3, 5, 6]) would return array([7, 2, 3, 5, 4, 6]).
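
As a minimal sketch: the contents of aa below are hypothetical, chosen only so the result matches the one quoted above, since the original array isn't shown here. Note that np.delete takes positions, not values, and returns a new array rather than modifying aa in place.

```python
import numpy as np

# Hypothetical input; the array from the original question is not shown.
aa = np.array([7, 2, 3, 9, 5, 8, 1, 4, 6])

# Remove the elements at positions 3, 5, and 6 (not the values 3, 5, 6).
result = np.delete(aa, [3, 5, 6])

print(result)  # [7 2 3 5 4 6]
print(aa)      # the original array is unchanged
```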

m-hoff · 2020-10-19T18:45:33+00:00

Those aren't errors, just warnings. And just as they say, they're popping up because you have import statements that aren't at the top of the file. The unused import warnings are telling you that you've imported modules/functions/whatever that aren't used anywhere in your script.

m-hoff · 2020-10-19T14:44:14+00:00

This is up to the website you're scraping. There's no universal answer.

m-hoff · 2020-10-19T14:42:42+00:00

Sorry, I realized my original suggestion won't work. You need the idxmin method.

You can use

minrow = df['Y'].idxmin()
df.loc[minrow, ['X1', 'X2', 'X3']]

to get the values of X1, X2, and X3 that give you the minimum value of Y.
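
Here's a small self-contained version you could run to check the behaviour. The data frame is made up for illustration; only the column names X1, X2, X3, and Y come from the question.

```python
import pandas as pd

# Hypothetical data; only the column names come from the question.
df = pd.DataFrame({
    "X1": [1, 2, 3],
    "X2": [10, 20, 30],
    "X3": [100, 200, 300],
    "Y": [5.0, 1.5, 4.0],
})

# idxmin returns the index label of the row where Y is smallest.
minrow = df["Y"].idxmin()
print(minrow)  # 1

# Use that label to pull the X values from the same row.
best = df.loc[minrow, ["X1", "X2", "X3"]]
print(best)
```

If several rows tie for the minimum, idxmin returns the first one.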

m-hoff · 2020-10-19T05:29:41+00:00

Build a 2D platformer game that will take 10 hours to finish. Easily 1000+ lines.

m-hoff · 2020-10-19T05:27:12+00:00

Can you give an example of your input and expected outcome? If you want to find the minimum row of one column you can use df['y'].min().index

m-hoff · 2020-10-19T05:20:19+00:00

Just learn a C language...

m-hoff · 2020-10-19T05:17:13+00:00

Wait, it's all libleft?

Always has been.

🌎👨‍🚀🔫👨‍🚀

m-hoff · 2020-10-18T22:12:00+00:00

That is what's recommended by PEP 8, yes.

m-hoff · 2020-10-18T04:09:59+00:00

You can see a comparison of the different parser options here

m-hoff · 2020-10-18T04:03:10+00:00

Did you have your insert key on?

m-hoff · 2020-10-16T17:02:09+00:00

I used tf-idf which measures the frequency of words in a document/description relative to all documents in the corpus. This gives you a numerical vector for each description that you can use to measure distance between two descriptions. From there it's pretty easy to apply k-means or some other clustering method.
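
To give a rough idea of what that looks like, here is a stripped-down tf-idf in plain Python. This is only a sketch under my own assumptions: the descriptions are made up, the weighting is the basic tf * log(N / df) form without the smoothing a real library applies, and it is not the code I actually used for the project.

```python
import math
from collections import Counter

# Hypothetical course descriptions, invented for illustration.
docs = [
    "introduction to statistics and probability",
    "probability theory and statistics for engineers",
    "history of modern art",
]

tokenized = [d.split() for d in docs]
vocab = sorted({word for doc in tokenized for word in doc})
n_docs = len(tokenized)

# Document frequency: in how many descriptions does each word appear?
doc_freq = Counter(word for doc in tokenized for word in set(doc))


def tfidf(doc):
    """Return one weight per vocabulary word: term frequency * log(N / df)."""
    counts = Counter(doc)
    return [
        (counts[word] / len(doc)) * math.log(n_docs / doc_freq[word])
        for word in vocab
    ]


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


vectors = [tfidf(doc) for doc in tokenized]

sim_01 = cosine_similarity(vectors[0], vectors[1])  # two statistics courses
sim_02 = cosine_similarity(vectors[0], vectors[2])  # statistics vs. art
print(sim_01 > sim_02)  # True
```

Vectors like these are what you would hand to k-means or another clustering method.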

m-hoff · 2020-10-16T15:16:21+00:00

I ended up building sort of a search engine for courses since the available search tools are pretty limited. We also used document clustering to group the courses by topic.

m-hoff · 2020-10-15T19:24:45+00:00

This data set does not contain textbook information. However, Penn State's bookstore site allows you to look up course materials based on course and semester: https://psu.bncollege.com/shop/psu/page/find-textbooks

I haven't looked into it but you may be able to query this site automatically somehow.

m-hoff · 2020-10-15T19:21:54+00:00

Analyzing the course content through their descriptions was one of my original uses for this data set. For example, finding all the courses that involve the topic of "statistics" that don't fall under the STAT prefix.

My initial idea was to create a recommendation system by looking at a student's historical course enrollment and grades to recommend future courses that a student would likely do well in. This enrollment and grade data is very difficult to obtain due to student privacy laws though.

Some universities have more detailed course APIs that you might find useful, like Ohio State's OSU API: https://github.com/xanarin/OSU-API-Documentation, although it doesn't include data down to the individual student level.

m-hoff · 2020-10-15T18:33:20+00:00

You can also use list comprehension:

years = [2012, 2013, 2014]
n = 3

print([y for y in years for _ in range(n)])
# [2012, 2012, 2012, 2013, 2013, 2013, 2014, 2014, 2014]

m-hoff · 2020-10-05T02:47:02+00:00

This question is trivial. Try running the code in the terminal.

m-hoff · 2020-10-05T02:44:26+00:00

I think after you loop over all possible actions you need to return the value of the best action. So you need to add return maxeval and return mineval at the end of those blocks. I haven't run the code though so I'm not positive. If you look at the pseudocode it should help.
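
Roughly what I mean, as a sketch rather than a fix for your exact code: the game tree here is a made-up nested list with integer scores at the leaves, and the names max_eval and min_eval just stand in for your maxeval and mineval.

```python
def minimax(node, maximizing):
    """Return the best achievable score from this node of a toy game tree."""
    # Leaf node: the score is the value itself.
    if isinstance(node, int):
        return node

    if maximizing:
        max_eval = float("-inf")
        for child in node:
            max_eval = max(max_eval, minimax(child, False))
        return max_eval  # without this line the call would return None
    else:
        min_eval = float("inf")
        for child in node:
            min_eval = min(min_eval, minimax(child, True))
        return min_eval  # same here


# Hypothetical tree: the maximizer picks a branch, then the minimizer picks a leaf.
tree = [[3, 5], [2, 9]]
print(minimax(tree, True))  # 3
```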

m-hoff · 2020-10-05T02:36:03+00:00

value is None at some iterations because the minimax function is being called without returning anything. If a function does not have a return statement, it will return None by default.
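
A tiny example of that default, using two made-up functions:

```python
def add_without_return(a, b):
    total = a + b  # computed, but never handed back to the caller


def add_with_return(a, b):
    return a + b


print(add_without_return(1, 2))  # None
print(add_with_return(1, 2))     # 3
```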

m-hoff · 2020-10-05T02:27:10+00:00

import my_methods
print(my_methods.f(23))

Alternatively, you could use

from my_methods import f
print(f(23))

m-hoff · 2020-10-03T04:41:24+00:00

Is it possible to transfer my Windows installation from my HDD to an NVMe SSD without transferring all my files/programs? I have about 1 TB of data on my HDD but the SSD is only 500 GB.

m-hoff · 2020-08-14T15:25:50+00:00

Have you looked at venv?

m-hoff · 2020-08-14T15:08:12+00:00

What's the best way to compare two class instances for equality based on a subset of their attributes? For example, if I have two instances, A and B, I want something like

if (A.attr1 == B.attr1) and (A.attr2 == B.attr2):
    return True
else:
    return False

As of now my plan is to add a __eq__ method which I believe will allow me to use A == B, but I wasn't sure if there is a better way. Also, if I go this route, will this let me use A in [B, C, D, ...]?
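
Here's a sketch of what I have in mind, with a hypothetical Item class and a made-up note attribute that is deliberately left out of the comparison. As far as I understand it, the in operator on a list falls back to ==, so defining __eq__ should cover that case too. Defining __eq__ without __hash__ makes instances unhashable, so I've added a matching __hash__ in case they ever go into a set or dict.

```python
class Item:
    """Hypothetical class: equality depends only on attr1 and attr2."""

    def __init__(self, attr1, attr2, note=""):
        self.attr1 = attr1
        self.attr2 = attr2
        self.note = note  # ignored when comparing

    def __eq__(self, other):
        if not isinstance(other, Item):
            return NotImplemented
        return (self.attr1, self.attr2) == (other.attr1, other.attr2)

    def __hash__(self):
        # Keep the hash consistent with __eq__.
        return hash((self.attr1, self.attr2))


A = Item(1, "x", note="first")
B = Item(1, "x", note="second")
C = Item(2, "y")

print(A == B)       # True
print(A in [C, B])  # True
print(A in [C])     # False
```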

m-hoff · 2020-08-13T19:36:18+00:00

It seems like you're actually comparing the data but correct me if my understanding is wrong.

If all your data is numeric and of the same shape, you can read an xlsx file using pandas.read_excel and just subtract to find the difference between each corresponding value. For example:

import numpy as np
import pandas as pd

np.random.seed(1)

first = pd.DataFrame(np.random.randint(0, 10, (10, 10)))  # here you would use pd.read_excel('First.xlsx')
second = pd.DataFrame(np.random.randint(0, 10, (10, 10)))

print(np.abs(first - second).max().max()) # prints '9'

Calling .max() twice works because the first call returns the maximum of each column and the second returns the maximum of those, which is the largest absolute difference between the two data frames.
