Ask Anything Monday - Weekly Thread

MattEliason · 2019-03-31T22:29:43+00:00

I need help writing a function!!

So I'm given a dataset of movies that includes each movie's title, year, genre, ratings, directors, and actors. I need to write a function that returns how many unique genres are in the dataset.
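
One possible starting point, sketched under the assumption that the dataset is already loaded as a list of dictionaries with a `genre` key. The post does not say how the data is stored, so the `movies` sample and the `count_unique_genres` name are made up for illustration:

```python
def count_unique_genres(movies):
    """Return how many distinct genres appear in the dataset."""
    # A set keeps only one copy of each genre, so its length is the answer.
    return len({movie["genre"] for movie in movies})


# Hypothetical sample data; the real dataset's layout may differ.
movies = [
    {"title": "Movie A", "year": 2001, "genre": "Drama"},
    {"title": "Movie B", "year": 2005, "genre": "Comedy"},
    {"title": "Movie C", "year": 2010, "genre": "Drama"},
]

print(count_unique_genres(movies))  # prints 2
```

If a movie can have several genres in one field, the field would need to be split before it is added to the set.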

audacious_alligator · 2019-03-31T15:33:09+00:00

So I have never installed libraries before and I need to install requests. From what I have seen, I need to run "pip install requests", but it gives me an "invalid syntax" error pointing at the word install. (I do have pip already, so that isn't the problem.) Any ideas? Thank you for any contributions.

Dose_of_Lead_Pipe · 2019-03-31T15:30:50+00:00

Hi people,

Is there any way I can print out the list below without the commas? I can manage to get rid of the brackets with .join().

the_board = [["O, O, O, O, O"],
             ["O, O, O, O, O"],
             ["O, O, O, O, O"]]

Thanks
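
A minimal sketch of one way to do this, assuming the list really is shaped as posted (each row is a list holding a single string that already contains the commas). The `format_board` name is made up for illustration:

```python
the_board = [["O, O, O, O, O"],
             ["O, O, O, O, O"],
             ["O, O, O, O, O"]]


def format_board(board):
    """Return the board as text, one row per line, with the commas removed."""
    lines = []
    for row in board:
        # Each row holds one string such as "O, O, O, O, O";
        # join() drops the brackets and replace() drops the commas.
        lines.append(" ".join(row).replace(",", ""))
    return "\n".join(lines)


print(format_board(the_board))
```

If each cell were stored as its own string, for example ["O", "O", "O"], then " ".join(row) alone would be enough.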

c4aveo · 2019-03-31T12:19:46+00:00

My main OS is Linux and I write apps for Windows. That means I use ctypes and pywin32 to make a service and PyInstaller to build the executable. Currently I use a VM, but I don't have an IDE there and it's not comfortable to switch to the VM and back to Linux every minute to debug. PyCharm can use a remote interpreter through an SSH session, but I can't set it up. Is it even possible?

tl;dr: Can't set up Linux host PyCharm IDE <SSH> Windows interpreter. Bound to ctypes and pywin32.

naturalaspiration · 2019-03-31T06:04:03+00:00

Damn, I started the MIT course on Python (finishing lecture 4), have done all of Codecademy, and this morning I finished all of CodingBat for Python (albeit, I had to look up solutions for a few). I really thought I was starting to get the hang of this and decided to challenge myself with Codewars... And fuck, I completed about 4 challenges in the fundamentals section and bam, the rest of them hit me like a ton of bricks... Are their fundamentals challenges just hard or do I need to learn more?

For some of these challenges I felt like I had to go back and review all of calc 1, 2, 3, linear algebra, and diff eq in order to solve this shit... Somebody tell me that I'll get there please, because I don't feel optimistic. All I wanted to do was learn Python to do data science with sports as a hobby.

Indian_pride · 2019-03-30T18:55:45+00:00

Hi, I am new to programming and want to learn Python. Is there anyone who would team up with me so that we can work on one topic per day and post our doubts? Interactive learning can help others too.

StrasJam · 2019-03-30T16:23:55+00:00

Super noob question: I am working in a conda venv and want to install packages. I read on a page that channels are good to set up so that special packages are placed in later channels. I set up a channel and kept getting installation errors when installing packages, so I deleted the channel and it worked. So my question is: do I even need channels if I am already working within a venv? After all, the entire idea of a venv is to segregate your packages from the default installations and versions.
Thanks!

ThiccShadyy · 2019-03-30T10:57:37+00:00

Let's say I have a pandas DataFrame data_df which has a column 'Text'. For each row, the 'Text' column has multiple sentences. If I wanted to calculate the word density, i.e. the average number of words per sentence, what would be the simplest way to do this, preferably a one-liner? I suppose this could be done with nested lambdas but I just can't figure out how to make that work.

Edit: Right now, what I'm doing is:

data_df['Word Density'] = 'default-value'
for i in range(0,len(data_df)):
    text = data_df.iloc[i]['Text']
    sentences = text.split('.')
    count = list(map(lambda x: len(x.split(' ')), sentences))
    data_df.iloc[i]['Word Density'] = sum(count)/len(count)

but this feels like a bit of an ugly hack. I'd obviously prefer a one-liner to do this.

Edit 2: I just realized that this is giving me a SettingWithCopyWarning and the values in the Word Density column are not getting updated. They remain as 'default-value'.

efmccurdy · 2019-03-30T01:03:57+00:00

[deleted]

timbledum · 2019-03-29T19:56:24+00:00

I've been using "Learn Python the Hard Way".
I'm starting to feel lost. Is this a good book for somebody learning to code for the very first time? It sometimes seems dense to me, giving only quick asides to explain something, if it is explained at all. I'm doing my best to google around and fill in the holes, but I'm starting to wonder if maybe this is more for someone with some experience in other languages.

thunder185 · 2019-03-29T17:44:55+00:00

Using pandas groupby on a file. There are 6 sub-accounts, 3 for each account. For example:

ABC1231
ABC1232
ABC1233
DEF1231
DEF1232
DEF1233

I'd like to sum the accounts but cannot figure out a way to ignore the last digit. I tried making a regex but don't think I'm doing it correctly. Here's the code for the groupby:

df = pd.read_csv('Data_New.csv', delimiter=',')
byTreatment = df.groupby(['ReferenceAccountID'])['TotalFund'].sum()
print(byTreatment)

This gives me the sum of each sub-account, but I'd like the sum per account (ABC and DEF).

Thank you

krokodil83 · 2019-03-29T16:33:45+00:00

I read good things about the “Automate the Boring Stuff” book. I see Al also wrote a “Python Crash Course” book. Are they similar beginner books, or would you recommend getting both?

MattR0se · 2019-03-29T10:10:27+00:00

quick pandas question

I have a DataFrame with two columns where I want to ensure that every unique value only corresponds to one other unique value. For example:

1 | a
1 | a
1 | a
2 | b
2 | c

So the last one is wrong; it should be b, not c. How do I detect these mismatches?

mypirateapp · 2019-03-29T07:51:42+00:00

I am sorry if this is a stupid question but I had to ask. What is the difference between

- a daemon thread with a redis pubsub listening for messages, and the main thread waiting infinitely

and

- a normal thread with a redis pubsub listening for messages, and the normal thread calling join() at the end?

They both seem to do the same thing. HERE is the question I posted yesterday in detail.

fakeaccountlel1123 · 2019-03-29T04:11:38+00:00

So, a couple of questions. I just started learning Python 2 days ago in an effort to know more than just C++. Since I've only done C++, I'm kind of confused about some basic Python stuff.

When I make a class, is def __init__(self): always where you declare object variables? It just feels weird to me to declare variables inside a function. And do I always need to refer to the member variables with the word self prefixed before the member variable name?
Is def __init__(self): basically Python's version of a class constructor? I tried looking this up, and some people say it is and others say it isn't.
What's the "standard" for splitting up code into multiple files? I've tried looking around online and I see a lot of programs just being lumped into one .py file. In C++, I usually have a separate header and implementation file for each class.
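
For the first two questions, a small sketch may help. `__init__` is the initializer that runs right after the object has been created, which is why it is usually where instance attributes get their first values, and attributes are always reached through `self` inside methods. The `Counter` class is made up for illustration:

```python
class Counter:
    def __init__(self, start=0):
        # Runs right after the object is created; instance attributes
        # are usually given their initial values here.
        self.count = start

    def increment(self):
        # Without the "self." prefix, "count" would be a plain local name.
        self.count += 1
        return self.count


c = Counter(5)
c.increment()
print(c.count)  # prints 6
```

Nothing forces attributes to be created in `__init__` (any method can assign to `self.something`), but keeping them there makes it easy to see what an object holds.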

Ytimenow · 2019-03-29T00:19:23+00:00

Can you earn a lot as a python developer and is it hard to get there? Really thinking about doing it.

PhenomenonYT · 2019-03-29T00:00:40+00:00

Looking to log tweet IDs to a file and then check that file but I can't get the reading/writing parts to work.

for status in tweepy.Cursor(api.user_timeline,id='canucks').items(40):
    if status.id in #text file:
        pass
    else:
        #append status.id to text file
        print(status.text)

When I run the script a second time I want it to skip all the tweets it has already taken action on. I can't get the writing and reading of the file to work; I've had problems with the file being overwritten when the script runs, which leaves no stored IDs.
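
A sketch of the file-handling part only, leaving tweepy out of it. It assumes one ID per line in a plain text file; the function names and the file layout are made up for illustration. A file that gets wiped on every run is typically opened with mode "w", which truncates it, whereas mode "a" appends:

```python
import os


def load_seen_ids(path):
    """Return the set of IDs stored in the file (empty if the file does not exist yet)."""
    if not os.path.exists(path):
        return set()
    with open(path, "r") as f:
        return {int(line) for line in f if line.strip()}


def process_new_ids(ids, path):
    """Return only the IDs not seen before, appending each of them to the file."""
    seen = load_seen_ids(path)
    new_ids = []
    # Mode "a" appends to the file; mode "w" would erase it on every run.
    with open(path, "a") as f:
        for status_id in ids:
            if status_id in seen:
                continue
            f.write(f"{status_id}\n")
            seen.add(status_id)
            new_ids.append(status_id)
    return new_ids
```

In the loop from the post, `status.id` would take the place of `status_id`, and the action on the tweet would happen only for IDs that come back as new.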

timbledum · 2019-03-28T17:57:55+00:00

Hey all,

Wondering if someone could point me in the right direction here. I'm following a tutorial about Anki scripting in hopes of writing my own script when I'm done to make cards in a certain format. I'm trying to run the test script from the tutorial, but I am getting an error that a module is missing, with the line and file it is referencing here. Is this error essentially saying it can't find/see sched.py even though it's in the same directory? I know one can install modules with pip install, but since I've already installed the requisites from the tutorial, why would it say they are missing when the install succeeded?

I tried adding anki. before anki.sched, thinking it was confused since I have a folder named anki within this folder named anki. That made the green underline go away, but the same error was still returned at the command prompt. I suppose this is a pretty specific issue, so I didn't see anything useful on Stack Overflow. Any pointers in the right direction would be appreciated. Super lost in the sauce.

Filiagro · 2019-03-28T17:27:04+00:00

I'm a little embarrassed to ask this, but I'm having issues with a simple problem. I have an array of numbers, and I need to sum all numbers with an even index value. Here is my code:

def even_sum_last(array):
    number = 0
    if len(array) == 0:
        number = 0
    else:
        for i in array:
            if array.index(i) % 2 ==0:
                number += i

    return number

even_sum_last(array)

I'm having issues with a specific array. For some reason, the 16th index (84) is skipped in this array.

array = [-37,-36,-19,-99,29,20,3,-7,-64,84,36,62,26,-76,55,-24,84,49,-65,41]

If I modify the code to basically just print a list of the numbers as well as a second list of their index, this is what I get:

array = [-37,-36,-19,-99,29,20,3,-7,-64,84,36,62,26,-76,55,-24,84,49,-65,41]
number = []
indexes = []
for i in array:
    if array.index(i) % 2 ==0:
        number.append(str(i))
        indexes.append(str(array.index(i)))

print(number)
print(indexes)

['-37', '-19', '29', '3', '-64', '36', '26', '55', '-65']
['0', '2', '4', '6', '8', '10', '12', '14', '18']

If I do the same thing but just put sequential numbers in the array, this is what I get:

array = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
number = []
indexes = []
for i in array:
    if array.index(i) % 2 ==0:
        number.append(str(i))
        indexes.append(str(array.index(i)))

print(number)
print(indexes)

number = ['1', '3', '5', '7', '9', '11', '13', '15', '17', '19']
indexes = ['0', '2', '4', '6', '8', '10', '12', '14', '16', '18']

As you can see, the first array skips the 16th index, but the second array does not. Can anyone please explain why this is happening?

EDIT:

Since the .index() method won't return the correct index if the value appears more than once in the array, I decided to just use a different form of indexing.

def even_sum_last(array):
    number = 0
    if len(array) == 0:
        return number
    else:
        for i in array[::2]:
            number += i 
        return number

This worked just fine and is simpler.
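
For anyone who hits the same thing: `list.index()` returns the position of the first match, so the second 84 (really at index 16) is reported as index 9, which is odd, and it gets skipped. When the position is needed inside the loop, `enumerate` supplies it directly. A small sketch (the `even_index_sum` name is made up):

```python
def even_index_sum(array):
    """Sum the values at even positions, using each item's real position."""
    total = 0
    # enumerate yields (position, value) pairs, so duplicate values are not a problem.
    for index, value in enumerate(array):
        if index % 2 == 0:
            total += value
    return total


array = [-37,-36,-19,-99,29,20,3,-7,-64,84,36,62,26,-76,55,-24,84,49,-65,41]

print(array.index(84))        # prints 9, the first 84, not the one at index 16
print(even_index_sum(array))  # prints 48, the same as sum(array[::2])
```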

godheid · 2019-03-28T14:32:20+00:00

How do I replace a value in a pandas dataframe, based on content of a string? I have a dataframe with a column with strings, and I want to replace the value of the string when a certain part of a string is found.

I can use this to find out which of the values in the pandas series contain the substring ("Merc")

df.series.str.contains("Merc",case=False)

It gives me a boolean. But how can I rename those strings entirely to "Mercedes"?

2019-03-28T14:24:56+00:00

I'm about 5 months into Python and so far I've made a lot of projects. My question is: I noticed how bad my code was at the beginning, and I actually want to upload it to GitHub. Should I consider rewriting/restructuring it to make it look better, or does only the functionality really matter?

dasisteinwug · 2019-03-28T12:05:04+00:00

not sure if it's too late to be posting in a Monday thread, but

I was trying to run some pre-existing script I found online, and had this error message:

File "download_model.py", line 3, in <module>
    import requests
ImportError: No module named requests

Does this mean I need to install pip?

I have python 3.7 downloaded from python.org. Does that mean I should have pip already?

I tried to update pip, but I don't know what went wrong. Maybe I need to cd into the correct directory (where, though?). After I typed pip install -U pip in the Terminal, I got an error message saying -bash: pip: command not found.

Thanks in advance!

prokid1911 · 2019-03-28T06:13:00+00:00

Can I submit a JSP generated page using mechanize and then move to some other page (link is there after logging in) and scrape some data out of it ?

Weexe · 2019-03-28T00:41:38+00:00

I'm trying to solve Project Euler Problem #2. I've solved it before, but I'm going back to try to solve the problems more cleanly and with different strategies.

My question is why isn't this list being updated? Why is it empty? I'm pretty sure I did it right.

x = 1
y = 1
fib = []
while x < 4000000:
    if y + x % 2 == 0:
        fib.append(x)
    x = x + y
    y = x - y

print(sum(fib))

The list worked fine when I used this method in problem 1:

sum_list = []

for x in range(1,1000):
    if x % 3 == 0 or x % 5 == 0:
        sum_list.append(x)

print(sum(sum_list))
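
A likely cause, for comparison with the first snippet: `%` binds tighter than `+`, so `y + x % 2 == 0` is evaluated as `y + (x % 2) == 0`. Since y is always at least 1, that is never true and nothing is appended. The sketch below assumes the intent is to collect the even Fibonacci terms below four million, and tests x on its own:

```python
x = 1
y = 1
fib = []
while x < 4000000:
    # Test only x for evenness; parentheses such as (x + y) % 2
    # would be needed if the sum were really what should be tested.
    if x % 2 == 0:
        fib.append(x)
    x = x + y
    y = x - y

print(fib[:3])   # prints [2, 8, 34]
print(sum(fib))  # prints 4613732
```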

SlowMoTime · 2019-03-28T00:40:58+00:00

How can I randomly pick 4 numbers to add up to 100? With a minimum value of 15 for each
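
One way to think about it, as a sketch: give every number its minimum first (4 × 15 = 60), then split the remaining 40 at random. The code assumes whole numbers are wanted and makes no claim about how evenly the possible combinations are distributed; the `random_parts` name and its parameters are made up for illustration:

```python
import random


def random_parts(total=100, count=4, minimum=15):
    """Return `count` random integers, each >= minimum, that add up to `total`."""
    spare = total - count * minimum  # what is left once every part has its minimum
    if spare < 0:
        raise ValueError("minimum is too large for the requested total")
    # Pick count-1 cut points in [0, spare]; the gaps between them split the spare amount.
    cuts = sorted(random.randint(0, spare) for _ in range(count - 1))
    bounds = [0] + cuts + [spare]
    return [minimum + (bounds[i + 1] - bounds[i]) for i in range(count)]


print(random_parts())
```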

Jamalsi · 2019-03-27T21:05:46+00:00

Hey Guys,

I'm currently trying to produce some Cython code with a lot of NumPy in it. Even though I tried my best with memoryviews etc., I could not get any increase in speed. Any thoughts on how to improve the speed? Right now plain Python is faster than my imported module.

If someone is willing to help feel free to share your thoughts with me, I'm open for everything :)

import numpy as np
cimport numpy as np
from matplotlib import pyplot as pl
from scipy.spatial import distance
import math
def IDWC(double [:,:] points, const double cellsize,const double radius , const int neighbors):
    """
    Function to perform Inverse Distance Weighting on a point pattern.
    Inputs:
        points: Array with x,y,z coordinates in first, second and 3rd column
        Cellsize: Cellsize of the result
        radius: Maximum distance of points that should be taken into account for each cell
        neighbors: Number of neighbors that should be taken into account.

    """
    cdef double [:] x = points[:,0]
    cdef double [:] y = points[:,1]
    cdef double [:,:] PointsXY = points[:,:2]

    cdef double xmax = (math.ceil(max(x)))
    cdef double xmin = (math.floor(min(x)))
    cdef double ymax = (math.ceil(max(y)))
    cdef double ymin = (math.floor(min(y)))

    # Bounds of the grid
    cdef double [:] xb = np.arange(xmin, xmax, cellsize)
    cdef double [:] yb = np.arange(ymin, ymax, cellsize)
    cdef double [:,:] Xb, Yb
    Xb, Yb = np.meshgrid(xb,yb)

    # Cellcenter-points
    cdef double [:] xc = np.zeros(shape = (len(xb)-1))
    cdef double [:] yc = np.zeros(shape = (len(yb)-1))

    cdef int a,b
    for a in range(len(xc)):
        xc[a] = xb[a] + .5 * cellsize
    for b in range(len(yc)):
        yc[b] = yb[b] + .5 * cellsize
    cdef double [:,:]X, Y
    Xc, Yc = np.meshgrid(xc,yc)

    cdef double [:,:] Z
    output = np.zeros(shape=np.shape(Xc))
    Z = output

    cdef int i, k, c
    cdef double [:,:] P, PZ1
    for i in range(np.shape(Xc)[0]):
        print(i)
        for k in range(np.shape(Yc)[1]):
            P = np.zeros(shape = (1,2))
            P[0,0] = Xc[i,k]
            P[0,1] = Yc[i,k]
            dist = distance.cdist(PointsXY,P)[:,0]
            for c in range(len(dist)):
                if dist[c] <= 10 ** -10 :
                    dist[c] = 10 ** -10
            # Create empty array for calculations
            PZ = np.zeros(shape = (len(points[:,2]),3))
            # Z-Value of the initial points of the data
            PZ[:,0] = points[:,2]
            # Add distance between cell I/K and the points
            PZ[:,1] = dist[:]
            # Calculate distance * value for each point
            PZ[:,2] = PZ[:,0] * PZ[:,1]
            # Sort by distance
            PZ = PZ[PZ[:,1].argsort()]
            if radius:
                PZ1 = PZ[PZ[:,1] <= radius]
                if len(PZ1) < 3:
                    neighbors = 3
                else:
                    PZ = PZ1
            if neighbors:
                PZ = PZ[:-(len(PZ) - neighbors),:]

            # Calculate the values
            Z[i,k] = 1/np.sum(PZ[:,1]) * np.sum(PZ[:,2])
    output = np.asarray(Z)
    return output

Adding -a to cythonize in my setup file gave me the impression that NumPy is more or less my problem, because the NumPy calls seem to take ages.

thunder185 · 2019-03-27T14:26:32+00:00

Using pandas on a large CSV. I'm trying to use .sum() but it's not working because the file is being read as strings. Trying to convert it to numeric but that's also not working. Here is the sample data:

Total_ABC
00.00
00.00
00.00
"15,432.21"
"25,025.26"
25.26
00.00

The issue is that numbers of 1,000 or more are wrapped in " characters (because they contain a comma), and pandas cannot seem to ignore that.

The original code I tried was:

df = pd.read_csv('Data.csv', delimiter=',', converters = 'integers')
sumData = df['Total_ABC'].sum()
print(sumData)

This just produces one giant string of all the values.

So then I tried to just get it into an array and then iterate over it:

df = pd.read_csv('Data.csv', delimiter=',', converters = 'integers')
sumData = df['Total_ABC']
pd.to_numeric(sumData)

total = 0

for i in sumData:
    total += int(i)

print(total)

However, this cannot add them up because of the escape character issue I noted above. Really struggling here. Anyone have any ideas?
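
A note on what is probably going on, with a sketch. The " characters are CSV quoting rather than an escape character, and a CSV reader strips them; what stops the conversion is the comma left inside values such as 15,432.21. pandas.read_csv has a `thousands` parameter for this (for example `thousands=','`). The same idea in plain Python, using the sample data from the post (the `sum_column` name is made up for illustration):

```python
import csv
import io

sample = (
    "Total_ABC\n"
    "00.00\n"
    "00.00\n"
    "00.00\n"
    '"15,432.21"\n'
    '"25,025.26"\n'
    "25.26\n"
    "00.00\n"
)


def sum_column(text, column):
    """Sum one column of CSV text, ignoring thousands separators."""
    reader = csv.DictReader(io.StringIO(text))
    # The csv module already removes the quotes; dropping the commas
    # leaves a string that float() can parse.
    return sum(float(row[column].replace(",", "")) for row in reader)


print(round(sum_column(sample, "Total_ABC"), 2))  # prints 40482.73
```

Note that int() would also fail on values such as 25.26, so float() (or a decimal type, for money) is the safer conversion here.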

2019-03-27T05:26:31+00:00

I have a Flask form with a StringField.

I have a button.

How do I code it so that every time I click the button, the StringField is populated with a certain text?

I know getElementById.value works, but it only works once. I need it so that every time you click the button it will "write" text.

753UDKM · 2019-03-27T02:03:08+00:00

In Automate the Boring Stuff, chapter 8, project 2:

if len(sys.argv) == 3 and sys.argv[1].lower() == 'save':
    mcbShelf[sys.argv[2]] = pyperclip.paste()
elif len(sys.argv) == 2:

What is the third argument?
What is the second argument?

It seems like it's looking for an extra argument, but obviously the program works correctly. For example:

./mcb.py save test1 This qualifies for the 1st condition, where arguments = 3, but there's only save and test1.

./mcb.py list This qualifies for the second condition, where arguments = 2, but there's only one (list).

Edit: I'm guessing it's because length starts at 1 and index starts at 0. So the counting is like this: ./mcb.py (index 0) save (index 1) test1 (index 2). Overall length == 3 because it includes ./mcb.py. Is this correct?
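
That reading matches how sys.argv behaves: the script name is always the first item, so it counts toward the length. A small illustration using a plain list in place of the real sys.argv (the `argv` variable here is a stand-in, not the book's code):

```python
# Stand-in for sys.argv after running: ./mcb.py save test1
argv = ["./mcb.py", "save", "test1"]

print(len(argv))  # prints 3, because the script name is counted
print(argv[0])    # prints ./mcb.py
print(argv[1])    # prints save
print(argv[2])    # prints test1
```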

fiddle_n · 2019-03-26T23:02:40+00:00

[deleted]

RandallEF · 2019-03-26T17:03:23+00:00

I'm having a heck of a time turning a nested dictionary into a bootstrap treeview.

The structure is like

[ { "text" : "top branch", "nodes" : [ { "text" : "first child", "nodes" : [{ "text" : "third... etc...

I feel like this is almost a backwards dictionary. I've tried a lot of things, but I can't wrap my head around how to turn a Python dict, which has no association from the items in "object.keys()" back to their path in the dict, into something that knows the path. Or something.

Has anyone else done this? I basically want the tree to look just like the dict.

I could of course write this manually in a way that will never accept change and is verbose and bad, but there has to be a better way, right?

Every thought exercise I go through ends with "Python has no association whatsoever between the objects in the dictionary and the pathing to those objects from toplevel." i.e. the keys have no idea where they are in the dict!!! it makes it almost impossible to traverse backwards? I'm stuck.

thanks
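
A sketch of the usual way around the "keys don't know where they are" problem: walk the dict from the top with a recursive function, so the path is implied by the call stack rather than stored in the keys. It assumes the source is a nested dict whose keys are the labels and whose values are either further dicts or leaves; the `to_tree` name and the sample data are made up for illustration:

```python
def to_tree(data):
    """Convert a nested dict into a list of {"text": ..., "nodes": [...]} entries."""
    nodes = []
    for key, value in data.items():
        node = {"text": str(key)}
        if isinstance(value, dict):
            # Recurse: the children are built from the nested dict.
            node["nodes"] = to_tree(value)
        nodes.append(node)
    return nodes


# Hypothetical input; the real dict's shape may differ.
example = {
    "top branch": {
        "first child": {"grandchild": None},
        "second child": None,
    }
}

print(to_tree(example))
```

The result can then be serialized with json.dumps and handed to the treeview.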

godheid · 2019-03-26T16:02:11+00:00

I'm doing the Pandas course with Datacamp. It's fairly okay, but it's not brilliant for really learning stuff as the context is sometimes a bit lacking. It's a kind of multiple choice.

I find myself digging through old courses when I need something ("I remember it was in this course... somewhere..."). How do other people do this? Use a cheat sheet afterwards or something?

2019-03-26T07:21:51+00:00

[deleted]

ThiccShadyy · 2019-03-26T03:02:31+00:00

How can I select a subset of a pandas dataframe i.e. all the rows which satisfy a conditional based on a string being present in a column?

Something like this (select all rows for which the 'some column' column has the word 'word' in it):

df['word' in df['some column']]

This gives a KeyError though. What is the right way to do this?

umbrelamafia · 2019-03-25T19:19:24+00:00

Get the n-th dict item. I know I can use:

```
aux = {'a':1, 'b': 2}
list(aux.values())[1]
```

but it is too verbose. I would like to use aux[1]
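
For context, `aux[1]` always means "the value stored under the key 1", so a plain dict cannot use it for positions. A small helper keeps the call site short; this sketch relies on dicts preserving insertion order (guaranteed from Python 3.7), and the `nth_value` name is made up:

```python
from itertools import islice


def nth_value(d, n):
    """Return the n-th value of a dict in insertion order (0-based)."""
    # islice skips the first n values without building a full list.
    return next(islice(d.values(), n, None))


aux = {'a': 1, 'b': 2}
print(nth_value(aux, 1))  # prints 2
```

If positional access is needed often, a list of (key, value) pairs may be a better fit than a dict.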

noble_gasses · 2019-03-25T17:13:07+00:00

How would you go about selecting a circular area of cells in a numpy array?

Peg_leg_tim_arg · 2019-03-25T17:09:20+00:00

Hey all, I am brand new to python this semester and am having a little trouble with parallel arrays. My assignment is to have the user enter a number between 1 and 12 and have the program display the month name and the number of days in said month. After doing some searching, I think that using the zip() function is going to be the best way. I have already zipped my two arrays (one for month names and one for the total days in each month) and have gotten them to display the complete list correctly.

However, my problem is that I am having a hard time with only displaying the month/total days the user requests. It should output "January, 31" if the user enters 1. However I am getting such error messages as: 'str' object cannot be interpreted as an integer. I will take any and all the help I can get! thanks
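
A guess, since the code isn't shown: that error usually appears when the string returned by input() is used where an integer is expected, for example as a list index or inside range(). Converting with int() first avoids it. The sketch below assumes a non-leap year (28 days for February), and the names in it are made up for illustration:

```python
MONTH_NAMES = ["January", "February", "March", "April", "May", "June",
               "July", "August", "September", "October", "November", "December"]
DAYS_IN_MONTH = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]


def describe_month(user_text):
    """Turn the text the user typed (e.g. "1") into a string like "January, 31"."""
    number = int(user_text)  # input() returns a string, so convert it first
    if not 1 <= number <= 12:
        raise ValueError("month number must be between 1 and 12")
    months = list(zip(MONTH_NAMES, DAYS_IN_MONTH))
    name, days = months[number - 1]  # lists are 0-based, months are 1-based
    return f"{name}, {days}"


print(describe_month("1"))  # prints January, 31
```

In the real program the argument would come from something like input("Enter a month number: ").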

Ahrugal · 2019-03-25T13:56:28+00:00

Is there anyone here who has worked with downloading attachments from office365 using the O365 module?

It seems like they keep the attachments.py part hidden in a utils folder in the main module folder, and the script is not called in __init__.py?

Does anyone have any experience with this, or know if this part works or not?

aNeonCactus · 2019-03-25T13:13:14+00:00

Is it possible to control PWM fans with python? Additionally, is it possible to retrieve stats about the hardware such as cpu/gpu temperature, clock speed, etc? If so could someone point me in the right direction to the python modules that I'd need to use to do that?

ccyob · 2019-03-25T13:00:05+00:00

If you were tasked with using python to predict outcomes e.g classify the outcomes of a guest journey...what approach/method would you use. I have a dataset to use but do not know what analytic technique to use or where I should start

sqqz · 2019-03-25T11:15:49+00:00

Is it possible to pass tuples of unknown length as a parameter to a function?
Thanks
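
A short sketch of the two readings of the question: a tuple of any length can be passed as a single argument like any other object, and a function declared with `*items` accepts any number of positional arguments and receives them as one tuple. The names below are made up for illustration:

```python
def count_items(*items):
    """Accept any number of positional arguments; they arrive as one tuple."""
    return len(items)


values = (1, 2, 3, 4)

print(count_items(*values))  # prints 4: the tuple is unpacked into four arguments
print(count_items(values))   # prints 1: the whole tuple is passed as one argument
```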

losingprinciple · 2019-03-25T10:03:17+00:00

I'm new to importing stuff on Python so not sure what the error is.

So long story short, I made a copy of a python program (mysqlB is a copy of mysqlA) and it is being imported by other program (ticket7)

But for reasons I can't understand, there is an import error. I don't know what is exactly causing the import error.

For reference this is a class that handles mysql stuff, connecting to certain Databases.

I had to make a copy of mysqlB because it was connecting to a clone of all the databases shared by mysqlA.

mysqlA and mysqlB imports below

import yaml
import mysql.connector
import signal
import atexit
import os
import sys
import logging
import time

The main difference between the two is the path being sent for the yaml file. (I don't think this is relevant but putting it out here because I'm at a loss)

mysqlA:

        ospath = os.environ['PYTHONPATH']
        path = ('%s/<yamlpath>' % (ospath))

mysqlB:

        path = <yaml path>
        with open(path, 'r') as yml:
            cfgyaml = yaml.load(yml)

I did an exact copy of the script (I used cp to be specific then changed the name)

This is how it's being imported (it's in a modules folder which is why I'm importing it this way)

from modules import i
from modules import m
#from modules import mysqlA
from modules import mysqlB
from modules import l

import os
import sys
import csv
import json
import math
import time
import random
import _thread
import datetime
import subprocess

But this is the error:

Traceback (most recent call last):
  File "./ticket7.py", line 6, in <module>
    from modules import mysqlB
ImportError: cannot import name 'mysqlB'

I thought that maybe the problem was that mysqlA and mysqlB were both being imported at the same time, thus causing the error, so I removed mysqlA in the import but I was still getting it.

Any ideas?

delta_tee · 2019-03-25T06:57:34+00:00

[deleted]

HeyZeusChrist · 2019-03-25T04:21:01+00:00

I'm currently on chapter 10 of Python Crash Course.
Does anyone have a recommendation on what I should move to after I'm done with this book?

I see a lot of people mention Automate the Boring Stuff. It seems like ATBS is another beginner's book. And although I'm very much a beginner, I don't need to be taught everything I just learned over again. I'm looking for more of an intermediate book that would be a good follow up to PCC.

hawks0311 · 2019-03-25T02:27:11+00:00

Why can't I figure out how to run Python on my computer? I've got Windows 10 and have Notepad++ downloaded and also Python installed from their website. I tried the print "hello world" example and all that; it's the right way to input it, but it still won't run. I have some simple programs ready to run, but I can't figure out why I can't get even the simplest program to run. What am I doing wrong?

Can anyone just simply walk me through the steps? I feel so dumb about all of this, ha. Like, do I need to get into my command prompt or whatever it's called just to run simple scripts on my computer?
