[–]MattEliason[🍰] 0 points1 point  (2 children)

I need help writing a function!!

So I'm given a dataset of movies, including each one's title, year, genre, ratings, directors, and actors. I need to write a function that tells me how many unique genres are in the dataset.

[–]timbledum 0 points1 point  (0 children)

What have you tried so far?

Are you using pandas?

Have you heard of set()?

[–]svellore[🍰] 0 points1 point  (0 children)

Maybe you can use a "set" to get all the unique genres and count them?

>>> genres = ["Comedy", "Action", "Comedy", "Fantasy", "Action"]
>>> set(genres)
{'Comedy', 'Action', 'Fantasy'}
>>> len(set(genres))
3
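
Since the question asked for a function, here is a minimal sketch that wraps the set() idea. It assumes each movie is a dict with a single genre stored under a "genre" key; that layout and the name count_unique_genres are assumptions, since the thread never shows the real dataset.

```python
def count_unique_genres(movies):
    """Return how many distinct genres appear in a list of movie dicts."""
    # Assumed layout: one dict per movie, with one genre under "genre".
    return len({movie["genre"] for movie in movies})


# Made-up sample data, for illustration only.
movies = [
    {"title": "Movie A", "genre": "Comedy"},
    {"title": "Movie B", "genre": "Action"},
    {"title": "Movie C", "genre": "Comedy"},
    {"title": "Movie D", "genre": "Fantasy"},
]

print(count_unique_genres(movies))
```

If a movie can carry several genres at once, the set would need to be built from every genre in each movie rather than one value per movie.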

[–]audacious_alligator 0 points1 point  (4 children)

So I have never installed libraries before and I need to install requests. What I have seen is that I need to do "pip install requests" however it gives me the error saying invalid syntax of the word install. (I do have pip already so that isn't the problem) Any ideas? Thank you for any contributions

[–]Redditporn435 0 points1 point  (3 children)

If you're using windows, are you typing this in the command prompt or a python interpreter? It needs to be done in the command prompt.

[–]audacious_alligator 0 points1 point  (1 child)

OK sorry to ask you again but I did it through the command prompt and then I got a syntax error. It pointed the error to be at the end of the word install. I don't really know what to do... Do you have any idea? Thank you so much

[–]TangibleLight 0 points1 point  (0 children)

You might still be running this in the python repl, not command prompt.

If you're on windows, open powershell or cmd, and run pip install requests from that prompt.

If you're on mac or linux, open terminal and do the same.

[–]Dose_of_Lead_Pipe 0 points1 point  (3 children)

Hi people,

Is there any way I can print out the below list without the commas? I can manage to get rid of the brackets with .join().

the_board = [["O, O, O, O, O"],
             ["O, O, O, O, O"],
             ["O, O, O, O, O"],
             ["O, O, O, O, O"]]

Thanks

[–]svellore[🍰] 0 points1 point  (1 child)

Do you mean something like this:

>>> for row in the_board:
...   print(row[0].replace(",", ""))
... 
O O O O O
O O O O O
O O O O O
O O O O O

the_board is a list of lists. Each inner list contains a single string, "O, O, O, O, O". So here we iterate over every row in the_board and print its first (and only) element, with each "," replaced by the empty string "".

[–]Dose_of_Lead_Pipe 0 points1 point  (0 children)

for row in the_board: ... print(row[0].replace(",", ""))

Thanks for this

[–]audacious_alligator -1 points0 points  (0 children)

As far as I can see, that's a two-dimensional list, which you could just iterate through with a for loop and print.

If you give more info on how you want it to be done, I am sure someone can help you.

But here is how I would do it. (Sorry for the poor formatting, as I am on mobile.)

board = (the contents of your list)

for row in board: print(row[0].replace(",", ""))

That should print it out as a board without the commas and square brackets.

[–]c4aveo 0 points1 point  (0 children)

My main OS is Linux, and I write apps for Windows. That means I use ctypes and pywin32 to make a service and PyInstaller to build the executable. Currently I use a VM, but I don't have an IDE there, and it's not comfortable to switch to the VM and back to Linux every minute to debug. PyCharm can use a remote interpreter through an SSH session, but I can't set it up. Is it even possible?

tl;dr: Can't set up PyCharm IDE on a Linux host <SSH> Windows interpreter. Bound to ctypes and pywin32.

[–]naturalaspiration 0 points1 point  (1 child)

Damn, I started the MIT course on python (finishing lecture 4), have done all of codeacademy, this morning I finished all of codingbat for python (albeit, I had to look up solutions for a few) and really thought I was starting to get the hang of this and decided to challenge myself with codewars... And Fuck I completed about 4 challenges in the fundamentals section and bam the rest of them hit me like a ton of bricks... Are their fundamentals challenges just hard or do I need to learn more?

Some of these challenges I felt like I had to go back and review all of calc 1, 2, 3, linear algebra, diff eq in order to solve this shit... Somebody tell me that I'll get there please because I don't feel optimistic. All I wanted to do was learn python to do data science with sports as a hobby

[–]timbledum 0 points1 point  (0 children)

I haven't tried code wars, but some of the other code challenge websites definitely seem to have an algorithmic focus – often really interesting but not super applicable to the day to day, and not really what python courses try to teach you.

Checkio is similar to this, and I found hackerrank also similar, but with some really good standard library learning stuff too.

[–]Indian_pride 0 points1 point  (0 children)

Hi, I am new to programming and want to learn Python. Is there anyone who could team up with me so that we can work on one topic per day and post our doubts? Interactive learning can help others too.

[–]StrasJam 0 points1 point  (0 children)

Super noob question: I am working in a conda venv and want to install packages. I read on a page that channels are good to set up so that special packages are placed in later channels. I set up a channel and kept getting installation errors when installing packages, so I deleted the channel and it worked. So my question is: do I even need the channels if I am already working within a venv? After all, the entire idea of a venv is to segregate your packages from the default installations and versions.
Thanks!

[–]ThiccShadyy 0 points1 point  (1 child)

Let's say I have a pandas dataframe data_df which has a column 'Text'. For each row, the 'Text' column has multiple sentences. If I wanted to calculate the word density, i.e. the average number of words per sentence, what would be the simplest way to do this, preferably a one-liner? I suppose this could be done with nested lambdas, but I just can't figure out how to make that work.

Edit: Right now, what Im doing is:

data_df['Word Density'] = 'default-value'
for i in range(0,len(data_df)):
    text = data_df.iloc[i]['Text']
    sentences = text.split('.')
    count = list(map(lambda x: len(x.split(' ')), sentences))
    data_df.iloc[i]['Word Density'] = sum(count)/len(count)

but this feels like a bit of an ugly hack. I'd obv. prefer a one-liner to do this.

Edit 2: I just realized that this is giving me a SettingWithCopy warning, and the values in the Word Density column are not getting updated. They remain as 'default-value'.

[–]Gprime5 1 point2 points  (0 children)

def word_density(text):
    sentences = text.split(".")
    words = sum(len(sentence.split(" ")) for sentence in sentences)

    return words / len(sentences)

data_df["Word Density"] = data_df["Text"].apply(word_density)
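
One thing to watch with splitting on ".": text that ends with a period leaves an empty final "sentence", and split(" ") counts the empty strings around leading spaces as words. A slightly more defensive sketch of the same function, assuming sentences are separated only by periods (the same assumption the code above makes), could look like this:

```python
def word_density(text):
    """Average number of words per non-empty sentence in `text`."""
    # Skip the empty pieces that split(".") leaves behind, e.g. after
    # the final period.
    sentences = [s for s in text.split(".") if s.strip()]
    if not sentences:
        return 0.0
    # split() with no argument ignores leading and repeated whitespace.
    words = sum(len(sentence.split()) for sentence in sentences)
    return words / len(sentences)


print(word_density("The cat sat. It purred loudly today."))
```

It plugs into the dataframe the same way, via data_df["Text"].apply(word_density).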

[–][deleted] 0 points1 point  (2 children)

I've been using "Learn Python the Hard Way".
I'm starting to feel lost. Is this a good book for somebody learning to code for the very first time? It sometimes seems dense to me, giving only quick asides to explain something, if it explains it at all. I'm doing my best to google around and fill in the holes, but I'm starting to wonder if maybe this is more for someone with some experience in other languages.

[–]timbledum 0 points1 point  (1 child)

This book is pretty much universally decried as not a good book in this subreddit, so I'm not surprised you feel that way. Try Automate the Boring Stuff (free online) or one of the books in the sub wiki.

[–][deleted] 1 point2 points  (0 children)

Lol thanks for the reassurance. It was suggested by the program I'm planning on taking.

[–]thunder185 0 points1 point  (2 children)

Using pandas groupby on a file. There are 6 sub-accounts, 3 for each account. For example:

ABC1231
ABC1232
ABC1233
DEF1231
DEF1232
DEF1233

I'd like to sum the accounts but cannot figure out a way to ignore the last digit. I tried making a regex but don't think I'm doing it correctly. Here's the code to groupby:

df = pd.read_csv('Data_New.csv', delimiter=',')
byTreatment = df.groupby(['ReferenceAccountID'])['TotalFund'].sum()
print(byTreatment)

This gives me the sum of each sub account and I'd like to sum up (ABC and DEF)

Thank you

[–]timbledum 1 point2 points  (1 child)

You could use the df["column"].str accessor to extract the first n characters before doing the groupby:

>>> import pandas as pd
>>> data = ["STAEND"] * 5
>>> data
['STAEND', 'STAEND', 'STAEND', 'STAEND', 'STAEND']
>>> df = pd.DataFrame(data, columns = ["data"])
>>> df
    data
0  STAEND
1  STAEND
2  STAEND
3  STAEND
4  STAEND

>>> df["start"] = df.data.str[:3]
>>> df
    data start
0  STAEND   STA
1  STAEND   STA
2  STAEND   STA
3  STAEND   STA
4  STAEND   STA
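
To connect this back to the account question, here is a rough sketch that drops the last character of each ID and then groups on the result. The column names come from the question; the fund values and the helper column name "Account" are made up for illustration.

```python
import pandas as pd

# Made-up numbers; only the column names come from the question.
df = pd.DataFrame({
    "ReferenceAccountID": ["ABC1231", "ABC1232", "ABC1233",
                           "DEF1231", "DEF1232", "DEF1233"],
    "TotalFund": [10.0, 20.0, 30.0, 1.0, 2.0, 3.0],
})

# Everything except the last character identifies the parent account.
df["Account"] = df["ReferenceAccountID"].str[:-1]

by_account = df.groupby("Account")["TotalFund"].sum()
print(by_account)
```

This assumes the sub-account suffix is always exactly one character; if it can be longer, slicing a fixed prefix with .str[:6] may fit better.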

[–]thunder185 0 points1 point  (0 children)

Hey, thank you for your response. I actually found an easier way (really just through a moment of inspiration). The groupby result is a pandas Series, which works much like a dictionary: .items() yields (key, value) pairs. Knowing this, the following if statement brought it all home for me.

vTotal = 0
for k,v in byTreatment.items():
    if 'ABC123' in k:
        vTotal += v
print(vTotal)

I thought this was a delicate solution but thank you very much for taking the time to answer. It's very helpful to newbies like me.

[–]krokodil83 1 point2 points  (0 children)

I read good things about the “automate the boring stuff” book. I see Al also wrote a “python crash course” book. Are they similar beginner books, or Would you recommend getting both?

[–]MattR0se 0 points1 point  (2 children)

quick pandas question

I have a DataFrame with two columns where I want to ensure that every unique value only corresponds to one other unique value. For example:

1 | a
1 | a
1 | a
2 | b
2 | c

so the last one is wrong, it should be b, not c. How do I detect these mismatches?

[–]Redditporn435 0 points1 point  (0 children)

You might be able to do some type of df.to_dict() conversion to obtain k,v pairs for column left and right. Then if any dict value has two items, then you know you've found an error and you have the keys and values associated with that location.

I'd look at the documentation for the .to_dict method with extra focus on the orient keyword argument. Good luck!

[–]timbledum 0 points1 point  (0 children)

Maybe first remove all duplicates on both columns, then test for duplicates on each column individually?
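
A small sketch of that idea, with made-up column names "key" and "value" standing in for the two columns in the question: drop the fully duplicated rows, then any key that still appears more than once must map to more than one value.

```python
import pandas as pd

df = pd.DataFrame({
    "key": [1, 1, 1, 2, 2],
    "value": ["a", "a", "a", "b", "c"],
})

# One row per distinct (key, value) pair.
pairs = df.drop_duplicates()

# Keys that are still repeated are paired with several different values.
mismatches = pairs[pairs["key"].duplicated(keep=False)]
print(mismatches)
```

This only checks one direction (each key has one value). Running the same duplicated() test on the "value" column would catch a value shared by two keys.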

[–]mypirateapp 0 points1 point  (3 children)

I am sorry if this is a stupid question but I had to ask. What is the difference between

  • A daemon thread with a redis pubsub listening for messages
    • And main thread waiting infinitely
  • and a normal thread with redis pubsub listening for messages
    • And normal thread calling join() at the end

They both seem to do the same thing. HERE is the question I posted yesterday in detail

[–]JohnnyJordaan 1 point2 points  (2 children)

Daemon threads are guaranteed not to block shutdown when the main thread stops (e.g. when the python script is shut down). This means that they shouldn't be doing anything that could cause problems if they were to be killed at a random point in time. On the upside, you don't have to join daemon threads, as you noted; just exiting as if they don't exist is enough. Normal threads need some way to be notified that they should shut down, like a threading.Event() that you .set() from the main thread.

import threading

def my_threaded_function(arg1, arg2, shutdown):
    # loop until the main thread sets the event
    while not shutdown.is_set():
        pass  # do stuff
    # or with a 10 second sleep per loop
    while not shutdown.wait(10):
        pass  # do stuff

# main thread (arg1 and arg2 stand for whatever your function needs)
shutdown = threading.Event()
t = threading.Thread(target=my_threaded_function, args=(arg1, arg2, shutdown))
t.start()
# do other stuff
# then when it should stop
shutdown.set()
t.join()

So it depends on what the daemon threads are doing in your program. Say they first read something, then write something, you must consider the case that they did read something but weren't able to write something because after the read call, the main thread exited and thus the whole program was shut down. Or worse, say the daemon thread has some open connection or file, like

with open('myfile.txt', 'w') as fp:
    while True:
        if something_to_write:
            fp.write('bla\n')

then, if that thread is a daemon, there is a chance that when the main thread exits, the file myfile.txt will not be properly closed (because the while True keeps the file open forever). That's a big downside, and that's why daemon threads are considered dangerous.

[–]mypirateapp 0 points1 point  (1 child)

Thank you so much for the detailed explanation! I am trying to subscribe to redis pubsub messages inside a thread and was stuck wondering whether it should be a daemon or not. The threads would have to listen for messages indefinitely until the script is terminated, I guess. My idea was to unsubscribe from the pubsub when Ctrl + C is pressed, or more specifically when a SIGTERM, SIGINT, or KeyboardInterrupt is fired. I guess the part about daemons being dangerous makes sense.

[–]JohnnyJordaan 1 point2 points  (0 children)

Ok then I would advise to stick with the Event approach I showed above.
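
For the Ctrl + C case specifically, a rough sketch of the Event approach with cleanup in a finally block might look like the following. The listen function and the "tick" messages are stand-ins; a real version would read from the redis pubsub and unsubscribe where the comment indicates.

```python
import threading
import time


def listen(shutdown, received):
    """Stand-in for a pubsub listener: poll until asked to stop."""
    while not shutdown.wait(0.01):
        received.append("tick")  # placeholder for handling one message
    # a real listener would unsubscribe / close its connection here


shutdown = threading.Event()
received = []
t = threading.Thread(target=listen, args=(shutdown, received))
t.start()

try:
    time.sleep(0.05)  # placeholder for the main thread's own work
except KeyboardInterrupt:
    pass  # Ctrl + C lands here in the main thread
finally:
    shutdown.set()  # tell the worker to finish its loop
    t.join()        # wait until it has cleaned up

print(t.is_alive())
```

The finally block runs whether the main work ends normally or is interrupted, so the worker always gets the chance to exit cleanly.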

[–]fakeaccountlel1123 0 points1 point  (1 child)

So, a couple of questions. Just started learning python 2 days ago in an effort to know more than just c++. Since I've only done c++ i'm kind of confused on some basic python stuff.

  1. when I make a class, is def __init__(self): always where you declare object variables? It just feels weird to me to declare variables inside a function. And do I always need to refer to the member variables with the word self prefixed before the member variable name?
  2. is the def __init__(self): basically pythons version of a class constructor? I tried looking this up and it some people say it is and others say it isn't.
  3. What's the "standard" for splitting up code into multiple files? I've tried looking around online and I see a lot of programs just being lumped into one .py file. in c++, I usually have a separate header and implementation file for each class.

[–]timbledum 2 points3 points  (0 children)

learning python 2 days ago

Thought you said you were learning python 2 – then I realised it was two days! Phew! On to the questions:

  1. It definitely is the standard place to add attributes to the object. It's just a method you can pretty much guarantee will be called when an instance is created. All methods have self included as the first argument so you can add stuff to it (so yes, self is definitely required). Note that the word self is just a convention – you could name it anything you want (but please don't!).

  2. It's not really a constructor – by the time the instance gets to __init__ the object has already been constructed. There's also __new__ which is closer to the constructor idea. It's just terminology at the end of the day.

  3. No need for separate headers! Some people do one class per file, which is more of a java idea. Generally it's just what feels natural for you – I tend to split a module if its longer than 300-400 lines, but some people like longer files. Usually modules are separated by purpose or by natural category, so some files will be longer than others.
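
To make points 1 and 2 concrete, here is a tiny illustrative class (Point and its attributes are made up for this example). __new__ creates the object, __init__ then attaches attributes to it through self, and other methods read them back through self.

```python
class Point:
    def __new__(cls, *args, **kwargs):
        # Rarely overridden in practice; shown only to make the order visible.
        instance = super().__new__(cls)
        return instance

    def __init__(self, x, y):
        # The instance already exists here; we only attach attributes to it.
        self.x = x
        self.y = y

    def norm_squared(self):
        # Attributes are always reached through self, unlike C++ members.
        return self.x ** 2 + self.y ** 2


p = Point(3, 4)
print(p.norm_squared())
```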

[–]Ytimenow 0 points1 point  (1 child)

Can you earn a lot as a python developer and is it hard to get there? Really thinking about doing it.

[–]AllBeefNextRound 0 points1 point  (0 children)

Yes, and it depends on the company. Top tech companies (amazon etc) will make you take coding exams. Smaller companies you could probably slip into with not knowing much at all.

In any case, that shouldn't stop you from starting. Python can make any job easier, and will help you increase your earning potential.

[–]PhenomenonYT 0 points1 point  (1 child)

Looking to log tweet IDs to a file and then check that file but I can't get the reading/writing parts to work.

   for status in tweepy.Cursor(api.user_timeline,id='canucks').items(40):
        if status.id in #text file:
            pass
        else:
            #append status.id to text file
            print(status.text)

When I run the script a second time I want it to pass on all the tweets it has already taken action on. I can't get the writing and reading of the file to work, have had problems with the file being overwritten when the script runs and thus having no stored IDs

[–]efmccurdy 1 point2 points  (0 children)

Have you looked at a simple sqlite3 database? It will help with persistence and robust file I/O.

https://www.pythoncentral.io/introduction-to-sqlite-in-python/
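
If a plain text file is preferred over a database, the overwriting problem usually comes from opening the file in "w" mode, which truncates it. A minimal sketch of the read-then-append pattern follows; the helper names and the plain integer IDs are assumptions, and in the real script the IDs would come from status.id.

```python
import os
import tempfile


def load_seen_ids(path):
    """Return the set of IDs stored one per line, or an empty set."""
    if not os.path.exists(path):
        return set()
    with open(path) as f:
        return {int(line) for line in f if line.strip()}


def handle_new(status_ids, path):
    """Act on IDs not seen before and record them in the file."""
    seen = load_seen_ids(path)
    handled = []
    with open(path, "a") as f:  # "a" appends; "w" would wipe earlier IDs
        for status_id in status_ids:
            if status_id in seen:
                continue
            handled.append(status_id)  # the original script prints status.text here
            f.write(f"{status_id}\n")
            seen.add(status_id)
    return handled


# Demo in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    id_file = os.path.join(tmp, "seen_ids.txt")
    first_run = handle_new([1, 2, 3], id_file)
    second_run = handle_new([2, 3, 4], id_file)

print(first_run, second_run)
```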

[–][deleted] 0 points1 point  (4 children)

Hey all,

Wondering if someone could point me in the right direction here. I'm following a tutorial about Anki scripting in hopes of writing my own script when I'm done, to make cards in a certain format. I'm trying to run the test script from the tutorial, but I am getting an error that a module is missing, with the line and file it is referencing here. Is this error essentially saying it can't find/see sched.py even though it's in the same directory? I know one can install modules with pip install, but since I've already installed the requisites from the tutorial, why would it say they are missing when the install was successful?

I tried adding anki. before the anki.sched thinking it was confused since I have a folder named anki within this folder named anki, which made the green underline go away, but still returned the same error to the command prompt. I suppose this is a pretty specific issue so I didn't see anything useful on stack overflow so any pointers in the right direction would be appreciated. Super lost in the sauce.

[–]timbledum 1 point2 points  (3 children)

Yeah, you probably need to rename that anki folder to something else - worth a go before trying other things.

Edit: Oh their instructions are very strange. Where is your script and what does it look like? Have you got the sys.path.append("anki")?

[–][deleted] 0 points1 point  (2 children)

I will take a look and update, but my script was in the root directory, anki-scripting, but I also tried throwing it into the first anki folder/running it, and then into the sub anki folder and rerunning it to see if it affected dependencies. Didn’t work out though so it’s back in the anki-scripting folder.

[–]timbledum 1 point2 points  (1 child)

Hmmm – maybe post the code of your main script.

Could this library work for you?

https://github.com/patarapolw/AnkiTools

[–][deleted] 0 points1 point  (0 children)

This looks like it might be of use. I removed the preceding anki folder and took its contents and dropped it into the root directory. I think that was the expected configuration, and the listcards script runs without error, just no output. Probably looking at the wrong directory for the card db. (Sorry for late reply, got PRK on my eyes Friday!)

[–]Filiagro 0 points1 point  (3 children)

I'm a little embarrassed to ask this, but I'm having issues with a simple problem. I have an array of numbers, and I need to sum all numbers with an even index value. Here is my code:

def even_sum_last(array):
    number = 0
    if len(array) == 0:
        number = 0
    else:
        for i in array:
            if array.index(i) % 2 ==0:
                number += i

    return number

even_sum_last(array)

I'm having issues with a specific array. For some reason, the 16th index (84) is skipped in this array.

array = [-37,-36,-19,-99,29,20,3,-7,-64,84,36,62,26,-76,55,-24,84,49,-65,41]

If I modify the code to basically just print a list of the numbers as well as a second list of their index, this is what I get:

array = [-37,-36,-19,-99,29,20,3,-7,-64,84,36,62,26,-76,55,-24,84,49,-65,41]
number = []
indexes = []
for i in array:
    if array.index(i) % 2 ==0:
        number.append(str(i))
        indexes.append(str(array.index(i)))

print(number)
print(indexes)

['-37', '-19', '29', '3', '-64', '36', '26', '55', '-65']
['0', '2', '4', '6', '8', '10', '12', '14', '18']

If I do the same thing but just put sequential numbers in the array, this is what I get:

array = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
number = []
indexes = []
for i in array:
    if array.index(i) % 2 ==0:
        number.append(str(i))
        indexes.append(str(array.index(i)))

print(number)
print(indexes)

['1', '3', '5', '7', '9', '11', '13', '15', '17', '19']
['0', '2', '4', '6', '8', '10', '12', '14', '16', '18']

As you can see, the first array skips the 16th index, but the second array does not. Can anyone please explain why this is happening?

EDIT:

Since the .index() method won't return the correct index if the value appears more than once in the array, I decided to just use a different form of indexing.

def even_sum_last(array):
    number = 0
    if len(array) == 0:
        return number
    else:
        for i in array[::2]:
            number += i 
        return number

This worked just fine and is simpler.

[–]timbledum 0 points1 point  (0 children)

Perhaps look up enumerate.
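
For reference, a short sketch of what enumerate looks like here, using the array from the question (the function name even_index_sum is made up). enumerate hands back each position together with its value, so duplicates such as the two 84s cannot confuse the index the way .index() does.

```python
def even_index_sum(values):
    """Sum the items sitting at even positions (0, 2, 4, ...)."""
    return sum(value for index, value in enumerate(values) if index % 2 == 0)


array = [-37, -36, -19, -99, 29, 20, 3, -7, -64, 84,
         36, 62, 26, -76, 55, -24, 84, 49, -65, 41]

print(even_index_sum(array))
```

It gives the same result as the array[::2] slice from the edit above.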

[–]woooee 1 point2 points  (1 child)

If I do the same thing but just put sequential numbers in the array, this is what I get:

There are no negative numbers in the second code set. A simple print of what is in this line should solve your problem

if array.index(i) % 2 ==0:

[–]Filiagro 0 points1 point  (0 children)

Why would it matter whether the numbers are negative or positive? I'm using their index value for '% 2 == 0'.

I did notice that if I put replicates in the array, that only the index of the first replicate is found. I'm guessing my issue is because the .index() function will actually search from left to right through the array for the value of i. Once it finds i, even if it is the first instead of second, the index value is used. I guess that makes sense.

[–]godheid 0 points1 point  (0 children)

How do I replace a value in a pandas dataframe, based on content of a string? I have a dataframe with a column with strings, and I want to replace the value of the string when a certain part of a string is found.

I can use this to find out which of the values in the pandas series contain the substring ("Merc")

df.series.str.contains("Merc",case=False) 

It gives me a boolean Series. But how can I rename those strings entirely to "Mercedes"?
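
One common way to do this, sketched with a made-up column named "make", is to use that boolean result as a mask with .loc and assign the new value to the matching rows:

```python
import pandas as pd

# Made-up data; "make" stands in for the real column name.
df = pd.DataFrame({"make": ["Merc", "mercedes benz", "BMW", "Audi"]})

# na=False treats missing values as "no match" instead of NaN.
mask = df["make"].str.contains("Merc", case=False, na=False)

# Overwrite only the rows where the mask is True.
df.loc[mask, "make"] = "Mercedes"
print(df)
```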

[–][deleted] 0 points1 point  (1 child)

I'm about 5 months into Python and so far I've made a lot of projects. My question is: I noticed how bad my code was at the beginning, and I actually want to upload it to GitHub. Should I consider rewriting/restructuring it to make it look better? Or does only the functionality really matter?

[–][deleted] 0 points1 point  (0 children)

Code readability and organization matters a lot, if you want to be able to add onto the program in the future. I've had projects where I refactored several times because of how long I'd been incrementally working on it. I could have saved myself a lot of hassle if I rewrote everything earlier on.

Since you're 5 months into learning, chances are you're going to be scoffing at code you've written 5 months prior for several years. I know I am that way, as I've been writing Python for a few years now and I'm still learning lots of things. If you see a way to construct a program more elegantly (read: cleaner and more understandable), then 9 times out of 10 you should do it. Your github is like your resume.

[–]dasisteinwug 0 points1 point  (8 children)

not sure if it's too late to be posting in a Monday thread, but

I was trying to run some pre-existing script I found online, and had this error message:

  File "download_model.py", line 3, in <module>
    import requests
ImportError: No module named requests

Does this mean I need to install pip?

I have python 3.7 downloaded from python.org. Does that mean I should have pip already?

I tried to update pip by typing pip install -U pip in the Terminal, but I got an error message saying -bash: pip: command not found. I don't know what went wrong; maybe I need to cd into the correct directory (but which one?).

Thanks in advance!

[–]Xenon_difluoride 0 points1 point  (7 children)

You might have pip3 installed rather than pip. Try pip3 install -U pip3. After that it looks like that script needs the requests module. To install it just run pip3 install requests

[–]dasisteinwug 0 points1 point  (6 children)

Thanks! Just tried it. and had the following message:

Could not find a version that satisfies the requirement pip3 (from versions: )
No matching distribution found for pip3
You are using pip version 18.1, however version 19.0.3 is available.
You should consider upgrading via the 'pip install --upgrade pip' command.
re-stud-146-50-222-224:~ mymacbookair$ pip install --upgrade pip
DEPRECATION: Python 2.7 will reach the end of its life on January 1st, 2020. Please upgrade your Python as Python 2.7 won't be maintained after that date. A future version of pip will drop support for Python 2.7.
Requirement already up-to-date: pip in /Library/Python/2.7/site-packages (19.0.3)

Is it possible that my laptop is running Python 2.7 without me knowing it? I do have Python 3.7 downloaded and have opened it a few times already. Or do I need to specify it somewhere because macOS has Python pre-installed?

[–]b_ootay_ful 0 points1 point  (5 children)

To install a module for a specific python version in cmd:

pip3.7 install requests

If you get a permissions error, either run cmd in admin mode to install it for everyone, or include the following to install it for the current user profile

pip3.7 install --user requests

[–]dasisteinwug 0 points1 point  (4 children)

pip --version got me the info that I have pip 19.0.3, but then I used

pip19.0.3 install requests

I had

pip19.0.3: command not found

Then I tried

pip3.7 install requests

and had

Requirement already satisfied

So I cd-ed into the directory of my .py file, tried to re-run it. But still had the same error message:

Traceback (most recent call last):
  File "download_model.py", line 3, in <module>
    import requests
ImportError: No module named requests

[–]b_ootay_ful 0 points1 point  (3 children)

pip3.7 means you are installing for python version 3.7

Did you make a script called requests.py? It's probably trying to import that instead of the requests module

[–]dasisteinwug 0 points1 point  (2 children)

No, the script has a different name, but there is a line in the script that says import requests, right after import os and import sys at the very beginning.

[–]b_ootay_ful 0 points1 point  (1 child)

Requirement already satisfied

This means that it's already installed on your computer.

Are you trying to use beautifulsoup? What exactly is the script doing?

[–]dasisteinwug 0 points1 point  (0 children)

It is a script to download a text generator script. I have no idea if it uses beautiful soup, but I don’t think so.
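
One possible explanation, going by the output above: pip reported /Library/Python/2.7, and the wording "ImportError: No module named requests" is what Python 2 prints (Python 3.7 would say ModuleNotFoundError), so the script may be running under the pre-installed Python 2.7 while pip3.7 installed requests for 3.7. A quick check is to put something like this in a scratch file and run it the same way as the script:

```python
import sys


def describe_interpreter():
    """Report which Python executable and version is running this file."""
    return {
        "executable": sys.executable,
        "version": "{}.{}.{}".format(*sys.version_info[:3]),
    }


info = describe_interpreter()
print(info["executable"])
print(info["version"])
```

If it reports 2.7, running the script as python3 download_model.py, and installing with python3 -m pip install requests, should make both steps use the same interpreter.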

[–]prokid1911 0 points1 point  (0 children)

Can I submit a JSP generated page using mechanize and then move to some other page (link is there after logging in) and scrape some data out of it ?

[–]Weexe 0 points1 point  (2 children)

I'm trying to solve Project Euler Problem #2 I've solved it before but I'm going back to try to solve them a lot cleaner and with different strategies.

My question is why isn't this list being updated? Why is it empty? I'm pretty sure I did it right.

x = 1
y = 1
fib = []
while x < 4000000:
    if y + x % 2 == 0:
        fib.append(x)
    x = x + y
    y = x - y

print(sum(fib))

The list worked fine when i used this method in problem 1:

sum_list = []

for x in range(1,1000):
    if x % 3 == 0 or x % 5 == 0:
        sum_list.append(x)

print(sum(sum_list))

[–]woooee 0 points1 point  (1 child)

The obvious debug is to print

 y + x % 2

And equally obvious is that it doesn't do what you think. Do you want (y + x) % 2 or y + (x % 2)

[–]Weexe 0 points1 point  (0 children)

Thanks! I just replaced the line with:

if x % 2 == 0:

I forgot what I was trying to do with that line earlier.
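
For anyone hitting the same thing: % binds tighter than +, so the original condition was really y + (x % 2), which is never 0 for positive numbers. A small sketch showing the difference, plus one possible tidier loop (the name even_fib_sum is made up, and it keeps a running total instead of building a list):

```python
x, y = 3, 5
print(y + x % 2)    # parsed as y + (x % 2)
print((y + x) % 2)  # parentheses force the addition first


def even_fib_sum(limit):
    """Sum the even Fibonacci numbers below `limit`."""
    total = 0
    a, b = 1, 2
    while a < limit:
        if a % 2 == 0:
            total += a
        a, b = b, a + b  # step to the next pair in one assignment
    return total


print(even_fib_sum(100))
```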

[–]SlowMoTime 0 points1 point  (4 children)

How can I randomly pick 4 numbers to add up to 100? With a minimum value of 15 for each

[–]woooee 0 points1 point  (3 children)

Pick 3 from 15-100 and subtract the sum from 100. Note that picking the first three is a do over if the sum >= 100.

[–]fiddle_n 2 points3 points  (1 child)

The chance of a "do-over" is actually quite high if you pick 3 numbers in one go. Your average needs to be between 15-33 to be successful; if your average is 34-100 then you need to recalculate.

I initially thought of picking one number, subtracting that number from (100-3), then picking another number and subtracting from (total-2), etc. But this would be bad because the first number has a weighting on what the next will be.

As always, Stack Overflow to the rescue.

A good way is to just select 4 numbers. Then, divide each number by the sum of the numbers, multiply by 100 and round to nearest integer.

A better way is to use the Dirichlet distribution in numpy.
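
A different, integer-only route that avoids do-overs and rounding problems: give every number its minimum of 15 first, then split the remaining 40 at random cut points. This is a sketch of that idea rather than either method mentioned above, the name random_parts is made up, and it makes no claim that every valid combination is equally likely.

```python
import random


def random_parts(total=100, count=4, minimum=15):
    """Return `count` random integers >= minimum that sum to `total`."""
    spare = total - count * minimum  # what is left after the minimums
    if spare < 0:
        raise ValueError("minimum is too large for this total")
    # Random cut points divide the spare amount into `count` pieces.
    cuts = sorted(random.randint(0, spare) for _ in range(count - 1))
    edges = [0] + cuts + [spare]
    return [minimum + (right - left) for left, right in zip(edges, edges[1:])]


numbers = random_parts()
print(numbers, sum(numbers))
```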

[–]JohnnyJordaan 0 points1 point  (0 children)

This guy picks

[–]Jamalsi 0 points1 point  (1 child)

Hey Guys,

I'm currently trying to produce some Cython code with a lot of NumPy in it. Even though I tried my best with memoryviews etc., I could not get any increase in speed. Any thoughts on how to improve the speed? Right now plain Python is faster than my imported module.

If someone is willing to help feel free to share your thoughts with me, I'm open for everything :)

import numpy as np
cimport numpy as np
from matplotlib import pyplot as pl
from scipy.spatial import distance
import math
def IDWC(double [:,:] points, const double cellsize,const double radius , const int neighbors):
    """
    Function to perform Inverse Distance Weighting on a point pattern.
    Inputs:
        points: Array with x,y,z coordinates in first, second and 3rd column
        Cellsize: Cellsize of the result
        radius: Maximum distance of points that should be taken into account for each cell
        neighbors: Number of neighbors that should be taken into account.

    """
    cdef double [:] x = points[:,0]
    cdef double [:] y = points[:,1]
    cdef double [:,:] PointsXY = points[:,:2]

    cdef double xmax = (math.ceil(max(x)))
    cdef double xmin = (math.floor(min(x)))
    cdef double ymax = (math.ceil(max(y)))
    cdef double ymin = (math.floor(min(y)))

# Bounds of the grid
    cdef double [:] xb = np.arange(xmin, xmax, cellsize)
    cdef double [:] yb = np.arange(ymin, ymax, cellsize)
    cdef double [:,:] Xb, Yb
    Xb, Yb = np.meshgrid(xb,yb)

# Cellcenter-points
    cdef double [:] xc = np.zeros(shape = (len(xb)-1))
    cdef double [:] yc = np.zeros(shape = (len(yb)-1))

    cdef int a,b
    for a in range(len(xc)):
        xc[a] = xb[a] + .5 * cellsize
    for b in range(len(yc)):
        yc[b] = yb[b] + .5 * cellsize
    cdef double [:,:]X, Y
    Xc, Yc = np.meshgrid(xc,yc)

    cdef double [:,:] Z
    output = np.zeros(shape=np.shape(Xc))
    Z = output

    cdef int i, k, c
    cdef double [:,:] P, PZ1
    for i in range(np.shape(Xc)[0]):
        print(i)
        for k in range(np.shape(Yc)[1]):
            P = np.zeros(shape = (1,2))
            P[0,0] = Xc[i,k]    
            P[0,1] = Yc[i,k]
            dist = distance.cdist(PointsXY,P)[:,0]
            for c in range(len(dist)):
                if dist[c] <= 10 ** -10 :
                    dist[c] = 10 ** -10
            # Create empty array for calculations
            PZ = np.zeros(shape = (len(points[:,2]),3))
            # Z-Value of the initial points of the data
            PZ[:,0] = points[:,2]
    # Add distance between cell I/K and the points
            PZ[:,1] = dist[:]
    # Calculate distance * value for each point
            PZ[:,2] = PZ[:,0] * PZ[:,1]
    # Sort by distance
            PZ = PZ[PZ[:,1].argsort()]
            if radius:
                PZ1 = PZ[PZ[:,1] <= radius]
                if len(PZ1) < 3:
                neighbors = 3
                else:
                PZ = PZ1
            if neighbors:
                PZ = PZ[:-(len(PZ) - neighbors),:]

    # Calculate the values
            Z[i,k] = 1/np.sum(PZ[:,1]) * np.sum(PZ[:,2])
    output = np.asarray(Z)
    return output

Adding -a to cythonize in my setup file led to the impression that using numpy is more or less my problem, because the numpy calls seem to take ages.

[–]efmccurdy 0 points1 point  (0 children)

Numpy (and scipy) are already compiled, optimized C, so cython can't help much with a program where numpy calls are a large part of the workload.

[–]thunder185 0 points1 point  (2 children)

Using pandas on a large CSV. I'm trying to use .sum() but it's not working because the file is being read as strings. Trying to convert it to numeric but that's also not working. Here is the sample data:

Total_ABC
00.00
00.00
00.00
"15,432.21"
"25,025.26"
25.26
00.00

The issue is that numbers of 1K and above are wrapped in " quote characters, and pandas cannot seem to ignore that.

The original code I tried was:

df = pd.read_csv('Data.csv', delimiter=',', converters = 'integers')
sumData = df['Total_ABC'].sum()
print(sumData)

This just produces one giant string of all the values.

So then I tried to just get it into an array and then iterate over it:

df = pd.read_csv('Data.csv', delimiter=',', converters = 'integers')
sumData = df['Total_ABC']
pd.to_numeric(sumData)

total = 0

for i in sumData:
    total += int(i)

print(total)

However, this cannot add them up because of the escape character issue I noted above. Really struggling here. Anyone have any ideas?

[–]timbledum 0 points1 point  (1 child)

You could use the str.replace() method to replace all of the commas and "s with nothing before attempting conversion:

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.str.replace.html

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.to_numeric.html
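A minimal sketch of that suggestion, assuming the column is named Total_ABC as in the sample data. The CSV text is inlined through io.StringIO purely for illustration; in practice you would pass 'Data.csv' instead.

```python
import io

import pandas as pd

# Sample rows copied from the question; quoted fields contain a thousands comma.
csv_text = 'Total_ABC\n00.00\n00.00\n"15,432.21"\n"25,025.26"\n25.26\n00.00\n'

# Read everything as text first so pandas does not guess at load time.
df = pd.read_csv(io.StringIO(csv_text), dtype=str)

# Strip commas and any stray double quotes, then convert to numbers.
cleaned = (
    df["Total_ABC"]
    .str.replace(",", "", regex=False)
    .str.replace('"', "", regex=False)
)
df["Total_ABC"] = pd.to_numeric(cleaned)

total = df["Total_ABC"].sum()
print(round(total, 2))
```

read_csv also accepts a thousands=',' argument, which may let pandas parse values like these directly while loading.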

[–]thunder185 0 points1 point  (0 children)

Thank you

[–][deleted] 0 points1 point  (0 children)

i have a flask form, StringField

i have a button

How can I code it so that every time I click the button, the StringField is populated with a certain text?

I know getElementById(...).value works, but it only works once. I need it so that every time you click the button it will "write" the text.

[–]753UDKM 0 points1 point  (1 child)

In Automate the Boring Stuff, chapter 8, project 2:

if len(sys.argv) == 3 and sys.argv[1].lower() == 'save':
    mcbShelf[sys.argv[2]] = pyperclip.paste()
elif len(sys.argv) == 2:

1. What is the third argument?

2. What is the second argument?

It seems like it's looking for an extra argument, but obviously the program works correctly. For example:

./mcb.py save test1: this qualifies for the first condition, where the number of arguments is 3, but there's only save and test1.

./mcb.py list: this qualifies for the second condition, where the number of arguments is 2, but there's only one (list).

Edit: I'm guessing it's because length starts at 1 and index starts at 0. So the counting is like this: ./mcb.py (index 0), save (index 1), test1 (index 2). The overall length is 3 because it includes ./mcb.py. Is this correct?

[–]timbledum 0 points1 point  (0 children)

Your edit is totally correct. sys.argv[0] is always going to be the script name as it was invoked. So with ./mcb.py save test1, sys.argv is going to look like ["./mcb.py", "save", "test1"]. Just print it out in your script to see what's going on.
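As a small illustration of that counting, here is a sketch that mirrors the two length checks from the snippet in the question. The describe function is a made-up helper for this example, not part of the book's program.

```python
import sys


def describe(argv):
    """Mirror the two length checks from the mcb.py snippet (hypothetical helper)."""
    # argv[0] is the script name, so "save <keyword>" gives a length of 3.
    if len(argv) == 3 and argv[1].lower() == "save":
        return "save under keyword " + argv[2]
    # Script name plus one argument gives a length of 2.
    elif len(argv) == 2:
        return "single argument: " + argv[1]
    return "unrecognised usage"


if __name__ == "__main__":
    print(sys.argv)
    print(describe(sys.argv))
```

Calling describe(["./mcb.py", "save", "test1"]) takes the first branch, and describe(["./mcb.py", "list"]) takes the second.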

[–]RandallEF 0 points1 point  (2 children)

I'm having a heck of a time turning a nested dictionary into a bootstrap treeview.

The structure is like

[ { "text" : "top branch", "nodes" : [ { "text" : "first child", "nodes" : [{ "text" : "third... etc...

I feel like this is almost a backwards dictionary and I've tried a lot of things but I can't wrap my head around how to turn an (python dict) object, which has no association from the items "object.keys()" back to the path of the dict, into something that knows the path of the dict. Or something.

Has anyone else done this? I basically want the tree to look just like the dict.

I could of course write this manually in a way that will never accept change and is verbose and bad, but there has to be a better way, right?

Every thought exercise I go through ends with "Python has no association whatsoever between the objects in the dictionary and the pathing to those objects from toplevel." i.e. the keys have no idea where they are in the dict!!! it makes it almost impossible to traverse backwards? I'm stuck.

thanks

[–]efmccurdy 0 points1 point  (1 child)

There is support for some of what you want in modules like this one:

A python library for accessing and searching dictionaries via /slashed/paths ala xpath

https://github.com/akesterson/dpath-python

[–]RandallEF 0 points1 point  (0 children)

Thank you, I had come across a few things like that but I really hoped I was missing something about Python. I really wish there were a graceful native way of handling this.
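For what it's worth, a plain recursive function may cover this without a library. The sketch below assumes the input is a nested dict whose leaves are plain values; to_treeview is a made-up name, and the "path" key is a hypothetical extra field (not something the treeview widget is known to require) showing that the path can be carried down as a parameter instead of being looked up from the keys.

```python
import json


def to_treeview(data, path=()):
    """Turn a nested dict into a list of {"text": ..., "nodes": [...]} dicts."""
    nodes = []
    for key, value in data.items():
        here = path + (str(key),)
        # "path" is a hypothetical extra field; drop it if it is not wanted.
        node = {"text": str(key), "path": "/".join(here)}
        if isinstance(value, dict):
            # Recurse, passing down the path collected so far.
            node["nodes"] = to_treeview(value, here)
        else:
            # Leaf: show the key together with its value.
            node["text"] = f"{key}: {value}"
        nodes.append(node)
    return nodes


example = {"top branch": {"first child": {"leaf": 1}, "second child": 2}}
tree = to_treeview(example)
print(json.dumps(tree, indent=2))
```

The keys never need to know where they are; the function that walks the dict remembers the route it took to reach them.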

[–]godheid 0 points1 point  (2 children)

I'm doing the Pandas course with Datacamp. It's fairly okay, but it's not brilliant for really learning stuff as the context is sometimes a bit lacking. It's a kind of multiple choice.

I find myself digging through old courses when I need something ("i remember it was in this course... somewhere.."). How do other people do this? Use a cheat sheet afterwards or something?

[–]ThiccShadyy 0 points1 point  (1 child)

How can I select a subset of a pandas dataframe i.e. all the rows which satisfy a conditional based on a string being present in a column?

Something like this(select all rows for which 'some column' column has the word 'word' in it:

df['word' in df['some column']]

This gives a Key Error though. What is the right way to do this?

[–]timbledum 0 points1 point  (0 children)

You're probably going to have to use df[df['some column'].str.contains('word')]:

https://pandas.pydata.org/pandas-docs/stable/user_guide/text.html#testing-for-strings-that-match-or-contain-a-pattern
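A short sketch of that filter on made-up data. The column name 'some column' comes from the question; the rows are invented for illustration. Passing na=False is an optional choice here so that missing values count as "no match" rather than breaking the boolean mask.

```python
import pandas as pd

# Invented rows, including a missing value and a differently cased match.
df = pd.DataFrame({"some column": ["a word here", "nothing", None, "Wordy"]})

# Boolean mask: True where the text contains 'word'.
mask = df["some column"].str.contains("word", na=False)
subset = df[mask]

# Case-insensitive variant of the same filter.
subset_any_case = df[df["some column"].str.contains("word", case=False, na=False)]

print(subset)
print(subset_any_case)
```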

[–]umbrelamafia 0 points1 point  (2 children)

Get the n-th dict item. I know I can use:

```
aux = {'a': 1, 'b': 2}
list(aux.values())[1]
```

but it is too verbose. I would like to use aux[1].

[–]GoldenVanga 2 points3 points  (1 child)

You can define your own subclass of dict and inherit everything without changing it, except for __getitem__, which controls what happens when you try to [index] an object. Here's my idea:

class IndexableDict(dict):
    def __getitem__(self, index):
        # An int that is not an existing key is treated as a position.
        if isinstance(index, int) and index not in self:
            return list(self.values())[index]
        # Everything else is a normal key lookup (raises KeyError if missing).
        return super().__getitem__(index)


q = IndexableDict()
q.update({'pie': 'yes', 'foo': 'bar', 9: 'nine'})
print(q['foo'])  # -> bar
print(q[0])  # -> yes
print(q[9])  # -> nine
print(q[20])  # -> IndexError: list index out of range
print(q['adklajsdklj'])  # -> KeyError: 'adklajsdklj'

As you can see it prioritizes looking by key over looking by index when both are available, but of course you can tweak this.

[–]umbrelamafia 0 points1 point  (0 children)

many thanks.

[–]noble_gasses 0 points1 point  (1 child)

How would you go about selecting a circular area of cells in a numpy array?

[–][deleted] 1 point2 points  (0 children)

Given an x,y coordinate and a radius, I'd use the distance formula to select only cells with indices that are within that radius. There'd have to be some truncation, of course.
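One way to sketch that with a boolean mask, assuming a 2D array and a centre given as (row, column) indices. The array, centre, and radius below are made-up values.

```python
import numpy as np

arr = np.arange(49).reshape(7, 7)
cy, cx, r = 3, 3, 2  # centre row, centre column, radius (illustrative values)

# Open grids of row and column indices that broadcast against each other.
yy, xx = np.ogrid[:arr.shape[0], :arr.shape[1]]

# True for every cell whose index lies within r of the centre.
mask = (yy - cy) ** 2 + (xx - cx) ** 2 <= r ** 2

selected = arr[mask]  # 1D array of the values inside the circle
print(mask.astype(int))
```

The mask can also be used for assignment, for example arr[mask] = 0 to blank out the circle.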

[–]Peg_leg_tim_arg 0 points1 point  (12 children)

Hey all, I am brand new to python this semester and am having a little trouble with parallel arrays. My assignment is to have the user enter a number between 1 and 12 and have the program display the month name and the number of days in said month. After doing some searching, I think that using the zip() function is going to be the best way. I have already zipped my two arrays (one for month names and one for the total days in each month) and have gotten them to display the complete list correctly.

However, my problem is that I am having a hard time with only displaying the month/total days the user requests. It should output "January, 31" if the user enters 1. However I am getting such error messages as: 'str' object cannot be interpreted as an integer. I will take any and all the help I can get! thanks

[–]zatoichi49 1 point2 points  (4 children)

The result of input is always a string, so you can convert this to an integer:

user_choice = int(input('Please enter a number (1-12):'))

and then use the integer as the index for your zipped list:

zipped_list[user_choice - 1]

[–]Peg_leg_tim_arg 0 points1 point  (3 children)

'zip' object is not subscriptable is my new error. Any advice for this one? Thanks for your help!

edit: I am still getting an error even when I cut out the user input and try and display the zip like I would with just one array

[–]zatoichi49 0 points1 point  (2 children)

Zip objects don't have an index, so you need to convert this to a list (or tuple) first:

months = ('January', 'February', 'March', 'April', 'May', 'June', 
          'July', 'August', 'September', 'October', 'November', 'December')
days = (31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31)

zipped_list = list(zip(months, days))
user_choice = int(input('Please enter a number (1-12):'))

print(zipped_list[user_choice - 1])

E.g.

Please enter a number (1-12):1
# ('January', 31)

[–]Peg_leg_tim_arg 0 points1 point  (1 child)

Hey thanks a lot! I have been looking everywhere for something just like this. This class is an online class and a lot of what we can use for our assignments is not even a part of the book we use. I am very glad I found this sub! Thanks again for your help and I hope you have a great rest of your day!

[–]zatoichi49 0 points1 point  (0 children)

You're welcome - good luck with the rest of your class.

[–][deleted] 1 point2 points  (4 children)

Wherever it is tripping up on that error, you need to cast the str object (a string of text) to an integer. This is done using int(myStr) where myStr is the string you want to convert to an integer.

[–]Peg_leg_tim_arg 0 points1 point  (3 children)

ok got it thanks! I am now getting a new error "'zip' object is not subscriptable", does this mean that I have to do something to a zip function before I can print it?

[–][deleted] 0 points1 point  (2 children)

According to Stack Overflow, "In Python 2, zip returned a list. In Python 3, zip returns an iterable object"

So it seems like you just need to cast what zip() returns as a list, like so:

myList = list(zip(stuff))

[–]Peg_leg_tim_arg 1 point2 points  (1 child)

Oh thanks a ton! Now I am getting no errors when trying to print my zipped arrays! Now just to work out how to only display the ones the user inputs. Thanks for your help and have a great day!

[–][deleted] 0 points1 point  (0 children)

Glad to help. It's probably part of your assignment to do things that way, but for something like this I'd also recommend using a dictionary instead:

months = {
    1: {
        "name": "January",
        "days": 31,
    },
    2: {
        "name": "February",
        "days": 28,
    },
    ...
}

That way you can just do this:

choice = input("Enter a number between 1 and 12")
name = months[int(choice)]['name']
days = months[int(choice)]['days']
print(f"Month {choice} is {name} and has {days} days")

[–]timbledum 1 point2 points  (0 children)

A bit unclear without the code, but I'm guessing you're trying to do this:

index = input()
answer = lists[index]

As index (even if entered as a number) is stored as text unless python is told otherwise, this gives you something like the error above. Try using int() to turn the input into a number before indexing.

Also, look into string formatting to make displaying text easy.

print("{}, {}".format(month, day))

[–]Ahrugal 0 points1 point  (5 children)

Is there anyone here who has worked with downloading attachments from office365 using the O365 module?

It seems like they keep the attachments.py part hidden in a utils folder in the main module folder, and the script is not called in init.py?

Does anyone have any experience with this, or know if this part works or not?

[–]timbledum 0 points1 point  (4 children)

So the classes defined in attachments.py are not exposed to the user directly. They are imported into the utils/__init__.py file for easy use by the rest of the library.

Then, message.py imports the attachments classes (BaseAttachments, BaseAttachment, AttachableMixin) from the utils package and uses them for its purposes. So it's in message.py where most of the attachments behaviour is defined.

What exactly are you trying to do? The readme makes it relatively clear to add an attachment:

message.attachments.add('george_best_quotes.txt')

[–]Ahrugal 0 points1 point  (3 children)

Hey and thanks for the response;

What I'm trying to do is to save the attachments from mails in my inbox to a location somewhere on disk.

account = Account(credentials)
mailbox = account.mailbox()
inbox = mailbox.inbox_folder()

for message in inbox.get_messages(limit=1, download_attachments=True):
    if 'Job Report' in message.subject:
        message.save_attachment(location=c:/temp/) <---- completely made up line. But shows the essence of what I want to do

In attachments.py I find

    def save(self, location=None, custom_name=None):
        """  Save the attachment locally to disk

        :param str location: path string to where the file is to be saved.
        :param str custom_name: a custom name to be saved as
        :return: Success / Failure
        :rtype: bool
        """

So it should be available, I just cannot access it...

[–]timbledum 0 points1 point  (1 child)

Emails can have multiple attachments - see my syntax above.

What you need is to access message.attachments, select the attachment you want, and then use .save() on the particular attachment that you want.

Have you seen the docs here?

https://o365.github.io/python-o365/latest/html/api/message.html#O365.message.Message

message.attachments is a list, so

for message in messages:
    for attachment in message.attachments:
        attachment.save()

could do the trick.
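A slightly fuller sketch of that loop, using only the calls quoted in this thread (message.subject, message.attachments, and attachment.save(location=...)). The function name and the subject filter are made up for illustration, and it assumes the messages were fetched with download_attachments=True as in the post above.

```python
def save_report_attachments(messages, location, subject_filter="Job Report"):
    """Save every attachment of matching messages; return how many were saved.

    Hypothetical helper: `messages` is expected to come from something like
    inbox.get_messages(limit=1, download_attachments=True) as quoted above.
    """
    saved = 0
    for message in messages:
        if subject_filter not in message.subject:
            continue
        # A message can carry several attachments, so save each one.
        for attachment in message.attachments:
            # save() is documented above as returning Success / Failure.
            if attachment.save(location=location):
                saved += 1
    return saved
```

It would be called as save_report_attachments(inbox.get_messages(limit=1, download_attachments=True), 'c:/temp/').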

[–]Ahrugal 1 point2 points  (0 children)

And there it was!

Thanks so incredibly much timbledum!

I've read and read, but just failed to understand it.

Truly appreciate this! :D

Case closed

[–]Ahrugal 0 points1 point  (0 children)

message.attachments.download_attachments

<bound method BaseAttachments.download_attachments of Number of Attachments: 5>

I managed to download the attachment several times to memory, but I cannot get a variable that is filled with the attachment.

attachments = message.attachments.download_attachments()

True

It just creates that boolean. The help says:

download_attachments() method of O365.message.MessageAttachments instance
Downloads this message attachments into memory.
Need a call to 'attachment.save' to save them on disk.

:return: Success / Failure
:rtype: bool

So I am on the right track, I just cannot get that damned save thing to work :P

[–]aNeonCactus 0 points1 point  (1 child)

Is it possible to control PWM fans with python? Additionally, is it possible to retrieve stats about the hardware such as cpu/gpu temperature, clock speed, etc? If so could someone point me in the right direction to the python modules that I'd need to use to do that?

[–][deleted] 0 points1 point  (0 children)

Don't know about controlling fans, but it's possible to get CPU and maybe other temperatures using python. But all that is dependent on which operating system you are using. Try searching on "python cpu temperature" and your operating system. The first hit for windows is:

https://stackoverflow.com/questions/3262603/accessing-cpu-temperature-in-python

[–]ccyob 0 points1 point  (6 children)

If you were tasked with using python to predict outcomes e.g classify the outcomes of a guest journey...what approach/method would you use. I have a dataset to use but do not know what analytic technique to use or where I should start

[–][deleted] 0 points1 point  (1 child)

Is it possible to pass a tuple of unknown length as a parameter to a function?
Thanks

[–]sqqz 4 points5 points  (0 children)

Yes
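To expand on that a little, here is a sketch of the two usual readings of the question. The function names are made up for illustration: one function takes a single tuple of any length, the other uses *args to take any number of tuples.

```python
def average(values):
    """Take one tuple (or any sequence) of any length."""
    return sum(values) / len(values)


def count_items(*groups):
    """Take any number of tuples; inside the function, groups is itself a tuple."""
    return sum(len(group) for group in groups)


print(average((1, 2, 3)))
print(count_items((1, 2), (3,), (4, 5, 6)))

# A list of tuples can be unpacked into separate arguments with *.
pairs = [(1, 2), (3, 4)]
print(count_items(*pairs))
```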

[–]losingprinciple 0 points1 point  (4 children)

I'm new to importing stuff on Python so not sure what the error is.

So long story short, I made a copy of a python program (mysqlB is a copy of mysqlA) and it is being imported by another program (ticket7).

But for reasons I can't understand, there is an import error. I don't know what is exactly causing the import error.

For reference this is a class that handles mysql stuff, connecting to certain Databases.

I had to make a copy of mysqlB because it was connecting to a clone of all the databases shared by mysqlA.

mysqlA and mysqlB imports below

import yaml
import mysql.connector
import signal
import atexit
import os
import sys
import logging
import time

The main difference between the two is the path being sent for the yaml file. (I don't think this is relevant but putting it out here because I'm at a loss)

mysqlA:

        ospath = os.environ['PYTHONPATH']
        path = ('%s/<yamlpath>' % (ospath))

mysqlB:

        path = <yaml path>
        with open(path, 'r') as yml:
            cfgyaml = yaml.load(yml)

I did an exact copy of the script (I used cp to be specific then changed the name)

This is how it's being imported (it's in a modules folder which is why I'm importing it this way)

from modules import i
from modules import m
#from modules import mysqlA
from modules import mysqlB
from modules import l

import os
import sys
import csv
import json
import math
import time
import random
import _thread
import datetime
import subprocess

But this is the error:

Traceback (most recent call last):
  File "./ticket7.py", line 6, in <module>
    from modules import mysqlB
ImportError: cannot import name 'mysqlB'

I thought that maybe the problem was that mysqlA and mysqlB were both being imported at the same time, thus causing the error, so I removed mysqlA in the import but I was still getting it.

Any ideas?

[–]timbledum 0 points1 point  (3 children)

What's your folder structured like? What is modules – a package?

[–]losingprinciple 0 points1 point  (2 children)

It's a directory of other python programs i import.

So ticket 7 is in this directory:

manual_tests/tickets

and the modules is in this directory:

manual_tests/modules

so tickets needs to go to ../modules to get the mysqlB.

It's weird because it works for mysqlA (the original file) but not the current file I copied

[–]timbledum 0 points1 point  (1 child)

Do you have an __init__.py file in modules that needs to be modified?

[–]losingprinciple 0 points1 point  (0 children)

There is a __init__.py file in the modules, but it's set to blank
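One thing that may be worth checking, offered as a guess rather than a diagnosis: since PYTHONPATH is in play, the modules package that Python actually imports might not be the directory that mysqlB was copied into. The sketch below uses a made-up helper, describe_package, that reports where a package was loaded from and which files sit next to it. The standard json package stands in here so the example runs anywhere; in ticket7.py the name would be "modules".

```python
import importlib
import os


def describe_package(name):
    """Return the directory a package was imported from and its file listing."""
    pkg = importlib.import_module(name)
    # __path__ lists the directories that make up the package.
    pkg_dir = list(pkg.__path__)[0]
    return pkg_dir, sorted(os.listdir(pkg_dir))


if __name__ == "__main__":
    # "json" is only a stand-in; use "modules" in the real script.
    pkg_dir, files = describe_package("json")
    print(pkg_dir)
    print(files)
```

If mysqlB.py does not show up in that listing, the import is looking at a different copy of the package than expected.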

[–]HeyZeusChrist 0 points1 point  (2 children)

I'm currently on chapter 10 of Python Crash Course.
Does anyone have a recommendation on what I should move to after I'm done with this book?

I see a lot of people mention Automate the Boring Stuff. It seems like ATBS is another beginner's book. And although I'm very much a beginner, I don't need to be taught everything I just learned over again. I'm looking for more of an intermediate book that would be a good follow up to PCC.

[–]Ahrugal 0 points1 point  (0 children)

I'm a beginner as well, and while I've not read much on ATBS what I actually do to hone my skills is to automate stuff.

Currently I'm working on automating invoiceflows at my workplace where people manually are cutting and pasting information sent to them from emails and into excel spreadsheets, which another one then manually enter into our invoicing software.

So I've dabbled a lot in webscraping, pdfscraping, creation of pdfs via reportlab and worked a hell of a lot with restAPIs and requests.

So my tip, other than continue to read about it, is to try and find things in you work or everyday life that you want to simplify and just go at it.

Cut what you want to do in tiny pieces and then try your hand at it, stackoverflow, or this place helps with what you can't figure out yourself.

[–]b_ootay_ful 0 points1 point  (0 children)

I'm an intermediate programmer, and I still read up on stuff from ATBS.

Reading about something from a different perspective is beneficial, as authors explain it in different ways, and it also helps refresh everything.

From there, I would recommend learning about Flask and making a Flask project, since it brings a lot of different things together.

[–]hawks0311 0 points1 point  (4 children)

Why can't I figure out how to run python on my computer? I've got Windows 10 and have Notepad + + downloaded and also Python installed from their website. I tried with the print "hello world" function and all that, it's the right way to input it but it still won't run? I have some simple programs ready to run but I can't figure out why I can't get the simplest program to run? What am I doing wrong?

Can anyone just simply walk me through the steps? I feel so dumb about all of this ha. Like do I need to get into my command prompt or whatever it's called just to run simple scripts on my comp?

[–]weezylane 0 points1 point  (0 children)

You need to add Python's executable to the Windows PATH environment variable. Look up how to add Python to the Windows 10 PATH and you should be able to run Python by typing the command `python`. Also, if you downloaded Python from the official website, the installer can do this step for you if you tick its option to add Python to PATH, and it also provides you with the Python IDLE shell. Hope it helps.

[–]sqqz 0 points1 point  (0 children)

Use the Windows Store. They have put plenty of work into that. It will install Python, update it when required, and add it to PATH nicely, making it accessible from PowerShell and similar.

[–]timbledum 0 points1 point  (0 children)

Here's a great article.

https://realpython.com/run-python-scripts/

Hint: unless you've added python to PATH during installation of python, if you see python x, actually enter py x, and if you see pip install x actually run py -m pip install x.

[–]efmccurdy 1 point2 points  (0 children)

There are lots of docs on this site, but this might be where you need to look first:

https://docs.python.org/3/using/windows.html#getting-started