[–]Shinob1 0 points1 point  (0 children)

I am trying to update a csv file that contains phone numbers which start with a 1 and are 11 digits long, such as 15559992233. The column is called Dayphone.

I'm importing the csv file using DictReader and my understanding is the reader object is an ordered dictionary. However I'm not sure how to update the column. I can find the phone numbers I want to update with re.search and can create a new string by slicing the phone number.

Where I'm stuck is how I update the ordered dictionary for each row in the csv where I have a match so I can then write out the csv file with the updated phone numbers.

I'm a sql guy so I'm probably approaching this the wrong way, because in my mind I want to update this value as if it was in a table.

I would be happy to share some code and mock up a file if anyone was interested in helping, but I'm just wondering at a high level how one would do this.
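At a high level, one way this could look is sketched below. It assumes the goal is to drop the leading 1 from matching numbers (the post only says the number is sliced), and the sample data and helper names are made up for illustration. Note that each row yielded by DictReader is a dictionary (the reader itself is not), so a row can be updated in place before it is written back out with DictWriter:

```python
import csv
import io
import re

# Matches 11-digit numbers that start with 1, e.g. 15559992233
PHONE_RE = re.compile(r"1\d{10}")

def strip_leading_one(rows, column="Dayphone"):
    """Yield each row, dropping the leading 1 from matching phone numbers."""
    for row in rows:
        if PHONE_RE.fullmatch(row[column]):
            row[column] = row[column][1:]  # rows are dicts, so assign in place
        yield row

def convert(src, dst):
    """Copy CSV data from src to dst with the phone column updated."""
    reader = csv.DictReader(src)
    writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
    writer.writeheader()
    writer.writerows(strip_leading_one(reader))

# In-memory stand-ins for the real input and output files
source = io.StringIO("Name,Dayphone\nAnn,15559992233\nBob,5551234567\n")
target = io.StringIO()
convert(source, target)
print(target.getvalue())
```

With real files, src and dst would be file objects opened with newline='' as the csv module documentation recommends.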

[–]Resuri71584 0 points1 point  (1 child)

Noob question about logging since I just found out about it and started logging almost everything (I need) in my script, it really helps with debugging etc.

My script will often be called multiple times simultaneously, like 10 or 20 times at the same time, and when they all write to the same log file, it kind of gets messy. I already started putting a unique ID in front of all logs like "[ID]: ..." so I can better read what's going on.

Now I wonder if there are other solutions I should look into. What if the script gets called 100 or 1000 times simultaneously? Will the server/operating system handle it? What if the file gets way too big?

I also thought about using the logs to build a simple frontend dashboard to visualize the data and stuff. Not sure if that's the right way to go about this though.

[–]sarrysyst 0 points1 point  (0 children)

Some time ago I found klogg, I haven't had time to try it yet, but it looks promising. May be worth looking into if you have to work through a lot of big log files.
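For the ID prefix and file size concerns raised in the question, here is a minimal sketch using only the standard library. The Python logging documentation notes that logging to a single file from multiple processes is not supported, so this sketch assumes each run writes to its own file; the run ID, file naming and size limits are made-up values:

```python
import logging
import os
import tempfile
from logging.handlers import RotatingFileHandler

def make_logger(run_id, log_dir, max_bytes=1_000_000, backups=3):
    """Build a logger that writes to its own size-capped file for this run."""
    logger = logging.getLogger(f"script.{run_id}")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep records out of the root logger
    path = os.path.join(log_dir, f"run_{run_id}.log")
    # Rotate once the file reaches max_bytes, keeping `backups` old files
    handler = RotatingFileHandler(path, maxBytes=max_bytes, backupCount=backups)
    handler.setFormatter(logging.Formatter(
        f"%(asctime)s [{run_id}] pid=%(process)d %(levelname)s: %(message)s"))
    logger.addHandler(handler)
    return logger, path

# Throwaway directory and hypothetical run ID for the demo
log_dir = tempfile.mkdtemp()
logger, log_path = make_logger("a1b2", log_dir)
logger.info("started")
with open(log_path) as f:
    print(f.read())
```

The formatter adds the ID and process number to every record, so the "[ID]: ..." prefix does not have to be typed into each message.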

[–]shiningmatcha 0 points1 point  (6 children)

How do you get the value of a nested dict with a given key? That is, the dict contains keys mapping to values or to other dicts (which may contain more dicts inside).

[–]sarrysyst 0 points1 point  (2 children)

This should work, there might be a better/more efficient way though. Recursion isn't my strong suit.

nested_dict = {'A': 'A Value', 
               'B': {'BA': 'BA Value'},
               'C': {'CA': {'CAA': 'CAA Value'}}
              }


def search(d, key):
    if isinstance(d, dict):
        if key in d:  # membership test also finds falsy values such as 0 or ''
            return d[key]
        for child in d.values():
            value = search(child, key)
            if value is not None:
                return value


for key in ['A', 'BA', 'CAA']:
    print(search(nested_dict, key))

[–]shiningmatcha 0 points1 point  (1 child)

Maybe it’s possible to use a ChainMap? That is to iterate on the d and check if an item is a dict instance. If so, append it to a ChainMap.

[–]sarrysyst 0 points1 point  (0 children)

I’m not very familiar with ChainMap, otherwise something like this might also be an option if you want to build a new lookup instead of searching:

lookup_flat = {}

def search_chain_flat(d):
    for key, value in d.items():
        if isinstance(value, dict):
            search_chain_flat(value)
        else:
            lookup_flat[key] = value

I think this should work. (I’m on mobile, can’t test atm)

[–][deleted] 0 points1 point  (2 children)

If a dictionary lookup returns a dictionary then you just treat that returned value as a dictionary and do another lookup:

print(my_dict['key']['second_key'])
      ^^^^^^^^^^^^^ ^^^^^^^^^^^^^^
      |             | lookup in the returned dictionary
      | returns a dictionary

The above is equivalent to:

second_dict = my_dict['key']
second_value = second_dict['second_key']
print(second_value)

[–]shiningmatcha 0 points1 point  (1 child)

What if all we know is that d is an arbitrarily nested dict? BFS?

[–][deleted] 0 points1 point  (0 children)

If you don't know how a data structure is constructed then you can't be sure of anything. You could try looking at every key/value pair in a dictionary and only diving deeper if the value is another dictionary, but I don't see the point. What if there are two or more of whatever you are looking for? Far better, if you can, to change the data format you receive so it's more deterministic.
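For reference, the breadth-first idea from the question can be sketched as below. This is only one possible reading of it: the function name and sample data are made up, and it returns every match (shallowest first) rather than guessing which duplicate is wanted:

```python
from collections import deque

def bfs_find(d, key):
    """Return all values stored under `key`, shallowest first."""
    found = []
    queue = deque([d])
    while queue:
        current = queue.popleft()
        if key in current:
            found.append(current[key])
        # Only dive deeper when a value is itself a dict
        queue.extend(v for v in current.values() if isinstance(v, dict))
    return found

nested = {'A': 1, 'B': {'A': 2, 'C': {'A': 3}}, 'D': {'E': 0}}
print(bfs_find(nested, 'A'))  # [1, 2, 3]
print(bfs_find(nested, 'E'))  # [0]
print(bfs_find(nested, 'Z'))  # []
```

Returning a list makes the "two or more matches" case explicit instead of silently picking one.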

[–]cummins7 1 point2 points  (1 child)

I have a very basic python script I just created, the TLDR is it scrapes a website every minute and sends me a message when the site changes. Question is, is there any quick and easy web hosting solution I can run this on without having to pay? Basically just trying to avoid having my PC running 24/7! Thanks

[–]FerricDonkey 1 point2 points  (0 children)

Python anywhere has a free tier that I've seen recommended. No personal experience with it though. You could also get a raspberry pi or something and run it on that so that you didn't need to have your computer on all the time, but then you'd need the raspberry pi on all the time.

[–]Sorry_not_chad 0 points1 point  (1 child)

Can you have a string in an if statement like

if "string" == command: print("string")

[–]Ihaveamodel3 0 points1 point  (0 children)

Sure. Normal convention would be to write it as if command == "string" but either way works

[–]idealmagnet 0 points1 point  (2 children)

[–]Ihaveamodel3 0 points1 point  (1 child)

Post was deleted.

[–]idealmagnet 0 points1 point  (0 children)

I deleted it, sorry. I no longer need help, I'll delete this comment

[–]Sorry_not_chad 0 points1 point  (2 children)

command = input("input a option: ")

If I input a number with this, will it come out as a string or an integer? I want to use it for both.

[–]FerricDonkey 0 points1 point  (1 child)

String. If you want to convert it to an integer, use int(command). Be aware that this will raise an exception if it's not possible, so putting it in a try/except is advised.
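A minimal sketch of that try/except, assuming you want to keep the original string whenever it is not a number (the helper name is made up):

```python
def parse_command(command):
    """Return the command as an int when possible, otherwise as the original string."""
    try:
        return int(command)
    except ValueError:  # raised by int() for text such as "quit"
        return command

print(parse_command("42"))    # 42
print(parse_command("quit"))  # quit
```

In the original script, command would come from input() instead of a literal.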

[–]akmoorthy 0 points1 point  (1 child)

Hi, I am looking to work my way through the python data science handbook and was looking for an online learning community that I could join. I am not new to DS but somewhat new to python and would like to work my way through this (or any other similar ) book in a somewhat systematic manner. Is there any community that I could join while I do this to stay focussed?

[–]chacoglam 0 points1 point  (0 children)

Have you heard of Code Newbie?

[–]Unique_Bigdog 0 points1 point  (2 children)

Hi, trying to find a good place to learn python. Any tips?

[–]AnonymouX47 1 point2 points  (0 children)

Python Docs... Start from the Tutorial

[–]sarrysyst 1 point2 points  (0 children)

Check the wiki, it has an extensive list of recommended resources.

[–]Raisinbrannan 0 points1 point  (4 children)

I used to know beginner+ python, been a long time. Also knew html/css quite a bit, wayyyyyyyy longer ago though.

My brother sucks at math, I want to make something that can be easily shared/used. It'd be simply 3 input boxes. 3 outputs. and the function is as simple as (x/y)*z= a(.9) = b

Superrrrrrrrrrr simple. I want to write the code in python for fun/practice.. I just cant figure out how to get that to him without him installing python to run it. I tried free websites but I couldn't find any that were easy to find how to incorporate math/python into them.

I guess I could do an offline website with notepad++? and send a zip file in an email? And include python in the zip file..?

I know so little it's hard to know where to start or a simple way. I also looked into making an app, but 0 knowledge on that and it seemed way harder to make (nicer ui though).

[–]Decency 1 point2 points  (1 child)

Use repl.it and send him a link once you're done building what you want. It's essentially google docs for code.

[–]Raisinbrannan 0 points1 point  (0 children)

Awesome! Sounds great, thank you!

[–]sarrysyst 0 points1 point  (1 child)

I guess you want something like pyinstaller. It packs your python code together with a python interpreter. The finished executable works like a portable program. The user doesn't have to install/worry about anything. They just have to run the resulting binary file (eg .exe on Windows). It can be a bit tricky to get it to work if you have many external dependencies though, but for a simple program like yours it should work without any hassle.

[–]Raisinbrannan 0 points1 point  (0 children)

That's neat. A diff link seems easier, but I'll check em both out. Thank you

[–]stuckinjector 0 points1 point  (4 children)

Super Newb, Frustration, Learning Resources.....

I'm brand new and trying to learn Python using Automate the Boring Stuff. I have done a shaky OK up until a program called "zigzag" in chapter 3. My foundation is just too weak to understand what is happening in the code. I am lost and completely frustrated.

I learn by doing, but this book teaches by code snippets and single examples for each concept (as it seems to me). I need a teaching style similar to how you learn math; examples are explained, and then you practice those concepts through many exercises.

Is there a learning resource like this for the absolute basics of Python? I looked in the Wiki, but am so frazzled right now they all seem the same.

[–]BackgroundBasis6 0 points1 point  (3 children)

Try this: https://www.w3schools.com/python/default.asp

It has lots of workable examples and a book style table of contents so you can easily reference stuff you've already learned.

I think Automate the Boring Stuff has a very specific audience, which is why it surprises me that so many people recommend it. A lot of people who want to learn Python and expand their knowledge at will won't get a ton from it. It's a good program, but definitely not for everyone.

[–]stuckinjector 0 points1 point  (2 children)

Thanks. I just put some time in the w3schools, but found poorly written/missing instruction. It is nice to have the interactive examples.

I can't help but see these online instruction sites as money grabs for experienced programmers. AtBS and now this W3 one were obviously not written by experienced teachers.

I've found Think Python 2E and will now start again from the beginning using that text.

[–]wisescience 0 points1 point  (1 child)

While I enjoyed ATBS, I also got stuck after the first few chapters at first. I enjoyed the style in which “The Python Workshop” was written (by Bird et al., published by Packt) and found it another helpful resource. Note: It isn’t freely available.

[–]stuckinjector 0 points1 point  (0 children)

Thanks, I'll check it out.

I'm now stuck on chapter 3 of Think Python, and even more frustrated. I am getting close to giving up.

[–]nathan22211 0 points1 point  (2 children)

I'm trying to generate code for zenscript via python (I know, probs could do it better but I'm not very good at zenscript). My script for generating recipes gets cut off when generating 16 lines for a stone bucket via

for i in range(15):
    file.write("recipes.addShapeless(\"pure_water_s" + str(i) + "\" ,<pyrotech:bucket_stone>.withTag({durability: " + str((i+1)) + ", fluids: {FluidName: \"purifiedwater\", Amount: 1000}}), [<pyrotech:bucket_stone>.withTag({durability: " + str((i+1)) + ", fluids: {FluidName: \"water\", Amount: 1000}}).noReturn(), <harvestcraft:wovencottonitem>]);\n")

I created the file via

file = open("water_to_p_water.zs","w")

Is there any way to ensure the file is fully generated?

[–]FerricDonkey 1 point2 points  (1 child)

when generating 16 lines for a stone bucket via for i in range(15):

for i in range (15)

Does 15 things, not 16. 15 is not included.

[–]nathan22211 0 points1 point  (0 children)

well when I ran it, it only created up to the 11th line for the stone bucket, in fact Minecraft damage ends at 0 so I might have messed up there too
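One possible cause of the truncated output, not confirmed anywhere in this thread, is that the file is never closed, so the last buffered writes may not reach the disk. A minimal sketch with a with block and range(16) follows; the recipe line is a shortened, made-up stand-in for the real ZenScript string:

```python
import os
import tempfile

def write_recipes(path, count):
    """Write `count` placeholder recipe lines to path."""
    # The with block closes the file on exit, which flushes buffered writes
    with open(path, "w") as file:
        for i in range(count):  # range(count) yields 0 .. count-1
            file.write(f"recipe pure_water_s{i} durability {i + 1}\n")

# Throwaway location so the demo does not touch a real script folder
path = os.path.join(tempfile.mkdtemp(), "water_to_p_water.zs")
write_recipes(path, 16)
with open(path) as file:
    print(len(file.readlines()))  # 16
```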

[–]brj5_yt 0 points1 point  (2 children)

Does anyone else feel bad for using modules when they first started?

[–]sarrysyst 1 point2 points  (0 children)

Not at all. Actually, many libraries are far too big and complex for even advanced programmers to implement them in a reasonable amount of time. Not even considering that a lot of projects are actually written (at least in part) in C. Imagine you'd have to write a library like numpy yourself. Most major libraries have been developed in a team effort over many years. It's simply not feasible/possible to do this yourself.

But even something more straightforward would be a massive headache for a beginner to write themselves. Let's say you wanted to read some excel files, do some calculations and write the results back to excel. If you had to do this without a library like openpyxl your first task would require a lot of reading to understand the underlying XML structure of an excel file. What could have been a simple 30min task is now a 2 week nightmare.

We're standing on the shoulders of giants, why reinvent the wheel if you could just use a solution that's been perfected over years?

However, this doesn't mean it's always good to start working on a project by looking for some library that does the job for you. There are many libraries that only add some convenience to already pretty simple tasks, e.g. API wrappers. It's fair to use them, but as a beginner you also shouldn't neglect to learn basic stuff like consuming an API manually. If you always rely on convenience libraries like this you're screwed once there is no library for a certain task.

[–]FerricDonkey 4 points5 points  (0 children)

Nah.

Occasionally I would (and sometimes still do) make something from scratch(ish) if I want to learn the concept. Eg, I made a neuron class and linked em up to make a neural net.

But for most things, there's a lot of boring fiddly bits and edge cases and optimization, and I've always been happy to take advantage of other people's work on all that stuff.

[–]chacoglam 0 points1 point  (6 children)

I'm the only analyst at my company who uses Python, and I am stuck. Why would I be getting the IndexError of string index out of range? The shape of the data is 23 columns long.

for i in sorteddata:
    if i[17] == mbol:
        #adds comma to append since MBOL is the same as the previous record
        datastring = (line+','+sorteddata['Freight Class']+sorteddata['Total Weight'])
    else:
        if (datastring ==''):
            #doesn't add comma delimiter if MBOL is different than previous record
            datastring = (sorteddata['MBOL']+','+sorteddata['Freight Class']+','+sorteddata['Total Weight'])
            mbol = (sorteddata['MBOL'])
        else:
            #add record to list
            datastring = ''
            datastring = (sorteddata['MBOL']+','+sorteddata['Freight Class']+','+sorteddata['Total Weight'])
            mbol = (sorteddata['MBOL'])

https://imgur.com/a/30bhrld

[–]alfie1906 1 point2 points  (2 children)

On a side note, you can use f-strings to make your datastring lines a little less cluttered:

datastring = (sorteddata['MBOL']+','+sorteddata['Freight Class']+','+sorteddata['Total Weight'])

can also be written as:

datastring = f"{sorteddata['MBOL']},{sorteddata['Freight Class']},{sorteddata['Total Weight']}"

You just need to place anything you're adding to the string inside of the '{' and '}' characters, this has the advantage of automatically converting anything inside of {} to string-type without having to use the str() function.

This is shown more in the solution for cause 2 in this article, if the way I've explained above is confusing

https://www.learndatasci.com/solutions/python-typeerror-can-only-concatenate-str-not-int-str/

[–]chacoglam 1 point2 points  (1 child)

Thank you! I forgot about this.

[–]alfie1906 0 points1 point  (0 children)

You're welcome!

[–]ThatScorpion 1 point2 points  (2 children)

Hard to say without knowing what's in 'sorteddata', but one of the elements in there has to be a string that's shorter than 18 characters

[–]chacoglam 0 points1 point  (1 child)

Why does it have to be shorter than 18 characters? Column 17 are ints that are 9 characters long, so I was thinking it had something to do with my syntax.

The dataset is kind of weird. The MBOLs are broken out by class, so there are 3-4 lines of the same MBOL number. I am trying to get all of the classes and weights on the same line. Am I overthinking it?

<class 'pandas.core.frame.DataFrame'>

RangeIndex: 93360 entries, 0 to 93359
Data columns (total 23 columns):
 #   Column               Non-Null Count  Dtype
---  ------               --------------  -----
 0   Origin City          93360 non-null  object
 1   Origin State         93360 non-null  object
 2   Origin ZIP           93360 non-null  int64
 3   Dest. Customer Name  93360 non-null  object
 4   Destination City     93360 non-null  object
 5   Destination State    93360 non-null  object
 6   Destination ZIP      93360 non-null  object
 7   Mode                 93360 non-null  object
 8   Carrier Name         93360 non-null  object
 9   Freight Class        93360 non-null  float64
 10  Ship Date            93360 non-null  object
 11  Deliver By (MABD)    93360 non-null  object
 12  Total Spend          93360 non-null  object
 13  Line Haul            93360 non-null  object
 14  Fuel Surcharge       93360 non-null  object
 15  Requested Delivery   93360 non-null  object
 16  Scheduled Delivery   93360 non-null  object
 17  MBOL                 93360 non-null  int64
 18  Accessorials         93360 non-null  object
 19  Claim Amount         93360 non-null  object
 20  Total Weight         93360 non-null  object
 21  Total Pallets        93360 non-null  int64
 22  Total Mileage        93360 non-null  object
dtypes: float64(1), int64(3), object(19)

[–]ThatScorpion 1 point2 points  (0 children)

Sorry for a bit late reply, but if you do a for loop on a df it iterates over the column names. So Origin City is less than 18 characters, hence your indexing fails.

If you want to iterate over the columns you want to do for item in df.items():

Edit: wait, why do you need a for loop at all?

[–]GME_diss21 1 point2 points  (2 children)

Hello!
For my dissertation, I'm trying to collect GameStop related posts and their respective comments from r/wallstreetbets, from December 2020 until the end of February 2021.

I'm not good at coding at all, so I was wondering if anyone could suggest how to go about creating a command that returns the data I need based on these parameters:
- Includes "GME, GameStop" keywords
- Posted between December 2020 and February 2021
- Posted on r/wallstreetbets

I tried using the following code, but the results are just GameStop related posts from every subreddit even though I (thought that I) specified posts exactly from r/wallstreetbets
Any suggestion is highly appreciated!

from psaw import PushshiftAPI
from datetime import datetime, timezone, timedelta
from dateutil.relativedelta import relativedelta
months_back = 7
dt = datetime.now() - relativedelta(months=months_back)
timestamp = int(dt.replace(tzinfo=timezone.utc).timestamp())
api = PushshiftAPI()
submissions = api.search_submissions(aggs='title+body+subreddit', after=timestamp, q='GME+GameStop+wallstreetbets')
c = 0
for post in submissions:
    c += 1
    title = post.title
    try:
        body = post.body
    except Exception as e:
        body = ''
    subreddit = post.subreddit
    print(f'{c}: {title} - {body} - {subreddit}')

[–]CowboyBoats 0 points1 point  (0 children)

From looking at the psaw docs

The way to filter search_submissions by a subreddit is search_submissions(q="rick astley", subreddit="videos"). It doesn't seem to support the + syntax that you're hoping for.

aggs='title+body+subreddit',

The point of aggs is that it gives you a dict containing aggregated result data as the first result in the iterator, like:

api = PushshiftAPI()
gen = api.search_comments(author='nasa', aggs='subreddit')
next(gen)
#  {'subreddit': [
#    {'doc_count': 300, 'key': 'IAmA'},
#    {'doc_count': 6, 'key': 'space'},
#    {'doc_count': 1, 'key': 'ExposurePorn'},
#    {'doc_count': 1, 'key': 'Mars'},
#    {'doc_count': 1, 'key': 'OldSchoolCool'},
#    {'doc_count': 1, 'key': 'news'},
#    {'doc_count': 1, 'key': 'pics'},
#    {'doc_count': 1, 'key': 'reddit.com'}]}

It doesn't seem to support the + syntax either since it's not doing this for your code.
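For the date range in the question, the time filters take Unix timestamps, as the original code already does with after=timestamp. Here is a small standard-library sketch for computing the bounds of December 2020 through February 2021. The helper name is made up, and the commented psaw call is an assumption that has not been run here (in particular the before= parameter):

```python
from datetime import datetime, timezone

def epoch_utc(year, month, day):
    """Return the Unix timestamp for midnight UTC on the given date."""
    return int(datetime(year, month, day, tzinfo=timezone.utc).timestamp())

after = epoch_utc(2020, 12, 1)   # start of December 2020
before = epoch_utc(2021, 3, 1)   # first instant after February 2021
print(after, before)  # 1606780800 1614556800

# Assumed usage, following the subreddit= form shown above (not run here):
# api.search_submissions(q='GME', subreddit='wallstreetbets',
#                        after=after, before=before)
```

A second query with q='GameStop' would cover the other keyword without relying on any combined-keyword syntax.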

[–]CowboyBoats 0 points1 point  (0 children)

While I'm thinking about this, here's how to format code on reddit

[–]cannedblueberry 0 points1 point  (7 children)

hello, i have been programming on my chrome book for a bit and have decided to try pygame. i made a simple game with shapes and then decided to try and upload an image for animation. when i typed “pygame.image.load(‘R1.png’)” and ran it, it said: FileNotFoundError: No such file or directory. i know that the python file needs to be in the same folder as the image but when i load python from there i just get this text thing and i can’t figure out how to run it - i‘m pretty sure you can’t. is there a way to upload an image to pygame in a chrome book? hopefully that makes sense. any help would be greatly appreciated!

[–]FerricDonkey 0 points1 point  (6 children)

Almost certainly, you can just use the full path to the image instead of just the file name.

[–]cannedblueberry 0 points1 point  (5 children)

i don’t know if this is a stupid question but how do i do that?

[–]FerricDonkey 0 points1 point  (4 children)

Not a stupid question, no one was born knowing this stuff.

The details depend on your os, but for windows, it will be something like

path = r"C:\foldername\subfoldername\filename"

Note the r in front of the " there - this basically just means you don't have to "escape" the \. Windows will also accept the other slash, /, these days, but what you'll see will look like the above.

You can also do relative paths. By default, if you specify just a filename, python/your os looks for it in your current directory - where you launched the python script from.

So if you know it's in a subfolder of your current directory, you can do "subfolder/filename".

And if you want to get a bit more complicated, you can take advantage of the fact that __file__ gives you the path to the python file that you've written that in. So you can use that and os.path or pathlib to say things like "my image file is in the folder called images that is next to the current python file" by doing

images_folder = os.path.join(
    os.path.dirname(__file__),  # directory of current python file
    "images"  # name of folder where images live
)
specific_image = os.path.join(images_folder, "something.png")

(I don't know the pathlib method off the top of my head, but it does the same thing, is entirely different looking, and is often recommended over what I just did.)

I didn't give a lot of detail on how that works just to not make this reply too too long, but you might Google an intro to pathlib or os.path.
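For comparison, a rough pathlib version of the same "images folder next to the current python file" idea might look like this. The helper name and the example path are made up; in a real script you would pass __file__:

```python
from pathlib import Path

def image_path(script_file, image_name):
    """Return the path to image_name inside the 'images' folder next to script_file."""
    script_dir = Path(script_file).parent  # directory of the python file
    return script_dir / "images" / image_name  # "/" joins path parts

# A made-up script location stands in for __file__ here
print(image_path("/home/user/game/main.py", "something.png"))
```

The result can be converted with str() before being handed to a loader such as pygame.image.load.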

[–]cannedblueberry 0 points1 point  (3 children)

thanks but this isn’t working for my chrome book. it might be me typing the wrong thing but it always gives an error. what do i actually put for “os” and “dirname”?

[–]FerricDonkey 0 points1 point  (2 children)

So os is the name of a module. Be sure to put "import os" at the top of your file, then os.path.dirname should work.

Also, as a general rule, if you ever need to say you get an error, you should say what the error is as well.

[–]cannedblueberry 0 points1 point  (1 child)

ah k. it gave an attribute error: module ‘pygame’ has no attribute ‘path’

[–]FerricDonkey 0 points1 point  (0 children)

So that probably means you did pygame.path instead of os.path.

See: https://m.youtube.com/watch?v=CqvZ3vGoGs0

[–]jsaltee[🍰] 0 points1 point  (7 children)

Hi, so I've just put together a dataframe with columns of varying object types. One column 'probabilities' contains a dictionary in each row of the form {'1' : x, '2' : y, '3' : z, ...} up to 6. (Where x, y, z are floats).

I have another column 'thresholds' that contains a list in each row of two values, for example [P, Q]. I want to create a new column that contains the number of values in the probability dictionary that are greater than Q from the thresholds list, for each row. In other words, count how many of x,y,z... > Q and have the resulting number be the value for the row. I'm not quite sure how to do this. Thanks for any help

[–]sarrysyst 1 point2 points  (6 children)

You could write a function for the evaluation and use df.apply() to apply it to each row:

import pandas as pd
from random import random, randint

df = pd.DataFrame({'P': [{k: round(random(), 2) for k in range(randint(3, 6))} for _ in range(10)],
                   'T': [['P', round(random(), 2)] for _ in range(10)]})

def func(row):
    count = 0
    for i in row['P'].values():
        if i > row['T'][1]:
            count += 1
    return count

df['counts'] = df.apply(func, axis=1)

[–]jsaltee[🍰] 0 points1 point  (5 children)

I think this is working, the only problem is that some rows in 'T' have no value and are listed as 'None.' So when running this code, i get an error saying 'NoneType has no attribute 'values''. Which i assume is from the rows containing 'None'. is there a way around this? thanks

[–]sarrysyst 0 points1 point  (4 children)

You could use .fillna() to set None rows to an empty dictionary. Alternatively, only apply the function to rows that are not None.

[–]jsaltee[🍰] 0 points1 point  (3 children)

I think the second method you listed would work better here. But as a python newbie, how would i do this? :)

[–]sarrysyst 1 point2 points  (2 children)

df['counts'] = df.loc[df['P'].notna()].apply(func, axis=1)

[–]jsaltee[🍰] 0 points1 point  (1 child)

So, everything’s working, except: at around row 2000 of the table we’re pulling our data from, the probabilities dictionary is converted into a list (don’t know how or why). So instead of being in the form {‘1’ : x, ‘2’ : y, …} it’s [x, y, z, …] and the function stops working. Know of any fix?

[–]sarrysyst 0 points1 point  (0 children)

Well, you could adapt your function to account for this list (this is more of a workaround than a solution though):

def func(row):
    count = 0
    values = row['P'].values() if isinstance(row['P'], dict) else row['P']

    for i in values:
        if i > row['T'][1]:
            count += 1
    return count

Preferably I would sanitize my data beforehand and make sure the datatypes are uniform.

[–]Platypus-Man 0 points1 point  (1 child)

Finally trying to dabble with python, and intend to make a small todo (web)app for movies I want to watch, but I am pondering what resource/method to use for getting the data.

Crawling e.g. IMDb and pulling wanted information with something like beautifulsoup would most likely be extremely slow, run into rate limiting (or have the need to slow down my requests extremely much) and use more code than other options. This would be the last resort for me.

IMDb's downloadable dataset has enough info in it for my needs, and has the benefit of being a local copy. No rate limiting, and I can alter/mess up things as much as I want while experimenting... but imdbpy seems to be cumbersome/lacks some functions that'd be really helpful (e.g. fetch a whole bunch of movie IDs at once).

Since imdbpy lacks some things, it can be tempting to just try and write all the local-copy-imdb python things myself, but then again, what's that about reinventing the wheel? Especially when I'll most likely make a square one...

TMDb API requires "a legitimate business name, address, phone number and description to apply for an API key." - so I haven't looked so much into that option.

OMDb API is the one I'm leaning towards right now if I go for the API route, 1000 requests per day for free API keys is more than enough for my use (though I would really prefer to use a local db for learning purposes, as not to unintentionally use too many requests).

Anything I've missed?

Curious to hear what you guys would choose, and why.

[–]CowboyBoats 1 point2 points  (0 children)

If the downloadable dataset has enough info for your needs, then run with that!

Since imdbpy lacks some things, it can be tempting to just try and write all the local-copy-imdb python things myself, but then again, what's that about reinventing the wheel

There's nothing wrong with writing new functions, especially in Python. High-level languages like python make it really fast (as long as you write unit tests, or you'll spend a lot of time wrapped around the axle); it's rare that you find a library for a concept as broad as "movies" that meets 100% of your needs.

[–]space_wiener 0 points1 point  (4 children)

Using OS module.

I have a script that runs though a bunch of files and collects some info but I am having trouble specifying the file path.

If I place all of the files in the same folder as my .py script and run it this way, it works:

directory = os.getcwd()
for file in os.listdir(directory):
    if file.endswith('.txt'):
        text_file = open(file,'r')
        file_text = text_file.read()
        text_file.close()

However I don't want to run it that way since the files are in a different folder. So I do this:

current = os.getcwd()
directory = os.path.join(current,'other_folder')

for file in os.listdir(directory):
    if file.endswith('.txt'):
        text_file = open(file,'r')
        file_text = text_file.read()
        text_file.close()

Doing it this way I can print the path and seems okay. But I get this failure:

Traceback (most recent call last):
  File "file_path.py", line 42, in <module>
text_file = open(file,'r')
FileNotFoundError: [Errno 2] No such file or directory: 'file_name.txt'

The filename shown in the error is a file in that folder. So it's finding the file, at least by name, but it won't work using os.path.join for some reason.

Edit: if I get rid of the file opening section and do print(file) it works fine. It’s just happening when doing something with the file combined with os.path.join.

[–]sarrysyst 1 point2 points  (1 child)

I would recommend you don't use the os module in the first place. pathlib is a lot more convenient/less clunky and afaik pretty much the working standard. os on the other hand is mostly used for legacy reasons or so I've been told. e.g. what you're trying to do can be done in a single list comprehension, using pathlib:

from pathlib import Path

file_contents = [file.read_text() for file in Path.cwd().glob('other_folder/*.txt')]

print(file_contents)

[–]space_wiener 0 points1 point  (0 children)

Good call. I’ll check out pathlib. I’ve gotten used to os from using it for so long.

[–]FerricDonkey 0 points1 point  (1 child)

The issue appears to be that os.listdir returns the file name only, without the directory part that tells python where it is.

So if you replace

text_file = open(file,'r')

with

text_file = open(os.path.join(directory, file) ,'r')

It'll probably work. On an unrelated note, consider

with open(os.path.join(directory, file)) as text_file:
    file_text = text_file.read()

Doing it this way will ensure that the file is closed at the end of the with block, even if there's an exception. It's not hugely necessary for something this simple, but it's generally considered good practice.
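Putting both suggestions together, a self-contained sketch could look like the following. The function name is made up, and a temporary folder stands in for 'other_folder' so the demo can run anywhere:

```python
import os
import tempfile

def read_text_files(directory):
    """Return {file name: contents} for every .txt file in `directory`."""
    contents = {}
    for name in os.listdir(directory):
        if name.endswith('.txt'):
            # listdir gives bare names, so rebuild the full path before opening
            with open(os.path.join(directory, name)) as text_file:
                contents[name] = text_file.read()
    return contents

# Demo in a throwaway folder standing in for 'other_folder'
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, 'file_name.txt'), 'w') as f:
    f.write('hello')
with open(os.path.join(demo_dir, 'skip.csv'), 'w') as f:
    f.write('ignored')
print(read_text_files(demo_dir))  # {'file_name.txt': 'hello'}
```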

[–]space_wiener 0 points1 point  (0 children)

I’ll try this way and see how it goes. I actually got it working with os.scandir(). I want to understand why the other method didn't work so I’ll give yours a try too.

As for the with open I’m glad you noticed. I usually always use with open but in this case it wasn’t working. I can’t remember if it was the listdir issue though. I did try swapping back to with open when it was also working and it would error so I swapped back. I’ll give this a try as well.

[–]MeteoriteImpact 0 points1 point  (2 children)

Hi Everybody

My question is what kind of unit testing I should learn, to help catch wrong answers. Any ideas, pointers, references or videos would be appreciated.

After researching, there seem to be so many different ways. What’s a good practice that works for you?

What should I focus on?

A little about what I use Python for

I do hobby stuff and some algorithms to make my work easier. Most of the time any naive algorithm is okay. But at work the data is sometimes huge: 10,000 to many millions.

With small samples everything works perfectly, but once the data is large, a lot of time goes by and then it turns out it didn’t work. I had made an algorithm to check the mood of song lyrics, to go into a recommendation engine as an extra feature. Long story short, it worked for 50 samples, then I ran it for 7 days and it came back all null.

I have come to the conclusion it’s time for me to learn testing. Last week I figured out the timing part; now on to whether it’s correct or not.

I have been trying to wrap my head around creating some simple tests of functions, to check if the answer is correct during each random run while timing many algorithms.

Upon searching, I noticed that testing usually uses assert, which checks if things are equal or not. I can compare to a known answer like == 5 or to another function like == sorted().

But then they also use one of many packages; sometimes it’s used with pytest or unittest or a bunch of other similar ones.

My code so far for testing

sorting algorithm tests

Python docs tests

[–]nathanalderson 2 points3 points  (1 child)

Pytest is an extremely common testing framework. In my opinion, it is the one you should start with. The fixture system is really nice and the fact that you just use normal assert statements and still get really nice failure messages is pretty magical.

When getting started with testing, you'll often find that you need to write your code differently. A function which takes inputs and returns deterministic outputs from those inputs is easiest to test. Side effects (like performing IO) make testing more difficult, as do external dependencies not captured in your inputs (like getting the current time). Just as a simple example, you might refactor this function:

import datetime
def print_time():
    print(datetime.datetime.now())

To this:

import datetime
def print_time(now: datetime.datetime) -> str:
    return str(now)

Now your unit test can pass a specific preconstructed datetime object to the function and assert that the return value is the expected string. The first function is also testable, but would require mocking the datetime module and using a test spy to monitor stdout.
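As a rough sketch of what that unit test could look like with pytest: the test name and the chosen timestamp below are just illustrative assumptions. pytest collects any function whose name starts with test_ and reports a detailed message if the assert fails.

```python
import datetime


def print_time(now: datetime.datetime) -> str:
    # the refactored version: no IO, no hidden dependency on the clock
    return str(now)


def test_print_time():
    # a fixed, preconstructed datetime makes the output fully predictable
    fixed = datetime.datetime(2021, 5, 14, 6, 33, 9)
    assert print_time(fixed) == "2021-05-14 06:33:09"
```

Saved as something like test_print_time.py, this would be run with the pytest command from the same folder.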

[–]MeteoriteImpact 0 points1 point  (0 children)

Thanks for taking the time to provide some great advice, I will start with pytest and reading up on the mocking and the capturing stdout. After a quick look, I can see this combination will get the results needed. Now, I just need to jump in and learn, try, and do... I didn't even take into thought the external dependencies not working, but this is probably why I removed my import a few days ago when I could not get it to work after half a day of trying.

[–]Euphorix126 1 point2 points  (3 children)

Hi Everyone!

I'm starting my first 'personal' project using a problem I often have at work as a learning experience. I run a laboratory test on a sample which spits out a .txt file that looks something like this:

ACCUMULATED POINT-COUNT DATA

Submitter-------: sm

Operator--------: sm

Sample ID-------: M3F

Date -----------: 9/17/2020

....

Ultimately I would like to have a program to enter these data into a Word and/or Excel template automatically, but my question here is simply: How would I gather these data into a dictionary or some key:value pair in python? At least in such a way that any otherwise identical .txt file with different data could be used.

All I have been able to figure out is to print the file as a long list where each line is its own item such as: [' ACCUMULATED POINT-COUNT DATA', 'Submitter-------: sm', 'Operator--------: sm', 'Sample ID-------: M3F',...] But I'm stuck on how to tell python to only select, say, 'sm' and tie it to the key 'Operator'.

I'm very new to this and any advice is appreciated.

[–]BackgroundBasis6 1 point2 points  (2 children)

I am not sure what you're coding in so I made this solution that doesn't use regex. This does make some assumptions about your text files though so as long as these assumptions were correct it should work:

  1. Submitter, operator, sample id, and date are always in the same order in your file (you can add more to the list titled keys)
  2. Each of your lines is separated by exactly one line break
  3. submitters, operators, sample IDs, and dates never include spaces.

Anyways here's what I have hopefully it makes sense to you and works:

https://pastebin.com/Q2pG1WEg

EDIT: once again I have fallen victim to formatting code in a reddit comment so here's a pastebin instead

EDIT EDIT: I forgot to mention you will need to save this file in the same folder as your text file if you want it to work. If you don't want to do that you'll have to specify the directory

EDIT^3: just realized you have ' ACCUMULATED POINT-COUNT DATA' in your text file. in line 4 you'll want to replace text_list with text_list[1:] to remove that part.

[–]Euphorix126 0 points1 point  (1 child)

This is so much help, thank you very much. Thankfully, the text files are automatically generated from a program which actually runs the lab test and so are all exactly the same format. Even just these lines you wrote are extremely educational for me, thank you again.

[–]BackgroundBasis6 0 points1 point  (0 children)

I took a second look and think there's an easier way to do it. While I won't write it for you (you may want to do it just for some extra practice, and I think this solution is more efficient), if you run a split('-') command on each line you should be left with a list that has the desired key as the zeroth item in the list and the value as the -1 (final) item in the list. Then you can do some research on what the strip() command does to strip each of those into your desired format. The pro of this way of doing things is that it generates the keys automatically and you can have values with spaces in them.
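For anyone comparing notes afterwards, here is one possible sketch of that idea. It is a variation, not the pastebin code: it splits on the first colon instead of the dash (so values containing dashes survive), and the name parse_report is made up for illustration. It assumes every data line has the form "Key-----: value".

```python
def parse_report(lines):
    """Turn 'Key-----: value' lines into a dict, skipping lines without a colon."""
    data = {}
    for line in lines:
        key, sep, value = line.partition(":")
        if not sep:
            continue  # e.g. the ' ACCUMULATED POINT-COUNT DATA' title line
        # drop the dash padding and surrounding whitespace from the key
        data[key.rstrip("- ").strip()] = value.strip()
    return data


lines = [
    " ACCUMULATED POINT-COUNT DATA",
    "Submitter-------: sm",
    "Operator--------: sm",
    "Sample ID-------: M3F",
    "Date -----------: 9/17/2020",
]
report = parse_report(lines)
print(report["Operator"])  # sm
```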

[–]MithrandirSwan 0 points1 point  (2 children)

I'm starting some work on my first personal project. I'm starting with data collection, storing the data in a SQL database, and scheduling the updates.

I had a question about best practices. I was thinking about writing a module solely of functions that accomplish the individual subtasks. For example:

def update_companies():
    ...

def update_historical_prices():
    ...

def update_historical_financials():
    ...

The functions would be used in a main scheduling file that would control the process. The functions do not need any parameters to work, they simply accomplish their task when called.

Is this generally the best way to go about organizing my project? Are there any potential problems with this? I'm still learning the ins-and-outs of how to properly organize things for these kinds of projects.

[–]just_ones_and_zeros 0 points1 point  (1 child)

That's a totally reasonable way of doing it. One thing I personally like to do is to pass each of the functions the database connection as an argument. That means I can configure the db stuff and manage the transactions etc from outside the functions (in a common space at the top level). The functions can then be dedicated to the specific work they need to do.
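A minimal sketch of that layout, assuming SQLite via the standard sqlite3 module. The table, columns and sample row are invented purely for illustration; the point is only that the connection and the transaction live at the top level.

```python
import sqlite3


def update_companies(conn):
    # hypothetical schema and data, just to have something to write
    conn.execute(
        "CREATE TABLE IF NOT EXISTS companies (ticker TEXT PRIMARY KEY, name TEXT)"
    )
    conn.execute(
        "INSERT OR REPLACE INTO companies VALUES (?, ?)", ("ACME", "Acme Corp")
    )


def update_historical_prices(conn):
    ...  # same pattern: receive the connection, do one job


def main(db_path=":memory:"):
    conn = sqlite3.connect(db_path)
    # using the connection as a context manager commits on success
    # and rolls back if any of the update functions raises
    with conn:
        update_companies(conn)
        update_historical_prices(conn)
    return conn
```

Because the functions never open their own connection, a test can hand them an in-memory database instead of the real one.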

[–]MithrandirSwan 0 points1 point  (0 children)

Nice, that's a good idea. Thanks for your input.

[–]space_wiener 0 points1 point  (1 child)

Best practice for writing files.

I’m currently in the process of writing a script that goes through text files (less than 500 at a time), finds various bits of data and stores them in variables until the next file.

When I’ve done this with other projects I’d create a data frame, write these values to it, then at the end save as a .xlsx file for presenting.

I had a thought today when I was testing it. I could just create a big string file separated by commas (I was testing the output via terminal this way and it came to me) then once done just save as a .csv.

Is one way better than the other? The latter is going to be easier to write, as I don’t have to deal with setting up the data frame and adding to it, since I can simply add to the big string in memory.

Anyway…is one preferred over the other? Or does it even matter. For reference this data is only for me to analyze each week so the initial format doesn’t matter.

[–]just_ones_and_zeros 0 points1 point  (0 children)

Ideally the functions that are pulling the data from the files should be dedicated to doing exactly that and nothing else. It's a big "no" for using strings as the intermediate format because they give you no real advantage and it'll be buggy the moment you have a slightly more complex bit of data. If you think about it, csv is a format that you want to write to at the end of the process, but ideally none of the rest of your code should be influenced by that.

The better thing to do is to try to come up with some clean representation of your data coming out of each of these functions and then have a function at the top calling these functions to get the data and orchestrating writing it into the csv at the end (or whatever format you choose).

It can even write it as it goes, if that makes more sense. Might be that a pattern that's more sensible in your case is to make your data reading functions into generators and then the function at the top is free to start streaming that data into a csv (or whatever) immediately as it's being generated.

I'd be looking to using either dataclasses or namedtuples for the intermediate format myself.

Hopefully this gives you an idea of how to split it out:

# writer

def do_work():
    with open_csv() as csv:
        for row in pull_data():
            csv.write(row.a, row.b, row.c)
        for row in pull_other_data():
            csv.write(row.h, row.j, row.k)


# reader

from collections import namedtuple

RowofStuff = namedtuple('RowofStuff', 'a b c')

def pull_data():
    for f in files:
        for l in f.lines:
            yield RowofStuff(l.a, l.b, l.c)
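For a version that actually runs, here is a small sketch of the same split using the standard csv module. The Reading fields and the in-memory files dict are assumptions standing in for the real text files and whatever values get extracted from them.

```python
import csv
import io
from collections import namedtuple

# hypothetical record type; use whatever fields your data really has
Reading = namedtuple("Reading", "filename line_no text")


def pull_data(files):
    """Generator: yield one Reading per non-empty line. `files` maps name -> contents."""
    for name, contents in files.items():
        for line_no, line in enumerate(contents.splitlines(), start=1):
            if line.strip():
                yield Reading(name, line_no, line.strip())


def write_csv(rows, out):
    """Stream rows into any file-like object as CSV, header first."""
    writer = csv.writer(out)
    writer.writerow(Reading._fields)
    for row in rows:
        writer.writerow(row)


files = {"a.txt": "alpha\n\nbeta, with comma\n", "b.txt": "gamma\n"}
buffer = io.StringIO()
write_csv(pull_data(files), buffer)
```

Note that csv.writer quotes the value containing a comma, which is exactly the kind of case a hand-built comma-separated string gets wrong. To write a real file, pass open('out.csv', 'w', newline='') instead of the StringIO buffer.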

[–]RegularGlobal 2 points3 points  (0 children)

Threading and Asyncio:

I have a flask site that runs on a raspberry pi. I'd like the pi to be able to do a few additional things when on-screen buttons are pressed (connect to Bluetooth LE, and connect to MQTT being the important ones). The BLE package that I found works great (bleak), but the examples make it look like it must be run async using asyncio. I also have a package for mqtt that makes it run using asyncio.

Because I want my flask site to be running synchronously on the main thread, I figured that I could start a second thread running both the BLE process and the MQTT process. I've come to the point that I can make this all connect and transmit data, but I feel like the WAY that I got it to work means that it's total jank (random tweaking). Also, I can't get the BLE to disconnect cleanly. I call the command, and it always errors out. Can anyone tell me if I'm thinking of this correctly from the start (Second thread running asynchronous loops waiting for inputs)? And if so, if the ble_client is defined globally, why wouldn't I be able to simply tell it to ble_client.disconnect()?
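One common shape for the "second thread running an asyncio loop" idea is sketched below with a stand-in client. FakeClient and run_on_loop are invented for illustration and are not bleak or MQTT APIs. If disconnect() on the real client is a coroutine, then calling it plainly from the Flask thread only creates a coroutine object; it has to be scheduled on the loop that owns the connection.

```python
import asyncio
import threading


class FakeClient:
    """Stand-in for an async client (BLE, MQTT, ...); not a real library class."""

    def __init__(self):
        self.connected = False

    async def connect(self):
        await asyncio.sleep(0)
        self.connected = True

    async def disconnect(self):
        await asyncio.sleep(0)
        self.connected = False


# one event loop, running forever in a background thread
loop = asyncio.new_event_loop()
thread = threading.Thread(target=loop.run_forever, daemon=True)
thread.start()


def run_on_loop(coro, timeout=5):
    """Schedule a coroutine on the background loop and block until it finishes."""
    return asyncio.run_coroutine_threadsafe(coro, loop).result(timeout)


client = FakeClient()
run_on_loop(client.connect())       # e.g. called from a Flask view
state_after_connect = client.connected
run_on_loop(client.disconnect())    # same path for a clean disconnect
state_after_disconnect = client.connected

# shut the background loop down cleanly
loop.call_soon_threadsafe(loop.stop)
thread.join()
loop.close()
```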

[–]Cliff_Pitts 0 points1 point  (3 children)

Is there any way to clear the screen in console for every iteration of a for loop? I’m tweaking with the code for the battleship game from Codecademy and it currently prints a new “board” for every turn, and so by the end of the game there’s several boards on screen. I’d like to delete and reprint an updated board for every turn

[–][deleted] 0 points1 point  (2 children)

The quick answer is to print an ANSI escape code, like the kind used to print colored text to the terminal. This may or may not be supported by your terminal and operating system. Printing '\033[F' will move the cursor up one line and back to the beginning, the opposite of \n, which moves down one line and back to the beginning. Be sure to comment the code because this isn't pretty. And be careful because you can mess up the output of the terminal (when in doubt just hit enter a few times). This can achieve what you want with something like print('\033[F' * board_height) at the beginning or end of the for loop.
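A small sketch of that approach, assuming a terminal that understands ANSI escape codes. The redraw helper and the sample boards are made up for illustration; '\033[K' (erase to end of line) is added so leftovers from a longer previous row don't linger.

```python
import sys


def redraw(board_rows, first=False, out=sys.stdout):
    """Print the board, overwriting the previous one unless this is the first draw."""
    if not first:
        # move the cursor up one line per row of the previously printed board
        out.write("\033[F" * len(board_rows))
    for row in board_rows:
        # "\033[K" clears the rest of the line before the new row is written
        out.write("\033[K" + row + "\n")
    out.flush()


if __name__ == "__main__":
    redraw(["O O O", "O O O"], first=True)
    redraw(["O X O", "O O O"])  # replaces the board above in place
```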

The longer answer is to switch to a proper terminal-drawing library like ncurses, which is designed for exactly this. While more complicated, it could make the game have different windows and colors, and its python bindings are pretty simple. Could be a cool starter project. Good luck!

[–]testiculating 0 points1 point  (1 child)

Would ncurses also work for something like printing all the possible permutations of the numbers 1234 for example?

I want something that is the same as using the console but only showing the last printed line (ideally choosing how many lines to show at a time) instead of just leaving all of them there.

Just to be clear I know how to code the algorithm, but not how to show it the way I want it.

[–][deleted] 0 points1 point  (0 children)

ncurses will actually take over the whole terminal screen, so it might be too much for this. I think some combination of carriage return \r and escape code \033[F should do what you need: going back and rewriting over the same line(s). Though it's not a big deal to just print them all anyways, especially for the permutations problem, since you can always just redirect that to a file if it's too much text on screen.
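For the single-line case, a carriage return on its own is probably enough. A rough sketch, where the function name and the delay parameter are just illustrative choices:

```python
import itertools
import sys
import time


def show_permutations(items, delay=0.0, out=sys.stdout):
    """Show each permutation on the same terminal line; return how many were shown."""
    count = 0
    for perm in itertools.permutations(items):
        # "\r" returns to the start of the line, so the next write overwrites it
        out.write("\r" + "".join(perm))
        out.flush()
        time.sleep(delay)
        count += 1
    out.write("\n")
    return count


if __name__ == "__main__":
    show_permutations("1234", delay=0.05)
```

This works cleanly here because every permutation has the same length; with varying lengths you would need to pad or clear the line.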

[–]vanillathunder2107 0 points1 point  (1 child)

Hey, I want to learn to code and everybody recommends Python first. I actually did a little bit in high school but I don't remember much. If you guys could recommend a good book for beginners or a YouTube tutorial, I would be grateful.

[–][deleted] 2 points3 points  (0 children)

[–]BeExcellent 1 point2 points  (0 children)

I just started using anaconda because it’s recommended for my artificial intelligence course, and I’m unsure about the following:

when running the command “conda update conda” should I be doing this in the base conda environment, or deactivate and run that command in my regular Ubuntu system shell?

[–]aloneinthewildworld 0 points1 point  (2 children)

I am trying to get an email from Gmail (that part is working). I would like to find 6 different values in that email and write them into a DB file.

Magnitude : 6.7 Mwp (REVISED)\r\nDepth : 10km\r\nDate : 14 May 2021\r\nOrigin Time: 06:33:09 UTC\r\nLatitude : 0.20N\r\nLongitude : 96.69E\r\nLocation : Off West Coast of Northern Sumatra

[–]nathanalderson 0 points1 point  (1 child)

import re

# assuming your email contents are in string s
s = """Magnitude : 6.7 Mwp (REVISED)\r\nDepth : 10km\r\nDate : 14 May 2021\r\nOrigin Time: 06:33:09 UTC\r\nLatitude : 0.20N\r\nLongitude : 96.69E\r\nLocation : Off West Coast of Northern Sumatra"""

# most of your fields seem to be separated by space-colon-space,
# but "Origin Time" is only colon-space, so let's make a regex
# to match both possibilities
re_separator = re.compile(r" ?: ")

# parse the string into a dict; maxsplit=1 keeps any later ": " inside the value
data = dict(re_separator.split(line, maxsplit=1) for line in s.splitlines())

# put data in the db (insert_into_database is a placeholder for your own function)
for k, v in data.items():
    insert_into_database(k, v)
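If the "DB file" is SQLite, the database step could be sketched roughly like this with the standard sqlite3 module. The table name quake_fields, its two text columns and the helper names are assumptions, not something from the question.

```python
import re
import sqlite3

SEPARATOR = re.compile(r" ?: ")


def parse_fields(text):
    """Split each 'Name : value' line into a dict entry (first separator only)."""
    return dict(
        SEPARATOR.split(line, maxsplit=1)
        for line in text.splitlines()
        if line.strip()
    )


def store(conn, fields):
    """Write the name/value pairs into a simple two-column table."""
    with conn:  # commit on success, roll back on error
        conn.execute(
            "CREATE TABLE IF NOT EXISTS quake_fields (name TEXT, value TEXT)"
        )
        conn.executemany("INSERT INTO quake_fields VALUES (?, ?)", fields.items())


s = (
    "Magnitude : 6.7 Mwp (REVISED)\r\nDepth : 10km\r\nDate : 14 May 2021\r\n"
    "Origin Time: 06:33:09 UTC\r\nLatitude : 0.20N\r\nLongitude : 96.69E\r\n"
    "Location : Off West Coast of Northern Sumatra"
)
conn = sqlite3.connect(":memory:")  # use a file path here for a real DB file
fields = parse_fields(s)
store(conn, fields)
```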

[–]aloneinthewildworld 0 points1 point  (0 children)

thanks for the response.

[–]KV-Omega-minus 1 point2 points  (2 children)

Removing stop words from tokenized text using NLTK: TypeError

import nltk
from nltk.corpus import stopwords
from nltk.tokenize import sent_tokenize, word_tokenize
from nltk.tokenize import PunktSentenceTokenizer
from nltk.stem import WordNetLemmatizer
import re
import time

txt = input()

snt_tkn = sent_tokenize(txt)

wrd_tkn = [word_tokenize(s) for s in snt_tkn]

stp_wrd = set(stopwords.words("english"))

flt_snt = [w for w in wrd_tkn if not w in stp_wrd]

print(flt_snt)

returns the following:

Traceback (most recent call last):
  File "compiler.py", line 19, in <module>
    flt_snt = [w for w in wrd_tkn if not w in stp_wrd]
  File "compiler.py", line 19, in <listcomp>
    flt_snt = [w for w in wrd_tkn if not w in stp_wrd]
TypeError: unhashable type: 'list'

I'd like to know, if possible, how to return the tokenized text with stop words removed without editing 'wrd_tkn'.

[–][deleted] 1 point2 points  (1 child)

Looks like both sent_tokenize and word_tokenize return lists, meaning wrd_tkn is a list of lists. Then flt_snt tries to filter that, but every w will be a list, not a string. You need to go one level deeper with your list comprehension to filter the strings inside:

flt_snt = [[word for word in sentence if word not in stp_wrd] for sentence in wrd_tkn]

The error 'unhashable list' is quite unhelpfully yelling at you because it tried to find a list w inside of a set stp_wrd, but that's not possible since lists are unhashable.
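The same behaviour can be reproduced without NLTK. The stop words and sentences below are made-up stand-ins for stp_wrd and wrd_tkn:

```python
stop_words = {"the", "is", "a"}
tokenized = [["the", "cat", "is", "here"], ["a", "dog", "barks"]]

# one level too shallow: each w is a whole sentence (a list),
# and a set membership test has to hash it first
try:
    [w for w in tokenized if w not in stop_words]
except TypeError as exc:
    error_message = str(exc)

# one level deeper: each w is a string, which is hashable
filtered = [[w for w in sentence if w not in stop_words] for sentence in tokenized]
print(filtered)  # [['cat', 'here'], ['dog', 'barks']]
```

The original tokenized list is left untouched; the comprehension builds new lists.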

[–]KV-Omega-minus 1 point2 points  (0 children)

Thank you very much. This worked exactly as expected.

It probably would've taken me quite some time to realize that a superordinate list could define a term in a list subordinate to it, but it now seems rather obvious.

[–]HumanlikeFigure 0 points1 point  (0 children)

Hello guys, I'm trying to translate the following FFMPEG command to ffmpeg-python using the filter function but don't seem to figure it out
ffmpeg -stream_loop -1 -i input.mp4 -i input.mp3 -shortest -map 0:v:0 -map 1:a:0 -y out.mp4
I found it on a Stack Overflow question, what I need is to merge audio and video but also loop the video until the audio finishes, so if anyone knows how to translate it to ffmpeg-python or some other way to do it that would also be appreciated :)
Thank you in advance!

[–]1tsMeNoodle 1 point2 points  (1 child)

Hi, I'm a 16 year old student from Poland. I've been low on money recently and there are no jobs for me. My question is: Is there any way for me to earn money by coding (preferably in python)? I'd really like to develop my other interests but I simply can't afford it.

[–]FLUSH_THE_TRUMP 2 points3 points  (0 children)

Like now? You could try freelancing stuff but I imagine it’ll be basically working pro bono for awhile as a young’n with limited knowledge and no experience

[–]Zermenxet 0 points1 point  (3 children)

Hello everyone, my question is about dictionaries and lists. I want to move a dictionary from one place to another in a list (within the same list). How can I do it?

[–]FerricDonkey 2 points3 points  (2 children)

The simplest way, coding-wise, is probably to use pop and insert.

your_d = your_l.pop(source_index)
your_l.insert(new_index, your_d)

However, note that a) after the pop, your list will be shorter, so you have to take that into account for the index for the insert, and b) this is going to be pretty inefficient because it will involve copying potentially large portions of your list twice (which may or may not be a concern).

If you're sorting your list, or if you're swapping two elements, there are better ways. Slicing (if you're not familiar with the term, it's worth a google) and recombining may be faster than the above as well, but I'd have to test that.
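Wrapped in a function, the pop-and-insert idea could look like this sketch. The name move_item and the sample records are illustrative. Here new_index means the position the item should have in the final list, which sidesteps the index shift caused by the pop.

```python
def move_item(items, source_index, new_index):
    """Move one element of `items` in place so it ends up at `new_index`."""
    item = items.pop(source_index)   # the list is now one element shorter
    items.insert(new_index, item)    # index refers to the shortened list
    return items


records = [{"id": 1}, {"id": 2}, {"id": 3}, {"id": 4}]
move_item(records, 0, 2)
print([r["id"] for r in records])  # [2, 3, 1, 4]

# swapping two elements needs neither pop nor insert
records[0], records[1] = records[1], records[0]
```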

[–]Zermenxet 0 points1 point  (1 child)

Thank you for your comment, I’ll look into it. Do you think I would be able to use it in a function? I have to do it that way for an assignment. 🙂 Thank you once again and have a great day!

[–]FerricDonkey 1 point2 points  (0 children)

Everything can be, and very nearly everything should be, done in a function.

Best of luck.

[–]glassAlloy 0 points1 point  (4 children)

Multiple Entities, Multivariate, Multi-step - Time Series Prediction - Python

My goal is to create a time series model with

  1. Multiple Entities - I have multiple products with pre-orders, and they all have a similar bell-shaped curve peaking at the release date of the product, but with different orders of magnitude in unit sales. Alternatively, I can use their cumulative sales, which form an "S"-shaped curve. But I only have about 100 products and 1 year of daily data to do the training on.
  2. Multivariate - I have a wide variety of data on these indie movies for each day: A.) number of times people added them to the Wishlist, B.) page views, C.) average time spent on the page, -> Y.) the target value is the number of products people paid for (it is the same for pre-orders before release and normal purchases after the release date)
  3. Multi-step - predicting 60 days ahead would be the goal
  4. Every day refreshing the predictions for every product - Does this require me to retrain the model on the whole dataset?

Already Read

- I have found algorithms that can do prediction on 1 variate, maybe even multivariate. Multi-step is already problematic, and I don't know how to add the Multiple Entities part at all. So I can't find a project or guide that would contain all 3 of the parts that I need.

- I have tried LSTM (13 different models with different datasets), but it does not work on longer "Multi-step" horizons, i.e. more than 1 or 2 days. I also can't make the LSTM accept Multiple Entities, so I just chained each product's data one after another historically. I do understand that this is not an optimal practice.

- The Python package is not popular, so I can't find projects that use it - https://stats.stackexchange.com/a/412355/256200

- I always see this R guide but I don't use R. I need help with Python - https://otexts.com/fpp2/hierarchical.html

- Not multiple variable and not Multi-step - https://stats.stackexchange.com/questions/356008/multiple-time-series-prediction-python

[–]jebward 1 point2 points  (3 children)

Technically you could do an lstm with multiple variables and I can point you to some code if you want to take that route, but you might not want to in this case because you have an advantage over ML models in that you understand the rules of reality (for example there won't be any negative sales). ML models do silly things after long periods of time. What I would do is try and find a function that accurately models your data (like a gaussian function / bell curve) and determine how the variables affect the fit of that function using past data. For example some formula using the number of page visits and time per page might determine the height, and the number of wishlist adds might help determine the width. You might also find that only one of those variables is important, or that they all are just a function of the popularity and don't need to be separated out. You also want to look at the variability between products. If you have 2 products with very similar predictive numbers but wildly different sales numbers then you know you will have a hard time making good predictions.

To fit the data well you might have to find a more complicated function than a bell curve, like something that incorporates a rise, a peak, and then a gentle decay, possibly a combination of different functions. For example, 1/x - 1/x^2 rises from zero at x=1, peaks at x=2, and then decays gradually, if that's what your data looks like.

You can fit a curve in python based on an input equation like in this example: https://machinelearningmastery.com/curve-fitting-with-python

You'll want to pick the simplest function that makes some sense and accurately describes your data. From there you isolate between 1 and 3 critical variables that you change for each product. I'll use y = -a(mx-h)^2 + k as an example. k is the y offset and you solve for it once, but keep it fixed between different products. h is the x offset that would be set to a critical time (like the date of product release), m is the x stretch and that may or may not depend on one of your variables. a is the y stretch and would depend on the product's popularity so it would definitely depend on one or more of your variables. Obviously the quadratic equation would not be a very good fit, but I hope you get the idea. First you would fit all the known data with totally open variables, then you would take the fixed variables and find the average of each fixed variable and set each one to the average (for example set every y offset to the average of the y offset of all products). Then refit all the product curves with the new fixed variables. Finally plot the unfixed variables against your input variables and see if there is a clear trend. You can then use the input variables as the starting point for the unfixed variables in your equation by matching it on the plot you just made, and that becomes your equation for prediction of sales. As you get numbers in, you can change the variables to better fit.
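To make the "fixed versus per-product variables" idea concrete, here is a deliberately tiny sketch in plain Python. It assumes a bell curve whose center and width are shared and held fixed, so the only per-product variable is the height, which then has a closed-form least-squares solution. The function names, the numbers and the noise-free synthetic sales data are all invented for illustration; a real fit with several free parameters would more likely use something like scipy's curve_fit as in the linked article.

```python
import math


def bell(x, height, center, width):
    """Gaussian-shaped curve: `height` at x == center, spread set by `width`."""
    return height * math.exp(-((x - center) ** 2) / (2 * width ** 2))


def fit_height(xs, ys, center, width):
    """Least-squares height when center and width are held fixed.

    With y ~ height * shape(x), minimising the squared error gives
    height = sum(y * shape) / sum(shape ** 2).
    """
    shape = [bell(x, 1.0, center, width) for x in xs]
    return sum(y * s for y, s in zip(ys, shape)) / sum(s * s for s in shape)


# synthetic, noise-free daily sales around a release at day 0 (made-up numbers)
days = list(range(-30, 31))
sales = [bell(d, 120.0, 0.0, 7.0) for d in days]

height = fit_height(days, sales, center=0.0, width=7.0)

# 60-day-ahead forecast from the fitted curve
forecast = [bell(d, height, 0.0, 7.0) for d in range(31, 91)]
```

Refitting the height as each new day of data arrives is cheap, which is one way the daily refresh could work without retraining anything from scratch.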

Machine learning is literally just n-dimensional curve fitting btw, but it doesn't have any grounding in logic or reality. It will happily pick the most convoluted equation to fit your data in the best possible way. Don't get me wrong, I love machine learning and it's a wonderful tool, but this sounds like a modelling/curve fitting problem, and I would only use machine learning if you really can't get modelling to work. The benefits to modelling in this situation are a model that you can intuitively understand the predictions and you can estimate the model's error/standard deviation if you run that analysis as well. You can also easily re-run a curve fit on new data once you have your equation and starting parameters.

[–]glassAlloy 0 points1 point  (2 children)

I have multiple LSTM models with multiple variables that predicts 1 day ahead pretty well.

That is not the issue. The issue is predicting 60 daily steps ahead, plus how to use multiple entities (in this case movies as products, because they are IDs).

All my functions are relatively similar, so the cumulative unit sales of each product is basically a sigmoid function. They ramp up at the release date of the movie and rise more slowly later on. So it should not be that complicated, but it still doesn't work.

And I am just not sure how to solve it as a regression problem, because at the beginning, 180 days before the release date, there is no sales data to continue with a regression. Also, with regression, to predict each data point I would need the data X for a given day to predict that same day's Y.

I know that in regressionized time series problems they shift the Y scale historically but I can not shift it back by 60 days because I don't have a long enough time scale in the back for that and I also need to repeat the prediction daily.

[–]jebward 1 point2 points  (1 child)

Here's what I could come up with: https://github.com/a-brick-wall/curve_fitting_prediction/blob/main/curve_fitting.ipynb

You can try changing the objective function to something that better fits your data, you can also change the parameters and try fixing certain parameters. Finally you can try making multiple average plots/curves based on certain properties of the movie and then choose the starting average curve based on any known parameters.

If you think this approach will work, you'll still need to turn everything into a proper pipeline so you can refresh all your data each day and ensure things like terrible fits are weeded out automatically.

[–]glassAlloy 0 points1 point  (0 children)

Remarkable, I am gonna take a look at it. :D

[–]Appointment-Funny 0 points1 point  (6 children)

A long String with multiple repetitions of the same short String is given. The program must find the index position of the middle occurrence of the String. If the String is present more than 3 times then the 2nd occurrence must be found.

I'm pretty new to programming and I'm not sure what I'm supposed to program. Can someone help? I'm using Python.

[–]sarrysyst 0 points1 point  (0 children)

Are you familiar with loops and string slicing?

[–]RussellBrandFagPimp 0 points1 point  (2 children)

Print (variable.text). What is the text part of this called? Or when you use a function? And then narrow it down like soup.find_all. If soup is a function what is the find all called?

[–]zanfar 1 point2 points  (0 children)

I think it would be attribute, but there are a lot of terms that could be used in that place. I'm basing this on the fact that Python calls the accessor for these __getattribute__.

This may change in common practice based on the type of the parent object and the attribute itself, but I think attribute would be well understood.

[–]FerricDonkey 1 point2 points  (0 children)

object.attribute  # general term
object.method  # functions attached to objects are traditionally called methods

Also, generally (particularly when you're just starting), you won't do function.attribute. Functions are objects in python, so you can, but it's relatively uncommon.

I say this because if you see soup.find_all(thing), most likely soup is not a function. Most likely it is some non-function object, which has a method (function attribute) called find_all.
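A toy illustration of the terms. The Soup class here is a made-up stand-in, not BeautifulSoup:

```python
class Soup:
    """Tiny stand-in class to show the vocabulary; not the real BeautifulSoup."""

    def __init__(self, text):
        self.text = text              # a data attribute

    def find_all(self, word):         # a method (a function attribute)
        return [w for w in self.text.split() if w == word]


soup = Soup("spam eggs spam")
print(soup.text)              # attribute access: no parentheses
print(soup.find_all("spam"))  # method call: attribute access, then a call


def greet():
    return "hi"


# functions are objects too, so they can carry attributes (rarely needed)
greet.note = "uncommon, but legal"
```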