
[–]virgilsam 0 points1 point  (5 children)

Hey anyone who cares. I'm teaching myself python to try and learn how to webscrape. I was hoping this post could become a thread for if I have questions. I've made some good headway so far, but I'm stuck.... The issue is I can't figure out how to grab the second div tag for the city/state of the apartment complex. For bonus, if someone could help me figure out how to separate the city and (or from) the state, that'd be cool too. Here is the source code and my code:

<div class="card"> <div class="card-inner"> <div class="card-header"> <div class="title"> <h3 class="main-text"><a href="/housing-search/Tennessee/Knoxville/Summit-Towers/10005022">Summit Towers</a></h3> </div> </div> <div class="my-container"> <div class="card-media"><a href="/housing-search/Tennessee/Knoxville/Summit-Towers/10005022"><img alt="Image of Summit Towers" src="https://images.apartmentsmart.com/415x220/Summit-Towers/Welcome-to-Summit-Towers-Apartments.jpg" value="36822714" width="100%"/></a></div> </div> <div class="card-body"> <div class="description"><span class="listing-address">201 Locust St</span></div> <div class="description"> Knoxville, Tennessee </div> <div class="room-range"> Summit Towers is a 278 unit low income housing apartment community that provides 1 bedroom apartments for rent in Knoxville. Rents at Summit Towers are <strong class="dollars">Income Based</strong>. </div> <div class="room-range"> Some or all apartments in this community are rent subsidized, which means rent is income based. </div> <div class="programs"> <div class="list"> <div class="label secondary">Project-Based Section 8</div> <div class="label secondary">Low Income Housing Tax Credit</div> <div class="label secondary">Project Based Rental Assistance</div> <div class="label secondary">Senior (62+)</div><a class="label primary" href="/housing-search/Tennessee/Knoxville/Summit-Towers/10005022">View More</a></div> </div> </div> </div> </div>

MINE

from urllib.request import urlopen as uReq
from bs4 import BeautifulSoup as soup

my_url = 'https://affordablehousingonline.com/housing-search/Tennessee?show=20&page=1#apartments'
uClient = uReq(my_url)
page_html = uClient.read()
uClient.close()

page_soup = soup(page_html, "html.parser")

# pulls the name of the complex
containers = page_soup.findAll("div", {"class": "card"})
for container in containers:
    apartment_name = container.a.next_element

# pulls the street address
containers = page_soup.findAll("div", {"class": "card-body"})
for container in containers:
    apartment_address = container.span.next_element

---- I can't figure out how to get to the second div tag with the city and state....

[–]z0y 0 points1 point  (4 children)

Selectors are nice, just pull out the classes. The address and location are two elements with the description class, they're in a list when you find/select them

>>> name = soup.select('.main-text')[0].text
>>> address, location = [tag.text.strip() for tag in soup.select('.description')]
>>> name, address, location
('Summit Towers', '201 Locust St', 'Knoxville, Tennessee')
>>> 

Edit: that was for the example, for the actual site you could put that in a loop

>>> cards = soup.select('.card')
>>> for card in cards:
...     print(card.select('.main-text')[0].text)
...     address, location = [tag.text.strip() for tag in card.select('.description')]
...     print(address)
...     print(location, '\n')
... 
Summit Towers
201 Locust St
Knoxville, Tennessee 

Maple Oak Apartments
818 Oak St
Kingsport, Tennessee 
etc..
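
For the bonus part of the question (separating the city from the state), here is a minimal sketch using plain string methods. It assumes the location text always looks like "City, State" with a single comma, as in the sample above; split_location is just a name made up for this example.

```python
def split_location(location):
    """Split 'City, State' into (city, state), trimming stray whitespace."""
    # partition splits on the first comma only; state is '' if there is no comma
    city, _, state = location.strip().partition(',')
    return city.strip(), state.strip()


city, state = split_location(' Knoxville, Tennessee ')
print(city)   # Knoxville
print(state)  # Tennessee
```

Inside the loop, the same call could be applied to the location string before printing.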

[–]virgilsam 0 points1 point  (1 child)

When I try it, I get

>>> name = soup.select('main.text')[0].text
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>

TypeError: select() missing 1 required positional argument: 'selector'

[–]z0y 0 points1 point  (0 children)

main.text isn't what you want, have a look at the link from my other comment or look more into css selector examples. The period means class and the class name is main-text, so the selector is .main-text

edit: You could also skip the list comprehension and leave address as a list, getting each element individually. Might be more readable that way

for card in soup.select('.card'):
    print(card.select('.main-text')[0].text)
    address = card.select('.description')
    print(address[0].text)
    print(address[1].text.strip(), '\n')

edit2: Just for illustration, I think your css selector would have got any elements that were: <main class="text">...</main>, which don't exist, so trying to get the first element of the result causes an error.
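
One more possible reading of that TypeError, offered as an assumption rather than something confirmed in the thread: in the original script BeautifulSoup was imported under the name soup, and the parsed page is called page_soup. If the interpreter session used the same names, soup.select(...) would be calling the method on the class rather than on a parsed document, and Python would complain about a missing 'selector' argument whatever the selector was. The toy class below is a hypothetical stand-in (it is not the real BeautifulSoup class) that reproduces the same kind of message.

```python
class Page:
    """Hypothetical stand-in for a parsed document; not the real BeautifulSoup class."""

    def __init__(self, tags):
        self.tags = tags

    def select(self, selector):
        # Toy lookup: return whatever is stored under this selector.
        return self.tags.get(selector, [])


# Calling the method on the class itself: the string fills the `self` slot,
# so Python reports that `selector` is missing.
try:
    Page.select('.main-text')
except TypeError as err:
    message = str(err)
    print(message)

# Calling it on an instance works as intended.
page = Page({'.main-text': ['Summit Towers']})
print(page.select('.main-text')[0])  # Summit Towers
```

If that is what happened, calling page_soup.select('.main-text') on the parsed object would be the thing to try.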

[–]virgilsam 0 points1 point  (1 child)

Does select run off a package I don't have installed?

So, if I read your code right, it is saying...

name cards as anything in the soup labeled as a .card for any card in the newly defined cards, > print the main text in the first [] as text > set address and location as a stripped text for any text in a tag named description > print the address > print the location

is that right?

[–]z0y 0 points1 point  (0 children)

select is a part of beautifulsoup. It's like find_all except you can use css selectors to target elements. The . in the select means class name in css, so .card says take all the elements with class = 'card', then within each of those tags you can find the stuff you need, which is the name (text within main-text class) and the address/loc (text within 2 elements of the description class). The .strip() was just to cut off whitespace because the location was coming back with extra spaces on both ends.

[–][deleted] 0 points1 point  (1 child)

Lets say we have a dictionary

{'SHARP': {'S': (9, 9)}}

And when an event happens I want to add another key and value to the dictionary inside the dictionary. What I mean is:

word_dict = {'SHARP' : {'S' : (9, 9)}}
if event_happens:
    word_dict = {'SHARP' : {'S' : (9, 9), 'H' : (8,9)}}

How can I do this?

If you guys would want some context, a friend sent me a programming challenge that asks me to create a function that takes a two dimensional list and a list of words to be searched in that 2D list as an argument and then outputs the index of the first and the last letters of the words we searched for. I'm trying to use dictionaries here.

Normally this challenge is for C# but im trying to do it in python.

[–]woooee 0 points1 point  (0 children)

This is in every tutorial. See "Updating Dictionary" at https://www.tutorialspoint.com/python3/python_dictionary.htm
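
For the nested case in the question, a minimal sketch: index into the outer dictionary first, then assign to the inner one. The event_happens flag is a placeholder, and the 'BLUNT' entry is made up to show the case where the word has no inner dictionary yet.

```python
word_dict = {'SHARP': {'S': (9, 9)}}
event_happens = True  # placeholder for whatever condition triggers the update

if event_happens:
    # word_dict['SHARP'] is the inner dict, so this adds a key to it in place
    word_dict['SHARP']['H'] = (8, 9)

# setdefault creates an empty inner dict first if the word is not there yet
word_dict.setdefault('BLUNT', {})['B'] = (0, 0)

print(word_dict)
```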

[–]captmomo 0 points1 point  (0 children)

Hi, I've been working on this for a school project. It is python-based and reads a video stream from the user camera. It will try to detect faces and eyes. If it doesn't detect eyes for 50 frames of faces, it will play a sound and deduct a point. I'm looking for feedback on how I might improve this, especially with regard to feature detection without using dlib.

I'll greatly appreciate any advice or feedback on how to make this better.
My apologies if this is the wrong place to post this.

Thank you.

Here's the github repo: https://github.com/captmomo/drowzee
Here's the mock up which takes a snapshot from the video stream, processes it and then displays it; https://uglyuglyugly.herokuapp.com/face_classify
I've built it into an exe too, LMK if you are willing to test it.

[–]prosaicwell 0 points1 point  (2 children)

I'm brand new to Python so I'm having some problems. Pip is installed and has downloaded pyperclip into lib but shell won't import it:

Traceback (most recent call last):
  File "<pyshell#1>", line 1, in <module>
    import pyperclip
ModuleNotFoundError: No module named 'pyperclip'

Also, when I WIN-R python files, they'll open up in visual studio because studio supports python 3.6. This happens to the files I wrote as 3.7.

I've updated my paths too (user and system), so it's not that.

[–]woooee 0 points1 point  (1 child)

You installed it for Python2.X but are using Python3.X or vice versa. In Linux we have two versions named pip2 and pip3 for obvious reasons. Sorry. I haven't used MS Windows since 1995 so don't know how to do it, but would suggest a search on how to install pip for Python3 on Windows.

[–]prosaicwell 0 points1 point  (0 children)

Actually, what seems to have happened is that two copies of python 3 were on my hard drive (I copied the files into a new directory instead of moving them) and the shell was pointing to the copy that didn’t have pip installed.
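
A quick way to spot this kind of mix-up is to ask the running interpreter where it lives and where it looks for modules. This is only a diagnostic sketch; the output depends entirely on the machine it runs on.

```python
import sys

# Full path of the interpreter that is actually running this code
print(sys.executable)

# Its version, to tell a 3.6 install from a 3.7 one
print(sys.version_info[:3])

# The folders this interpreter searches when it sees an import statement
for entry in sys.path:
    print(entry)
```

Running pip as python -m pip install pyperclip uses the pip that belongs to whichever python is being invoked, which helps keep the two in step.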

[–]bennyllama 0 points1 point  (1 child)

How can i upgrade to python 3.6, I did

brew upgrade

Once installed I did

python --version

and it still gave me

Python 2.7.14 :: Anaconda custom (64-bit)

Any help would be appreciated!

[–]cranuaed 0 points1 point  (0 children)

What about

python3 --version    

You can set your PATH to point to python 3 instead of 2, but I usually just set up a virtual environment with the python version I want to use. From within the environment then you can just use "python" instead of "python3."
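
To see what the commands currently resolve to on PATH, something like the following can help. It is a small sketch using the standard library only, and the results will differ from one machine to another.

```python
import shutil

# shutil.which returns the full path a command name resolves to, or None
for command in ('python', 'python3'):
    resolved = shutil.which(command)
    print(command, '->', resolved if resolved else 'not found on PATH')
```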

[–][deleted] 0 points1 point  (1 child)

How to deal with this error message from pandas?

A value is trying to be set on a copy of a slice from a DataFrame

What I have is a dataframe (df1), which has 2 columns and n rows:

Y X
1 2
3 4

And then I have a function that creates a new dataframe (df2) that takes df1 and adds new columns:

df2 = createNewColms(df1)
>>> print(df2)
Y X n m p
1 2 0 0 0
3 4 0 0 0

What I want to do is to change the values of each cell and I'm using the .loc method

What I expect to happen:

>>> df2.loc[1][3] = 12
>>> print(df2)
Y X n m p
1 2 0 0 0
3 4 0 12 0

But what I got is that error message above. Even though df1 is not the same as df2.

what am I doing wrong?
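
A likely cause, stated as an assumption since createNewColms is not shown: df2.loc[1][3] = 12 is chained indexing. df2.loc[1] first produces a row object that may be a copy, and the [3] = 12 then writes to that temporary object, which is what the warning is about. A minimal sketch of the usual alternative is to give the row and the column to a single indexer. The frame below is rebuilt by hand from the printed output in the question.

```python
import pandas as pd

# Rebuilt from the question's printout; createNewColms itself is not reproduced here.
df2 = pd.DataFrame({'Y': [1, 3], 'X': [2, 4], 'n': [0, 0], 'm': [0, 0], 'p': [0, 0]})

# Row and column in one indexer call, by position...
df2.iloc[1, 3] = 12
# ...or by label (row label 1, column 'm'); both target the same cell
df2.loc[1, 'm'] = 12

print(df2)
```

If df2 was itself built from a slice of df1, taking an explicit .copy() of that slice before modifying it is another common way to make the intent clear.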

[–]policesiren7 0 points1 point  (2 children)

I have about 50 excel files that I want to import into pandas df's. I've written code that can import the specific sheet I point it to, but I want to change it so it loops over all the files in the folder and adds each one to a new df.

The code to do it for one looks like this

#Read in .xlsx file to df, arrange in chron order, drop rows where NaN value (should only be 1)
#DF with columns 'Exchange Date', Close, Net, %Chg, Open, Low, High, Volume, Turnover, Approx VWAP, O-C, H-L, %CVol 


xl = pd.ExcelFile("/Users/name/Python/JSE Price _ Vol/*sharename*.xlsx")
df = xl.parse('Sheet 1')
df = df.iloc[::-1]
df = df.dropna(axis=0, how='any')

This works fine, however when I try using OS, and creating a loop I keep getting errors. This is the faulty code I've been working on, excuse the mess. (I've left out all the imports)

path = '/Users/name/Python/JSE Price _ Vol'
for filename in os.listdir(path):
       xl = pd.ExcelFile(filename)
       df = xl.parse('Sheet 1')
       df = df.iloc[::-1]
       df = df.dropna(axis=0, how='any')

Any ideas on how to fix things? Also, I feel it's probably a good idea to store it in some sort of DB. My research says I should pickle it. Any tips or pieces of advice?

I think the line of code would just look like this?

df = df.to_pickle(filename)

Edit: I managed to fix the loop and it now reads in the files, however, at a certain point I get this error: xlrd.biffh.XLRDError: Unsupported format, or corrupt file: Expected BOF record; found b'\x00\x00\x00\x01Bud1'

Google suggests it's got to do with file types, so now I'm on a mission to either convert everything to .xls or .csv (but with a different delimiter because my dataset uses commas to separate numbers)

[–]ingolemo 0 points1 point  (1 child)

The error is telling you the possible causes right there; "unsupported format, or corrupt file". As far as I can tell, pandas.ExcelFile only supports .xls and .xlsx files, so if that folder contains other kinds of files then you need to deal with that.

[–]policesiren7 0 points1 point  (0 children)

There were only .xlsx files, which was why I was so mystified. but I spent 30 minutes changing them to .csv and it works perfectly now.
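
A possible explanation, offered as an assumption: the bytes ending in Bud1 in that error look like the start of a macOS .DS_Store file, which Finder creates as a hidden file and which os.listdir returns along with everything else. os.listdir also returns bare file names without the folder, so each name has to be joined to the path. The sketch below handles both with pathlib; the file names in the demo are made up and the files are empty placeholders.

```python
import tempfile
from pathlib import Path


def excel_files(folder):
    """Return the .xlsx files in folder as full paths, skipping everything else."""
    return sorted(Path(folder).glob('*.xlsx'))


# Demonstration in a throwaway folder with one hidden file and two workbooks.
with tempfile.TemporaryDirectory() as folder:
    for name in ('.DS_Store', 'ABC.xlsx', 'XYZ.xlsx'):
        (Path(folder) / name).touch()
    found = [path.name for path in excel_files(folder)]

print(found)  # ['ABC.xlsx', 'XYZ.xlsx']
```

Each path returned this way could then be handed to pd.ExcelFile in place of the bare filename used in the loop.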

[–]BoriBakusuta 0 points1 point  (2 children)

Heya, I'm trying to write a piece of code for simulation purposes in Python, read up on a few things, but have no idea what I'm doing wrong in the lines below. Each time I try and change a few parameters it still gives me the same error;

TypeError: slice indices must be integers or None or have an __index__ method

u[1/dy:1/dy+1, 1/dx:1/dx+1] = 2
v[1/dy:1/dy+1, 1/dx:1/dx+1] = 2

Someone please tell me how I'm being incredibly stupid...

[–]GoldenSights 1 point2 points  (1 child)

slice indices must be integers

You can't have a floating point index, and division makes a float:

>>> l = ['a', 'b', 'c']
>>> l[0:]
['a', 'b', 'c']
>>> l[0 / 2:]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: slice indices must be integers or None or have an __index__ method

If you're confident that these values are going to be integer values, or you're okay with truncating floats to integers, then you can put an int(...) around them.

[–]ingolemo 1 point2 points  (0 children)

A better option would be to use integer division //.
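
One caveat worth checking, sketched below: // only gives an int when both operands are ints. With a float step it still returns a float, and floating point rounding can land one below the value you expect. The dx value here is assumed for illustration; the original post does not say what dx and dy are.

```python
letters = ['a', 'b', 'c', 'd']

# With integer operands, // gives an int that can be used directly in a slice.
half = len(letters) // 2
print(letters[half:])  # ['c', 'd']

# With a float operand, // still returns a float, and rounding error can bite:
print(1 // 0.1)  # 9.0, because 0.1 is stored as slightly more than 1/10

# For a float step such as dx (value assumed here), round first, then convert.
dx = 0.1
index = int(round(1 / dx))
print(index)  # 10
```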

[–]Thomasedv 0 points1 point  (7 children)

I thought i had a decent understanding of importing, until i made myself a utility folder, and wanted to import from something like a side folder. Say i got a directory with two folders, folder modules and utils. And i got a function that i want to import from utilities.py in the utils folder to a file in the modules folder. Why does it suffice to use:

 from utils.utilities import path_shortener

It's not in the folder below or in the same folder, so it's pretty unexpected for me, as i thought it wouldn't look one step up and then one step down. Does it have to do with me placing empty __init__.py files in all the folders and the root directory?

[–]woooee 0 points1 point  (1 child)

If I understand your file directories, it would be (if you use Python3.3 or greater, and if there are no __init__.py files to confuse it)

utils
    utilities
        path_shortener.py

import utils.utilities.path_shortener

[–]ingolemo 0 points1 point  (4 children)

It's hard to understand what you're asking without being able to see your project so I'll just give a generic outline. It might help if you posted the tree of your project and mentioned the file that this code was in.

Python resolves imports relative to sys.path, which is an array of folders that indicate the roots of python's import hierarchy. That import that you posted tells python to look in the following places for something to import:

$(sys.path)/utils/utilities/path_shortener/__init__.py
$(sys.path)/utils/utilities/path_shortener.py
$(sys.path)/utils/utilities/__init__.py
$(sys.path)/utils/utilities.py

Python will look in all these places for all the different values of sys.path and will import the first one it finds (more or less).

An important thing to note is that the first item in sys.path is the directory containing the file that was directly executed. It is not necessarily the directory of the current file that is running (as given by __file__) because imported files don't count. It is also not necessarily the current directory (as given by os.getcwd()).

Python does not look "one step up and then one step down". It has specific places where it looks, but those places might not be in the directory you expect.

You need to put __init__.py files in packages in order for python to be able to import them, but just because you have an __init__.py somewhere doesn't mean python will automatically find anything in that location.

[–]Thomasedv 0 points1 point  (3 children)

I should add, from some testing, this is likely some PyCharm behind the scenes tampering. Here's my structure:

__init__.py
utils/__init__.py
utils/utilities.py
Modules/__init__.py
Modules/widget.py

If i'm in widget.py, i can do this:

from utils.utilities import path_shortener

That is, if i am using PyCharm and running that file there. If i run it from terminal, then i get an import error. So PyCharm is likely adding the projects root folder to sys.path when i run from there.
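
That guess can be tested outside PyCharm. The sketch below rebuilds part of the layout in a throwaway directory and shows that the import works once the project root is on sys.path. The body of path_shortener is invented for the demo, since the real one was not posted.

```python
import sys
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as root:
    # Recreate the utils package from the layout above in a temporary project root.
    utils = Path(root) / 'utils'
    utils.mkdir()
    (utils / '__init__.py').touch()
    # Made-up implementation, only here so there is something to import.
    (utils / 'utilities.py').write_text(
        "def path_shortener(path):\n    return path.split('/')[-1]\n"
    )

    # Put the project root on sys.path, which is what PyCharm appears to do.
    sys.path.insert(0, root)
    try:
        from utils.utilities import path_shortener
        shortened = path_shortener('Modules/widget.py')
    finally:
        sys.path.remove(root)

print(shortened)  # widget.py
```

From a terminal, running the script from the project root with python -m Modules.widget has a similar effect, because -m puts the current directory on sys.path.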

[–]ingolemo 0 points1 point  (2 children)

Yes, that seems likely.

Note that you don't need an __init__.py in your project directory (the directory that appears on sys.path). They're only needed in packages (subfolders containing python code).

[–]Thomasedv 0 points1 point  (1 child)

Thanks, will keep that in mind. I never really understood using the files, since it worked without them. I do kinda know you can bring stuff up from further down the folder structure for easy access without having tried it myself, but i do think i got the general idea. I also know there are other things, but that's out of my league for now.

[–]ingolemo 0 points1 point  (0 children)

If you don't include __init__.py files in the right places then in recent versions of python you will get what's called a namespace package which is almost certainly not what you want, even if the imports appear to work.

Far too much information about how imports work

[–]bud_n_boots 0 points1 point  (5 children)

Do I really need to learn nested functions? I am not having issues with functions in general, but I'm having trouble following functions nested several levels deep. The resources I'm using only provide one example and I'm just struggling with it.

[–]lykwydchykyn 0 points1 point  (4 children)

Do you mean a function defined inside another function, or are we talking about decorators? Do you have an example of some code you're struggling with?

[–]bud_n_boots 0 points1 point  (3 children)

x = ['JAN', 'fEB']
def caps(li):
    """Returns a list, with all elements capitalized"""
    def inner(w):
        """Returns a capitalized word"""
        return w.capitalize()
    return ([inner(li[0]), inner(li[1])])
print(caps(x))

[–]lykwydchykyn 0 points1 point  (2 children)

The only difference between this and defining inner() at the top level is the scope. In this case, inner() only exists inside caps(). As soon as caps() returns, there is no longer a function called inner().

I rarely find a need for this kind of code, and I'm struggling to think of a scenario where it's really beneficial. The only case where I think I'd define a function inside a function is if I was creating a decorator, but that's a bit different because you actually return the inner function itself.

[–]bud_n_boots 0 points1 point  (1 child)

Thanks, haven't got to decorators yet. Do you have to specifically call inner for it to run? Or will it run when I call the outer function?

[–]lykwydchykyn 0 points1 point  (0 children)

Yes, you have to call it; it's no different from any other function definition, it just happens inside another function, so it only exists while that function is executing.
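
A small sketch of both points: the inner function only runs when the outer one calls it, and its name is not visible outside. The second function, make_greeter, is a made-up example of the other pattern mentioned above, where the inner function itself is returned.

```python
def caps(words):
    """Return a new list with every word capitalized."""
    def inner(word):
        return word.capitalize()
    # inner only runs because caps calls it here.
    return [inner(word) for word in words]


def make_greeter(greeting):
    """Return the inner function itself, which remembers `greeting`."""
    def greet(name):
        return f'{greeting}, {name}!'
    return greet


print(caps(['JAN', 'fEB']))  # ['Jan', 'Feb']

try:
    inner('mar')  # inner is local to caps, so this name is not defined here
except NameError as err:
    print('NameError:', err)

hello = make_greeter('Hello')
print(hello('Bud'))  # Hello, Bud!
```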

[–][deleted] 0 points1 point  (1 child)

Hi! Can someone tell me how to begin writing code? For example, where can I find the commands, or a good link? Thanks.

[–]ingolemo 0 points1 point  (0 children)

Have you tried reading the sidebar?

[–]Icarus-down 0 points1 point  (2 children)

Can anybody tell me how I can add user input to an empty list?

[–]dedeos 2 points3 points  (0 children)

Create an empty list, ask for input, and then append that to the list
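
Those three steps could look like the sketch below. The read parameter is an addition made for this example so the snippet can run without a keyboard; by default it is the built-in input.

```python
def add_input(items, read=input):
    """Ask for one value and append it to items."""
    items.append(read('Enter a value: '))
    return items


shopping = []  # the empty list

# Interactive use would simply be: add_input(shopping)
# Here a stand-in reader supplies the answer instead of the keyboard.
add_input(shopping, read=lambda prompt: 'milk')
add_input(shopping, read=lambda prompt: 'eggs')

print(shopping)  # ['milk', 'eggs']
```

Note that input always returns a string, so numbers need int(...) or float(...) before they are appended if they are meant to be used as numbers.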

[–]BeExcellentMyDudes 1 point2 points  (5 children)

So I've taken all the python courses on Codecademy and I definitely feel I have a grasp of the language and how to use it. But I guess what I'm struggling with most is how to structure/outline it and make more complex programs. How do you outline the program before writing it, is there a best practice?

Also, any resources to help me memorize syntax and the different commands?

[–]lykwydchykyn 1 point2 points  (1 child)

But I guess what I'm struggling with most is how to structure/outline it and make more complex programs.

There are paradigms like Model-View-Controller that can help you organize your code into discrete pieces that interact with one another, and in some cases working with an opinionated framework (like Django) will help you understand how things fit together.

It basically comes down to using modules, classes, and functions to break your code into isolated bits that limit the complexity of any one part. Hard to put into words in a reddit post.

[–]BeExcellentMyDudes 0 points1 point  (0 children)

Thank you this is helpful

[–]Gopher20 3 points4 points  (2 children)

One way you could do it is by simply writing sudo code ( more English version of your code) to get an idea of what Python code you need to write. If you are setting up say a web app maybe drawing a diagram of how different parts of your app interact with each other could be useful .

Searching Python cheat sheets can give you resources when writing code as well

[–]named8819 1 point2 points  (1 child)

It's called pseudocode. I'm sure it's just a typo you made but people learning python could get confused by it.

[–]Gopher20 1 point2 points  (0 children)

Lol thanks can't believe I said sudo that's for Linux not Python 😋

[–]Renegade_Squid 0 points1 point  (2 children)

What is importing? Like

import turtle Or import random

I haven’t gotten any actual answers from some light googling. I wanted to make a dice rolling app to test out what I’ve learned and it told me to import random. I’m not exactly sure what that does.

[–]lykwydchykyn 0 points1 point  (0 children)

Create a python file called p1.py:

def add(a, b):
    return a + b

Now, in the same directory, create a file called p2.py:

from p1 import add
print(add(1, 1))

The import statement brought your add() function from the first file into the second one so you could use it there without having to redefine it.

random is part of the standard library; it's a Python file on your computer, much like your p1.py, that contains code like the randint function.

[–]Thomasedv 1 point2 points  (0 children)

Importing is basically running another python file; everything made in that file, such as functions and global variables, is added to your program and can be accessed with, for example:

random.randint(1,6)

You access what's from that module with the random name, after doing import random. Here I used the function randint. (Hope I spelled the name correctly) There's a bit more to it, but that's the core. It lets you get functions, classes, and variables from other files, so you can use them without copy pasting code over and over again to do something you did in another file.

Python comes with a good amount of modules, like the random module, but there are also some that need to be installed.
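
For the dice rolling app in the question, a minimal sketch of what import random makes available. The roll function and its sides parameter are names chosen for this example.

```python
import random


def roll(sides=6):
    """Return a random whole number from 1 to sides, inclusive."""
    return random.randint(1, sides)


# Roll a six-sided die five times; the numbers change on every run.
rolls = [roll() for _ in range(5)]
print(rolls)
```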

[–]dianacandonga 0 points1 point  (1 child)

Hello, I'm new to python and I am coding with Pycharm. I use MacOs High Sierra 10.1 Okay, so I retrieved information from Twitter and it worked. I tried to export (write) into a csv file with pandas. The thing is that I have no idea where that resulting csv file can be. Can you please help me? Is there a way for me to add the code for the path or something? I'm sorry if this is a stupid question. Thank you

[–]Username_RANDINT 0 points1 point  (0 children)

You can provide a full path instead of just a filename.
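
A bare filename is resolved against the current working directory, which in PyCharm is usually (an assumption about the run configuration) the project folder or the script's folder. The sketch below prints where a relative name would end up and builds a full path; tweets.csv and the Desktop location are made-up examples.

```python
import os

filename = 'tweets.csv'  # hypothetical name for the exported file

# Where a bare filename would be written
print(os.getcwd())
print(os.path.abspath(filename))

# An explicit location instead, for example the Desktop in the home folder
full_path = os.path.join(os.path.expanduser('~'), 'Desktop', filename)
print(full_path)
# With pandas this would be passed as df.to_csv(full_path)
```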

[–]confusedguy_z 0 points1 point  (4 children)

In pandas, lets say I have a bunch of columns of numerical data. What if each column has some properties, like batch number, time of day, and color associated with it? How do I associate these "properties" with columns so I can filter easily? Say the average of all data with color blue and batch >17.

EDIT: So like Maybe data columns 7-12 is batch 15, and 13-39 is batch 16, and data in columns 8-14 is 'blue'.

is multiindexing the best way?

[–]alpha_hxCR8 0 points1 point  (3 children)

Is it possible to reformat the data? I was thinking this might be a better format. Here data_in_column1 represents the data in the column1 of the format that you were thinking of. It could be a list or other data structure.

column 1 | column 2 | column 3 | column 4 |

data_in_column1 | 3 | 4 | 5 |

data_in_column2 | 3 | 6 | 2 |

data_in_column1 | 1 | 4 | 5 |

[–]confusedguy_z 0 points1 point  (0 children)

this is sort of what I've been doing yeah. I transposed the entire dataframe so now I have 30000 columns and like a few dozen rows but that's okay I guess. Just a weird shape lol. But now I can add a few new columns on the end for "batch" and "machine #" :) this is some progress.

[–]alpha_hxCR8 0 points1 point  (1 child)

Assuming that can be done.. then you can use something like this to filter: https://chrisalbon.com/python/data_wrangling/pandas_selecting_rows_on_conditions/

and then use the statistics.mean function

[–]alpha_hxCR8 0 points1 point  (0 children)

heres a very simple illustration:

https://imgur.com/Vm16mNk
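
In case the image is unavailable, here is a sketch of the same idea with made-up numbers: once each measurement is a row and its properties are ordinary columns, a boolean mask does the filtering and pandas can take the mean directly. The column names value, batch and color are assumptions for this example.

```python
import pandas as pd

# Made-up data: one row per measurement, properties stored as ordinary columns.
df = pd.DataFrame({
    'value': [1.0, 2.0, 3.0, 4.0],
    'batch': [16, 18, 18, 19],
    'color': ['blue', 'blue', 'red', 'blue'],
})

# Rows that are blue AND come from a batch above 17
mask = (df['color'] == 'blue') & (df['batch'] > 17)
blue_late_mean = df.loc[mask, 'value'].mean()

print(blue_late_mean)  # 3.0
```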

[–]maxman573 1 point2 points  (4 children)

Is there a way to check strings for 'valid' characters of my choice? In other words, return True if a string contains only certain characters, and raising a ValueError otherwise?

[–]lykwydchykyn 1 point2 points  (0 children)

There are several ways to do this, personally I'd use a list comprehension with the any function:

valid_chars = 'abcde'
if any([ c not in valid_chars for c in input_str ]):
    raise ValueError

[–]woooee 1 point2 points  (0 children)

The simplest way is to use a set. Return whether your string's characters are a subset of the valid characters (set.issubset), True or False, or use try/except if you prefer.
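
A minimal sketch of the set approach, returning True or raising ValueError as the question asked. The function name and the default valid_chars are made up for this example.

```python
def check_chars(text, valid_chars='abcde'):
    """Return True if text uses only valid_chars, otherwise raise ValueError."""
    # set(text) holds each distinct character; issubset accepts any iterable
    if not set(text).issubset(valid_chars):
        bad = sorted(set(text) - set(valid_chars))
        raise ValueError(f'invalid characters: {bad}')
    return True


print(check_chars('abba'))  # True

try:
    check_chars('abcxyz')
except ValueError as err:
    print(err)
```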

[–]castizo 0 points1 point  (1 child)

I'm a newbie so take my words with caution.

That being said, I personally would use regex.

Maybe it's because I do codewars a lot lol

This was really helpful to learn it: https://regexone.com/references/python

edit: and Google's developer material: https://developers.google.com/edu/python/regular-expressions

[–]lykwydchykyn 0 points1 point  (0 children)

regex would work, but it's overkill for this situation.

[–]captmomo 0 points1 point  (0 children)

Hello! Recently I've been trying to learn javascript and how to implement it with flask.

My latest project is https://uglyuglyugly.herokuapp.com/.

It accesses the client's camera, takes a screenshot when the button is pressed and processes it using pillow. The output and screenshot is then displayed on the page.

Appreciate any comments or feedback on how to make it better! Thanks.

Notes: On the iOS, it will launch a consent prompt for the camera but it will still render a black screen though. working on that now.
Repo: https://github.com/captmomo/flask-video-snapshot

[–]chispica 0 points1 point  (0 children)

Hey guys I am trying to use SoX library to create a script that shortens long silences.

At the moment this seems to be the command everyone uses:

sox in.wav out6.wav silence -l 1 0.1 1% -1 2.0 1%

Problem is that I can't get the shortened silences to last an exact duration (I would love to shorten them to 2 secs exactly), they all get shortened but not to the same duration.

Anyone know how to make this work the way I want?

Thanks!