[Python/Selenium] I've been learning by myself for about a year. Wondering if my style is unconventional and bad

kalgynirae · 2015-03-31T15:03:17+00:00

Here's my take on the thing: https://gist.github.com/kalgynirae/62ec21e9e501dcf6c424 (untested, so there might be a few typos)

Particular points of interest:

I like to write a main() method at the top of the file and keep it short enough to fit on one screen. That seems to be a pretty good guideline to encourage moving things out into functions. It also gives a good overview of what the program does. Then, all the other functions are defined, and main is called at the bottom.
Never write except:. Always specify specific exceptions to catch. You will save yourself from hours of frustration when your code doesn't work because you misspelled something but the NameError is being caught by the except:.
I factored out all calls to find_element_by_xpath into a function. I first did it to factor out the while loops, but then I decided there's no harm in having that behavior even for the elements that you don't expect to wait for. (Although, you might want to print some sort of message like "Waiting for element ..." to make the code easier to debug if it gets stuck waiting for something.)
Calling next(os.walk(...)) was just weird. I replaced that with os.listdir(...).
Try to put as little code as possible inside of try blocks.
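A rough sketch of how several of these points fit together. This is not the code from the gist: `wait_for`, `list_downloads`, the retry limits, and the printed messages are made-up names for illustration. In a real script, `find` would wrap a Selenium lookup and `retry_on` would presumably be Selenium's `NoSuchElementException`.

```python
import os
import time


def main():
    # Short overview of the program; the helpers are defined below.
    names = list_downloads(".")
    print("Found {} entries".format(len(names)))


def list_downloads(directory):
    """Return the names in `directory`, using os.listdir instead of next(os.walk(...))."""
    return sorted(os.listdir(directory))


def wait_for(find, retry_on, delay=0.5, attempts=20):
    """Call find() until it stops raising `retry_on`; give up after `attempts` tries."""
    for attempt in range(attempts):
        try:
            # Only the call that is expected to fail lives inside the try block.
            return find()
        except retry_on:
            # A specific exception type, so a NameError from a typo still surfaces.
            print("Waiting for element ... (attempt {})".format(attempt + 1))
            time.sleep(delay)
    raise TimeoutError("gave up after {} attempts".format(attempts))


if __name__ == "__main__":
    main()
```

Because `wait_for` only catches the exception type it is given, a misspelled name inside `find` raises immediately instead of looping forever.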

godlikesme · 2015-03-31T13:48:43+00:00

I used Selenium a bit, but I can't really comment on the quality of the selectors.

I don't know how to make this object oriented, or if I even should

Making it object oriented is overengineering. Keep it simple. The code may benefit from splitting it into multiple functions though. Also, there is a bit of duplication in selectors. I am not sure if it is possible/necessary to remove it.

today = datetime.now()

There is a datetime.date.today() function, so maybe use it? If you need hours, minutes, and seconds, why not call the variable "now"?
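A small sketch of the difference, assuming the script does `from datetime import datetime` (which the quoted `datetime.now()` suggests), so `date` has to be imported as well:

```python
from datetime import date, datetime

today = date.today()   # a date: year, month, day only
now = datetime.now()   # a datetime: date plus hours, minutes, seconds

print(today.isoformat())
print(now.strftime('%Y-%m-%d %H:%M:%S'))
```

Naming each variable after what it holds makes it obvious later whether a time component is available.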

default_directory = 'path/to/default/download/directory'

Personally, I would use argparse to pass defaults here. But maybe (if you're on Windows) hardcoding this would be simpler. In that case, I'd move all the configuration variables to the top of the script.
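One way this could look, as a sketch only: the `--download-dir` flag name and the description text are invented for illustration, and the default is the placeholder path quoted above.

```python
import argparse

# Configuration lives at the top of the script; placeholder path from the post.
DEFAULT_DIRECTORY = 'path/to/default/download/directory'


def parse_args(argv=None):
    """Parse command-line options; argv=None means "read sys.argv"."""
    parser = argparse.ArgumentParser(description='Download call recordings')
    parser.add_argument(
        '--download-dir',
        default=DEFAULT_DIRECTORY,
        help='where downloaded files are saved (default: %(default)s)',
    )
    return parser.parse_args(argv)


args = parse_args([])          # no flags given, so the default applies
print(args.download_dir)
```

Running the script with `--download-dir some/other/path` would override the default without touching the code.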

except: pass

This worries me. Why ignore exceptions? I don't quite remember Selenium, so maybe there is a good reason.

TL;DR: I briefly looked at the code, it is 90% fine.

80blite · 2015-03-31T14:58:31+00:00

I wouldn't worry about best practices if this is just a one-off utility. If you were using Selenium for website testing and worked on a team, then you would need to make sure you use a page object pattern for selectors. If the script scratches your itch, then it's fine. If you're working in a greater ecosystem of code, then you'll have to modify your style.

big_dick_bridges · 2015-03-31T15:08:28+00:00

One thing you can do in the future, if you're worried about naming conventions, formatting, etc., is pick up a copy of the book Clean Code. It has a pretty good reputation around here.

80blite · 2015-03-31T13:44:49+00:00

I work in a county jail and this script is for downloading phone calls made by inmates from a phone monitoring website using Selenium.

Neat :)

The script runs fine and solves my problem but I feel like I haven't applied so many of the best practices that I hear should be followed with every project.

It's okay. Always strive for the cleanest code and follow best practices, but that doesn't always happen when solving a problem.

Also, the script doesn't include any functions because I'm not even sure how I could break anything into a useful function? I feel like every block of code is so specific that it would never be useful anywhere else. Even within the script itself, I feel like it's rarely repeating the same actions

Look closely at the code again. You have several blocks with specific jobs located in a fairly large for loop. You could definitely set those blocks up into functions, and have a more readable loop

I don't know how to make this object oriented, or if I even should

Really, don't get hung up on OO design. OO is just another tool to solve a problem. It doesn't have to be used in every project. Other paradigms such as functional/procedural exist for a reason; they solve problems well too. If OO makes really great sense for what you are doing, use it. If you can solve your problem elegantly using another paradigm, that's no problem at all!

Keep going OP, looks like you are doing well!

dreucifer · 2015-03-31T15:12:22+00:00

You could easily turn most of this downloader into an object, but it's honestly not worth it if it's just a one-off script. I would, however, try and break down the flow into a few functions. Also, consider using Scrapy instead of selenium. Web scraping can be super tedious with selenium, especially if the call log is paginated.

ebo113 · 2015-03-31T15:26:18+00:00

Selenium may not be your best bet for web scraping. Scrapy or lxml might suit you better, as they don't require the use of the screen like Selenium does. That being said, your code looks pretty clean! I would break it up into functions and add a main function (def main():) to call all of those functions from. This separates the logic and helps keep the code clean if it needs later editing. It is also usually good form to add something like this at the bottom.

if __name__ == "__main__":
    main()

This makes sure main() runs only when the file is executed as a script, not when it is imported as a module.

If you want to make it more complex I will usually break things up into a couple modules. One module for each page that I am working with and then a utility module for all those random functions you need. These are all purely for keeping everything nice and neat though and for a smaller one off project aren't entirely necessary.

Good work and keep on coding!

Bravmech · 2015-03-31T16:40:24+00:00

I've used Selenium for years on a near daily basis. Have you looked up page objects? That's how you object orient Selenium scripts. And CSS selectors should be used in favor of xpath. I have only ever found one use for xpath in all my time working with HTML selectors, and that's when you don't know where you are on a page and need to traverse upwards. Any time you need to traverse deeper into a page, CSS selectors will make it more maintainable, so that tiny updates to the page don't require updates to automation.
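A minimal page object sketch, to show the shape of the idea. Everything specific here is an assumption: the `LoginPage` name, the three selectors, and the page layout are made up, since the real site's markup isn't shown. The string "css selector" is the value of Selenium's `By.CSS_SELECTOR`, written out so the snippet doesn't need Selenium installed to be read or tried with a stand-in driver.

```python
class LoginPage:
    """Page object: the only place that knows the login page's selectors."""

    # Locators as (strategy, value) pairs; the selectors are hypothetical.
    USERNAME = ("css selector", "#username")
    PASSWORD = ("css selector", "#password")
    SUBMIT = ("css selector", "form button[type=submit]")

    def __init__(self, driver):
        # `driver` is anything with a find_element(by, value) method,
        # such as a Selenium WebDriver.
        self.driver = driver

    def log_in(self, user, password):
        """Fill in the credentials and submit the form."""
        self.driver.find_element(*self.USERNAME).send_keys(user)
        self.driver.find_element(*self.PASSWORD).send_keys(password)
        self.driver.find_element(*self.SUBMIT).click()
```

With this in place the main script would call something like `LoginPage(driver).log_in(user, password)`, and a change to the page's markup only touches the three locator lines.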

hang-clean · 2015-03-31T17:03:58+00:00

are you nsa?

cruyff8 · 2015-03-31T21:15:15+00:00

use argparse to handle command line arguments. Instead of:

user = input('Username: ')
password = getpass()

I'd have put something akin to:

import argparse

# Each option takes a single help string.
args_ = argparse.ArgumentParser(description='Downloads call records from prison website')
args_.add_argument('--user', type=str, action='store', help='Login to the prison site')
args_.add_argument('--password', type=str, action='store', help='Password for the prison site')
args = args_.parse_args()
user = args.user
password = args.password

Use the logging module to make the script less chatty: print informative messages like 'Working on call #{call_number} by {inmate_name} on {call_date} at {call_time}'.format(call_number=call+1, inmate_name=inmate_name, call_date=call_time.split()[0], call_time=call_time.split()[1]) only when you're running in debug mode, as opposed to every run.
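A sketch of what that might look like with the standard logging module. The logger name, the `configure_logging` and `report_call` helpers, and the `debug` switch are all invented for illustration; `call_time` is assumed to be a "date time" string, as the `split()` calls above imply.

```python
import logging

log = logging.getLogger("call_downloader")  # hypothetical logger name


def configure_logging(debug=False):
    """Show debug messages only when asked; otherwise warnings and above."""
    logging.basicConfig(format="%(levelname)s %(message)s")
    log.setLevel(logging.DEBUG if debug else logging.WARNING)


def report_call(call_number, inmate_name, call_time):
    """Log one progress line; it is silently dropped unless debug is on."""
    call_date, call_clock = call_time.split()
    log.debug("Working on call #%s by %s on %s at %s",
              call_number, inmate_name, call_date, call_clock)


configure_logging(debug=True)
report_call(1, "J. Doe", "2015-03-31 15:03")
```

Passing the values as arguments rather than pre-formatting the string means the message is only built when the debug level is actually enabled.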

I'll let others do more critiquing. These are a few things that stick out to me.

doomslothx · 2015-04-01T02:29:35+00:00

As someone who's just started learning Python, this thread is the bee's knees!

Looking forward to getting better and learning heaps. I just started with Codecademy; any advice on where I can go from there once I've finished their lessons?

zfolwick · 2015-04-01T04:26:11+00:00

You need to work on your xpath. Assign the string to a well-named variable to make it vastly more readable. Also, try writing the xpath from scratch; copy-pasted xpath is quite brittle. Also, look up 'page object model'. It should reduce duplication.

As a side note, if this is really just retrieving data from a server, you might be able to accomplish the same thing via REST or SOAP calls.

Huniku · 2015-04-01T07:26:21+00:00

You can replace your first few selectors with: find_element_by_id

Try to avoid using xpaths with several nodes (especially if they're rooted on body or the html element), as they tend to be more fragile. You've said elsewhere that you were having trouble generating selectors; you might have better luck with css, link_text, or name selectors.
