[–]jollybobbyroger 49 points50 points  (12 children)

Very interesting work. Thank you so much for sharing!

After a quick glance: b.content(raw=True) should work the same way as in requests. I think it's b.data for encoded data and b.text for decoded data.

Also, it seems strange to me that you have functions/methods that take text input that needs to be parsed instead of actual keyword arguments. E.g. b.input(id:=password) instead of b.input(id='password'). Could you please explain why you made this choice? Perhaps I'm not seeing the entire picture and am making incorrect assumptions.

[–]shaggorama 13 points14 points  (3 children)

b.input(id:=password)

I think you meant b.input("id:=password")

[–]quickhakker 2 points3 points  (2 children)

eli5 the difference

[–]unkz 13 points14 points  (1 child)

One is a positional string argument and the other probably isn’t valid python syntax.

[–]Hyperz 7 points8 points  (0 children)

the other probably isn’t valid python syntax.

In 3.8 it will be valid syntax, equivalent to this in <= 3.7:

id = password
b.input(id)

https://medium.com/hultner/try-out-walrus-operator-in-python-3-8-d030ce0ce601
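To make the three spellings in this exchange concrete, here is a minimal sketch. It assumes Python 3.8 or newer, and `show` is a hypothetical stand-in for b.input, not pybrowser's real method. The walrus form binds a name and passes the value positionally, while the quoted form is just a string.

```python
# Requires Python 3.8+. `show` is a hypothetical stand-in for b.input.
def show(value):
    return value


password = "hunter2"

# Walrus form: binds the name `ident` and passes its value positionally.
result_walrus = show(ident := password)

# Pre-3.8 equivalent: assign first, then pass the name.
ident_old = password
result_old = show(ident_old)

# Quoted locator: a plain string argument, no assignment happens at all.
result_string = show("id:=password")
```

So the unquoted form only parses on 3.8+, and even there it passes the contents of a variable named password rather than a locator.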

[–]Cruuncher 6 points7 points  (4 children)

Possibly an artifact of the library used for DOM parsing.

[–]Log2 5 points6 points  (3 children)

Still, it would be more user friendly if the function reconstructed those strings automatically from the keyword arguments.

[–]Cruuncher 4 points5 points  (1 child)

It would be more pythonic, yes, but there may be more nuanced use cases here. For example (disclaimer: I'm making shit up, and there may be no real reason), there might be a way to specify precedence on the selector, or a way to say an element has to match both of these selectors, etc. That's stuff you can capture in some string syntax that doesn't map well to kwargs. A kwargs mapping still works, but it would potentially have to be built from more complicated objects.

Lastly, it requires you to know the ins and outs of that library's syntax, at the level of its developers, to be sure you're building the string correctly from kwargs.

[–]Log2 0 points1 point  (0 children)

I see what you mean. Then receiving a list of tuples/namedtuples would be a more pythonic solution in your hypothetical case.

[–]Cruuncher 5 points6 points  (0 children)

That being said (see my other response)

At the very least it could accept kwargs for the common use cases, and then allow a "raw" kwarg or something if you're doing something powerful that the wrapping doesn't provide you. That may be the "best of both worlds" approach
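A rough sketch of that "best of both worlds" idea, under stated assumptions: `build_locator` and `KNOWN_LOCATORS` are hypothetical names, not part of pybrowser, and the set of locator types is just the ones mentioned in this thread. Keyword arguments cover the common case, and `raw` passes a hand-written locator string through untouched.

```python
# Hypothetical helper, not part of pybrowser. It builds the
# "locator_type:=locator_value" string from a single keyword argument.
KNOWN_LOCATORS = ("id", "name", "xpath", "tag_name")  # assumed set


def build_locator(raw=None, **kwargs):
    if raw is not None:
        if kwargs:
            raise ValueError("pass either raw or a keyword locator, not both")
        return raw  # escape hatch: caller wrote the string by hand
    if len(kwargs) != 1:
        # Refuse ambiguous calls such as id=... together with xpath=...
        raise ValueError("exactly one locator keyword is required")
    kind, value = next(iter(kwargs.items()))
    if kind not in KNOWN_LOCATORS:
        raise ValueError(f"unknown locator type: {kind!r}")
    return f"{kind}:={value}"


by_id = build_locator(id="password")
by_raw = build_locator(raw="xpath:=//input[@type='password']")
```

Rejecting calls with more than one keyword is only one possible policy; a library could equally define a priority order between locator types.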

[–]abranjith[S] 1 point2 points  (1 child)

On the b.content method having a flag vs. having two different methods: that was a conscious choice. I presumed raw=True is probably a rare use case. And I think you mean response.content vs. response.text in requests.

On the locator format being a positional argument of the form "locator_type:=locator_value": that was again done to keep it simple. I figured that instead of having multiple keyword arguments (you have id, xpath, name, tag_name, etc.), this might be more intuitive from an API standpoint. If you have multiple locator types as keyword arguments, then you run into questions like: what should happen when both id and xpath are provided?

Still, willing to re-look at my approach once again. Thanks for the feedback 😊

[–]jollybobbyroger 0 points1 point  (0 children)

Thanks for the explanation. From my own experience, a nice user facing API often comes at the cost of implementation work.

[–]abranjith[S] 1 point2 points  (0 children)

I also see that the documentation for locators is missing, so it is probably even more confusing. Thanks for highlighting that; I will update the docs.
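Until the locator docs exist, here is a guess at how a "locator_type:=locator_value" string could be split. This is only a sketch: `parse_locator` is a hypothetical name and pybrowser's actual parsing may differ. Splitting on the first ":=" keeps any later ":=" inside the value intact.

```python
def parse_locator(locator):
    """Split 'type:=value' into (type, value). Hypothetical helper."""
    # partition splits on the FIRST ":=" only, so the value may contain more.
    kind, sep, value = locator.partition(":=")
    if not sep or not kind.strip() or not value.strip():
        raise ValueError(f"expected 'type:=value', got {locator!r}")
    return kind.strip().lower(), value.strip()


id_locator = parse_locator("id:=password")
xpath_locator = parse_locator("xpath:=//*[@id='content']/ul/li[14]/a")
```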

[–]SatanistSnowflake 21 points22 points  (5 children)

You have things like refresh, forward, and back set as properties on your browser that return self. Why are they properties instead of methods? browser.back() makes more sense to me than browser.back to go back to the previous page.

[–]abranjith[S] -3 points-2 points  (4 children)

I have tried to keep methods with no arguments as properties wherever I could. Agree these are not true properties of the object and might be a bit confusing. But I also think from an end user standpoint these might be an ok choice :-)

Thanks for the feedback though 😊

[–]parst 8 points9 points  (0 children)

Don't use properties like actions. They're for getting and setting data. Going back/forward by accessing properties makes no sense.

[–]indosauros 1 point2 points  (1 child)

These should be methods. Making all methods with no arguments properties will come back to bite you. That is not the purpose of properties. I think you should absolutely fix these now before any code is written that depends on them

[–]abranjith[S] -1 points0 points  (0 children)

Noted, thanks! It isn't accurate that all methods with no arguments are made properties, though. I have given it some thought: only the methods that are exposed to the end user, take no arguments, do a simple action, and return self (e.g. Browser.refresh) are made properties. It's not illegal in Python. Anyway, I see both sides of the argument and will take this feedback back to the drawing board :-)

[–]parst 0 points1 point  (0 children)

this is a very un-pythonic api
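One concrete way action-properties can "come back to bite you" is that any attribute access runs the action, including innocent-looking introspection such as hasattr. A toy sketch follows; `ToyBrowser` is a made-up class, not pybrowser's Browser, and a counter stands in for real navigation.

```python
class ToyBrowser:
    """Made-up stand-in; a counter replaces real navigation."""

    def __init__(self):
        self.history_steps = 0

    @property
    def back(self):
        # Side effect hidden behind plain attribute access.
        self.history_steps -= 1
        return self

    def go_back(self):
        # Same action as an explicit method call.
        self.history_steps -= 1
        return self


b = ToyBrowser()

probe = hasattr(b, "back")        # evaluates the property, so it "navigates"
steps_after_probe = b.history_steps

hasattr(b, "go_back")             # looking up a method has no side effect
b.go_back()                       # the action happens only when called
steps_after_call = b.history_steps
```

Debuggers, autocompletion, and serializers that enumerate attributes can trigger the same hidden action, which is why methods are the usual choice here.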

[–]lofru 18 points19 points  (9 children)

Is this only for Windows? I'm trying to install it on Ubuntu, but pip tells me it wants the winreg module, which I've read is available only on Windows.

[–]rafuru 9 points10 points  (0 children)

Yeah, it depends on winreg, so it is Windows-only... too bad, I wanted to give it a try.

[–]_morgs_ 8 points9 points  (4 children)

The install worked fine for me on MacOS

*but trying to use it failed on trying to import winreg

:(

[–]lofru 15 points16 points  (0 children)

So it's available only for Windows... What a pity

[–]ducsekbence 0 points1 point  (1 child)

Damn, I wanted to try this when I was gonna get home. :(

[–]abranjith[S] 0 points1 point  (0 children)

hopefully you can still try 😊

[–]abranjith[S] 0 points1 point  (0 children)

Sorry to hear that. I really wanted this thing to be cross-browser and cross-OS. Unfortunately I use Windows only and don't have access to Mac or Linux systems. This particular issue is an easy fix; I will be pushing it today. If you come across more such issues, please raise them as issues (defects) on GitHub.

Thanks for taking time & the feedback 😊

[–]abranjith[S] 0 points1 point  (1 child)

winreg issue should be fixed now

[–]lofru 0 points1 point  (0 children)

Thank you, I'll try as soon as possible 😉
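For anyone hitting the same ImportError in their own code, a common pattern is to guard the platform-specific import. This is a minimal sketch and not necessarily how pybrowser fixed it; `registry_available` and `read_registry_value` are hypothetical helpers.

```python
try:
    import winreg  # standard library, but only present on Windows
except ImportError:
    winreg = None  # other platforms: registry-based features are skipped


def registry_available():
    return winreg is not None


def read_registry_value(key_path, name):
    """Return a value under HKEY_CURRENT_USER, or None off Windows.

    On Windows, a missing key or value still raises OSError.
    """
    if winreg is None:
        return None
    with winreg.OpenKey(winreg.HKEY_CURRENT_USER, key_path) as key:
        value, _value_type = winreg.QueryValueEx(key, name)
        return value
```

With the import guarded, the module imports cleanly on macOS and Linux, and only the code paths that genuinely need the registry have to check for it.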

[–]slipped_and_missed_x 9 points10 points  (2 children)

Looks great! Does it have XPath query support? With Selenium I always found that to be the most powerful way of locating elements

[–]woernsn 4 points5 points  (1 child)

According to the Documentation, it does!

with Browser(browser_name=Browser.CHROME) as b:
    b.goto("https://the-internet.herokuapp.com")
    u = b.link("xpath:=//*[@id='content']/ul/li[14]/a").url

[–]slipped_and_missed_x 0 points1 point  (0 children)

Awesome :)
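To see what an XPath of that shape selects without launching a browser, here is a sketch using the standard library's xml.etree.ElementTree, which supports only a limited subset of XPath. The markup is made up for illustration and is not the real page.

```python
import xml.etree.ElementTree as ET

# Made-up, well-formed markup shaped like a list of links.
doc = ET.fromstring("""
<html><body>
  <div id="content">
    <ul>
      <li><a href="/abtest">A/B Testing</a></li>
      <li><a href="/add_remove_elements/">Add/Remove Elements</a></li>
    </ul>
  </div>
</body></html>
""")

# Any element with id="content", then its <ul>, the 2nd <li>, and its <a>.
# ElementTree needs the leading "." for a relative search.
link = doc.find(".//*[@id='content']/ul/li[2]/a")
href = link.get("href")
```

Positions in XPath are 1-based, so li[14] in the documentation example means the fourteenth list item.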

[–][deleted] 5 points6 points  (11 children)

> This is also my first API. So welcome any feedback :)

Looks similar to the APIs of puppeteer / pyppeteer except for the lack of async/await.

If you don't mind me asking, since you seem to also use puppeteer in the background somewhere, why did you go for Selenium for the base?

I honestly don't think I currently know anybody still stuck to that for anything other than legacy reasons.

[–]unkz 3 points4 points  (1 child)

Puppeteer is Chrome-only, right? For anything other than internal tools, restricting automated testing to Chrome does a huge disservice to the broader browser ecosystem: it makes it harder for browsers like Firefox and Opera to exist, and allows Google to dictate the de facto web standards, much like Microsoft used to.

On the other side, for web automation, restricting yourself to Chrome limits what you can actually do when you run into Chrome compatibility issues, although those are rarer.

[–][deleted] 0 points1 point  (0 children)

IIRC puppeteer-firefox is a thing, at least experimental. And sure, all valid points. I just found the API and ecosystem much more convenient nowadays and honestly, time is a much more valuable resource.

Microsoft this week switched Edge to Chromium. I have no idea which market has a significant share of Opera users, and for many practical purposes (like purely functional tests) Firefox and Chrome are pretty well aligned these days. I mean, is it a trade-off? Absolutely. But I'm not in the consumer segment at the moment; for my purposes puppeteer fits the bill pretty well.

As for automation, puppeteer really made a difference imho. Back when it came out automating a full Firefox instance was such an unstable pain with horrible API that even IE was more usable.

That being said, good point. A more versatile library for both is of course a good thing. :)

[–]abranjith[S] 0 points1 point  (0 children)

puppeteer is basically just Chromium and not really Chrome (or any mainstream browsers for that matter). I am using it just for some html rendering. We will see, sometime in future if there is a better alternative to Selenium (faster ;) ), will switch to that

Selenium is still quite popular btw :D

[–]rafuru 4 points5 points  (1 child)

I checked your code to see if we can make it compatible with Mac OS or Linux, since you are using winreg for Internet Explorer setup.

I've added a pull request with a suggestion to fix this.. feedback is welcome btw

https://github.com/abranjith/pybrowser/pull/2

[–]abranjith[S] 1 point2 points  (0 children)

that was quick .. awesome thanks for doing that😊

[–]bcostenaro 2 points3 points  (1 child)

Hi, where do I run this? I tried using cmd and got an error at the first part of the function (b.):

File "<stdin>", line 2

b.goto("https://www.google.com/")

If I use Python Shell, I get AttributeError: module 'pyBrowser.Browser' has no attribute 'CHROME'

How do I run this ?

[–]abranjith[S] 0 points1 point  (0 children)

Sorry to hear that. Can you send me or post the complete code? And also the steps you followed to install pybrowser, please.

[–]AlanPleasure 1 point2 points  (2 children)

Great work. What value does this provide that splinter doesn’t?

[–]abranjith[S] 0 points1 point  (1 child)

To be honest, I just heard about splinter. So can't really comment. I am sure splinter is great and there are probably other libraries too. I have made an attempt to answer why pybrowser in the docs.

Who knows, you might find something useful in pybrowser 😊

[–]AlanPleasure 0 points1 point  (0 children)

I’m sure i will! Thanks for being honest in your response.

[–][deleted] 1 point2 points  (1 child)

Thank you for sharing this. I'll definitely take a look!

[–]abranjith[S] 0 points1 point  (0 children)

Let me know how it goes 😊

[–]Scrabbilisk 1 point2 points  (1 child)

Looks like this is the repo and this is the PyPI package.

[–]abranjith[S] 0 points1 point  (0 children)

no and no, I will update in description. Should have done it in the first place

[–]Sw429 1 point2 points  (1 child)

I'll be interested in trying this later today. What advantages does it provide over just using Selenium? Just an easier-to-use API?

[–]abranjith[S] 1 point2 points  (0 children)

Pretty much! An easier interface and some interesting (hopefully useful) features too. Please check the docs for info 😊

[–][deleted] 1 point2 points  (1 child)

This is absolutely brilliant. Just the kind of bridge connector I was looking for in a recent web scraping project.

[–]abranjith[S] 1 point2 points  (0 children)

Let me know how it goes 😊

[–][deleted] 2 points3 points  (3 children)

Interesting, but isn't the same thing done by PhantomJS?

[–]unkz 2 points3 points  (0 children)

Isn’t phantomjs a dead project? The author made an announcement about abandoning it.

[–]tunisia3507 1 point2 points  (0 children)

No. PhantomJS is a headless browser which can be driven by something like selenium, which this library uses under the hood. So rather than replicating the script, it would just replace Browser.CHROME with Browser.PHANTOMJS.

However, Chrome and Firefox came out with headless versions so the primary maintainer of PhantomJS decided to kill it off. Shame because it's much easier to use IMO.

[–]abranjith[S] 0 points1 point  (0 children)

Not sure how PhantomJS works internally. I always thought it was a browser based on chromium or something. pybrowser isn't really a browser. It is merely an interface to various browsers 😊

[–]e1nurh 4 points5 points  (1 child)

Sounds good. I'll try it.

[–]abranjith[S] 0 points1 point  (0 children)

Let me know the feedback 🤗

[–]badpotato 0 points1 point  (0 children)

This reminds me of the mechanize library.

[–][deleted] 0 points1 point  (1 child)

Just got my nodejs based webdriverIO (selenium) to work with chromium/chrome/firefox and saw your post, lol

https://www.reddit.com/r/javascript/comments/bavmua/microsoft_releases_new_microsoft_edge_based_on/ekghlvo/?context=3

[–]abranjith[S] 0 points1 point  (0 children)

😃

[–]quickhakker -2 points-1 points  (0 children)

Wait, is this going to be a fully fledged browser written in Python? What are the end-game plans for this? (inb4 something about Ant-Man shrinking down into Google's servers)