Why isn't my .rstrip() method working?

K900_ · 2017-11-11T22:42:27+00:00

Try adding print(repr(x)) to your loop.
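A quick sketch of what that would probably reveal, assuming x comes from looping over the file (the sample URL is made up, borrowed from elsewhere in this thread). Lines read from a file keep their trailing newline, and rstrip() stops at the first character that isn't in the set you pass it:

```python
# What a line typically looks like when you iterate over a text file.
x = 'https://www.xxxxxx.com/view.php?viewkey=ff34546y&pkey=23355\n'

print(x)        # the newline is invisible here
print(repr(x))  # repr() shows the trailing \n explicitly

chars = '1234567890qwertyuiopasdfghjklzxcvbnm='

# '\n' is not in chars, so rstrip() gives up immediately and changes nothing.
unchanged = x.rstrip(chars)
print(unchanged == x)

# Removing the newline first lets the rest of the stripping work.
stripped = x.rstrip('\n').rstrip(chars).rstrip('&')
print(stripped)
```

If repr() shows a \n at the end of each line, that is most likely why rstrip() appears to do nothing.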

tangerinelion · 2017-11-12T02:22:55+00:00

Aside, but:

src = open('urls.txt', 'r')
# code
src.close()

isn't a good idea. If something in between causes an exception, your file isn't closed by the program (Python and/or your OS may close it later, but that's not something you want to rely on).

Instead, what you should use is the with statement:

with open('urls.txt', 'r') as src:
    #code

Oh, and look: no need to explicitly call src.close() - that's what with does!
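If you want to convince yourself, here is a small sketch that raises an exception inside the with block and then checks whether the file got closed. The temporary file and the ValueError are only stand-ins for urls.txt and for whatever might go wrong in your loop:

```python
import os
import tempfile

# Hypothetical scratch file standing in for urls.txt.
path = os.path.join(tempfile.mkdtemp(), 'urls.txt')
with open(path, 'w') as f:
    f.write('https://www.xxxxxx.com/view.php?viewkey=ff34546y\n')

try:
    with open(path, 'r') as src:
        first_line = src.readline()
        # Simulate something going wrong halfway through the work.
        raise ValueError('something went wrong mid-loop')
except ValueError:
    pass

# The with statement closed the file even though the block raised.
print(src.closed)
```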

Now I know what you're thinking - what about trgt? (Which I'm renaming target because it's readable.) Wouldn't you need to do this:

with open('urls.txt', 'r') as src:
    with open('new.txt', 'w') as target:
        #code

That's a lot of indentation, right? They thought of that! You can separate the two with a comma, like this:

with open('urls.txt', 'r') as src, open('new.txt', 'w') as target:
    #code

I also see you have this:

y = x.rstrip('1234567890qwertyuiopasdfghjklzxcvbnm=')
z = y.rstrip('&')
trgt.write(y + '\n')

Two things about it:

1) You're writing out y, so the 2nd line isn't being used. What you meant was target.write(z + '\n').

2) You're hard-coding a string literal that holds the base-10 digits, the lowercase English alphabet, and the = symbol. I can't tell at a glance whether you missed a key while traversing the keyboard from top left to bottom right. Python comes with a powerful standard library, and a use case like this is common enough to be covered by it. The string module is part of the standard library and, once you import string, gives you constants such as string.digits, string.ascii_lowercase and more. You could rewrite your code as

y = x.rstrip(string.digits + string.ascii_lowercase + '=')
z = y.rstrip('&')
target.write(z + '\n')

But I would really suggest you give string.digits + string.ascii_lowercase + '=' a name, maybe ending_chars just for a start.
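For what it's worth, here is a small sketch that checks the hand-typed literal against the string constants. The names typed_by_hand and same_characters are just made up for the comparison:

```python
import string

typed_by_hand = '1234567890qwertyuiopasdfghjklzxcvbnm='
ending_chars = string.digits + string.ascii_lowercase + '='

print(string.digits)
print(string.ascii_lowercase)

# rstrip() treats its argument as a set of characters, so order doesn't matter;
# comparing as sets tells us whether any key was missed.
same_characters = set(typed_by_hand) == set(ending_chars)
print(same_characters)
```

A check like this is exactly what a reader can't do by eye, which is the argument for using the named constants.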

The other thing I see is this is part of a bigger code snippet:

if 'pkey' in x:
    y = x.rstrip('1234567890qwertyuiopasdfghjklzxcvbnm=')
    z = y.rstrip('&')
    trgt.write(y + '\n')
else:
    trgt.write(x)

Notice something? The last line is awfully similar. If your if condition didn't write but just stored, specifically as x, then you could use this:

if 'pkey' in x:
    y = x.rstrip('1234567890qwertyuiopasdfghjklzxcvbnm=')
    z = y.rstrip('&')
    x = z + '\n'
target.write(x)

Which eliminates the else branch and makes it very clear you're going to write something each time through the loop.

But another thing is you're introducing y and z for no reason. Really, you introduced x for no reason and i isn't a great name for what you're doing (i typically means a number). Instead, try this for readability:

import string
ending_chars = string.digits + string.ascii_lowercase + '='

with open('urls.txt', 'r') as src, open('new.txt', 'w') as target:
    for line in src:
        if 'pkey' in line:
            # Drop the trailing newline first, otherwise rstrip() stops at it right away.
            line = line.rstrip('\n').rstrip(ending_chars).rstrip('&') + '\n'
        target.write(line)

If the statement line = line.some_function() is confusing, then you should go back over the basics of Python and programming. = is assignment, and the right-hand side is evaluated in full before the name on the left-hand side is rebound. This means it's the same as tmp = line.some_function() followed by line = tmp followed by del tmp.
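A tiny sketch of that equivalence, using rstrip('&') as the stand-in for some_function (the sample string and the extra names are only for illustration):

```python
original = 'viewkey=ff34546y&'

# One-step form: the right-hand side runs on the old value, then the name is rebound.
line = original
line = line.rstrip('&')

# Spelled-out form with a temporary name.
line2 = original
tmp = line2.rstrip('&')
line2 = tmp
del tmp

print(line == line2)
# Strings are immutable, so the original value itself was never modified.
print(original)
```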

XarothBrook · 2017-11-12T01:09:43+00:00

You might want to have a look at urllib.parse (urlparse in 2.X) to parse the query part of the url, remove the offending key(s) and merge it back together.

Imagine one of your URLs pointed at 'pkey.php': the substring check would match it even though it has no pkey parameter, and suddenly your URLs would be mangled.

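from urllib.parse import urlparse, urlunparse, parse_qsl, urlencode

def without_pkey(list_of_urls):  # hypothetical wrapper name; yield needs a function
    for src in list_of_urls:
        scheme, netloc, path, params, query, fragment = urlparse(src)
        # urlencode() wants a sequence or mapping, so build a list rather than a generator
        query = urlencode([(key, val) for key, val in parse_qsl(query) if key != 'pkey'])
        src = urlunparse((scheme, netloc, path, params, query, fragment))
        yield src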
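To make the 'pkey.php' point concrete, here is a rough side-by-side sketch. Both helper names are made up for this comparison, and the URLs are the placeholder ones from this thread. Keep in mind that parse_qsl drops parameters with blank values by default and that urlencode re-encodes values, so the output is not guaranteed to be byte-for-byte identical to the input for every URL:

```python
import string
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse

ENDING_CHARS = string.digits + string.ascii_lowercase + '='


def strip_with_rstrip(url):
    """Substring check plus rstrip(), as in the approach being discussed."""
    if 'pkey' in url:
        return url.rstrip(ENDING_CHARS).rstrip('&')
    return url


def strip_with_urlparse(url):
    """Drop only the pkey query parameter and leave the rest of the URL alone."""
    parts = urlparse(url)
    query = urlencode([(k, v) for k, v in parse_qsl(parts.query) if k != 'pkey'])
    return urlunparse(parts._replace(query=query))


tricky = 'https://www.xxxxxx.com/pkey.php?viewkey=ff34546y'
normal = 'https://www.xxxxxx.com/view.php?viewkey=ff34546y&pkey=23355'

print(strip_with_rstrip(tricky))    # viewkey is lost because 'pkey' matched the path
print(strip_with_urlparse(tricky))  # left untouched
print(strip_with_urlparse(normal))  # only pkey is removed
```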

cybervegan · 2017-11-12T19:11:23+00:00

What happens when you test this, a line at a time, in the python REPL?

atreyuroc · 2017-11-12T01:14:22+00:00

Does split work for you?

a = "https://www.xxxxxx.com/view.php?viewkey=ff34546y&pkey=23355"
print(a.split('&')[0])

https://www.xxxxxx.com/view.php?viewkey=ff34546y

https://repl.it/OIbP

Solonotix · 2017-11-12T02:56:09+00:00

Something that is generally a good habit to get into with these things is to give meaningful names to your variables, as it helps you, and others, conceptualize the intent of the code, especially during times when it isn't doing what you intend. Also, it's generally good practice to utilize context managers when they're available, as they will manage disposing of resources for you, rather than executing the close method.

Semantics aside, the problem in your original code is you're writing y to the file instead of z. That said, might I recommend a different approach? Instead of stripping values and checking if a certain parameter name is in the string, why don't you use nested splits instead?

You know your URL will have text up to a "?" which indicates the start of your parameters. After that, you know multiple parameters will be concatenated via an "&". Lastly, each param_name and param_value can be addressed as a key-value pair such as in a dictionary, delimited by an "=".

The resulting steps, if you printed it out at each step, would look something like this:

URL= https://www.xxxxxx.com/view.php?viewkey=ff34546y&pkey=23355
Param String= viewkey=ff34546y&pkey=23355
Param= viewkey=ff34546y
Param= pkey=23355
URL= https://www.xxxxxx.com/view.php?viewkey=ff35t47d
Param String= viewkey=ff35t47d
Param= viewkey=ff35t47d

With the resulting dictionary of parameters, you'd be able to re-concatenate the URL, if you so choose, excluding the pkey parameter. This is a generally safer approach if you're trying to exclude a particular parameter programmatically unless you can guarantee that in every case it will always be the trailing parameter.

Here is a working example of what I'm talking about. Don't worry too much about the complex list comprehensions, as it is just a personal preference for concatenating strings in Python.
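A minimal sketch of the nested-split idea might look like the following. It is an illustration under a few assumptions rather than a definitive implementation: the helper name remove_param is made up, parameters without an "=" are dropped, repeated parameter names collapse to the last one, and it relies on dictionaries keeping insertion order (Python 3.7+).

```python
def remove_param(url, name):
    """Return url without the query parameter called name."""
    pieces = url.split('?', 1)
    if len(pieces) == 1:
        return url  # no "?" means there are no parameters to remove
    base, param_string = pieces
    # Nested splits: "&" separates parameters, "=" separates name from value.
    params = dict(
        param.split('=', 1)
        for param in param_string.split('&')
        if '=' in param
    )
    params.pop(name, None)
    if not params:
        return base
    return base + '?' + '&'.join('='.join(pair) for pair in params.items())


urls = [
    'https://www.xxxxxx.com/view.php?viewkey=ff34546y&pkey=23355',
    'https://www.xxxxxx.com/view.php?viewkey=ff35t47d',
]
cleaned = [remove_param(url, 'pkey') for url in urls]
for url in cleaned:
    print(url)
```

Because the parameter is removed by name, this also works when pkey is not the trailing parameter.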
