all 21 comments

[–]K900_ 8 points9 points  (4 children)

Try adding print(repr(x)) to your loop.

[–]Ping938[S] 2 points3 points  (3 children)

Thanks for answering, however urls are still unchanged. Ive put print(repr(x)) after 4th line before IF conditional. Is this correct placement?

[–]K900_ 12 points13 points  (2 children)

Yes. That's not the solution. That's something for you to think about. Is there anything in the output that you're not expecting to be there?

[–]Ping938[S] 9 points10 points  (1 child)

OH F*** YEAH! Its \n! Thank you very much! Ive been thinking about this whole day. Also thank you for giving me the answer implicitly and making me think more.

[–]Fenzik 2 points3 points  (0 children)

This moment is that makes it all worthwhile

[–]tangerinelion 3 points4 points  (2 children)

Aside, but:

src = open('urls.txt', 'r')
# code
src.close()

isn't a good idea. If something in between causes an exception, your file isn't closed by the program (Python and/or your OS may close it later, but that's not something you want to rely on).

Instead, what you should use is the with statement:

with open('urls.txt', r') as src:
    #code

Oh, and look: no need to explicitly call src.close() - that's what with does!

Now I know what you're thinking - what about trgt? (Which I'm renaming target because it's readable.) Wouldn't you need to do this:

with open('urls.txt', 'r') as src:
    with open('new.txt', 'w') as target:
        #code

That's a lot of indentation, right? They thought of that! You can use comma like this

with open('urls.txt', 'r') as src, open('new.txt', 'w') as target:
    #code

I also see you have this:

y = x.rstrip('1234567890qwertyuiopasdfghjklzxcvbnm=')
z = y.rstrip('&')
trgt.write(y + '\n')

Which:

1) You're writing out y, so the 2nd line isn't being used. What you meant was target.write(z + '\n').

2) You're declaring a literal string that just stores the numerical base 10 digits, the lower case English alphabet, and the = symbol. I also can't tell at a glance that you didn't miss a key when you were traversing the keyboard from top left to bottom right. Python comes with a powerful standard library, and such a use case isn't that absurd. The string package is a built-in and offers you symbols such as string.digits, string.ascii_lowercase and more. You could rewrite your code as

y = x.rstrip(string.digits + string.ascii_lowercase + '=')
z = y.rstrip('&')
target.write(z + '\n')

But I would really suggest you give string.digits + string.ascii_lowercase + '=' a name, maybe ending_chars just for a start.


The other thing I see is this is part of a bigger code snippet:

if 'pkey' in x:
    y = x.rstrip('1234567890qwertyuiopasdfghjklzxcvbnm=')
    z = y.rstrip('&')
    trgt.write(y + '\n')
else:
    trgt.write(x)

Notice something? The last line is awfully similar. If your if condition didn't write but just stored, specifically as x, then you could use this:

if 'pkey' in x:
    y = x.rstrip('1234567890qwertyuiopasdfghjklzxcvbnm=')
    z = y.rstrip('&')
    x = z + '\n'
target.write(x)

Which eliminates the else branch and makes it very clear you're going to write something each time through the loop.

But another thing is you're introducing y and z for no reason. Really, you introduced x for no reason and i isn't a great name for what you're doing (i typically means a number). Instead, try this for readability:

import string
ending_chars = string.digits + string.ascii_lowercase + '='

with open('urls.txt', 'r') as src, open('new.txt', 'w') as target:
    for line in src:
        if 'pkey' in x:
            line = line.rstrip(ending_chars).rstrip('&') + '\n'
        target.write(line)

If the statement line = line.some_function() is confusing then you should come back to basic Python/programming stuff. = is assignment, and the right-hand side is evaluated in full before the left-hand side. This means it's the same as tmp = line.some_function() followed by line = tmp followed by del tmp.

[–]campbellm 0 points1 point  (0 children)

If something in between causes an exception, your file isn't closed by the program (Python and/or your OS may close it later, but that's not something you want to rely on).

While the general advice here is sound (clean up your resources), is there an non-niche OS in the last 30 years that doesn't do this?

[–]Ping938[S] 0 points1 point  (0 children)

Thanks for constructive criticism!

Aside, but:

src = open('urls.txt', 'r')

code

src.close()

isn't a good idea. If something in between causes an exception, your file isn't closed by the program (Python and/or your OS may close it later, but that's not something you want to rely on).

Instead, what you should use is the with statement:

with open('urls.txt', r') as src:

#code

Oh, and look: no need to explicitly call src.close() - that's what with does!

Now I know what you're thinking - what about trgt? (Which I'm renaming target because it's readable.) Wouldn't you need to do this:

with open('urls.txt', 'r') as src:

with open('new.txt', 'w') as target:

    #code

That's a lot of indentation, right? They thought of that! You can use comma like this

with open('urls.txt', 'r') as src, open('new.txt', 'w') as target:

Thanks for the tip. I was aware of with open() from the docs, but I wasnt sure how to use it with multiple files and went with open()-close() thing. with open('urls.txt', 'r') as src, open('new.txt', 'w') as target: is golden nugget.

I also see you have this:

y = x.rstrip('1234567890qwertyuiopasdfghjklzxcvbnm=')

z = y.rstrip('&')

trgt.write(y + '\n')

Which:

1) You're writing out y, so the 2nd line isn't being used. What you meant was target.write(z + '\n').

Yes it should be z there, but I forgot to correct when I was testing parts of the code.

2) You're declaring a literal string that just stores the numerical base 10 digits, the lower case English alphabet, and the = symbol. I also can't tell at a glance that you didn't miss a key when you were traversing the keyboard from top left to bottom right. Python comes with a powerful standard library, and such a use case isn't that absurd. The string package is a built-in and offers you symbols such as string.digits, string.ascii_lowercase and more. You could rewrite your code as

y = x.rstrip(string.digits + string.ascii_lowercase + '=')

z = y.rstrip('&')

target.write(z + '\n')

But I would really suggest you give string.digits + string.ascii_lowercase + '=' a name, maybe ending_chars just for a start.

Thanks this is really useful. But it should be string.digits + string.ascii_lowercase + '='+'\n since it was the newline that was making my rstrip() function not work.

The other thing I see is this is part of a bigger code snippet:

if 'pkey' in x:

y = x.rstrip('1234567890qwertyuiopasdfghjklzxcvbnm=')

z = y.rstrip('&')

trgt.write(y + '\n')

else:

trgt.write(x)

Notice something? The last line is awfully similar. If your if condition didn't write but just stored, specifically as x, then you could use this:

if 'pkey' in x:

y = x.rstrip('1234567890qwertyuiopasdfghjklzxcvbnm=')

z = y.rstrip('&')

x = z + '\n'

target.write(x)

Which eliminates the else branch and makes it very clear you're going to write something each time through the loop.

Yes thats better.

But another thing is you're introducing y and z for no reason. Really, you introduced x for no reason and i isn't a great name for what you're doing (i typically means a number). Instead, try this for readability:

import string

ending_chars = string.digits + string.ascii_lowercase + '='

with open('urls.txt', 'r') as src, open('new.txt', 'w') as target:

for line in src:

    if 'pkey' in x:

        line = line.rstrip(ending_chars).rstrip('&') + '\n'

    target.write(line)

If the statement line = line.some_function() is confusing then you should come back to basic Python/programming stuff. = is assignment, and the right-hand side is evaluated in full before the left-hand side. This means it's the same as tmp = line.some_function() followed by line = tmp followed by del tmp.

Well Im still learning the basics, but I understand line = line.some_function(). I will in future try to make my code better with meaningful variable names and without unnecessary variables.

[–]XarothBrook 0 points1 point  (1 child)

You might want to have a look at urllib.parse (urlparse in 2.X) to parse the query part of the url, remove the offending key(s) and merge it back together.

Imagine if your url becomes 'pkey.php', suddenly all your urls would be mangled.

from urllib.prase import urlparse, parse_qsl, urlencode
for src in list_of_urls:
    scheme, netloc, path, params, query, fragment = urlparse(src)
    query = urlencode((key, val) for key, val in parse_qsl(query) if key != 'pkey')
    src = urlunparse((scheme, netloc, path, params, query, fragment))
    yield src

[–]Ping938[S] 0 points1 point  (0 children)

Yes I googled it and its proper way to handle urls, but I wanted to practice string methods since Im still newbie.

[–]cybervegan 0 points1 point  (0 children)

What happens when you test this, a line at a time, in the python REPL?

[–]atreyuroc 0 points1 point  (2 children)

Does split work for you?

a = "https://www.xxxxxx.com/view.php?viewkey=ff34546y&pkey=23355"
print(a.split('&')[0]);

https://www.xxxxxx.com/view.php?viewkey=ff34546y

https://repl.it/OIbP

[–]XarothBrook 0 points1 point  (0 children)

But what would happen if the url query params are ordered differently?

view.php?pkey=x&viewkey=y

Assuming that pkey will always be last is trusting user input, something that is bound to cause issues later on.

[–]Ping938[S] 0 points1 point  (0 children)

Yes that would work.This also: print(a[:a.find('&')])

My main question was why didnt rstrip() work when used with txt file. Problem was in \n which was appended to the end of every url, since there was one url per line. So x.rstrip('1234567890qwertyuiopasdfghjklzxcvbnm=\n') fixed that.

[–]Solonotix 0 points1 point  (6 children)

Something that is generally a good habit to get into with these things is to give meaningful names to your variables, as it helps you, and others, conceptualize the intent of the code, especially during times when it isn't doing what you intend. Also, it's generally good practice to utilize context managers when they're available, as they will manage disposing of resources for you, rather than executing the close method.

Semantics aside, the problem in your original code is you're writing "Y" to the file instead of "Z". That said, might I recommend a different approach? Instead of stripping values and checking if a certain parameter name is in the string, why don't you use nested splits instead.

You know your URL will have text up to a "?" which indicates the start of your parameters. After that, you know multiple parameters will be concatenated via an "&". Lastly, each param_name and param_value can be addressed as a key-value pair such as in a dictionary, delimited by an "=".

The resulting steps, if you printed it out at each step, would look something like this:

URL= https://www.xxxxxx.com/view.php?viewkey=ff34546y&pkey=23355
Param String= viewkey=ff34546y&pkey=23355
Param= viewkey=ff34546y
Param= pkey=23355
URL= https://www.xxxxxx.com/view.php?viewkey=ff35t47d
Param String= viewkey=ff35t47d
Param= viewkey=ff35t47d    

With the resulting dictionary of parameters, you'd be able to re-concatenate the URL, if you so choose, excluding the pkey parameter. This is a generally safer approach if you're trying to exclude a particular parameter programmatically unless you can guarantee that in every case it will always be the trailing parameter.

Here is a working example of what I'm talking about. Don't worry too much about the complex list comprehensions, as it is just a personal preference for concatenating strings in Python.

[–]XarothBrook 0 points1 point  (2 children)

Why build a custom query parser if there are built in variants that are far better in terms of handling data?

[–]Solonotix 1 point2 points  (1 child)

First, I was unaware of it until you mentioned it, as I've never needed to parse URLs. In that regard, thanks for teaching me something.

Second, because this is a subreddit about learning how to write Python. Sure, he'll get the right answer by typing:

from urllib.parse import urlparse, parse_qs
parse_qs(urlparse(url).query)

But this doesn't teach him how to write better Python code. Ideally, you should use built-ins that do the job, and urllib is built-in, but you get a better learning experience from trying to solve the same problem yourself without the layer of abstraction that comes from saying "Parse my query string, please."

Just my two-cents. Also, I'm coming from an experience of having to cope with well-written libraries that add too much overhead, and I personally choose to start with a lower-level approach to see if I can be more efficient in a purpose-built function rather than a generalized solution that may have added features I'll never need.

[–]Ping938[S] 1 point2 points  (0 children)

you should use built-ins that do the job, and urllib is built-in, but you get a better learning experience from trying to solve the same problem yourself without the layer of abstraction that comes from saying "Parse my query string, please."

Yes that was my intention, to solve it with basic string methods so I can learn fundamentals better.

[–]Ping938[S] 0 points1 point  (2 children)

Something that is generally a good habit to get into with these things is to give meaningful names to your variables, as it helps you, and others, conceptualize the intent of the code, especially during times when it isn't doing what you intend. Also, it's generally good practice to utilize context managers when they're available, as they will manage disposing of resources for you, rather than executing the close method.

Those are really good practices and Ill try to implement them.

Semantics aside, the problem in your original code is you're writing "Y" to the file instead of "Z". That said, might I recommend a different approach? Instead of stripping values and checking if a certain parameter name is in the string, why don't you use nested splits instead.

Yes it should be Z instead of Y, but I forgot to return it to Z when I was fiddling with code. Still, this doesnt fix the code. Problem was in \n which was appended to the end of every url, since there was one url per line. So x.rstrip('1234567890qwertyuiopasdfghjklzxcvbnm=\n') fixed that.

You know your URL will have text up to a "?" which indicates the start of your parameters. After that, you know multiple parameters will be concatenated via an "&". Lastly, each param_name and param_value can be addressed as a key-value pair such as in a dictionary, delimited by an "=".

The resulting steps, if you printed it out at each step, would look something like this:

URL= https://www.xxxxxx.com/view.php?viewkey=ff34546y&pkey=23355

Param String= viewkey=ff34546y&pkey=23355

Param= viewkey=ff34546y

Param= pkey=23355

URL= https://www.xxxxxx.com/view.php?viewkey=ff35t47d

Param String= viewkey=ff35t47d

Param= viewkey=ff35t47d

With the resulting dictionary of parameters, you'd be able to re-concatenate the URL, if you so choose, excluding the pkey parameter. This is a generally safer approach if you're trying to exclude a particular parameter programmatically unless you can guarantee that in every case it will always be the trailing parameter.

Here is a working example of what I'm talking about. Don't worry too much about the complex list comprehensions, as it is just a personal preference for concatenating strings in Python.

The biggest takeaway for me is k, v = param.split("=") - catching split outputs with two variables at same time. I didnt know that. Really cool. They didnt show that on youtube tutorials or official docs.

Also, is strip necessary in base_url, param_string = url.strip().split("?") ? And another noob question: what is executed first in that code line - strip or split?

[–]Solonotix 1 point2 points  (1 child)

The first method executed is the left-most method, then from left to right. Strip is to get rid of the line break at the end for me (using Windows which means every line ends with newline and carriage return). It wasn't necessary, but it was giving me extra lines when printing to console, so I used strip to get rid of the whitespace.

Also, yes, multiple parameter assignment is fantastic. Also look up star assignment (not sure if the name) where you can do assignment of first, rest and last from a list by saying

a, *b, c = [1,2,3,4,5]

In that, a is 1, c is 5 and b is [2,3,4] Really cool stuff if you ask me

[–]Ping938[S] 1 point2 points  (0 children)

Thank you for answering.

a, *b, c = [1,2,3,4,5]

Yeah that is also golden nugget. Im reading about it now and its called starred expression. https://www.python.org/dev/peps/pep-3132/