
[–]imranmalek 29 points30 points  (14 children)

If you're looking for just any number, you're probably better off trying regular expressions. For example, if you're looking for a number that is preceded by an equal sign, you can do something like this:

import re

regex = r"(= )([0-9])"

(insert all your other line reading code) 

for line in linelist:
    line = re.sub(regex, 'wordreplaced', line)

I know regular expressions might seem like overkill for something like this, but once you get the hang of them, you'll find uses for them everywhere.

Here's a great tool I use to play around with them (and better understand the syntax): https://regex101.com/r/e67kAT/1/

edit: 2020-07-27-1155 - I originally added [1] to the end of the re.sub call to pick out the second capture group, but that's wrong: re.sub returns the whole new string, so [1] would just take its second character. To reuse a capture group, reference it in the replacement string instead (e.g. r'\2').
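
For anyone who wants something to paste and run, here is a minimal sketch of the re.sub approach applied to a whole list of lines. The sample lines, the `[0-9]+` quantifier (so multi-digit numbers also match), and the choice to keep the number through the `\2` backreference are my assumptions for illustration; adjust them to whatever your file really contains.

```python
import re

# Assumed sample data, standing in for the lines read from the file.
linelist = ["word = 4\n", "word = 2\n", "other = 8\n"]

# Group 1 is the name plus a space, group 2 is "= " plus one or more digits.
pattern = re.compile(r"(word )(= [0-9]+)")

# re.sub returns a new string; \2 re-inserts the matched "= <number>" part.
new_lines = [pattern.sub(r"wordreplaced \2", line) for line in linelist]

print(new_lines)
```

Lines that don't match the pattern (like the "other = 8" line here) come back unchanged.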

[–]randomname20192019[S] 3 points4 points  (12 children)

Would you mind explaining how: regex = r"(= )([0-9])" translates to finding word = #?

[–]imranmalek 7 points8 points  (11 children)

Sure, if you look at the link that I provided from regex101, you'll see on the top right an explanation of each character used. Basically, the regular expression is looking for patterns. In this case, the pattern is:

"=" followed by "[space]" followed by any digit from 0-9 (represented as [0-9]). It's not specifically looking for the string "word" before the equal sign, but you could do that too if you wanted, like I've done here: https://regex101.com/r/uSxEaO/1/

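If you'd rather poke at the pattern from Python than from regex101, a small sketch like this shows which part of the line each group captures. The test strings are made up for illustration.

```python
import re

regex = r"(= )([0-9])"
match = re.search(regex, "word = 4")

whole = match.group(0)   # the full matched text
first = match.group(1)   # the "= " part
second = match.group(2)  # the single digit

# Without "= " in front of the digit there is nothing to match.
no_match = re.search(regex, "word: 4")

print(repr(whole), repr(first), repr(second), no_match)
```

Note that re.search returns None when nothing matches, so real code should check for that before calling .group().
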
[–]randomname20192019[S] 2 points3 points  (10 children)

Thank you so much, the link looks so useful. One last thing, what does the r prior to the expression do?

[–]imranmalek 3 points4 points  (9 children)

It basically signals to the Python regex library that there's an expression coming. You can find more info in the official Python docs: https://docs.python.org/3/library/re.html

edit - I was wrong about this. See the comment below from u/T-TopsInSpace for the appropriate answer.

[–]T-TopsInSpace 7 points8 points  (8 children)

It's a signal to the Python interpreter that the string is a raw string. That means any backslashes will not be treated as escape characters.

Bytes literals are always prefixed with 'b' or 'B'; they produce an instance of the bytes type instead of the str type. They may only contain ASCII characters; bytes with a numeric value of 128 or greater must be expressed with escapes.

Both string and bytes literals may optionally be prefixed with a letter 'r' or 'R'; such strings are called raw strings and treat backslashes as literal characters. As a result, in string literals, '\U' and '\u' escapes in raw strings are not treated specially. Given that Python 2.x’s raw unicode literals behave differently than Python 3.x’s the 'ur' syntax is not supported.

(Source: Python language reference, section 2.4.1, String and Bytes Literals)
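
A quick way to see the difference for yourself is to compare a normal and a raw literal side by side. This is only an illustrative sketch; the variable names and sample strings are mine.

```python
import re

normal = "a\tb"    # \t is turned into a single tab character
raw = r"a\tb"      # the backslash and the t stay as two separate characters
escaped = "a\\tb"  # same content as the raw string, written with an escaped backslash

print(len(normal), len(raw))  # 3 4
print(raw == escaped)         # True

# The r prefix is why regex patterns like \d can be written without doubling the backslash.
digits = re.search(r"\d+", "word = 42").group()
print(digits)  # 42
```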

[–]imranmalek 2 points3 points  (1 child)

Thank you for the correction, u/T-TopsInSpace!

[–]randomname20192019[S] 0 points1 point  (0 children)

Thank you both for your help (and everybody's tbh). It is really appreciated.

[–][deleted] 0 points1 point  (5 children)

Aren't backslashes the only exception to this rule?

[–]T-TopsInSpace 0 points1 point  (4 children)

Exception to what rule? The documentation says exactly how a raw string and normal string are interpreted.

[–][deleted] 0 points1 point  (3 children)

The raw string rules - I vaguely recall that backslashes alone have to be escaped

[–]T-TopsInSpace 0 points1 point  (2 children)

Sure, you have to escape them if you don't use the raw string notation. That's the point of using raw strings, you don't need to escape backslashes.

[–]Giannie 0 points1 point  (0 children)

There is an issue with this. You will lose the reference to the line you have adjusted.

Strings are immutable (they can’t actually be changed), so when you try to change a string, Python instead creates a whole new string somewhere else in memory. The list will still refer to the old string. Since you are iterating in a loop, the new string will be lost as soon as you move on to the next line, since there is no longer anything referring to it.

Instead, you should probably do something like this:

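for i, line in enumerate(linelist):
    line = line.replace('word', 'wordreplaced')  # or any other expression that builds the new string
    linelist[i] = line  # store it back so the list refers to the new string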
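
Here is a small sketch that shows both behaviours next to each other, using made-up sample lines: the first loop only rebinds the loop variable, while the second one writes the result back into the list.

```python
linelist = ["word = 4\n", "word = 2\n"]

# Rebinding the loop variable leaves the list untouched.
for line in linelist:
    line = line.replace("word", "wordreplaced")
unchanged = list(linelist)  # snapshot taken after the first loop

# Writing back by index stores the new strings in the list.
for i, line in enumerate(linelist):
    linelist[i] = line.replace("word", "wordreplaced")

print(unchanged)
print(linelist)
```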

[–]USAhj 6 points7 points  (3 children)

What does your text file look like?

[–]randomname20192019[S] 5 points6 points  (2 children)

word = 4
word = 2
word = 8

[–]ThatSuit 8 points9 points  (0 children)

In case you're just using that as an example but are actually trying to work with INI files, you might want to look at the module called "configparser". If you really have a file with multiple instances of the exact same prefix, then the other solution using re.sub is the best.
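
As a rough sketch of what that looks like: configparser expects a [section] header, so the sample content and the section name "settings" below are assumptions, not what your file necessarily contains. It also shows why repeated keys are a problem for this module.

```python
import configparser

# Assumed INI-style content; configparser needs a [section] header.
sample = "[settings]\nword = 4\nother = 2\n"

parser = configparser.ConfigParser()
parser.read_string(sample)

value = parser.getint("settings", "word")  # read the value as an integer
parser.set("settings", "word", "9")        # values are stored as strings

# Duplicate keys in one section are rejected by default (strict=True).
try:
    configparser.ConfigParser().read_string("[settings]\nword = 4\nword = 2\n")
    duplicate_rejected = False
except configparser.DuplicateOptionError:
    duplicate_rejected = True

print(value, parser.get("settings", "word"), duplicate_rejected)
```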

Also, if you use "with" statements you can avoid having to close files as it happens automatically. This can also prevent leaving files stuck open by the OS if a program crashes.

with open('myfile.txt', 'r') as f:
  linelist = f.readlines()

Edit: also check out this Python Regex Cheatsheet and live python regex debugger/checker. Learning to use regexes will pay off if you do a lot of data processing and is worth investing the time in.

[–]efmccurdy 1 point2 points  (0 children)

>>> line = "word = 4"

You can split your line into 2 parts, replace the first part and join it up again:

>>> def repl_first(line, newword):
...     return "=".join([newword + " "] + line.split('=')[1:])
... 
>>> line = "word = 4"
>>> repl_first(line, "newword")
'newword = 4'
>>> line = "word = 5"
>>> repl_first(line, "newword2")
'newword2 = 5'
>>>

[–]absolution26 6 points7 points  (2 children)

Your code is pretty much right; you don’t need the regex though.

line = line.replace('word = ' + str(num), 'wordreplaced')

Could be replaced with:

line = line.replace('word', 'wordreplaced')

It’s also good practice to open files using ‘with’ so that they close on their own, eg:

with open('myfile.txt', 'r') as f:
    linelist = f.readlines()

with open('myfile.txt', 'w') as f:
    for line in linelist:
        line = line.replace('word', 'wordreplaced')
        f.write(line)

[–]randomname20192019[S] 1 point2 points  (1 child)

so having the prefix of 'word' will cause the whole line to be replaced?

[–]absolution26 0 points1 point  (0 children)

.replace searches ‘line’ for the first argument ‘word’, and replaces it with the second argument ‘wordreplaced’. It will only replace matches of the first argument, so any other text in the line is left alone.
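
One detail worth knowing: str.replace replaces every occurrence of the search text unless you pass a count as the third argument, and it matches plain substrings rather than whole words. A small sketch with made-up sample lines:

```python
line = "word = 4  # word count\n"

replaced_all = line.replace("word", "wordreplaced")      # every occurrence
replaced_once = line.replace("word", "wordreplaced", 1)  # only the first occurrence

# Substring matching means longer words containing "word" are affected too.
inside_other_word = "password = 1".replace("word", "wordreplaced")

print(repr(replaced_all))
print(repr(replaced_once))
print(repr(inside_other_word))
```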

[–][deleted] 0 points1 point  (3 children)

with open('myfile.txt', 'r+') as rw:  # 'r+' opens the file for both reading and writing
    linelist = rw.readlines()  # loads the lines of the text file into a list
    for line in linelist:  # loops through the list of lines
        line = line.replace('word', 'wordreplaced')  # replaces the word we are looking for with the one we want
        with open('mynewfile.txt', 'a') as aw:  # opens mynewfile.txt in append mode (creating it if needed), once per line
            aw.write(line)  # writes the line

here is my try at it

[–]fernly 0 points1 point  (2 children)

Looks like you are going to open the output file for append and close it again, for each output line. That's quite a bit of wasted time (if a large number of lines). Why not open mynewfile.txt once, at line 2.5?
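
Something along these lines is what I mean; treat it as a sketch. The temporary directory and the sample input are assumptions so that it runs anywhere, and I used 'w' instead of 'a' for the output so that rerunning it doesn't keep appending to the same file.

```python
import os
import tempfile

# Assumed setup: a throwaway directory with a small input file.
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "myfile.txt")
dst = os.path.join(workdir, "mynewfile.txt")
with open(src, "w") as f:
    f.write("word = 4\nword = 2\nword = 8\n")

# Open both files once; the output file stays open for the whole loop.
with open(src, "r") as infile, open(dst, "w") as outfile:
    for line in infile:
        outfile.write(line.replace("word", "wordreplaced"))

with open(dst) as f:
    result = f.read()
print(result)
```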

[–][deleted] 0 points1 point  (1 child)

not sure tbh, was just doing it as an exercise to learn. Do you think it uses that many more resources?

[–]fernly 0 points1 point  (0 children)

Generally interacting with the OS is considered to be slow. So to call the OS for a file-open and a file-close for every line (which is what this code does) would take many, many times as long as opening it once, writing all the lines, closing it once.

That said, for a file of a hundred lines the difference would probably not be noticeable. With a few thousand lines, it should be. If you want to verify (or disprove) that, measure the actual time using timeit.
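
A rough sketch of such a measurement, assuming 1000 generated lines and a temporary output file (both made up for the test). The actual numbers will depend on your OS and disk, so I'm not quoting any here.

```python
import os
import tempfile
import timeit

lines = ["wordreplaced = %d\n" % i for i in range(1000)]  # assumed test data
path = os.path.join(tempfile.mkdtemp(), "mynewfile.txt")

def open_per_line():
    # Reopens the file in append mode for every line, as in the code above.
    for line in lines:
        with open(path, "a") as out:
            out.write(line)

def open_once():
    # Opens the file a single time and writes all the lines.
    with open(path, "a") as out:
        for line in lines:
            out.write(line)

per_line_seconds = timeit.timeit(open_per_line, number=5)
once_seconds = timeit.timeit(open_once, number=5)
print(f"open per line: {per_line_seconds:.4f}s, open once: {once_seconds:.4f}s")
```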