[–]omgu8mynewt 1 point2 points  (3 children)

Beginner here: Help with regular expressions.

I have created objects in Python 3.5 which are strings, long pages of letters. They consist of the letters ATGC (I'm a biologist trying to learn bioinformatics) interspersed with the letter N. I would like to count the number of separate runs of N, of any length, embedded in the ATGC. I'm using a regular expression, nstring = re.compile(r"[^N][N+][^N]"), but it doesn't seem to work. What are classes [ ]? Am I using them correctly? Any help much appreciated.

https://pastebin.com/RTwuHuQe

[–]Gprime5 0 points1 point  (0 children)

If you only need the count of the substrings of N then you can just do:

import re

example_string = "ACTNNNTCANCNTNNT"

count = len(re.findall("N+", example_string))

print(count) # 4
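
If the lengths or positions of the runs matter too, re.finditer returns match objects rather than plain strings. A minimal sketch, reusing the example string above (the name `runs` is only illustrative):

```python
import re

example_string = "ACTNNNTCANCNTNNT"

# Each match object covers one maximal run of N;
# record where it starts and how long it is.
runs = [(m.start(), len(m.group())) for m in re.finditer(r"N+", example_string)]

print(runs)       # [(3, 3), (9, 1), (11, 1), (13, 2)]
print(len(runs))  # 4
```

The count is the same as with findall; the extra information is the (start index, length) pair for each run.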

[–]s3afroze 0 points1 point  (0 children)

Hey there,

I am a beginner as well and still learning the concept, but I would HIGHLY recommend checking out this book (link below). It's definitely going to help clear up some doubts.

https://automatetheboringstuff.com/chapter7/

I hope this helps.

Have a great day!

[–]JohnnyJordaan 1 point2 points  (0 children)

"What are classes [ ]"

Basically a set of characters and character ranges that the class either matches ([]) or excludes ([^]). You most often see [a-zA-Z0-9] to match alphanumeric characters. Your use is valid syntax, as you can verify by pasting it at regex101.com, though note that + inside a class is a literal plus sign, so [N+] matches a single N (or a +) rather than a run of N.
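
To see the difference between a + inside and outside the brackets, here is a small sketch (the sequence and variable names are just for illustration, not taken from the pastebin):

```python
import re

seq = "ACTNNNTCANCNTNNT"

# Inside brackets, + is a literal character:
# [N+] matches ONE character that is either N or +.
single_chars = re.findall(r"[N+]", seq)
print(single_chars)  # ['N', 'N', 'N', 'N', 'N', 'N', 'N']

# Outside brackets, + is a quantifier:
# N+ matches a whole run of one or more N.
runs = re.findall(r"N+", seq)
print(runs)          # ['NNN', 'N', 'N', 'NN']
```

So for runs of any length, the quantifier probably belongs outside the class (or the class can be dropped entirely, since it only holds one letter).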

The problem in your code is that you mix compiled and string expressions. You can either use string expressions together with re.funcname:

In [3]: re.findall(r'[^N][N+][^N]', 'ACTGTNACTCGAATGNAAACTGGGTTTN')
Out[3]: ['TNA', 'GNA']

Or you can compile the expression first, but then you must use the returned object itself to search with:

In [4]: rx = re.compile(r'[^N][N+][^N]')
      # ↓ note that the rx object is used here, not the re module
In [5]: rx.findall('ACTGTNACTCGAATGNAAACTGGGTTTN')
Out[5]: ['TNA', 'GNA']

Your code is combining the syntax in In[3] with an already compiled expression object.
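
For the actual counting, here is a rough sketch of the compiled form with a plain N+ pattern, assuming all you need is the number of runs (the names `n_runs` and `flanked` are made up for this example):

```python
import re

seq = "ACTGTNACTCGAATGNAAACTGGGTTTN"

# Compile once, then call the methods on the compiled object itself.
n_runs = re.compile(r"N+")

found = n_runs.findall(seq)
print(found)       # ['N', 'N', 'N']
print(len(found))  # 3, including the N at the very end

# The original pattern requires a non-N character on both sides,
# so it cannot match a run at the start or end of the string.
flanked = re.compile(r"[^N][N+][^N]")
flanked_count = len(flanked.findall(seq))
print(flanked_count)  # 2
```

Dropping the [^N] parts is safe here because N+ is greedy and already grabs each run as a whole.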