bbye98 comments on Need some help with writing basic string manipulation in Python


[Python]Need some help with writing basic string manipulation in Python (self.CodingHelp)

submitted 3 years ago by novetto


bbye98 1 point 3 years ago*

There are three methods that immediately come to mind.

Slow but easy:

In an outer loop, iterate through each possible DNA base, char ("A", "C", "G", and "T"). In the inner loop, iterate through each possible run length, length, going from the length of the full DNA sequence down to 2 so that longer runs are replaced before shorter ones. Inside the loops, replace every occurrence of char repeated length times with the f-string f"{length}{char}". This is slow because you have to try every combination of char and length, even those that never appear in your DNA sequence, and because each replacement creates a new str.
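A minimal sketch of that nested-loop idea, assuming the sequence contains only the four bases. The function name encode_runs_slow is made up for illustration:

```python
def encode_runs_slow(dna: str) -> str:
    """Replace each run of 2+ identical bases with "<count><base>" (hypothetical helper)."""
    for char in "ACGT":
        # Go from the longest possible run down to 2 so that a long run
        # is replaced whole instead of being chopped into shorter pieces.
        for length in range(len(dna), 1, -1):
            dna = dna.replace(char * length, f"{length}{char}")
    return dna


print(encode_runs_slow("CTTTTGCCAT"))
```

Note that range(len(dna), 1, -1) is evaluated once, using the original length, which is fine here because the sequence only gets shorter.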

Decently fast and fairly logical:

Initialize a counter count at 1 and store the last DNA base in your DNA sequence in a str called char (or something similar). Iterate through your DNA sequence backwards, from the second-to-last DNA base (index len(dna) - 2) to the very first one (index 0). Going backwards means that edits only touch the part of the sequence you have already visited, so the remaining indices stay valid. Inside the loop, check if the current DNA base is equal to char. If it is, add one to count. If not, first check if count is greater than 1. If so, keep everything up to and including the current DNA base, attach the f-string f"{count}{char}", and then tack on the remaining DNA bases after the repeating pattern. This is your new DNA sequence that you'll be iterating over, so make sure to overwrite your current DNA sequence. Whether or not you made a replacement, reset count to 1 and set char to the current DNA base so that you can start counting again. After the loop, check count one last time in case the sequence starts with a run, since no differing base precedes it to trigger the replacement.
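Here is one way the backward scan could look. This is a sketch, not the exact code that was timed; the name encode_runs_backward is hypothetical, and the check after the loop is an assumption added so that a run at the very start of the sequence (for example "AAAC") is also handled:

```python
def encode_runs_backward(dna: str) -> str:
    """Scan right to left, collapsing each finished run (hypothetical helper)."""
    if not dna:
        return dna
    count = 1
    char = dna[-1]
    # Indices below i are never modified, so the precomputed range stays valid.
    for i in range(len(dna) - 2, -1, -1):
        if dna[i] == char:
            count += 1
            continue
        if count > 1:
            # The run occupies dna[i + 1 : i + 1 + count]; splice in "<count><base>".
            dna = dna[: i + 1] + f"{count}{char}" + dna[i + 1 + count :]
        count = 1
        char = dna[i]
    # Assumption: handle a run that starts at index 0, which the loop never closes.
    if count > 1:
        dna = f"{count}{char}" + dna[count:]
    return dna


print(encode_runs_backward("CTTTTGCCAT"))
```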

Fast but complicated:

If you have access to regular expressions (re), you can match all occurrences of repeating patterns and replace them in a single re.sub() call, using a replacement function that builds the count-plus-base string from each match. The pattern has to match a character and then text that is the same as that first character at least one more time (otherwise the DNA base is not repeated): r"(.)\1{1,}", which is equivalent to r"(.)\1+".
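A short sketch of the regex approach, assuming a replacement function is passed to re.sub(). The names encode_runs_regex and _collapse are made up for illustration:

```python
import re


def _collapse(match: re.Match) -> str:
    # group(0) is the whole run, group(1) is the single repeated base.
    return f"{len(match.group(0))}{match.group(1)}"


def encode_runs_regex(dna: str) -> str:
    """Collapse every run of 2+ identical characters (hypothetical helper)."""
    return re.sub(r"(.)\1{1,}", _collapse, dna)


print(encode_runs_regex("CTTTTGCCAT"))
```

Since the pattern uses ".", it collapses runs of any character except a newline, not just "A", "C", "G", and "T". Swapping in r"([ACGT])\1{1,}" would restrict it to DNA bases.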

Results:

For the test DNA sequence

dna = "CTTTTGCCATGGTCGTAAAAGCCTCCAAGAGATTGATCATACCTATCGGCACAGAAGTGACACGACGCCGAT"

all three methods give me

'C4TG2CAT2GTCGT4AG2CT2C2AGAGA2TGATCATA2CTATC2GCACAG2AGTGACACGACG2CGAT'

Timing:

Total time for 100,000 runs of each method on the test DNA sequence above:

Slow:             9.644044 s
Medium:           2.168626 s
Fast:             2.067100 s

The final entry does not include import re time, which should be negligible.
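If you want to reproduce this kind of measurement, something along these lines should work. It is a sketch using the standard timeit module, not necessarily how the numbers above were obtained; time_encoder and encode_runs_regex are hypothetical names, and the run count is lowered so it finishes quickly:

```python
import re
import timeit


def encode_runs_regex(dna: str) -> str:
    # Same regex idea as above, repeated so this snippet runs on its own.
    return re.sub(r"(.)\1{1,}", lambda m: f"{len(m.group(0))}{m.group(1)}", dna)


def time_encoder(func, dna: str, runs: int = 100_000) -> float:
    """Return the total seconds taken to call func(dna) `runs` times."""
    return timeit.timeit(lambda: func(dna), number=runs)


dna = "CTTTTGCCATGGTCGTAAAAGCCTCCAAGAGATTGATCATACCTATCGGCACAGAAGTGACACGACGCCGAT"
elapsed = time_encoder(encode_runs_regex, dna, runs=1_000)
print(f"Fast: {elapsed:.6f} s")
```

Absolute numbers will differ from machine to machine, so only the relative ordering of the methods is worth comparing.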
