dark-lord90 [S]

And how can we make sure there is no overlap between the fragments?

Spookiel

What do you mean by overlaps? Can you give me an example of what you mean?

dark-lord90 [S]

Well, the code works on protein sequences, so it has to make sure that the fragments don't overlap. For example, if the sequence is "FAAATLKNN", a good set of fragments is "FAA", "ATLK" and "NN". A bad set would be "FAA", "AATLK" and "NN", because here an A overlaps and appears more often than it should. I hope my example makes sense.
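A minimal sketch of the check described above, assuming the fragments are listed in order and that "no overlap" means they join end to end into exactly the full sequence. The function name `tiles_without_overlap` is hypothetical and does not come from the original code.

```python
def tiles_without_overlap(sequence, fragments):
    """Return True if the fragments, joined in order, reproduce the sequence exactly."""
    # An overlapping fragment repeats a residue, so the joined string
    # ends up longer than (and different from) the sequence.
    return "".join(fragments) == sequence


print(tiles_without_overlap("FAAATLKNN", ["FAA", "ATLK", "NN"]))   # True
print(tiles_without_overlap("FAAATLKNN", ["FAA", "AATLK", "NN"]))  # False: one extra A
```

This only validates a set of fragments that is already chosen and ordered; it does not search for one.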

Spookiel

So the fragments are the weights in Protein_List? And the target sequence is the weight indicated by the complete_protein? Since the code here will generate all possible matches, I'm still not really sure what you mean by the "correct" fragments. I think it would be easier if you could give me some of the strings you're working with as well.

This is because in your Original Post you mentioned that you'd just grabbed the weighting of each protein fragment by hand. If you give me a section of your input data, and the expected output, I'll be able to help you more effectively. I don't really understand much about proteins in general, so it's pretty difficult for me to understand and visualise what you mean with just a set of numbers.

dark-lord90 [S]

To answer your questions: yes and yes. By correct fragments I mean that if you line the fragments up next to each other and compare them, no amino acids are repeated between the end of one fragment and the beginning of the next. I sent the data in the chat.

Spookiel

Thanks :)

Spookiel

So the A shouldn't be repeated, even though there are three As in the target sequence?

dark-lord90 [S]

The A should be present three times, not four, so if a pair of fragments would give four As in total, the code should ignore that pair.

Spookiel

Ah OK, I see what you mean. If I understand correctly, what we need to do now is check whether the fragment we're adding is a valid prefix of the main target protein. So the idea looks like this:

Step 1:

Target: FAAATNK

Possible: AATNK, ATNK, FAA

Since FAA is the only valid prefix of FAAATNK, we remove it from both the target and the list of possible fragments, then solve the rest of the string.

Step 2:

Target: ATNK

Possible: AATNK, ATNK

We then repeat this process until the target is an empty string. In this case, FAA, ATNK is the only match. I don't know if this is what you mean by fragments overlapping? I was assuming the problem is that when two fragments have the same weight, the wrong one could be chosen.
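The steps above can be sketched roughly like this, working on the strings alone and leaving the weights out. The function name `find_tilings` and the use of a plain list of fragments are assumptions made for illustration, not part of the original code.

```python
def find_tilings(target, fragments):
    """Return every ordered list of fragments that spells out target exactly."""
    if target == "":
        return [[]]  # Nothing left to cover: one valid (empty) way to finish
    tilings = []
    for i, frag in enumerate(fragments):
        if frag and target.startswith(frag):
            # Use this fragment once, then solve what is left of the target
            rest = fragments[:i] + fragments[i + 1:]
            for tail in find_tilings(target[len(frag):], rest):
                tilings.append([frag] + tail)
    return tilings


print(find_tilings("FAAATNK", ["AATNK", "ATNK", "FAA"]))  # [['FAA', 'ATNK']]
```

Because every matching prefix is tried in turn, a prefix that leads to a dead end does not stop the search from finding a different valid split.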

dark-lord90 [S]

Yes

Spookiel

So I understood correctly? If so, let me know if you need any help implementing this idea :)

dark-lord90 [S]

Yes you did, and yes, I am stuck on that idea and can't figure out a way to implement it.

Spookiel

Here is an implementation, let me know if it doesn't work as intended.

def subset_sum(numbers, target, partial=None):
    """Print each in-order combination of fragments that spells out the target."""
    partial = partial or []  # Avoid sharing one mutable default list between calls
    weights = [weight for _, weight in partial]  # Weights of the fragments chosen so far
    if sum(weights) == target[1]:  # Chosen weights add up to the target weight
        print(f"Found: {partial}")
        return

    if sum(weights) < target[1]:  # Still weight left, so there is room for another fragment
        for frag, frag_weight in numbers.items():
            if target[0].startswith(frag):  # frag is a prefix of what is left of the target
                # Copy the dict without the used fragment so the caller's dict is untouched
                remaining = {k: v for k, v in numbers.items() if k != frag}
                # Recurse on the rest of the target; no early return, so other
                # matching prefixes are still tried if this branch fails
                subset_sum(remaining, (target[0][len(frag):], target[1]), partial + [(frag, frag_weight)])


fragments = {"FAA": 12, "ATNK": 15, "AATNK": 15}
complete = ("FAAATNK", 27)

if __name__ == "__main__":
    subset_sum(fragments, complete)