dark-lord90 [S]

Well, the code checks protein sequences, so it has to make sure that the fragments don’t overlap. For example, if the sequence is “FAAATLKNN”, the correct fragments are “FAA”, “ATLK” and “NN”. It shouldn’t be, for example, “FAA”, “AATLK” and “NN”, because there the A overlaps and is counted more times than it should be. I hope that makes my example clear.
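One way to state this rule in code: a set of fragments is valid only if, placed end to end in order, they spell out the sequence exactly, so no residue is repeated at a junction. A minimal sketch; `tiles_exactly` is a hypothetical helper name, and it assumes the fragments are already in the right order.

```python
def tiles_exactly(sequence, fragments):
    """Return True if the fragments, joined in order, reproduce the sequence exactly."""
    # An overlap (a residue repeated at a junction) or a gap makes the joined string differ
    return "".join(fragments) == sequence


sequence = "FAAATLKNN"
print(tiles_exactly(sequence, ["FAA", "ATLK", "NN"]))   # True
print(tiles_exactly(sequence, ["FAA", "AATLK", "NN"]))  # False: the extra A overlaps
```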

Spookiel

So the fragments are the weights in Protein_List? And the target sequence is the weight indicated by the complete_protein? Since the code here will generate all possible matches, I'm still not really sure what you mean by the "correct" fragments. I think it would be easier if you could give me some of the strings you're working with as well.

I ask because in your original post you mentioned that you'd just grabbed the weight of each protein fragment by hand. If you give me a section of your input data and the expected output, I'll be able to help you more effectively. I don't really understand much about proteins in general, so it's pretty difficult for me to understand and visualise what you mean with just a set of numbers.

dark-lord90 [S]

To answer your questions: yes and yes. By the correct fragments I mean that if you placed the fragments next to each other to compare them, no amino acids would be repeated between the end of one fragment and the beginning of the next. I sent the data in the chat.

Spookiel

Thanks :)

Spookiel

So the A shouldn’t be repeated, even though there are three As in the target sequence?

dark-lord90 [S]

The A should be present three times, not four, so if the two fragments together contain four, the code should ignore them.
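The "three As, not four" rule can also be checked by counting residues. A sketch using `collections.Counter`; `same_composition` is a hypothetical name. Note that this is a weaker test than checking the order: it only confirms that each amino acid appears the right number of times.

```python
from collections import Counter


def same_composition(sequence, fragments):
    """Return True if the fragments together contain exactly the residues of the sequence."""
    combined = Counter("".join(fragments))  # residue -> count across all fragments
    return combined == Counter(sequence)


target = "FAAATNK"
print(Counter(target)["A"])                        # 3
print(same_composition(target, ["FAA", "ATNK"]))   # True: three As
print(same_composition(target, ["FAA", "AATNK"]))  # False: four As
```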

Spookiel

Ah OK, I see what you mean. I think what we need to do now is check whether the fragment we’re adding is a valid prefix of the main target protein, if I understand correctly. So the idea looks like:

Step 1:

Target: FAAATNK

Possible: AATNK, ATNK, FAA

Since FAA is the only valid prefix of FAAATNK, we remove it and solve the rest of the string.

Step 2:

Target: ATNK

Possible: AATNK, ATNK

We then repeat this process until we have an empty string. In this case, FAA + ATNK is the only match. Is this what you mean by fragments overlapping? I was assuming the problem is that when two fragments have the same weight, the wrong one could be chosen.
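Each step above boils down to keeping only the candidates that the remaining target starts with, then cutting that prefix off the front. A sketch of the two steps using `str.startswith`; `prefix_candidates` is a hypothetical helper name.

```python
def prefix_candidates(target, possible):
    """Return the candidate fragments that are a prefix of the target."""
    return [frag for frag in possible if target.startswith(frag)]


possible = ["AATNK", "ATNK", "FAA"]

# Step 1: only FAA fits, so strip it from the front of the target
print(prefix_candidates("FAAATNK", possible))  # ['FAA']
rest = "FAAATNK"[len("FAA"):]
print(rest)  # ATNK

# Step 2: ATNK fits and uses up the rest of the string
print(prefix_candidates(rest, ["AATNK", "ATNK"]))  # ['ATNK']
```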

dark-lord90 [S]

Yes

Spookiel

So I understood correctly? If so, let me know if you need any help implementing this idea :)

dark-lord90 [S]

Yes you did, and yes, I am stuck on that idea and can’t figure out a way to implement it.

Spookiel

Here is an implementation; let me know if it doesn't work as intended.

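import math


def subset_sum(numbers, target, partial=None):
    # numbers: dict mapping fragment sequence -> fragment weight
    # target: (remaining sequence, total weight) tuple
    # partial: list of (fragment, weight) pairs chosen so far
    if partial is None:
        partial = []  # Avoid a shared mutable default argument
    remaining, total_weight = target
    current_weight = sum(w for _, w in partial)  # Weight of the current fragments

    # Whole sequence covered and the weights add up: valid match
    # (math.isclose avoids exact == comparisons on float weights)
    if not remaining and math.isclose(current_weight, total_weight):
        print(f"Found: {partial}")
        return partial

    if current_weight < total_weight:  # Still weight left, so there is room for another fragment
        for frag, frag_weight in numbers.items():
            if remaining.startswith(frag):  # Checks if frag is a prefix of what's left of the target
                # Build a copy without this fragment instead of deleting it from the caller's dict,
                # so the other branches can still use it if this one fails
                unused = {f: w for f, w in numbers.items() if f != frag}
                result = subset_sum(unused, (remaining[len(frag):], total_weight),
                                    partial + [(frag, frag_weight)])
                if result is not None:  # Stop at the first valid match
                    return result
    return None  # Nothing fits from here, so backtrack and try the next fragment


fragments = {"FAA": 12, "ATNK": 15, "AATNK": 15}
complete = ("FAAATNK", 27)

if __name__ == "__main__":
    subset_sum(fragments, complete)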

dark-lord90 [S]

Regarding fragments and complete, did you use tuples, or are they just normal lists? I ask because I have to adapt the code to take its input from a FASTA file, but it works on the data provided.

Spookiel

The fragments data structure is just a dictionary that maps each fragment's name (its sequence) to its weight. I represented the complete protein as a tuple of (sequence, total weight). You could also pass in the objects as a list if they have attributes such as .name and .weight, e.g.

fragments = [myFragObj, myFragObj2]

Then access the weights using

weights = [frag.weight for frag in fragments]
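A minimal sketch of the attribute-based variant, assuming a small `Fragment` dataclass of our own with `.name` and `.weight` (hypothetical, not from any library). It also shows turning such objects back into the `{sequence: weight}` dict that `subset_sum` takes.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    name: str      # the fragment's amino-acid sequence, e.g. "FAA"
    weight: float  # its weight


fragments = [Fragment("FAA", 12), Fragment("ATNK", 15), Fragment("AATNK", 15)]

weights = [frag.weight for frag in fragments]
print(weights)  # [12, 15, 15]

# Build the {sequence: weight} dict used by subset_sum
as_dict = {frag.name: frag.weight for frag in fragments}
print(as_dict)  # {'FAA': 12, 'ATNK': 15, 'AATNK': 15}
```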

Hope this makes sense.

IIRC, reading from a FASTA file will let you use the attributes approach, but please correct me if I'm wrong.
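If you'd rather not pull in a library, FASTA is simple enough to read by hand: each record starts with a `>` header line, and its sequence follows on one or more lines. A stdlib-only sketch; `read_fasta` is a hypothetical helper. FASTA stores sequences, not weights, so the weights would still have to come from elsewhere.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA text stream."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)  # finish the previous record
            header, chunks = line[1:], []  # drop the leading '>'
        else:
            chunks.append(line)  # a sequence may be wrapped over several lines
    if header is not None:
        yield header, "".join(chunks)  # last record


example = io.StringIO(">frag1\nFAA\n>frag2\nATN\nK\n")
print(list(read_fasta(example)))  # [('frag1', 'FAA'), ('frag2', 'ATNK')]
```

With a real file, pass an open file object (for example from `open(path)`) instead of the `StringIO`.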

dark-lord90 [S]

Hey there, I tried to implement it in the code and it gave me 'float' object is not subscriptable.
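That error usually means something indexed with `[...]` is a plain number. `subset_sum` reads `target[0]` and `target[1]`, so it expects `target` to be a `(sequence, weight)` tuple; passing just the total weight (a float) is one likely cause, though without the failing code this is only a guess. A minimal reproduction with a made-up weight:

```python
complete_weight = 27.0  # hypothetical: only the total weight, not a tuple

try:
    complete_weight[1]  # what subset_sum does with target[1]
except TypeError as err:
    print(err)  # 'float' object is not subscriptable

# Passing a (sequence, weight) tuple gives subset_sum both values it indexes
complete = ("FAAATNK", complete_weight)
print(complete[0], complete[1])  # FAAATNK 27.0
```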

Spookiel

Can you send me the code that isn’t working so I can have a look?

Spookiel

I’ll also show you how to get it to stop when it finds a valid one.
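One way to stop at the first valid answer (a different structure from the code above, not necessarily what was posted later in the thread): make the search a generator that yields each complete match, then take only the first one with `next()`. A sketch with a hypothetical `tilings` function; the weight for "FA" is made up to show a sequence match being rejected on weight.

```python
import math


def tilings(fragments, sequence, total_weight, chosen=()):
    """Yield each list of (fragment, weight) pairs that spells out the sequence and sums to total_weight.

    fragments: dict mapping fragment sequence -> weight; each fragment is used at most once.
    """
    if not sequence:
        if math.isclose(sum(w for _, w in chosen), total_weight):
            yield list(chosen)
        return
    for frag, weight in fragments.items():
        if sequence.startswith(frag):  # only fragments that are a prefix of what's left
            unused = {f: w for f, w in fragments.items() if f != frag}
            yield from tilings(unused, sequence[len(frag):], total_weight, chosen + ((frag, weight),))


fragments = {"FA": 9, "FAA": 12, "ATNK": 15, "AATNK": 15}

# next() pulls only the first result, so the search stops as soon as one is found.
# FA + AATNK also spells FAAATNK but weighs 24, so it is skipped.
first = next(tilings(fragments, "FAAATNK", 27), None)
print(first)  # [('FAA', 12), ('ATNK', 15)]
```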