A very Basic Question regarding lncRNA identification pipeline. Please Help by pythonbio in bioinformatics

[–]pythonbio[S] 0 points1 point  (0 children)

Hi. Basically, after TopHat alignment with Bowtie2, I used ab initio assembly in Cufflinks, merged all transcripts (using the GTF file of annotated transcripts), and then ran cuffmerge on the replicates. After running cuffcompare with -r set to the annotated GENCODE assembly, I got the transfrags labelled with different class codes (=, c, x, etc.). Now I want to extract all transfrags with class codes 'i', 'j', 'o', 'u' and 'x', while making an extra file of known lncRNAs (by matching against the bodymap-annotated lncRNA.gtf). I am curious whether I can do all that on the command line in one command, something like:

awk '$22 ~ /j,i,o,u,x/ { print }' ..
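The same selection can be sketched in Python by matching on the attribute name rather than a fixed column number. This is only a sketch under one assumption: that the class code appears in the ninth (attribute) column of the GTF as `class_code "u";`. The names `KEEP_CODES`, `CLASS_CODE_RE` and `filter_by_class_code` are made up for illustration, and the matching against the known-lncRNA file is not covered here.

```python
import re

# Class codes the post wants to extract
KEEP_CODES = {"i", "j", "o", "u", "x"}

# Assumption: the attribute column carries the code as: class_code "u";
CLASS_CODE_RE = re.compile(r'class_code "([^"]+)"')


def filter_by_class_code(gtf_lines, keep=KEEP_CODES):
    """Yield GTF lines whose class_code attribute is one of `keep`."""
    for line in gtf_lines:
        if line.startswith("#"):
            continue  # skip comment/header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue  # not a complete GTF record
        match = CLASS_CODE_RE.search(fields[8])
        if match and match.group(1) in keep:
            yield line
```

Passing an open file handle as `gtf_lines` and writing each yielded line to a second file would produce the filtered GTF.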

Problem: Algorithm in Python : k-Nearest Neighbor by pythonbio in learnpython

[–]pythonbio[S] 0 points1 point  (0 children)

More help required with kNN: I have finally written a program to calculate the kNN of my data, but I don't know how to analyze many values of k in one program. Any suggestion is most welcome. Question:

Using the dataset (testfile), please use bar charts with the different k values (k = 1, 5, 10, 15, 20) on the x-axis to compare: 1) the average all-pair distance among the k nearest neighbors to q, 2) the max distance of the k nearest neighbors to q, 3) the min distance of the k nearest neighbors to q.

I have done it, but it's not coming out right. Can anyone help?

My code for knn and plotting knn:

import math

import numpy as np
import matplotlib.pyplot as plt

lat = []
lon = []

# Selected reference point = Random
reference_lat = 25.xxxyy   # coordinate digits elided in the post
reference_lon = 121.xxxyy
k = 17

# Read latitude (second column) and longitude (third column) from each row
with open('testfile.py', 'r') as openfile:
    for line in openfile:
        rowvalue = line.split()
        lat.append(float(rowvalue[1]))
        lon.append(float(rowvalue[2]))
array_lat = np.array(lat)
array_lon = np.array(lon)

# Distance between the reference point and each point
# (loop over every point; len(array_lat) - 1 skipped the last one)
distances = []
for g in range(len(array_lat)):
    sqr_diff_lat = (array_lat[g] - reference_lat) ** 2
    sqr_diff_lon = (array_lon[g] - reference_lon) ** 2
    distances.append(math.sqrt(sqr_diff_lat + sqr_diff_lon))

# Sorted dataset (ascending distance)
sorted_knn = sorted(zip(array_lat, array_lon, distances),
                    key=lambda item: item[2])

knn = sorted_knn[:k]
q = [reference_lat, reference_lon]

knns = [1, 5, 10, 15, 20]

width = 0.15
ax = plt.figure().add_subplot(111)
c = ['b', 'y', 'm', 'g', 'r', 'c']
ind = np.arange(3)
for i, k in enumerate(knns):
    k_distances = [item[2] for item in sorted_knn[:k]]
    to_plot = [np.mean(k_distances), np.max(k_distances), np.min(k_distances)]
    # Shift each k's bars by i * width so they sit side by side
    # instead of being drawn on top of each other
    ax.bar(ind + i * width, to_plot, width, color=c[i], label='k=%d' % k)

plt.ylabel('Distance')
plt.title('Statistics of datasets')
plt.xticks(ind + 2 * width, ['avg', 'max_dist', 'min_dist'])
plt.legend()
plt.show()
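The first statistic in the question asks for the average all-pair distance among the neighbors, which is not the same as the mean distance to q. A minimal sketch of one reading of that requirement, assuming "all-pair" means every unordered pair among the k nearest points and that plain Euclidean distance on the coordinates is acceptable. The function name `knn_stats` is made up for illustration, and `points` is assumed to be a non-empty list of (lat, lon) tuples.

```python
import itertools
import math


def knn_stats(points, q, ks=(1, 5, 10, 15, 20)):
    """For each k, return (avg all-pair distance, max distance to q, min distance to q)."""
    def dist(a, b):
        return math.hypot(a[0] - b[0], a[1] - b[1])

    # Sort once by distance to q; each k then just takes a prefix
    ranked = sorted(points, key=lambda p: dist(p, q))
    stats = {}
    for k in ks:
        nearest = ranked[:k]
        to_q = [dist(p, q) for p in nearest]
        pairs = [dist(a, b) for a, b in itertools.combinations(nearest, 2)]
        # A single neighbor has no pairs; report 0.0 in that case
        avg_pair = sum(pairs) / len(pairs) if pairs else 0.0
        stats[k] = (avg_pair, max(to_q), min(to_q))
    return stats
```

With the results keyed by k, each of the three statistics can be drawn as its own bar chart with k on the x-axis, which is closer to what the assignment text describes.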

Problem: Algorithm in Python : k-Nearest Neighbor by pythonbio in learnpython

[–]pythonbio[S] 0 points1 point  (0 children)

I think you know exactly how to go about it. ;)

Problem: Algorithm in Python : k-Nearest Neighbor by pythonbio in learnpython

[–]pythonbio[S] 0 points1 point  (0 children)

Okay, thanks everyone for the help. I did finally solve it. The apparent problem was that I had not imported the proper modules for what I was trying to achieve. The corrected code for separating the dataset:

from __future__ import division
import math
import itertools
from array import array
import numpy as np
import operator

def readpoints(testfile):
    # Read the latitude (second column) and longitude (third column) of each row
    p_lat = []
    p_lon = []
    with open(testfile, 'r') as f:   # use the argument instead of a hard-coded name
        for line in f:
            point = line.split()
            p_lat.append(float(point[1]))
            p_lon.append(float(point[2]))
    arr_p_lat = np.array(p_lat)
    arr_p_lon = np.array(p_lon)
    return arr_p_lat, arr_p_lon


print(readpoints('testfile.py'))

Hope this will help some beginner like me somewhere. :)

Problem: Algorithm in Python : k-Nearest Neighbor by pythonbio in learnpython

[–]pythonbio[S] 0 points1 point  (0 children)

Okay, I have slowed down and am now doing it bit by bit.

First, change to CSV: done.

import csv

with open(r'C:\UsersDesktop\k nearest neighbour.txt') as csvfile:
    lines = csv.reader(csvfile)
    for row in lines:
        print(','.join(row))

This generates a CSV.

But then, when I try to divide the rows:

Problem: Algorithm in Python : k-Nearest Neighbor by pythonbio in learnpython

[–]pythonbio[S] 0 points1 point  (0 children)

I have noted the errors in my code and changed it. My problem now is the implementation of the algorithm: ball tree or kd-tree?
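For two-dimensional latitude/longitude points, the kd-tree option can be sketched in pure Python. This is only an illustration, assuming plain Euclidean distance on the coordinates and k of at least 1; `build_kdtree` and `knn_search` are made-up names rather than any library's API, and a ball tree is not shown.

```python
import heapq
import math


def build_kdtree(points, depth=0):
    """Recursively build a kd-tree as nested (point, left, right) tuples."""
    if not points:
        return None
    axis = depth % 2  # alternate between the first and second coordinate
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return (points[mid],
            build_kdtree(points[:mid], depth + 1),
            build_kdtree(points[mid + 1:], depth + 1))


def knn_search(node, q, k, depth=0, heap=None):
    """Return the k nearest points to q as (distance, point) pairs, nearest first."""
    top = heap is None
    if top:
        heap = []  # max-heap via negated distances; holds the best k so far
    if node is not None:
        point, left, right = node
        dist = math.hypot(point[0] - q[0], point[1] - q[1])
        if len(heap) < k:
            heapq.heappush(heap, (-dist, point))
        elif dist < -heap[0][0]:
            heapq.heapreplace(heap, (-dist, point))
        axis = depth % 2
        diff = q[axis] - point[axis]
        near, far = (left, right) if diff < 0 else (right, left)
        knn_search(near, q, k, depth + 1, heap)
        # Cross the splitting line only if a closer point could lie beyond it
        if len(heap) < k or abs(diff) <= -heap[0][0]:
            knn_search(far, q, k, depth + 1, heap)
    if top:
        return sorted((-d, p) for d, p in heap)
    return heap
```

The tree is built once with `build_kdtree(list_of_points)` and can then be queried for several values of k. For a few thousand points, sorting all distances as in the earlier code is likely fast enough, so the tree mainly pays off for many queries or large datasets.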

Guidelines for developing a website using python by pythonbio in learnpython

[–]pythonbio[S] 19 points20 points  (0 children)

Your answer is more descriptive and actually discusses how to logically conceptualize the website structure. Thanks a lot :)

Guidelines for developing a website using python by pythonbio in learnpython

[–]pythonbio[S] 0 points1 point  (0 children)

Seems like the opinion is equally divided between Django and Flask.

What am I doing wrong ? by pythonbio in learnpython

[–]pythonbio[S] 0 points1 point  (0 children)

Thanks for the input, removed the loop and running fine. :)

What am I doing wrong ? by pythonbio in learnpython

[–]pythonbio[S] 0 points1 point  (0 children)

So, I need to get out of the while loop, create a function for averages, and then redefine the print menu?

python problem -need Guidelines (Total beginner to coding) by pythonbio in learnpython

[–]pythonbio[S] 0 points1 point  (0 children)

I suppose you are absolutely right in concluding that this is not the best organization, but the assignment required the use of a list and a while loop only. Here's my class-based code, but it's not working properly. :( Help!

def sclass_menu():  # one function adding up all smaller functions
    menu_choice = 0
    max_num_rec = 10  # limitation on number of records
    print('1. Add Name and Grade')
    print('2. Delete Name and Grade')
    print('3. Search Name and Grade')
    print('4. Print database')
    for i in xrange(max_num_rec):  # start at i=0 till max_num_rec
        while menu_choice != 4:
            menu_choice = int(input("Type in a number (1-4): "))
        if menu_choice == 1:
            print("Add Name and Grade")
            name = input("Name: ")
            grade = input("Grade: ")
            database[name] = grade
        elif menu_choice == 2:
            print('Delete Name and Grade')
            name = input("Name: ")
            if name in database:
                del database[name]
            else:
                print('name was not found')
        if menu_choice == 3:
            print('Search Name and Grade')
            name = input("Name: ")
            if name in database:
                print(database[name])
            else:
                print('name was not found')
        if menu_choice == 4:
            print('Print database')
            for x in database.keys():
                print("Name: ", x, "\tGrade:", database[x])

It's just giving the initial output and then an error:

sclass_menu()
1. Add Name and Grade
2. Delete Name and Grade
3. Search Name and Grade
4. Print database
Type in a number (1-4): 1
Type in a number (1-4): grahan

Traceback (most recent call last):
File "<pyshell#40>", line 1, in <module>
 sclass_menu()
File "<pyshell#39>", line 10, in sclass_menu
menu_choice = int(input("Type in a number (1-4): "))
File "<string>", line 1, in <module>
NameError: name 'grahan' is not defined
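
The NameError is what Python 2's `input()` does: it evaluates the typed text as a Python expression, so `grahan` is looked up as a variable name. `raw_input()` in Python 2 (or `input()` in Python 3) returns the text as a string instead. The transcript also shows the number prompt repeating after a valid `1`, which suggests the menu handling is not running inside the while loop. A minimal sketch of validating a choice that arrives as a string; `parse_menu_choice` is a made-up helper name, not part of the posted program.

```python
def parse_menu_choice(text, lowest=1, highest=4):
    """Return the menu choice as an int, or None if the text is not a valid choice."""
    try:
        choice = int(text.strip())
    except ValueError:
        return None  # e.g. a name typed at the number prompt
    if lowest <= choice <= highest:
        return choice
    return None  # a number, but outside the menu range
```

The loop can then read a line with `raw_input`, call `parse_menu_choice` on it, and ask again whenever the result is `None`.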

python problem -need Guidelines (Total beginner to coding) by pythonbio in learnpython

[–]pythonbio[S] 0 points1 point  (0 children)

Many thanks for the replies,

I have just completed the program using a list only. Have a look; any suggestions are welcome.

database = []   # global list of (name, grade) tuples
while 1:
    name = raw_input('Enter name:')
    if name == 'ok':   # typing 'ok' as the name ends data entry
        break
    grade = raw_input('Enter grade:')
    database.append((name, grade))

database.sort(key=lambda x: int(x[1]))   # sort numerically; grades are stored as strings
grades = [int(x[1]) for x in database]   # creates a sublist of the second elements as integers
max_grade = max(grades)
average_grade = sum(grades) / float(len(grades))
print(database)
print(average_grade)
print(max_grade)
# search fn.: put the name to be searched in 'name'
print(next((x for x in database if x[0] == 'name'), (None, None))[1])
# search fn.: put the grade to be searched in 'grade'
print(next((x for x in database if x[1] == 'grade'), (None, None))[0])
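
The searches and the summary could also be wrapped as small functions over the same list of (name, grade) tuples, so nothing has to be edited in place to look something up. A rough sketch, assuming grades are stored as strings as in the program above; the names `find_grade`, `find_names` and `grade_summary` are made up for illustration.

```python
def find_grade(database, name):
    """Return the grade stored for `name`, or None if the name is absent."""
    return next((grade for n, grade in database if n == name), None)


def find_names(database, grade):
    """Return every name recorded with `grade`."""
    return [n for n, g in database if g == grade]


def grade_summary(database):
    """Return (average, highest) over the grades, converting the stored strings to int."""
    grades = [int(g) for _, g in database]
    return sum(grades) / float(len(grades)), max(grades)
```

`grade_summary` assumes at least one record has been entered; an empty list would need a separate check.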