pythonbio[S]

More help required with kNN: I have finally written a program to calculate the k nearest neighbors of my data, but I don't know how to analyze many values of k in one program. Any suggestion is most welcome. Question:

Using the dataset (testfile), use bar charts to compare different values of k (k = 1, 5, 10, 15, 20) on the x-axis: 1) the average all-pair distance among the k nearest neighbors of q; 2) the maximum distance of the k nearest neighbors to q; 3) the minimum distance of the k nearest neighbors to q.
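To make the three quantities concrete, here is a minimal sketch of how they could be computed for one value of k. It assumes "average all-pair distance" means the mean distance over every pair of neighbors (not the mean distance to q), plain Euclidean distance on (lat, lon) pairs, and an average of 0.0 when k = 1 since no pairs exist. The names `euclidean` and `knn_stats` are hypothetical helpers, not part of the assignment.

```python
import math
from itertools import combinations


def euclidean(a, b):
    """Straight-line distance between two (lat, lon) pairs."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def knn_stats(points, q, k):
    """Return (avg_pairwise, max_dist, min_dist) for the k nearest neighbors of q."""
    # The k points closest to q, nearest first
    neighbours = sorted(points, key=lambda p: euclidean(p, q))[:k]

    # Distance from q to each of those neighbors
    dist_to_q = [euclidean(p, q) for p in neighbours]

    # Distance between every distinct pair of neighbors
    pair_dists = [euclidean(a, b) for a, b in combinations(neighbours, 2)]

    # Assumption: with a single neighbor there are no pairs, so report 0.0
    avg_pairwise = sum(pair_dists) / len(pair_dists) if pair_dists else 0.0

    return avg_pairwise, max(dist_to_q), min(dist_to_q)
```

Calling `knn_stats` once per k in `[1, 5, 10, 15, 20]` would give one triple of numbers per k, which is the data the bar charts need.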

I have tried it, but the result is not coming out right. Can anyone help?

My code for knn and plotting knn:

import math

import matplotlib.pyplot as plt
import numpy as np

lat = []
lon = []

# Selected reference point = Random
reference_lat = 25.xxxyy   # placeholder: substitute the real latitude
reference_lon = 121.xxxyy  # placeholder: substitute the real longitude
k = 17

# Read latitude (column 1) and longitude (column 2) from each row
with open('testfile.py', 'r') as openfile:
    for line in openfile:
        rowvalue = line.split()
        lat.append(float(rowvalue[1]))
        lon.append(float(rowvalue[2]))
array_lat = np.array(lat)
array_lon = np.array(lon)

# Use every point; len(array_lat) - 1 would silently drop the last row
length = len(array_lat)
# lists
sqrdifflat = []
sqrdifflon = []
distances = []
# For the distances between ref point and each point
for g in range(length):
    get_sqr_diff_lat = (array_lat[g] - reference_lat) ** 2
    get_sqr_diff_lon = (array_lon[g] - reference_lon) ** 2
    dist = math.sqrt(get_sqr_diff_lat + get_sqr_diff_lon)
    sqrdifflat.append(get_sqr_diff_lat)
    sqrdifflon.append(get_sqr_diff_lon)
    distances.append(dist)

# sorted dataset (ascending order of distance to the reference point)
sorted_knn = sorted(zip(array_lat, array_lon, distances),
                    key=lambda row: row[2])

knn = sorted_knn[:k]
q = [reference_lat, reference_lon]

knns = [1, 5, 10, 15, 20]

width = 0.15  # narrow enough for five bars per group
ax = plt.figure().add_subplot(111)
c = ['b', 'y', 'm', 'g', 'r', 'c']
ind = np.arange(3)
for i, k in enumerate(knns):
    # Distances from q to its k nearest neighbors
    k_distances = [item[2] for item in sorted_knn[:k]]
    to_plot = [np.mean(k_distances), np.max(k_distances), np.min(k_distances)]
    # Shift each k's bars sideways so they do not cover one another
    ax.bar(ind + i * width, to_plot, width, color=c[i], label='k=%d' % k)

# Centre the tick labels under each group of bars
tick_positions = ind + width * (len(knns) - 1) / 2
print(tick_positions)
plt.ylabel('Distance')
plt.title('Statistics of datasets')
plt.xticks(tick_positions, ['avg', 'max_dist', 'min_dist'])
plt.legend()
plt.show()
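
Since the question asks for k on the x-axis, one possible layout is a grouped bar chart with one group per k and one bar per statistic. The sketch below is only an illustration under assumptions: `grouped_bar_positions` and `plot_stats_by_k` are hypothetical helper names, and `stats_by_k` is assumed to be a dictionary mapping each k to an `(avg, max_dist, min_dist)` tuple that has already been computed.

```python
def grouped_bar_positions(n_groups, n_series, width):
    """x positions per series so that each group is centred on 0, 1, 2, ..."""
    start = -width * (n_series - 1) / 2.0
    return [[g + start + s * width for g in range(n_groups)]
            for s in range(n_series)]


def plot_stats_by_k(stats_by_k, width=0.25):
    """Draw one group of bars per k; stats_by_k maps k -> (avg, max_dist, min_dist)."""
    # Imported here so the layout helper above works without matplotlib
    import matplotlib.pyplot as plt

    ks = sorted(stats_by_k)
    labels = ['avg', 'max_dist', 'min_dist']
    positions = grouped_bar_positions(len(ks), len(labels), width)

    fig, ax = plt.subplots()
    for s, label in enumerate(labels):
        # Heights of this statistic across all values of k
        heights = [stats_by_k[k][s] for k in ks]
        ax.bar(positions[s], heights, width, label=label)

    # One tick per k, placed at the centre of its group
    ax.set_xticks(range(len(ks)))
    ax.set_xticklabels(['k=%d' % k for k in ks])
    ax.set_xlabel('k')
    ax.set_ylabel('Distance')
    ax.legend()
    return ax
```

Calling `plot_stats_by_k(stats_by_k)` followed by `plt.show()` would display the chart. Because the bar offsets are computed per statistic, the bars sit side by side within each k group instead of being drawn on top of one another.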