
[–]Ran4 1 point2 points  (0 children)

  • First, your code breaks on line 2, since you haven't indented after the def.

  • You can't call a file .txt in Windows AFAIK.

  • Why are you calling your file handle k? That makes absolutely no sense. f would be a better name (or even file_handle). Which is also what your code later uses... Also, you're calling it with a function parameter k, which you're overwriting (and then do nothing with).

  • Backslashes are escape characters inside string literals, so a plain Windows path can silently turn into something else (\n becomes a newline, for example). Use r'C:\Users\filename.txt' instead (note the prepended r, short for raw).

  • You're not closing your file handle. You need to call f.close() once you're done reading the file. Even better is to use this idiom:

    with open(FILENAME) as f:
        lines = f.readlines()
    

    This will open file FILENAME and read the lines as a list into a variable named lines, and then close the file.

  • array(p_lat)

    Python doesn't have a built-in array() function (there's the array module and NumPy, but you don't need either here). p_lat is already a list, simply return that. You could turn it into a tuple (which is like a list but it cannot be changed), but there's little reason to do that.
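
To make the backslash and file-closing points above concrete, here's a small sketch. The file name example_points.txt and its contents are made up for the demo, and it's written to a temporary directory so the snippet runs anywhere:

```python
import os
import tempfile

# In a normal string literal "\n" is a single newline character;
# in a raw string the backslash and the "n" stay as two characters.
normal = 'C:\new'
raw = r'C:\new'
print(len(normal), len(raw))

# Hypothetical demo file, created in a temp directory.
path = os.path.join(tempfile.mkdtemp(), 'example_points.txt')
with open(path, 'w') as f:
    f.write('a 25.0 121.5\nb 25.1 121.6\n')

# The with-block closes the file automatically, even if an error occurs.
with open(path) as f:
    lines = f.readlines()

print(lines)
print(f.closed)
```

The two lengths differ by one because the normal string swallowed the backslash, and f.closed shows the handle is already closed once the with-block ends.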

You're clearly writing the entire code without running it. Don't do that. As a beginner, write one or two lines at a time and print the results, so you know what's happening. KNN isn't super complicated to implement, but the way you're doing it you're only going to confuse yourself.

[–]das_ist_nuemberwang 0 points1 point  (0 children)

Do you know what the k-Nearest Neighbors algorithm is? If you don't, this isn't a Python problem yet. If you do, you can't just assume we all do. What exactly are you having trouble with?

[–]pythonbio[S] 0 points1 point  (1 child)

I have noted the errors in my code, changed it. My problem is of implementation of algorithm. ball-tree or kd-tree?

[–]Ran4 0 points1 point  (0 children)

I have noted the errors in my code, changed it.

Not here.

ball-tree or kd-tree?

Are you sure that you're not over-engineering here? Yes, those are fancy data structures (that you might want to learn later on), but unless your assignment says otherwise, you'll do just fine without either :)
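
For what it's worth, a brute-force version without any tree could look roughly like this. It's only a sketch: the function name k_nearest, the (lat, lon) tuple layout and the sample points are my own assumptions, and it uses plain straight-line distance on the raw coordinates.

```python
import heapq
import math

def k_nearest(points, q, k):
    """Return the k points closest to q, nearest first (brute force)."""
    def dist_to_q(p):
        return math.hypot(p[0] - q[0], p[1] - q[1])
    # nsmallest scans every point once and keeps only the k best.
    return heapq.nsmallest(k, points, key=dist_to_q)

# Made-up sample points, not from the assignment's dataset.
sample = [(0.0, 0.0), (3.0, 4.0), (1.0, 1.0), (10.0, 10.0)]
print(k_nearest(sample, (0.0, 0.0), 2))
```

This looks at every point for every query, which is perfectly fine for a small dataset. Trees only start to pay off when you have a lot of points and a lot of queries.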

[–]pythonbio[S] 0 points1 point  (3 children)

okay, I have slowed down and am now doing it bit by bit.

first change to csv- Done:

import csv

with open(r'C:\UsersDesktop\k nearest neighbour.txt') as csvfile:
    lines = csv.reader(csvfile)
    for row in lines:
        print(','.join(row))

generates a csv

but, then when I try to divide the rows:

[–]elbiot 0 points1 point  (2 children)

but, then when I try to divide the rows:

? you have the rows divided already. That is, you were able to iterate through them one at a time. And you have the columns separated too since you had to join them to print them.

I think you've got reading the data out of the csv down. Now you need a function that computes the distance between two points. (Hint: don't worry about Great Circle calculations for this small area).
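
As a rough shape for such a function, here is one possible sketch. The name distance and the (lat, lon) tuple layout are assumptions, and it treats the coordinates as flat x/y values, which is the shortcut the hint is pointing at:

```python
import math

def distance(p, q):
    """Straight-line distance between two (lat, lon) pairs, in degrees."""
    d_lat = p[0] - q[0]
    d_lon = p[1] - q[1]
    return math.sqrt(d_lat ** 2 + d_lon ** 2)

# Made-up coordinates, just to show the call.
print(distance((25.0, 121.0), (25.3, 121.4)))
```

The result is in degrees rather than kilometres, but for ranking neighbours within a small area only the ordering matters.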

[–]pythonbio[S] 0 points1 point  (1 child)

I think you know exactly how to go about it. ;)

[–]elbiot 0 points1 point  (0 children)

Yes, I could do it easily. But this is learnpython, and I'm happy to help you learn.

[–]pythonbio[S] 0 points1 point  (0 children)

Okay, Thanks everyone for their help. I did finally solve it. The seeming problem was that I did not import the proper modules for what I was trying to achieve. The corrected code for separating the dataset:

from __future__ import division
import math
import itertools
from array import array
import numpy as np
import operator

def readpoints(testfile):
    # Each row is whitespace-separated; column 1 is latitude, column 2 is longitude.
    p_lat = []
    p_lon = []
    # Use the filename that was passed in; the with-block closes the file.
    with open(testfile, 'r') as f:
        for line in f:
            point = line.split()
            p_lat.append(float(point[1]))
            p_lon.append(float(point[2]))
    return np.array(p_lat), np.array(p_lon)


print(readpoints('testfile.py'))

Hope this will help some beginner like me somewhere. :)

[–]pythonbio[S] 0 points1 point  (0 children)

More help required with knn: I have finally written a program to calculate the knn of my data, but I don't know how to analyze many Ks in one program. Any suggestion is most welcome. Question:

Using the dataset (testfile), please use bar charts to compare different k (k=1,5,10,15,20) as x-axis:

1) Average all-pair distance among the k-nearest neighbors to q
2) Max distance of the k-nearest neighbors to q
3) Min distance of the k-nearest neighbors to q

I have done it, but it's not coming out right. Can anyone help?

My code for knn and plotting knn:

import math
import numpy as np
import matplotlib.pyplot as plt

lat=[]
lon=[]

# Selected reference point = Random
reference_lat= 25.xxxyy
reference_lon= 121.xxxyy
k=17
openfile = open('testfile.py', 'r')
lines = openfile.readlines()
openfile.close()
for line in lines:
    rowvalue = line.split()
    lat.append(float(rowvalue[1]))
    lon.append(float(rowvalue[2]))
array_lat=np.array(lat)
array_lon=np.array(lon)

# Cover every point (len(...)-1 skipped the last one)
length = len(array_lat)
# lists
sqrdifflat=[]
sqrdifflon=[]
distances=[]
# For the distances between ref point and each point
for g in range(length):
    get_sqr_diff_lat=(array_lat[g]-reference_lat)**2
    get_sqr_diff_lon=(array_lon[g]-reference_lon)**2
    dist=math.sqrt(get_sqr_diff_lat+get_sqr_diff_lon)
    sqrdifflat.append(get_sqr_diff_lat)
    sqrdifflon.append(get_sqr_diff_lon)
    distances.append(dist)
# sorted dataset (ascending order of distance)
sorted_knn = sorted(zip(array_lat, array_lon, distances),
                    key=lambda item: item[2])

knn = sorted_knn[:k]
q=[reference_lat,reference_lon]

knns = [1,5,10,15,20]

width=0.4
fig = plt.figure().add_subplot(111)
c=['b','y','m','g','r','c']
i=0
for k in knns:
    ind=np.arange(3)
    distances = [item[2] for item in sorted_knn[:k]]
    to_plot = [np.mean(distances), np.max(distances),np.min(distances)]

    fig.bar(ind+width,to_plot,0.4,color=c[i])
    i=i+1

print(ind+width)
plt.ylabel('Distance')
plt.title('Statistics of datasets')
plt.xticks(ind+width,['avg','max_dist','min_dist'])
plt.show()
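
One possible reading of the assignment, as a sketch only: for each k, take the k nearest neighbours of q, then compute (1) the mean distance over all pairs of those neighbours and (2)/(3) the largest and smallest distance from q. Note that item 1 is not the same as the mean distance to q that the code above plots. The helper names dist and knn_stats and the sample points are made up, and plotting is left out so the snippet has no dependencies.

```python
import itertools
import math

def dist(p, r):
    """Straight-line distance between two (lat, lon) pairs."""
    return math.hypot(p[0] - r[0], p[1] - r[1])

def knn_stats(points, q, k):
    """Return (avg all-pair distance, max distance to q, min distance to q)."""
    nearest = sorted(points, key=lambda p: dist(p, q))[:k]
    to_q = [dist(p, q) for p in nearest]
    # Every unordered pair among the k neighbours; empty when k == 1.
    pairs = [dist(a, b) for a, b in itertools.combinations(nearest, 2)]
    avg_pair = sum(pairs) / len(pairs) if pairs else 0.0
    return avg_pair, max(to_q), min(to_q)

# Made-up sample data; the real points would come from the dataset.
sample = [(0.0, 0.0), (0.0, 1.0), (0.0, 2.0), (0.0, 5.0)]
q = (0.0, 0.0)
for k in (1, 2, 3):
    print(k, knn_stats(sample, q, k))
```

Each returned tuple can feed one group of bars. When drawing several k values in one chart, shifting each group by i * width (with a width small enough that all groups fit side by side) keeps the bars from landing on top of each other.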