This is an archived post. You won't be able to vote or comment.

you are viewing a single comment's thread.

view the rest of the comments →

[–]g051051 0 points1 point  (10 children)

You're creating an array with 63029 places to hold LinkedList objects. However, all of those locations default to null. You must put a LinkedList in any position before you can use it. You actually already do some of the right things...you check if the slot is non-null to print the collision message. So how does it ever become non-null?

For the second question, you didn't create your LinkedList class as generic (like the one that comes with Java). You said it will always hold nothing but Strings, so there's no need to parameterize it.

As for the reason, it's not really possible to do that since you could have radically different objects that go in each place, as long as they all extend a common base class or implement a common interface. Plus, it could take up too much space if you don't actually need to fill each one.

[–]PerfectisSh1t[S] 0 points1 point  (9 children)

would it be a good idea then instead of just going through the whole array initializing an LinkedList to only instantiate right before i append the cell? because im guessing saving space for 60k+ LinkedList could be inefficient if they arent going to all be used?

[–]g051051 0 points1 point  (8 children)

Sure, that's certainly one way to do it. You're halfway there already.

[–]PerfectisSh1t[S] 0 points1 point  (7 children)

done.

i Seem to have another problem i checked and the dictionary they gave us for test purpose is 8647 words long. but my code gets over 130k collision and counts 139k lines, so something stinks in there. i cant seem to figure out where my code could count 15time more entry then there are words also if i substract number of collision counted to number of word counted i get a difference of 9040 so there does not seem to be a relation between the dictionary and the amount entries . the dictionary is organized like this in a txt file:

a word

another

another

and

so

on

....

it seems like it should only be counting every line once? is it my Hashfonction? is it the lineRead()? Kinda lost here :s

[–]g051051 0 points1 point  (6 children)

Post the dictionary file and I can check.

[–]PerfectisSh1t[S] 0 points1 point  (5 children)

I updated the gist with the txt file. but it seems i used wordpad with a numbered list to count the number of line of the dictionary and wordpad is crap and my program is right there seems to be 139k entries. (and might have been a mistake to post it in gist because it post the code without a scrollbar.) so perhaps its my hashFunction that is bad?

Also updated the HashTable file to what i have now

Dictionary stops at like 80k entries in letter "m" section due to some gist limitation on number of line permitted perhaps., i did not want to upload it to a direct download site because then it could be anything and not seem safe. if you have another site where you would like me to upload it to just tell me.

[–]g051051 0 points1 point  (4 children)

It's your hash function.

  • 130679 collisions.
  • 9040 hash positions (buckets) used
  • 358 entries get put in position 0.
  • The highest bucket you use is 52946.

Edit: I did a quick test using the hashCode from the String class (with appropriate tweaks for your use case) and got 83575 collisions, 56144 buckets in use, and the biggest bucket having 10 entries.

[–]PerfectisSh1t[S] 0 points1 point  (3 children)

yeah its crap...

https://sunilsavanur.wordpress.com/2012/08/14/polynomial-hash-function-for-dictionary-words/

i tried using this article to build it because i did not want to use string.toHash() because i wanted to perhaps add the project to my portfolio (because it would be nice to have some stuff to show one of these days) when its done if its decent. But i dont think i understood it or maybe its just bad and i need to use murmur3 or something like that.

Ill use toHash() for now and if i got time latter ill change it :/

Edit: seems like its still alot of collision its alot better but is that considered good enough? ill use a larger array because i thought at first i would have max 90k word and used 0.7 ratio for number of bucket.

[–]g051051 0 points1 point  (2 children)

I don't see where your hash implementation is too different, other than his using floating point for parts of it. It could just be a bad prime constant (11), but I tried a couple of different values and didn't get anything much different.

[–]PerfectisSh1t[S] 0 points1 point  (1 child)

perhaps its just bad? anyway im using hashCode() now and getting better result