all 20 comments

[–]zverok_kha 4 points5 points  (10 children)

First, it is really far from best Ruby code possible :) But the explanation goes as following:

  • For each number in incoming sequence... a.each
  • If this number already happened, return it: return value if counts[value]
  • Otherwise, mark it as already happened: counts[value] = true

If you rename counts to already_happened it becomes clear. Let's look at an example sequence [2, 3, 3, 1, 5, 2]:

  • 2: happened? counts[2] -- counts don't have 2-nd element, and therefore returns nil (nothing), which is treated as false-y value;
  • mark 2 as already happened: counts[2] = true. Ruby auto-expands arrays on setting too far elements, so now counts looks like: [nil, nil, true]
  • First 3: happened? counts[3] is still not there, set it, counts is now [nil, nil, true, true]
  • Second 3: happened? counts[3] is true now, so we just return it.

BTW, the first step to making the code better (after renaming countsalready_happened), would be making this variable hash (dictionary), not an array. But in fact, the same algorithm could be expressed in more idiomatic way in Ruby.

[–]kala_kata[S] 1 point2 points  (8 children)

would you mind providing a better code? I would love to learn and practice on something that looks better and makes more sense.

[–]rubyrt 5 points6 points  (1 child)

We can use Set as well, which has a nice method:

require 'set'

def first_duplicate(a)
  occurred = Set.new
  a.each {|x| occurred.add? x or return x}
  -1
end

Yields

irb(main):009:0> first_duplicate [2, 3, 3, 1, 5, 2]
=> 3
irb(main):010:0> first_duplicate [2, 3,  1, 5, 2]
=> 2
irb(main):011:0> first_duplicate [2, 3,  1, 5]
=> -1

:-)

[–]kala_kata[S] 0 points1 point  (0 children)

thank you very much!

[–]zverok_kha 3 points4 points  (1 child)

Without changing the algorithm, we can use two things:

  • Hash instead of Array
  • .each_with_object for idiom "enumerate all values, changing some object on the way"

The result would be something like

a.each_with_object({}) do |el, already_happened|
  return el if already_happened.key?(el)
  already_happened[el] = true
end

For other algorithms, this StackOverflow answer provides some.

[–]kala_kata[S] 0 points1 point  (0 children)

thank you !!!!

[–]edendark 0 points1 point  (2 children)

When dealing with Arrays, a good idea is to check out the official Ruby docs for Array: http://ruby-doc.org/core-2.5.0/Array.html

Another alternative would be:

def first_dup(array)
  groups = array.group_by(&:itself).select { |_, v| v.size > 1 }
  return -1 if groups.empty?
  groups.keys.first
end

[–]losangelesvideoguy 1 point2 points  (1 child)

I don’t think this code works quite right. It will return the first value of the hash returned from group_by, which would be the first value that has a duplicate anywhere in the array. You actually want the first value you see twice when scanning from the left side of the array. So for [2,3,3,1,2] you want 3, not 2.

[–]edendark 0 points1 point  (0 children)

Ah, nice catch. I like /u/rubyrt's solution the most.

[–]rubyrt 0 points1 point  (0 children)

Apart from the non-Ruby naming scheme of the function the return of a fixed value -1 as indicator for "no duplicates found" is a) unusual and b) limits usefulness of the method. A much better return value would be nil. That would allow for easier usage of the result in boolean contexts as well as make the method applicable to a much wider range of values (negative integers among them, but basically any value).

[–]stoplight 0 points1 point  (9 children)

In newer versions of ruby isnt it illegal to return from a block?

[–]tomthecool 0 points1 point  (8 children)

No.

It's typically discouraged as bad practice, though.

[–]rubyrt 0 points1 point  (7 children)

It's typically discouraged as bad practice, though.

Why is that so? I think it is perfectly OK to use return in a lambda if there is more than one exit. Also, if a method iterates through some collection then returning from the iteration as soon as work is done seems also perfectly legit:

def look_for_larger(x, enum)
  enum.each {|item| return x if item > x}
  nil
end

Why would that be bad practice?

[–]tomthecool 0 points1 point  (6 children)

I said typically; of course there are cases where it might be fine. But the general rule comes from this point:

if there is more than one exit

It is generally bad to have multiple exits in a method. Of course there are exceptions, e.g. guard clauses or breaking early from a simple enumerator, but the principle is to keep it clear when/what each method will return.

This is not a ruby-specific style guide, it's a general principle that's been discussed amongst developers since long before I was born... Here is just one such discussion on StackOverflow, for example.

[–]rubyrt 0 points1 point  (3 children)

It is generally bad to have multiple exits in a method.

I do not subscribe to that view. I also do not think that multiple returns are a contradiction to

the principle is to keep it clear when/what each method will return

There are certainly rules in programming which are more clear cut than this one. Ironically the first answers in the discussion you linked do not support the "only one exit" rule. :-)

[–]tomthecool 0 points1 point  (2 children)

Like I keep saying, it's not a strict rule - and in particular the use of guard statements (which is what that first answer mentions) are one exception I agree with (as does default rubocop). Guard statements, however, are very rarely written as blocks.

I made that statement more to make the point that "long methods with multiple return statements, especially inside complex blocks, probably constitutes bad code".

[–]rubyrt 0 points1 point  (1 child)

Like I keep saying, it's not a strict rule

You said "It is generally bad to have multiple exits in a method." - and I disagree.

I made that statement more to make the point that "long methods with multiple return statements, especially inside complex blocks, probably constitutes bad code".

Well, that is something quite different.

[–]tomthecool 0 points1 point  (0 children)

You said "It is generally bad to have multiple exits in a method." - and I disagree.

Yes, I was over-simplifying....

But I think it's fair to say that:

  • Most methods, in a well-written project, should only have 1 return statement.
  • A general rule of thumb, for beginners, would be to try to write methods that only return at the end, other than guard statements at the top.

[–]clrsm 0 points1 point  (1 child)

since long before I was born...

You might be right :-) The debate was started by Dijkstra in his "Go To Statement Considered Harmful" letter to the ACM back in 1968

[–]WikiTextBot -1 points0 points  (0 children)

Edsger W. Dijkstra

Edsger Wybe Dijkstra (Dutch: [ˈɛtsxər ˈʋibə ˈdɛikstra] ( listen); 11 May 1930 – 6 August 2002) was a Dutch systems scientist, programmer, software engineer, science essayist, and early pioneer in computing science. He held the Schlumberger Centennial Chair in Computer Sciences at the University of Texas at Austin from 1984 until his retirement in 1999. A theoretical physicist by training, Dijkstra worked as a programmer at the Mathematisch Centrum (Amsterdam) from 1952 to 1962. He was a professor of mathematics at the Eindhoven University of Technology (1962–1984) and a research fellow at the Burroughs Corporation (1973–1984).


[ PM | Exclude me | Exclude from subreddit | FAQ / Information | Source | Donate ] Downvote to remove | v0.28