4
\$\begingroup\$

I have tried to implement the Array-shuffle method myself. Haven't had a look on some similar algorithm-examples by purpose and tried to figure out something myself.

The actual Array-extension:

class Array
  def own_shuffle
    copy = [] # Create a copy of the array.
    self.each do |item|
      copy << item
    end

    i = 0
    left = 0
    right = 0

    while i < (copy.size * 5)
      loop do # Make sure, that an element isn't overwritten with itself.
        left = rand(copy.size)
        right = rand(copy.size)

        if left != right
          break
        end
      end
      # Swap elements
      temp = copy[left]
      copy[left] = copy[right]
      copy[right] = temp

      i += 1
    end
    # Return the shuffled copy.
    copy
  end
end

Usage:

require_relative "./own_shuffle.rb"

numbs = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
result = numbs.own_shuffle

puts result
=begin
3
1
9
5
0
11
7
12
8
2
6
10
=end

I tried around with the while-loop. Setting the upper-limit to 5-times the array-size, seems to me alright currently. I couldn't find any improvement by doing more iterations. Although I haven't tested systematically with arrays of different sizes.

I can expect my algorithm to work correctly?

I'm I using the Ruby-language in a good, idiomatic way? Would you have done it differently and why?

\$\endgroup\$

2 Answers 2

7
\$\begingroup\$

Single-quoted strings

Most communities have developed standardized community style guides. In Ruby, there are multiple such style guides. One popular such style guide is the one found at https://rubystyle.guide/. All the different community style guides in Ruby agree on the basics (e.g. indentation is 2 spaces), but they might disagree on more specific points (e.g. single quotes or double quotes).

In your case, you are consistent with your string literals, so that is good!

I, personally prefer to use single-quoted strings whenever there is no string interpolation. That way, it is immediately obvious that no string interpolation is taking place.

So, I, personally, would not write

require_relative "./own_shuffle.rb"

but instead I would write

require_relative './own_shuffle.rb'

And so on …

Note that it is perfectly fine to use double quoted strings if you otherwise needed to use escapes, e.g. if you had some something like

puts "michael's Ruby exercise"

that reads much better than

puts 'michael\'s Ruby exercise'

So in this case, double quotes would be preferred.

Frozen string literals

Immutable data structures and purely functional code are always preferred, unless mutability and side-effects are required for clarity or performance. In Ruby, strings are always mutable, but there is a magic comment you can add to your files (also available as a command-line option for the Ruby engine), which will automatically make all literal strings immutable:

# frozen_string_literal: true

It is generally preferred to add this comment to all your files.

In future versions of Ruby, files that don't have a # frozen_string_literal: magic comment will slowly default to frozen strings, see Feature #20205 Enable frozen_string_literal by default

Remove the redundant current directory path

Kernel#require_relative already requires the named file relative to the file the message send is in. There is no need to specify the current directory:

require_relative './own_shuffle.rb'

should just be:

require_relative 'own_shuffle.rb'

Remove the redundant file extension

Ruby will try to append the extension .rb to the filename in both Kernel#require_relative and Kernel#require. It will also try a couple of other file extensions, e.g., .so (to load a compiled C extension on Linux), .dylib (same on macOS), .jar (to load a compiled extension on JRuby), and so on. In general, it is a good idea to let Ruby pick the most appropriate version, unless you have a specific reason to override this choice.

require_relative 'own_shuffle.rb'

should just be

require_relative 'own_shuffle'

Avoid block comments

Block comments were used by RD, a Ruby documentation tool that has been obsolete for over two decades. Prefer a series of line comments instead:

=begin
3
1
9
5
0
11
7
12
8
2
6
10
=end

should be

# 3
# 1
# 9
# 5
# 0
# 11
# 7
# 12
# 8
# 2
# 6
# 10

self is the implicit receiver

self is the implicit receiver of any message send that does not explicitly specify a receiver. Therefore, it is not required to specify self as the explicit receiver in most cases.

Therefore,

self.each do |item|
  copy << item
end

should just be

each do |item|
  copy << item
end

Prefer trailing modifier if / unless

For conditionals whose body only contains a single expression, prefer the trailing modifier form.

Replace

if left != right
  break
end

with

break if left != right

Linting

You should run some sort of linter or static analyzer on your code. Rubocop is a popular one, but there are others.

Rubocop was able to detect all of the style violations I pointed out above (plus some more), and also was able to autocorrect all the ones I listed.

Let me repeat that: everything I pointed out to you above, you can actually correct within milliseconds at the push of a button. I have set up my editor such that it automatically runs Rubocop with auto-fix as soon as I hit "save".

It is a good idea to set up your tools such that the linter is automatically run when you paste code, edit code, save code, commit code, or build your project, and that passing the linter is a criterium for your CI pipeline.

In my editor, I actually have multiple linters and static analyzers integrated so that they automatically always analyze my code, and also as much as possible automatically fix it while I am typing. This can sometimes be annoying (e.g. I get over 20 notices for your original code, lots of which are duplicates because several different tools report the same problem), but it is in general tremendously helpful. It can be overwhelming when you open a large piece of code for the first time and you get dozens or hundreds of notices, but if you start a new project, then you can write your code in a way that you never get a notice, and your code will usually be better for it.

However, even by simply hitting "Save", my editor applies a series of automatic fixes which brings the number of notices down substantially. Running Rubocop as described above, further reduces this, and as mentioned, lots of these are duplicates because I have multiple different linters and analyzers configured.

That was easy!

Note that all we did so far was either done automatically for us by the editor or Rubocop's auto-correct feature, or we were blindly following instructions such as renaming methods. We did not yet have to think at all.

This is one of the great things about using automatic code formatters, linters, and static analyzers: you don't have to think about all the simple stuff. Where do I put my parentheses? How do I indent my code? What are the naming conventions? All of that is taken care of by the tools, I don't have to worry about it and can instead focus on the important stuff: what should the code actually do?

Another advantage of following community style guides and using community tools is that, if my code looks like everybody else's code, and everybody else's code looks like my code, it is much easier to get someone else to help me with my code.

There's a linter called Standard Ruby, which is essentially a set of standard configurations for Rubocop that cannot be changed. This takes all the discussions and decisions out of the whole topic.

Vertical whitespace

I, personally, would separate the final return value from the rest of the method body by a blank line. You are actually doing that in your while loop and your loop block, but not in your method:

# …
    end
    # Return the shuffled copy.
    copy

should be

# …
    end

    # Return the shuffled copy.
    copy

This part also looks a bit squeezed:

# …
  if left != right
    break
  end
end
# Swap elements
temp = copy[left]
copy[left] = copy[right]
copy[right] = temp

I would insert a blank line in between the last end and the # Swap elements comment:

# …
  if left != right
    break
  end
end

# Swap elements
temp = copy[left]
copy[left] = copy[right]
copy[right] = temp

No Monkey Patching

It is generally bad form to "monkey patch" (i.e. modify) existing modules and classes.

In this case, of course, you are rewriting an existing method as an exercise, so not all "general" rules apply.

IFF monkey patching is absolutely, positively, required, there are a couple of things we can do to make it nicer.

Imagine you are a maintenance programmer and you see a piece of unfamiliar code for the very first time which looks like this:

[1, 2, 3].own_shuffle

You are wondering where this own_shuffle method that you've never heard of comes from, so you ask Ruby herself by using the Method#owner method:

method = [1, 2, 3].method(:own_shuffle)

method.owner
#=> Array

Hmm … that's confusing. There is no method named own_shuffle mentioned in the documentation of the Array class.

Use a clearly named mixin to mark the extension

So, the first thing we can do is to wrap the method into a mixin whose name makes it clear that we are monkey patching a core class and that allows a reader to make an educated guess where to find the documentation:

module ::OwnShuffle
  module ArrayExtension
    def own_shuffle
      copy = [] # Create a copy of the array.
      each do |item|
        copy << item
      end

      i = 0
      left = 0
      right = 0

      while i < (copy.size * 5)
        loop do # Make sure, that an element isn't overwritten with itself.
          left = rand(copy.size)
          right = rand(copy.size)

          break if left != right
        end

        # Swap elements
        temp = copy[left]
        copy[left] = copy[right]
        copy[right] = temp

        i += 1
      end

      # Return the shuffled copy.
      copy
    end
  end
end

class ::Array
  include ::OwnShuffle::ArrayExtension
end

Now, if we try the same thing as above, we get a more useful response:

method.owner
#=> OwnShuffle::ArrayExtension

And the reader can now "guess" that the documentation for this method, as well as its source code, can probably be found in a file named own_shuffle/array_extension.rb within a library or Gem called own_shuffle, at least assuming that the author followed standard Ruby naming and directly layout guidelines.

Note: we could have also used Method#source_location method to find where the method is defined. However, this method does not always work, so our poor maintenance programmer may be forgiven for not trying it in the first place. In particular, on most Ruby implementations, Method#source_location only works for methods that are defined in Ruby.

On the most popular Ruby implementation, YARV, the core Array class is written in C, which means source_location wouldn't work for methods defined in the Array core class. So, a maintenance programmer might not even try to use source_location on a method they find on an array.

Wrap the extension in a Refinement

Refinements are Ruby's way of controlling the scope of monkey patching. Refinements are lexically scoped, so the monkey patches are only visible within the lexical scope (script, module definition, or eval string) where they are explicitly activated.

module ::OwnShuffle
  module ArrayExtension
    def own_shuffle
      # …
    end
  end

  refine ::Array do
    import_methods ::OwnShuffle::ArrayExtension
  end
end

This gives the user the option: they can either use the OwnShuffle::ArrayExtension mixin to monkey patch Array themselves, if they really want to make this monkey patch available globally. Alternatively, they can use the Refinement.

The nice thing about Refinements is that they are only visible if you explicitly activate them:

# Here, the method does not exist:
[1, 2, 3].own_shuffle
# in `<main>': undefined method `own_shuffle' for an instance of Array (NoMethodError)

# I need to explicitly activate the Refinement:
using ::OwnShuffle

[1, 2, 3].own_shuffle
#=> [2, 1, 3]

Above, I activated the Refinement in the script scope, so it will be visible in the entire script. However, I can also activate the Refinement in class or module scope:

class ::Foo
  using ::OwnShuffle

  def self.foo = [1, 2, 3].own_shuffle
end

class ::Bar
  def self.bar = [1, 2, 3].own_shuffle
end

::Foo.foo
#=> true

::Bar.bar
# in `bar': undefined method `own_shuffle' for an instance of Array (NoMethodError)

Unnecessary use of temporary variable

Thanks to a feature in Ruby called multiple assignment

# Swap elements
temp = copy[left]
copy[left] = copy[right]
copy[right] = temp

can just be written as

copy[left], copy[right] = copy[right], copy[left]

This is the idiomatic way of writing a swap and will be immediately recognized by every Ruby programmer as a swap. This also makes the comment above superfluous.

Comments

Speaking of comments. The Ruby Style Guide starts its section on comments off with this gem:

No comments

Write self-documenting code and ignore the rest of this section. Seriously!

And I whole-heartedly agree!

If your code is so complicated that you think it needs a comment to explain it … make your code simpler instead so it doesn't need a comment!

Specifically, comments should never explain how the code does something. Because the code already explains how the code does something. That is, in fact, what code is: a specification for how to do something. Comments also should not explain what the code does. This should be obvious from good, intention-revealing names.

Comments should only ever explain why code does something in a certain, non-obvious, way. Again, ideally, code should be obvious, and I shouldn't need to explain why the code is the way it is – it should be obvious from the code. However, there are certain cases where the obvious way to do something is the wrong way to do it. And in that case, for example, it would make sense to write a comment explaining what the obvious way is, why it doesn't work, and why this particular, non-obvious way, is the correct way.

Here is an example that could benefit from a comment:

while i < (copy.size * 5)

Why 5 instead of 4 or 6?

The most useless kind of comment is a comment that just repeats what the code already says.

For example, here:

# Return the shuffled copy.
copy

Every Rubyist knows that the last expression evaluated inside a method body is the return value of that method body. There is no reason to educate a reader about this fact.

Or here:

copy = [] # Create a copy of the array.

The name of the variable already says what the code does.

Constant expression in loop

This expression is evaluated over and over again in the loop:

copy.size

as well as

copy.size * 5

Both of these expressions are constant, they will never change. Therefore, there is no need to evaluate them multiple times:

copy_size = copy.size
iterations = copy_size * 5

while i < iterations
  loop do # Make sure, that an element isn't overwritten with itself.
    left = rand(copy_size)
    right = rand(copy_size)

    break if left != right
  end

  copy[left], copy[right] = copy[right], copy[left]

  i += 1
end

This actually not so much about optimizing the code, since any half-smart compiler will perform this optimization anyway. The main point is that this makes the code more readable by

  • giving a name to the iterations variable, clearly telling the reader what this expression is meant to do and
  • making it clear to the reader that this expression is not changing, making the code easier to analyze.

Higher-level iterators

Ruby has many powerful iteration methods in its collections library. Using each directly is almost always sub-optimal. This particular pattern that you are using:

  • Initialize an accumulator. (In this case the array assigned to copy.)
  • Iterate over the collection and add to the accumulator.
  • Return the accumulator.

Is known as a Fold, and is available in Ruby in two forms, Enumerable#each_with_object and Enumerable#inject.

Enumerable#inject is the more functional form, which is intended for a functional transformation of the code, and using it would look a bit like this:

copy = inject([]) { |result, item| result + [item] }

However, in this case, I think Enumerable#each_with_object is slightly more readable:

copy = each_with_object([]) { |item, result| result << item}

Excursion: Generality of fold (inject / each_with_object)

When I wrote above that you can rewrite this iteration with inject or each_with_object, that was actually a tautological statement. I didn't even have to read the code to make this statement.

It turns out that fold is "general". Every iteration over a collection can be expressed using fold. This means, if we were to delete every method from Enumerable, except inject, then we could re-implement the entire Enumerable module again, using nothing but inject. As long as we have inject, we can do anything.

A better higher-level iterator

But actually, what you are doing here is simply transforming each element of the collection. You don't need something as powerful as a fold for that. This is a much simpler operation called Map, and it is also available in Ruby as Enumerable#map:

copy = map(&:itself)

The right method

However, all of this is moot in the end, because there is no need for iterating over the array in the first place. (Almost) every object in Ruby responds to clone, and Array is no exception:

copy = clone

Excursion: Code Smells

The concept of Code Smells is often misunderstood. A Code Smell is something that should be investigated. However, it is not necessarily something bad that needs fixing.

The smell metaphor is used here just like in the real world: a smell might be a leaky toilet or a Swiss cheese.

Feature Envy

Your code has a Code Smell called Feature Envy. Feature Envy is when one object calls another object's methods much more that its own.

In your case, you almost exclusively call methods on copy rather than self. This implies that this part of the code "wants to live" inside copy instead of self.

In this case, self and copy are both instances of Array, so both methods would still end up in the same class. One way would be to extract just the part which deals with copy into a helper method with protected visibility.

Something like

module ::OwnShuffle
  module ArrayExtension
    def own_shuffle = clone.own_shuffle_helper

    protected

    def own_shuffle_helper
      raise FrozenError if frozen?

      i = 0
      left = 0
      right = 0

      array_size = size
      iterations = size * 5

      while i < iterations
        loop do # Make sure, that an element isn't overwritten with itself.
          left = rand(array_size)
          right = rand(array_size)

          break if left != right
        end

        self[left], self[right] = self[right], self[left]

        i += 1
      end

      self
    end
  end

  refine ::Array do
    import_methods ::OwnShuffle::ArrayExtension
  end
end

Basically, we now have one method own_shuffle_helper which performs the shuffle in-place and another method own_shuffle which returns a new shuffled array. It turns out that the Ruby core library has those exact same two methods: Array#shuffle! and Array#shuffle.

So, we can just rename own_shuffle_helper to shuffle! and make it public.

This is actually a common implementation technique for when there is a pair of methods where one is in-place and the other returns a modified copy. They are typically implemented either this way:

class Foo
  def bar!
    # do the work
  end

  def bar = clone.bar!
end

Or the other way around:

class Foo
  def bar
    # do the work
  end

  def bar! = replace(bar)
end

Testing

There is no automated testing in your code. Apart from the single example in your second code block (which is not automated), there is no testing at all.

You should always strive to have as close to 100% test coverage as possible. It doesn't really matter if you have unit tests, functional tests, integration tests, end-to-end tests, or a mix of them, but you should have tests, and they should be automated.

In this particular case, since you are implementing a Ruby core method, there are already plenty of tests written for you in the Ruby/Spec project as well as the YARV test suite.

Running the Ruby/Spec tests against your code yields 12 errors, 6 failures, and only 3 passing tests.

The YARV test suite has 0% passed tests.

The main reason for all these test failures and errors is twofold:

  • The test suites for Array#shuffle also contain tests for Array#shuffle!. (We have already fixed that part.)
  • You are not implementing the API of Array#shuffle correctly. It takes an optional keyword argument random which is the random number generator to be used for shuffling.

One possibility is to remove all the tests which test for this behavior. This can be legitimate if your intention is to only replicate part of the behavior of Array#shuffle.

If we remove all the tests which use this argument, then your code does indeed pass all tests! However, there is a slight wrinkle here: after deleting all the tests which use the random argument, there are barely any tests left over.

Providing a simple, quick and dirty implementation that lets us run a few more of the tests is relatively straightforward:

module ::OwnShuffle
  module ArrayExtension
    def own_shuffle(...) = clone.own_shuffle!(...)

    def own_shuffle!(random: ::Random)
      i = 0
      left = 0
      right = 0

      array_size = size
      iterations = size * 5

      while i < iterations
        loop do # Make sure, that an element isn't overwritten with itself.
          left = random.rand(array_size)
          right = random.rand(array_size)

          break if left != right
        end

        self[left], self[right] = self[right], self[left]

        i += 1
      end

      self
    end
  end

  refine ::Array do
    import_methods ::OwnShuffle::ArrayExtension
  end
end

And when we run this, we notice … we run into an infinite loop.

Which brings me to …

Bug #1 – potential arbitrarily long loop

This loop:

loop do # Make sure, that an element isn't overwritten with itself.
  left = rand(array_size)
  right = rand(array_size)

  break if left != right
end

Is not guaranteed to terminate quickly. It can take an arbitrary amount of time to finish, although with smaller and smaller probability.

There is a 1/array_size chance that left and right will be the same in the first try. There is a 1/(array_size²) chance that left and right will be the same in tries #1 and #2. And so on. The chance that left and right will be the same N times in a row gets smaller and smaller as N gets bigger, but crucially, it never gets to 0. So, there is always a non-zero chance that this loop will have to try for a very long time.

Bug #2 – potential infinite loop

In fact, it is slightly worse: left and right are not true random numbers. They are pseudorandom. They are generated by a pseudorandom number generator (PRNG). A PRNG is a computer program. Like all computer programs, a PRNG is deterministic: given the same starting state, it will produce the same sequence of pseudorandom numbers.

Since the number of possible states a computer can be in is finite, this means a PRNG cannot produce arbitrarily many pseudorandom numbers. At some point, it will be in a state it has been in before and produce the same sequence of numbers it produced before. This is called a cycle, and is unavoidable: every PRNG must have cycles because computers cannot have infinitely many states.

Therefore, it is possible that there exists some starting state such that the PRNG produces a sequence of random numbers such that left will always equal right.

Bug #3 – definite infinite loop

For an array of size 1, your code will run into an infinite loop: an array of size 1 only has one index, so your code will loop infinitely waiting for left and right to be different … which is impossible because there is only one index and thus left and right must be the same index.

Bug #4 – bias

Your code is biased. The shufflings should follow a uniform random distribution, but that is not the case. The easiest way to demonstrate this, is with a two-element array.

If I call

[23, 42].own_shuffle

I should get the result [23, 42] 50% of the time and the result [42, 23] 50% of the time. Instead, I get [23, 42] 100% of the time.

The reason for this is that left can never equal right. As a result, the two elements will always be swapped. This happens 10 times (array size times 5), which is an even number, so after an even number of swaps, we end up with the original order.

There are similar biases with other array sizes. For example, [1, 2, 3].own_shuffle should return [1, 2, 3] 1/6th of the time, but, in fact, never returns [1, 2, 3].

The good news is that all four of these bugs can easily be fixed by just deleting the loop:

module ::OwnShuffle
  module ArrayExtension
    def own_shuffle(...) = clone.own_shuffle!(...)

    def own_shuffle!(random: ::Random)
      i = 0
      left = 0
      right = 0

      array_size = size
      iterations = size * 5

      while i < iterations
        left = random.rand(array_size)
        right = random.rand(array_size)

        self[left], self[right] = self[right], self[left]

        i += 1
      end

      self
    end
  end

  refine ::Array do
    import_methods ::OwnShuffle::ArrayExtension
  end
end

This gets most of the tests to pass. There are still some failures and errors, but those are mostly concerned with testing pathological cases like "what happens if random does not return an Integer" or "what happens if random returns a number outside of the range we asked for?"

A well-known algorithm

However, I am still not convinced that the algorithm is actually bias-free. But I'm also not smart enough to check that. Which is why I would rely on implementing a well-known algorithm that has been proven correct, such as the (modified) Fisher-Yates shuffle.

It should be fairly easy to implement it in Ruby, something like this:

module ::OwnShuffle
  module ArrayExtension
    def own_shuffle(...) = clone.own_shuffle!(...)

    def own_shuffle!(random: ::Random)
      highest_index = size - 1
      (0...highest_index).each do |i|
        j = random.rand(i..highest_index)

        self[i], self[j] = self[j], self[i]
      end

      self
    end
  end

  refine ::Array do
    import_methods ::OwnShuffle::ArrayExtension
  end
end

This passes almost all tests of both the Ruby/Spec suite and the YARV test suite. The only failing tests fall into four categories:

  • calling shuffle! on a frozen array should raise a FrozenError
  • highly technical ones that test for pathological random number generators
  • some that test for 100% identical behavior with the C implementation of Array#shuffle in YARV
  • one test that tests that calling shuffle on a subclass of Array returns an Array

It would be possible to get those tests to pass as well, but it will mostly involve adding lots of checks and raising the appropriate errors or performing the appropriate conversions. But that will just make the code more ugly and obscure the algorithm.

For example, here I have added a few checks which silence some (but not all) of the errors and failures:

def own_shuffle!(random: ::Random)
  raise(::FrozenError) if frozen?

  highest_index = size - 1
  (0...highest_index).each do |i|
    j = random.rand(i..highest_index).to_int

    raise(::RangeError) if j.negative? || j > highest_index

    self[i], self[j] = self[j], self[i]
  end

  self
end

I added a check for a frozen array, I convert the return value of rand to an Integer, and I check for various out-of-range conditions on the return value of rand. All of these checks are part of the test suites, but, to be honest, I am not happy about at least some of them.

We explicitly asked rand for a number in a specific range, so we shouldn't need to check for a number outside of that range. That would indicate a bug in rand, not in our code.

\$\endgroup\$
0
3
\$\begingroup\$

The accepted answer contains a decent amount of good advice and a low signal-to-noise ratio.

The important takeaway is "use a Fisher-Yates shuffle". If for some reason you'd rather not, you could at least simplify by leaving out the check for self-assignment. Doing a self swap is harmless, and probably actually provides a better distribution.

A few other ruby style points not mentioned previously:

  • Instead of manually copying the elements, do copy=self.dup()

  • Instead of managing the loop indexes yourself with

i=0
...
while i < (iterations)
...
   i += 1

just use iterations.times do |i|

\$\endgroup\$

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.