Most communities have developed standardized community style guides. In Ruby, there are multiple such style guides. One popular such style guide is the one found at https://rubystyle.guide/. All the different community style guides in Ruby agree on the basics (e.g. indentation is 2 spaces), but they might disagree on more specific points (e.g. single quotes or double quotes).
In your case, you are consistent with your string literals, so that is good!
I, personally prefer to use single-quoted strings whenever there is no string interpolation. That way, it is immediately obvious that no string interpolation is taking place.
So, I, personally, would not write
require_relative "./own_shuffle.rb"
but instead I would write
require_relative './own_shuffle.rb'
And so on …
Note that it is perfectly fine to use double quoted strings if you otherwise needed to use escapes, e.g. if you had some something like
puts "michael's Ruby exercise"
that reads much better than
puts 'michael\'s Ruby exercise'
So in this case, double quotes would be preferred.
Frozen string literals
Immutable data structures and purely functional code are always preferred, unless mutability and side-effects are required for clarity or performance. In Ruby, strings are always mutable, but there is a magic comment you can add to your files (also available as a command-line option for the Ruby engine), which will automatically make all literal strings immutable:
# frozen_string_literal: true
It is generally preferred to add this comment to all your files.
In future versions of Ruby, files that don't have a # frozen_string_literal:
magic comment will slowly default to frozen strings, see Feature #20205 Enable frozen_string_literal
by default
Remove the redundant current directory path
Kernel#require_relative
already requires the named file relative to the file the message send is in. There is no need to specify the current directory:
require_relative './own_shuffle.rb'
should just be:
require_relative 'own_shuffle.rb'
Ruby will try to append the extension .rb
to the filename in both Kernel#require_relative
and Kernel#require
. It will also try a couple of other file extensions, e.g., .so
(to load a compiled C extension on Linux), .dylib
(same on macOS), .jar
(to load a compiled extension on JRuby), and so on. In general, it is a good idea to let Ruby pick the most appropriate version, unless you have a specific reason to override this choice.
require_relative 'own_shuffle.rb'
should just be
require_relative 'own_shuffle'
Block comments were used by RD, a Ruby documentation tool that has been obsolete for over two decades. Prefer a series of line comments instead:
=begin
3
1
9
5
0
11
7
12
8
2
6
10
=end
should be
# 3
# 1
# 9
# 5
# 0
# 11
# 7
# 12
# 8
# 2
# 6
# 10
self
is the implicit receiver of any message send that does not explicitly specify a receiver. Therefore, it is not required to specify self
as the explicit receiver in most cases.
Therefore,
self.each do |item|
copy << item
end
should just be
each do |item|
copy << item
end
For conditionals whose body only contains a single expression, prefer the trailing modifier form.
Replace
if left != right
break
end
with
break if left != right
Linting
You should run some sort of linter or static analyzer on your code. Rubocop is a popular one, but there are others.
Rubocop was able to detect all of the style violations I pointed out above (plus some more), and also was able to autocorrect all the ones I listed.
Let me repeat that: everything I pointed out to you above, you can actually correct within milliseconds at the push of a button. I have set up my editor such that it automatically runs Rubocop with auto-fix as soon as I hit "save".
It is a good idea to set up your tools such that the linter is automatically run when you paste code, edit code, save code, commit code, or build your project, and that passing the linter is a criterium for your CI pipeline.
In my editor, I actually have multiple linters and static analyzers integrated so that they automatically always analyze my code, and also as much as possible automatically fix it while I am typing. This can sometimes be annoying (e.g. I get over 20 notices for your original code, lots of which are duplicates because several different tools report the same problem), but it is in general tremendously helpful. It can be overwhelming when you open a large piece of code for the first time and you get dozens or hundreds of notices, but if you start a new project, then you can write your code in a way that you never get a notice, and your code will usually be better for it.
However, even by simply hitting "Save", my editor applies a series of automatic fixes which brings the number of notices down substantially. Running Rubocop as described above, further reduces this, and as mentioned, lots of these are duplicates because I have multiple different linters and analyzers configured.
That was easy!
Note that all we did so far was either done automatically for us by the editor or Rubocop's auto-correct feature, or we were blindly following instructions such as renaming methods. We did not yet have to think at all.
This is one of the great things about using automatic code formatters, linters, and static analyzers: you don't have to think about all the simple stuff. Where do I put my parentheses? How do I indent my code? What are the naming conventions? All of that is taken care of by the tools, I don't have to worry about it and can instead focus on the important stuff: what should the code actually do?
Another advantage of following community style guides and using community tools is that, if my code looks like everybody else's code, and everybody else's code looks like my code, it is much easier to get someone else to help me with my code.
There's a linter called Standard Ruby, which is essentially a set of standard configurations for Rubocop that cannot be changed. This takes all the discussions and decisions out of the whole topic.
Vertical whitespace
I, personally, would separate the final return value from the rest of the method body by a blank line. You are actually doing that in your while
loop and your loop
block, but not in your method:
# …
end
# Return the shuffled copy.
copy
should be
# …
end
# Return the shuffled copy.
copy
This part also looks a bit squeezed:
# …
if left != right
break
end
end
# Swap elements
temp = copy[left]
copy[left] = copy[right]
copy[right] = temp
I would insert a blank line in between the last end
and the # Swap elements
comment:
# …
if left != right
break
end
end
# Swap elements
temp = copy[left]
copy[left] = copy[right]
copy[right] = temp
It is generally bad form to "monkey patch" (i.e. modify) existing modules and classes.
In this case, of course, you are rewriting an existing method as an exercise, so not all "general" rules apply.
IFF monkey patching is absolutely, positively, required, there are a couple of things we can do to make it nicer.
Imagine you are a maintenance programmer and you see a piece of unfamiliar code for the very first time which looks like this:
[1, 2, 3].own_shuffle
You are wondering where this own_shuffle
method that you've never heard of comes from, so you ask Ruby herself by using the Method#owner
method:
method = [1, 2, 3].method(:own_shuffle)
method.owner
#=> Array
Hmm … that's confusing. There is no method named own_shuffle
mentioned in the documentation of the Array
class.
Use a clearly named mixin to mark the extension
So, the first thing we can do is to wrap the method into a mixin whose name makes it clear that we are monkey patching a core class and that allows a reader to make an educated guess where to find the documentation:
module ::OwnShuffle
module ArrayExtension
def own_shuffle
copy = [] # Create a copy of the array.
each do |item|
copy << item
end
i = 0
left = 0
right = 0
while i < (copy.size * 5)
loop do # Make sure, that an element isn't overwritten with itself.
left = rand(copy.size)
right = rand(copy.size)
break if left != right
end
# Swap elements
temp = copy[left]
copy[left] = copy[right]
copy[right] = temp
i += 1
end
# Return the shuffled copy.
copy
end
end
end
class ::Array
include ::OwnShuffle::ArrayExtension
end
Now, if we try the same thing as above, we get a more useful response:
method.owner
#=> OwnShuffle::ArrayExtension
And the reader can now "guess" that the documentation for this method, as well as its source code, can probably be found in a file named own_shuffle/array_extension.rb
within a library or Gem called own_shuffle
, at least assuming that the author followed standard Ruby naming and directly layout guidelines.
Note: we could have also used Method#source_location
method to find where the method is defined. However, this method does not always work, so our poor maintenance programmer may be forgiven for not trying it in the first place. In particular, on most Ruby implementations, Method#source_location
only works for methods that are defined in Ruby.
On the most popular Ruby implementation, YARV, the core Array
class is written in C, which means source_location
wouldn't work for methods defined in the Array
core class. So, a maintenance programmer might not even try to use source_location
on a method they find on an array.
Wrap the extension in a Refinement
Refinements are Ruby's way of controlling the scope of monkey patching. Refinements are lexically scoped, so the monkey patches are only visible within the lexical scope (script, module definition, or eval
string) where they are explicitly activated.
module ::OwnShuffle
module ArrayExtension
def own_shuffle
# …
end
end
refine ::Array do
import_methods ::OwnShuffle::ArrayExtension
end
end
This gives the user the option: they can either use the OwnShuffle::ArrayExtension
mixin to monkey patch Array
themselves, if they really want to make this monkey patch available globally. Alternatively, they can use the Refinement.
The nice thing about Refinements is that they are only visible if you explicitly activate them:
# Here, the method does not exist:
[1, 2, 3].own_shuffle
# in `<main>': undefined method `own_shuffle' for an instance of Array (NoMethodError)
# I need to explicitly activate the Refinement:
using ::OwnShuffle
[1, 2, 3].own_shuffle
#=> [2, 1, 3]
Above, I activated the Refinement in the script scope, so it will be visible in the entire script. However, I can also activate the Refinement in class or module scope:
class ::Foo
using ::OwnShuffle
def self.foo = [1, 2, 3].own_shuffle
end
class ::Bar
def self.bar = [1, 2, 3].own_shuffle
end
::Foo.foo
#=> true
::Bar.bar
# in `bar': undefined method `own_shuffle' for an instance of Array (NoMethodError)
Unnecessary use of temporary variable
Thanks to a feature in Ruby called multiple assignment
# Swap elements
temp = copy[left]
copy[left] = copy[right]
copy[right] = temp
can just be written as
copy[left], copy[right] = copy[right], copy[left]
This is the idiomatic way of writing a swap and will be immediately recognized by every Ruby programmer as a swap. This also makes the comment above superfluous.
Speaking of comments. The Ruby Style Guide starts its section on comments off with this gem:
No comments
Write self-documenting code and ignore the rest of this section. Seriously!
And I whole-heartedly agree!
If your code is so complicated that you think it needs a comment to explain it … make your code simpler instead so it doesn't need a comment!
Specifically, comments should never explain how the code does something. Because the code already explains how the code does something. That is, in fact, what code is: a specification for how to do something. Comments also should not explain what the code does. This should be obvious from good, intention-revealing names.
Comments should only ever explain why code does something in a certain, non-obvious, way. Again, ideally, code should be obvious, and I shouldn't need to explain why the code is the way it is – it should be obvious from the code. However, there are certain cases where the obvious way to do something is the wrong way to do it. And in that case, for example, it would make sense to write a comment explaining what the obvious way is, why it doesn't work, and why this particular, non-obvious way, is the correct way.
Here is an example that could benefit from a comment:
while i < (copy.size * 5)
Why 5
instead of 4
or 6
?
The most useless kind of comment is a comment that just repeats what the code already says.
For example, here:
# Return the shuffled copy.
copy
Every Rubyist knows that the last expression evaluated inside a method body is the return value of that method body. There is no reason to educate a reader about this fact.
Or here:
copy = [] # Create a copy of the array.
The name of the variable already says what the code does.
Constant expression in loop
This expression is evaluated over and over again in the loop:
copy.size
as well as
copy.size * 5
Both of these expressions are constant, they will never change. Therefore, there is no need to evaluate them multiple times:
copy_size = copy.size
iterations = copy_size * 5
while i < iterations
loop do # Make sure, that an element isn't overwritten with itself.
left = rand(copy_size)
right = rand(copy_size)
break if left != right
end
copy[left], copy[right] = copy[right], copy[left]
i += 1
end
This actually not so much about optimizing the code, since any half-smart compiler will perform this optimization anyway. The main point is that this makes the code more readable by
- giving a name to the
iterations
variable, clearly telling the reader what this expression is meant to do and
- making it clear to the reader that this expression is not changing, making the code easier to analyze.
Higher-level iterators
Ruby has many powerful iteration methods in its collections library. Using each
directly is almost always sub-optimal. This particular pattern that you are using:
- Initialize an accumulator. (In this case the array assigned to
copy
.)
- Iterate over the collection and add to the accumulator.
- Return the accumulator.
Is known as a Fold, and is available in Ruby in two forms, Enumerable#each_with_object
and Enumerable#inject
.
Enumerable#inject
is the more functional form, which is intended for a functional transformation of the code, and using it would look a bit like this:
copy = inject([]) { |result, item| result + [item] }
However, in this case, I think Enumerable#each_with_object
is slightly more readable:
copy = each_with_object([]) { |item, result| result << item}
Excursion: Generality of fold (inject
/ each_with_object
)
When I wrote above that you can rewrite this iteration with inject
or each_with_object
, that was actually a tautological statement. I didn't even have to read the code to make this statement.
It turns out that fold is "general". Every iteration over a collection can be expressed using fold. This means, if we were to delete every method from Enumerable
, except inject
, then we could re-implement the entire Enumerable
module again, using nothing but inject
. As long as we have inject
, we can do anything.
A better higher-level iterator
But actually, what you are doing here is simply transforming each element of the collection. You don't need something as powerful as a fold for that. This is a much simpler operation called Map, and it is also available in Ruby as Enumerable#map
:
copy = map(&:itself)
The right method
However, all of this is moot in the end, because there is no need for iterating over the array in the first place. (Almost) every object in Ruby responds to clone
, and Array
is no exception:
copy = clone
Excursion: Code Smells
The concept of Code Smells is often misunderstood. A Code Smell is something that should be investigated. However, it is not necessarily something bad that needs fixing.
The smell metaphor is used here just like in the real world: a smell might be a leaky toilet or a Swiss cheese.
Feature Envy
Your code has a Code Smell called Feature Envy. Feature Envy is when one object calls another object's methods much more that its own.
In your case, you almost exclusively call methods on copy
rather than self
. This implies that this part of the code "wants to live" inside copy
instead of self
.
In this case, self
and copy
are both instances of Array
, so both methods would still end up in the same class. One way would be to extract just the part which deals with copy
into a helper method with protected
visibility.
Something like
module ::OwnShuffle
module ArrayExtension
def own_shuffle = clone.own_shuffle_helper
protected
def own_shuffle_helper
raise FrozenError if frozen?
i = 0
left = 0
right = 0
array_size = size
iterations = size * 5
while i < iterations
loop do # Make sure, that an element isn't overwritten with itself.
left = rand(array_size)
right = rand(array_size)
break if left != right
end
self[left], self[right] = self[right], self[left]
i += 1
end
self
end
end
refine ::Array do
import_methods ::OwnShuffle::ArrayExtension
end
end
Basically, we now have one method own_shuffle_helper
which performs the shuffle in-place and another method own_shuffle
which returns a new shuffled array. It turns out that the Ruby core library has those exact same two methods: Array#shuffle!
and Array#shuffle
.
So, we can just rename own_shuffle_helper
to shuffle!
and make it public
.
This is actually a common implementation technique for when there is a pair of methods where one is in-place and the other returns a modified copy. They are typically implemented either this way:
class Foo
def bar!
# do the work
end
def bar = clone.bar!
end
Or the other way around:
class Foo
def bar
# do the work
end
def bar! = replace(bar)
end
Testing
There is no automated testing in your code. Apart from the single example in your second code block (which is not automated), there is no testing at all.
You should always strive to have as close to 100% test coverage as possible. It doesn't really matter if you have unit tests, functional tests, integration tests, end-to-end tests, or a mix of them, but you should have tests, and they should be automated.
In this particular case, since you are implementing a Ruby core method, there are already plenty of tests written for you in the Ruby/Spec project as well as the YARV test suite.
Running the Ruby/Spec tests against your code yields 12 errors, 6 failures, and only 3 passing tests.
The YARV test suite has 0% passed tests.
The main reason for all these test failures and errors is twofold:
- The test suites for
Array#shuffle
also contain tests for Array#shuffle!
. (We have already fixed that part.)
- You are not implementing the API of
Array#shuffle
correctly. It takes an optional keyword argument random
which is the random number generator to be used for shuffling.
One possibility is to remove all the tests which test for this behavior. This can be legitimate if your intention is to only replicate part of the behavior of Array#shuffle
.
If we remove all the tests which use this argument, then your code does indeed pass all tests! However, there is a slight wrinkle here: after deleting all the tests which use the random
argument, there are barely any tests left over.
Providing a simple, quick and dirty implementation that lets us run a few more of the tests is relatively straightforward:
module ::OwnShuffle
module ArrayExtension
def own_shuffle(...) = clone.own_shuffle!(...)
def own_shuffle!(random: ::Random)
i = 0
left = 0
right = 0
array_size = size
iterations = size * 5
while i < iterations
loop do # Make sure, that an element isn't overwritten with itself.
left = random.rand(array_size)
right = random.rand(array_size)
break if left != right
end
self[left], self[right] = self[right], self[left]
i += 1
end
self
end
end
refine ::Array do
import_methods ::OwnShuffle::ArrayExtension
end
end
And when we run this, we notice … we run into an infinite loop.
Which brings me to …
Bug #1 – potential arbitrarily long loop
This loop:
loop do # Make sure, that an element isn't overwritten with itself.
left = rand(array_size)
right = rand(array_size)
break if left != right
end
Is not guaranteed to terminate quickly. It can take an arbitrary amount of time to finish, although with smaller and smaller probability.
There is a 1/array_size
chance that left
and right
will be the same in the first try. There is a 1/(array_size
²) chance that left
and right
will be the same in tries #1 and #2. And so on. The chance that left
and right
will be the same N times in a row gets smaller and smaller as N gets bigger, but crucially, it never gets to 0. So, there is always a non-zero chance that this loop will have to try for a very long time.
Bug #2 – potential infinite loop
In fact, it is slightly worse: left
and right
are not true random numbers. They are pseudorandom. They are generated by a pseudorandom number generator (PRNG). A PRNG is a computer program. Like all computer programs, a PRNG is deterministic: given the same starting state, it will produce the same sequence of pseudorandom numbers.
Since the number of possible states a computer can be in is finite, this means a PRNG cannot produce arbitrarily many pseudorandom numbers. At some point, it will be in a state it has been in before and produce the same sequence of numbers it produced before. This is called a cycle, and is unavoidable: every PRNG must have cycles because computers cannot have infinitely many states.
Therefore, it is possible that there exists some starting state such that the PRNG produces a sequence of random numbers such that left
will always equal right
.
Bug #3 – definite infinite loop
For an array of size 1, your code will run into an infinite loop: an array of size 1 only has one index, so your code will loop infinitely waiting for left
and right
to be different … which is impossible because there is only one index and thus left
and right
must be the same index.
Bug #4 – bias
Your code is biased. The shufflings should follow a uniform random distribution, but that is not the case. The easiest way to demonstrate this, is with a two-element array.
If I call
[23, 42].own_shuffle
I should get the result [23, 42]
50% of the time and the result [42, 23]
50% of the time. Instead, I get [23, 42]
100% of the time.
The reason for this is that left
can never equal right
. As a result, the two elements will always be swapped. This happens 10 times (array size times 5), which is an even number, so after an even number of swaps, we end up with the original order.
There are similar biases with other array sizes. For example, [1, 2, 3].own_shuffle
should return [1, 2, 3]
1/6th of the time, but, in fact, never returns [1, 2, 3]
.
The good news is that all four of these bugs can easily be fixed by just deleting the loop:
module ::OwnShuffle
module ArrayExtension
def own_shuffle(...) = clone.own_shuffle!(...)
def own_shuffle!(random: ::Random)
i = 0
left = 0
right = 0
array_size = size
iterations = size * 5
while i < iterations
left = random.rand(array_size)
right = random.rand(array_size)
self[left], self[right] = self[right], self[left]
i += 1
end
self
end
end
refine ::Array do
import_methods ::OwnShuffle::ArrayExtension
end
end
This gets most of the tests to pass. There are still some failures and errors, but those are mostly concerned with testing pathological cases like "what happens if random
does not return an Integer
" or "what happens if random
returns a number outside of the range we asked for?"
A well-known algorithm
However, I am still not convinced that the algorithm is actually bias-free. But I'm also not smart enough to check that. Which is why I would rely on implementing a well-known algorithm that has been proven correct, such as the (modified) Fisher-Yates shuffle.
It should be fairly easy to implement it in Ruby, something like this:
module ::OwnShuffle
module ArrayExtension
def own_shuffle(...) = clone.own_shuffle!(...)
def own_shuffle!(random: ::Random)
highest_index = size - 1
(0...highest_index).each do |i|
j = random.rand(i..highest_index)
self[i], self[j] = self[j], self[i]
end
self
end
end
refine ::Array do
import_methods ::OwnShuffle::ArrayExtension
end
end
This passes almost all tests of both the Ruby/Spec suite and the YARV test suite. The only failing tests fall into four categories:
- calling
shuffle!
on a frozen array should raise
a FrozenError
- highly technical ones that test for pathological random number generators
- some that test for 100% identical behavior with the C implementation of
Array#shuffle
in YARV
- one test that tests that calling
shuffle
on a subclass of Array
returns an Array
It would be possible to get those tests to pass as well, but it will mostly involve adding lots of checks and raising
the appropriate errors or performing the appropriate conversions. But that will just make the code more ugly and obscure the algorithm.
For example, here I have added a few checks which silence some (but not all) of the errors and failures:
def own_shuffle!(random: ::Random)
raise(::FrozenError) if frozen?
highest_index = size - 1
(0...highest_index).each do |i|
j = random.rand(i..highest_index).to_int
raise(::RangeError) if j.negative? || j > highest_index
self[i], self[j] = self[j], self[i]
end
self
end
I added a check for a frozen array, I convert the return value of rand
to an Integer
, and I check for various out-of-range conditions on the return value of rand
. All of these checks are part of the test suites, but, to be honest, I am not happy about at least some of them.
We explicitly asked rand
for a number in a specific range, so we shouldn't need to check for a number outside of that range. That would indicate a bug in rand
, not in our code.