Code Speed
#Code Speed
I recently checked out the Ruby Monday google hangout run by Code Newbies. If you’re new to development, or just to a language you are trying to learn, I suggest giving it a try.
In the hangout, the topic of optimizing code for speed came up, and I mentioned that there are some basic practices that Ruby developers can use to help produce faster code. I’m not an expert on this topic, but I can suggest a few basics I’ve learned from some of my old homework assignments.
##The long version
One area that can be particularly illustrative is when you are manipulating a large amount of data.
For example, let’s assume we want to write a program that takes a string of characters, and returns every possible way to sort those characters into something that is recognized by a database of dictionary words.
Fortunately, we can do that by utilizing the dictionary on our computer.
If we don’t care at all about how long it takes our code to run, it’ll look something like this…
class Reader
def initialize(statement)
@statement = statement.chars
end
attr_reader :statement
def permutations
statement.permutation.to_a.uniq.map do |array|
array.join
end
end
end
class Scanner
def initialize(statement, filename = "/usr/share/dict/words")
@statement = statement
@filename = filename
end
attr_reader :statement, :filename
def run
list = Reader.new(statement).permutations
hits = Array.new
list.each do |word|
File.foreach(filename) do |line|
if line.strip == word
hits << word
end
end
end
end
end
statment = ARGV.last
puts Scanner.new(statment).run
This program can get incredibly bogged down quite easily. The main reason is because of our run
method.
What it is doing is taking every permutation of the word we feed it and checking it against the entire dictionary, one line at a time. That can be relatively quick for very short words, like ‘rat’.
In my build, it takes the computer a not-great .231 seconds to figure out that ‘rat’, ‘art’ and ‘tar’ are all in the dictionary.
But when I run the code for a longer word, like ‘desserts’, it took my system 1 minute and 52.929 seconds to return ‘desserts’ and ‘stressed’ as valid words. This isn’t going to make our users very happy. This is 2015. We can’t have programs this simple taking that long to run.
##A little shorter
What if we refactored that run
method a little bit? Instead of having it check the entire dictionary every time we wanted to see if a permuation was valid, what if we only checked the dictionary once, but checked each word in the dictionary against our array of permuations?
Checking a massive dictionary against a relatively small array is going to be a lot faster.
Our new run
method might look something like this…
def run
list = Reader.new(statement).permutations
hits = Array.new
File.foreach(filename) do |word|
if list.include?(word.strip)
hits << word
end
end
return hits
end
When I swap out those methods and try ‘rat’ I get a runtime of .073 seconds. Much faster than ‘rat’ in our original version. But where the real advantage comes into play is with ‘desserts’. The new version now takes 4.432 seconds. That’s a serious improvement.
##Even faster
But what if we wanted to further improve our run-time? To get signficant improvements over what we already have, we are going to have to start making some serious changes to our code.
What if we created a new database out of the dictionary that was optimized for speed?
When we ran the program, we could check if such a database existed. If not, we could create it and store it in our repository. That wouldn’t be very fast on that initial run, but every time we ran the program after that, we should get some serious speed improvements.
Here is our new version.
DATABASE = "database.txt"
class Builder
def initialize
@database = DATABASE
end
attr_reader :database
def create_database
File.open(database, "w") do |f|
import.each do |key, value|
f.puts "#{key}=>#{value.join("|")}"
end
end
end
def import(filename = "/usr/share/dict/words")
dictionary = Hash.new
File.foreach(filename) do |word|
word.strip!
normalized = word.chars.sort.join
dictionary[normalized] ||= Array.new
dictionary[normalized] << word
end
return dictionary
end
end
class Searcher
def initialize(user_input)
@user_input = user_input.chars.sort.join
end
attr_reader :user_input
def descramble
if File.exist?(DATABASE)
find_in_database
else
Builder.new.create_database
find_in_database
end
end
def find_in_database
File.foreach(DATABASE) do |line|
entry = line.split("=>")
if @user_input == entry.first
return entry.last.split("|")
end
end
end
end
user_input = ARGV.last
check = Searcher.new(user_input)
puts check.descramble
So, what is this code doing exactly?
Like the others, it takes the user’s input when the program is run, but this time when the user’s input is passed to the Searcher
object, the characters are sorted alpha-numerically and then joined back into a new string.
When the program prints to the command line, it calls Seacher.descramble
. This method first checks to see if the database file exists.
Let’s assume, for the sake of argument, that this database doesn’t exist (It won’t the first time you run this program). If not, then it runs the Builder
and calls the create_database
method on it. This method writes to ‘/database.txt’ using a very special format.
The create_database
method uses the import
method to read every entry in the computer’s dictionary file and creates a new dictionary
variable of its own. This new dictionary is a hash. The keys of the hash are every entry of the dictionary sorted alpha-numerically, and the values are an array of the unsorted entries. So, every time the import
method spots a new key that hasn’t been created yet, it does so and adds the corresponding value. If the key has already been created, then it just appends the value to the existing value array. Once our hash has been constructed, it is printed to ‘/database.txt’.
Now that the database is built, what you have is a much shorter dictionary of key=>array
pairs.
The Searcher.descramble
then proceeds from where it left off by calling find_in_database
. It opens your new database and looks to see if there is a key that matches the user_input
. If it finds one, then it prints to the command line the contents of the corresponding array.
##Checking final speed
If there is not a ‘/database.txt’ file in our directory, and we run the code with ‘rat’, it takes my system .756 seconds to build the new database and check it for our matches.
That’s not very fast, but if we run it a second time with our new optimized directory, it takes about .050 seconds. That’s pretty good.
When we try ‘desserts’ with the built directory, it takes almost exactly the same amount of time. With our new database, the size of the word we are checking for is no longer a serious limiting factor. The number of permutations are effectively irrelevant, because we are only searching for the sorted key value.
##Conclusions
So what have we learned? Here are a few key takeaways.
First, try to limit the number of times your program calls upon a database. Every time it has to load up a database and iterate through each line, we’re wasting precious resources.
Second, if possible, optimize the data for search. There are a number of ways to do this, and developers should be keeping an eye out for opportunities. If you have an app that has to build a search-optimized database when it boots, that’s going to add some time to the initial load, but it might save a ton of runtime in actual use if the database is being called repeatedly.
This post is just touching the surface of what can be done. I focused on database interaction, because it is such an obvious example. If you want a much more detailed look at writing faster Ruby, I suggest checking out what the Pragmatic Programmers have on offer. I haven’t read this particular book…yet, but my experience is that the Prag Prog publishers have some of the best material available.
##Upcoming
There are a lot of ways to optimize speed, and Rubyists should always be on the lookout for relatively painless opportunites to improve.
One example that came up in the google hangout Monday night was search-optimizing algorithms. Algorithms, such as binary search, when utilized in the appropriate situations, can improve the time it takes to search a large database by orders of magnitude. Some blog post in the near future, I’ll focus on binary search in particular, focusing on what situations it can be used, and why it is so much faster, on average, than a linear search.
I’m also going to do a post on character encoding in Ruby. What is it, and why is it important?