XST: a Ruby Experiment
Getting Started
It’s probably time to see whether any of the Ruby IDEs out there are worth using, as the state of the art has advanced a lot since last I worked with the language. The language itself is up to version 1.8.2, and the marvelous Pickaxe book is twice the size of the old one. And my copy is autographed by Dave Thomas himself!
For now, I’ll continue to work in TextPad. I have a macro defined that allows me to require in project files, all defined in a conventional target file, “project.rb”, which the macro compiles and runs. The starting file looks like this for this experiment:
require 'test/unit' require'restricttest.rb'
My initial test is always the same, one that fails:
class TC_MyTest < Test::Unit::TestCase def test_fail assert(false, 'Assertion was false.') end end
Results are:
Loaded suite project Started F Finished in 0.09 seconds. 1) Failure: test_fail(TC_MyTest) [./restricttest.rb:4]: Assertion was false. <false> is not true. 1 tests, 1 assertions, 1 failures, 0 errors Tool completed with exit code 1
That’s good news … for now. Let’s move on.
My mission here will be to test a restrict operator on some kind of packed byte structure, which I expect to be a string. I expect to [re]learn a lot about Ruby before this is over. Ruby programmers: if you see me doing something really dumb, please drop me a note. Don’t forget to put [ron] in as part of the subject line, to get through my spam filters.
For my first test, I think I’ll just create an extended set containing a packed up string with a couple of records in it. If that turns out to be too hard, I’ll back off and try something simpler.
def test_restrict input = XSet.new(4, "123 234 132 342 ") select = XSet.new(1,"1") expected = "123 132 " result = input.restrict(select) assert_equal(expected,result.contents) end
OK. I am, of course, making this up. The new extended set class is XSet, and I’m supposing that the new method takes a record length and packed record. That will surely change, but it should be enough for now. Let’s fake it, with intention to fail rather than succeed. After a couple of tries relearning a little Ruby syntax, I get this class and associated expected failure:
class XSet def initialize(recordLength, contents) end def restrict(anXset) XSet.new(0,nil) end def contents "wrong" end end Loaded suite project Started F Finished in 0.11 seconds. 1) Failure: test_restrict(TC_MyTest) [./restricttest.rb:8]: <"123 132 "> expected but was <"wrong">. 1 tests, 1 assertions, 1 failures, 0 errors Tool completed with exit code 1
This permits me to code. I’ll add a few lines of code but keep the test broken.
class XSet def initialize(recordLength, contents) @recordLength = recordLength @contents = contents end def restrict(anXset) XSet.new(11, "still wrong") end def contents @contents end end
This fails as expected: still wrong. Now, with your kind permission, I’m going to try to program restrict by intention. This could be too big a bite, but I think I know what I’m doing. (Remember the redneck’s last words: “Watch this!”)
Yes, well. That didn’t work. I had to code a lot. Thought I was doing OK, but the test fails. Here’s what I wrote:
def restrict(anXset) resultContents = "" for myBase in recordRange for hisBase in anXset.recordRange if match(record(myBase), anXset.record(hisBase)) result << record(myBase) end end end XSet.new(@recordLength, resultContents) end def recordRange 0..cardinality-1 end def cardinality @contents.length / @recordLength end def record(base) @contents[base..base+@recordLength-1] end def match(record1, record2) record1==record2 end
Fortunately, years of fouling up in public make me fairly willing to share my learning. Let’s write some tests against the little methods, to begin with.
def test_cardinality input = XSet.new(4, "123 234 132 342 abc ") assert_equal(5, input.cardinality) end
That one passes. That makes me think that recordRange should work, but I’m feeling less than confident, so I’ll check it.
Wait! I see the bug. In coding this up so nicely, I haven’t taken the record length into account. The recordRange is just an index from 0 to cardinality-1. I could fix it, but I need to learn from my mistakes, so I’m going to fill in some needed tests first.
def test_recordRange input = XSet.new(4, "123 234 132 342 abc ") assert_equal(0..4, input.recordRange) end
This passes also. I was sure it would. Now let me look at the code and see if it’s OK to fix the problem. Look at these methods:
def restrict(anXset) resultContents = "" for myBase in recordRange for hisBase in anXset.recordRange if match(record(myBase), anXset.record(hisBase)) result << record(myBase) end end end XSet.new(@recordLength, resultContents) end def record(base) @contents[base..base+@recordLength-1] end
I rewrote the restrict part way through. I had taken a break from a lovely Sunday breakfast and viewing of Sunday Morning, and was glancing through the Pickaxe book. That set me on a different way of expressing the algorithm for restrict. But I didn’t erase it and start over, I edited it. I left the name myBase in, intending it to mean the base byte index of a record in the set. But it’s not … it’s the raw index, 0-n. It needs to be multiplied by record length to be correct.
My programming by intention threw me off, through my choice of a bad name. Let’s fix it. Here’s what I wrote, but it still doesn’t work:
def restrict(anXset) resultContents = "" for myRecordIndex in recordRange for hisRecordIndex in anXset.recordRange if match(record(myRecordIndex), anXset.record(hisRecordIndex)) result << record(myRecordIndex) end end end XSet.new(@recordLength, resultContents) end def record(recordIndex) @contents[recordIndex*@recordLength..recordIndex*@recordLength+@recordLength-1] end
Well, I have to wonder whether record is working as I expect, so I’ll write a test for that:
def test_record input = XSet.new(4, "123 234 132 342 abc ") assert_equal("132 ", input.record(2)) end
That works! So I can see two more possibilities. Either == doesn’t compare two strings for equality the way I think it does, or the << operator doesn’t work the way I think it does. (Or I’m even more confused.) I think I’m getting decent tests in place, so I’m not going to back this all out. But I probably should. Maybe next, a test for “match”, which looks like this:
def match(record1, record2) record1==record2 end
Well, duh, that’s obviously not right. The two records, from the input set and the selection set, aren’t even the same SIZE!! The match method is flat wrong. What we want is that they match up to the length of the second record. Let me write a test: obviously I need more discipline here, and should have gone in smaller bites. Still, each test is taking me further, so I’m going to press on.
def test_match input = XSet.new(4, "123 234 132 342 ") select = XSet.new(1,"1") assert(input.match(input.record(2), select.record(0))) end
This fails, of course, and now I’ll make it pass:
def match(record1, record2) record1[0..record2.length-1]==record2 end
That finds the error in the restrict method, where I started the string with the name resultContents, and then started calling it result. Grr.
def restrict(anXset)
resultContents = ""
for myRecordIndex in recordRange
for hisRecordIndex in anXset.recordRange
if match(record(myRecordIndex), anXset.record(hisRecordIndex))
resultContents << record(myRecordIndex)
end
end
end
XSet.new(@recordLength, resultContents)
end
Whew! All the tests run, just as I expected. The initial implementation of the restrict method works. Time for some cleanup – there are things I don’t like. First, let’s look at the XSet code:
class XSet def initialize(recordLength, contents) @recordLength = recordLength @contents = contents end def restrict(anXset) resultContents = "" for myRecordIndex in recordRange for hisRecordIndex in anXset.recordRange if match(record(myRecordIndex), anXset.record(hisRecordIndex)) resultContents << record(myRecordIndex) end end end XSet.new(@recordLength, resultContents) end def record(recordIndex) @contents[recordIndex*@recordLength..recordIndex*@recordLength+@recordLength-1] end def recordRange 0..cardinality-1 end def cardinality @contents.length / @recordLength end def match(record1, record2) record1[0..record2.length-1]==record2 end def contents @contents end end
What’s not to like? There are a lot of “-1” adjustments in there, which are of course inevitable when you’re looping from zero to something. But I’d like to get rid of them. And I’m not entirely happy with using the range option to pick up the bytes from the records, in match, because that will create a lot of temporary strings. Match should be changed to work inside the contents buffers. And we need to keep it fast, since that’s the main point of this kind of operation. Finally, there is some other arithmetic that we can reduce by having the set cache things like its cardinality and such1
Red, Green, Refactor
Fist thing I want to fix is the match operator, and the fact that it runs on slices. I’ll begin by passing in the basic parameters, the two record numbers, and the second set. That requires me to change one of the tests. The complete changes are:
def test_match input = XSet.new(4, "123 234 132 342 ") select = XSet.new(1,"1") assert(input.match(2, select, 0)) end def restrict(anXset) resultContents = "" for myRecordIndex in recordRange for hisRecordIndex in anXset.recordRange if match(myRecordIndex, anXset, hisRecordIndex) resultContents << record(myRecordIndex) end end end XSet.new(@recordLength, resultContents) end def match(myRecordIndex, anXset, hisRecordIndex) record1 = record(myRecordIndex) record2 = anXset.record(hisRecordIndex) record1[0..record2.length-1]==record2 end
The tests still run. I haven’t resolved my issue, which is the copying of the records, but now it’s all centralized in this one method. So let’s digress.
I’m doing something here that I usually recommend against: I’m optimizing code for which I have no proof of needing an optimization. So maybe I’m doing something bad. On the other hand, the point of these operations is to be as fast as possible, and I also need to learn more about the best way to build and access the underlying bytes of the sets in question. On those grounds, I’m granting myself permission to push this a bit further. On the other hand, if I were pairing, my pair might hold me back. Premature optimization is definitely bad. Still, I’m going ahead, and I expect it to take less time to do it than it took to write this paragraph.
Wrong again! It took longer than it should have, perhaps ten minutes. I didn’t even set out to get done, just to grab the records from the set contents directly. I’ll show the code so far, but then I need to take a break: I’m making too many mistakes.
def match(myRecordIndex, anXset, hisRecordIndex) hisContents = anXset.contents hisBase = anXset.recordBase(hisRecordIndex) myBase = recordBase(myRecordIndex) length = anXset.recordLength - 1 record1 = @contents[myBase..myBase+length] record2 = hisContents[hisBase..hisBase+length] record1==record2 end def recordBase(recordIndex) recordIndex*@recordLength end
All that’s happening here is that I’m grabbing the contents buffer for the input set, calculating the offsets and lengths, then snarfing the records and comparing them. The next step will be to write the compare loop out longhand, so it doesn’t create any objects …
Or will it??? My plan is that I’m going to write, somewhere, something that compares two bytes from the strings, like
buf1[i]==buf2[j]
That’s going to create two strings of length 1, isn’t it? Odds are, that’s not more efficient, and it might be less, depending on what string == really does. If it’s a decent primitive, it should go faster. I could go ahead and test both these approaches but I’m sure that in Ruby, this isn’t an improvement. (In C#, it might be, because characters are value types and don’t have to be created as objects. We’ll leave that concern for another day.)So, what have I learned? Primarily, I’ve learned to listen to my own advice. “Optimizing” this method, at least this way, isn’t a good idea. I probably can’t beat the record-slicing approach I took initially. The case isn’t proven, so I could be wrong, but it seems clear that I’m way into the roundoff in any case.
See? You should listen to me. Even I should listen to me! I’m going to take a break, then revert this code, then look around to see what else needs “improvement”.
Back to a Decent State
Here’s where I reverted to, plus a tiny bit of renaming cleanup I still want to clean up a bit of that -1 code, if nothing else. I’m reminded that Ruby will help me with that! I’ve been using the “..” operator, which includes the end point. I can use “…”, which does not. Hold on, I’m going to put that in before I show you the code.
class TC_MyTest < Test::Unit::TestCase def test_cardinality input = XSet.new(4, "123 234 132 342 abc ") assert_equal(5, input.cardinality) end def test_recordRange input = XSet.new(4, "123 234 132 342 abc ") assert_equal(0...5, input.recordRange) end def test_record input = XSet.new(4, "123 234 132 342 abc ") assert_equal("132 ", input.record(2)) end def test_match input = XSet.new(4, "123 234 132 342 ") select = XSet.new(1,"1") assert(input.match(2, select, 0)) end def test_restrict input = XSet.new(4, "123 234 132 342 ") select = XSet.new(1,"1") expected = "123 132 " result = input.restrict(select) assert_equal(expected,result.contents) end end class XSet attr :recordLength attr_reader :contents def initialize(recordLength, contents) @recordLength = recordLength @contents = contents end def restrict(aSet) resultContents = "" for myRecordIndex in recordRange for hisRecordIndex in aSet.recordRange if match(myRecordIndex, aSet, hisRecordIndex) resultContents << record(myRecordIndex) end end end XSet.new(@recordLength, resultContents) end def match(myRecordIndex, aSet, hisRecordIndex) record1 = self.record(myRecordIndex) record2 = aSet.record(hisRecordIndex) record1[0...record2.length]==record2 end def record(recordIndex) @contents[recordIndex*@recordLength...recordIndex*@recordLength+@recordLength] end def recordRange 0...cardinality end def cardinality @contents.length / @recordLength end end
Wrapup for Now
The tests are green, and the code, while far from great, is fairly decent. It’s been a long off-and-on day, time to wrap this up and hang with Ricia. Next time, I think a bit more cleanup, probably focused on the asymmetry of the way match gets its own record implicitly and the other set’s explicitly. I think I’d rather pass both records in, though that will make match reference no member variables of the class. Anyway, that’s for another day. Let’s put this one to bed.
- The word “cardinality”, by the way, is set-guy talk for “number of items in the set”. I was trained to use these words, and will probably use them in the code. I’ll try to remember to define them.