Faithful readers will recall that, as proof that estimation is difficult if not evil, I’ve undertaken to move this site off WordPress. This has taken longer than anticipated (where “anticipated” is something less than “estimated” but more like “guessed”).

The events have given me a greater – although surely still very weak – understanding of “The Mangle of Practice”, Andrew Pickering’s notion that the actor and the work are working on each other. That is, the work forms the worker as the worker forms the work. What this means to me is “there’s always another bloody thing”.

The Scraper

The “Scraper” is a Ruby program we’re writing that rips apart a WordPress export and splits it out into a static website that a human might be able to understand and maintain. Some of the stories for Scraper include:

  • Pull out each article and page and put it in its own folder, under some base folder on the site;
  • Leave articles in a form such that they’ll look decent within the new site;
  • Change the links in the article that link within the old site to link to assets on the new site, corresponding to assets on the old site;
  • Convert articles from raw HTML to Markdown (Kramdown), to make editing them easier in the future;
  • Convert things such that the Jekyll site generator can generate them.
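As a toy illustration of the Markdown conversion in that last couple of stories (hypothetical code; the real work targets Kramdown and handles far more than this), inline HTML can be rewritten with global replaces:

```ruby
# Toy HTML-to-Markdown pass: just emphasis tags, via gsub.
def html_to_markdown(html)
  html.gsub(%r{<em>(.*?)</em>}, '*\1*')
      .gsub(%r{<strong>(.*?)</strong>}, '**\1**')
end

html_to_markdown("Edit <em>this</em> more <strong>easily</strong>")
# => "Edit *this* more **easily**"
```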

Truth be told, that’s about all it has to do. There are many details, some of which we brought on ourselves.

For example, the pattern for the new site is that assets (pictures, whatever) will be kept in the article that uses them. If I put a picture in this article, the file for that picture will be in the same folder as the article file. But WordPress assets tend to be in centralized locations. In the case of my site, there are two main folders, one called images and one called uploads, where things tend to be.

So we have a decision to make: preserve the old structure for the old articles, which will mean less file moving and somewhat simpler link updates, or go to the new structure, which may be easier to maintain, will be more consistent, and will require more complex file copying and link updates.
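Under the new structure, a centralized WordPress asset would move into the folder of the article that uses it. The mapping itself is simple in the happy case; here’s a hypothetical sketch (made-up names and paths, not the real code):

```ruby
# Map a centralized asset URL into the article's own folder.
def colocated_asset_path(article_folder, old_asset_url)
  File.join(article_folder, File.basename(old_asset_url))
end

colocated_asset_path("xprog/dbcsimpledesign", "/uploads/2015/01/pic.jpg")
# => "xprog/dbcsimpledesign/pic.jpg"
```

The complexity lies not in this mapping but in doing it for every asset while also updating every link that referred to it.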

You’d think we’d just decide. Somehow that isn’t what happened, and we found ourselves earlier this week working on a version of Scraper that would be able to do these things both ways, controlled, G_d save us, by flags. Well, we hadn’t worked on Scraper since December 10, so perhaps we can be forgiven for going off track for a morning. We decided to roll back our work to the December 10 version (which isn’t as bad as it seems since there were no versions between then and January 19th or whatever day it was).

Nonetheless, I found it very frustrating, because it seemed that Tozier and I were on entirely different pages, if not different planets. I couldn’t understand why, and I still don’t. Still, we figured out what to do and got started again on Thursday and Friday.

We spent a lot of time talking about what we were going to do, how we might do it, and why. We reached some agreements on the objectives, and definitely agreed on the approach of having lots more tests. Let me digress:

Hard to test, not much value?

You can think of Scraper as doing two main things. It grabs each article and puts it into a suitable folder, and it then does a raft of regex-global replaces on the article to get it properly linked in the new location. There are details but they’re not important.

In the beginning we were just exploring the rat’s nest that is a WordPress export, observing things and testing how to fix them. “OK, here’s a link to a picture. If we leave it where it is, we just have to either change the old site’s URL prefix to the new one, or remove the site link and just use a slash.”

We’d then write a little scan or gsub, add it to our program, and do some more learning. We debugged each replacement until it worked, but we didn’t write tests for it, because we (nearly) couldn’t. Why? Because an article is a huge nest of XML/HTML and we didn’t have and couldn’t see an object of our own to do things.

I blame myself

I know better. We were writing a “Scraper” object, which was a big outer loop that rips the guts out of the WordPress XML and runs gsubs on it until it submits, then writes it out to a file. Come on! An object named “Scraper” is almost as bad as one named “Manager”. Ripping the guts out of a dumb object and pushing other guts back into it is absolutely counter to every object-oriented thing I know.

What you do in that situation is anything at all that gets you an object to talk to. Call it Article. Call it anything. Push all the behavior into it. Observe what the behavior is about. Separate out behavior that’s about something else. Use the behavior to tell you what the objects’ names should really be.

We didn’t do that. We were discovering, and when discovering, I am inclined to tolerate weird procedural code that does stuff and prints stuff. Perhaps I am too tolerant.

We also broke the Fundamental Law of Spikes, which I just made up, namely: “THROW YOUR SPIKES AWAY”. Instead, we built on our experiments, because they were successful and useful.

Anyway, we built a small mound of crap, and it wasn’t tested and wasn’t testable. We knew what we were doing. We went on a little too long, but we’re talking very few working days, though quite a few elapsed days.

So let’s stop

Right. We agree that we need a better structured thing, and that these buzz-saws running over the articles need some testing. This intuition was reinforced when we found one buzz-saw that did the wrong thing in an obscure situation that could never occur but in fact occurred all the time.

So we began anew.

Difficult Pairing

Over the past week or so, Tozier and I had begun to write rspec specs expressing things we thought we had to do. Mine were mostly about priorities:

  describe "musts" do
    it "should put each article in a folder xprog/slug-name" do
    end
  end
I had lots of specifics, mostly listed under prioritizing headings.

Tozier got into that a bit, editing my rspec to include some items we should discuss. All good stuff.

So we started to pair on my list. Here’s the first test we worked on:

    describe "homes for articles" do
      it "should put articles into folders directly under /xprog and named by slug name" do
        factory ='current.xml')
        article = factory.get_article("dbcsimpledesign")
        expect(article.storage_path('xprog')).to eq('xprog/dbcsimpledesign')
      end
    end

As I recall things – which is subject to being mistaken – it took us a long time to agree on that test. We already had the object WpPost (WordPressPost, sorry for the bad name), which I had written over the weekend, based on some tests Tozier had done earlier. Basically I just extracted out an object that wrapped the XML thing we had, with a few accessors and very little behavior. I had finally realized we needed a place to stand to do our work.

So to me, since a major priority is for the articles to go to the right folder, an article should know which folder to go to. So the guts of that test is to ask the article where it plans to store itself and determine that it has made the right decision based on the article slug. (It’s not just copying the input string to the factory get, it’s computing it based on the article’s slug.)

There’s some fiddling and refactoring not shown, of course.

The “big” question at first was how we could get something testable. The WpPost object wraps some hideous thing that comes back from Nokogiri, with all kinds of XML-fetching methods, not unlike jQuery and such. No real brains: it just rips and stitches up XML. So we talked at what seems like “length” about how to create objects we could write sensible tests for. We can’t really create a legal WpPost, because it’s some horrendously complex XML thing, and if we created one there’s no reason to believe it would be like a real article. I was inclined to posit an Article class that the WpPost would have; the Article would contain the article text (still XML/HTML, but less structured) and would know how to transform itself.

Tozier objected that there was no test that required an Article (if I am fairly characterizing his position) and asked how we could justify creating a thing for which there was no need. Was that not speculative? Were we not doing TDD? Did TDD not require us to write only code that is called for by a failing test?

Those things are all true. It is speculative; TDD was what we were doing; TDD says we can’t create an object not called for by a failing test. And my internal response, doubtless spilling out into visible behavior, was “so bloody what?”.

TDD is my servant, not my master

When I use TDD, I use it to build what I want to build. I let the tests and the building tell me what I should want to build. But I decide: TDD does not decide.

If, of a morning, I want an object called Article that does something, I’ll write a test that says something like

  describe Article do
    it "should know its name" do
      article ="Difficult Pairing")
      expect( eq "Difficult Pairing"
    end
  end

Voila! My tests require me to create Article. Letter and spirit of the law are followed. Carry on making Article do something I want.

Now of course, outside this test, there’s no reference in the system to Article. There’s no reason to have written that test. I have in mind that it’ll be useful inside WpPost, but right now, there’s no test on WpPost that requires it. Tozier objects to this.

At lunch, I said “I get it, you want to do TDD top down”. Tozier said no, and went on at some length, after which I said “I get it, you want to do TDD outside-in”, with which he agreed.

Beginning to have a model

Along the way, we spoke of end-to-end, top-to-bottom slices. This begins to help me understand other things Tozier wanted. For example, he wanted a test that actually wrote a file out to the path shown in the test above. I would almost never write such a test. One reason is my not very well known Rule 38: Trust your I/O, and another is that of all the things I’m uncertain about in this application, writing the file out is not one … particularly because we already have a running spike that puts the files where we want them. I’m quite sure that I can figure out how to open a file in the right folder and write stuff on it. So when the object knows where to go, I’m confident that I can make it go there when the time is right.

Why isn’t the current time “right”? Tozier thinks it is. In fact, I think he wouldn’t have written the test to check that the WpPost knew the path until he had a test failing that was trying to get it to write to the path, which would have required it to know the path, etc etc.

It’s something about Acceptance Tests, ATDD, and perhaps the new BDD, which focuses on what look like acceptance tests to me.

Tozier argues that in the context of some overarching big test, we always know where we’re trying to go. I agree with that. However, if we really did that, we’d always have at least two tests failing, the one we were really working on, and the one that was failing for the larger reason. His thought is, yes, well, when the little one works, we just look at the big one and it’ll tell us what’s wrong and we’ll know what to work on next.

I don’t buy it. First of all, I don’t want to have to look at having no green bar for days on end. Second, whatever stupid test we wrote at the beginning (apparently “There must be a file in this folder”) doesn’t inform what I want to do as well as what I just did does. I can get a bloody file in the folder any time I want to. What I can’t do is figure out what the link should be in this particular src="blahblah" right here. Why do I care about that? Because I just got some other src= to work.

This is weird

Tozier’s model makes perfect sense, at least to him, and I can see his point if in fact I begin to understand it. He wants BDD/TDD/whateverDD to maintain more of the “driving”, in the sense that there’s always at least one “larger” test that fails, telling us what to do next. In that mode, if you had the right “larger” tests, you’d never wonder what to do next, the test would tell you. I can see that, and I don’t care.

To me, the design emerges, as from the fog. This bit shows up, gets clarified, then that bit. I trust that it’ll all go together, because I know what we’re trying to do, because I know how to refactor my fuzzy objects to be more like what I need, and because I’m sure I can always come up with the next requirement.

In this case, I’ve got most of them written down already, in specs that are not filled in. And, by the way, they keep printing out in the rspec run and it’s really distracting to scroll all through them to see what I was actually working on. When I figure out how to turn that off, I will, or I’ll comment them out. I don’t need a bunch of hungry tests mewing at me. I’ll feed them when I’m damn well ready.

Conclusion

Well, I don’t have one. I think I have a better understanding of where Tozier is coming from. I don’t know what to do about it but to carry on programming and discussing.

The conflict is interesting and very unlike the difficulties I’ve encountered in pairing before. Were it not obvious that Tozier is smarter than I am, I might attribute it to “he just doesn’t get it”. As it stands, I continue to play my usual card: I will explain myself in different ways until I am understood. I don’t care what you do with what I say, but I do care that I am understood.

Thanks for reading!