Before I report on our initial estimates, here’s the input we got from the world:
Kelly Anderson wrote to remind us that productizing the program could be very time-consuming, and that he was assuming we were just talking about estimating how long it would take to write it. (That was, of course, correct.) He estimated a couple of eight-hour days at the outside to read the file and get the center of mass. (He also said he thought he could do it in four hours, because he has a lot of image processing experience.) Good thing he said that, because that’s about how long it has taken us. He thinks a week for clustering (and says there are algorithms out there to find), and a couple of weeks for programming.
Alan Wostenberg wrote with a unique estimation technique. He counted the squares in my web page’s background covered by each story and used that as the estimate. He also counted the nouns, verbs, and adjectives, and then did conventional abstract points of 1, 2, and 4. He reports that all three techniques tracked pretty well. Story estimation by physical size … I like it!
Andrew Parker wrote with estimates as well. He felt that reading in the data would be difficult, and that much of the analysis wouldn’t be. He was also pretty optimistic about reporting, thinking that there were libraries out there to do it. I guess he was like us in not having found the examples on the web of how to read things in.
Creating Our Estimates
This is what Chet and I came up with before we wrote any code at all.
We talked about how to calculate the center of mass. We don’t know the formula exactly, but it’s clearly some kind of two-dimensional averaging thing, and we figure we can get it from the Internet.
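For the record, the “two-dimensional averaging thing” is just that: if every hit counts equally, the center of mass is the mean x and the mean y of the hits. A minimal Python sketch, assuming the hits arrive as (x, y) pairs in some sheet coordinate system; `center_of_mass` is a hypothetical name, not code from the project:

```python
def center_of_mass(hits):
    """Return (mean_x, mean_y) over a list of (x, y) hit coordinates.

    Every hit is weighted equally, so this is the plain 2-D average.
    """
    if not hits:
        raise ValueError("need at least one hit")
    n = len(hits)
    mean_x = sum(x for x, _ in hits) / n
    mean_y = sum(y for _, y in hits) / n
    return mean_x, mean_y


# Four hits at the corners of a 2x2 square balance at its middle.
print(center_of_mass([(0, 0), (2, 0), (2, 2), (0, 2)]))  # (1.0, 1.0)
```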
Similarly, we can divide the pattern into regions and count the number of holes in each region.
Probably we can do other kinds of calculations. I imagined a kind of radial picture that was dark grey where shot density is high, light where low, with an eye to seeing any asymmetry in the pattern. And so on.
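One way to get the dark-where-dense picture is to scale each region’s hole count against the busiest region and map the result onto a gray level, with 0.0 as black and 1.0 as white (the same convention PostScript’s `setgray` uses). A sketch in Python; the linear scale and the name `density_to_gray` are assumptions for illustration:

```python
def density_to_gray(counts):
    """Map a 2-D grid of hole counts to gray levels in [0.0, 1.0].

    0.0 is black (the densest region), 1.0 is white (no holes).
    The linear scale is an assumption; a stepped scale might read better on paper.
    """
    peak = max((c for row in counts for c in row), default=0)
    if peak == 0:
        # No holes anywhere: everything stays white.
        return [[1.0 for _ in row] for row in counts]
    return [[1.0 - c / peak for c in row] for row in counts]


print(density_to_gray([[0, 2], [1, 4]]))  # [[1.0, 0.5], [0.75, 0.0]]
```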
Uh, the Digitizing?
One issue we talked about early was the digitizing. We thought we might put the pattern sheet (about 4 feet square) against black paper, photograph it digitally, and write a program to find the holes. Then we thought it might be better to have the target be black paper, and put it on a light panel so light shines through from the back for better contrast.
The result will be a JPEG, a GIF, or some other kind of file. I vaguely recall that high-end digital cameras have a format called RAW, which might be interesting. We have no idea what the format of that file will be, though I also vaguely recall that GIF files are a lot like a simple array of colors, with a few header fields about rows and columns and stuff. We can’t possibly estimate the digitizing story.
Except that we can. Whatever the format of the file is, we’ll ultimately get something that amounts to a two-dimensional array of integers representing colors (actually gray scale, as we’ll surely shoot in black and white). Every element will be a pixel. We’ll “just” identify black or white, paper or hole, and there we are: a two-dimensional array of holes in the paper. Chet suggested taking a picture of a piece of paper with just one hole in it, to see if we can find it. I said yes, but take a picture with several holes too, because it won’t take long to find one.
I drew a picture of an array of numbers:
    0100010000
    0101000110
    0001011000
    0010010010
However we do it, and however big the array elements are (8 bits, 24 bits, who knows), there will be low values representing black and high values representing white (or the other way around: we have no real idea how colors are represented, and we don’t even know whether we’ll be shooting black paper or white). Either way, we expect that all the values at one end will be Miss, and at the other end Hit. We expect to look at the file, figure out its format, convert it to a printout roughly like the one above, only more complex, and decide with our eyes where the holes are. Then we’ll code up a program to do the same. It’ll be whatever the opposite of a band-pass filter is called, which I can’t remember right now. Anyway, low values get through, high values get through, middle values don’t.
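A filter that passes the low and high ends and rejects the middle is usually called a band-stop (or band-reject) filter. A minimal Python sketch, assuming 8-bit gray values, black paper on a light panel (so holes are bright), and thresholds `low` and `high` that would have to be picked by eye from the printout; all names here are hypothetical:

```python
def classify_pixel(value, low, high, hole_is_bright=True):
    """Classify one gray value as hole (True), paper (False), or unsure (None).

    Values at or below `low` and at or above `high` pass through the filter;
    anything in between falls in the stop band and is rejected as ambiguous.
    """
    if value <= low:
        is_bright = False
    elif value >= high:
        is_bright = True
    else:
        return None  # too gray to call either way
    return is_bright if hole_is_bright else not is_bright


def classify_image(image, low, high, hole_is_bright=True):
    """Apply classify_pixel to every element of a 2-D list of gray values."""
    return [[classify_pixel(v, low, high, hole_is_bright) for v in row] for row in image]


# Dark paper, two bright holes, and one mid-gray smudge.
image = [[10, 240, 128],
         [5, 250, 30]]
print(classify_image(image, low=64, high=192))
# [[False, True, None], [False, True, False]]
```

Flipping `hole_is_bright` covers the “or the other way around” case, white paper with dark holes, without touching the thresholds.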
My guess is that in one Session¹ we’ll be able to identify the holes in a sheet of paper with fair reliability. I think Chet might think it’ll take two Sessions: he looked a bit uncertain when I said one.
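Identifying holes takes one more step than classifying pixels: a single shot hole will cover several adjacent Hit pixels, and those need to be grouped together. One common way to do that, assumed here rather than taken from our plans, is a flood fill over 4-connected neighbors (up, down, left, right). A Python sketch, run on the little array of numbers drawn earlier; `find_holes` is a hypothetical name:

```python
from collections import deque


def find_holes(grid):
    """Group truthy (Hit) cells of a 2-D grid into holes via 4-connected flood fill.

    Returns a list of holes; each hole is a list of (row, col) pixels.
    Falsy cells (Miss, or None for "unsure") are never part of a hole.
    """
    if not grid:
        return []
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    holes = []
    for r in range(rows):
        for c in range(cols):
            if not grid[r][c] or seen[r][c]:
                continue
            # Start a new hole here and collect every Hit pixel touching it.
            hole = []
            queue = deque([(r, c)])
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                hole.append((y, x))
                for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < rows and 0 <= nx < cols and grid[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            holes.append(hole)
    return holes


sketch = ["0100010000",
          "0101000110",
          "0001011000",
          "0010010010"]
grid = [[ch == "1" for ch in row] for row in sketch]
holes = find_holes(grid)
print(len(holes))  # 7
```

With 4-connectivity that array holds seven holes; counting diagonal neighbors too (8-connectivity) would merge two pairs and give five. That is exactly the kind of choice the one-hole and several-hole test photographs should settle.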
We talked a bit about the other stories. The center of gravity thing is interesting, but it is surely a simple calculation based on the Hits’ x,y coordinates. (Note to self: we didn’t talk about it, but converting to a radial chart might make this easier. Elementary trig, none of which I remember, but it must be on the Internet. x = r cos theta, y = r sin theta. Hmm. r is the obvious sqrt(x*x + y*y), and theta is arctan(y/x), or atan2(y, x) if we want the quadrant to come out right. I bet that’s it. Anyway, that’s easy.)
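The radial conversion is a few lines in most languages. A Python sketch using the standard `math.hypot` and `math.atan2`; measuring around an arbitrary center `(cx, cy)`, such as the center of mass or the point of aim, is an assumption, and the function names are hypothetical:

```python
import math


def to_polar(x, y, cx=0.0, cy=0.0):
    """Convert a hit at (x, y) to (r, theta) around the center (cx, cy).

    theta is in radians, counterclockwise from the +x axis, in (-pi, pi].
    """
    dx, dy = x - cx, y - cy
    r = math.hypot(dx, dy)      # same as sqrt(dx*dx + dy*dy)
    theta = math.atan2(dy, dx)  # note the argument order: y first
    return r, theta


def from_polar(r, theta, cx=0.0, cy=0.0):
    """Inverse: x = r cos theta, y = r sin theta, shifted back by the center."""
    return cx + r * math.cos(theta), cy + r * math.sin(theta)


# A hit straight left of center: atan2 says pi, where arctan(0 / -1) would say 0.
print(to_polar(-1, 0))  # (1.0, 3.141592653589793)
```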
Densities in various segments of the chart: Easy. Partition the coordinates, map to the sheet array, count the holes.
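A sketch of “partition the coordinates, count the holes,” assuming equal rectangular cells, hit coordinates in the same units as the sheet size (inches, say), and the origin at one corner of the sheet. Radial sectors would be the other obvious partition. `count_by_region` is a hypothetical name:

```python
def count_by_region(hits, width, height, nx, ny):
    """Count hits in an ny-by-nx grid of equal cells covering a width x height sheet.

    Returns counts[row][col]; hits that fall off the sheet are ignored.
    """
    counts = [[0] * nx for _ in range(ny)]
    for x, y in hits:
        if not (0 <= x < width and 0 <= y < height):
            continue
        # Scale each coordinate to a cell index; min() guards against rounding up to nx.
        col = min(int(x * nx / width), nx - 1)
        row = min(int(y * ny / height), ny - 1)
        counts[row][col] += 1
    return counts


hits = [(0.5, 0.5), (1.5, 0.5), (3.9, 3.9), (2.0, 2.0), (5.0, 1.0)]
print(count_by_region(hits, width=4, height=4, nx=2, ny=2))  # [[2, 0], [0, 2]]
```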
Statistics of all kinds should be easy. They’ll come down to a Session each after we do the first couple, which might take a little longer.
Graphical output. Now that one is interesting. We’re picturing a printed multi-sheet report on your gun’s pattern, put in a nice folder of some kind, justifying the incredible expense of the whole process. Much of each sheet will be graphical, or partly graphical, displays of the target sheet as analyzed from various viewpoints.
We didn’t talk about it, but it seems clear that the output will have to be PDF, or something very like it. We could possibly produce JPEGs or GIFs and put the output in an HTML page, but we want to get it neatly on paper somehow. Can Crystal Reports or some tool like that help us? No idea.
We did produce a nice-looking check in PostScript for the C3 project.² I have no real recollection of the details, but we made it draw nice lines around the check and even put a logo on it, if I recall. Basically, how I’d approach it (not yet discussed with Chet, as I am trapped in Adrian, Michigan) is to lay the report out as a series of boxes, each filled in with the script for whatever text or picture goes in there. It would just be a matter of figuring out the basic layout and coding up each of the pictures or paragraphs.
How hard is that? PostScript is a language, stack-based and postfix, more Forth-like than LISP-like, but whatever it is, it’ll look like a bunch of “Go to this location; print this stuff; Go to that location; print that stuff”.
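The “go here, print this” style is easy to generate from a program. A minimal Python sketch, assuming a page laid out as captioned boxes; `box_page`, `ps_string`, and the box tuple format are hypothetical, while the PostScript itself uses only standard operators (`findfont`, `scalefont`, `setfont`, `moveto`, `rlineto`, `closepath`, `stroke`, `show`, `showpage`). Coordinates are in points (1/72 inch) from the bottom-left corner of the page.

```python
def ps_string(text):
    """Quote text as a PostScript string literal, escaping backslash and parentheses."""
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return f"({escaped})"


def box_page(boxes, font_size=10):
    """Return a one-page PostScript program that draws captioned boxes.

    Each box is (x, y, width, height, caption) in points, origin at bottom-left.
    """
    lines = ["%!PS", f"/Helvetica findfont {font_size} scalefont setfont"]
    for x, y, w, h, caption in boxes:
        # Go to one corner, then outline the box with relative moves.
        lines.append(f"newpath {x} {y} moveto {w} 0 rlineto 0 {h} rlineto "
                     f"{-w} 0 rlineto closepath stroke")
        # Go to a spot just inside the top-left corner and print the caption.
        lines.append(f"{x + 6} {y + h - font_size - 4} moveto {ps_string(caption)} show")
    lines.append("showpage")
    return "\n".join(lines) + "\n"


page = box_page([(72, 560, 220, 160, "Center of mass"),
                 (320, 560, 220, 160, "Density (holes per region)")])
print(page)
```

Saving the string to a `.ps` file should give something a PostScript viewer or printer can render; the graphs themselves would be more of the same, just more moves and lines inside each box.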
How long will it take? The reports will be easy once we have the basics in place, surely not more than a Session each. Graphs have two parts: calculating what’s in the graph, and displaying it on the page. Gray scale will be somewhat interesting, but we’ll negotiate that when we know more. Once we have the technique in hand, I’m thinking two Sessions, maybe one, per graph. One for the math, one for the picture.
Initially, we’ll have to do some work to decide whether it’s PostScript, or Word with JPEGs embedded in tables, or HTML, or what. One or two Sessions to try each approach, plus, I don’t know, wish Chet were here, four or five sessions to get the basic layout in place for the approach we pick.
I’d propose ten sessions (a lot: that would be half a work week for normal people) to pick an approach and spike it. Assume two sessions per graph you think of, and figure that the first couple might run over, but we’ll wait and see.
A half hour or so of talking, a little thinking on the uncomfortable couch I’m sleeping on in Adrian, and an hour to write this up. Where are we so far? Let’s sum up. I’ll wait to get together with Chet for a real estimate, leaving him this much to read before we get together.
Here are my numbers. These are unofficial until the whole team looks at them:
- Digitizing. A couple of sessions to get the holes. Possibly a little more to clean up, detecting artifacts in the picture, who knows. We don't have stories for that yet.
- Statistics. Two sessions for the first one, one thereafter.
- Reporting. Ten sessions to choose a technology and build the basic report layout.
- Graphs. Two sessions each.
- Text Reports. One or two sessions each, depending on complexity. Probably one.
- Photography. A few sessions, trying white paper, black paper, light behind, etc.
So. Assume two sheets of paper, nine squares of output on each, some text, some graphical. Eighteen squares at two sessions each: thirty-six sessions. Ten for the general reporting spike and layout. Call it four sessions for digitizing, to be conservative. Add two sessions for the first statistic, probably center of gravity.
Fifty-two sessions. At two to three hours a session, that’s one hundred to one hundred fifty pair hours for what is described here, including more or less arbitrary reports and graphs. It doesn’t really consider gray scale or color, which will take a few sessions to learn about and then probably add very little to the individual graph or report estimates.
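The tally is simple enough to check mechanically. A small Python sketch using only the numbers stated above (two sheets of nine squares at two sessions each, ten for the reporting spike, four for digitizing, two for the first statistic) and the two-to-three-hour session length from the footnote; the function and parameter names are made up for illustration. Like the prose tally, it leaves out the photography sessions.

```python
# Session length from the footnote: about two hours, three at the outside.
HOURS_PER_SESSION = (2, 3)


def estimate_sessions(sheets=2, squares_per_sheet=9, sessions_per_square=2,
                      reporting_spike=10, digitizing=4, first_statistic=2):
    """Add up the session estimate exactly as tallied in the text."""
    squares = sheets * squares_per_sheet
    return squares * sessions_per_square + reporting_spike + digitizing + first_statistic


sessions = estimate_sessions()
low = sessions * HOURS_PER_SESSION[0]
high = sessions * HOURS_PER_SESSION[1]
print(sessions, low, high)  # 52 104 156
```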
This assumes digitizing, plus 18 individual reports, a mix of text and graphics. Additional stories will need to be estimated of course.
My cut at this much: four or five pair weeks for two programmers who aren’t writing up the thing as a book.
Are We Crazy?
We’ll leave that to our readers. We surely haven’t considered a million things, including the architecture, design, or even the choice of programming language. Which ones put chills up and down your spine? What have we forgotten that could turn this one-month project into two, three, or more?
Please write us (ronjeffries at acm dot org, chet at hendricksonxp dot com, subject [ron] Simple Design) and tell us your concerns. We’ll write up the questions and, as we go forward, keep track of what happens. We’re especially interested in what you see that makes these estimates crazy, what we’ve skipped that you think we should have done, and what would have prevented you from estimating as we did, in a couple of hours.
¹ Session: The amount of work Chet and Ron can get done in a morning, before breaking for lunch around 11. A session is about two hours, three at the outside.
² C3: Apologies for mentioning this project, but it happens to be where we did the PostScript thing.