Wherein we consider some advice given to GeePaw Hill at a recent FGNO session.

Last Tuesday evening, at the Friday Geeks Night Out (FGNO) zoom meeting, my brother GeePaw Hill was showing us the very ambitious curriculum management idea he is working up for a group he is working with. One of the aspects of this thing, working name “curry” (don’t blame me, I didn’t name it), is that there are a few hundred pages of rich text content to manage. This will doubtless turn into a few thousand. Each page will have various identification tags associated with it. We can imagine some of them: name (slug), author, date, topic, status (working, released), and so on. We cannot imagine all of them: we’re sure that the tags will evolve as the curriculum evolves.

There are various views of this curriculum, including a view for each student’s path through and position in the curriculum, and so on and so on.

The question that Hill asked was something like “what’s a good key-value store for these tagged pages?” I think someone not on the zoom had already suggested something like an S3 instance, or a Lambda instance. The FGNO group did not agree.

Led primarily by Chet and me, we hammered, er, advised a much simpler solution. I think my most telling words on the subject may have been “how many characters are there in a file name?”

Just use files!?!? What?!?!

Yes. The solution we were proposing was to store the pages in a file folder (somewhere) with the tags all encoded in some trivial way into the file name. Something like slug-refactoring_level-1_date-20240126T073905Z_status-released, with the contents being that version of the text.
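To make "encoded in some trivial way" concrete, here is a minimal sketch in Python. It is only one guess at an encoding, read off the example name above: tag and value joined by a hyphen, pairs joined by underscores. The function names encode_tags and decode_tags are made up for illustration; nothing like them exists in curry as far as I know.

```python
def encode_tags(tags: dict[str, str]) -> str:
    """Join tag-value pairs into one file name: name-value_name-value_..."""
    for name, value in tags.items():
        # Assumed rule: "-" splits name from value and "_" splits pairs,
        # so names may contain neither and values may not contain "_".
        if "-" in name or "_" in name or "_" in value:
            raise ValueError(f"tag {name!r} would make the file name ambiguous")
    return "_".join(f"{name}-{value}" for name, value in tags.items())


def decode_tags(file_name: str) -> dict[str, str]:
    """Recover the tags from a file name produced by encode_tags."""
    # Split each pair on its first hyphen only, so values may contain hyphens.
    pairs = (part.split("-", 1) for part in file_name.split("_"))
    return {name: value for name, value in pairs}


page_tags = {
    "slug": "refactoring",
    "level": "1",
    "date": "20240126T073905Z",
    "status": "released",
}
file_name = encode_tags(page_tags)
print(file_name)  # slug-refactoring_level-1_date-20240126T073905Z_status-released
print(decode_tags(file_name) == page_tags)  # True
```

The separators are the whole design here, and they are the first thing that would need rethinking if a tag value ever wanted an underscore in it.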

I will not be offended if you consider that idea to be absurd. It will not surprise me if you’re wanting to tell me about how easy it is to set up (S3, Lambda, Git, MySQL, NoSQL, …) which would be just right for this problem. And it will not surprise me if you want to tell me how ridiculous the idea is of storing this information in a bunch of files in a folder.

I won’t even disagree with the last part, although I will at least raise an eyebrow at the notion that your favorite solution is actually easy to set up and maintain. But we’re here to consider this weird idea, and why we offered it.

Our brother Hill has a tendency to think really hard about things. He freely grants that one of his vices is over-thinking things, and the fact is that he’s incredibly smart and can keep a lot of ideas in his head and can manage an amazing amount of code as he works. If I were that good at thinking, I’d do more of it.

But I am not. In fact, I choose not to be. I’ve spent much of the past quarter-century trying to see how simple I can make things. I’m still not very good at it: things get away from me quite often, and I, too, enjoy a really tasty complicated idea. (I do not, however, enjoy installing software and trying to make it work, which helps me avoid just tossing a database into the mix every time I have to save a bit of information.)

Along the way to our possibly ludicrous save-it-all-in-a-folder solution, we asked “does your language have a dictionary?”, because of course a key-value store is pretty much the same as a dictionary, or map, as it is sometimes called.

The point of our hammering was to help the team (Hill) focus on the core issues around storing the pages, in particular, the keys, and not to focus so much on the many details and perfectly valid concerns like versioning and such.

A Little Story
Some years ago, when Borders was still in business, Chet and I were working on some program in Ruby. It might have been our ill-fated shotgun pattern idea: I don’t really recall. We were working incrementally (did you know that everyone always works incrementally? It’s true!) and we needed a code manager to keep our code in. It might be that this predated Git, but there were Subversion and other things out there.

We wrote a tiny program named cm. It watched a single folder on whichever machine we were using, and whenever a file foo changed in that folder, cm wrote a copy of that file to another folder, with a name like foo_20240126T075831. The name and time stamp.

We were done with code management. We rarely ever go back to look at prior versions: we tend to program forward from wherever we are. But a few times we did pull back the most recent version as a form of undo. And I do mean just a few times: we tend to program forward from wherever we are, as I may have recently mentioned.

The folder filled up with more and more files. And no one cared, because even then we had a lot of storage on our laptops, compared to the size of even a moderately complicated Ruby program. Today, on my laptop, I have practically nothing left: 806.67 GB available of 994.66 GB. Good grief, people, I have 16 GB of main memory! And you want to store your silly program on a database owned by Microsoft or Amazon? Are you mad?

But I digress. We saved our code in files with simple tags as their names.
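For the curious, the cm idea from the story fits in a few lines. This is a sketch in Python, not the Ruby we actually wrote, which I no longer have. The polling loop, the one-second interval, and the names snapshot_changes and watch are all assumptions made up for illustration.

```python
import shutil
import time
from datetime import datetime, timezone
from pathlib import Path


def snapshot_changes(watched: Path, archive: Path, seen: dict[str, int]) -> list[Path]:
    """Copy every file whose modification time differs from the one last seen."""
    archive.mkdir(parents=True, exist_ok=True)
    copies = []
    for path in sorted(watched.iterdir()):
        if not path.is_file():
            continue
        mtime = path.stat().st_mtime_ns
        if seen.get(path.name) == mtime:
            continue  # unchanged since we last looked
        seen[path.name] = mtime
        # Name and time stamp, e.g. foo_20240126T075831.
        # Two saves within the same second overwrite the same copy.
        stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%S")
        target = archive / f"{path.name}_{stamp}"
        shutil.copy2(path, target)
        copies.append(target)
    return copies


def watch(watched: Path, archive: Path, interval_seconds: float = 1.0) -> None:
    """Poll the folder forever; the first pass copies every file once."""
    seen: dict[str, int] = {}
    while True:
        snapshot_changes(watched, archive, seen)
        time.sleep(interval_seconds)
```

Calling watch(Path("src"), Path("versions")) would run until interrupted. Getting the most recent version back is a matter of sorting the archive names for a given file and taking the last one, since the time stamps sort as text.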

So we wanted Hill to have the opportunity to think about what matters with his pages, namely the meta-information (tags) associated with it, not with whether to use S3 or Lambda to store it. We knew that with some stupidly simple idea in mind like “file name leads to page”, he would generate a sensible interface between the program, which would think in curriculum terms like author and student and version and subject and status, and the information store, which thinks in terms of file names. In between there, he would probably wind up with a nice abstraction of a key as a collection of tag names and tag values, pointing to a file containing the rich text for the page.
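Here is a rough sketch, in Python, of what that in-between abstraction might look like. To be clear, this is my guess, not Hill's design: PageKey, PageStore, and their methods are hypothetical names, the file name encoding is the hyphen-and-underscore one from the example earlier, and the sketch assumes the folder contains nothing but page files.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class PageKey:
    """A key is just an ordered collection of (tag name, tag value) pairs."""
    tags: tuple[tuple[str, str], ...]

    @classmethod
    def of(cls, **tags: str) -> "PageKey":
        return cls(tuple(tags.items()))

    @classmethod
    def from_file_name(cls, file_name: str) -> "PageKey":
        pairs = (part.split("-", 1) for part in file_name.split("_"))
        return cls(tuple((name, value) for name, value in pairs))

    def file_name(self) -> str:
        return "_".join(f"{name}-{value}" for name, value in self.tags)

    def matches(self, **wanted: str) -> bool:
        """True when every wanted tag is present with the wanted value."""
        have = dict(self.tags)
        return all(have.get(name) == value for name, value in wanted.items())


class PageStore:
    """The only part of the program that knows pages live in files."""

    def __init__(self, folder: Path) -> None:
        self.folder = folder
        folder.mkdir(parents=True, exist_ok=True)

    def save(self, key: PageKey, text: str) -> None:
        (self.folder / key.file_name()).write_text(text, encoding="utf-8")

    def load(self, key: PageKey) -> str:
        return (self.folder / key.file_name()).read_text(encoding="utf-8")

    def find(self, **wanted: str) -> list[PageKey]:
        # Assumes every file in the folder is a page named by its tags.
        keys = (PageKey.from_file_name(p.name)
                for p in self.folder.iterdir() if p.is_file())
        return sorted((k for k in keys if k.matches(**wanted)),
                      key=PageKey.file_name)
```

The curriculum side would say things like store.find(status="released") and never see a file name. If the folder someday turns into S3 or a database, only PageStore should have to change, which is rather the point.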

Are you even serious?

Well, yes, I pretty much am. Kent Beck used to ask us “what’s the simplest thing that could possibly work?” I translated that, for myself, to “try the simplest thing that could possibly work”. And it has served me well. Often that simplest thing will bear the weight of the entire problem. Other times it needs to be extended. Once, I actually put something on a remote MySQL database. But always, taking the big solution out of consideration left more room to deal with the important issues, which are generally what the data and its keys want to look like to the important side of the program, the part that presents a sensible picture to the users.

I think what I might do in upcoming days is to actually build a little example of this kind of thinking. Maybe I’ll keep it free-standing, or maybe I’ll build it into the Asteroids-Invaders game. Surely we need a database of high scores …

But yes, we seriously did suggest tags as file names and pages as data, and yes, we really meant it. I hope you can see why, and that it wasn’t quite as insane as it may have sounded. Either way, I’d like to hear from you. I don’t use Xitter any more, but you can find me on Mastodon.

See you next time!