Herein, some comments on what I’ve learned playing with Wordle.
Table of Contents
- Fast Tests Matter
- Always Create Specialized Objects
- Tests are Great for Learning
- Learning = Practice
The `concurrent` module makes using multiple cores straightforward, and seems to work quite well.
Moving large data structures back and forth between processes is costly and eliminates much of the benefit.
Jason Brownlee’s site has lots of useful free information on threading and multiprocessing in Python.
With the `concurrent` library, a one-line change to the program allowed it to use multiple cores on my M1 MacBook Air. Rather than reduce the time by a factor of 8, which one would have hoped for, since I allegedly have 8 cores, I gained more like a factor of two. I’m not sure what was going on, but in my initial tests, I attributed the less than stellar results to the overhead of starting and stopping the processes. That includes “pickling”, which is the serialization process that Python uses to pass objects to and from processes.
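A minimal sketch of what such a one-line change might look like, assuming `ProcessPoolExecutor` from `concurrent.futures`. The scoring function `distinct_letters` and the `executor_class` parameter are hypothetical stand-ins, not the program's actual code.

```python
from concurrent.futures import ProcessPoolExecutor


def distinct_letters(word):
    """Hypothetical stand-in for the real per-word scoring work."""
    return len(set(word))


def score_serial(words):
    # Baseline: one core, using the built-in map.
    return list(map(distinct_letters, words))


def score_parallel(words, executor_class=ProcessPoolExecutor, chunksize=500):
    # The "one-line change": executor.map replaces the built-in map.
    # With processes, arguments and results are pickled on the way
    # to and from the workers.
    with executor_class() as executor:
        return list(executor.map(distinct_letters, words, chunksize=chunksize))
```

On platforms that start workers by launching a fresh interpreter (macOS and Windows by default), the call to `score_parallel` needs to sit under an `if __name__ == "__main__":` guard. Passing `ThreadPoolExecutor` as `executor_class` is a cheap way to check the plumbing without starting any processes.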
Since the process we started with, building the SolutionDictionary, is absolutely CPU bound, I’m fairly convinced that the issue is overhead of some kind.
Yesterday, I tried a streamlined approach to calculating just the expected information from each guess, passing just lists of strings back and forth. That was much faster than calculating the SolutionDictionary, mostly because there’s just less work to do. In that experiment, parallel processing again saved me about a factor of two.
Then I did a quick experiment measuring the pickling time for the data being passed back and forth to the processes. That time was quite small, and has me convinced that pickling is not the bottleneck. The time isn’t very sensitive to chunksize, which also suggests that pickling isn’t a big issue with the current test.
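A quick measurement of that kind might be sketched as follows. The payload is made up for illustration, and this times only serialization and deserialization, not the transfer between processes.

```python
import pickle
import time


def pickle_round_trip(payload):
    """Return (copy, seconds) for one dumps/loads round trip of payload."""
    start = time.perf_counter()
    data = pickle.dumps(payload)
    copy = pickle.loads(data)
    elapsed = time.perf_counter() - start
    return copy, elapsed


# Hypothetical payload: a list of five-letter strings, similar in shape
# to the lists of strings passed back and forth.
words = ["crane", "slate", "abbey"] * 1000
copy, seconds = pickle_round_trip(words)
print(f"round trip of {len(words)} words took {seconds:.6f} seconds")
```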
I can think of some more experiments to run. We could measure processes that essentially do nothing, to get a sense of the overhead. We could pass them more or less information, to get a sense of the impact of data transfer. We could create longer processes to see if the parallelization ratio improves.
For now, I think we’ll defer any more experimentation with parallelism. We have enough of a taste of it to be able to revisit it if and when we work on something that would benefit from multiple cores.
`pickle` serialization is easy to use and rapidly creates compact serialized data structures.
Pickle could be a very convenient way of saving large, slow-to-compute data structures to files.
When underlying objects change, Pickle files (of course) become obsolete and will generally need to be rebuilt.
Generally speaking, I have little need for saving much information to files, so this capability may be more useful to others than it is for me. Still, it’s good to know that it’s there to be used when I need it.
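Saving a slow-to-compute structure to a file could look something like this sketch. The helper name `load_or_build` and its arguments are assumptions for illustration, not anything from the Wordle program.

```python
import pickle
from pathlib import Path


def load_or_build(path, build):
    """Load a pickled object from path, or build it and save it if absent."""
    path = Path(path)
    if path.exists():
        with path.open("rb") as f:
            return pickle.load(f)
    result = build()  # the slow computation happens only on a cache miss
    with path.open("wb") as f:
        pickle.dump(result, f)
    return result
```

When the underlying classes change, deleting the file forces a rebuild on the next call. Pickle files should only be loaded from sources you trust, since unpickling can execute arbitrary code.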
Fast Tests Matter
- When my unit tests take even 15 seconds to run, it’s too damn long.
I have PyCharm rigged to run my tests every time I pause typing. If there are no fatal syntax errors, they just run. I generally pay no attention to them until I reach a point where I think some new test should pass, but they are harmless running all the time. Occasionally, I’ll notice that they’ve gone more red than I expect, but usually I ignore them until I’m ready.
When I’m ready, I can just glance down and in a tiny span of time, the new test results come up, and I know whether I’m good to go or I’ve broken something. The latter often happens. Gremlins, I think.
But while working on Wordle, I had some tests that took 15 or even 30 seconds. Fifteen seconds is a long time to wait to find out whether some new test works. So I developed the habit of marking the slower tests `skip`, to the point where right now there are 45 passing tests and 10 skipped ones. That would be fine, except sometimes, one of the skipped tests is the only one that will find an error.
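For readers who have not used skip markers, here is a small sketch using the standard library's `unittest`; pytest offers a similar `@pytest.mark.skip` decorator. The test names and data are hypothetical.

```python
import unittest


class WordleTests(unittest.TestCase):
    def test_small_sample_is_fast(self):
        # Hypothetical fast test using a tiny word list.
        words = ["abbey", "crane", "sleet"]
        self.assertEqual(3, len(words))

    @unittest.skip("slow: uses the full word list")
    def test_full_list_is_slow(self):
        # Never executes while the skip marker is in place, so any
        # error that only this test would catch goes unseen.
        self.fail("this body does not run")


def run_wordle_tests():
    # Run the tests programmatically and return the collected result.
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(WordleTests)
    result = unittest.TestResult()
    suite.run(result)
    return result
```

The run reports success with one test skipped, which is exactly the problem: the skipped test's failure stays invisible.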
Yes, that argues for a smaller test, perhaps using less data, that also finds the problem. But that requires a different way of working, where I have to notice that a test is slow and instead of just marking it `skip`, I have to think about whether I should divert from whatever I’m doing to write a smaller test. First of all, I’m not going to think about that: I have some other test and code filling my mind buffers, and second, even if I do think of it, I’m not going to do it, because I have that other stuff on my mind.
So far, the learning is just that long-running tests are far less useful to me than blindingly fast ones. If I were a better person, and who knows, at some future time I might be, I would be more careful always to have really fast tests for everything, so that turning off long-running tests is more harmless.
It could happen.
Always Create Specialized Objects
- Learning (for the eleven-millionth time):
- Avoid using native collections of native objects.
Always create my own collections and small objects covering the native ones.
Be aware: in Python this can be costly.
This program has a dictionary that points to a dictionary that points to a dictionary of lists. At least I think that’s what it is. After more confusion than I really needed, I now have some objects that cover the native objects, including Word, WordCollection, SolutionDictionary, ScoreDescription, GuessDescription, and Statistic.
Those objects stick in my mind a bit better, and they partition the work that has to be done so that I can more readily find where something is and where changes should be made.
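As a rough illustration of covering native objects, here is a sketch of `Word` and `WordCollection`. The class names come from the text, but every method shown, including `from_strings` and `containing`, is an assumption rather than the program's actual design.

```python
class Word:
    """Covers a plain string so word-specific behavior has a home."""

    def __init__(self, text):
        self.text = text

    def __eq__(self, other):
        return isinstance(other, Word) and self.text == other.text

    def __hash__(self):
        return hash(self.text)

    def __repr__(self):
        return f"Word({self.text!r})"


class WordCollection:
    """Covers a plain list of Word instances."""

    def __init__(self, words=()):
        self._words = list(words)

    @classmethod
    def from_strings(cls, strings):
        return cls(Word(s) for s in strings)

    def __len__(self):
        return len(self._words)

    def __iter__(self):
        return iter(self._words)

    def containing(self, letter):
        # Hypothetical query: the words that contain the given letter.
        return WordCollection(w for w in self._words if letter in w.text)
```

A question like "which words contain this letter" now has one obvious place to live, instead of being a loop repeated wherever a raw list happens to be in hand.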
However, when helping Ken optimize his code, I suggested caching a member variable fetch in a temp, outside a loop, and changing the code inside to access the temp. It sped up the loop by a factor of two. Why? Because Python looks up the member at run time, every time through the loop. The standard interpreter compiles to bytecode, but it does not hoist that repeated lookup out of the loop for us. So if speed is an issue (which it rarely is), we need to be a bit careful creating objects.
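The change amounts to something like the following sketch. The `Scorer` class and its `bonus` attribute are invented for illustration; Ken's code is not shown in this article.

```python
class Scorer:
    def __init__(self, bonus):
        self.bonus = bonus

    def total_with_lookup(self, values):
        total = 0
        for value in values:
            total += value + self.bonus  # attribute fetched on every pass
        return total

    def total_with_temp(self, values):
        bonus = self.bonus  # fetched once, outside the loop
        total = 0
        for value in values:
            total += value + bonus  # local variable access only
        return total
```

Both methods return the same result. How much faster the second one runs depends on the Python version and on how much other work the loop body does, so it is worth timing with `timeit` before committing to the change.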
I find that, for me, creating them works better than not. One case, in someone else’s code, where removing an access helped isn’t a good argument against objects.
I learn this lesson every day, because every day it seems easier to just create a list and move on. Quite often, I get away with it. Also quite often, it gets me in trouble. I keep trying to create more little objects and rarely if ever regret it.
I think there is a Python module for creating simple data record kinds of things, and I mean to look it up and try it.
Perhaps someday I will do that. This, too, could happen.
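The module in question may well be the standard library's `dataclasses`, though that is a guess at what is meant here. A minimal sketch, with a hypothetical record type and made-up values:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GuessRecord:
    """Hypothetical record; the field names are illustrative only."""
    guess: str
    expected_information: float


# The decorator generates __init__, __repr__, and __eq__ for us.
first = GuessRecord("crane", 5.5)
second = GuessRecord("crane", 5.5)
print(first)
```

With `frozen=True`, assigning to a field after construction raises an error, which suits small value objects like these.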
Tests are Great for Learning
- Learning (also for the umpteenth time):
- Writing little tests to try language features or calculations really helps me.
Python has lots of nooks and crannies, odd features like generators and comprehensions, and it’s not always clear just how they work. I find it useful to write a test to check them out rather than just trying things at a REPL prompt. I find that the extra time it takes to write the test causes me to reason a bit about the idea I’m testing, and that drills it into my head a bit more than just typing something in the REPL and moving on.
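Learning tests of this kind can be very small. The sketch below checks a few facts about generators and comprehensions; the function names are invented for illustration, and the tests are written as plain functions with bare asserts so a runner such as pytest can pick them up.

```python
def squares(limit):
    # A generator function: the body runs lazily, one yield at a time.
    for n in range(limit):
        yield n * n


def test_generator_is_exhausted_after_one_pass():
    gen = squares(4)
    assert list(gen) == [0, 1, 4, 9]
    assert list(gen) == []  # a second pass yields nothing


def test_comprehension_with_filter():
    evens = [n for n in range(10) if n % 2 == 0]
    assert evens == [0, 2, 4, 6, 8]


def test_generator_expression_is_lazy():
    gen = (n * n for n in range(3))
    assert next(gen) == 0
    assert next(gen) == 1
```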
Once in a while, I look back at those tests, so I save them. Ideally, I save them in a separate test file from other tests, both to keep them out of the way and to make them accessible if and when I want to check something again.
Learning = Practice
It should be clear to frequent readers that I seem to need to learn some things over and over again. Frankly, I think that is natural and appropriate. I should probably rename those ideas in terms of practice. Musicians practice. Athletes practice. Actors practice. And so on. In all those cases, the purpose of practice is to make the activity more natural, more automatic, more built in to the practitioner’s mind and body.
When we program, we are making innumerable decisions all the time. We do not, and in my opinion cannot, actually stop and reason about every decision we make. We need to reflexively choose the better things to do, reflexively avoid the poorer things to do. I do that by thinking about what I’ve just done, looking back over a few minutes, a programming session, a few days, or a whole project, to see what I’ve learned.
The most effective of those reflective times, for me, seem to be the shorter-interval ones. Yes, sometimes a longer period brings out important larger lessons, or hammers home what should be a short lesson, but often reflecting immediately, and correcting immediately, pays off immediately as well as over the longer term.
When I was doing T’ai Chi (oh how I miss that), we would often practice a short series of moves, even stances, over and over. Our instructor would often walk up to someone and adjust their hand position just a little bit. Then do it again, and again, and again.
Practice is one of the most powerful ways of learning, and embedding that learning deeply.
Or so it seems to me.