Python Wordle on GitHub

I think yesterday we were limited by pickling. Who knew? First try: Debacle. Second try: Can’t work, don’t even try.

I gather from my reading that Python’s concurrent module passes data to processes via “pickling”, which comes down to packing up a byte format of some kind and unpacking it on the other end. Since yesterday’s concurrency experiment was passing big chunks of two datasets, that probably explains why it only worked with smallish chunks, and why it never pegged the cores like my primes experiment did.

So today, I plan to go a different way.

Actually, I just came up with a different different way, while typing the previous sentence. The human mind, if I may so style mine, is a strange and wonderful thing. I now have two ideas:

  1. The one I dreamed up while not quite getting up: read the data needed in each thread and then process it. The reading is quite fast and the results should be interesting;
  2. The data we use is read-only. What would python do if we were to put it in globals somehow?

There is more than one way to spawn processes with concurrent. The convenient one is with executor.map, which magically slices and dices input iterables and passes the slices to separate processes. There is a less convenient way, submit, where you pass in a function to be executed. In that case, you would submit as many processes as you wanted, 8 in our case.

I would prefer to use the map solution: it seems more clean to me. And what I like about the existing experimental solution is that the working method doesn’t even know it might be called in parallel:

    def create_dict(self, guesses, solutions, concurrent=False):
        if concurrent:
            with ProcessPoolExecutor(8) as executor:
                # chunk_size = ceil(len(guesses)/8)
                guess_descriptions = executor.map(self.guess_description, guesses, repeat(solutions), chunksize=500)
        else:
            guess_descriptions = map(self.guess_description, guesses, repeat(solutions))
        return {desc.guess_word: desc for desc in guess_descriptions}

    def guess_description(self, guess, solutions):
        guess_desc = GuessDescription(guess)
        for solution in solutions:
            score = guess.score(solution)
            guess_desc.add_word(score, solution)
        return guess_desc

The working method, guess_description, just processes one guess, and will be called repeatedly, by either a normal map or executor.map.

In today’s scheme, what I would like to do is to use the map approach, but instead of passing in a collection of guesses, simply pass in a short range of values that will tell a new mapping function which elements to slice from the list. I have a vague idea for what will happen. Let’s write a test.

TIL
Today I learned that a iterator, such as a generator or map, can only be used once in Python. This test passes:
    def test_iterator_once_only(self):
        result = map(lambda i: i*2,  range(3))
        vals = list(result)
        assert vals == [0, 2, 4]
        again = list(result)
        assert again == []
Debacle
I started about 0900. It is now 1300. I have thrashed and thrashed and still don’t have anything good. I have erased all the thrashing and will start again either later today, or tomorrow. Most likely tomorrow.

I’ve made some additions and changes to the code, but it’ll turn up in what we do later, if it’s germane.

Off for a break. See you next time.


Later that day …

Clearly what I can build in parallel is lists of guess descriptions, given slices of the guess words and all the solution words. However, those lists will be long, so even if I do read the list in the process, the return values will have to be pickled and unpickled. That will be costly. If we do eight slices there will be about 1500 words in each slice, and thus 1500 guess descriptions, each one of which includes a list of around a few hundred words.

This approach is feasless1. Belay the whole idea.

Darn.


  1. No, but it should be.