Last week, I wrote about our zip-code-specific COVID-19 data and a little bit about how that data helps, and to some degree doesn’t help, us to assess our situation. I mentioned chaos theory, and I thought that it might be enjoyable to talk about the origins of chaos theory and how it can help us to understand the limitations of data.
Chaos theory is based on the idea that in a closed system with multiple interacting variables, small anomalies can (and eventually will) magnify until the results defy predictability. This may not sound revolutionary, but it is a distinct break from the Newtonian approach to science, which draws generalizations and conclusions from the idea that small differences in such systems will tend to disappear over time.
It all started in 1961, when something surprising happened to Edward Lorenz, a meteorologist working on computerized weather models. His model used twelve variables, such as barometric pressure and temperature, and simulated the passage of days and weeks so that researchers could see how differences in one or more variables would change the others over time. One day, Lorenz got an interesting result from the simulator that he wanted to double-check. So he printed out a page with the data from the variables a few simulated days back, re-entered that data into the simulation, and went for a cup of coffee.
When he returned, he was shocked to find that the simulation had produced entirely different results than it had the first time through. He assumed that he must have made an error when inputting the data, but when he went back through it, he found no mistakes. After working through the possibilities, he realized that when the model ran originally, it carried the numbers for the daily simulations to the sixth decimal place, but rounded them to three decimal places for the print-outs. The differences were infinitesimal (in the ten-thousandths) and, by Newtonian principles, should have disappeared over time. Instead, the tiny inaccuracies begat somewhat larger, but still tiny, inaccuracies. As the simulator ran the day models over and over again, simulating the passage of weeks and weeks, those inaccuracies grew until the model looked entirely different than it had the first time around, when the data had extended to the sixth decimal place rather than the third.
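You can reproduce Lorenz’s accident yourself. Here is a minimal sketch in Python, using the famous three-variable system Lorenz published later rather than his original twelve-variable weather model, and a made-up starting point chosen to echo the story: we run the same simulation twice, once from a full-precision state and once from that state rounded to three decimal places, and watch the two runs drift apart.

```python
# A minimal sketch of Lorenz's rounding accident. This uses the
# three-variable "Lorenz system" he later published, NOT his original
# twelve-variable weather model, and a hypothetical starting point.

def lorenz_step(x, y, z, dt=0.01, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
    """Advance the Lorenz system one step with simple Euler integration."""
    dx = sigma * (y - x)
    dy = x * (rho - z) - y
    dz = x * y - beta * z
    return x + dx * dt, y + dy * dt, z + dz * dt

def simulate(state, steps):
    """Run the model forward a given number of steps."""
    for _ in range(steps):
        state = lorenz_step(*state)
    return state

full = (1.000127, 1.000127, 1.000127)       # carried to the sixth decimal
rounded = tuple(round(v, 3) for v in full)  # the three-decimal "printout"

for steps in (100, 1000, 3000):
    a = simulate(full, steps)
    b = simulate(rounded, steps)
    gap = max(abs(p - q) for p, q in zip(a, b))
    print(f"after {steps:>4} steps, largest gap between runs: {gap:.6f}")
```

Early on, the gap stays down in the ten-thousandths, just as Newtonian intuition would predict; run the model long enough and the two trajectories bear no resemblance to each other at all.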
Lorenz realized that he had happened upon something important, a principle that he would flesh out into a paper called “Does the Flap of a Butterfly's Wings in Brazil Set Off a Tornado in Texas?” That paper outlined his new theory, distilled into the idea that complex systems have a sensitive dependence on initial conditions. That is, the further you get from your original data, the more inaccurate your predictions will be as time passes. Even if your read of the data was accurate, tiny inaccuracies in a complex system compound over time and push the system in unpredictable directions.
Lorenz’s original area of expertise, weather, is a great system for understanding chaos theory. With more sensitive instruments and the ability to crunch more and more data using ever more powerful computers, our ability to predict weather has improved considerably over the past few decades. We can consider a greater variety of data, and different groups model weather differently, leading to somewhat different predictions from time to time, but generally speaking, our predictions are more accurate than ever. And that’s the case with only widely dispersed weather stations monitoring conditions and reporting data. If we sank the money into having one weather station for every cubic mile of our atmosphere, our predictions would become much, much more accurate. Right now, the Weather Channel provides predictions 14 days ahead, and even then, the last few days are pretty inaccurate, generally speaking. If we could measure conditions cubic mile by cubic mile, we might be able to predict the weather very accurately for three weeks. But a few weeks in, the small anomalies within each cubic mile would start to skew our model, and inaccuracies would arise. Three weeks on, our model would no longer reflect current conditions; the errors would have manifested themselves, and we would need to re-measure and reassess to give ourselves another few weeks of accurate predictions.
Now imagine that we used nanotechnology to create microscopic weather stations gathering data cubic inch by cubic inch throughout the entire troposphere. We might be able to crunch that data into a highly accurate weather model that could predict the weather for an entire month, or two, or three. But even then, tiny anomalies would persist within each of those inches: a barometric pressure reading from one centimeter of atmosphere, say, that isn’t entirely representative of the whole cubic inch. Little by little, bit by bit, those teensy inaccuracies would grow as time passed, and a month or two later, the predictions we made at the start would have led our model, gradually but inevitably, into general inaccuracy. Our model would have a sensitive dependence on initial conditions, and the data would become inaccurate as those initial conditions faded into the past.
So how does that apply to Green Acres in COVID times? Given the data that we are gathering and analyzing, I think it’s quite relevant, and interestingly so. We could, for example, use national data to make our decisions. There is a degree of accuracy to that data insofar as it incorporates all of our staff and families, but for obvious reasons, it isn’t very sensitive to our situation because it covers such a broad swath of territory. Using Maryland data rather than U.S. data increases its sensitivity to our situation significantly, and using Montgomery County data rather than Maryland data would further increase the quality of our data and our ability to analyze our situation. Zip codes add a layer of complexity, since our families live in many different zip codes, but indexing the data family by family narrows the breadth of what we take in, which adds sensitivity to our data set.
And so zip code data can be interesting and informative. But of course, there are limitations to that data, too. Rockville, Maryland, for example, has ten zip codes but 41 neighborhoods. Getting neighborhood data (which I don’t believe is available) would increase our accuracy and our ability to model and understand the risk of community spread making it onto campus. Then, of course, each of those 41 neighborhoods probably has five or six blocks with a handful of houses each. Narrowing the data down to blocks would increase the sensitivity of our data even further.
At any rate, it’s interesting to think about the ability of data to inform us, and the limitations therein. Chaos theory is a provocative lens through which to understand our situation. It is worth mentioning that we are starting to pivot away from community data entirely in favor of more closely considering our internal data, data that we now have access to because of weekly testing. That data provides a very sensitive picture of our current situation, and re-measuring it every week keeps us current. If we are concerned with a sensitive dependence on initial conditions, this pivot to hyper-local data and weekly data gathering will serve us, and our community, well.
The above analysis casts an interesting light on our community data, but let’s take a look nonetheless. What you will see is that the data is fairly muddled: some days are quite high, other days quite low. After more than a week of continuing growth in the 7-day averages, we’ve arrived at a place where those averages are fluctuating, rising some days and dropping others. Overall, the trend still suggests growth, as the percent change indicates, but I am personally hoping that the variability of the data means that our post-holiday surge is cresting, and that we’ll be headed back down soon.
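For anyone curious how those two statistics work, here is a small sketch in Python. The daily case counts are made up for illustration (they are not our actual community figures): the 7-day average smooths out the day-to-day swings, and the percent change compares the latest week’s average to the week before.

```python
# A small sketch of the two statistics discussed above, using
# made-up daily case counts in place of the real community data.

daily_cases = [42, 55, 61, 48, 70, 66, 59, 73, 68, 80, 62, 75, 71, 69]

def seven_day_averages(cases):
    """Rolling 7-day average for each day that has a full week behind it."""
    return [sum(cases[i - 6:i + 1]) / 7 for i in range(6, len(cases))]

averages = seven_day_averages(daily_cases)
print("7-day averages:", [round(a, 1) for a in averages])

# Percent change of the latest 7-day average versus one week earlier:
# positive means the surge is still growing, negative that it is receding.
change = (averages[-1] - averages[-8]) / averages[-8] * 100
print(f"week-over-week change: {change:+.1f}%")
```

Notice how much calmer the averages are than the raw daily numbers; that smoothing is exactly why a fluctuating 7-day average, rather than a noisy single day, is the thing to watch for signs of a crest.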