"Data Soup" Blog

Peter Klam

Last week, I wrote about our zip-code-specific COVID-19 data and a little bit about how that data helps, and to some degree doesn’t help, us to assess our situation. I mentioned chaos theory, and I thought that it might be enjoyable to talk about the origins of chaos theory and how it can help us to understand the limitations of data.

Chaos theory is based on the idea that in a closed system with multiple interacting variables, small anomalies can (and eventually will) magnify until the results defy predictability. This may not sound revolutionary, but it is a distinct break from the Newtonian approach to science, which draws generalizations and conclusions based on the idea that small differences in such systems will tend to disappear.

It all started in 1961, when something surprising happened to Edward Lorenz, a meteorologist who was working on computerized weather models. The model that he was working with used 12 variables—barometric pressure, temperature, and so on—and would simulate the passage of days and weeks so that researchers could understand how differences in one or more variables would change other variables over time. One day, Lorenz got an interesting result from the simulator that he wanted to double-check. So he printed out a page with the data from the variables a few simulated days back, re-inserted that data into the simulation, and went for a cup of coffee.

When he returned, he was shocked to find that the simulation had produced entirely different results than it had the first time through. He assumed that he must have made an error when inputting the data, so he went back through it. There were no mistakes. After working through the possibilities, he realized that when the model ran originally, it carried the daily simulation numbers to the sixth decimal place, but it rounded those numbers to three decimal places for the print-outs. The differences were infinitesimal—in the ten-thousandths—and by Newtonian principles, they should have disappeared over time. Instead, the tiny inaccuracies begat somewhat larger, but still tiny, inaccuracies. As the simulator ran the daily models over and over again, simulating the passage of weeks and weeks, those inaccuracies grew to the point where the model looked entirely different than it had the first time around, when the inputs had extended to the sixth decimal place rather than the third.
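For the programmers among you, the spirit of Lorenz's accident is easy to reproduce. His 12-variable weather model is too involved to recreate here, so this is a minimal sketch using the logistic map—a much simpler system that is also chaotic—comparing a full-precision run against one restarted from a value rounded to three decimal places, just like Lorenz's print-out. The starting value is illustrative.

```python
# A minimal sketch of sensitive dependence on initial conditions.
# This is NOT Lorenz's weather model; the logistic map is a simple
# stand-in chaotic system used purely for illustration.

def logistic(x, r=3.9):
    """One step of the logistic map, chaotic for r near 4."""
    return r * x * (1 - x)

x_full = 0.506127             # a six-decimal starting value
x_rounded = round(x_full, 3)  # the "print-out" value: 0.506

for step in range(1, 41):
    x_full = logistic(x_full)
    x_rounded = logistic(x_rounded)
    if step % 10 == 0:
        print(f"step {step:2d}: full={x_full:.6f}  "
              f"rounded={x_rounded:.6f}  "
              f"diff={abs(x_full - x_rounded):.6f}")
```

Run it and you'll see that within a few dozen iterations the two runs bear no resemblance to each other, even though they started out differing only in the ten-thousandths.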

Lorenz realized that he had happened upon something important—a principle that he would flesh out into a paper called “Does the Flap of a Butterfly's Wings in Brazil Set Off a Tornado in Texas?” That paper outlined his new theory, distilled into the idea that complex systems have a sensitive dependence on initial conditions. That is, the further you get from your original data, the more inaccurate your predictions become. Even if your read of the data was accurate, tiny inaccuracies in a complex system create turbulence over time and lead the system in unpredictable directions.

Lorenz’s original area of expertise—weather—is a great system to use to understand chaos theory. With more sensitive instruments and the ability to crunch more and more data using ever more powerful computers, our ability to predict weather has improved considerably over the past few decades. We can consider a greater variety of data, and different groups model weather differently, leading to somewhat different predictions from time to time, but generally speaking, our predictions are more accurate than ever. And that’s the case with only widely dispersed weather stations monitoring conditions and reporting data. If we sank the money into having one weather station for every cubic mile of our atmosphere, our predictions would become much, much more accurate. Right now, the Weather Channel provides predictions 14 days ahead, and even then, the last few days are generally pretty inaccurate. If we could measure conditions cubic mile by cubic mile, we might be able to predict the weather very accurately for three weeks. But a few weeks in, the small anomalies within each cubic mile would start to skew our model, and inaccuracies would arise. Three weeks on, our model would no longer reflect the current conditions—the errors would have manifested themselves, and we would need to re-measure and reassess to give ourselves another few weeks of accurate predictions.

Now imagine that we used nanotechnology to create microscopic weather stations that would gather data cubic inch by cubic inch from the entire troposphere. We might be able to crunch that data to create a highly accurate weather model that could predict weather for an entire month—or two, or three. But even then, tiny anomalies would remain within each of those inches—say, a barometric pressure reading from one corner of a cubic inch that isn’t entirely representative of the whole. Little by little, bit by bit, those teensy inaccuracies would grow as time passed, and a month or two later, the predictions that we made way back when would have led our model, very gradually, to general inaccuracy. Our model has a sensitive dependence on initial conditions, and the data becomes inaccurate as those initial conditions fade into the past.

graph

So how does that apply to Green Acres in COVID times? Well, given the data that we are gathering and analyzing, I think it’s quite relevant, and interestingly so. We could, for example, use national data to make our decision. There is a degree of accuracy to that data insofar as it incorporates all of our staff and families, but for obvious reasons, the data isn’t very sensitive to our situation because it covers such a broad swath of territory. Using Maryland data rather than U.S. data increases its sensitivity to our situation significantly, and using Montgomery County data rather than Maryland data further increases the quality of our data and our ability to use it to analyze our situation. Zip codes bring in some extra complexity, since our families live in many different zip codes, but indexing the zip code data family by family narrows the breadth of data that we take in, which would seemingly add sensitivity to our data set.

graph

And so zip code data can be interesting and informative. But of course, there are limitations to that data, too. Rockville, Maryland, for example, has ten zip codes—but it has 41 neighborhoods. Getting neighborhood data (which I don’t believe is available) would increase our accuracy and our ability to model and understand the risk of community spread making it onto campus. Then, of course, each of those 41 neighborhoods probably has 5 or 6 blocks with a handful of houses each. Narrowing the data down to blocks would increase the sensitivity of our data even further.

At any rate, it’s just interesting to think about the ability of data to inform us—and the limitations therein. Chaos theory is a provocative lens that we can use to better understand our situation. It is worth mentioning that we are starting to pivot away from the community data entirely in favor of more closely considering our internal data—data that we now have access to because of weekly testing. That data provides a very sensitive understanding of our current situation, and re-measuring it every week keeps us current. If we are concerned about a sensitive dependence on initial conditions, this pivot to hyper-local data and the weekly data gathering will serve us, and our community, well.

graph

graph

The above analysis puts an interesting angle on looking at our community data, but let’s have a look nonetheless. What you will see is that the data is fairly muddled. Some days are quite high, others are quite low. After more than a week of continuing growth in the 7-day averages, we’ve arrived at a place where those averages are fluctuating—some days they go up, other days they drop. Overall, the percent change still suggests growth, but I am personally hoping that the variability of the data means that our post-holiday surge is cresting, and we’ll be headed back down soon.

Peter Klam

Several times over the past several weeks, we’ve gotten requests to look at the COVID data by zip code rather than county-wide. While we have in the past looked at the zip code data more broadly, I decided towards the end of the Winter Break to do a deeper dive into that information to see what it might yield.

If you look at our map-based zip code scatterplot, Green Acres families and staff mostly live along the 270 corridor from Bethesda to just outside Gaithersburg. We also have a contingent stretching past Kensington to Silver Spring, parallel to the Beltway. Other families live further away, but they tend to be more isolated, one to a zip code. I didn’t consider those who live in the District, Virginia, Prince George’s County, or Frederick County, since the MoCo website doesn’t provide data for their zip codes, and these students are quite few, isolated to a single family per zip code.

The county only provides zip-code-based data for the “Positive Cases per 100K Residents” metric, so that’s the one I’ll concentrate on today. There are geographic trends to this data from which you can draw some conclusions. Zip codes closer in to D.C. (so Bethesda and close-in Silver Spring) tend to have fewer positive cases per 100K, while those further out tend to have more. There are exceptions, but it is broadly true. This is more interesting than it at first appears. Areas with greater population density tend to have more cases of COVID-19 (as I mentioned in my previous blog post), and the closer to the city, the greater the population density. I would surmise (though this is more an educated guess than anything else) that the demographic piece in play here is property value, which tends to be higher in areas closer to the city and to decline gradually the further out you get. I fully admit that this is conjecture, so take it with a grain of salt. But I thought it worth mentioning, since it appears that one demographic trend (population density) that is often correlated with COVID-19 rates seems in this case to have been superseded by a different demographic trend (property value/wealth).

Using our school database, I was able to ascertain how many Green Acres students and staff live in each zip code, and then I calculated our school data by averaging the cases per 100K using the zip-code-specific data, family by family. What I found was that Green Acres indeed displays a lower overall case count per 100K than the county as a whole. For Green Acres families, the number dropped by just over ten cases overall, and for Green Acres staff, the metric dropped by a little less.
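For the curious, the calculation is just a weighted average: each family contributes its zip code’s rate once. Here is a minimal sketch of that arithmetic—the zip codes, rates, and family counts below are made up for illustration, not our actual data.

```python
# A hedged sketch of the family-by-family averaging described above.
# All values here are hypothetical, not the school's actual data.

# cases per 100K residents, by zip code (illustrative numbers)
cases_per_100k = {"20814": 25.0, "20852": 31.5, "20878": 44.2}

# number of Green Acres families in each zip code (illustrative)
families = {"20814": 40, "20852": 25, "20878": 10}

# each family contributes its zip code's rate once; then we average
total = sum(cases_per_100k[z] * n for z, n in families.items())
school_average = total / sum(families.values())
print(f"school-wide average: {school_average:.1f} cases per 100K")
```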

I gathered this data last Saturday, January 2. On that day, the countywide metric was 38.9 positive cases per 100K residents. Our family data by zip code averages to 27.8 cases per 100K residents, and our staff data averages to 31 cases per 100K residents. I will update this data periodically (it is rather more time-consuming than updating the graphs, so I will only do it every week or two), and if it yields interesting information, I will share it with you.

While this data looks favorable for us, I do want to consider a couple of important caveats; frankly, I find it difficult to consider the data as more than just an interesting tidbit. The first caveat is that while our overall average, when compared with the countywide data, is favorable, it’s also true that our families live in wide-ranging zip codes with very different case counts. While many of our families live in zip codes with some of the lowest numbers in the county, we have a healthy handful of families and staff who live in zip codes with more than 50 cases per 100K residents. What is the impact of this data on a classroom with one teacher and 14 students if 12 of those individuals come from zip codes with relatively few cases and 3 come from high-case-count areas? I just don’t know the answer to that question, and honestly, I’m not sure there is a good answer to it.

It also brings to mind the weather modeling that was the foundation for Edward Lorenz’s seminal chaos theory paper, “Does the Flap of a Butterfly’s Wings in Brazil Set Off a Tornado in Texas?” But I’ll save that exposition for one of my posts next week. It’s a pretty interesting story, and a good introduction to chaos theory!

I have been waiting for today’s metrics so I can walk you through them, but they hadn’t been updated as of the time this went “to press.” Since my previous post updated the data through Wednesday and there is only one more day’s worth of data since then, I’m going to forgo that part of the blog for this posting.

The COVID-19 Task Force will meet next Wednesday to make a recommendation about our hoped-for return on January 21. I expect to post to the blog earlier in the week with updated data, then again in the aftermath of our Wednesday afternoon communication about the 21st with a closer look at the metrics that drove our decision.   

Peter Klam

It is a tale of two counties—a study in contrasts. While Montgomery County has seen the metrics rise over the past six weeks, it has been nothing compared with what is happening in L.A. County. In Montgomery County, our positivity rate has crept up to over 8%. This means, as I’ve mentioned in earlier posts, that we’ve crossed the threshold into the category of “higher risk for schools.” But it doesn’t compare to L.A., where the positivity rate right now is at 20.5%. We are currently hovering around 40 cases per 100K residents. In L.A., they’ve only just recently dropped below 130 cases per 100K.  

The results of this contrast are striking. Our hospitals in Montgomery County are quite full, more so than in other years, but far below the level deemed alarming by the County Health Department. In L.A. County, articles describe ambulances making the rounds from hospital to hospital trying to find a vacancy for desperately ill patients. A prominent piece in yesterday’s news cycle highlighted a directive by the county Emergency Medical Services: as a result of the stress on the system from the COVID-19 surge, they have told ambulance crews not to transport cardiac patients who cannot be revived in the field, since those patients have a very low survival rate. That doesn’t affect a lot of people, but it is striking that a system would decline to serve patients who have at least a small chance of survival because it simply cannot serve all of those who need emergency care.

Now, there are reasons for this contrast that go beyond dumb luck or community efforts to fend off the virus. Montgomery County is fairly suburban, with a reasonable degree of sprawl, while L.A. County has some of the densest neighborhoods in the country. The CDC “Social Vulnerability Index” for L.A. is more than double that of Montgomery County—0.7881 to 0.3114. This is a statistic generated from census data, using community characteristics like population density, poverty rate, and access to transportation, to determine how vulnerable a population is to a virus outbreak or other natural disaster. So while we can see some of the same trends in the data from our two areas, those shared tendencies are amplified in L.A. by the particular demographic and geographic challenges of the county. Both areas saw a post-Thanksgiving bump, but theirs was much greater.

Now, my point here is not to scare anyone. I just think that in our COVID exhaustion and frustration, we can forget how quickly things could unravel if we aren’t careful. And I think that we need always to remember how serious this disease can be—and we don’t even know its long-term effects (a recent study, for example, has connected the characteristic loss of smell with brain damage that has farther-reaching effects). More than 1,000 people have died of COVID-19 in L.A. County in the past week.

graph

We adhere to the research suggesting that schools are not major spreaders of the disease, and we believe that with reasonable community spread and strong mitigation, we can return to school safely. But we will remain vigilant and careful, despite our own exhaustion and frustration, to make sure that the circumstances are appropriate for reopening. No place is safe from COVID-19 in an absolute sense; we are dealing with degrees of safety here. Schools can be relatively safe places, under the right circumstances and with appropriate mitigation. We are working hard to find that line of relative safety so that when circumstances allow, we can make a quick and effective return to campus.

I would like to clear up one point of confusion before continuing to the analysis of the data. We are committed to returning to full days as soon as we can; that will require open-air, sheltered spaces where students can take off their masks and eat under supervision. We have purchased tents for this purpose, and we are actively working to hire a company to raise the tents and to shepherd them through the county’s approval process. As soon as those tents are ready, we will transition to full-day learning. That could be as soon as our return on January 21, and I am quite confident that if we’re not quite ready on that date, we will be ready shortly thereafter. But until the tents are erected and approved, we will continue in a half-day learning model with virtual classes in the afternoon. We are also looking at possibilities for Pre-K students who need to nap, because the county mandates that students cannot nap with masks on, and Pre-K students are not allowed in tents. As developments arise in these areas, we will communicate them to you.

graph

graph

As for the metrics, we are looking at a post-holiday bump right now. You can see that the 7-day average for cases per 100K has passed 40, a number it hadn’t hit since mid-December. You can also see that the 7-day average of the positivity rate continues at over 8%. Though these averages are meant to protect against the volatility of the day-by-day data, it is at times interesting to look at the daily numbers to understand how things are going. With regard to cases per 100K, it is interesting to note that yesterday’s number was quite a bit lower than most recent days, and that the daily number rebounded today. The same thing happened last Tuesday and Wednesday. Perhaps this is a feature of Tuesdays that fall after a weekend following a Friday holiday—the anomaly doesn’t seem to carry backwards to earlier Tuesdays. It will be an interesting feature to watch for when next Tuesday’s data arrives. As for the daily positivity rate, the pattern of the last ten days or so—rates generally above 7%—is an unwelcome bump from what we were seeing previously. This rate really needs to settle back down into the 5-6% range. I keep looking for a correlation between this statistic and the number of tests administered, but I just don’t see a connection. It doesn’t help that the timing for these two metrics is unclear: is the positivity rate the result of tests from that day, the day before, or two days before, and how does that line up with the reporting of tests administered? Finally, the percent change continues to suggest that the virus is still growing in the community, though the low rates suggest slow growth, which is good and will hopefully give way to an extended period of virus recession.
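Since I lean on 7-day averages so often in these posts, here is a minimal sketch of how such an average is computed—each day is averaged with the six days before it. The daily values below are made up for illustration, not the county’s actual numbers.

```python
# A minimal sketch of 7-day rolling averages.
# Daily values are illustrative, not real county data.

daily_cases_per_100k = [38, 41, 35, 44, 40, 29, 43, 42, 39, 45]

def rolling_average(values, window=7):
    """Average each day with the (window - 1) days before it."""
    return [sum(values[i - window + 1 : i + 1]) / window
            for i in range(window - 1, len(values))]

for day, avg in enumerate(rolling_average(daily_cases_per_100k), start=7):
    print(f"day {day}: 7-day average = {avg:.1f}")
```

This is exactly why the averages lag behind, and smooth over, the daily blips: a single low Tuesday moves the average by only one seventh of its deviation.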

graph

graph

As always, we will continue to follow the metrics and communicate our thinking. We are still excited about our potential return on the 21st, and eager to get back onto campus. I’ve been digging into our community’s zip-code-based COVID data, and I will share a full analysis by the end of the week—probably on Friday morning.   

Peter Klam

As we creep towards the new year and the re-start of school, the metrics creep along with us. As I’ve mentioned before, the COVID committee will meet on January 4 to make a determination about starting up the following week. Many schools are slated to make the same decision at around the same time, while other schools will continue with another week or two of virtual classes before considering a return to physical classrooms. The question is whether schools will actually return or not. So far, the December metrics haven’t supported in-school learning, at least by any of the standards I’ve researched. As I mentioned a few posts back, I suspect that schools that were in person in December have discarded the use of metrics altogether and have simply decided that their mitigation is enough to keep folks at their schools reasonably safe. The question is: is this true? Is mitigation enough, regardless of the metrics?

This question relates to the themes that you will see frequently in these posts—the themes of risk assessment and risk tolerance. Schools that have decided to forego consideration of the metrics assess that the risk to their population is low. School-age children rarely have serious cases of the disease, and mitigation has proven to help teachers to avoid contracting the disease when they hold to the strategies and rules in place. These schools are tolerant of the risk regardless of the metrics because they don’t see their populations as being in serious danger of bad cases. For other schools, high metrics suggest that the disease will make it onto campus, and even with mitigation and the favorable demographics of their populations, that risk is too much to bear.

What is the correct approach? I don’t know the answer to that any more than anyone else. For my part, I am in favor of avoiding situations with metrics so high that the disease is likely to make an appearance on campus. And it didn’t take the death of a 41-year-old congressman-elect from Louisiana to convince me of this. There are many things we don’t know about this disease. In particular, we don’t know the long-term effects of having it, which to me seems significant—especially with a vaccine on the horizon. We know that it is less risky for younger people, but we also know that there are notable exceptions to this rule, and young people with no pre-existing conditions do get quite sick from this disease on occasion.

I believe that as we get to know the disease better and as we refine our mitigation strategies, we can tolerate higher levels of community spread. We are currently talking about whether to adopt different standards for a receding virus than we have had for one whose presence in the community was growing. I will explain this more fully if we decide to adopt a different set of numbers. But I also believe that knowing and following the metrics is prudent institutionally, ethical in terms of the risk we ask teachers and families to take, and a responsible choice when it comes to doing our part for the community.

The metrics these days have turned somewhat less favorable than they were last week. You can see that the number of positive cases per 100K has remained somewhat steady—with daily blips higher or lower than other days—but has increased slightly over the past week or so. It is interesting to note that the positivity rate has really grown over the past two days, and I was curious as to the cause. So I updated the data for my “Tests Administered” graph and found that the last two reported days were striking for the low number of tests administered. Now, it’s important to recognize that because of the different response times for different health care facilities and different reporting timelines for the county health office, these statistics might not line up exactly. But it stands to reason that they are connected. The more tests administered in a day, the more of them will come up negative, and the lower the positivity rate will be. This is because the sicker the person, the more likely they are to seek out a test, so the first few thousand tests will have a relatively high rate of positives. Those with milder symptoms, or with other reasons to take a test (work, visiting family or friends), will add a disproportionate number of negatives and bring the rate down. So high-testing days will tend to drive the rate lower, while low-testing days will tend to drive it up. At any rate, it was interesting to see that correlation between the unusually high positivity rates of the last couple of days and the low number of tests administered in the most recently reported days.
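Here is a minimal sketch of that reasoning. It assumes a fixed pool of symptomatic people who get tested first at a high positive rate, with any additional tests coming from a lower-rate background population—all of the numbers are illustrative, not county data.

```python
# A hedged sketch: why positivity rate tends to fall as test volume
# rises, under the assumption that the sickest people test first.
# All parameters below are illustrative, not real county figures.

def positivity_rate(tests_administered, sick_seeking_tests=800,
                    sick_positivity=0.5, background_positivity=0.02):
    """First tests go to symptomatic people (high hit rate);
    additional tests mostly add negatives."""
    sick_tests = min(tests_administered, sick_seeking_tests)
    other_tests = tests_administered - sick_tests
    positives = (sick_tests * sick_positivity
                 + other_tests * background_positivity)
    return positives / tests_administered

for n in (2_000, 5_000, 10_000):
    print(f"{n:6d} tests -> positivity {positivity_rate(n):.1%}")
```

With these made-up numbers, 2,000 tests yield a rate above 20%, while 10,000 tests yield under 6%—the same underlying sickness in the community, but a very different headline rate.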

Stay well, take care, and have a Happy New Year! I will write again after the decision has been made about our potential return the following week.

graph

graph

graph

graph

graph


Peter Klam

I hope that you are all well, warm, and rested a few days into our winter break. It is downtime for us, but we’re still somewhat busy preparing for our eventual return to campus. We continue to get things ready for our tent-raising, and we continue to hone our mitigation and return strategies to maximize safety when we are back. No waxing poetic today—I’m going to cut right to the metrical chase.

The long and the short of it is that things have noticeably improved over the past couple of days. You can see in the accompanying graphs that positive tests per 100K has been under 30 for 3 of the past 5 days—a low mark that hadn’t been reached in the previous 14 days. This has led to a decline in the 7-day average to just over 30, again quite low relative to that metric over the past 12 days. The positivity rate has remained somewhat elevated, generally above 6%, including a mark of 6.14% registered today—but it registered only 3.54% yesterday, its lowest level since November 11. And all of this easing has contributed to a percent change that has been encouragingly negative (meaning that the presence of the virus is shrinking in the surrounding community).

It certainly is nice to have things trending positively and good news to share. Let’s be hopeful that the community will take the necessary precautions around Christmas and New Year’s to keep things trending in the right direction. We’re hopeful that we can return soon, wishing a joyous holiday to those of you celebrating, and thankful to have come safely through this challenging year. We will not be on email consistently throughout the break, but we will check periodically. Please reach out if you have any questions.

graph

graph

graph

graph

graph