Loss of Smell in Coronavirus - The seduction of numbers

Yesterday I caught a brief moment of Professor Van-Tam at great pains to explain that adding loss of smell as a symptom made only a teeny-teeny-teeny-tiny difference to the number of people who could be predicted to have #coronavirus.

Now, the thing is, studies have shown this sort of thing:


"For example, a British study released last week collected COVID-19 symptom data from patients through an online app. The data show that almost 60 percent of the 579 users who reported testing positive for the coronavirus said they’d lost their sense of smell and taste. But a significant portion of patients who tested negative for the virus—18 percent of 1,123 people—also reported olfactory and taste troubles."

itchy.png

At first glance this is very confusing - surely if 60% of coronavirus patients report loss of smell, it HAS to be a good predictor even if/especially since that number is much lower (18%) in general (for other conditions & non-conditions)?

Van-Tam seemed to be so adamant about the small predictive qualities of loss of smell, I figured I would think it through carefully and run some numbers.

I decided to imagine that "itchiness" was a new observation and plugged in some numbers to calculate how diagnosis plays out. On the left is a very simple Excel spreadsheet which calculates how many people are in each group based on general percentages. I've used some representative percentages that are in the right ballpark to help make the thing (hopefully) more realistic.

It turns out that even if 60% of covid sufferers report itchiness, it is still a lousy predictor of them having the disease.

So what's going on here?

This is in the same realm as Simpson's paradox, which I discussed the other day. It is often called the base rate fallacy.

In this case: a high percentage of a small number (itchy with covid) can end up being much smaller than a small percentage of a large number (itchy without covid).

When the above observations are taken as individual groups, already KNOWING which group a person belongs to, it's certainly intuitive to draw the conclusion that you have a good predictor in the itchy-with-covid group.

But that's only AFTER the fact.

In reality, to start with, you don't have these groups, you are looking for a predictor in order to actually form them amongst a general population. And that is a different problem.

In total many, many more people who are itchy will actually belong to the itchy-without-covid group, simply because the proportion who do genuinely have covid is a much smaller part of the population. (At least for now).

In my demonstration model, if someone reports being itchy, they are 5.7 times more likely to have something other than #covid19, even though 60% of those who have #covid19 report being itchy!
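Here is a minimal sketch of this kind of calculation in Python. The 60% and 18% rates are taken from the quoted study. The 5% prevalence and the population of 100,000 are illustrative assumptions (a 5% prevalence happens to give the 5.7 ratio), and the function name is hypothetical rather than anything from the spreadsheet.

```python
def itchy_breakdown(population, prevalence, p_itchy_given_covid, p_itchy_given_no_covid):
    """Return (itchy with covid, itchy without covid) head counts."""
    with_covid = population * prevalence
    without_covid = population - with_covid
    itchy_with = with_covid * p_itchy_given_covid
    itchy_without = without_covid * p_itchy_given_no_covid
    return itchy_with, itchy_without


# Assumed inputs: 100,000 people, 5% of whom actually have covid.
itchy_with, itchy_without = itchy_breakdown(100_000, 0.05, 0.60, 0.18)

# How many itchy people without covid there are for each itchy person with it.
ratio = itchy_without / itchy_with

# Share of all itchy people who actually have covid.
share_with_covid = itchy_with / (itchy_with + itchy_without)

print(f"itchy with covid:    {itchy_with:,.0f}")
print(f"itchy without covid: {itchy_without:,.0f}")
print(f"ratio without/with:  {ratio:.1f}")
print(f"share with covid:    {share_with_covid:.1%}")
```

Under these assumed inputs, only about 15% of the people reporting itchiness actually have covid, despite 60% of covid sufferers being itchy.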

Notes

It doesn't matter what value you start the population at; it all works out the same. So you can treat "population" as "the number of people who report itchiness that day, or week, or who have done so in the last month", etc.
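A quick way to check the point about population size, as a sketch: the share of itchy people who have covid comes out the same whatever population you start from, because the population cancels out of the fraction. The default rates below are the study's 60% and 18% plus an assumed 5% prevalence, and the function name is hypothetical.

```python
def share_with_covid(population, prevalence=0.05,
                     p_itchy_given_covid=0.60, p_itchy_given_no_covid=0.18):
    """Fraction of itchy people who actually have covid."""
    itchy_with = population * prevalence * p_itchy_given_covid
    itchy_without = population * (1 - prevalence) * p_itchy_given_no_covid
    return itchy_with / (itchy_with + itchy_without)


# Try a small town, a city and a country-sized population.
shares = [share_with_covid(n) for n in (1_000, 100_000, 66_000_000)]

for n, share in zip((1_000, 100_000, 66_000_000), shares):
    print(f"population {n:>12,}: {share:.4f}")
```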

quote source at: https://www.nationalgeographic.co.uk/science-and-technology/2020/04/lost-your-sense-of-smell-it-may-not-be-coronavirus

also see https://www.the-scientist.com/news-opinion/loss-of-smell-taste-may-be-reliable-predictor-of-covid-19-study-67528

Occupational Risk in Relation to Coronavirus COVID-19

I’ve written a fair bit about the COVID19 situation and data (and analysed a whole lot more) but not published it here because, frankly, the minute it’s published it’s out of date. Moreover, even using official data sources such as Johns Hopkins University, there’s a kind of “data entropy” at work, where data volume increases over time, but quality reduces. I could do a whole post on that topic alone, but that’s for another day.

Meanwhile, the Office for National Statistics (ONS) published an intriguing data set that quantified the nearly 400 occupations in the UK and, amongst other things, classified the type of contact with other people that workers had:

  • proximity (ranging from touching to close distance to no close contact with people at all),

  • and exposure ranging from many times a day to weekly/monthly/yearly to never.

This data can be explored interactively on the ONS website but I’ve also tried to produce some static readouts here, although it’s quite a challenge to compress this amount of data into a one-page visualisation! So, you will see a number of variations.

As the debate intensifies over whether to start schools up or not, it’s interesting to note that teachers and classroom assistants are basically in the next tranche of most-at-risk workers, behind the healthcare, police, cleaning and delivery key-workers that have kept critical services running. Many questions still remain (at the time of writing) as to the level of risk posed by the children they will mix with. Children, although seen to be less susceptible themselves to the disease, are certainly not immune.

How does pay compare for those potentially most exposed? Redder, larger, further to the right = more at risk. Lower down = lower pay.

Coloured by risk and sized by percentile (% figure means x% of workers have this risk or higher)

Coloured and ordered by risk, sized by number of workers in sector

Occupations sized, sorted and coloured by risk

Occupations sized and sorted by sector size, coloured by risk

Some cautionary notes come with this data:

  • Risk profiles were actually collated from American workers, so differences in process and work style could mean UK workers have a different profile.

  • The risk profile was devised prior to COVID19 and doesn’t take account of any social distancing or other safety measures (e.g. PPE) that may be applied to a given occupation. So, in some sense, the risk score indicates what degree of protection could be needed.