A study of word confusability and similarity for whole-word readers
This article doesn't claim to be a valid scientific study; nonetheless, it was interesting to do, essentially as a thought experiment.
One thing I have noticed with my own son, and in many comments from other parents of early readers, gifted and potentially hyperlexic children, is that such children read (recognise) long, complex words (such as "galaxy" and "knowledge") with astonishing ease, yet sometimes (perhaps even often) get tripped up on short "simple" words, such as "one" and "many". What explains this, given that it seems to defy logic?
I happen to have a background in the field of speech recognition (in computers) and there are factors of that field which boil down to the problem of recognising and distinguishing words from each other. So, I was eventually moved to perform some kind of analysis investigating this. I don't know if this is original or even valid research, but it was fun to do.
How do early readers read?
The first thing to be aware of is that there are two broad types of reading (and reading-teaching) methods: phonics and "whole word" (or whole language). Phonics concerns the systematic pronunciation of the component sounds of a word to reach the whole. Whole-word does what it says on the tin: the reader either memorises or deduces the whole word in one step. (As adults we tend to read like this.)
My anecdotal conversations suggest that early readers are one or the other: some early readers display/develop/self-teach a phonic approach, and for the remainder, it's whole word. (In the case of my own son, it's "whole word".) In my anecdotal evidence, the most startling early readers are "whole word" because even at age 3 or 4, obscure words of 8, 10, 12 or more letters can be decoded instantly.
Since whole-word readers essentially memorise and recognise entire words, it raises the question: given that they handle complex words with ease, why do they sometimes get tripped up on short words?
It's possible to come up with lots of theories involving visual processing disorders, dyslexic conditions, motivation (laziness) and so on. However, I theorised about a more empirical factor: if children appear to recognise short words less well, is it simply because short words are less memorable/more confusable?
(Confusability, in various forms, is a factor we have to deal with on a regular basis in speech recognition, which prompted my thinking.)
Mr. Levenshtein, meet Dr. Fry.
Before we get to the analysis, I need to introduce two things. The first is the Fry Sight Word list. I don't seem to be able to find out much about Dr. Fry directly on the internet, but many educational websites cite the fact that he created a list of the most popular and common English words in literature, originally in the 1950s but since updated.
If these are the most common words that a child is going to see, then it seemed to make sense to evaluate what levels of "confusability" exist among them.
Next we meet Mr. Levenshtein; or at least his algorithm, which provides a way to calculate the number of single-character edits (insertions, deletions or substitutions) needed to transform one word into another. To put that another way, it gives a measure of word similarity - small Levenshtein distances between words mean they are more textually similar than those with large distances.
We should note that Levenshtein distance only tells us about textual character difference (structure), which is certainly useful when computers are comparing words. It doesn't necessarily tell us how similar words are through the eyes of a child (e.g. geometry), but it's a good starting point.
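As a minimal sketch of what that calculation looks like (not necessarily the code behind the charts in this article, and the function name `levenshtein` is just an illustrative choice), here is the standard dynamic-programming version in Python:

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or
    substitutions needed to turn string a into string b."""
    # Keep the shorter word in the inner loop so each row stays small.
    if len(a) < len(b):
        a, b = b, a
    # previous[j] = distance between the empty prefix of a and b[:j]
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # distance between a[:i] and the empty string
        for j, char_b in enumerate(b, start=1):
            substitution_cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,                      # delete char_a
                current[j - 1] + 1,                   # insert char_b
                previous[j - 1] + substitution_cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


print(levenshtein("many", "any"))    # 1
print(levenshtein("who", "how"))     # 2
print(levenshtein("one", "galaxy"))  # 6
```

Note how "who" and "how" are only two edits apart despite being different words, while "one" and "galaxy" share no letters at all.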
To perform the analysis, I took a set of "sample words" and calculated the Levenshtein distance between each of those words and every word in the "Fry Sight List".
I compared the sample words against the full Fry list (1000 words) and also against the top 150, and plotted the distribution of Levenshtein distances obtained.
What this effectively tells us is "how similar is the target word to the most common words in the language". We might postulate that the more similar a word is to others, the more likely it is to be confused - i.e. the less likely it is to stand out as unique. Or, put another way, a greater cognitive load is required to recognise it uniquely.
I plotted the results for "one", "many" and "who" (all identified as "trip up" words), plus "galaxy" and "knowledge" (identified as easily-recalled words).
To interpret the chart: the height of each bar shows what percentage of the Fry list lies at that Levenshtein distance from the target word. So, for example, a 50% bar at marker 3 means the word is 3 single-character edits away from 50% of the words in the Fry list.
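The bar heights described above can be sketched as follows. This is an illustration under stated assumptions, not the original analysis script: `distance_profile` is a hypothetical helper name, the ten-word `sample_vocabulary` is a tiny stand-in rather than the real Fry list, and excluding the target word from its own comparison is my assumption.

```python
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Standard dynamic-programming edit distance."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,
                               current[j - 1] + 1,
                               previous[j - 1] + cost))
        previous = current
    return previous[-1]


def distance_profile(target: str, vocabulary: list[str]) -> dict[int, float]:
    """Map each Levenshtein distance to the percentage of the vocabulary
    that lies at that distance from the target word."""
    # Assumption: the target itself is left out of the comparison.
    words = [word for word in vocabulary if word != target]
    if not words:
        return {}
    counts = Counter(levenshtein(target, word) for word in words)
    return {distance: 100.0 * count / len(words)
            for distance, count in sorted(counts.items())}


# Tiny stand-in vocabulary for illustration only (NOT the real Fry list).
sample_vocabulary = ["the", "of", "and", "a", "to", "in", "is", "you", "that", "it"]

for target in ["one", "galaxy"]:
    print(target, distance_profile(target, sample_vocabulary))
```

Each key in the returned dictionary corresponds to a marker on the chart's horizontal axis, and each value to the height of the bar at that marker.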
Compared against the top 1000 words, we see that "one", "many" and "who" are clustered around the 3, 4 and 5 mark for Levenshtein distance. Indeed, this level of "similarity" captures up to 80% of the top 1000 words. In contrast, "galaxy" typically differs by around 6 - 7 letters, and "knowledge" is even more different, around the 8 - 9 mark.
The effect is even more pronounced when comparing the sample words against the top 150 Fry words. (Again, many websites reference the claim that just 100 words make up almost half of all written material.) Indeed, it's likely a child doesn't compare the word they are reading against their whole vocabulary, but will prune their recognition against a vocabulary that's filtered down to a smaller, similar set. Or to put it another way, they will most consciously compare a four-letter word against the 3, 4 and 5 letter words in their vocabulary, and not the 8, 9, 10 letter words, which will be discarded subconsciously.
In this case the profile of the sample words is more pronounced - the short words compare against the top 150 mainly in the 2, 3, 4 range (anything in 1 and 2 is certainly highly confusable). And the long, complex words now stand out as being significantly different - and thus, we presume, easier to recognise uniquely within the given vocabulary.
There are of course weaknesses to this analysis:
1) it doesn't consider word geometry or font, which may make some words look more similar than others irrespective of Levenshtein distance, which considers the text only
2) The Fry Sight list is really only an arbitrary representation of the vocabulary an early reader might know. To some extent, by definition, this list is insufficient, because the words that early readers surprise their parents, carers and observers by knowing are the long irregular words.
3) It would be useful to perform the analysis against a bigger vocabulary but of words the same length as the sample word - this might better match the process a child follows when recognising the word (pruning out the obviously non-similar words)
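The length-pruned comparison suggested in point 3 could be sketched like this. It is only an illustration of the idea: the function name `confusable_neighbours`, the `tolerance` and `max_distance` parameters and their defaults are all hypothetical choices of mine, and the word list in the usage example is made up for demonstration.

```python
def levenshtein(a: str, b: str) -> int:
    """Standard dynamic-programming edit distance."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,
                               current[j - 1] + 1,
                               previous[j - 1] + cost))
        previous = current
    return previous[-1]


def confusable_neighbours(target: str, vocabulary: list[str],
                          tolerance: int = 0, max_distance: int = 2) -> list[str]:
    """Words of similar length to the target that are within max_distance
    edits of it, closest first (ties broken alphabetically).

    tolerance=0 keeps only words of exactly the same length;
    tolerance=1 also keeps words one letter shorter or longer.
    """
    # Step 1: prune out the obviously non-similar words by length.
    candidates = [word for word in vocabulary
                  if word != target and abs(len(word) - len(target)) <= tolerance]
    # Step 2: keep only the candidates that are textually close.
    scored = [(levenshtein(target, word), word) for word in candidates]
    return [word for distance, word in sorted(scored) if distance <= max_distance]


# Made-up vocabulary for demonstration only.
words = ["on", "own", "any", "old", "knowledge"]
print(confusable_neighbours("one", words))               # ['any', 'old', 'own']
print(confusable_neighbours("one", words, tolerance=1))  # ['on', 'any', 'old', 'own']
```

Counting how many neighbours each sample word has in a larger vocabulary would give a rough "crowdedness" score to set alongside the distance profiles.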
Notwithstanding, the comparison of sample words against the Fry Sight Word list shows a statistically significant disparity in similarity between the shorter words and the longer words. At 1000 words long, the Fry Sight list offers statistical significance to the comparison.
The result is not really surprising. As we might expect, there are more short words in the vocabulary, therefore more possibility of similarity and confusion.
Anyone who has tried a diet will probably, soon enough, hit some kind of "plateau" where the weight loss stops. Even if it's not for an extended period, it's still disheartening - so it's important to try and look at the causes and figure out whether you're sticking to your diet plan as closely as you should be.
In the case of Atkins and low carb diets there are various reports that caffeine and artificial sweeteners (namely Aspartame) can contribute to a slowdown in weight loss. Certainly when I hit my plateau recently, both those things were still in my diet - probably in raised quantities because I was drinking as a replacement for eating.
Cutting out the caffeine is easy, but cutting out Aspartame is much harder when natural fruit drinks are out of bounds and most diet soft drinks contain it. However, sucralose-based sweetener is OK, so the challenge is on to find diet drinks that contain that rather than Aspartame.
So far my search has thrown up two very palatable options:
- Tesco Diet Dandelion & Burdock
- Tesco Diet Cloudy Lemonade
I thoroughly recommend both drinks and I'm pleased to report my plateau is on the move again :-)
The recipe is based on "almond flour", sometimes known as "almond meal": though basically it's ground almond. I never knew you could bake with it, but if these come out as they are meant to for you, like me you'll realise you can indeed bake very successfully with almond instead of wheat/flour!
Some notes on modifications to original ingredients
- The original recipe contained salt. After trying it, the cakes were way too salty, so I have eliminated the extra salt. What’s more - I find that when on a low carb regime, I’m far more sensitive to salt. If you want to add it back in, so be it.
- The original recipe suggests that liquid sweetener is preferred - whether this really makes a difference or not, I'm not sure - as these muffins come out just fine with granulated sweetener. Personally I just go for what's simple.
- 200g Ground Almond (this is basically your flour)
- 100g desiccated coconut (this is optional - I add it to add sweetness & I like the texture. I think it's worth it!)
- 35g butter - I find regular butter too salty (esp. on Atkins) so I used Tesco "Soft Spread" (which says "perfect for cakes" on the side! - and I agree)
- 4 medium eggs
- 10g of sucralose-based sweetener - e.g. Splenda (though Tesco do their own brand for almost a third of the price) - remember that 10g of this type of sweetener is basically equivalent to 100g of normal sugar.
- 75ml lemon juice (or you can use water, I guess, but for best flavour go with the lemon)
- 2 teaspoons baking powder
Flavourings/Optional (I add all of these)
- half to 1 capful of orange natural extract
- half to 1 capful vanilla flavour
- a sprinkling of something like chopped orange/lemon peel which you can buy in small tubs
Of course, you can vary the flavours and quantities to suit your own taste
- pre-heat the oven to 350F / gas mark 4 - 4.5 / 180C
- thoroughly mix all dry ingredients
- thoroughly mix in all wet ingredients
- spoon into the tin and bake - cooking time 15 - 20 mins (if done in a mince pie/Yorkshire pudding/muffin tray) - e.g. a tray with 12 individual portions
The recipe above is perfect to make 12 small muffins.
I'm standing for (re)election as a Public Governor of Cambridgeshire & Peterborough NHS Foundation Trust - the NHS organisation that provides the mental health care for the Cambridgeshire region, with votes closing on the 31st May 2010. I have been a Governor for 2 years since the formation of the Trust as a Foundation Trust.
Below is my election statement. If you are a member of the Trust you'll have received a voting form and if you agree with my statement I would welcome your vote.
2010 Election Statement
If re-elected I’ll continue to represent those who feel unheard, misunderstood, inadequately cared for and isolated by mental illness, whether sufferers or carers. I’ll also focus on cost control and value for money as well as new ways to engage closely with users using Internet technology.
I’ve been an active voice as a Governor, involved in anti-stigma, information provision and getting out on the street and online with direct action and communication.
My experiences have brought me into close contact with the devastating effects of conditions such as Depressive Illness and related aspects, whilst highlighting the misunderstanding and stigma associated with mental health issues. I'm dedicated to doing everything possible to change this perception and improve the care available for sufferers and carers, who often go unsupported.
I'm passionate about "levelling the playing field" for mental health, with greater provision, awareness and education, ensuring available funds are well spent.
|% RDA for...||Berocca||Tesco "B-active"||Boots "Re-Energise"|
|Thiamin (Vitamin B1)||743||743||929|
|Riboflavin (Vitamin B2)||850||850||938|
|Niacin (Vitamin B3)||252||252||278|
Given the recent high-profile campaign by ReThink, I was saddened, disappointed and perhaps even slightly disgusted to read an article in Management Today that demonstrates - with textbook accuracy - the prejudice and stigma towards mental health issues that still exist today. It's not clear whether it is the result of ignorance or arrogance on the part of the author, in an attempt to be humorous - but for me, it doesn't work. It's embarrassing. It's a publication that you would think could do better.
The article is a commentary on a BT Tradespace survey that suggests Stephen Fry would be the small businesses' dream employee. Unfortunately Management Today doesn't show the same insight and understanding as those small businesses, as it goes on to say:
But Fry has managed to trump them all, and by some distance too. Our only alarm is that – judging by his much-publicised walk-out from the West End a few years ago – he doesn’t have a great track record for dealing with pressure.
False Alarm. Whatever judgement is being exercised here, it is wrong. Fry's "much-publicised walk-out" was not a failure to deal with pressure - but by his own frank and moving admission, a result of a deeply dark period manifesting from his bi-polar disorder. Bi-polar is an illness, a malfunction - like a broken leg - not a character weakness, as the author implies. Hey, but never let the facts get in the way of a good story?
In fact, anyone following Fry's life in detail through his Twitter "tweets" will be staggered at the man's ability to cram twice as much as the rest of us into his day - not only responding to the thousands of followers who "tweet" to him daily, but jet-setting left, right and centre between TV sets and riding stubborn mules up Mexican mountains, amidst writing and recording and everything else he does. The man is an inspiration.
Thankfully UK small businesses seem to recognise this - while the hand that feeds them - Management Today - should know better.