If almost no one gives out scores below 6, then we have a Lake Wobegon effect, where all of the children are above average. A 6 becomes a floor rather than an average.
If nearly all of the scores are 6, 7, 8, or 9, then we have a 4-point scale. I have a colleague whose research specialty is measurement issues. He once did a study where he had people rate the size of objects on a 4-point scale (1 = very small, 2 = small, 3 = big, 4 = very big). The particular objects rated were a penny, a nickel, the moon, and the sun. He demonstrated that with a 4-point scale, there were no statistically significant differences in the rated sizes of the objects! If a 4-point scale can't distinguish between a nickel and the sun, then we ought to at least entertain alternatives to the current KCBS practice.
Someone earlier pointed out that using the full 2-9 scale would cause a lot of variance. But I can tell you, as a professor who teaches graduate-level statistics, that variance is exactly what you need. I am *not* talking about judge-to-judge variance on a given entry. If the appearance scores are, say, 9,9,8,8,8,4, that is a different issue (a question of what in statistics is called "reliability"--I also think that this might be a problem with the KCBS system--but it is a different problem from the one I am addressing). No, I am talking about entry-to-entry variation rather than judge-to-judge variation. If there is little to no variance among entries--which can happen when you have 4-point scales--then we have a "restriction of variance" problem, which makes it very hard to find meaningful differences in the quality of the entries based upon their scores. As an extreme example, suppose there were no variance at all (everyone receives 888, say). Then it is impossible to distinguish good cue from bad. There will obviously be more variance than that with a 4-point scale, but, as my colleague's study shows, you still have problems picking out the large from the small or, equivalently, excellent cue from average. Not impossible, and not always, but full-range scoring would certainly be a statistical improvement.
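To make the restriction-of-variance point concrete, here is a rough sketch in Python. Everything in it is an assumption for illustration: the ten "true quality" numbers are made up, the equal-width binning in `to_score` is just one simple way to turn quality into a score, and judge-to-judge noise is left out entirely. It is not KCBS data or the KCBS scoring procedure.

```python
from itertools import combinations
from statistics import pvariance

# Hypothetical "true quality" of ten entries on a 0-99 scale (made-up numbers).
quality = [5, 15, 25, 35, 45, 55, 65, 75, 85, 95]


def to_score(q, low, high):
    """Map a 0-99 quality onto the integer scale low..high using equal-width bins."""
    points = high - low + 1
    return low + q * points // 100


def tied_pairs(scores):
    """Count pairs of entries that ended up with exactly the same score."""
    return sum(1 for a, b in combinations(scores, 2) if a == b)


full = [to_score(q, 2, 9) for q in quality]    # judges use the whole 2-9 range
narrow = [to_score(q, 6, 9) for q in quality]  # judges only ever give 6-9

for label, scores in (("2-9", full), ("6-9", narrow)):
    print(label, scores,
          "entry-to-entry variance:", round(pvariance(scores), 2),
          "tied pairs:", tied_pairs(scores))
```

With these made-up inputs, the same ten entries have about a quarter of the entry-to-entry variance on the 6-9 version, and 8 of the 45 possible pairs of entries are tied instead of 2. Real judging adds noise on top of this, which only makes the narrower scale harder to read.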
--frank in Wilson, NY