Is the KCBS BOD schizophrenic?

I've come to accept the difference in each judge's opinion and taste. I understand that with chicken no two pieces are the same (even if I spend 3 hrs trying to get them identical). I put 8 ribs in the box, 4 from each of 2 racks (which are not going to be identical), and a pork box full of chunks from 2 butts and different parts of the butt. I can see a 2-3 point fluctuation in taste and tenderness. BUT.........when I turn in my 7-8 slices of brisket and see scores from 5 to 9's, I have a problem with that, because I know they are almost identical.:boxing: Just my 2C

I admit, I don't understand that either. I love good brisket, can't stand "mom's pot roast" brisket. So...yesterday, we had a brisket that I thought was quite tasty, while another judge deemed it pot roast. All I could say is if my mom had ever made pot roast that tasted as good as that brisket, I would love pot roast to this day!
 
I guess I don't know what the answer is... To be honest, I'm not sure if it is the fact that we have a large judging scale or that some people can't write...

I for one would like to meet the judge that gave us a 3 for appearance when all but one of the other judges gave us 9's and the remaining judge gave us an 8.

I've had the discussion on mandatory comment cards for any score 5 and down with others, including Mike Lake, and the response was always, "Then you would never have a score below a 6." Maybe the answer is to require comment cards for all scores.
 
You've hit the core of the problem (where we've been discussing the result of the problem). Contrast that to other sanctioning bodies that have all-day training and tests, and then, once a judge passes, they don't become a CBJ until they've successfully judged two sanctioned contests. I do think KCBS is a bit of a victim of their own success/growth, but that doesn't make the problem any smaller or excuse it.

I don't know if you need to spend all day, but I do think they should have backyard Q and comp Q for tasting, and actually review EACH score submitted and have the judges clarify exactly WHY they gave the score they did, good or bad. The table captains at the training classes should be CBJs, not just volunteers passing out boxes and reading numbers. If they had a CBJ at every table, that person could take ownership of the table and say, ok, here is what you should look for when you judge appearance, then move on to taste, then tenderness.

Taste is and always will be subjective, but when a guy gets 8-9's on appearance and one judge gives that same box a 3, then something is wrong. Also, as mentioned, one guy got a comment card saying his brisket was overcooked and another saying it was undercooked. I thought spending 30+ minutes on what greens were acceptable was just a waste of time; it should be up to the table captain to determine whether a box is acceptable or not. Then you could take those 30 minutes and teach the judges how chicken skin should be cooked properly, what texture a perfectly cooked rib should have, and how to do a bend-and-pull test on brisket. Then, as suggested, have the first two comps a new CBJ judges be reviewed by the KCBS rep. If they don't judge consistently at either of their first two contests, don't certify them and have them RETAKE the class.

I don't want to go on and on, but I just find it odd that the most important part of being a judge, training, gets the least amount of attention.
 
New Ideas Committee - Merl Whitebook

"...when it appears that a CBJ statistically is inconsistent in scoring (+/- 2 from the mean of the overall contest results) at two or more contests in a 12-month period, the CBJ will be mentored by the CBJ Chairperson. ... The motion was seconded by Ed Roith."

I like the idea and think it's overdue. What's wrong with monitoring and mentoring judges? Judging is an area that I think has nearly unanimous support for improvement among cooks and judges alike. We don't necessarily agree on how to improve but most of us agree that improvement would be favorable.

The organization should track judges' scores versus their peers and identify those who are consistent outliers. As has been mentioned, there are valid reasons for being an outlier on occasion, but a rigorous tracking system should easily distinguish between someone who occasionally scores lower than the rest of the table and someone who consistently does so. I would think that judges who take their craft seriously would want this kind of feedback.

I also think it's a mistake to assume that this type of scoring review would contribute to or lead to grade inflation. If that's going on, it's a separate issue but one that will be easier to fix with judges that score consistently on the same scale.
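The flagging rule in the motion quoted above is simple enough to sketch. This is a minimal illustration, assuming a hypothetical record format of (judge, contest, judge's average, contest average); it is not KCBS's actual tracking system.

```python
from collections import defaultdict

# Hypothetical records: (judge_id, contest_id, judge_avg, contest_avg).
# Flag a judge whose average deviates by more than 2 points from the
# contest mean at two or more contests, per the motion's wording.
def flag_outlier_judges(records, threshold=2.0, min_contests=2):
    deviation_counts = defaultdict(int)
    for judge, contest, judge_avg, contest_avg in records:
        if abs(judge_avg - contest_avg) > threshold:
            deviation_counts[judge] += 1
    return {j for j, n in deviation_counts.items() if n >= min_contests}

records = [
    ("J1", "C1", 5.1, 7.8), ("J1", "C2", 5.0, 7.5),  # low at two contests
    ("J2", "C1", 7.6, 7.8), ("J2", "C2", 7.4, 7.5),  # in line with peers
]
print(flag_outlier_judges(records))  # → {'J1'}
```

A real system would also need the 12-month window and valid-outlier exceptions (the lighter-fluid case mentioned later in this thread), but the core bookkeeping is this small.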
 
heck Paul.... We can't even get the full TOY points published for the members to see. Do you think we can actually track judges? :becky:


*** This is no slight to the wonderful ladies at KCBS who slave over the TOY program for our benefit.
 
Lies, Damn Lies & Statistics

Mark Twain, noted BBQ enthusiast and author once wrote, "There are three kinds of lies: lies, damned lies, and statistics." We need to be careful about the nature of the problem that we are trying to solve by statistical proof. The underlying problem is the competence of judges which brings into question the certification process. Fix the certification process and you will fix the statistical deviations.
 
If almost no one gives out scores below 6, then we have a Lake Wobegon effect, where all of the children are above average. 6 becomes more like a floor than an average.

If nearly all of the scores are 6, 7, 8, or 9, then we have a 4-point scale. I have a colleague whose research specialty is measurement issues. He once did a study where he had people rate the size of objects on a 4-point scale (1 = very small, 2 = small, 3 = big, 4 = very big). The particular objects rated were a penny, a nickel, the moon, and the sun. He demonstrated that with a 4-point scale, there were no statistical differences in the sizes of the objects!!! If a 4-point scale can't distinguish between a nickel and the sun, then we ought to at least entertain alternatives to the current KCBS practice.

Someone earlier pointed out that using the full 2-9 scale would cause a lot of variance. But I can tell you, as a professor who teaches graduate-level statistics, that variance is exactly what you need. I am *not* talking about judge-to-judge variance of a given entry. If the appearance scores are, say, 9,9,8,8,8,4, that is a different issue (a question of what in statistics is called "reliability"--I also think that this might be a problem with the KCBS system--but it is a different problem than the one I am addressing). No, I am talking about entry-to-entry variation rather than judge-to-judge variation. If there is little to no variance among entries--which can happen when you have 4-point scales--then we have a "restriction of variance" problem, which makes it very hard to find meaningful differences in the quality of the entries based upon their scores. As an extreme example, suppose there were no variance at all (everyone receives 888, say). Then it is impossible to distinguish good from bad cue. There will obviously be more variance in a 4-point scale, but, as my colleague's study shows, you still have problems picking out the large from the small or, equivalently, excellent cue from average. Not impossible, and not always, but certainly full-range scoring would be a statistical improvement.
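The restriction-of-variance point can be shown in a few lines. The numbers below are invented for illustration, not real contest data: the same six entries in the same order, scored once across the full range and once with a floor at 6.

```python
import statistics

# Toy illustration of "restriction of variance": the same six entries
# scored on the full 2-9 range vs. compressed into the 6-9 band.
full_range = [3, 5, 6, 7, 8, 9]   # judges willing to use the whole scale
compressed = [6, 6, 7, 7, 8, 9]   # same relative order, floor at 6

print(statistics.variance(full_range))  # entry-to-entry spread preserved
print(statistics.variance(compressed))  # spread collapses toward the top
```

Same entries, same ranking, but the compressed scores leave far less variance to separate the good boxes from the bad ones, which is the professor's point about the effective 4-point scale.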


--frank in Wilson, NY
 
I took a CBJ class recently and the person who was supposed to cook for the class had a last minute personal emergency and replacement cooks were recruited from throughout the community. The replacements did not have to be competition cooks. Every piece of meat that I tasted that day, I would have sent back to the kitchen had it come from a restaurant. It was the worst BBQ I had tasted in my life.

I'm pretty sure I was at the same training class and I can say that was problem #1.


San Diego, August 7th?

If so, you can blame me for the chicken, at least the breast... I did all the breasts and about 60 drumsticks and 7-8 thighs... Sorry, I did not think they were that bad... I tried the ribs, they were a 4; the butt was way overcooked but had a decent flavor, I thought. Did you like the pork balls? The brisket, well, your comments are very kind for them...
 
Wait a darn minute

Scoring in KCBS is not comparative. Each entry is scored on its own merits and should not be compared with another entry. So how does all this statistics crap come into play? That can only happen if there is comparison.

So if Lotta Bull, Smokin Triggers, Cool Smoke, Quau, Pellet Envy and Habitual Smokers all hit the same table and they all think they cooked real good BBQ I'd say that no score would be below 8 and that's as it should be.

Now if 6 newbies all hit the same table and one slices brisket with the grain, one cooks it to 165, well, you get the idea, then what scores would you see at that table? Probably 3-6, depending on the judges. That is to be expected.

What's important to me is that scores are consistent for an entry, especially in appearance. So in the first example, if the average is 8.33 and one judge gave a 5 in appearance, then I think it's something to look at. First the table captain should ask the judge about it, and maybe the rep should also talk to the judge. If it happens consistently, the judge is just not in tune with the norm and maybe needs to be educated. And I don't mean another CBJ class, as it doesn't teach what's good.

We need to revisit the class and have pictures of all-9 boxes, pictures of boxes with flaws so judges can be asked to identify the flaws, etc. We also need books or a Proxima projector to show examples of different boxes. Just some of my thoughts.

And for the record, I have talked to many judges, and some have very preconceived ideas of what they want to see in appearance. I've been told that if there are not two layers of ribs, the box won't ever score a 9. As a single-layer guy, that's not what I want to hear.
 
If almost no one gives out scores below 6, then we have a Lake Wobegon effect, where all of the children are above average. 6 becomes more like a floor than an average.

If nearly all of the scores are 6, 7, 8, or 9, then we have a 4-point scale. I have a colleague whose research specialty is measurement issues. He once did a study where he had people rate the size of objects on a 4-point scale (1 = very small, 2 = small, 3 = big, 4 = very big). The particular objects rated were a penny, a nickel, the moon, and the sun. He demonstrated that with a 4-point scale, there were no statistical differences in the sizes of the objects!!! If a 4-point scale can't distinguish between a nickel and the sun, then we ought to at least entertain alternatives to the current KCBS practice.

Someone earlier pointed out that using the full 2-9 scale would cause a lot of variance. But I can tell you, as a professor who teaches graduate-level statistics, that variance is exactly what you need. I am *not* talking about judge-to-judge variance of a given entry. If the appearance scores are, say, 9,9,8,8,8,4, that is a different issue (a question of what in statistics is called "reliability"--I also think that this might be a problem with the KCBS system--but it is a different problem than the one I am addressing). No, I am talking about entry-to-entry variation rather than judge-to-judge variation. If there is little to no variance among entries--which can happen when you have 4-point scales--then we have a "restriction of variance" problem, which makes it very hard to find meaningful differences in the quality of the entries based upon their scores. As an extreme example, suppose there were no variance at all (everyone receives 888, say). Then it is impossible to distinguish good from bad cue. There will obviously be more variance in a 4-point scale, but, as my colleague's study shows, you still have problems picking out the large from the small or, equivalently, excellent cue from average. Not impossible, and not always, but certainly full-range scoring would be a statistical improvement.


--frank in Wilson, NY

Frank, thanks, and yes, I was talking about the judge-to-judge variance.

To the above, if KCBS went with it, they'd need to do something much closer to MBN-style scoring, where after the scores are in, they rank-order the entries at the table, with one and only one of the entries getting the 10 (or in this case, the 9). The others would get a fractional point, so if the 2nd best that day was extremely close to the best one, it might get the 8.9, and the third best, not being as close, might get an 8.5, etc.

Ultimately, ties happen. They do now, and they will with any system.
I do like KCBS's tie-breaker rules. They're probably the best out there,
frankly.
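The rank-order conversion described above can be sketched in a few lines. The step size and scoring details here are my own invention for illustration, not MBN's actual formula: the table's best entry is pinned to 9, and each other entry loses a fraction proportional to its gap from the leader.

```python
# Hypothetical rank-order rescaling: only the table's best entry gets
# the top score; the rest drop by `step` per point of raw-score gap.
def rank_order_scores(raw, top=9.0, step=0.1):
    best = max(raw.values())
    return {team: round(top - (best - score) * step, 1)
            for team, score in raw.items()}

raw = {"A": 172.0, "B": 171.0, "C": 165.0}
print(rank_order_scores(raw))  # → {'A': 9.0, 'B': 8.9, 'C': 8.3}
```

Note how this matches the example in the post: a 2nd-place entry one point behind gets the 8.9, while a further-back 3rd place lands lower, so ties in the converted scores only occur when the raw scores were tied too.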
 
Mark Twain, noted BBQ enthusiast and author once wrote, "There are three kinds of lies: lies, damned lies, and statistics." We need to be careful about the nature of the problem that we are trying to solve by statistical proof. The underlying problem is the competence of judges which brings into question the certification process. Fix the certification process and you will fix the statistical deviations.
(UNRELATED) One of my all-time favorite quotes!
 
I had a judge come up to me this weekend and tell me that I need to become a CBJ and judge some. He said it would help me with ideas on how to make the boxes look, etc. He told me that I'd pick up a lot of ideas, too, on what people were turning in (I didn't ask for any of this advice, btw). I guess when you have a small rusty cooker, no banners, and a very unassuming site, people make assumptions. With these shoot-from-the-hip assumptions about me, I wonder how he judges BBQ?
 
I had a judge come up to me this weekend and tell me that I need to become a CBJ and judge some. He said it would help me with ideas on how to make the boxes look, etc. He told me that I'd pick up a lot of ideas, too, on what people were turning in (I didn't ask for any of this advice, btw). I guess when you have a small rusty cooker, no banners, and a very unassuming site, people make assumptions. With these shoot-from-the-hip assumptions about me, I wonder how he judges BBQ?

You need to hire that guy from Chicago as your media relations specialist:p
 
Statistics are very interesting. I agree with gathering info for a year and seeing what you have, post hoc.

My pork scored- 666 777 989 998 979 988

For me it was an 888. I get why people are pizzed about the outlying scores. Often "honest" people are really just jerks. I wish the outliers had to qualify the quantification.
 
Statistics are very interesting. I agree with gathering info for a year and seeing what you have, post hoc.

My pork scored- 666 777 989 998 979 988

For me it was an 888. I get why people are pizzed about the outlying scores. Often "honest" people are really just jerks. I wish the outliers had to qualify the quantification.

See, how on earth can you have appearance scores vary from 6 to 9? That is absolutely, without any shadow of a doubt, a judging problem going to training and oversight. Still, taste scores (in pork, not chicken or ribs, which can come from a different chicken/pig and taste different) shouldn't vary this much either. HUGE problem. Taste certainly is subjective, but a six is basically akin to *yuck*, where two other judges say it's fan-damn-tastic? 7's, 8's, I get it. Either the 6 or the 9's were wrong, IMHO.


I am reminded of the time in MIM scoring when we had the first 4 judges turn in their cards; everything looked great. Then the 5th card was turned in, rating this one particular entry 7's in taste (which is the lowest you can get). When asked why, he replied, "It smells and tastes like lighter fluid." We went back to the remaining pieces, and sure enough, there was a very distinct smell and taste of lighter fluid. Go figure. I suppose some judges just like the taste of lighter fluid.... Scoring systems can't fix this.
 
Mark Twain, noted BBQ enthusiast and author once wrote, "There are three kinds of lies: lies, damned lies, and statistics." We need to be careful about the nature of the problem that we are trying to solve by statistical proof. The underlying problem is the competence of judges which brings into question the certification process. Fix the certification process and you will fix the statistical deviations.

A favorite quote of mine as well. That being said, a study of data gathered over the course of a year would be the most objective way I can think of to truly determine what problems, if any, are present.

I'd like to know how many judges score high or low, often enough to stand out. I'd like to see how many judges routinely score everything virtually the same, etc....

At this point I think all we have is suspicion, anecdotal evidence, etc... If we have factual evidence of the problem, we can then determine the cause and proper corrective action.

While I understand your point, I wouldn't want to consider trying to recertify every CBJ to some new standard unless it was the only option available. The chaos it could potentially wreak on contests is pretty ominous, especially if it wasn't done properly. I'm convinced that some of the scoring issues we have today are a result of different generations of CBJs, including some that are still trying to apply either the old start-at-9 or start-at-6 systems, etc.... If KCBS records are intact and accurate, it would be pretty simple to determine if that's the case once a good set of data is available.

I'm not discounting your proposed solution, but I don't think now is the time to implement ANY solution until we really understand what problem(s) we need to solve.
 
If almost no one gives out scores below 6, then we have a Lake Wobegon effect, where all of the children are above average. 6 becomes more like a floor than an average.

If nearly all of the scores are 6, 7, 8, or 9, then we have a 4-point scale. I have a colleague whose research specialty is measurement issues. He once did a study where he had people rate the size of objects on a 4-point scale (1 = very small, 2 = small, 3 = big, 4 = very big). The particular objects rated were a penny, a nickel, the moon, and the sun. He demonstrated that with a 4-point scale, there were no statistical differences in the sizes of the objects!!! If a 4-point scale can't distinguish between a nickel and the sun, then we ought to at least entertain alternatives to the current KCBS practice.



--frank in Wilson, NY

As a student of psychometrics... I would love to see what they listed as possible confounds and mediators.

Without adequate training, no scale will measure with any real precision.
 
The tracking and "penalization" of outlier judges will skew the judging pool. No judge will want to be "labeled" an outlier and will worry about how his fellow judges are going to score and may hide his/her true opinion.

The very strong fact remains in all of this: There are very familiar faces at the top of every contest.
 
You need to hire that guy from Chicago as your media relations specialist:p


I keep trying to get Dr. BBQ to help me as well. Or at least treat me to a Boston's Italian Beef the next time he is in town... :becky:
 
I think it would be interesting to see judges' performance across contests. If the judge that gave a "4" in the 9,9,9,8,8,4 scenario is handing out 4s and 5s left and right across multiple contests, maybe the problem is the judge and not the Q.

I have mentioned in other threads that we got a couple of fours in pork at one event last year, and had two comment cards saying the pork was cold. Every CBJ class tells people not to score on temp, so when two judges out of six put that down in writing, you have to wonder about the training and CBJ mix of the judges.

Lowest score dropped helps, but what if you have a collection of newly trained judges? Many states in the northeast have one or two contests a year. There are a few judges really dedicated to Q that drive hundreds of miles to contests, but a lot of the time you get new judges or judges that judge one contest a year. Statistical analysis won't fix this.
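For readers unfamiliar with the "lowest score dropped" mechanics mentioned above, here is a rough sketch. The weighting factors (0.5600 appearance, 2.2972 taste, 1.1428 tenderness) are the commonly cited KCBS multipliers; verify them against the current rules before relying on this.

```python
# Sketch of KCBS-style entry scoring: six judge cards of
# (appearance, taste, tenderness), weighted, with the single
# lowest weighted card dropped before summing.
def entry_score(judge_cards):
    weighted = [a * 0.5600 + t * 2.2972 + te * 1.1428
                for a, t, te in judge_cards]
    weighted.sort()
    return round(sum(weighted[1:]), 4)  # drop the lowest card only

cards = [(9, 9, 9), (8, 8, 8), (9, 8, 9), (9, 9, 8), (9, 7, 9), (6, 6, 6)]
print(entry_score(cards))  # the (6,6,6) card is the one dropped
```

This shows why dropping the low card only partially protects against an outlier judge: one rogue 4 is absorbed, but as the post says, a table seated mostly with brand-new judges can sink an entry in ways no drop rule can catch.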
 