Lies, damned lies, and statistics


40 replies to this topic

#1 Darren Galpin

Darren Galpin
  • Member

  • 2,331 posts
  • Joined: April 00

Posted 05 June 2001 - 07:05

Firstly, does anyone know who was the source of the quote in the subject line?

Anyway, back to the matter in hand. I thought that the e-mail below, which I received yesterday, may interest those of you of a more statistical nature, or those of you who simply like a laugh. The new values for Fangio and Prost are wrong though, as they do not include shared drives or half points awarded for shortened races.......

-----------------------------------------------------

Dear Darrin,

"It's not so much the crook in modern business that we fear, but honest man who does not know what he is doing." -Owen D. Young

Never apply the arithemetic mean when occurrances under evaluation are weighted.

The "average" column on your "List of drivers who have ever scored WC points" website is defective; it is plagued by kurtosis, and must be revised as soon as possible to reflect the geometric mean (formula provided postscript).

The arithemtic mean is defective as a proxy for rank ordering race drivers when weighted occurrences are placed under statistical analysis. You need to revise it, or perhaps include a disclaimer that the ordinal ranking of these drivers is admittedly defective.

We don't use the arithemetic mean for evaluation purposes when observations are weighted. That went out back in the 1970s (the Lucas Critique). We don't even use it to generate "a priori" assumptions. The arithemetic mean is an inappropriate test-statistic for which to ordinally rank weighted occurrances. It's not your fault: they probably never told you that.

Racing's evolution, in tandem with the increased frequency of occurrances (more Grand Prix these days) and FIA's revision to the F1 point system, relegates to medeocrety point average as barometer for driver evaluation. The geometric mean is a better test statistic, but only under a consistent reward system. Before you can use the geometric mean, you must "deflate" the occurrances. 10 or 9 points, either way, but not 10 for Schumacher and 9 for Clark for their victories. Economists do the same thing for inflation.

The arithmetic mean is acutely sensitive to outliers (violates OLS normality assumptions). Fangio's average finish does not equate to 5.941 points. That's wrong. Nor did Schumacher average 4.484 points per start.

Correctly reweighting Fangio's victories by ten points, then using the geometric mean, Fangio's average finish equates to 4.86375 points.

By comparison, allocating 10 full points to Schumacher for every victory, his geometric mean average becomes 2.9756. Stewart: 2.8226. Senna: 2.7204 . Damon Hill: 2.06. Mario Andretti: 0.9815.

Contrast the variance between Senna and Schumacher, above, versus the variance you provide on your "List of drivers who have ever scored WC points" website. Your arithemetic average inflates the statistical significance between Schumacher, at Senna's expense. Sadly, it inflates Schumacher's statistical significance at the expense of Fangio.

Looks as though the revision to FIA's F1 points system has inflated Schumacher's point avarage. Weighting his victories nine points (as opposed to 10) reduces his ratio to 4.21 (to 7th from 4th). Unfortunately, revision to f1's point system artifically deflates the performance of yesteryear's drivers. Weighting Fangio's victories with the same 10 points allocated to Schumacher elevates his average to 6.3137.

Even with a discounted, arithemetic mean of 5.941, Fangio's record is sure to stand the test of time. But, your arithemetic mean is defective. Fangio's true mean is not 5.9 points. The geometric mean is the one we specify. If you would like me to recalc your spreadsheet using the geometric mean, then I would be most happy to do so free of charge, so long as your website is diligently maintained (despite the fact that a sufficient amount of time has elapsed, Barrichello's victory at Hockenhiem has yet to be recorded). Otherwise, I have included the formula you need in a numeric example that you'll easily follow:

... "the square root of quantity of the squared sums of the observations, divided by the number of occurrances equals the true mean of a weighted cross-sectional data trail." -Gujarati

Courtesy,
[name]
Graduate Econometrics, Cal Poly

PS- Have a look at what happens when we apply the Geometric Mean to Alain Prost (2.805):

(sqrt [(51*10)^2 +(35*6)^2 +(20*4)^2 + (10*3)^2 +(5*2)^2 +(7*1)^2])/ 199 = 2.805

... now, look at the one we derived for Jackie Stewart above, then compare this result to your arithmetic mean. Your website doesn't only inflate Prost, it incorrectly inverts Prost ordinally over Stewart.




-------------------------------------


I am constructing a reply to send later......
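For anyone who wants to check the postscript, here is a minimal Python sketch of the formula exactly as it is typed in the email, next to the plain arithmetic mean of the same figures. The counts and the 199 starts are the email's own numbers (which, as noted above, ignore half points); the function names are just illustrative.

```python
import math

# (number of finishes, points per finish), copied from the email's postscript
prost_results = [(51, 10), (35, 6), (20, 4), (10, 3), (5, 2), (7, 1)]
prost_starts = 199  # denominator used in the email


def emailed_formula(results, starts):
    """Square root of the sum of squared (count * points) terms, divided by starts."""
    return math.sqrt(sum((count * points) ** 2 for count, points in results)) / starts


def arithmetic_mean(results, starts):
    """Total points divided by starts."""
    return sum(count * points for count, points in results) / starts


emailed_value = emailed_formula(prost_results, prost_starts)
plain_average = arithmetic_mean(prost_results, prost_starts)

print(f"emailed formula: {emailed_value:.3f}")
print(f"arithmetic mean: {plain_average:.3f}")
```

The first figure reproduces the 2.805 in the email. Note that nothing in the calculation multiplies the observations together, which is what a geometric mean would do.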




#2 unrepentant lurker

unrepentant lurker
  • Member

  • 347 posts
  • Joined: October 00

Posted 05 June 2001 - 07:23

Churchill, IIRC.

#3 unrepentant lurker

unrepentant lurker
  • Member

  • 347 posts
  • Joined: October 00

Posted 05 June 2001 - 07:56

yeah, what he said.

The good news is that he has definitely found his calling in life.

#4 Felix Muelas

Felix Muelas
  • Member

  • 1,216 posts
  • Joined: November 99

Posted 05 June 2001 - 07:57

Darren

IMHO the best part of the "theme" is the name you've chosen for it.:lol:

As for the contents I have to confess that, even if I was supposed to find it captivating (which probably it is) I failed to get the central idea...Are you sure you have not deleted some words or paragraphs whilst editing it? :lol: :lol: :lol:

Thanks, my friend

Felix


#5 Darren Galpin

Darren Galpin
  • Member

  • 2,331 posts
  • Joined: April 00

Posted 05 June 2001 - 08:12

No words deleted or paragraphs removed! (apart from the name of the person)


The idea behind it is to try and remove the distortions caused by the drivers in the '50s driving fewer races. When calculating a normal average, you divide total x by the number y. The smaller x and y are (x say being total points, y the number of races), the more a small deviation or change in either number changes the value of the average. The guy is using a statistical fiddle to make changes matter less, as he calls it "OLS normality assumptions" (in my words, making the odd extreme/rogue result change the end result by a smaller amount), and couches the entire idea behind statistical jargon. Once I had got my dictionary out last night, I had a very good chuckle......
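A rough illustration of the small-sample point, as a Python sketch. The two careers are made up (both average exactly 3.0 points per race) and the helper names are hypothetical; the only thing taken from the thread is the 10 points for a win.

```python
def average(total_points, races):
    """Plain arithmetic mean: total points divided by races started."""
    return total_points / races


WIN = 10  # points for one extra win on the 10-6-4-3-2-1 scale


def shift_after_one_win(total_points, races):
    """How far the average moves when a single win is added to the record."""
    return average(total_points + WIN, races + 1) - average(total_points, races)


# Hypothetical careers with the same 3.0 average
short_shift = shift_after_one_win(30, 10)    # 30 points from 10 races
long_shift = shift_after_one_win(300, 100)   # 300 points from 100 races

print(f"short career moves by {short_shift:.3f}")
print(f"long career moves by  {long_shift:.3f}")
```

One extra win moves the short career's average roughly nine times as far as the long one's, which is the distortion being described.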

#6 100cc

100cc
  • Member

  • 3,178 posts
  • Joined: December 00

Posted 05 June 2001 - 08:56

I think what he said could've been said in 1 sentence with the same effect.

#7 Croaky

Croaky
  • Member

  • 193 posts
  • Joined: May 01

Posted 05 June 2001 - 09:14

I really should have kept count of how many words I read before my eyes glazed over:)

#8 SteveB2

SteveB2
  • Member

  • 228 posts
  • Joined: November 99

Posted 05 June 2001 - 12:55

I always understood that the quote was from Mark Twain.

#9 Don Capps

Don Capps
  • Member

  • 5,933 posts
  • Joined: May 99

Posted 05 June 2001 - 13:00

It was Benjamin Disraeli, PM from 1874 to 1880, who authored the line in question. Here is the full quotation:

"There are three kinds of lies: lies, damn lies, and statistics."



#10 Darren Galpin

Darren Galpin
  • Member

  • 2,331 posts
  • Joined: April 00

Posted 05 June 2001 - 13:12

Thank you, oh wise and knowledgeable leader!

#11 Don Capps

Don Capps
  • Member

  • 5,933 posts
  • Joined: May 99

Posted 05 June 2001 - 14:17

As a researcher, this line is a constant source of 'inspiration.' :lol:

#12 Rainer Nyberg

Rainer Nyberg
  • Member

  • 1,768 posts
  • Joined: October 00

Posted 05 June 2001 - 21:12

Yes, I also always believed it was a Mark Twain quote.

#13 Leif Snellman

Leif Snellman
  • Member

  • 1,142 posts
  • Joined: February 00

Posted 05 June 2001 - 22:04

He is correct of course. It is the WAY he is saying it that makes us :lol: :lol: :lol: :lol:

#14 Vitesse2

Vitesse2
  • Administrator

  • 43,403 posts
  • Joined: April 01

Posted 05 June 2001 - 22:31

What gets me is that this guy(?) is probably using this as part of some thesis or dissertation and he can't even spell "arithmetic"!!

Or is that just me being pedantic again???

Whatever, I lost the will to live about half-way down the second paragraph .... and please note that my usual signature does NOT apply in this case only!!!:lol::lol:

#15 Pete Stanley

Pete Stanley
  • Member

  • 486 posts
  • Joined: February 99

Posted 05 June 2001 - 22:33

Samuel Clemens used it in "Following The Equator". He attributed it to Disraeli.

#16 Roger Clark

Roger Clark
  • Member

  • 7,570 posts
  • Joined: February 00

Posted 05 June 2001 - 23:47

Is there something wrong with the formula? If you use it to calculate the geometric mean of the numbers 4 and 6 you get

(sqrt[(4*1)^2+(6*1)^2])/2

= (sqrt[16+36])/2

=(sqrt[52])/2

=7.2/2

=3.6


a mean which is less than both the numbers in the sample doesn't seem right.
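The check above can be run as a short Python sketch. It assumes the formula is read the way the email's Prost example uses it (root of the sum of squares, divided by the count); the function names are illustrative only.

```python
import math


def emailed_formula(values):
    """Root of the sum of squares, divided by the number of values."""
    return math.sqrt(sum(v ** 2 for v in values)) / len(values)


def root_mean_square(values):
    """Conventional RMS: root of the *mean* of the squares."""
    return math.sqrt(sum(v ** 2 for v in values) / len(values))


sample = [4, 6]
result = emailed_formula(sample)
rms = root_mean_square(sample)

print(f"emailed formula: {result:.4f}")
print(f"true RMS:        {rms:.4f}")
print(f"RMS / sqrt(n):   {rms / math.sqrt(len(sample)):.4f}")
```

The emailed formula works out to the ordinary RMS divided by sqrt(n), so with two or more values it can drop below the smallest number in the sample, as it does here.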

#17 Don Capps

Don Capps
  • Member

  • 5,933 posts
  • Joined: May 99

Posted 06 June 2001 - 04:11

You mean this was a quiz?!?! :eek:

#18 Kuwashima

Kuwashima
  • Member

  • 330 posts
  • Joined: January 00

Posted 06 June 2001 - 04:50

Racing's evolution, in tandem with the increased frequency of occurrances (more Grand Prix these days) and FIA's revision to the F1 point system, relegates to medeocrety point average as barometer for driver evaluation.

OK, so my speling's not perfekt, but i think his is realy medeocer!!! :cool:

#19 Roger Clark

Roger Clark
  • Member

  • 7,570 posts
  • Joined: February 00

Posted 06 June 2001 - 05:05

Originally posted by Don Capps
You mean this was a quiz?!?! :eek:



No it's an exam. No talking and don't write on both sides of the paper at once.


#20 Leif Snellman

Leif Snellman
  • Member

  • 1,142 posts
  • Joined: February 00

Posted 06 June 2001 - 05:39

Originally posted by Roger Clark
If you use it to calculate the geometric mean of the numbers 4 and 6 you get (sqrt[(4*1)^2+(6*1)^2])/2 =3.6
a mean which is less than both the numbers in the sample doesn't seem right.

You know Roger in statistics ANYTHING can happen! :drunk:

#21 Marcel Schot

Marcel Schot
  • Member

  • 5,459 posts
  • Joined: November 98

Posted 06 June 2001 - 05:49

Originally posted by Roger Clark
a mean which is less than both the numbers in the sample doesn't seem right.

Which is probably why the topic is named as it is :lol:

#22 Croaky

Croaky
  • Member

  • 193 posts
  • Joined: May 01

Posted 06 June 2001 - 09:01

How do you write on both sides of the paper at once?:)

#23 Roger Clark

Roger Clark
  • Member

  • 7,570 posts
  • Joined: February 00

Posted 06 June 2001 - 17:43

Originally posted by Croaky
How do you write on both sides of the paper at once?:)


Statistically, 25.2% of the population do it 14.9% of the time, I'm told, and they have done since 1066.

#24 oldtimer

oldtimer
  • Member

  • 1,291 posts
  • Joined: October 00

Posted 07 June 2001 - 01:18

Did Fangio get the tail of his Maserati so far out of line on the downhill sweeps at Rouen because he was pre-occupied with the question of whether his geometric mean average was 4.86375 or 6.3137? ;)

Or was he just enjoying himself?

BTW folks, statistics can be used sensibly and informatively, but the practitioner needs to be more interested in extracting usable information than grandstanding.

#25 Don Capps

Don Capps
  • Member

  • 5,933 posts
  • Joined: May 99

Posted 07 June 2001 - 03:01

When I started grad school in the dark days of the mid-20th century, my traditionalist education as a historian got thrown out the window when I went to work as the TA/RA (Teaching Assistant/Research Assistant -- we 'Taras' were in theory the 'Best & the Brightest' and how I ended up one was a perpetual mystery to one and all) for My Professor. On the wall were two samplers -- one was the Disraeli/Clemens quotation and the other proclaimed 'If It Is Not A Number -- It Is Not Important.'

I found that whatever I thought about research was 13th Century University of Paris stuff. Indeed, my grad school education allowed me to survive the tyranny of statistics during the years and years that I toiled in the bowels of The Pentagon.

Our first question -- among those of us who survived long enough to be Knights of the Grand Order of Cynics with the Grand Double Cross with Oak Leaves, Swords, and Diamonds -- what do you want the answer to be....?

Math and statistics are wondrous things, objects of great joy and confusion when you know how to use them...

:lol:

#26 Marcel Schot

Marcel Schot
  • Member

  • 5,459 posts
  • Joined: November 98

Posted 07 June 2001 - 05:32

Originally posted by Don Capps
When I started grad school in the dark days of the mid-20th century

You got me reading "When I started grad school in the dark days of the mid 20s" :eek:

Bottom line of the story is that with statistics, you can make anything look the way you want it to. The simplest example in our field is the almighty question "Who's the best F1 driver of all time?" Want Prost? Say he's got the most wins. Want Senna? Say he's got the most poles. Want Fangio? Say he's got the most titles. Want Clark? Go figure out the number of wins per start related to the number of past/present/future world champions on the average grid in his races or whatever :) Hell, Pedro Diniz is the record holder for displacing square feet of grass in a single Grand Prix :lol:

PS Does it show that I've been diving into baseball stats again recently?

#27 Barry Boor

Barry Boor
  • Member

  • 11,557 posts
  • Joined: October 00

Posted 07 June 2001 - 21:28

Slightly OT, but HEY, Oldtimer, I walked down from the start to the Nouveau Monde Hairpin at Rouen a couple of weeks ago and I swear Fangio passed me 9 times on my way down.

I'm happy to say that Dan passed me 11 times!

#28 Roger Clark

Roger Clark
  • Member

  • 7,570 posts
  • Joined: February 00

Posted 07 June 2001 - 22:21

Originally posted by Barry Boor
Slightly OT, but HEY, Oldtimer, I walked down from the start to the Nouveau Monde Hairpin at Rouen a couple of weeks ago and I swear Fangio passed me 9 times on my way down.

I'm happy to say that Dan passed me 11 times!


Yes, but I passed you 12 times in kpy's 406!

#29 Barry Boor

Barry Boor
  • Member

  • 11,557 posts
  • Joined: October 00

Posted 08 June 2001 - 06:27

Oh dear, I think you probably did! Trouble is, half those times you were going the wrong way! Ickx just missed you twice!

#30 david_martin

david_martin
  • Member

  • 1,989 posts
  • Joined: October 00

Posted 08 June 2001 - 13:38

Originally posted by Roger Clark
Is there something wrong with the formula? If you use it to calculate the geometric mean of the numbers 4 and 6 you get

(sqrt[(4*1)^2+(6*1)^2])/2

= (sqrt[16+36])/2

=(sqrt[52])/2

=7.2/2

=3.6


a mean which is less than both the numbers in the sample doesn't seem right.



That is wrong.

From the Handbook of Mathematical Functions by Abramowitz and Stegun, p. 10:

The geometric mean of n quantities, G:
G = (a1 x a2 x ... x an)^(1/n), with ak > 0 for k = 1, 2, ..., n


so the geometric mean of 4 and 6 = (4 x 6)^(1/2) = sqrt(24) = 4.899 :)
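The textbook definition is easy to try out. A minimal Python sketch follows; the hand-written function is only an illustration of the formula quoted above, cross-checked against `statistics.geometric_mean` from the standard library (available from Python 3.8).

```python
import math
import statistics


def geometric_mean(values):
    """n-th root of the product of n strictly positive values."""
    if any(v <= 0 for v in values):
        raise ValueError("geometric mean needs strictly positive values")
    return math.prod(values) ** (1 / len(values))


sample = [4, 6]
g = geometric_mean(sample)
library_g = statistics.geometric_mean(sample)
a = statistics.mean(sample)

print(f"geometric mean:  {g:.3f}")
print(f"library version: {library_g:.3f}")
print(f"arithmetic mean: {a}")
```

For positive data the geometric mean always lands between the smallest value and the arithmetic mean, which is the sanity check the emailed formula fails.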



#31 Timekeeper

Timekeeper
  • Member

  • 74 posts
  • Joined: April 01

Posted 08 June 2001 - 14:13

I've really enjoyed this thread. Seeing David's avatar reminded me of another quote from Einstein that my boss has pinned up on the wall at work which sums up what some of you have said.

"Not everything that can be counted counts, and not everything that counts can be counted".

Just to turn my own pedant mode on for a moment. Pete Stanley was almost right with the source of the original quote, i.e. that it was attributed to Disraeli by Mark Twain (aka Samuel Clemens). But it is not from "Following the Equator" but from his autobiography. It comes from a discussion on how long it took to write his books. The quote with the previous paragraph, which I think you'll find relevant, is:


I was deducing from the above that I have been slowing down steadily in these thirty-six years, but I perceive that my statistics have a defect: three thousand words in the spring of 1868 when I was working seven or eight or nine hours at a sitting has little or no advantage over the sitting of today, covering half the time and producing half the output. Figures often beguile me, particularly when I have the arranging of them myself; in which case the remark attributed to Disraeli would often apply with justice and force:

"There are three kinds of lies: lies, damned lies, and statistics."




#32 Don Capps

Don Capps
  • Member

  • 5,933 posts
  • Joined: May 99

Posted 08 June 2001 - 14:21

How many other racing fora have discussions like this?

#33 Croaky

Croaky
  • Member

  • 193 posts
  • Joined: May 01

Posted 08 June 2001 - 14:26

These formulae are overly complex anyway. Many science students use the simple method of multiplying by Fiddler's Constant. This is the number that you must multiply the *actual* answer by to get the answer you really wanted. It works every time :)

#34 Darren Galpin

Darren Galpin
  • Member

  • 2,331 posts
  • Joined: April 00

Posted 11 June 2001 - 07:12

It gets better! Here is my reply, and his answer to that. Anyone for game theory????

As an aside, I entered a geometric mean calculation into the database I use for generating the statistics, and there was very little change in the scheme of things. All of the mean values were lower, and people changed by one or two positions at most.


--------------------------------------------------------------

Dear Sammy,

"There are Lies, Damned Lies, and Statistics....." - Benjamin Disraeli

Thank you for applying such a rigorous statistical analysis to the issue of driver points and races. It's been a while since I have used statistical analysis (last performed during many-body theory during my Masters in Physics), but it seems to me that the Geometric mean is a weighted Root-mean-square. But first, I need to clarify exactly what I have done.

The race results are calculated using a re-weighted points scoring scheme (comparable to the purchasing-power-parity methods quoted in The Economist magazine), such that every driver is scored on a 10-6-4-3-2-1 basis, with shared drives being calculated by dividing the number of points available by the number of drivers, and the points shared out equally. This ensures that the lower points scoring available in the 1950s is re-balanced. However, I admit that they had fewer opportunities to score points, hence the division of the points by the number of races to get the arithmetic mean. As they contested fewer world championship races (but not races in total, as they contested far more non-championship events), the aim with the average was to provide a basic way of comparing the drivers against each other which people can easily understand. Admittedly it has its shortcomings. It is not rigorous from an academic point of view, and in no way does it cater for the performance of the car available. For instance, Fangio always competed in the best machinery for the top teams. Someone like Stirling Moss did not compete in the best machinery all of the time.

However, there is a flaw in your analysis, in that you assign full points to Fangio for races where he only scored half points due to a shared drive. So, Fangio's results are as follows:-

                    finishes  points  squared points
1st (full points)         22     220           48400
1st (half points)          2      10             100
2nd (full points)          8      48            2304
2nd (half points)          2       6              36
3rd                        1       4              16
4th (full points)          4      12             144
4th (half points)          2       3               9

RMS                 225.8517213
Geometric mean      4.42847

This lowers Fangio's results even further. You also need to take into account races where half points were awarded, such as Monaco 1984 and Australia 1991, where the races were stopped before 75% distance (both due to rain in this case).
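For anyone who wants to check the table, a minimal Python sketch follows. The rows are copied from the table, with half-point rows entered as half the 10-6-4-3 value. The 51 starts are an assumption: the table does not state the number, but it is the figure implied by 225.8517213 / 4.42847, and it also reproduces the 5.941 arithmetic mean mentioned in the original email.

```python
import math

# (label, finishes, points per finish) from the table; half points are shared drives
fangio_rows = [
    ("1st (full points)", 22, 10),
    ("1st (half points)", 2, 5),
    ("2nd (full points)", 8, 6),
    ("2nd (half points)", 2, 3),
    ("3rd", 1, 4),
    ("4th (full points)", 4, 3),
    ("4th (half points)", 2, 1.5),
]
FANGIO_STARTS = 51  # assumption: inferred from the two totals, not stated in the table

row_points = [finishes * points for _, finishes, points in fangio_rows]
root_sum_squares = math.sqrt(sum(p ** 2 for p in row_points))  # the table's "RMS" line
emailed_mean = root_sum_squares / FANGIO_STARTS                # the "Geometric mean" line
arithmetic_mean = sum(row_points) / FANGIO_STARTS

print(f"root of summed squares: {root_sum_squares:.7f}")
print(f"divided by starts:      {emailed_mean:.5f}")
print(f"arithmetic mean:        {arithmetic_mean:.3f}")
```

Strictly speaking, the "RMS" line is the root of the summed squares rather than a root mean square, since no division by the number of rows or races happens before the square root.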

I have included an MS Works database which accurately records all drivers' points and top six finishes to the end of 2000, and calculates the arithmetic mean using the above mentioned method for calculating shared drives. I would be more than willing to include the geometric mean of these drivers in the statistics page, and to explain the difference between these two approaches.

Regards,

Darren


---------------------------------------------------------------

Hello There Darren!

>However, there is a flaw in your analysis, in that you assign full points to
>Fangio for races where he only scored half points due to a shared drive. So,
>Fangio's results are as follows:-
>
>                    finishes  points  squared points
>1st (full points)         22     220           48400
>1st (half points)          2      10             100
>2nd (full points)          8      48            2304
>2nd (half points)          2       6              36
>3rd                        1       4              16
>4th (full points)          4      12             144
>4th (half points)          2       3               9
>
>RMS                 225.8517213
>Geometric mean      4.42847

But, you get the gist of my contention: the geometric mean better reflects the true mean than does the arithemtic mean. Fangio's career average finish does not equate to 5.941 points. That's wrong. The arithemetic mean is defective. Kurtosis pulls the mean to the right of the standard normal curve... Way to the right.

True... half points could be taken into account, consistent to F1 policy, but perhaps we not be so quick to blindly apply a similar methodology without question to the equitibility of that policy.

Suppose we apply half-point credit, consistent with f1 policy, for shared drives. Certainly, no one would question whether or not this should be done, in that we simply apply the concept of equity, consistent to f1 policy. However, in the vernacular, the mistake we make in doing so is we validate of what could ultimately be imperfect reward system. To apply half points to Fangio for shared drives, 4.5 points, is to rank order that particular occurrance equilivant to the reward that lies midpoint of second and third place. What we ultimately wind up with is a "cultural" dilemma (I'll need to explain what I mean by this)."

In game theory, we evaluate reward systems from the basis of a conflict matrix, differentiated by fact and value. With a computational problem, fact and value agree, so all we have to do is calculate the solution (computers are terrific for this). Next highest up in ordinality is "legal conflict," were value agrees, but facts do not, so it's a mere matter of verification before we apply a remedial measure (the kid who blew up the building in Oklahoma City perhaps come to mind). With political conflict, facts agree, but value does not. Cultural dilemmas (highest up the conflict matrix), neither fact nor value agrees. Cultural conflict breeds paradox, irony, and contradiction.

For example, let's suppose F1 implements a point reward for pole position and fast lap. We will now find ourselves in a cultural dilemma, in that we must now reconcile whether or not the single point implementation equates equitably, and consistently, without contradiction, to similar outcomes.

Externalities notwithstanding (would implementation of a fast lap point increase the incentive for teams to run ever light fuel loads, softer tyres?), does a point reward to Michael Schumacher for pole position equate to a 6th place finish by Michael Schumacher? Are they the same thing? If so, then does transitive preference logic imply that a 6th place finish for Schumacher is equal to a fast lap point rewarded to Alesi driving his hopeless Acer-Prost? Would a fast lap by Alesi at Monaco have been as astonishing as his 6th place finish? Similarly, does a shared drive equate to a third place finish? A second place finish? Right in the middle? Who can say? Left each to their own devices, two people could not possibly derive an identical reward without collusion and compromise. The result would be arbitrary.

Blindly applying f1 policy without question in generating ordnality is a de facto endoursement of what may be an imperfect, but irreconcilable reward system. You are quite right: Inflation of Fangio's arithmetic or geometric mean was likely at the expense of his many teammates, but the arithmetic mean not only procyclically compounds the contradictions as a function of "cultural" conflict... it breeds opportunity for rendering spurious ordinalities (the Prost-Stewart contradiction comes to mind; perhaps many more).

>I have included a MS Works database which accurately records all drivers
>points and top six finishes to the end of 2000, and calculates the arithmetic mean

The occurrances are not normally distributed. The arithmetic mean is biased. It's defective, even for generating an a priori assumption. The geometric mean better reflects the true mean than does the arithemtic mean. The arithmetic mean yields biased results (Kurtosis: skewed far to the right), and the geometric mean becomes the principle tool we need to apply to generate appropriate a priori assumptions, and an unbiased ordinality, as a function of the many outliers that plauge your cross sectional data trail.


Thanks,
-Sammy



#35 Marcel Schot

Marcel Schot
  • Member

  • 5,459 posts
  • Joined: November 98

Posted 11 June 2001 - 07:34

Originally posted by david_martin
so the geometric mean of 4 and 6 = (4 x 6)^(1/2) = sqrt(24) = 4.899 :)


Which proves 2 things:
1) geometric mean is crap :)
2) there's no way to effectively compare drivers from different periods (ok, this is not directly because of that quote, it just brings the thought back up)

I propose a new definition of statistic:
a. lie consisting of numbers
b. fun way to spend time and always be right in a discussion

:)

Darren: that last post of yours had a geometric mean average of 4.392 syllables per word :stoned:

#36 Barry Boor

Barry Boor
  • Member

  • 11,557 posts
  • Joined: October 00

Posted 11 June 2001 - 18:24

Aaaargh!!!:drunk: :stoned: :confused: ENOUGH ALREADY.

#37 Schummy

Schummy
  • Member

  • 1,027 posts
  • Joined: February 01

Posted 11 June 2001 - 22:32

I really hope that guy is not a graduate! His (her?) email is funny nonsense. Actually it is more fun if you know about statistics! :)

As was said earlier, what he describes is not the geometric mean; he uses a sort of bizarre variety of quadratic mean. Moreover, if you have zeroes in the data you can't use the geometric mean, as it always gives zero! :eek:
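The zero problem is easy to see in a few lines of Python. This is only a sketch: the race results are invented, and the function is the plain product-and-root definition rather than anything used on the website.

```python
import math


def geometric_mean(values):
    """n-th root of the product; a single zero makes the whole product zero."""
    return math.prod(values) ** (1 / len(values))


race_points = [10, 6, 0, 4]  # hypothetical season: win, second, retirement, third

with_zero = geometric_mean(race_points)
without_zero = geometric_mean([p for p in race_points if p > 0])
arithmetic = sum(race_points) / len(race_points)

print(f"geometric mean with the retirement: {with_zero}")
print(f"geometric mean of scoring races:    {without_zero:.3f}")
print(f"arithmetic mean of all races:       {arithmetic}")
```

Since nearly every driver has at least one pointless race, a true geometric mean of per-race points would rank almost everybody level on zero.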

The reference to kurtosis is funny too, in this context, as a way to choose between the arithmetic and geometric mean.

Maybe I've taken it too seriously ;) but anyway I've had good fun :)

BTW, Darren, I'd like to know your calculations. I love NUMBERS, I would prefer numbers to Heidi "Coulthard"! :blush:

#38 oldtimer

oldtimer
  • Member

  • 1,291 posts
  • Joined: October 00

Posted 12 June 2001 - 01:33

Darren, anyone undertaking a statistical analysis who writes 'the geometric mean better reflects the true mean (my emphasis) than does the arithemtic(sic) mean' clearly has a partial understanding of the subject. This partial understanding then opens the door for him/her to take liberties with both statistics and language, as we have seen to our amazement and entertainment.

It seems to me that you are looking for a factor that will allow ranking of performance. Unfortunately, you have used the word 'average', and apart from lighting a ferocious gleam in 'Sammy's' eye, you may have put yourself in Mark Twain's position: "Figures often beguile me, particularly when I have the arranging of them myself." (thank you, Timekeeper). You are engaged in looking for the average of a fruit salad. I suggest you drop the word and the concept.

There is a story about averages. It is about the statistician who drowned in a pool with an average depth of 6 inches.



#39 Wolf

Wolf
  • Member

  • 7,883 posts
  • Joined: June 00

Posted 12 June 2001 - 02:02

Oldtimer- that reminded me of a similar observation, which will prolly lose something in translation. :(

If half of the population eats meat and the other half eats cabbage, then statistically everybody's eating stuffed cabbage rolls (a very popular dish over here).


#40 Roger Clark

Roger Clark
  • Member

  • 7,570 posts
  • Joined: February 00

Posted 12 June 2001 - 05:21

Statistically, what is the probability that the next person who walks into this room will have more than the average number of legs for the population as a whole?

Answer: it is almost certain.

think about it.

#41 Marcel Schot

Marcel Schot
  • Member

  • 5,459 posts
  • Joined: November 98

Posted 13 June 2001 - 08:34

Roger: (999*2+1*1)/1000 = 1.999, which leaves 99.9% of the population with a number bigger than that. Figure out the chances of that happening in any given statistical comparison :)
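The same sum as a short Python sketch, using the assumed population of 999 two-legged people and one one-legged person from the line above:

```python
# Assumed population: 999 people with two legs, 1 person with one leg
legs = [2] * 999 + [1]

mean_legs = sum(legs) / len(legs)
share_above_mean = sum(1 for n in legs if n > mean_legs) / len(legs)

print(f"average number of legs: {mean_legs}")
print(f"share above average:    {share_above_mean:.1%}")
```

A single low outlier drags the mean below what almost everyone has, which is why "above average" can describe nearly the whole population.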

Sticking with the original subject and especially oldtimer's mention of a factor, that's indeed the most probable approach. Yet this will always cause disapproval among some people, simply because there's no single way to go. For the simracing league I run in, I've made overall rankings which combine various classes. Based on previous years, I've developed a formula which in my view shows the best comparison, but I'm fully aware that it could have been done differently, with different results.

A full comparison simply isn't possible, because there are so many factors influencing the results, in real racing even more than in simracing. A simple example is last season. People might say that Michael Schumacher's performance was actually worse than it appears, but that he was lucky enough to have more than the average number of wet races, in which he's regarded as a specialist. So you'd want to include a wet race factor? Was it a full wet or partially? Or even just a drying track or wet qualifying which allowed him to start higher on the grid than he usually would? Even in full wet there are differences. Was Barcelona '96 as wet as Monaco '84? Did the amount of rain influence the results? Does the influence of rain vary at all from track to track? We're looking at a model more complex than those for macro-economic predictions, I fear :)