Fisher & Schwartz Defence Files: Statistical Opinion

March 8, 2016

The second public hearing at the Special Ethics Committee (SEC) of the Israeli Bridge Federation was held on March 8th, 2016. The hearing examined the ‘board theory’: the claim that the bidding tray was handled in a manner that imparted information about the optimal opening lead.

We publish here a document penned by Revaz Jinjikhashvili: “Statistical Opinion”.

I am Revaz Jinjikhashvili. I have analyzed global market data and built statistical algorithms as my daily job for more than five years. I have worked privately, at GH Financials, and now at Trend (R.I.M) Finance as a researcher (see short CV attached).

A.

I will try to examine the evidence presented by Mr. Kit Woolsey, Ishmael Delmonte and some others from a statistical point of view, and find out whether it is sufficient to have ANY statistical value as proof. I do not evaluate the hands from a bridge standpoint. I do not try to prove the innocence or guilt of Lotan Fisher and Ron Schwartz.

I DO NOT receive (and am not willing to receive) any financial or non-financial compensation for this. I do this because I believe that if, now or in the future, statistical evidence is evaluated this way, it is highly likely that we will begin to prosecute the most talented players based on their success. In the current case the careers and good names of two young people are at stake on what seems to me an absolutely insufficient basis.

B.

To find any pattern/code in behavior, whether in market price movements, in finding a cure for an illness, or in this case in illegal signaling, there are certain steps to be taken, and they should be taken VERY CAREFULLY, because it is very easy to get carried away and overfit the theory to the data, or to bend the conclusions from the data to fit the theory. Two primary ways of research have been used throughout history:

(1) Come up with a theory which makes scientific sense and verify it through statistical evaluation;

(2) Gather a huge amount of data under different circumstances and of different types (explained below under the headings) in an attempt to derive a theory from the data.

In our case neither Delmonte nor Woolsey came up with an independent theory (i.e. a theory that was not derived from the same data). Each of them edited the theory while collecting the data (this is called data optimization). So the same data that was used for suggesting the theory was used for proving it, and therefore their theories are not independent.

So they should have built their theory by the second methodology, which requires a huge amount of data. But as can easily be seen, they did not have a huge amount of data, and therefore they did not conduct valid research.

C.

Any research (and mainly research by the second methodology) should consist of three distinct steps:

(1) Sampling data (usually the largest and most accurate set; in medicine, data on animals could be a possible example);

(2) Out-of-sample data. This should be data previously unseen by the researcher. It is smaller in amount, and generally lower accuracy is observed on it, since the theory was fit to the sampling data (in medicine, apes are often the candidates at this stage);

(3) Actual real data or real-time data (in medicine this means humans: placebo vs. possible cure).

In addition, we need as much data as possible to make sure that our measurements are not a statistical fluke (e.g. from the bridge world: if we open a weak 2 with ♥876542 and get a top score twice in a row, it still doesn’t make this opening correct). Last but not least, we need to have a VERY STRICT SUCCESS PATTERN (e.g. in medicine: cured/not cured, or significant improvement/insignificant or no improvement in the patient’s condition).

The presented accusations include four matches of 16 boards each, a grand total of 64, out of which 37 are relevant (on the other boards the opponents are on lead). On seven of them the lead was too fast, and so we are left with 30 boards! Does it seem like enough data?

On eight of those 30 the tray stays in place, and we do not know whether it stays because the lead was fast or because a signal was involved. (I would assume that answering a non-bridge question like “what do I want my partner to lead” doesn’t take less time than knowing what to lead, in the absence of STRICT DECISION MAKING RULES.) Furthermore, in no legal bridge signaling system I know of (e.g. Lavinthal) can we observe a “no preference” signal; what is usually done is that an absurd suit is signaled, one that partner surely knows not to lead. Woolsey often used this excuse when it suited the theory but neglected it when it did not. Thus I believe that not counting these boards in the statistics works in favor of the allegations, since most of them are seemingly wrong anyway! That leaves 22 boards.

Although 22 boards is a very slim amount of data, and at work I would not even try to draw any statistical conclusions from it, here I will still try to measure things with what we have.

D.

As stated above, we need three types of data to draw any conclusions, and we don’t have them. The proof can be found in Kit Woolsey’s own words: “Fisher places the board well over on Schwartz’s side of the table. This was the only signal we have for a spade lead this set. The hypothesis about the spade signal had been something different, but I’m pretty sure that what I am seeing here is the true spade signal. Others can perhaps verify or discredit this by looking at other videos.” If you are allowed to change/edit your theory, the data is considered sampling (theory finding) data! We only have sampling data, which already makes this theory VERY UNRELIABLE. So let’s assume that the very first match published chronologically is the sampling data and try to check it.

So first we have to find a pattern in behavior. In general it could be anything, any random behavioral variable. Even here I will assume that we already know it is board placement only. Since any place could mean any suit, I need to find four different places. Given the virtually endless places on the table and the endless possible rotations and angles, the first four boards (one per suit) are not part of the statistics; they are samples that need to be confirmed statistically. We have 1♠, 4♥, 1♦ and 2♣ alleged signals, eight in total: four of them, one from each suit, serve for measurement purposes, and the other four check the theory. So what are the odds of four out of four lucky shots? 1/4^4 = 1/256, which is about 0.4%. But this means that on a given match I can prove that 1 out of 256 pairs is cheating, even if they are not.

All this is assuming that the match was picked randomly, but what if it was not? What if it was picked out of 15 matches? Then the odds become 1-(1-1/256)^15 = 0.057, which is already above 5%. This means that more than one out of every 20 pairs can be proven to be cheating this way even if they are not.

All of this is also assuming that we have a strict success pattern, meaning we know exactly what is going on in the head of the “signaler”. Kit Woolsey claims that he “knows what they want” without introducing any rule for how he “knows”, other than subjective and not necessarily “bridgistic” logic. What if he is mistaken sometimes? Is it possible that he made one mistake in his judgment? How can we assume that he makes no mistakes in judging “what went on in their heads” at that moment? If he made one mistake, the odds change to (1+(3*4))/4^4 = 13/256, which is a little over 5%. This figure amounts to proving that about one out of every 20 pairs is cheating.

What if we combine this with picking from 15 possible matches (in reality there are probably many more to pick from, e.g. ACBL matches)? In this case the calculation is 1-(1-13/256)^15 = 0.542. This means that the odds are over 54%! In this way we can prove that more than one out of every two pairs is cheating!!! A given pair is more likely to be “cheating” than not! So do we have enough sampling data?
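The arithmetic in this section can be re-checked with a short Python sketch. It assumes, as the text does, that each of the four test boards matches the alleged code by pure chance with probability 1/4, independently of the others, and that the 15 candidate matches are independent too. The function names `prob_at_least` and `prob_any_of` are illustrative only.

```python
from fractions import Fraction
from math import comb


def prob_at_least(hits, boards, p=Fraction(1, 4)):
    """Chance that at least `hits` of `boards` independent guesses match,
    when each guess matches with probability p (binomial upper tail)."""
    return sum(
        comb(boards, k) * p**k * (1 - p) ** (boards - k)
        for k in range(hits, boards + 1)
    )


def prob_any_of(n_matches, p_single):
    """Chance that at least one of n_matches independent matches
    produces the 'incriminating' pattern by luck alone."""
    return 1 - (1 - p_single) ** n_matches


all_four = prob_at_least(4, 4)   # four lucky shots out of four
one_miss = prob_at_least(3, 4)   # allowing one judgment mistake

print(all_four, float(all_four))           # 1/256, about 0.0039
print(one_miss, float(one_miss))           # 13/256, about 0.0508
print(float(prob_any_of(15, all_four)))    # about 0.057
print(float(prob_any_of(15, one_miss)))    # about 0.542
```

Exact fractions are used so that 1/256 and 13/256 come out exactly as written in the text; only the final figures are converted to decimals.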

E.

What about out-of-sample data? We have 12 relevant boards, on four of which Kit is 100% wrong (he himself agrees, or was proven so!). Again no rule is provided for how it was checked! Again a very small data sample, and again subjective opinion. Before going further, think about it! With the subjective opinion of a single person involved, not even a panel, you can prove that virtually any pair is cheating, regardless of whether they really are or not!

Let’s first assume again that Kit Woolsey doesn’t make any mistakes, and let’s ignore the large number of boards in the out-of-sample data with either no signal or no time to signal, which is not at all consistent with the sample data. Still, what are the odds of eight matches out of 12? 3^4*12!/((4^12)*8!*4!) = 3^4*12*11*10*9/((4^12)*24) = 0.0024 (plus the odds of all better results, which are ~0.0004). So it’s only about 0.28%. This is again really low. What if Woolsey made (only) one judgment mistake (note: the mistakes above are not judgment mistakes)? The odds grow: 3^5*12!/((4^12)*7!*5!) = 0.0115 (plus all previous odds, which are ~0.0028). This is already 1.4%, which is about one out of every 70 pairs. What if he makes judgment mistakes at the same rate as assumed in the sample data (note: by the claim of the defenders the rate is much higher!)? 3^6*12!/((4^12)*6!*6!) = 0.04 (plus all previous odds, which are ~0.014). This is already 5.4%. So roughly one out of every 18 honest pairs can be proven guilty this way!
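The cumulative odds quoted here are binomial upper tails, and a short sketch can reproduce them. The assumptions are the ones used in the text: 12 relevant boards, and an honest pair "matching" the alleged code on any single board with probability 1/4, independently. The helper name `upper_tail` is hypothetical.

```python
from math import comb


def upper_tail(min_hits, boards=12, suits=4):
    """Probability of at least `min_hits` chance matches in `boards` boards,
    when each board matches with probability 1/suits."""
    favourable = sum(
        comb(boards, k) * (suits - 1) ** (boards - k)
        for k in range(min_hits, boards + 1)
    )
    return favourable / suits**boards


# 8 of 12: no judgment mistakes; 7 of 12: one mistake; 6 of 12: two mistakes.
for min_hits in (8, 7, 6):
    p = upper_tail(min_hits)
    print(f"at least {min_hits} of 12: {p:.4f} (about 1 in {1 / p:.0f})")
```

Summing the tail directly avoids having to add "all previous odds" by hand at each step.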

What if the set of matches isn’t picked randomly? There were at least four sets available, to the best of my knowledge. Thus the calculation is as follows: 1-(1-0.054)^4 = 0.2. This is 20%! Meaning one out of every five pairs can be proven guilty this way!
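The selection effect can also be checked by simulation rather than by formula. The sketch below is only an illustration under stated assumptions: an honest pair has four available sets of 12 relevant boards, each board matches the alleged code by chance with probability 1/4, the sets are independent, and a set "proves guilt" if at least 6 of its 12 boards match. All names and the trial count are hypothetical choices.

```python
import random
from math import comb

BOARDS, SUITS, SETS, THRESHOLD = 12, 4, 4, 6


def set_looks_guilty(rng):
    """Simulate one set for an honest pair: count the chance matches."""
    hits = sum(rng.randrange(SUITS) == 0 for _ in range(BOARDS))
    return hits >= THRESHOLD


def flagged_fraction(trials=20000, seed=0):
    """Fraction of honest pairs for which at least one of the
    available sets reaches the threshold by luck alone."""
    rng = random.Random(seed)
    flagged = sum(
        any(set_looks_guilty(rng) for _ in range(SETS))
        for _ in range(trials)
    )
    return flagged / trials


# Exact value for comparison: 1 - (1 - p_set)^SETS
p_set = sum(
    comb(BOARDS, k) * (SUITS - 1) ** (BOARDS - k)
    for k in range(THRESHOLD, BOARDS + 1)
) / SUITS**BOARDS
exact = 1 - (1 - p_set) ** SETS
estimate = flagged_fraction()

print(f"exact: {exact:.3f}  simulated: {estimate:.3f}")
```

With these assumptions both figures should land near 0.2, i.e. about one honest pair in five is flagged once the accuser is free to choose the most favourable set.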

We cannot assume it was picked randomly, since all the matches were available! And all of this is assuming that the judgment had been honest throughout, and not clouded by the fact that once we have this “brilliant” theory we want to fit the judgment of the results to the theory even if it doesn’t really fit. Many people have lost HUGE sums of money in trading because of this kind of low-quality checking and small amount of data!

F.

Finally, we don’t have any real data. Real data would be a real-time prediction by Kit of where Lotan or Ron would place the board, made before it was placed. This, by the way, is THE most important part of any statistical algorithm, cure, medicine or anything like that!

Would you want to get a pill that has good odds on rats but has never actually been checked on humans? Would you trust that? If not, I do not see any reason to trust this!

Sincerely,

Revaz Jinjikhashvili

Tel-Aviv, 8/3/2016
