The Standby Problem
by Jonathan Palfrey (1976)
To anyone who runs or rates Diplomacy games, the use of standby players is an infernal nuisance. I propose to discuss here only the rating aspect of the problem, as that's my area of experience. Fundamentally, the problem is that the standby philosophy is not consistent with the compilation of ratings. If you want to assess someone's ability it's obviously ridiculous to go about it by getting other people to stand in for him. The more widespread the use of standbys, the less meaning there can be in any rating system which relies on results from standby-infested games.
If a player drops out or resigns, he has given up his chance to win and should be rated as having lost. This seems quite clear; and the rating of dropouts thus poses no problems whether standbys are used or not. The problems that occur are related to the rating of the other players, who may be affected by and even beaten by the standby player of a country which is supposed to be dead.
Given this anomaly, the proper course of action would be to classify standby-using games as variants and not to rate them together with pure (non-standby-using) games. Personally I am somewhat tempted to do just this; so far I have refrained in the belief that there would be widespread opposition to such a move, but it might be worth taking a vote on the subject sometime. The trouble is that a large number of games would have to be thrown out. Anyway, assuming that these games are to be rated somehow, how is one to go about it?
I classify standby-using games into two categories: affected games, in which standbys appear only as losers, and spoilt games, in which a standby wins or shares in a draw. In the case of affected games I think it is reasonable to ignore the effect of the standby on other players, as being a kind of random hazard. In the case of spoilt games, however, the problem cannot be glossed over in this way. If the original, dropout player of the successful country is to be rated as having lost (as is proper), then the game result as given must be altered in some way for rating purposes.
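The classification above can be sketched in a few lines of Python. This is only an illustration: the Seat record, its field names and the result labels "win", "draw" and "loss" are assumptions made for the example, not part of any existing rating program.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Seat:
    """One country in a finished game (hypothetical record layout)."""
    country: str
    standby_used: bool  # True if a standby took over this country at any time
    result: str         # final result for the country: "win", "draw" or "loss"


def classify_game(seats: List[Seat]) -> str:
    """Return "pure", "affected" or "spoilt" for a finished game."""
    standby_seats = [s for s in seats if s.standby_used]
    if not standby_seats:
        # No standby ever played: an ordinary game.
        return "pure"
    if any(s.result in ("win", "draw") for s in standby_seats):
        # A standby won or shared in a draw.
        return "spoilt"
    # Standbys appear only as losers.
    return "affected"


affected_example = [
    Seat("England", False, "win"),
    Seat("France", True, "loss"),
    Seat("Germany", False, "loss"),
]
spoilt_example = [
    Seat("England", False, "draw"),
    Seat("France", True, "draw"),
    Seat("Germany", False, "loss"),
]
```

Here classify_game(affected_example) gives "affected" and classify_game(spoilt_example) gives "spoilt"; only the latter needs its result altered for rating purposes.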
My method of doing this, which I believe to be fair and in most cases objective, is to rate the game as a draw between all players who didn't drop out at any time, less any player who would clearly have been eliminated even in the absence of standbys (i.e. players who were in fact eliminated before any standby joined the game). If all players are ruled out by these criteria (which is rare but not unheard of), I consider the game to be won or drawn by default by the player or players who were last to drop out.
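The rule for spoilt games can be written out as a short Python sketch. The Player record, the use of integer turn numbers, and the function name are all assumptions made for illustration; the rule itself is the one described above.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Player:
    """An original player in a spoilt game (hypothetical record layout)."""
    name: str
    dropout_turn: Optional[int] = None     # turn on which he dropped out, None if never
    eliminated_turn: Optional[int] = None  # turn on which he was eliminated, None if never


def spoilt_game_draw(players: List[Player], first_standby_turn: int) -> List[str]:
    """Return the names of the players to be rated as sharing the result.

    One name means a win by default; several names mean a draw.
    Turn numbers are assumed to be integers that increase through the game.
    """
    eligible = [
        p.name
        for p in players
        if p.dropout_turn is None  # never dropped out
        and not (
            p.eliminated_turn is not None
            and p.eliminated_turn < first_standby_turn  # dead before any standby joined
        )
    ]
    if eligible:
        return sorted(eligible)

    # Everyone is ruled out: the game goes by default to whoever dropped out last.
    dropouts = [p for p in players if p.dropout_turn is not None]
    if not dropouts:
        return []
    last_turn = max(p.dropout_turn for p in dropouts)
    return sorted(p.name for p in dropouts if p.dropout_turn == last_turn)


normal_case = [
    Player("A"),                     # played throughout
    Player("B", eliminated_turn=3),  # eliminated before the first standby joined
    Player("C", dropout_turn=4),     # dropped out
    Player("D", eliminated_turn=8),  # eliminated after a standby had joined
]
default_case = [
    Player("X", dropout_turn=3),
    Player("Y", dropout_turn=6),
    Player("Z", eliminated_turn=2),
]
```

With the first standby joining on turn 5, spoilt_game_draw(normal_case, 5) rates the game as a draw between A and D. In default_case, with the first standby joining on turn 4, every player is ruled out, so the game is rated as won by default by Y, the last to drop out.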
So much for my method. I have heard of two other methods of rating standby-using games: the American method and the Sharp method.
As I understand it, the American method consists of rating the original, dropout player if the standby loses, but rating the standby if he wins or draws. The consequence of this absurdity is of course that standby players are more overrated the more they concentrate on playing standby positions. A player who put his name down for every advertised standby position would inevitably top the rating list.
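The bias is easy to see with a little arithmetic. The following Python sketch tallies the record a standby specialist would accumulate under the American method as described above; the function name and the sample results are invented for illustration.

```python
from collections import Counter
from typing import Dict, List


def american_standby_record(position_results: List[str]) -> Dict[str, int]:
    """Tally what a standby is credited with under the American method.

    position_results holds the final result ("win", "draw" or "loss") of each
    position the standby took over. Losses are charged to the original
    dropout player, so they never reach the standby's own record.
    """
    credited = Counter(r for r in position_results if r in ("win", "draw"))
    return {"win": credited["win"], "draw": credited["draw"], "loss": 0}


# Invented example: six standby positions, four of them lost.
results = ["loss", "loss", "win", "loss", "draw", "loss"]
record = american_standby_record(results)
```

In this example the standby lost four positions out of six, yet his record shows one win, one draw and no losses. The more positions he takes, the more wins and draws accumulate with nothing to set against them.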
The Sharp method consists of rating the original player according to the standby's performance. The argument for this is that the dropout is unlikely to remain active in the hobby, so it doesn't matter if he's overrated. This is rather an over-simplification: firstly, many active players have dropped out at some time (and some chronic dropouts remain active); secondly, in a rating system using an opposition factor, such as my own new Bayesian system (and the Dolchstoß system itself), if the dropout plays in more than one game his rating will affect the ratings of other people, so it is of some importance to get it right even if there is no interest in it for its own sake.
As you may conclude from the above, the use of standbys not only causes problems but is fundamentally subversive of the whole idea of ratings. I believe that a game should be a competitive contest between specified players, and not a meaningless process enacted by an unspecified number of interchangeable agents.