System developed with Ken Butler, Gary Hatfield and Bob Stagat
URL for this frameset: http://elynah.com/tbrw/tbrw.cgi?2000/kpairwise.2.shtml
In order that the committee can have accurate pairwise comparisons on which to base delicate selection and seeding questions, we propose replacing the Ratings Percentage Index with a rating which more reliably combines won-lost-tied record and strength of schedule, and replacing simple winning percentage with a similar schedule-adjusted measure in the criteria of record vs other teams under consideration, record in recent games, and record vs common opponents. We propose no modification to the fifth criterion, head-to-head results.
Before describing a replacement for the Ratings Percentage Index, it is worthwhile to consider the goals of RPI itself. The RPI, like the rest of the selection criteria, considers only the outcomes (won, lost, or tied) of games between tournament-eligible teams, without regard to details such as home-ice advantage or margin of victory. It seeks to adjust a team's winning percentage by the strength of their schedule. If it were applied to the results of a league playing a full balanced schedule, it would rank the teams in precisely the same order as winning percentage.
Another rating system which has these properties is the Bradley-Terry system, which has been applied over the years to such diverse fields as ranking chess players and modelling taste tests, and is known in college hockey circles under the name Ken's Ratings for American College Hockey (KRACH). The KRACH ratings for a pair of teams can be used to predict the percentage of games a particular team would be expected to win in head-to-head competition. For example, if team A has a KRACH of 300 and team B has a KRACH of 100, team A would be expected to win three times as often as team B head-to-head, i.e., accumulate a Head-to-Head Winning Percentage (HHWP) of .750. The actual KRACH ratings are those which predict an overall winning percentage (average of the HHWPs against each opponent, weighted by the number of games played against that opponent) for each team which is equal to their actual winning percentage.
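The head-to-head prediction and the weighted average described above can be sketched in a few lines of Python. This is only an illustration: the function names `hhwp` and `predicted_overall_pct`, and the sample schedule, are assumptions made for this example and are not part of the proposal.

```python
def hhwp(krach_a, krach_b):
    """Expected head-to-head winning percentage of team A against team B."""
    return krach_a / (krach_a + krach_b)


def predicted_overall_pct(team_krach, opponents):
    """Average of the HHWPs against each opponent, weighted by games played.

    `opponents` is a list of (opponent_krach, games_played) pairs.
    """
    total_games = sum(games for _, games in opponents)
    expected_wins = sum(games * hhwp(team_krach, opp_krach)
                        for opp_krach, games in opponents)
    return expected_wins / total_games


# The example from the text: a KRACH of 300 against a KRACH of 100.
print(hhwp(300, 100))  # 0.75

# Hypothetical schedule: two games against a 100-rated team and
# two games against a 300-rated team.
print(predicted_overall_pct(300, [(100, 2), (300, 2)]))
```

The actual KRACH ratings are the set of values for which `predicted_overall_pct` equals every team's real winning percentage simultaneously.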
In practice, finding the KRACH ratings requires a computer program and mathematical knowledge roughly at the level of an undergraduate degree in the physical sciences. However, verifying that a set of ratings is correct is much simpler, probably more so than calculating the Ratings Percentage Index once the details of excluding head-to-head games from the latter are taken into account. To check that the KRACH ratings are correct, one need only add up each team's expected probability of a win (HHWP) for each game played and verify that the sum matches the total number of wins (with a tie counting as half a win) that the team recorded.
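As a rough illustration of both halves of this paragraph, the sketch below finds ratings by a simple fixed-point iteration and then performs the verification described above. The iteration shown is one standard way of fitting a Bradley-Terry model and is an assumption here, not necessarily the method used to produce the published KRACH figures. The three-team schedule, the function names, and the choice to scale the ratings to an average of 100 are likewise invented for the example.

```python
def expected_wins(team, ratings, schedule):
    """Sum of the team's HHWP over every game it played.

    `schedule` maps an unordered pair (a, b) to the number of games played.
    """
    total = 0.0
    for (a, b), games in schedule.items():
        if team == a:
            total += games * ratings[a] / (ratings[a] + ratings[b])
        elif team == b:
            total += games * ratings[b] / (ratings[a] + ratings[b])
    return total


def fit_krach(schedule, wins, iterations=2000):
    """Iterate K_t = wins_t / sum(games / (K_t + K_opp)) until it settles.

    `wins` counts a tie as half a win. Assumes every team has at least one
    win and one loss and that the schedule connects all teams.
    """
    ratings = {team: 100.0 for team in wins}
    for _ in range(iterations):
        updated = {}
        for team in wins:
            denom = sum(games / (ratings[a] + ratings[b])
                        for (a, b), games in schedule.items()
                        if team in (a, b))
            updated[team] = wins[team] / denom
        # Only ratios matter, so rescale to an (arbitrary) average of 100.
        scale = 100.0 * len(updated) / sum(updated.values())
        ratings = {team: k * scale for team, k in updated.items()}
    return ratings


def ratings_are_correct(ratings, schedule, wins, tol=1e-6):
    """The check from the text: expected wins must match actual wins."""
    return all(abs(expected_wins(team, ratings, schedule) - wins[team]) < tol
               for team in wins)


# Hypothetical unbalanced schedule: A-B four times, A-C and B-C twice each.
schedule = {("A", "B"): 4, ("A", "C"): 2, ("B", "C"): 2}
wins = {"A": 4.0, "B": 2.5, "C": 1.5}  # ties count as half a win

ratings = fit_krach(schedule, wins)
print(ratings_are_correct(ratings, schedule, wins))
```

Note that the verification step needs nothing beyond addition and division, which is the point the paragraph makes.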
Since the KRACH ratings provide a predicted HHWP (head-to-head winning percentage) for any team against any other team, they can be used to calculate the winning percentage a team would accumulate if they played each other team an equal number of times. This expected Round-Robin Winning Percentage (RRWP) will agree with the actual winning percentage if a team has actually played a full round robin schedule.
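A minimal sketch of the RRWP calculation follows, assuming a dictionary of ratings keyed by team name. The function name `rrwp` and the three sample ratings are made up for illustration.

```python
def rrwp(team, ratings):
    """Expected winning percentage if `team` played every other team
    in `ratings` an equal number of times."""
    others = [opp for opp in ratings if opp != team]
    total = sum(ratings[team] / (ratings[team] + ratings[opp])
                for opp in others)
    return total / len(others)


# Hypothetical three-team field.
sample = {"A": 300.0, "B": 100.0, "C": 100.0}
print(rrwp("A", sample))  # 0.75
print(rrwp("B", sample))  # 0.375
```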
There are of course other numerical ranking systems for college hockey, but all of the others in common use either consider additional factors not used in RPI and KRACH (e.g., CHODR, CCHP, and Massey), or they do not reproduce the familiar ranking by won-lost-tied record when applied to a balanced schedule (e.g., HEAL and RHEAL). KRACH is thus the most natural replacement for RPI, as it shares the same design goals. However, it achieves those goals more successfully; this can be seen via theoretical arguments (in brief, the nonlinearity of the system--the ratings themselves are used to judge the strength of a team's opposition--allows it to make more use of the most significant results), and also in practice by its response to the MAAC situation the past two years. The highest-rated MAAC team, Quinnipiac, was 11th out of 54 teams according to RPI this season, but only 44th according to KRACH. This is consistent with the argument that while Quinnipiac accumulated a 20-5-3 overall record, neither they nor any other MAAC team defeated a member of an established conference in nine games this year. The eight tournament-eligible members of the MAAC went 1-2 against Niagara and 4-13-2 against Army and Air Force, the other tournament-eligible members of CHA.
While the RPI does not always adjust for strength of schedule in an accurate way, three of the other selection criteria do not consider the strength of a team's opponents at all. A team's performance in its last 16 games is judged only by the won-lost-tied record it amassed in those games, even if another team may have achieved only a slightly worse record against substantially tougher competition. Similarly, the record against other teams under consideration does not account for which TUCs appeared on a team's schedule. Even record against common opponents doesn't consider the fact that two teams may each play their common opponents different numbers of times.
All three criteria take a subset of a team's games and look at the team's winning percentage in those games. We propose modifying this to adjust for the strength of the opposition in those games. That strength can be measured using each opponent's ordinary KRACH rating. Given the set of games related to a particular criterion, and the KRACH ratings of each of the opponents, we can associate a record in those games with a hypothetical KRACH rating and vice versa. For instance, suppose a team played four games against teams under consideration, two of them against team A, with a KRACH of 300, and two against team B, with a KRACH of 100. A hypothetical team with a KRACH of 300 would be expected to win half of its games against team A and three-quarters of its games against team B (three times the KRACH means 3-to-1 odds), for a total of 2 x 1/2 + 2 x 3/4 = 2 1/2 wins. A hypothetical KRACH of 300 would therefore correspond to a won-lost-tied record of 2-1-1 (or 1-0-3) in those four games. On the other hand, a record of 1-2-1 (or 0-1-3) in the same four games would be associated with a rating of 100, since such a team would have an even chance of winning each game against team B and only a one-in-four chance against team A, for 2 x 1/4 + 2 x 1/2 = 1 1/2 expected wins.
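The arithmetic in this example can be checked with a short sketch. The function name `expected_wins` and the list-of-pairs layout are assumptions made for the illustration.

```python
def expected_wins(hypothetical_krach, opponents):
    """Expected wins for a team with the given rating.

    `opponents` is a list of (opponent_krach, games_played) pairs.
    """
    return sum(games * hypothetical_krach / (hypothetical_krach + opp_krach)
               for opp_krach, games in opponents)


# Two games each against team A (KRACH 300) and team B (KRACH 100).
tuc_games = [(300, 2), (100, 2)]

print(expected_wins(300, tuc_games))  # 2.5, i.e. a record such as 2-1-1
print(expected_wins(100, tuc_games))  # 1.5, i.e. a record such as 1-2-1
```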
Rather than looking at just the won-lost-tied record for each team in the games related to a criterion, we find the hypothetical KRACH rating the team would have needed to achieve that record and use that for the comparison. So suppose team X's only games against teams under consideration were two each with teams A (KRACH 300) and B (KRACH 100) and they went 2-1-1; as described above, that would correspond to a criterion rating of 300. Suppose further that team Y also played two games each against two TUCs, team C and team D, each with a KRACH of 450, splitting the four games. A team that wins exactly half its games against equally rated opponents must have the same rating as those opponents, so team Y's criterion rating is 450. While the selection criteria now in use would give team X the edge in games against teams under consideration, for having a 2-1-1 record to team Y's 2-2, the proposed modification would take into consideration the fact that the TUCs team Y played were stronger and award them the criterion for having a higher criterion rating, 450 to 300. Note that the ordinary KRACH ratings of team X and team Y, based on all their respective games, are never used in the calculation of the criterion rating.
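Going from a record back to a criterion rating means solving for the rating whose expected wins equal the actual wins. One possible way to do this, shown purely as an assumption and not as the proposal's prescribed procedure, is bisection, since expected wins always rise as the rating rises. The function names, the search bounds, and the handling of winless and unbeaten records are all choices made for this sketch.

```python
import math


def expected_wins(rating, opponents):
    """Expected wins against a list of (opponent_krach, games) pairs."""
    return sum(games * rating / (rating + opp_krach)
               for opp_krach, games in opponents)


def criterion_rating(wins, opponents):
    """Rating whose expected wins match `wins` (a tie counts as half a win)."""
    games = sum(g for _, g in opponents)
    if wins <= 0:
        return 0.0        # winless: no positive rating is low enough
    if wins >= games:
        return math.inf   # unbeaten and untied: no finite rating is high enough
    low, high = 1e-6, 1e12  # assumed search bounds, far outside realistic ratings
    for _ in range(200):
        mid = math.sqrt(low * high)  # bisect on a logarithmic scale
        if expected_wins(mid, opponents) < wins:
            low = mid
        else:
            high = mid
    return math.sqrt(low * high)


# Team X: 2-1-1 (2.5 wins) against teams rated 300 and 100.
rating_x = criterion_rating(2.5, [(300, 2), (100, 2)])
# Team Y: 2-2 against two teams rated 450.
rating_y = criterion_rating(2.0, [(450, 2), (450, 2)])

print(round(rating_x), round(rating_y))  # 300 450
print(rating_y > rating_x)               # True: team Y wins the criterion
```

Only the opponents' ordinary ratings enter the calculation, which matches the closing note of the paragraph.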
If two teams have played all the same opponents the same number of times in games relevant to a criterion, the team with the higher criterion rating will also be the one with the better winning percentage in those games.