Analysis of the National Residency Matching Program


This is a mathematical analysis of the algorithm used by the National Residency Matching Program to assign prospective medical residents to hospitals. It was adapted from a lecture by Steven Rudich at Carnegie Mellon. The algorithm is described on the official web site, but this gives a slightly different characterization that is more convenient for analysis. It then proves some properties of the algorithm that may be reassuring to some students entering the match.

The Matching Algorithm

For simplicity, assume that each student ranks all of the hospitals and each hospital ranks all of the students. Each hospital has a certain number of positions they would like to fill. Assume the sum of these is at least as large as the total number of students. Given these assumptions, each student can be assigned to some hospital. Otherwise, some students may be unmatched, which just makes the analysis a bit more complicated, but its actually fairly simple to deal with, as I explain below.

You can think of the matching process as occurring over a series of days. It's all simulated on a computer, but in effect this is what happens: Each morning the students go to the highest hospital on their list that has not been eliminated on an earlier day. On the first day, each student will go to their top choice. Each hospital sorts the students who show up according to the hospital's preference.

If a hospital has 5 slots, then the top 5 students are told, "You can come back tomorrow." This is the hospital's candidate set for the day. Those students will definitely show up again the next day because that hospital remains at the top of their list.

The other students are told, "You can never go here." They must cross the hospital off the top of their list. The next day they will go to the next-most preferred hospital. This continues until the day on which no student gets shafted. At that point, wherever the students are is their final hospital.

Properties of the Algorithm

The first thing we can prove is that the algorithm is guaranteed to stop. This is pretty clear. Each student begins with a list containing a finite number of hospitals. Each day, at least one student gets shafted, so the total number of entries on all the lists goes down by at least one. The maximum number of days the algorithm could last is the number of students times the number of hospitals.

The next thing to prove is that a hospital's candidate set can only remain the same or improve as time goes on. That is to say, the worst student in the candidate set on one day will not be better than the worst student in the set on a later day. This is because any student in the candidate set one day will definitely return on the next day. The only reason they will ever be booted from the candidate set is if a better student shows up. So the set can't get worse.

Next we need to introduce some more terminology. A pairing is an assignment of a hospital to each student. A pairing is unstable when student X is assigned hospital A but prefers hospital B while hospital B prefers student X to some other student, Y, that was given hospital B. A pairing is stable when there is no such situation.

We can prove that the matching algorithm produces a stable pairing. Let's assume that this is not true and that an unstable pairing results, such that student X is assigned hospital A but prefers hospital B and B prefers X but was assigned Y. Since X prefers A to B, X would have visited A before visiting B. Since X didn't end up at A, X must have been booted from A at some point. But somehow, A ended up with student Y who is worse than student X. This violates the theorem that A's candidate set cannot get worse. Therefore, we have a contradiction so the assumption must be false. Thus, the final pairing is stable.

Incidentally, since the algorithm always produces a pairing and the pairing is always stable, we can conclude that given any set of preference lists, there always exists at least one stable pairing. Now, there may be many stable pairings. For each student, we can define the possible hospitals as the hospitals the student is matched with in at least one of the stable pairings. A student's optimal hospital is the one they prefer the most of their possible hospitals. The pessimal hospital is the one they prefer the least. Similarly, each hospital has a set of possible students and an optimal and pessimal student.

A matching algorithm is said to be student-optimal if every student is always matched to their optimal hospital. This is the best of all possible stable matchings for every student. We will prove that this algorithm is student-optimal.

If it does not produce a student-optimal pairing, then at some point a student must have been rejected by his optimal hospital (since the student works down his list by preference). Assume X is the first such student and A is X's preferred hospital. Let S be the set of students that A chose over X because A prefers all students in S to X. Since X was the first student to be rejected by his optimal, no student in S has been rejected by her optimal. Therefore, the optimal for each student in S is either A or worse than A (as far as that student is concerned). We conclude that no student in S can ever be matched with a hospital better than A.

Now, because A is X's optimal hospital, it must be in X's set of possible hospitals. Therefore, there must exist at least one stable pairing that does match X and A. But in that pairing, there must be at least one student from S who is not paired with A (because X replaced her). That student, Y, must be paired with a hospital worse than A because of what we just concluded. Therefore, Y and A form an unstable pair. Y prefers A to its hospital and A prefers Y to X. Therefore, the pairing is unstable, which is a contradiction.

Thus, the assumption that there can be a student who is rejected by his optimal hospital was false. As a result, the pairing is student-optimal. Incidentally, the original centralized matching system was started in 1950, but the algorithm did not always produce stable pairings. In 1952, they switched to a hospital-optimal version of the algorithm. In January 1998 the algorithm changed again, apparently to the current student-optimal form.

One can also prove that the pairing produced by the algorithm is hospital-pessimal, meaning that a hospital is always assigned the students from its set of possible students that it least wants. I won't go into the proof in depth. But for those who want to play along at home: start with a student-optimal pairing, assume that there exists a stable pairing in which a hospital gets a worse student than they did in the first pairing, we can show that the second pairing has an instability, therefore the assumption was false.

Strengthening the Model

To make the analysis easier, we made the assumptions that every student ranks every hospital and vice-versa and that there are enough spots for all the students. In reality, of course, this is not the case.

However, we can make reality fit the idealized model if we assume that there exists one extra hospital. This is known as the You-Want-Fries-With-That Center For Cardiovascular Research, or YWFWTCCR. It goes on a students' preference list just below the last hospital that they actually listed and above all hospitals they failed to list. The nice thing is that YWFWTCCR has unlimited spots. So if your top choices don't work out, you're still guaranteed a job. And maybe even your very own uniform and visor!

Conclusions


Douglas Rohde
Last modified: Mon Mar 15 17:53:29 EST 1999