Recently, I procrastinated by trying to build a dead-simple “age”-structured model of the N. American tenure track (TT) ecology faculty job market. “Age” is in scare quotes because it’s not structured by chronological age of job seekers. Rather, “age” in the model is years post-PhD.
I did this because I was curious if I could use the model to infer anything interesting about the job market that’s not directly observable in the available data. It turned out I could–but not in the way I originally anticipated. I hoped I’d be able to fit the model to the observed data and then infer other, unobservable features of the job market from the model parameters. But actually, the model couldn’t fit all the data, which revealed an important feature of the job market that I’d omitted.
So here’s my model, and the data I tried to fit it to. See if you can spot the crucial omission from the model before I reveal it. 🙂
The model
It’s a discrete time model with annual timesteps. Every year, 1000 people who might seek TT ecology faculty positions in N. America get PhDs in ecology. That number is pretty arbitrary, but its exact value doesn’t much matter for our purposes. Those people are the age 0 people.
People of every age can do one of three things: they can leave the faculty job market forever without obtaining a faculty job, they can leave the faculty job market forever by obtaining a faculty job, or they can remain on the faculty job market for another year, “aging” a year in the process. Each of those three possibilities has some associated age-specific probability, and those probabilities sum to 1 for people of a given age. You can think of the age 0 people who leave the faculty job market as people who decide never to enter the faculty job market in the first place. (Aside: yes, some people with TT jobs do re-enter the TT job market, but they’re sufficiently rare that to a first approximation we can ignore them.)
The maximum age is 11 years post-PhD. Every age 11 person who doesn’t obtain a faculty job leaves the market forever. I picked 11 as the maximum age because my dataset doesn’t include any recently hired TT ecology faculty who were >11 years post-PhD at the time of hiring.
It’s a deterministic model. For instance, I just multiplied the number of job seekers of age 0 by the age 0 probability of leaving the job market to determine the number of age 0 job market leavers, even though that resulted in a fractional number of age 0 job market leavers.
The age-specific probabilities of obtaining a faculty job, and of remaining on the job market for another year, are the parameters that need to be chosen so as to generate outputs that match the observed data. If you pick parameter values and simulate the model for many years, you eventually reach a stable age distribution that produces some stable number of new hires out of each age class each year. Those are the model outputs that we can compare to observed data.
Note that you need to interpret the model parameters carefully. For instance, the model does not assume, or imply, that faculty search committees select directly on age! Rather, variation in age-specific hiring probabilities from one age to the next is the model’s way of implicitly accounting for everything that might differ between applicants of different ages. Same goes for age-specific probabilities of staying on the faculty job market–they’re a way of implicitly accounting for lots of things.
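Here’s a minimal sketch of the model as described above, in Python with NumPy. The names (`simulate`, `p_hire`, `p_stay`) and the example parameter values are made up for illustration; they are not the code or the fitted values behind this post.

```python
import numpy as np

N_AGES = 12       # age classes 0..11 years post-PhD, as in the post
NEW_PHDS = 1000   # new PhDs entering at age 0 each year, as in the post

def simulate(p_hire, p_stay, years=50):
    """Iterate the model; return (job seekers by age, annual hires by age)."""
    p_hire = np.asarray(p_hire, dtype=float)
    p_stay = np.asarray(p_stay, dtype=float)
    seekers = np.zeros(N_AGES)
    seekers[0] = NEW_PHDS
    for _ in range(years):
        stayers = seekers * p_stay   # deterministic: fractional people are allowed
        nxt = np.empty(N_AGES)
        nxt[0] = NEW_PHDS            # a fresh cohort arrives every year
        nxt[1:] = stayers[:-1]       # stayers "age" by one year
        seekers = nxt                # anyone staying past age 11 is dropped
    return seekers, seekers * p_hire

# Made-up parameters for illustration only, not fitted values.
p_hire = np.full(N_AGES, 0.1)
p_stay = np.full(N_AGES, 0.5)
p_stay[-1] = 0.0                     # nobody stays on the market past age 11
seekers, hires = simulate(p_hire, p_stay)
print(seekers[:3], hires.sum())
```

Because the inflow is constant and nobody re-enters, the stable age distribution in this sketch is reached after 11 timesteps: the number of age-a job seekers is just 1000 times the product of the stay probabilities for ages 0 through a-1. The probability of leaving is implied as 1 minus the other two.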
The data we want to fit
We know from my various data compilations, and other sources, that:
- Somewhere between 200ish and 300ish ecologists are hired as TT asst. professors in N. America every year.
- The average ecology postdoc is between 4 and 5 years post-PhD. Let’s take that as a rough estimate of the average “age” of an ecology faculty job seeker.
- There’s a characteristic distribution of “ages” of recently-hired TT ecology faculty at the time they were hired. The mean is between 3 and 4 years post-PhD, but anything from 2-6 years is fairly common, and anything from 0-11 is observed. The annual number of hires of each age is a datum we’d like the model to reproduce.
- In any given year, about 42% of ecology faculty job seekers receive at least one offer. Let’s assume that everyone who receives at least one offer accepts an offer. That’s not literally true but it’s probably not too far off.
- At least 33% of ecology postdocs eventually obtain a TT job. The fraction of ecology faculty job seekers who eventually obtain a TT job must be at least a bit higher than that, because some ecology postdocs never seek an ecology faculty position. If we interpret people who left the faculty job market at age 0 as non-faculty job seekers, and everybody else as a faculty job seeker, then at least 33% of faculty job seekers need to eventually obtain a faculty job.
Those are the data we want the model to reproduce when it reaches a stable age distribution.
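To make the comparison concrete, here’s a rough sketch of how those five quantities could be computed from a stable age distribution. The function name `summarize` and the dictionary keys are hypothetical. It also bakes in one reading of the text: age 0 people who leave immediately are treated as never having been on the faculty job market, so they’re excluded from the pool of job seekers.

```python
import numpy as np

def summarize(n, p_hire, p_stay):
    """Summary statistics at the stable age distribution.

    n[a]      : number of people of age a at the start of a year
    p_hire[a] : probability an age-a person is hired that year
    p_stay[a] : probability an age-a person stays on the market another year
    """
    n = np.asarray(n, dtype=float)
    p_hire = np.asarray(p_hire, dtype=float)
    p_stay = np.asarray(p_stay, dtype=float)
    ages = np.arange(len(n))
    hires = n * p_hire

    # ASSUMPTION: age-0 leavers never sought a faculty job, so drop them.
    market = n.copy()
    market[0] = n[0] * (p_hire[0] + p_stay[0])

    return {
        "annual_hires": hires.sum(),
        "mean_seeker_age": (ages * market).sum() / market.sum(),
        "mean_hire_age": (ages * hires).sum() / hires.sum(),
        # Everyone with an offer is assumed to accept, so offers = hires.
        "offer_rate": hires.sum() / market.sum(),
        # At steady state, one year's hires across all ages equal the
        # lifetime hires of a single entering cohort.
        "ever_hired": hires.sum() / market[0],
    }

# Tiny three-age toy example with made-up numbers.
stats = summarize(n=[1000, 500, 250],
                  p_hire=[0.0, 0.2, 0.4],
                  p_stay=[0.5, 0.5, 0.0])
print(stats)
```

In the toy example, 200 people are hired per year out of 1250 job seekers, so the annual offer rate is 16%, even though 40% of the 500 people who enter the market each year eventually get hired.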
Model fitting
I brute-forced the model fitting, because this is only a blog post and if you think I’m teaching myself approximate Bayesian computation just for a blog post you don’t know me very well. 🙂 I randomly generated hundreds of thousands of parameter sets from uniform distributions, constrained so that the age-specific probabilities of various fates summed to 1, and compared the outputs to the observed data.
Then I realized that this was dumb, and that I needed to impose some additional assumptions to restrict the search to plausible regions of parameter space. Those assumptions seemed plausible to me, but I freely admit they’re based on nothing but gut instinct. I assumed that at least 25% of age 0 ecology PhD holders leave the faculty job market. I assumed that age-specific probabilities of getting hired can’t differ massively from one age to the next (say, by more than 20 percentage points in either direction from age X to X+1). I assumed that at least 50% of age 0 and age 1 faculty job seekers who don’t get hired stay on the job market for another year. And I assumed that, after age 1, age-specific probabilities of leaving the job market can’t differ massively from one year to the next. Then I randomly generated hundreds of thousands of parameter sets satisfying those constraints and checked whether any reproduced the data well.
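For anyone who wants to try this, here’s one way to generate parameter sets satisfying those constraints. It’s a sketch, not necessarily how the search behind this post was coded: rather than drawing uniformly and rejecting, it builds each set age by age so that the constraints hold by construction, which means the draws aren’t uniform over the allowed region. The 20-point limit on hiring probabilities comes from the text above. `LEAVE_STEP` is an assumption, since no number is given for leaving probabilities, and the age 11 leaving probability is exempted from the smoothness rule because it’s forced to be 1 minus the hiring probability.

```python
import numpy as np

N_AGES = 12        # ages 0..11 years post-PhD
HIRE_STEP = 0.20   # max change in hiring probability between adjacent ages
LEAVE_STEP = 0.20  # ASSUMPTION: same limit reused for leaving probabilities

def draw_parameter_set(rng):
    """Return (p_hire, p_stay, p_leave), each of length N_AGES."""
    p_hire = np.empty(N_AGES)
    p_leave = np.empty(N_AGES)

    # Age 0: at least 25% leave AND at least half of the non-hired stay.
    # Together these force p_hire[0] <= 0.5.
    p_hire[0] = rng.uniform(0.0, 0.5)
    p_leave[0] = rng.uniform(0.25, 0.5 * (1.0 - p_hire[0]))

    for a in range(1, N_AGES):
        prev = p_hire[a - 1]
        p_hire[a] = rng.uniform(max(0.0, prev - HIRE_STEP),
                                min(1.0, prev + HIRE_STEP))
        room = 1.0 - p_hire[a]           # probability left for stay + leave
        if a == 1:
            # At least half of the non-hired stay, so at most half leave.
            p_leave[a] = rng.uniform(0.0, 0.5 * room)
        elif a == 2:
            p_leave[a] = rng.uniform(0.0, room)
        elif a < N_AGES - 1:
            lo = max(0.0, p_leave[a - 1] - LEAVE_STEP)
            hi = max(lo, min(room, p_leave[a - 1] + LEAVE_STEP))
            p_leave[a] = rng.uniform(lo, hi)
        else:
            p_leave[a] = room            # age 11: everyone not hired leaves

    # Clip guards against tiny negative values from floating-point rounding.
    p_stay = np.clip(1.0 - p_hire - p_leave, 0.0, 1.0)
    return p_hire, p_stay, p_leave

rng = np.random.default_rng(0)
p_hire, p_stay, p_leave = draw_parameter_set(rng)
```

Each draw would then be fed to the model and scored against the data, keeping the best-scoring sets.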
Results and discussion
The best fitting parameter sets reproduced almost every feature of the data quite well (we’ll come back to that “almost” in a sec…). You can reproduce the annual number of new hires, the mean postdoc age, the full age distribution of new hires, and the fraction of job seekers who eventually obtain a TT job with parameter sets that say the probability of remaining on the job market stays pretty high from year 1 on, that the age-specific probability of getting hired is very low at ages 0-1, and that age-specific hiring probability peaks somewhere around ages 3-5. But you can’t really estimate the age-specific probabilities beyond age 7 or so with any precision. The model mostly just chalks up the rarity of hires of age >7 to rarity of job seekers of age >7.
But you shouldn’t trust any of those results. Because no parameter set–not even one that reproduces other features of the data badly!–comes anywhere close to reproducing the observation that 42% of job seekers get at least one offer in any given year. Every parameter set misses very low on this.
Did you already guess why? Ok, obviously I don’t know for sure why. But I’m pretty sure I know*: my model falsely assumes that everybody who ever goes on the ecology faculty job market does so immediately upon obtaining a PhD. In reality, it’s common for people to wait one or more years post-PhD before starting to apply for faculty positions, presumably because they don’t want to bother applying before they’re “competitive” (and for other reasons too, I’m sure). Or perhaps I should say, I infer that it’s common, because it nicely explains why the best-fitting parameter sets drastically underestimate the proportion of faculty job seekers who get an offer in any given year. My model is counting many people who aren’t yet on the job market as people who are on the job market but didn’t get an offer.
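The arithmetic behind that dilution is simple. Here’s a toy illustration with entirely hypothetical headcounts (only the 42% figure comes from the data): if the model lumps people who are waiting in with people who are actually applying, the apparent offer rate drops in proportion.

```python
def apparent_offer_rate(true_rate, applying, waiting):
    """Offer rate the model would see if it counts people who are still
    waiting to apply as unsuccessful job seekers."""
    offers = true_rate * applying
    return offers / (applying + waiting)

# Hypothetical headcounts: 500 people actually applying, 500 still waiting.
rate = apparent_offer_rate(0.42, applying=500, waiting=500)
print(f"{rate:.2f}")   # prints 0.21
```

So a pool in which half the people haven’t started applying yet would halve the apparent offer rate, which is the direction of the miss.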
So I (tentatively) learned something here. Apparently it’s quite common–indeed, much more common than I had thought–for ecology PhD holders who will eventually go on the academic job market to wait a while before doing so.**
In principle, we could build a model that allows people to wait before going on the faculty job market. But that model would have too many free parameters to estimate with the available data (really, there are already too many). Many different parameter sets would be about equally consistent with the data.
This is a nice little illustration of the use of quantitative models as a sanity check. It also illustrates the value of a quantitative model that predicts many features of the data, so that when the model is way off you can identify the source of the error and correct it.
*I was totally stumped about it for weeks before it suddenly hit me. Feel free to make fun of me for not realizing it from the get-go.
**Whether they should do that is another question, of course, but it’s one that these data and this model can’t answer. In large part because the answer would be very person-specific; everybody has their own unique circumstances and desires.