Determinism Is Not Reproducibility: Stop Using Random Seed 42

Many people believe that fixing a seed at 42 is a good practice for guaranteeing experimental reproducibility. Oh dear ...
But Don't Panic! Let us explain some concepts first.
To begin with, 42 is the “Answer to the Ultimate Question of Life, the Universe, and Everything” in the classic The Hitchhiker's Guide to the Galaxy. It is a great book that I really recommend reading.
The number itself is not the problem. The problem is fixing a seed in an experiment. You can find it in many tutorials: a line with something like random_state=42. Even worse, you can find many “gurus” on the internet claiming that it is a good practice to guarantee “reproducibility”.
It does not guarantee reproducibility at all. It may guarantee determinism, but that is a completely different concept.
To make the subject clear, let us use an example from outside computer science: we want to compute the average salary of the population of some country. Since it is usually infeasible to interview every citizen, we can instead draw a sample from the population and interview only those people. The problem is: how do we define this sample?
One simple idea is to randomly select, say, 2048 citizens and ask them their salaries. The idea is that we took many people at random, and they can be a good enough proxy for the salary distribution of the entire population. If this is true, then if we repeat the process (i.e., repeat the experiment) and take another 2048 citizens at random, we expect that the average salary will be approximately the same as before. This is a reproducible result.
But what does fixing the seed have to do with this experiment? Well, in this example, a fixed seed would mean that every time we sample the population, we get exactly the same 2048 people. It means that every time we run the experiment, we get the exact same average salary result or, in other words, the experiment becomes deterministic.
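To make this concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the population is synthetic (log-normal salaries with made-up parameters), and the helper names make_population and sample_mean are mine.

```python
import random
import statistics

def make_population(size=100_000, seed=0):
    """Synthetic salaries (log-normal, made-up parameters).

    The seed is fixed here only so that every call describes the
    same hypothetical country.
    """
    rng = random.Random(seed)
    return [rng.lognormvariate(10.0, 0.5) for _ in range(size)]

def sample_mean(population, n, seed=None):
    """Average salary of n citizens drawn without replacement."""
    rng = random.Random(seed)
    return statistics.fmean(rng.sample(population, n))

population = make_population()
true_mean = statistics.fmean(population)

# Different seeds: different people, similar (not identical) averages.
varied = [sample_mean(population, 2048, seed=s) for s in range(5)]

# Fixed seed: the same 2048 people every time, hence the same number.
fixed_a = sample_mean(population, 2048, seed=42)
fixed_b = sample_mean(population, 2048, seed=42)

print("population mean:", round(true_mean))
print("different seeds:", [round(m) for m in varied])
print("seed 42, twice: ", round(fixed_a), round(fixed_b))
```

The runs with different seeds disagree slightly but all land near true_mean, while the two fixed-seed runs agree to the last digit.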
At first glance, this sounds good. But it is not.
If, purely by chance, the 2048 selected people include only the richest people in the country (Elon Musk, Bill Gates, Warren Buffett, …), the experiment will lead us to a highly biased result. If another person tries to estimate the country's average salary by repeating our process with another random sample of 2048 people, this person may get a very different result.
If you fix the seed and ask other experimenters to use the exact same set of people, you are just telling them to reproduce the same biased result that you got, which is not the point! The point was to estimate the population's average salary, not to get the same wrong result over and over again.
Going back to random_state=42 in computer science, the problem is that many algorithms, especially machine learning ones, rely on random initializations of variables. If you want to propose an algorithm that converges to the correct answer regardless of the initial parameters, you should experiment with different datasets and multiple random seeds. If you fix the seed instead, you are only showing that your algorithm reaches the result for one very specific initialization, which may be particularly well behaved purely by chance.
I have seen many people, including Ph.D. students, fixing seeds and running the experiments many times to check whether the experiment is reproducible. What do they expect? A deterministic experiment reaching different results? What are they actually showing with this?
Fix seeds only when you really want to reach the very same result when you run an algorithm (and I will call it “running an algorithm” instead of “running an experiment” because, again, experiments must be reproducible). I do not see many applications for this, but one is when you are debugging an algorithm. If you know more legitimate applications, let me know.
If you really want to evaluate robustness, run experiments with multiple seeds and report the average and standard deviation of the results.
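As a rough sketch of what that can look like, the toy example below runs gradient descent on a small non-convex function from a random starting point, once per seed, and then summarizes the results. The objective, the helper run_once, and the choice of 10 seeds are all hypothetical; substitute your own training or evaluation routine.

```python
import random
import statistics

def loss(x):
    # Toy non-convex objective with one global and one local minimum.
    return x**4 - 3 * x**2 + x

def grad(x):
    # Derivative of loss(x).
    return 4 * x**3 - 6 * x + 1

def run_once(seed, steps=1000, lr=0.01):
    """Gradient descent from a seed-dependent random starting point."""
    rng = random.Random(seed)
    x = rng.uniform(-2.0, 2.0)
    for _ in range(steps):
        x -= lr * grad(x)
    return loss(x)

# Record the seeds you used so others can rerun any single run.
seeds = list(range(10))
results = [run_once(s) for s in seeds]

mean = statistics.fmean(results)
std = statistics.stdev(results)  # sample standard deviation
print(f"final loss over {len(seeds)} seeds: {mean:.3f} +/- {std:.3f}")
```

Depending on the starting point, a run ends in either the global or the local minimum, so the spread across seeds tells you something a single seed never could. Each individual run is still deterministic given its seed, which is exactly what you want when you need to debug one of them.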