Treating the health problems of computer systems is sometimes not very different from treating those of humans.
You have probably experienced a health problem yourself (or you know someone who has) that would not go away and for which the doctor could not find a cause. It may have been serious, or it may have been one of those annoying little things that seem to come and go.
Computer systems sometimes have the same problems.
Recently we upgraded our Oracle databases, and as a consequence one of the systems that had run for a long time without issues began to stop working at regular intervals. We knew that the problem was database related, but what was it exactly? Analysing the log files did not help much. We also found hanging locks in the database, but how did they get there? The database itself did not reveal many of its secrets. We found one hint in some blogs relating to foreign key indexes and created a few of the missing indexes. We asked Oracle whether this specific behaviour could have been caused by the missing indexes, but we got no clear answer.
We suspected that the cause was something like the missing foreign key indexes. We had probably done something that the older releases of the database tolerated but the new one does not.
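For readers who want to run the same check on their own schema, here is a minimal sketch of the idea in Python. It is not the procedure we actually followed. It assumes the commonly described rule that a foreign key counts as indexed when its columns are the leading columns of some index on the child table, in any order. The class names, table names and constraint names are all hypothetical. In practice the input would be read from data dictionary views such as USER_CONS_COLUMNS and USER_IND_COLUMNS.

```python
from typing import List, NamedTuple, Tuple


class ForeignKey(NamedTuple):
    table: str                 # child table that holds the foreign key
    name: str                  # constraint name
    columns: Tuple[str, ...]   # foreign key columns


class Index(NamedTuple):
    table: str                 # table the index is defined on
    name: str                  # index name
    columns: Tuple[str, ...]   # indexed columns, in index order


def is_covered(fk: ForeignKey, indexes: List[Index]) -> bool:
    """Return True if some index on the same table starts with the FK columns.

    Assumption: the order of the FK columns within the leading part of the
    index does not matter, only that they all appear there.
    """
    needed = set(fk.columns)
    for idx in indexes:
        if idx.table != fk.table:
            continue
        leading = idx.columns[:len(fk.columns)]
        if len(leading) == len(fk.columns) and set(leading) == needed:
            return True
    return False


def unindexed_foreign_keys(fks: List[ForeignKey],
                           indexes: List[Index]) -> List[ForeignKey]:
    """List the foreign keys that no index covers."""
    return [fk for fk in fks if not is_covered(fk, indexes)]


# Hypothetical sample metadata, for illustration only.
sample_fks = [
    ForeignKey("ORDER_LINES", "FK_LINES_ORDER", ("ORDER_ID",)),
    ForeignKey("ORDER_LINES", "FK_LINES_PRODUCT", ("PRODUCT_ID",)),
]
sample_indexes = [
    Index("ORDER_LINES", "IX_LINES_ORDER", ("ORDER_ID", "LINE_NO")),
]

missing = unindexed_foreign_keys(sample_fks, sample_indexes)
for fk in missing:
    print(f"{fk.table}.{fk.name} has no supporting index on {fk.columns}")
```

A list like this only tells you which indexes are candidates to create. It does not prove that a missing index caused the locks.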
Going back to the old release is not really a good option, but it could become necessary if we cannot find a solution.
The difficulty with this type of problem is that it can depend on user behaviour. At a certain moment people need to carry out a certain series of tasks, and if a few of them do certain things in parallel, the problem can occur. But it may take weeks or months before that combination happens again. This is not very different from finding out whether a certain health symptom is caused by a food allergy.
But at least then you are dealing with only one person. In our situation it is difficult to go back to all the users and ask them exactly what they did and exactly when they did it.
Our problem has not occurred for a little while now, and we just hope that it was caused by the missing foreign key indexes. Otherwise we can expect it to come back and bite us again.