This morning, shortly after waking up, I got a text that a site was down and others were having problems all weekend. It took me minutes to diagnose the problem and minutes more to fix the downed site, but much of the day to deal with the repercussions. Somewhat stressful.
The problem involved a change to a Doctrine ORM entity we deployed Friday while forgetting to clear Symfony‘s cache. Apparently, Doctrine’s discriminator maps, used in an inheritance strategy, are stored in Symfony’s cache, but which entities are children of a parent are not, and get checked even if they aren’t being used. And of course, throws a fatal error if there is a mismatch. Granted, these sites are on an older version, so it may work differently in a newer one.
Our deployment system is fairly simplistic, and our testing of success is pretty manual: Just load a few sites and make sure they appear to be working, maybe check some logs. Obviously, that can miss things. I’m thinking of making a script that will loop through all sites on a given deployment version and do some quick gut-check tests to try to ensure that at least certain basic functionality is working. It would be relatively quick so that we actually do it.
Data was lost in some user transactions. I had already added an error listener to log submitted form data when errors occur, due to previous incidents, but data stored in sessions wasn’t being logged. I’ve already started working on logging more to hopefully capture everything we need when errors do occur.