https://us.forums.blizzard.com/en/d2r/t/diablo-ii-resurrected-outages-an-explanation-how-we%E2%80%99ve-been-working-on-it-and-how-we%E2%80%99re-moving-forward/28164blizz finally coming out and talking about outages and stuffIn staying true to the original game, we kept a lot of legacy code. However, one legacy service in particular is struggling to keep up with modern player behavior.
This service, with some upgrades from the original, handles critical pieces of game functionality, namely game creation/joining, updating/reading/filtering game lists, verifying game server health, and reading characters from the database to ensure your character can participate in whatever it is you’re filtering for. Importantly, this service is a singleton, which means we can only run one instance of it in order to ensure all players are seeing the most up-to-date and correct game list at all times. We did optimize this service in many ways to conform to more modern technology, but as we previously mentioned, a lot of our issues stem from game creation.
A note about progress loss:
The progress loss some players have experienced is due to the way we do character locks both in the regional and global databases–we lock your character in the global database when you are assigned to a region (for example, when you play in the US region, your character is locked to the US region, and most actions are resolved in the US region’s database.)
The problem was that during a server outage, when the database was falling over, a number of characters were becoming stuck in the regional database, and we had no way of moving them over to the global database. At that time, we believed we had two options: we either unlock everyone with unsaved changes in the global database, therefore losing some progress due to an overwrite that would occur in the global database, or we bring the game down entirely for an indeterminate amount of time and run a script to write the regional data to the global database.
At the time, we acted on the former: we felt it was more important to keep the game up so people could play, rather than take the game down for a long period of time to restore the data. We are deeply sorry to any players who lost important progress or valuable items. As players ourselves, we know the sting of a rollback, and feel it deeply.
Moving forward, we believe we have a way to restore characters that doesn’t lead to any significant data loss–it should be limited to several minutes of loss, if any, in the event of a server crash.
This is better, but still not good enough in our eyes.
What we are doing about it:
Rate limiting: We are limiting the number of operations to the database around creating and joining games, and we know this is being felt by a lot of you. For example, for those of you doing Pindleskin runs, you’ll be in and out of a game and creating a new one within 20 seconds. In this case, you will be rate limited at a point. When this occurs, the error message will say there is an issue communicating with game servers: this is not an indicator that game servers are down in this particular instance, it just means you have been rate limited to reduce load temporarily on the database, in the interest of keeping the game running. We can assure you this is just mitigation for now–we do not see this as a long-term fix.
so basically they come out ans say the servers suck our engineers cant get the job done we dont know wtf we are doing but were figuring it out
that sound super promising im very confident they are going to fix the servers /end sarcasm
This post was edited by dragoneth on Oct 15 2021 06:48am