Posted 5 years ago by Steve Mathias
iSigner is currently being demonstrated through a client of iCommunicator, and it is also used on a site under that third party's control, which we do not have direct access to but must still support.
It was discovered that, on one page in particular, iSigner would not function, or would function inconsistently. The exact nature of the issue varied not only by browser but also, seemingly, at random. Sometimes the video player would be available. Sometimes it wouldn't. Sometimes videos would be signed. Sometimes they wouldn't. Sometimes clicking the button to bring up the player worked. Sometimes it didn't. Things appeared to be a mess, and one of the biggest frustrations for our developers, not to mention the client, was that there didn't appear to be any consistency in exactly what was happening.
Troubleshooting / Gathering Data
The first thing we had to do was catalog the variety of states that occurred, and the environments in which each state could be reproduced. When code seems to behave “randomly”, there is almost certainly a race condition: two or more operations, such as scripts loading asynchronously, finish in an unpredictable order, and the outcome depends on which one finishes first. Race conditions are tricky because by their very nature they appear random, which makes them hard to reproduce. However, cataloging as many of the observed states as possible helps us identify where race conditions may occur, and gives some insight into how we might resolve them.
Review the other scripts on the page. Other scripts on the page can change how our scripts behave. This should not happen in an ideal world, but third-party scripts sometimes conflict with ours by redefining the same variable or function, or by reloading a library we depend on.
Examine all browser workflows. When we support a range of legacy browsers in addition to current ones, it is often necessary to implement multiple paths through the same code. Some browsers lack features we would otherwise require, such as AJAX support, canvas, or native HTML5 video playback. When we have to build workarounds for these, it is important to gather as much information as we can about every path the code can take. Flowcharting the paths is often helpful if the interactions are particularly complicated. Once we know all the potential paths, we need to understand which path each browser will take, based on the features available to it. http://caniuse.com is a great resource for what features are available in a variety of browsers.
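To make that catalog repeatable, it can help to capture the same facts every time a state is observed. The sketch below is a hypothetical helper, not part of iSigner: `collectEnvironmentReport`, `chooseTransport`, and `jQueryWasReplaced` are illustrative names, the `win`/`doc` parameters stand in for `window` and `document` so it can also be exercised outside a browser, and the rule "no cross-origin XHR means use the iframe path" is an assumption about how such a plugin might branch. It relies on common feature checks (`canPlayType` for native video, `getContext` for canvas, `withCredentials` for cross-origin XHR) and on jQuery's own version string, `jQuery.fn.jquery`.

```javascript
// Hypothetical diagnostic helper; names and the branching rule are assumptions.
// `win` and `doc` stand in for `window` and `document`.
function collectEnvironmentReport(win, doc) {
  var jq = win.jQuery;
  var video = doc.createElement('video');
  var canvas = doc.createElement('canvas');
  var xhr = win.XMLHttpRequest ? new win.XMLHttpRequest() : null;

  return {
    // jQuery exposes its version string as jQuery.fn.jquery.
    jQueryVersion: jq && jq.fn ? jq.fn.jquery : null,
    // Keep the instance itself so a later reload can be detected by identity.
    jQueryInstance: jq || null,
    // Browsers with native HTML5 video expose canPlayType on <video>.
    hasNativeVideo: !!video.canPlayType,
    // Browsers with canvas expose getContext on <canvas>.
    hasCanvas: !!(canvas.getContext && canvas.getContext('2d')),
    // XHR objects that support cross-origin requests have withCredentials.
    hasCorsXhr: !!xhr && 'withCredentials' in xhr
  };
}

// Assumed rule: without cross-origin XHR, fall back to the iframe path.
function chooseTransport(report) {
  return report.hasCorsXhr ? 'ajax' : 'iframe';
}

// True if something replaced window.jQuery since the report was taken.
function jQueryWasReplaced(report, win) {
  return report.jQueryInstance !== (win.jQuery || null);
}

// In a browser, for example:
//   var report = collectEnvironmentReport(window, document);
//   console.log(report, chooseTransport(report));
//   // ...later, after the page's other scripts have run:
//   console.log(jQueryWasReplaced(report, window));
```

Logging one report per observed state, per browser, turns "sometimes it works" into a table that can be compared row by row.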
Generating a Hypothesis
After gathering the data, we formed the hypothesis that the issue had multiple causes, and in fact involved multiple race conditions. We narrowed it down to the following:
jQuery was being reloaded partway through the page by a client script. This is not uncommon, since client scripts may include their dependencies the same way our script does. However, reloading replaces the global jQuery object, which discards any plugins attached to the previous instance; in our case, that removed our video library.
Parts of our code were executing before everything they used was in place. This was an internal race condition: depending on timing, the code sometimes had everything it needed and sometimes did not.
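The first of these failure modes can be reproduced outside a browser. The simulation below does not use real jQuery: `loadFakeJQueryInto` is a hypothetical stand-in for a `<script>` include, `signVideo` is a placeholder plugin name, and the version strings are only examples. It does capture the relevant behavior: every include of the library builds a new object and assigns it to the global, so plugins attached to the previous object are no longer reachable through `window.jQuery`.

```javascript
// Simulation only: this is NOT jQuery. Each call mimics a fresh <script>
// include, which builds a new library object and assigns it to the global.
function loadFakeJQueryInto(win, version) {
  function fakeJQuery() {}
  fakeJQuery.fn = { jquery: version };
  win.jQuery = win.$ = fakeJQuery;
  return fakeJQuery;
}

var page = {}; // stands in for `window`

// 1. Our include, then our plugin attaches itself (placeholder name).
loadFakeJQueryInto(page, '1.8.3');
page.jQuery.fn.signVideo = function () { return 'signed'; };

// 2. A client script later includes jQuery again.
loadFakeJQueryInto(page, '1.11.3');

// 3. The global now points at the new object, which never had the plugin.
console.log(page.jQuery.fn.jquery);           // 1.11.3
console.log(typeof page.jQuery.fn.signVideo); // undefined
```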
Implementing a Solution
Creating a new closure by wrapping the entire plugin in a self-calling function.
When we created the new scope, we also created a local variable named “jQuery” and set it to the value of window.jQuery at that moment. The window scope is still accessible inside the function, but the local variable shadows window.jQuery, so we now hold a reference to a version of jQuery that stays the same even if the one at the window level changes.
Localizing jQuery reference
We now keep a reference to the version of jQuery that the video library has attached itself to. This means that, if a new version of jQuery is loaded after the library was attached to the original, we still have the necessary jQuery instance referenced. Because the reference is local to the plugin's closure, every use of jQuery inside the closure gets the localized version, while any code outside the closure uses whatever version is at the window scope. This lets us use the version of jQuery we intend without restricting the client's ability to manage their site.
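Taken together, the closure and the localized reference look roughly like the following. This is a minimal sketch rather than iSigner's actual source: `reloadGlobalFakeJQuery` is a stand-in for a `<script>` include (not real jQuery), `signVideo` and `videoPlugin` are placeholders, and `globalThis` is used in place of `window` so the sketch also runs outside a browser. The `var jQuery = ...` line inside the immediately invoked function is the localized reference; a common equivalent idiom is to pass the instance in as a parameter, as in `(function (jQuery) { ... })(window.jQuery)`.

```javascript
// Stand-in for including jQuery via <script>: builds a new object each time
// and assigns it to the global. Hypothetical; not the real library.
function reloadGlobalFakeJQuery(version) {
  function fakeJQuery() {}
  fakeJQuery.fn = { jquery: version };
  globalThis.jQuery = fakeJQuery;
  return fakeJQuery;
}

reloadGlobalFakeJQuery('1.8.3'); // the jQuery our plugin attaches to

var videoPlugin = (function () {
  // Local variable shadows the global jQuery for all code in this closure,
  // and keeps pointing at the instance captured here.
  var jQuery = globalThis.jQuery;

  jQuery.fn.signVideo = function () { return 'signed'; }; // placeholder plugin

  return {
    sign: function () { return jQuery.fn.signVideo(); },
    jQueryVersion: function () { return jQuery.fn.jquery; }
  };
})();

reloadGlobalFakeJQuery('1.11.3'); // a client script reloads jQuery later

console.log(videoPlugin.sign());          // signed
console.log(videoPlugin.jQueryVersion()); // 1.8.3
console.log(globalThis.jQuery.fn.jquery); // 1.11.3 (the client's copy)
```

Code outside the closure, including the client's own scripts, sees the newer global, so the client stays free to load whichever jQuery they need.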
Setting window-scoped variables that need to be accessible outside the closure.
In browsers where we use an iframe rather than AJAX to fetch data from the client's server (namely, Internet Explorer 8), the frame calls back to a window-scoped function when it has the response data. Because the iframe's code is not inside the plugin's closure (the plugin creates the frame, but the frame is written into the window and can only see window-level scope), we had to explicitly expose the closure's function at the window level, rather than keep it as a local variable, so that the frame could call it.
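A minimal sketch of that exposure follows, assuming the frame's script is allowed to reach its parent window (for example, because it is same-origin). `pluginFrameCallback` is a hypothetical global name and `frameTransport` a hypothetical wrapper; the actual iframe creation is omitted, and `globalThis` stands in for `window`. Only the single callback is published globally; the pending-request state stays private to the closure.

```javascript
var frameTransport = (function (win) {
  var pending = null; // callback waiting for the frame's response (private)

  function handleFrameResponse(data) {
    var onResponse = pending;
    pending = null;
    if (onResponse) { onResponse(data); }
  }

  // Deliberately global: code in the frame cannot see this closure's locals,
  // so the one function it needs is published on the window object.
  win.pluginFrameCallback = handleFrameResponse;

  return {
    request: function (onResponse) {
      pending = onResponse;
      // In the browser, the hidden iframe pointing at the client's server
      // would be created here (omitted in this sketch).
    }
  };
})(globalThis); // pass `window` in a browser

// Simulate the frame. In a real iframe this call would be
// window.parent.pluginFrameCallback({ status: 'ok' }).
var received = null;
frameTransport.request(function (data) { received = data; });
globalThis.pluginFrameCallback({ status: 'ok' });
console.log(received.status); // ok
```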
Reducing and eliminating race conditions
There are a variety of ways to resolve race conditions. Ideally, the architecture would not allow them to happen at all. This plugin is complicated because one potential race condition is unavoidable: “Has our dynamically included video library script finished loading yet?” A script tag created programmatically, as ours must be, loads asynchronously: while it is being requested and loaded, the rest of our script keeps running. There is no real way around this, so the usual resolution is to use an event or a flag that records whether loading has completed, and to wait for that signal before using the library.
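One way to implement that event-or-flag approach is a small "ready gate": a flag that records whether the script has loaded, plus a queue of work waiting for it. The sketch below is illustrative, not iSigner's code; `createReadyGate` is a hypothetical name and the script URL is a placeholder. The browser wiring is shown in comments: `onload` fires in modern browsers, while older Internet Explorer versions report script loading through `onreadystatechange` with a `readyState` of `"loaded"` or `"complete"`, so both are hooked and `markReady` is written to be safe to call twice.

```javascript
// Hypothetical helper: a flag plus a queue of callbacks waiting on it.
function createReadyGate() {
  var isReady = false;
  var waiting = [];

  return {
    // Call when the script has loaded. Safe to call more than once.
    markReady: function () {
      if (isReady) { return; }
      isReady = true;
      var queued = waiting;
      waiting = [];
      for (var i = 0; i < queued.length; i++) { queued[i](); }
    },
    // Run fn now if already loaded, otherwise queue it for markReady().
    whenReady: function (fn) {
      if (isReady) { fn(); } else { waiting.push(fn); }
    },
    isReady: function () { return isReady; }
  };
}

// Browser wiring (URL is a placeholder):
//   var gate = createReadyGate();
//   var script = document.createElement('script');
//   script.src = 'https://example.com/video-library.js';
//   script.onload = gate.markReady;
//   script.onreadystatechange = function () {
//     if (this.readyState === 'loaded' || this.readyState === 'complete') {
//       gate.markReady();
//     }
//   };
//   document.getElementsByTagName('head')[0].appendChild(script);

// Simulated load, runnable anywhere:
var gate = createReadyGate();
var log = [];
gate.whenReady(function () { log.push('queued before load'); });
gate.markReady(); // the "script finished loading" signal
gate.whenReady(function () { log.push('ran immediately'); });
console.log(log.join(' | ')); // queued before load | ran immediately
```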
Another race condition inherent to this plugin is video integration. When we interact with another player plugin, a straightforward linear flow cannot guarantee that the player's content has loaded before our code tries to interact with it. In this case, resolving the race requires a flag of some sort, plus something like a recurring check to watch for it. We use a simple if condition that keeps checking whether the work we expect is done yet. If it is not, we try again later; if it is, we set the flag and continue processing.
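The recurring check can be sketched as a chain of `setTimeout` calls. `pollUntil`, its option names, and the default interval and attempt limit are illustrative assumptions; the attempt limit in particular is an addition, so that a player that never loads does not leave the page polling forever. Chaining `setTimeout` rather than using `setInterval` means each check is only scheduled after the previous one has finished.

```javascript
// Hypothetical helper: check isDone() now, then every intervalMs, until it
// returns true (run onDone) or maxAttempts checks fail (run onTimeout).
function pollUntil(isDone, onDone, options) {
  options = options || {};
  var intervalMs = options.intervalMs || 100;
  var maxAttempts = options.maxAttempts || 50;
  var onTimeout = options.onTimeout || function () {};
  var attempts = 0;

  function check() {
    attempts++;
    if (isDone()) { onDone(); return; }
    if (attempts >= maxAttempts) { onTimeout(); return; }
    setTimeout(check, intervalMs); // not done yet: try again later
  }

  check();
}

// Usage with a simulated third-party player that becomes ready after 250 ms.
var player = { ready: false };
setTimeout(function () { player.ready = true; }, 250);

var playerReady = false; // our flag
pollUntil(
  function () { return player.ready; },
  function () { playerReady = true; console.log('player ready'); },
  { intervalMs: 50, maxAttempts: 20, onTimeout: function () { console.log('gave up'); } }
);
```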
Confirming the Solution
During the implementation, each step was checked, and then rechecked as new steps and solutions were added. This required testing across all supported browsers. We also needed to test natively in older browsers, because compatibility modes and emulated environments are not always 100% accurate. While time-consuming, this is a critical part of the implementation when dealing with race conditions or highly complex scripts, as there may be hidden dependencies that would otherwise go unnoticed. It is also possible that resolving one race condition exposes another that had previously been masked, so confirmation becomes an iterative effort.
Once the implementation works in our controlled environments, the next step is to go through our normal code review and deployment process and release it into the wild. At that point, it is still critical to fully test every production environment, once again, in every supported browser. This confirms that no unexpected changes in any of the production environments have introduced new problems before we notify the client of our success.