Posted 5 years ago by Caitlin Macatee
Improving traffic to any website is a constantly evolving process. Among the best and most obvious ways to increase traffic is to make the website more prominent on search engines like Google and Bing. Each engine has its own method of ranking results for any given search, but both agree that a better user experience should influence the order in which websites appear in the list of returned results.
Additionally, both search engines (and their users) agree that being greeted with a “login or sign up” message after clicking on a search result is a bad experience, and it can even be considered “cloaking,” where the search engine concludes that the website is deliberately showing robots something different from what it shows actual users.
This can be a security concern for search engine users. As such, both engines will drop a page’s ranking or apply a tag to the result indicating to the user that the site requires a subscription (and if the website seems exceptionally suspicious, they will remove it from the index outright).
Having a page’s ranking dropped impacts how well users can find and use that page, so websites that use a membership system to control access to their content, whether for profit or for a consistent experience, face a unique problem: either their content must be made freely available, or it will be much harder to find.
Introducing First Click Free
To address the scenario described above, the search engines introduced a concept known as “First Click Free,” which attempts to reconcile gated content with good search ranking.
The basic premise is to allow ungated access, at least once per day, to the first page a user lands on when that user arrives from a search engine. Bing requires a minimum of five such visits per day, while Google requires three. Provided a website abides by this, the search engine will not reduce the ranking of its search results, making it easier to maintain the website’s visibility without sacrificing its membership system.
While the policy may seem simple and straightforward, the official wording used to describe it is vague, and the technical implementation ends up being more complex, leaving a lot to be desired from the webmaster’s point of view.
Is it worth implementing First Click Free?
Before delving into the implementation concerns and approaches, let’s look at a few considerations to think about when weighing the cost/benefit of the implementation details.
First, should all pages bypass the login security for your website in order to maintain visibility? Some websites providing for-pay content may wish to institute this policy only for particular pages. Since search engines index based on your sitemap, it’s possible to direct the robot to certain portions of your site and allow free access there, maintaining website visibility without giving away content.
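As a rough sketch of that idea, a site could expose only its ungated sections to crawlers via robots.txt. The paths and domain below are hypothetical stand-ins:

```
# Hypothetical layout: gated articles live under /premium/,
# free content under /articles/. Crawlers only see the latter.
User-agent: *
Disallow: /premium/

Sitemap: https://www.example.com/sitemap.xml
```

Note that robots.txt keeps pages out of the crawl, which is a different (and blunter) tool than First Click Free; it fits the case where the gated content doesn’t need search visibility at all.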
Next, how busy is the website, and how much time can the server spend verifying users? Since this policy forces the website to authenticate anonymous users, absolute security is unlikely; however, a substantial amount of abuse can be prevented by adding components to the verification process, such as storing user information on the server and referring to it when an anonymous request is received. Unfortunately, these measures all add time to each request and load on the server, and do not strictly comply with the “spirit of the policy” (to quote Google’s blog).
Finally, how heavily does your website rely on caching? An inherently difficult process, caching a gated website becomes that much more complicated when adding additional paths to verification (such as First Click Free). With the answers in mind, we can proceed with some of the implementation details.
Implementing First Click Free
Implementing a First Click Free policy means finding a way to safely authenticate anonymous users every time they arrive from a search engine. This can be tricky: how do we know that a user came from a search engine, and how do we limit the number of clicks per day while minimizing how easy it is to abuse the system and gain unlimited free clicks? Unfortunately, neither problem can be readily solved.
When attempting to identify whether or not a user has come from a search engine, the only recourse is to check the HTTP referrer sent along with the request. While generally sufficient, privacy concerns have grown enough that more and more users are cloaking or blanking the referrer field. The referrer is also an exploitable entry point, since users can set it to appear as if they came from a search engine. It is due to this irreconcilable exploit that Google and Bing updated their policies to limit the number of clicks required per day. However, if we decide to limit the number of times a user can access content for free, we now have to authenticate an anonymous user.
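A minimal referrer check might look like the following sketch. The set of search engine hosts is an assumption; a real list would need to cover regional Google and Bing domains:

```python
from typing import Optional
from urllib.parse import urlparse

# Hypothetical allow-list of referrer hosts we treat as search engines.
SEARCH_ENGINE_HOSTS = {"www.google.com", "www.bing.com"}

def came_from_search_engine(referrer: Optional[str]) -> bool:
    """Return True if the HTTP Referer header points at a known search engine.

    The header is user-controlled and often blank, so this is a
    best-effort signal, not proof -- exactly the exploit described above.
    """
    if not referrer:
        return False
    host = urlparse(referrer).netloc.lower()
    return host in SEARCH_ENGINE_HOSTS
```

A blank or spoofed header falls straight through this check, which is why it has to be paired with a per-day click limit.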
Here, we have a couple of approaches to take, each with advantages and drawbacks.
The cookie approach
First, we can set a cookie. This has the advantage of ensuring that as many individual users as possible receive their free first clicks, following most closely the spirit of the policy. We can inspect the cookie to verify it hasn’t been used too many times, and since each browser, for each user, on each machine, on any network (shared or otherwise) will have a unique cookie, we can be fairly certain that very few people will ever fail to get all of the expected clicks for free. However, we’re relying on data storage that is controlled by the user.
Again, privacy concerns play a part: many users elect not to allow cookies, which disables the system entirely and raises the possibility that the search engine will decide the policy is not implemented on the site. This results in penalties to the search result and reduces the quality of the user’s experience. And since the storage is controlled by the user, it can be tampered with: the cookie can be deleted, with no way for the system to distinguish this from a user who hasn’t yet used their first free click, or edited to allow more clicks.
This is the least secure, but most compliant, method. Depending on whether we’ve allowed the search engine to index the entire website or only portions with low-to-no security concerns (i.e., we just want to know more about how users interact with the content, as opposed to protecting paid-for content), this may be a complete non-issue. If we’re able to draw sufficient traffic with content we don’t mind being exploited, this is a complete win.
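A sketch of the cookie approach, assuming a hypothetical cookie that stores the current day and the number of free clicks used as JSON (the cookie name, format, and the limit of three are assumptions, not part of either engine’s policy text):

```python
import json
from datetime import date

FREE_CLICKS_PER_DAY = 3  # Google's stated minimum

def count_free_click(cookie_value):
    """Decide whether to ungate this request based on the raw cookie value.

    Returns (allowed, new_cookie_value). Everything here is
    client-controlled, so a deleted or edited cookie simply looks
    like a fresh visitor -- the exploit described above.
    """
    today = date.today().isoformat()
    try:
        state = json.loads(cookie_value)
        if not isinstance(state, dict) or state.get("day") != today:
            state = {"day": today, "used": 0}
    except (TypeError, ValueError):
        # Missing or malformed cookie: treat as a fresh visitor.
        state = {"day": today, "used": 0}
    if state["used"] < FREE_CLICKS_PER_DAY:
        state["used"] += 1
        return True, json.dumps(state)
    return False, json.dumps(state)
```

The day field resets the counter automatically at midnight, which matches the per-day wording of the policy without needing any server-side storage.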
The server-side approach
If we do have deeper security concerns and want to head off this level of exploit, there are a few steps we can take, at the cost of less consistent compliance with the spirit of the policy. Unfortunately, neither Google’s nor Bing’s policy is explicit or well defined, so this may come with the stipulation that it isn’t a practical solution, as it would inhibit some users from receiving the benefit of the policy.
All is not lost, though: we can approach this problem with a server-side solution. With minimal impact, we can set up an IP table in the website database that explicitly tracks users by IP address and monitors the number of free views each has received.
This technically has an impact on the site’s performance, but it’s minimal: IPv4 and IPv6 addresses can both be reasonably indexed in SQL (or reliably converted to a long, or a pair of longs respectively, for databases that cannot index them directly), and the click-count column has no reason to be anything but an integer.
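The conversion mentioned above can be sketched with the standard library’s `ipaddress` module; an IPv4 address fits in a single integer, while an IPv6 address is split into a pair of 64-bit halves:

```python
import ipaddress

def ip_to_ints(ip_str):
    """Convert an IP address string to integer(s) for indexed DB columns.

    IPv4 fits in one 32-bit value; IPv6 (128 bits) is returned as a
    pair of 64-bit halves for databases without 128-bit integers.
    """
    ip = ipaddress.ip_address(ip_str)
    n = int(ip)
    if ip.version == 4:
        return (n,)
    return (n >> 64, n & ((1 << 64) - 1))
```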
Excessively complex installations, such as a WordPress site with several plugins whose instantiation and startup are expensive, may begin to feel the difference relatively quickly. Exceptionally busy websites may collect a sizeable IP table in a short time span, eventually slowing click-count retrieval. This can be mitigated by storing a “death time” for each entry, 24 hours after the initial click, and cleaning the table on a rotating cycle.
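Putting the table, the count, and the expiry together, here is a minimal sketch using an in-memory SQLite database as a stand-in for the site’s real database (the table name, schema, and three-click limit are all assumptions):

```python
import sqlite3
import time

FREE_CLICKS_PER_DAY = 3
TTL_SECONDS = 24 * 60 * 60  # entries "die" 24 hours after the first click

def make_db():
    db = sqlite3.connect(":memory:")
    db.execute("""CREATE TABLE free_clicks (
        ip TEXT PRIMARY KEY,
        clicks INTEGER NOT NULL,
        expires_at INTEGER NOT NULL)""")
    return db

def record_free_click(db, ip, now=None):
    """Count a free click for this IP; return True if it should be ungated."""
    now = int(time.time()) if now is None else now
    # Rotating cleanup: purge expired rows before consulting the table.
    db.execute("DELETE FROM free_clicks WHERE expires_at <= ?", (now,))
    row = db.execute("SELECT clicks FROM free_clicks WHERE ip = ?",
                     (ip,)).fetchone()
    if row is None:
        db.execute("INSERT INTO free_clicks VALUES (?, 1, ?)",
                   (ip, now + TTL_SECONDS))
        return True
    if row[0] < FREE_CLICKS_PER_DAY:
        db.execute("UPDATE free_clicks SET clicks = clicks + 1 WHERE ip = ?",
                   (ip,))
        return True
    return False
```

In production the cleanup would run on a schedule rather than on every request, but the shape of the logic is the same.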
Storing an IP table introduces a complication for any shared network that sits behind a single external IP address. Since the system cannot distinguish between those users, the entire router becomes restricted to the collective number of free clicks allowed. On the other hand, users are unable to alter the database, so they cannot circumvent their allotted clicks without using a proxy or moving to another network, neither of which can be reasonably protected against.
Given the bevy of webmaster and security concerns, implementing this policy should always come with a very frank discussion about what the content owners expect, and whether they can produce content that is freely available, and so doesn’t need to pass through a gate at all, or content that could be exploited without being considered a loss. This is, and should be understood as, an investment in how well the website can be found.
That said, it provides a fantastic service to consumers and another way to attract those who might have skipped on by.