Amazon has shut down Alexa.com. While it may not be immediately obvious, the decision to kill off the popular web traffic analysis and website ranking service does have some impact on the cybersecurity industry.
When accessing alexa.com, users are now greeted by an end of service notice that says the site was retired on May 1, 2022.
Alexa was founded in 1996 and acquired by Amazon in 1999. Amazon announced its decision to retire the service in December 2021. The Alexa Top Sites and Web Information Service APIs will be retired on December 15, 2022. Amazon has not explained why it shut down Alexa, saying only that it was a “difficult decision.”
One of the most popular Alexa services was “Top Sites,” which provided free lists of websites ordered by Alexa traffic rank.
Many in the cybersecurity industry have used the Alexa Top 1 Million list, for instance to analyze the security practices and posture of the world’s most popular websites, and to create lists of sites that can be trusted.
DomainTools, a company that provides intelligence based on domain and DNS data, has provided the Alexa rank of a website to customers in an effort to help them determine if a certain site should be blocked.
The logic is that a domain that ranks high in the Alexa list is likely popular, so blocking it for users within an organization could cause problems.
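The rank check described above can be illustrated with a minimal Python sketch. It assumes the popularity list is a CSV of `rank,domain` rows with no header; the function names, the sample data, and the 10,000 cutoff are hypothetical and are not taken from DomainTools or Alexa.

```python
import csv
import io


def load_ranks(csv_text):
    """Parse 'rank,domain' rows (assumed format) into a dict of domain -> rank."""
    ranks = {}
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) != 2:
            continue  # skip blank or malformed rows
        rank, domain = row
        ranks[domain.strip().lower()] = int(rank)
    return ranks


def should_review_before_blocking(domain, ranks, threshold=10_000):
    """Return True if the domain ranks within the (hypothetical) popularity cutoff."""
    rank = ranks.get(domain.lower())
    return rank is not None and rank <= threshold


# Illustrative sample data, not real rankings.
SAMPLE = "1,example.com\n2,example.org\n50000,example.net\n"
ranks = load_ranks(SAMPLE)

for candidate in ("example.com", "example.net", "unknown.test"):
    print(candidate, should_review_before_blocking(candidate, ranks))
```

A popular domain flagged this way is only a signal for a human or a secondary check, since a high rank does not by itself mean a site is safe.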
DomainTools said in late April that it would be generating its own list. The company concluded that the best approach would be to combine four types of data: domains that users request in their browsers; domains requested by a user’s system in DNS (tracked by Cisco Umbrella); domains requested across an entire organization’s DNS (tracked by Farsight Security, which DomainTools has acquired); and domains scored by their connections to each other.
This data will be collected from Cisco Umbrella, Farsight, Netcraft (top 100 data collected by its browser plugin), and Majestic Million, which provides a free list of the top 1 million websites based on the number of referring subnets.
The data will be combined into a single list using an averaging methodology named Tranco, which a team of researchers from KU Leuven, the Delft University of Technology, and the Grenoble Alps University described in 2019.
Victor Le Pochat of the imec-DistriNet research group at KU Leuven, one of the researchers involved in the Tranco project, told SecurityWeek that, in the short term, security teams will need to identify any dependencies on the Alexa list. Certain processes could break due to their inability to fetch the list.
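As a rough starting point for finding such dependencies, a team could search its scripts and configuration for references to Alexa. The sketch below is only one way to do that: the search patterns (`alexa.com` and the `alexa-static` download path) and the function name are assumptions, and real environments may reference the list in other ways.

```python
import re
import tempfile
from pathlib import Path

# Assumed patterns; extend these to match how your own tooling fetched the list.
ALEXA_PATTERN = re.compile(r"alexa\.com|alexa-static", re.IGNORECASE)


def find_alexa_references(root):
    """Return (path, line number, line) for every line under root that mentions Alexa."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue  # unreadable file; skip it
        for lineno, line in enumerate(text.splitlines(), start=1):
            if ALEXA_PATTERN.search(line):
                hits.append((str(path), lineno, line.strip()))
    return hits


# Small self-contained demo using a temporary directory.
with tempfile.TemporaryDirectory() as demo_dir:
    (Path(demo_dir) / "fetch.sh").write_text(
        "curl -O http://s3.amazonaws.com/alexa-static/top-1m.csv.zip\n"
    )
    (Path(demo_dir) / "notes.txt").write_text("nothing to see here\n")
    hits = find_alexa_references(demo_dir)
    for path, lineno, line in hits:
        print(f"{path}:{lineno}: {line}")
```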
“In the longer term,” Le Pochat explained, “these researchers should consider for what purposes they use popularity rankings, and whether they fully understand the implications. For example, we showed in our 2019 paper that these rankings contain known malicious domains. This need not be surprising — if a malicious campaign is widespread, any domain name it abuses does technically become ‘popular’.”
“One way to cope is to require popularity over a longer period of time and across vantage points. We integrated this idea into our research-oriented Tranco ranking, where we aggregate over 30 days and multiple source lists,” the researcher added.
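The idea of aggregating over several days and source lists can be sketched in a few lines of Python. This example scores each domain by summing the reciprocal of its rank across every snapshot (a Dowdall-style rule); that scoring choice, the function name, and the sample domains are illustrative assumptions, not a statement of the exact formula Tranco or DomainTools uses.

```python
from collections import defaultdict


def aggregate_rankings(snapshots):
    """Combine ranked lists (best first), one per source per day, into one ranking.

    Each appearance contributes 1/rank to a domain's score, so a domain must
    rank well repeatedly, across sources and days, to stay near the top.
    """
    scores = defaultdict(float)
    for ranked in snapshots:
        for position, domain in enumerate(ranked, start=1):
            scores[domain] += 1.0 / position
    # Highest score first; ties broken alphabetically for deterministic output.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Illustrative data: 'spike.example' tops one list on one day only.
snapshots = [
    ["spike.example", "a.example", "b.example"],  # source 1, day 1
    ["a.example", "b.example"],                   # source 2, day 1
    ["a.example", "b.example"],                   # source 1, day 2
    ["b.example", "a.example"],                   # source 2, day 2
]
combined = aggregate_rankings(snapshots)
print(combined)  # ['a.example', 'b.example', 'spike.example']
```

In this toy data, the domain that briefly topped a single list ends up below the two that appear consistently, which is the effect the longer window and multiple vantage points are meant to produce.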
Andrew Hay, COO at LARES Consulting and former director of research at OpenDNS, believes the largest impact from the Alexa shutdown will be “on the algorithms that security vendors leverage to provide analytics, blocklists, and baseline heuristics to their respective customers.”
“Many vendors use the Alexa rankings to baseline typical traffic for certain types of websites (e.g., online retailers, news sites, entertainment outlets, etc.). This profiling data can then be associated with ‘like’ websites to determine their validity or efficacy for the vendors’ clients. The loss of the Alexa repository will find security vendors scrambling to find a new external source of traffic data – a potential monetization opportunity for the large Internet Service Providers,” Hay explained.
Examples of Alexa alternatives mentioned by Hay include Semrush, Ahrefs, Moz, and Serpstat, but he noted that a majority of these come with a monthly subscription cost.
John Bambenek, principal threat hunter at Netenrich, a California-based digital IT and security operations company, described the Alexa Top Million as an “imperfect whitelist for machine learning and other automated systems to prevent blocking legitimate websites.”
“Cisco Umbrella, Majestic, and Tranco are alternatives, however, any of the above expose one of the problems we have in cybersecurity machine learning in that we just don’t have good training data. What few sources we have, while imperfect approximations, are probably not long for this world,” Bambenek told SecurityWeek.