Facebook Scraping Incident Leaks Info for Half Billion Users

Jun 10, 2021

In early April, numerous sources reported the discovery of a pool of Facebook records containing information on more than 530 million of its users. The leaked information, posted to a website for hackers, included users’ names, dates of birth, and phone numbers. Business Insider’s (BI) April 3 story was among the first reports on this breach, and focused on a database that security researcher Alon Gal of cybercrime intelligence firm Hudson Rock discovered in January 2021. BI further reports that it “reviewed a sample of the leaked data and verified several records by matching known Facebook users’ phone numbers with IDs listed in the data set.”


Facebook’s Response and Explanation

The BI story states that a “Facebook spokesperson told Insider that the data has been scraped because of a vulnerability that the company patched in 2019.” Scraping attacks involve downloading account pages from a website and parsing their contents to extract personal information from the underlying Web markup. The vulnerability in question stemmed from the ability to import contact lists from users’ cellphones (with their permission) to extend friend lists and associated data. Although that vulnerability can no longer be exploited, even PII (personally identifiable information) from 2019 can serve as an entry point for various types of attack, including impersonation, identity theft, targeted phishing, and fraud.

According to numerous sources who’ve analyzed the database in question, it includes users from 106 countries. Of the more than 500 million users represented, over 32 million are US-based, with 11 million more from the UK and an additional 6 million from India. For most users, the data includes Facebook IDs, phone numbers, full names, locations, dates of birth, and self-descriptions (bios). For some users, email addresses are also disclosed.

How the Breach Was Identified

Mr. Gal found the leaked data in January when a hacking forum user advertised a bot that could provide phone numbers for hundreds of millions of Facebook users for a price. At around the same time, Joseph Cox at Motherboard reported the existence of this automated Telegram bot, complete with a proof-of-function demo, with charges ranging from US$20 for information on a single user account up to US$5,000 for 10,000 users. Motherboard reports that it tested the bot and confirmed that it provided a valid phone number for a Facebook user known to them who had elected to keep that number private. The exploit was documented in 2019 for Instagram users (Instagram is a subsidiary of Facebook), and that documentation included this statement: “It would … enable automated scripts and bots to build user databases that could be searched, linking high-profile or highly-vulnerable users with their contact details.”

This is apparently just what the database that Gal discovered contains. Since his initial findings in January, that database has been posted to a hacking forum at no charge. Thus, it’s available to anyone able to access the site. Indeed, it could provide ample data to drive attacks, even for those with only basic data management and manipulation skills. In other words, it’s wide open to hackers of all kinds, from script kiddies to more skilled malefactors. Gal tweeted numerous posts about this and has pinned his original April 3 post to his Twitter account, @UnderTheBreach.

Ironically, Facebook had already stated its intention to stymie mass data scraping after Cambridge Analytica scraped data from over 80 million users to target voters with political ads during the 2016 election. In violation of Facebook’s terms of service, the UK consulting firm used the data to persuade users to vote for then-candidate Donald Trump in that election. Some sources claim the firm’s efforts impacted the election; one asserts that they “help[ed] Donald Trump win the 2016 U.S. Presidential election” (Bloomberg). These activities have resulted in several lawsuits against both Facebook and Cambridge Analytica, with settlements expected to exceed hundreds of millions of dollars.

Protecting Against Scraping Attacks

Once the damage is done, breaches like the Instagram/Facebook one cannot be recalled or cancelled. Pre-emptive, security-forward design, implementation, and testing prior to public release are the only tools in the arsenal that can prevent scraping attacks from succeeding. Thus it’s essential to carefully vet all code that involves user PII and to proactively test (and attack) such code to make sure that unwanted access and disclosure are at least unlikely, if not extremely difficult or downright impossible.

At a minimum, developers should work from design through implementation to make PII harder to obtain. Increased use of strong encryption and added levels of authentication and access control could help, as would stronger limits on access to rich repositories of valuable information such as contact lists.
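One form that stronger limits on contact-list access could take is a per-account quota on phone-number lookups, so that no single account can resolve numbers in bulk. The following Python sketch is illustrative only: the class name `ContactLookupLimiter`, its parameters, and the quota values are assumptions made for this example, and it does not describe how Facebook or any other service actually implements such controls.

```python
import time
from collections import defaultdict, deque


class ContactLookupLimiter:
    """Hypothetical sliding-window cap on phone-number lookups per account."""

    def __init__(self, max_lookups, window_seconds, clock=time.monotonic):
        self.max_lookups = max_lookups
        self.window_seconds = window_seconds
        self.clock = clock  # injectable so tests can control time
        # account_id -> timestamps of lookups already granted
        self._history = defaultdict(deque)

    def allow(self, account_id, requested=1):
        """Return True and record the lookups if the account stays within quota."""
        now = self.clock()
        history = self._history[account_id]
        # Discard lookups that have aged out of the window.
        while history and now - history[0] >= self.window_seconds:
            history.popleft()
        if len(history) + requested > self.max_lookups:
            # Refuse the whole batch rather than serving part of it.
            return False
        history.extend([now] * requested)
        return True


# Assumed quota for illustration: 500 lookups per account per day.
limiter = ContactLookupLimiter(max_lookups=500, window_seconds=24 * 60 * 60)
address_book_ok = limiter.allow("account-1", requested=200)
bulk_upload_ok = limiter.allow("account-1", requested=10_000)
```

Here a 200-entry address book fits within the assumed quota, while the 10,000-number upload from the same account is refused. A quota like this only raises the cost of enumeration; it would need to sit alongside the authentication and access-control measures described above, since an attacker can spread requests across many accounts.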

Instagram/Facebook users potentially impacted by the breach may want to consider changing phone numbers. They should also stay on heightened alert for identity theft attempts, and remain vigilant against phishing and spearphishing attacks. Clearly, this is a case where prevention beats cure, and where cure is bound to be time-consuming, litigious, and expensive for the parties involved in the breach.


Would you like to know more about implementing a secure application development solution in your company? Get in touch with our Kiuwan team! We love to talk about security.

Get Your FREE Demo of Kiuwan Application Security Today!

Identify and remediate vulnerabilities with fast and efficient scanning and reporting. We are compliant with all security standards and offer tailored packages to mitigate your cyber risk within the SDLC.
