
Recent academic work reveals that large language models can effectively deanonymize pseudonymous social media users by analyzing their public posts and profiles. This approach achieves significantly higher success rates than traditional methods, which depend on manual data curation or expert investigation.
In experiments, recall (the proportion of users correctly identified across multiple platforms) reached 68 percent, and precision (the share of attempted identifications that were correct) peaked at 90 percent. Both figures substantially outperform classical deanonymization techniques, which typically rely on structured datasets assembled by human curators or skilled analysts.
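To make the two metrics concrete, here is a minimal worked example of how precision and recall are computed for an identification task. The pseudonyms, identities, and model guesses below are invented for illustration and are not taken from the study's data.

```python
# Hypothetical identification results, purely for illustrating the metrics.

# Ground truth: pseudonym -> real identity
ground_truth = {
    "user_a": "alice",
    "user_b": "bob",
    "user_c": "carol",
    "user_d": "dan",
}

# Model output: pseudonym -> predicted identity (None means the model abstained)
predictions = {
    "user_a": "alice",    # correct
    "user_b": "bob",      # correct
    "user_c": "mallory",  # wrong
    "user_d": None,       # no guess
}

true_positives = sum(
    1 for user, guess in predictions.items()
    if guess is not None and guess == ground_truth[user]
)
attempted = sum(1 for guess in predictions.values() if guess is not None)

precision = true_positives / attempted       # correct among attempted guesses
recall = true_positives / len(ground_truth)  # correct among all users

print(f"precision={precision:.2f} recall={recall:.2f}")
# precision = 2/3, recall = 2/4
```

Note that a model can trade recall for precision by abstaining on uncertain cases, which is one way high precision (90 percent) can coexist with lower recall (68 percent).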
The researchers stated, “Our findings have significant implications for online privacy.” They explained, “The average online user has long operated under an implicit threat model where they have assumed pseudonymity provides adequate protection because targeted deanonymization would require extensive effort. LLMs invalidate this assumption.”
Pseudonymity has served as a common privacy measure, allowing individuals to engage in sensitive discussions or post queries without easy identification. This new capability threatens to undermine that protection, potentially exposing users to risks like doxxing, stalking, or the creation of detailed marketing profiles that track personal details such as location and occupation.
To test their methods, the team assembled datasets from public social media sources while taking steps to protect the privacy of the users involved. One dataset paired posts from Hacker News with LinkedIn profiles, using cross-platform references found in user profiles to establish ground-truth links. After stripping explicit identifying information from the posts, the researchers applied a large language model to analyze the remaining content.
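The linking step described above can be sketched as follows. This is an illustrative assumption about how such links might be found (scanning a Hacker News "about" field for a LinkedIn URL); the profile data is invented and the paper's actual pipeline may differ.

```python
# Sketch: find cross-platform links by scanning profile bios for LinkedIn
# URLs. All profile text below is invented for illustration.
import re

LINKEDIN_RE = re.compile(r"https?://(?:www\.)?linkedin\.com/in/([\w-]+)")

hn_profiles = {
    "throwaway123": "I post about compilers and distributed systems.",
    "dev_jane": "Find me at https://www.linkedin.com/in/jane-doe-123",
}

def extract_linkedin_handle(about_text):
    """Return the LinkedIn handle referenced in a profile bio, if any."""
    match = LINKEDIN_RE.search(about_text)
    return match.group(1) if match else None

# Keep only users whose bio contains a LinkedIn link; these pairs serve
# as ground truth for evaluating the deanonymization model.
links = {
    user: handle
    for user, bio in hn_profiles.items()
    if (handle := extract_linkedin_handle(bio))
}
print(links)  # {'dev_jane': 'jane-doe-123'}
```

Once such pairs exist, identifiers can be scrubbed from the post text, and the model's task becomes recovering the link from writing style and content alone.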
A second dataset originated from a Netflix release containing fine-grained per-user records, including individual preferences, recommendations, and transaction histories. Prior research from 2008, known as the Netflix Prize attack, demonstrated that such data could identify users and reveal political affiliations and other personal information.
A third experiment split a single user's Reddit history into separate pseudonymous segments to assess whether they could be re-linked. Taken together, the findings indicate that pseudonymity no longer serves as a reliable barrier to identification, given the efficiency and scalability of LLM-based approaches.
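The split-history setup can be sketched as below. This is a common design for author re-identification tests; the posts are placeholders, and the matching model itself (here only described in a comment) stands in for whatever the researchers actually used.

```python
# Sketch: split one user's post history into two pseudonymous halves.
# The posts are invented; the study's actual data and model may differ.
posts = [  # (timestamp, text), sorted oldest first
    (1, "Moved to a new city last month."),
    (2, "Anyone else debug borrow-checker errors for fun?"),
    (3, "My commute along the river is finally bearable."),
    (4, "Shipping a side project in Rust this weekend."),
]

mid = len(posts) // 2
half_a = [text for _, text in posts[:mid]]  # treated as pseudonym A
half_b = [text for _, text in posts[mid:]]  # treated as pseudonym B

# The deanonymization test then asks: given half_a and a pool of
# candidate histories that includes half_b, can a model link the two
# halves back to the same author from style and content alone?
print(len(half_a), len(half_b))  # 2 2
```

Because both halves come from one known user, the split provides ground truth for free, with no need to collect cross-platform links.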