Published in the Web Conference Companion 2018 PDF
Over the past few years new “fringe” communities have been created and gained traction on the Web at a rapid rate. Very often little is known about how they evolve or what kind of activities they attract, despite recent research showing that they influence how false information reaches mainstream communities. This motivates the need to monitor these communities, and to analyze their impact on the Web’s information ecosystem. In August 2016, a new social network called Gab was created as an alternative to Twitter. It positions itself as putting “people and free speech first”, welcoming users banned or suspended from other social networks. In this paper, we provide, to the best of our knowledge, the first characterization of Gab. We collect and analyze 22M posts produced by 336K users between August 2016 and January 2018, finding that Gab is predominantly used for the dissemination and discussion of news and world events, and that it attracts alt-right users, conspiracy theorists, and other trolls. We also measure the prevalence of hate speech on the platform, finding it to be much higher than Twitter, but lower than 4chan’s Politically Incorrect board.
The Register covers our Disinformation Warfare paper, where we study Russian Trolls on Twitter with a focus on understanding their influence on Twitter as well as other Web communities like Reddit and 4chan.
The main points of discussion in the article are the following:
Under Review, 2018 Technical Report PDF
Either by ensuring the continuing availability of information, or by deliberately caching content that might get deleted or removed, Web archiving services play an increasingly important role in today’s information ecosystem. Among these, the Wayback Machine has been proactively archiving, since 2001, versions of a large number of Web pages, while newer services like archive.is allow users to create on-demand snapshots of specific Web pages, which serve as time capsules that can be shared across the Web. In this paper, we present a large-scale analysis of Web archiving services and their use on social media, aiming to shed light on the actors involved in this ecosystem, the content that gets archived, and how it is shared. To this end, we crawl and study: 1) 21M URLs, spanning almost two years, from archive.is; and 2) 356K archive.is plus 391K Wayback Machine URLs that were shared on four social networks: Reddit, Twitter, Gab, and 4chan’s Politically Incorrect board (/pol/) over 14 months. We observe that news and social media posts are the most common types of content archived, likely due to their perceived ephemeral and/or controversial nature. Moreover, URLs of archiving services are extensively shared on “fringe” communities within Reddit and 4chan to preserve possibly contentious content. Lastly, we find evidence of moderators nudging or even forcing users to use archives, instead of direct links, for news sources with opposing ideologies, potentially depriving them of ad revenue.
Under Review, 2018 Technical Report PDF
Over the past couple of years, anecdotal evidence has emerged linking coordinated campaigns by state-sponsored actors with efforts to manipulate public opinion on the Web, often around major political events, through dedicated accounts, or “trolls.” Although they are often involved in spreading disinformation on social media, there is little understanding of how these trolls operate, what type of content they disseminate, and most importantly their influence on the information ecosystem. In this paper, we shed light on these questions by analyzing 27K tweets posted by 1K Twitter users identified as having ties with Russia’s Internet Research Agency and thus likely state-sponsored trolls. We compare their behavior to a random set of Twitter users, finding interesting differences in terms of the content they disseminate, the evolution of their account, as well as their general behavior and use of the Twitter platform. Then, using a statistical model known as Hawkes Processes, we quantify the influence that these accounts had on the dissemination of news on social platforms such as Twitter, Reddit, and 4chan. Overall, our findings indicate that Russian troll accounts managed to stay active for long periods of time and to reach a substantial number of Twitter users with their messages. When looking at their ability of spreading news content and making it viral, however, we find that their effect on social platforms was minor, with the significant exception of news published by the Russian state-sponsored news outlet RT (Russia Today).
Under Review, 2018 Technical Report PDF
Hate speech, offensive language, sexism, racism and other types of abusive behavior have become a common phenomenon in many online social media platforms. In recent years, such diverse abusive behaviors have been manifesting with increased frequency and levels of intensity. This is due to the openness and willingness of popular media platforms, such as Twitter and Facebook, to host content of sensitive or controversial topics. However, these platforms have not adequately addressed the problem of online abusive behavior, and their responsiveness to the effective detection and blocking of such inappropriate behavior remains limited. In the present paper, we study this complex problem by following a more holistic approach, which considers the various aspects of abusive behavior. To make the approach tangible, we focus on Twitter data and analyze user and textual properties from different angles of abusive posting behavior. We propose a deep learning architecture, which utilizes a wide variety of available metadata, and combines it with automatically-extracted hidden patterns within the text of the tweets, to detect multiple abusive behavioral norms which are highly inter-related. We apply this unified architecture in a seamless, transparent fashion to detect different types of abusive behavior (hate speech, sexism vs. racism, bullying, sarcasm, etc.) without the need for any tuning of the model architecture for each task. We test the proposed approach with multiple datasets addressing different and multiple abusive behaviors on Twitter. Our results demonstrate that it largely outperforms the state-of-the-art methods (between 21 and 45% improvement in AUC, depending on the dataset).
Under Review, 2018 Technical Report PDF
In recent years, offensive, abusive and hateful language, sexism, racism and other types of aggressive and cyberbullying behavior have been manifesting with increased frequency on many online social media platforms. In fact, past scientific work focused on studying these forms in popular media, such as Facebook and Twitter. Building on such work, we present an 8-month study of the various forms of abusive behavior on Twitter, in a holistic fashion. Departing from past work, we examine a wide variety of labeling schemes, which cover different forms of abusive behavior, at the same time. We propose an incremental and iterative methodology that utilizes the power of crowdsourcing to annotate a large-scale collection of tweets with a set of abuse-related labels. In fact, by applying our methodology, including statistical analysis for label merging or elimination, we identify a reduced but robust set of labels. Finally, we offer a first overview and findings of our collected and annotated dataset of 100 thousand tweets, which we make publicly available for further scientific exploration.
Dr. Jeremy Blackburn, one of the co-founders of the iDrama lab, was interviewed by The Atlantic on our research on fake news. The Atlantic article focuses on the existence of extremists in Web communities, especially in small fringe Web communities like 4chan. Among other things, the article includes some of our paper’s findings, i.e., the disproportionate impact that small Web communities can have on large mainstream Web communities.
Naked Security covers our Web Centipede paper. The article explains the motivation and main findings of our paper. It also includes some phrases obtained from an interview with Dr. Jeremy Blackburn, the last author of this study and one of the co-founders of the iDrama Lab.
Phys.org summarizes our Web Centipede paper. In a nutshell, the article explains the methodology used in our paper as well as our main findings regarding the propagation of mainstream and alternative news in multiple Web communities as well as the influence measurements we undertook for Twitter, 4chan, and specific subreddits.
Our lab’s 4chan paper, which was published at ICWSM 17 (Best Paper Runner-Up), got extensive press coverage during 2016 and 2017. In this thread, we report the most notable articles discussing our paper.
Independent and The Conversation: Both articles include an overview of the paper written by Emiliano De Cristofaro, a senior cuckademic of our Lab. The overview describes 4chan, with an emphasis on the content that is posted as well as the coordinated raids that are orchestrated on 4chan and can disrupt other services on the Web. An example is the so-called “Operation Google”, where users began to use company names in place of racial slurs in an attempt to trick Google’s algorithms.
Similarly to the previous articles, Motherboard focuses on the raids that are orchestrated on 4chan. The article also contains some quotes from Dr. Emiliano De Cristofaro, who shared his thoughts on our research with the article’s author.
BoingBoing mentions our paper, writing about 4chan: “it’s mostly American white supremacists sharing YouTube videos and plotting raids and Pepe”.
Vice wrote an article after interviewing Dr. Jeremy Blackburn focusing on the hate speech aspect of 4chan. Furthermore, the article highlights the extreme/disturbing content that is usually disseminated within the 4chan platform.
Finally, Pacific Standard interviewed Gabriel Emile Hine, the lead author of our 4chan paper and a dramanaut in our Lab, who shared his thoughts on 4chan’s hateful content and the orchestrated raids.
Published in the 17th Internet Measurement Conference, 2017 PDF
As the number and diversity of news sources on the Web grows, so does the opportunity for alternative sources of information production. The emergence of mainstream social networks like Twitter and Facebook makes it easier for misleading, false, and agenda driven information to quickly and seamlessly spread online, deceiving people or influencing their opinions. Moreover, the increased engagement of tightly knit communities, such as Reddit and 4chan, compounds the problem as their users initiate and propagate alternative information not only within their own communities, but also to other communities and social media platforms across the Web. These platforms thus constitute an important piece of the modern information ecosystem which, alas, has not been studied as a whole. In this paper, we begin to fill this gap by studying mainstream and alternative news shared on Twitter, Reddit, and 4chan. By analyzing millions of posts around a variety of axes, we measure how mainstream and alternative news flow between these platforms. Our results indicate that alt-right communities within 4chan and Reddit can have a surprising level of influence on Twitter, providing evidence that “fringe” communities may often be succeeding in spreading these alternative news sources to mainstream social networks and the greater Web.
Published in the 11th International AAAI Conference on Web and Social Media, 2017 PDF
Best paper runner up!
The discussion-board site 4chan has been part of the Internet’s dark underbelly since its inception, and recent political events have put it increasingly in the spotlight. In particular, /pol/, the “Politically Incorrect” board, has been a central figure in the outlandish 2016 US election season, as it has often been linked to the alt-right movement and its rhetoric of hate and racism. However, 4chan remains relatively unstudied by the scientific community: little is known about its user base, the content it generates, and how it affects other parts of the Web. In this paper, we start addressing this gap by analyzing /pol/ along several axes, using a dataset of over 8M posts we collected over two and a half months. First, we perform a general characterization, showing that /pol/ users are well distributed around the world and that 4chan’s unique features encourage fresh discussions. We also analyze content, finding, for instance, that YouTube links and hate speech are predominant on /pol/. Overall, our analysis not only provides the first measurement study of /pol/, but also insight into online harassment and hate speech trends in social media.
Sky News interviewed Dr. Gianluca Stringhini on our lab’s research. Specifically, Gianluca explained that hate speech is a prevalent phenomenon within 4chan and that trolls who flood the platform can raid other services on the Web like YouTube, hence disrupting the YouTube community (based on our 4chan paper).
Also, Dr. Stringhini, who is one of the co-founders of our lab, explained the role and influence that a small fringe Web community like 4chan has on the Web’s information ecosystem (based on our Web Centipede paper).
Nature interviewed Dr. Gianluca Stringhini on our 4chan and Web Centipede papers. During the interview, Gianluca explained the rationale behind studying the 4chan platform, the high degree of hate speech that the platform exhibits as well as the study of orchestrated campaigns (also known as raids) that can disrupt other services on the Web (e.g., YouTube).
Also, Dr. Gianluca Stringhini highlighted 4chan’s surprising influence (with regard to the dissemination of news) on mainstream social networks like Twitter.