Trails of Propagation


The Twitter Trails Project

TRAILS: A System for Monitoring the Propagation of Rumors On Twitter
S. Finn, P. T. Metaxas, E. Mustafaraj, M. O’Keefe, L. Tang, S. Tang, and L. Zeng
Social media has become part of modern news reporting, whether it is being used by journalists to spread information and find sources, or as a medium by citizen reporters. The quest for prominence and recognition on websites like Twitter can sometimes eclipse accuracy and lead to the spread of false information. As a way to study and react to this trend, we introduce TRAILS, an interactive, web-based tool that allows users to investigate the origin and propagation characteristics of a rumor and its denial, if any, on Twitter. Propagation, timeline, retweet and co-retweeted network visualizations help users trace the spread of a story. While we envision that TRAILS would be valuable as a tool for individual use, in the initial stages we see it as a tool for amateur and professional journalists investigating recent and breaking stories.
Do Retweets indicate Interest, Trust, Agreement?
P. T. Metaxas, E. Mustafaraj, K. Wong, L. Zeng, M. O’Keefe, S. Finn
Despite the fact that retweets are routinely studied and reported, many important questions remain about user motivation for their use and their significance. In this paper we answer the question of what users indicate when they retweet. We do so in a comprehensive fashion, by employing a user survey, a study of user profiles, and a meta-analysis of over 100 research publications from three related major conferences. Our findings indicate that retweeting indicates not only interest in a message, but also trust in the message and the originator, and agreement with the message contents. However, the findings are significantly weaker for journalists, some of whom beg to differ, declaring so in their own user profiles. On the other hand, the inclusion of hashtags strengthens the signal of agreement, especially when the hashtags are related to politics.
Trails of Trustworthiness in Real-Time Streams (Extended Summary)
Panagiotis T. Metaxas and Eni Mustafaraj
The overall aim of our ongoing research is to lay the foundation of a comprehensive approach to support critical thinking and increase security while maintaining privacy in a trusted cyber-world. Building on the work of other researchers, as well as on the success we had in the past with recognizing and uncovering some of the causes of misinformation, we design a system that can maintain trails of trustworthiness for information propagated through real-time information channels. When confronted with information that requires fast action, our system will enable its educated users to evaluate its provenance, its credibility and the independence of the multiple sources that may provide this information.

The Co-Retweeted Network and Political Polarization

The Co-Retweeted Network and its Applications for Measuring the Perceived Political Polarization
Samantha Finn, Eni Mustafaraj and Panagiotis T. Metaxas
This paper introduces a novel network, the co-retweeted network, that is constructed as the undirected weighted graph that connects highly visible accounts who have been retweeted by members of the audience during some real-time event. Like bibliographic co-citation used to indicate that two papers treat a related subject matter, co-retweeting is used to indicate that two accounts present similar opinions in an online discussion. Thus, the co-retweeted network can be seen as a form of consulting the opinion of the crowd that is following the discussion about the similarity (or difference) of positions expressed by the highly visible accounts. When applied to political conversations related to some event, the co-retweeted network enables the measurement of the polarity of political orientation of major players (including news organizations) based on the views of the audience. It can also measure the degree of polarization of the event itself.
Presented at WEBIST14.
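The construction described in the abstract above can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the input is a list of (retweeter, original_author) pairs and that an edge's weight is the number of distinct users who retweeted both accounts. The function name, the min_weight parameter, and the sample data are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def co_retweeted_network(retweets, min_weight=1):
    """Build an undirected weighted co-retweeted graph.

    retweets: iterable of (retweeter, original_author) pairs (assumed format).
    Returns a dict mapping a sorted (author_a, author_b) pair to the number
    of distinct users who retweeted both authors.
    """
    # Collect the set of authors each audience member has retweeted.
    authors_by_user = defaultdict(set)
    for retweeter, author in retweets:
        authors_by_user[retweeter].add(author)

    # Every pair of authors retweeted by the same user gains one unit of weight.
    weights = Counter()
    for authors in authors_by_user.values():
        for pair in combinations(sorted(authors), 2):
            weights[pair] += 1

    # Optionally drop weak edges (min_weight is an illustrative parameter).
    return {pair: w for pair, w in weights.items() if w >= min_weight}


# Hypothetical toy data: users u1..u4 retweeting accounts A, B, C.
sample = [
    ("u1", "A"), ("u1", "B"),
    ("u2", "A"), ("u2", "B"),
    ("u3", "B"), ("u3", "C"),
    ("u4", "A"),
]
network = co_retweeted_network(sample)
```

In this toy data, accounts A and B are co-retweeted by two users, while B and C share only one, so A and B would be read as closer in the audience's view.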
Visualizing Co-Retweeting Behavior for Recommending Relevant Real-Time Content
Samantha Finn and Eni Mustafaraj
Twitter is a popular medium for discussing unfolding events in real-time. Due to the large volume of user generated data during these events, it's important to be able to recommend the best content while it's fresh. Current recommendation algorithms for Twitter take into account the user's tweets and her social network, but since real-time events might be unique or unexpected, the history of a user may not be sufficient for finding the most relevant content. Additionally, for users who want to join the conversation at that specific moment (or follow it without having to create an account), the system will be faced with the cold-start problem. We propose a simple visualization technique that considers the activity of the whole community participating in the real-time discussion, by capturing their co-retweeting behavior. Such a technique depicts the big picture, allowing a user to choose content from parts of the community that share her opinions or beliefs.
Presented at MSM2013.

Social Media and Politics

Social Media and the Elections
Panagiotis T. Metaxas and Eni Mustafaraj
In the United States, social media sites—such as Facebook, Twitter, and YouTube—are currently being used by two out of three people (1), and search engines are used daily (2). Monitoring what users share or search for in social media and on the Web has led to greater insights about what people care about or pay attention to at any moment in time. Furthermore, it is also helping segments of the population to be informed, to organize, and to react rapidly. However, social media and search results can be readily manipulated, which is something that has been underappreciated by the press and the general public.
What Edited Retweets Reveal about Online Political Discourse
Panagiotis T. Metaxas and Eni Mustafaraj
How widespread is the phenomenon of commenting on or editing a tweet in the practice of retweeting by members of political communities on Twitter? What is the nature of the comments (agree/disagree), or of the edits (change audience, change meaning, curate content)? We argue that it is necessary to go beyond the much-adopted aggregate text analysis of the volume of tweets, in order to discover and understand phenomena at the level of single tweets. This becomes important in light of the increase in the number of human-mimicking bots on Twitter. Genuine interaction and engagement can be better measured by analyzing tweets that display signs of human intervention. Editing the text of an original tweet before it is retweeted could reveal mindful user engagement with the content and, therefore, would allow us to perform sampling among real human users. This paper presents work in progress that deals with the challenges of discovering retweets that contain comments or edits, and outlines a machine-learning based strategy for classifying the nature of such comments.
Vocal Minority versus Silent Majority: Discovering the Opinions of the Long Tail
Eni Mustafaraj, Samantha Finn, Carolyn Whitlock, and Panagiotis T. Metaxas
Social networks such as Facebook and Twitter have become the favorite places on the Web where people discuss real-time events. In fact, search engines such as Google and Bing have special agreements, which allow them to include in their search results public conversations happening in real-time in these social networks. However, for anyone who only reads these conversations occasionally, it is difficult to evaluate the (often) complex context in which these conversation bits are embedded. Who are the people carrying on the conversation? Are they random participants or people with a specific agenda? Making sense of real-time social streams often requires much more information than what is visible in the messages themselves. In this paper, we study this phenomenon in the context of one political event: a special election for the US Senate which took place in Massachusetts in January 2010, as observed in conversations on Twitter. We present results of data analysis that compares two different groups of users: the vocal minority (users who tweet very often) and the silent majority (users who tweeted only once). We discover that the content generated by these two groups is significantly different; therefore, researchers should take care to separate them when trying to create predictive models based on aggregated data.
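The split between the two groups can be sketched in a few lines of Python. This is an illustration only, not the procedure used in the paper. The abstract defines the silent majority as users who tweeted exactly once; the cutoff for "very often" is not stated there, so the vocal_threshold parameter below is an assumption, as are the function name and the (user, text) input format.

```python
from collections import Counter


def split_by_activity(tweets, vocal_threshold=10):
    """Partition users into a vocal minority and a silent majority.

    tweets: iterable of (user, text) pairs (assumed format).
    vocal_threshold: hypothetical cutoff for "tweets very often".
    Returns (vocal_users, silent_users) as two sets.
    """
    counts = Counter(user for user, _ in tweets)
    # Silent majority: users with exactly one tweet, as in the abstract.
    silent = {user for user, n in counts.items() if n == 1}
    # Vocal minority: users at or above the assumed threshold.
    vocal = {user for user, n in counts.items() if n >= vocal_threshold}
    return vocal, silent


# Hypothetical toy data: one heavy poster, two one-time posters,
# and one user in between who belongs to neither group.
toy_tweets = (
    [("heavy", f"tweet {i}") for i in range(12)]
    + [("once_a", "hello"), ("once_b", "hi")]
    + [("middle", "one"), ("middle", "two")]
)
vocal_users, silent_users = split_by_activity(toy_tweets)
```

Keeping the two sets separate makes it possible to compare the content each group produces before aggregating anything for a predictive model.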
From Obscurity to Prominence in Minutes: Political Speech and Real-Time Search
Eni Mustafaraj and Panagiotis T. Metaxas
Recently, all major search engines introduced a new feature: real-time search results, embedded in the first page of organic search results. The content appearing in these results is pulled within minutes of its generation from the so-called “real-time Web” such as Twitter, blogs, and news websites. In this paper, we argue that in the context of political speech, this feature provides disproportionate exposure to personal opinions, fabricated content, unverified events, lies and misrepresentations that otherwise would not find their way onto the first page, giving them the opportunity to spread virally. To support our argument we provide concrete evidence from the recent Massachusetts (MA) senate race between Martha Coakley and Scott Brown, analyzing political community behavior on Twitter. In the process, we analyze the Twitter activity of those involved in exchanging messages, and we find that it is possible to predict their political orientation and detect attacks launched on Twitter, based on behavioral patterns of activity.

Citizen Reporting and Twitter

Hiding in Plain Sight: A Tale of Trust and Mistrust inside a Community of Citizen Reporters
Eni Mustafaraj, Panagiotis T. Metaxas, Samantha Finn, and Andrés Monroy-Hernández
In this paper, we discuss the case of a community of Twitter citizen reporters, located in a Mexican city plagued by drug cartels fighting for control of territory. Our analysis shows that the most influential individuals inside the community were anonymous accounts. Neither the Mexican authorities nor the drug cartels were happy about the real-time citizen reporting of crime or anti-crime operations in an open social network such as Twitter, and we discovered external pressure on this community and its influential players to change their reporting behavior.
The Rise and the Fall of a Citizen Reporter
Panagiotis T. Metaxas and Eni Mustafaraj
Recently, research interest has been growing in the development of online communities sharing news and information curated by “citizen reporters”. Using “Big Data” techniques, researchers try to discover influence groups and major events in the lives of such communities. However, the big picture may sometimes miss important stories that are essential to the development and evolution of online communities. In particular, how does one identify and verify events when the important actors are operating anonymously and without sufficient news coverage, as in drug war-torn Mexico? In this paper, we present some techniques that allow us to make sense of the data collected, identify important dates of significant events therein, and direct our limited resources to discover hidden stories that, in our case, affect the lives and safety of prominent citizen reporters. In particular, we describe how focused analysis enabled us to discover an important story in the life of this community involving the reputation of an anonymous leader, and how trust was built in order to verify the validity of that story.

Predicting Elections

How (Not) To Predict Elections
Eni Mustafaraj, Panagiotis T. Metaxas, and Daniel Gayo-Avello
This work aims to test the predictive power of social media metrics against several Senate races of the two recent US Congressional elections. We review the findings of other researchers and we try to duplicate their findings both in terms of data volume and sentiment analysis. Our research aim is to shed light on why predictions of electoral (or other social) events using social media might or might not be feasible. In this paper, we offer two conclusions and a proposal: First, we find that electoral predictions using the published research methods on Twitter data are not better than chance. Second, we reveal some major challenges that limit the predictability of election results through data from social media. Finally, we propose a set of standards that any theory aiming to predict elections (or other social events) using social media should follow.
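To make the "data volume" approach concrete, the sketch below shows one simple form such a predictor could take: call the race for whichever candidate is mentioned most, then measure accuracy across races. It is a hedged illustration, not the method evaluated in the paper. The function names, candidate labels, and mention counts are all invented for the example.

```python
def predict_by_volume(mention_counts):
    """Return the candidate with the most mentions, or None on a tie or no data.

    mention_counts: dict mapping candidate name to tweet-mention count
    (assumed format).
    """
    ranked = sorted(mention_counts.items(), key=lambda kv: kv[1], reverse=True)
    if not ranked:
        return None
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None  # a tie gives no prediction
    return ranked[0][0]


def accuracy(races):
    """Fraction of races where the volume-based prediction matches the winner.

    races: non-empty list of (mention_counts, actual_winner) pairs.
    """
    correct = sum(
        1 for counts, winner in races if predict_by_volume(counts) == winner
    )
    return correct / len(races)


# Hypothetical toy races; the numbers are made up and carry no empirical meaning.
toy_races = [
    ({"cand_a": 120, "cand_b": 80}, "cand_a"),
    ({"cand_c": 300, "cand_d": 100}, "cand_d"),
    ({"cand_e": 50, "cand_f": 75}, "cand_f"),
    ({"cand_g": 90, "cand_h": 40}, "cand_h"),
]
toy_accuracy = accuracy(toy_races)
```

For two-candidate races, an accuracy near 0.5 is what a coin flip would achieve, which is the baseline any such predictor has to beat.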
Limits of Electoral Predictions Using Twitter
Panagiotis T. Metaxas, Eni Mustafaraj, and Daniel Gayo-Avello
Using social media for political discourse is becoming common practice, especially around election time. One interesting aspect of this trend is the possibility of taking the pulse of public opinion about the elections, and that has attracted the interest of many researchers and the press. Allegedly, predicting electoral outcomes from social media data can be feasible and even simple. Positive results have been reported, but without an analysis of what principle enables them. Our work puts to the test the purported predictive power of social media metrics against the 2010 US congressional elections. Here, we applied techniques that had reportedly led to positive election predictions in the past to the Twitter data collected from the 2010 US congressional elections. Unfortunately, we find no correlation between the analysis results and the electoral outcomes, contradicting previous reports. Observing that 80 years of polling research would support our findings, we argue that one should not accept predictions about events using social media data as a black box. Instead, scholarly research should be accompanied by a model explaining the predictive power of social media, when there is one.