Project Highlights

Troll Tracker: keeping disinformation accountable

Words by Allan Cheboi and Robin Kiplangat • Nov 9 2021

This article was originally written and published by Allan Cheboi, Senior Investigations Manager, iLAB, and Robin Kiplangat, Senior Investigative Technologist, iLAB, on October 8, 2021.

“A Twitter-based disinformation tracking tool built on a web-based dashboard that collects deleted tweet content from previously identified trolls and disinformation actors. The project seeks to help monitor the social posts of known disinfo actors. The primary tangible output of the project is to expose trolls behind toxic disinformation campaigns who routinely cover their tracks by deleting original inflammatory social media posts that sparked hate speech, disinformation campaigns or conspiracy theories.”

Monitoring disinformation on social media relies on detecting crucial evidence in the form of tweets or posts on platforms like Twitter and Facebook and referencing such content in evidence-based reports. This enables researchers to track false content, hate speech or propaganda, attribute it to different online personas, and ultimately trace the real account owners.

However, disinformation actors (trolls) usually initiate a campaign and then try to cover their tracks by deleting their original inflammatory posts or videos once the content or campaign begins to gain organic momentum. In some cases, the social media platforms themselves delete these posts to slow the spread of harmful content. Either way, the crucial evidence that investigators need to understand who is behind the content or campaign disappears. Digital evidence is volatile and fragile, and it can easily be altered if handled improperly. Investigative journalists and researchers therefore face new challenges in providing evidence-backed analysis when actors delete the posts needed to counter false information.

“Once a Tweet has been deleted, the Tweet contents, associated metadata, and all analytical information about that Tweet is no longer publicly available on Twitter.”

from Twitter.

This is precisely what we are trying to tackle under this project. Code for Africa (CfA), partnering with Code for All (CfAll), aims to make the most out of Twitter data and API functionality. The underlying technology leverages CfA’s successful Politwoops tool, which tracks deleted tweets from politically exposed people (PEPs) in Kenya. TrollTracker seeks to archive crucial evidence that disinformation investigators need to understand who is behind a campaign once it ‘disappears’.

Have we lost crucial evidence before?

In South Africa, for example, disinformation peddlers on the alt-right are currently systematically deleting the original tweets that sparked widespread xenophobic hysteria and anti-foreigner campaigns using hashtags like #PutSouthAfricansFirst. The original tweets are only deleted once the campaigns achieve ‘organic’ momentum. A Reddit social justice group that monitors the deletions regularly expresses frustration that it is unable to retrieve or archive the deleted tweets to use as evidence:

Sample post by a researcher who noticed that original tweets for the #PutSouthAfricansFirst campaign had been deleted (Source: Reddit/ CfA)

Internationally, a report presented to the U.S. Congress earlier this year by Code for Africa’s partner, the DFRLab, showed that political activist Jack Posobiec was one of the main amplifiers of the #StopTheSteal hashtag in 2020. Unfortunately, by the time of the investigation, all of @JackPosobiec’s tweets mentioning “#StopTheSteal” had been deleted, and it was only possible to reconstruct the timeline thanks to the tremendous work of researchers.

Tweets from @JackPosobiec with #StopTheSteal (Source: Twitter/ CfA)

The data and Twitter’s API integration

The platform uses Twitter’s streaming API to detect and track publicly available tweets in real time. This allows the tool to spool and ingest all tweet content from each user on the watchlist. Once a troll’s account has been added to the watchlist, all the tweets the user had posted up to that point are cloned to the Trolltracker database, which is then updated with any additional tweets the user posts. If the account deletes a tweet, a component called the ‘archive worker’ is triggered, which identifies the tweet in the database and flags it as ‘deleted’. The tweet is then featured on the Trolltracker dashboard.

Sample flow chart of the Twitter-TrollTracker integration (Source: CfA)
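The flow described above can be illustrated with a minimal Python sketch. It is not CfA's implementation: the `TweetArchive` class, its method names and the in-memory storage are hypothetical stand-ins, and the message shape is loosely modelled on the delete notices in Twitter's v1.1 streaming messages. The point it shows is that a deletion flags the archived copy rather than erasing it.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Dict, List, Optional, Set


@dataclass
class ArchivedTweet:
    tweet_id: str
    user_id: str
    text: str
    deleted: bool = False
    deleted_at: Optional[datetime] = None


@dataclass
class TweetArchive:
    """Hypothetical in-memory stand-in for the Trolltracker database."""
    watchlist: Set[str] = field(default_factory=set)
    tweets: Dict[str, ArchivedTweet] = field(default_factory=dict)

    def ingest(self, tweet_id: str, user_id: str, text: str) -> bool:
        """Store a tweet only if its author is on the watchlist."""
        if user_id not in self.watchlist:
            return False
        self.tweets[tweet_id] = ArchivedTweet(tweet_id, user_id, text)
        return True

    def mark_deleted(self, tweet_id: str) -> bool:
        """The 'archive worker' step: flag the archived copy, never erase it."""
        tweet = self.tweets.get(tweet_id)
        if tweet is None or tweet.deleted:
            return False
        tweet.deleted = True
        tweet.deleted_at = datetime.now(timezone.utc)
        return True

    def deleted_feed(self) -> List[ArchivedTweet]:
        """Tweets for the dashboard, most recently deleted first."""
        flagged = [t for t in self.tweets.values() if t.deleted]
        return sorted(flagged, key=lambda t: t.deleted_at, reverse=True)


def handle_stream_message(archive: TweetArchive, message: dict) -> None:
    """Route one decoded stream message (assumed shape) to the archive."""
    if "delete" in message:
        archive.mark_deleted(message["delete"]["status"]["id_str"])
    elif "id_str" in message and "text" in message:
        archive.ingest(message["id_str"], message["user"]["id_str"], message["text"])
```

A production version would replace the dictionary with a real database and would also backfill the account's existing timeline when it is first added to the watchlist.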

How it works

New users must register for an account on the platform through the login/sign-up page. User data is stored in a secure database behind CfA’s firewall and can only be accessed by administrators of the tool.

Users can then log in and view the live ticker of the most recent deleted tweets from a predefined set of public watchlist accounts. They can run basic or advanced boolean queries in the search box to surface tweets with particular content or keywords, and sort or filter the results to suit their needs. The live ticker can also be organised by several additional curated filters, such as the names or networks of disinfo actors, hashtags, and disinformation themes, including Covid-19, anti-vaccination content, foreign influence, religious extremism, and climate denial. Users can also filter content by location in cases where the tweets are geo-located.
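As an illustration of how such a search might behave, here is a small Python sketch. It is an assumption rather than the tool's actual query engine: `DeletedTweet`, `matches` and `search` are hypothetical names, and the boolean query is reduced to three term lists (all of, any of, none of) matched as case-insensitive substrings, plus an optional theme filter.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Sequence


@dataclass(frozen=True)
class DeletedTweet:
    handle: str
    text: str
    themes: Sequence[str] = ()


def matches(text: str, all_of: Sequence[str] = (), any_of: Sequence[str] = (),
            none_of: Sequence[str] = ()) -> bool:
    """AND / OR / NOT over case-insensitive substring matches."""
    haystack = text.casefold()

    def has(term: str) -> bool:
        return term.casefold() in haystack

    if not all(has(term) for term in all_of):
        return False
    if any_of and not any(has(term) for term in any_of):
        return False
    return not any(has(term) for term in none_of)


def search(tweets: Iterable[DeletedTweet], *, all_of: Sequence[str] = (),
           any_of: Sequence[str] = (), none_of: Sequence[str] = (),
           theme: Optional[str] = None) -> List[DeletedTweet]:
    """Apply the boolean query, then the optional curated theme filter."""
    results = []
    for tweet in tweets:
        if theme is not None and theme not in tweet.themes:
            continue
        if matches(tweet.text, all_of, any_of, none_of):
            results.append(tweet)
    return results
```

For example, `search(tweets, all_of=["hoax"], any_of=["covid", "vaccine"])` keeps only deleted tweets that mention "hoax" together with either "covid" or "vaccine".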

Each user can upload and curate (by grouping or tagging) customised private watchlists of Twitter accounts, either by uploading batches (CSV or JSON) or by adding individual accounts. The watchlist management dashboard also allows for downloads and for entity-by-entity edits, tagging, or grouping.

The tool also has a showcase page where the complete audit methodology, user guidelines, skills and tutorials toolkit and a curated selection of investigations using the tool’s data/techniques will be displayed.
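A batch upload of this kind could be parsed along the following lines. This is a sketch under assumptions: the article does not specify the file layout, so the `handle`, `group` and `tags` fields (with tags separated by semicolons in CSV) and the `parse_watchlist` function are hypothetical.

```python
import csv
import io
import json
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class WatchlistEntry:
    handle: str
    group: str = ""
    tags: Tuple[str, ...] = ()


def _normalise(handle: str) -> str:
    """Strip whitespace and a leading '@'; handles are case-insensitive."""
    return handle.strip().lstrip("@").lower()


def _split_tags(raw) -> Tuple[str, ...]:
    """Accept a JSON list or a semicolon-separated string (assumed format)."""
    parts = raw if isinstance(raw, list) else str(raw or "").split(";")
    return tuple(str(p).strip() for p in parts if str(p).strip())


def parse_watchlist(payload: str, fmt: str) -> List[WatchlistEntry]:
    """Parse an uploaded CSV or JSON batch into de-duplicated entries."""
    if fmt == "csv":
        rows = list(csv.DictReader(io.StringIO(payload)))
    elif fmt == "json":
        rows = json.loads(payload)
    else:
        raise ValueError(f"unsupported format: {fmt}")

    entries: List[WatchlistEntry] = []
    seen = set()
    for row in rows:
        handle = _normalise(row.get("handle") or "")
        if not handle or handle in seen:
            continue  # skip blank rows and repeated accounts
        seen.add(handle)
        group = (row.get("group") or "").strip()
        entries.append(WatchlistEntry(handle, group, _split_tags(row.get("tags"))))
    return entries
```

Normalising handles before de-duplicating means that `@Alice` and `alice` in the same upload end up as a single watchlist entry.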

Who can use this tool?

The tool is meant to serve three primary audiences. However, depending on the specific use case and need, special and privileged access will be granted to users outside these audiences upon request, due diligence, and approval from CfA.

  • Persona #1: Disinformation researchers: individuals reporting and alerting investigative teams on activities from the watchlist. Accumulated analysis/research/investigations into trends and tropes in disinformation campaigns will be open for review and additional analysis by CfAll researchers.
  • Persona #2: CfAll community: the open-source code for the Troll Tracker toolkit will be deployable by the CfAll community.
  • Persona #3: Media consumers and others interested in disinformation or influence operations, both inside the CfAll community and in the wider world, subject to review and approval of use cases.

Project Outcomes

We are extremely proud of the tangible achievements under this project. Thanks to the team’s hard work, the tool covers most of the use cases and expectations of the research institutions and individuals who conduct social media investigations on Twitter to track disinformation, hate speech, propaganda, and other volatile information.

Opportunities just ahead

We identified five main areas for improvement:

  • Addition of an alert functionality enabling users to set up push alerts for notable events and deletions based on specific triggers or queries. The alerts will be delivered to the user’s email and a preferred Slack channel, either as digests of the most recent tweets or as notifications of spikes based on search criteria.
  • Enrichment of the currently collected API nodes with additional content, such as details of followers, retweets and likes, to allow network analysis after a deletion.
  • Addition of a preliminary network analysis functionality and visualisation of accounts that delete identical tweets, which will help in mapping coordinated networks.
  • Addition of content and user tracking from other platforms, e.g. Facebook, Reddit, blogs and other websites.
  • Addition of organisation lists and collaborative sharing of leads or watchlists.
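To make the proposed network analysis more concrete, the sketch below shows one simple way to surface accounts that deleted identical tweets. It is illustrative only and does not describe a planned implementation: `normalise` and `identical_deletions` are hypothetical names, and "identical" is assumed to mean equal text after removing links, case and extra whitespace.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def normalise(text: str) -> str:
    """Drop links, case and extra whitespace so near-verbatim copies match."""
    text = re.sub(r"https?://\S+", "", text)
    return " ".join(text.casefold().split())


def identical_deletions(deleted: Iterable[Tuple[str, str]],
                        min_accounts: int = 2) -> Dict[str, List[str]]:
    """Map each deleted text to the accounts that deleted it.

    `deleted` holds (handle, text) pairs; only texts deleted by at least
    `min_accounts` distinct accounts are returned.
    """
    accounts_by_text: Dict[str, Set[str]] = defaultdict(set)
    for handle, text in deleted:
        key = normalise(text)
        if key:
            accounts_by_text[key].add(handle.lower())
    return {
        text: sorted(handles)
        for text, handles in accounts_by_text.items()
        if len(handles) >= min_accounts
    }
```

Each returned group is only a lead for an investigator: accounts that delete the same text may be coordinating, but they may also simply have copied a popular post.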

It has been a great journey. Thanks to the great project team and the Code for All community for this amazing opportunity!


Allan Cheboi

Senior Investigations Manager, Code for Africa’s ANCIR iLAB team

Allan is CfA’s Senior Investigations Manager at the iLAB & ANCIR, managing a team of forensic data scientists, analysts and technologists in eight African countries working in three international languages (Arabic, English and French). Prior to joining CfA, Allan was a cybercrime investigator and forensic audit manager at KPMG. Allan leads CfA’s African Network & Center for Investigative Reporting (ANCIR), an initiative that brings together the continent’s best investigative newsrooms, ranging from large traditional mainstream media to smaller specialist units.


Robin Kiplangat

Senior Investigative Technologist, Code for Africa’s ANCIR iLAB team

An idealist taking on his own hero’s journey in seeking meaningful existence. With love, respect, and support from many who continue to impart great values in his life; nurtured with an inborn love of nature and a desire to understand the mysteries of life. Open to new experiences and the dense diversity aspect of being.
