Fighting Chinese Disinfo with Doublethink Lab
Words by Matt Stempeck • Jun 8 2023
As part of Matt Stempeck’s research into how organizations across the Code for All Network confront disinformation, we are sharing the most exciting takeaways from those exchanges in the Civic Tech & Disinformation Project Series.
These interviews were conducted in mid-2022. Please note that some details may have changed since then.
Get to know Doublethink Lab
Doublethink Lab is a recent entry into the disinfo-fighting arena. They were founded in Taiwan in 2019 to research Chinese influence operations and disinformation campaigns, including those that have infiltrated Western media sources.
Doublethink’s co-founder and CEO is Wu Min Hsuan (or as he often introduces himself to Westerners, TTCat). Wu was previously a leader at Taiwan’s field-leading Open Culture Foundation.
He transitioned into full-time disinfo-fighting because he saw it as a barrier to the civic tech many of us hope to create. “In the whole open government movement, we’re talking about how to build a more transparent, public deliberative process,” Wu says. “The ideal we’re working towards is that by using technology to help citizens assess those data, those conversations, to be part of that participatory decision-making process, we can take our democracies to the next level.”
That’s the dream. But Wu realized that foreign actors, operating across a mostly borderless internet, can corrupt even the most well-intentioned civic tech project, manipulating both participants and results. Well-timed bots and fake news campaigns can distort collective decision processes like elections or digital participation exercises.
This tension raises the dilemma of verifying digital participants. On one hand, proposing to verify the identity and address of every user is a scary idea in the many places where you can’t trust the government with the ability to match every online opinion to its speaker. And even if we do authenticate every account, it’s quite common for old accounts to get hacked. At the same time, we want to ensure the integrity of the process and know that the public debates we’re facilitating are authentic. There’s no perfect solution to this dilemma.
[Author's note: the EU's DECODE project ran a pilot on Barcelona's Decidim platform giving residents the ability to authenticate their legitimacy while still participating anonymously.]
Disinfo traverses political boundaries and languages or dialects
Doublethink Lab has developed a machine learning algorithm that uses natural language processing to help identify who’s spreading narratives introduced by the People’s Republic of China. State-funded Chinese media is known to invest in spreading its content to media outlets outside of China and in other languages.
They archive media publications’ content and analyze it to detect bias in the framing of news stories, word choice, narrative, and sentiment around key terms. Doublethink then compares its findings across media outlets to determine which are most aligned with PRC state media. The same algorithm also monitors the news in Australia and Japan, with the United Kingdom, Spanish-speaking countries, and African countries to follow.
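The outlet-comparison step can be sketched in miniature. The snippet below is a toy illustration, not Doublethink’s actual pipeline: it scores two hypothetical outlet texts against a reference corpus of state-media framing using plain bag-of-words cosine similarity. All texts, scores, and names here are invented for the example.

```python
import math
from collections import Counter

def tf_vector(text):
    # Bag-of-words term frequencies (toy tokenizer: lowercase whitespace split).
    return Counter(text.lower().split())

def cosine(a, b):
    # Cosine similarity between two sparse term-frequency vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Hypothetical reference corpus of state-media framing, plus two outlet samples.
state_media = tf_vector("hong kong riots foreign interference stability prosperity")
outlet_a = tf_vector("hong kong riots threaten stability and prosperity")
outlet_b = tf_vector("hong kong protesters demand democratic reforms")

score_a = cosine(state_media, outlet_a)
score_b = cosine(state_media, outlet_b)
assert score_a > score_b  # outlet A's word choice tracks the reference framing more closely
```

A real system would work over thousands of archived articles, with proper tokenization for Chinese and English, weighting schemes like TF-IDF, and sentiment analysis around key terms, but the core idea of measuring framing overlap against a reference corpus is the same.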
This work can detect when Chinese government talking points make it into foreign publications verbatim, for example. The goal is to disclose the findings to news consumers, civil society groups, and media watchdogs, who can raise legitimate questions about an outlet’s alignment with the PRC, as these collaborations or partnerships are rarely disclosed transparently. It’s one thing if a media outlet happens to be aligned with the PRC worldview, and another entirely if it is government-funded without disclosure.
For example, Doublethink’s research identified a single Australian media outlet that refrains from reporting on human rights violations against China’s Uighur population and has a starkly different take on the Hong Kong protests relative to the rest of Western media. This work can identify media actors that warrant further investigation.
Doublethink’s focus on Chinese state-driven disinformation clearly relates to the existential threat it poses to Taiwan. As a result, their work needs to go beyond proving and disproving individual claims and zoom out to the broader information battlefield. They pay particularly close attention to the actors who amplify disinformation or propaganda, instead of getting bogged down in whether a given piece of content is fact, partial fact, or opinion. By studying speakers’ behavior, posting patterns, and account authenticity, Doublethink can stay focused on the broader effect this content has on public platforms. Disinfo is designed to be consumed by others in order to change the public discourse on key topics. If the accounts spreading it are animated by paid staff, foreign actors, or bots, democracy has a real problem.
Researchers often break the digital disinformation ecosystem down into subgroups of actors that are based on motives and tactics. Two commonly cited subgroups are state-sponsored actors and economic actors (who are in it for the earnings).
What Wu and his team have discovered is an insidious hybrid model, whereby an ecosystem of social media influencers adopt state-sponsored narratives and benefit financially from doing so. While some portion of this influencer class may simply be patriotic content producers, many amplify PRC narratives primarily for the amplification and revenue that tends to follow.
“If you want to be rich, especially if you are a white person, you can create a TikTok video saying how much you like China, and remember to include Chinese subtitles,” Wu says. “Then you will get a lot of traffic, you will become an influencer, and then you can make a profit.”
As we’ve seen in other contexts, entrepreneurial web developers have created content farms masquerading as news sites that simply copy and paste text. Whether or not they believe the political ideologies they’re publishing, the profit motive is enormous. Wu points out that you don’t even need to show your face: you can buy fake accounts, photos and videos, and engagement for those fake accounts. There are desktop programs that facilitate operating hundreds of Facebook accounts from the same computer. Proving state-sponsored financial support has been difficult; a YouTuber might instead receive ad deals from Chinese businessmen, or sudden bursts of traffic.
Just like in other contexts, discussing Chinese topics on certain social media platforms can lead to vicious attacks by some combination of bots, patriotic users, and/or paid nationalist trolls. To help counter this flood of online hate speech, Doublethink Lab built their ‘Little Pink’ detector (‘Little Pink’ is the name given to young, hyper-nationalist Chinese internet users). The tool can classify Twitter accounts at scale, grouping thousands of accounts to facilitate social network analysis.
Doublethink’s tools are being developed and tested in-house. They use existing tools for network visualization and then employ their account classifier to score accounts in large volumes. The Little Pink detector can be used to analyze their narratives, tweets, and content at scale.
For example, Doublethink used the Little Pink detector to identify that nearly everyone talking about a bioweapons conspiracy in Ukraine on Twitter, some 3,000 accounts, had clear links to one another. They found that the conspiracy theory was being discussed by three distinct communities: little pinks and PRC state media, Russia-aligned conspiracy theorists, and those opposing the conspiracy narratives.
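The community structure behind this kind of finding can be approximated with basic graph analysis. Below is a minimal sketch, assuming interaction records (retweets, mentions) have already been collected; the account names are invented, and real analyses would use richer community-detection methods than connected components.

```python
from collections import defaultdict, deque

def connected_components(edges):
    # Build an undirected adjacency list from (account, account) interaction pairs.
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    seen, components = set(), []
    for node in graph:
        if node in seen:
            continue
        # BFS out one community of mutually linked accounts.
        comp, queue = set(), deque([node])
        while queue:
            n = queue.popleft()
            if n in seen:
                continue
            seen.add(n)
            comp.add(n)
            queue.extend(graph[n] - seen)
        components.append(comp)
    return components

# Hypothetical interaction pairs: two clusters discussing the same conspiracy hashtag.
edges = [("pink_1", "state_tv"), ("pink_2", "state_tv"),
         ("conspiracy_a", "conspiracy_b"), ("conspiracy_b", "conspiracy_c")]
comms = connected_components(edges)
assert len(comms) == 2  # the clusters share no links, so they form distinct communities
```

In practice, once accounts are grouped, a classifier like the Little Pink detector can label each community, revealing which clusters align with state media and which accounts bridge otherwise separate communities.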
This analysis identified, among other things, bridge figures between Russian- and Chinese-language influence operations, and Twitter accounts created solely to amplify Chinese state media in violation of Twitter’s labeling policy. The analysis concluded with recommendations for Twitter to review.
Access to data
Social media platforms offer disinfo fighters major challenges, but also public evidence. Disinformation campaigns seek to trick the platforms’ feed algorithms into believing their content is organically popular, triggering further promotion by the platforms. To do this, the campaigns must create lots of fake accounts to interact with their content and game the algorithm. That activity necessarily leaves a public evidence trail, because the algorithm needs to see those signals.
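One way such traces can be surfaced is by flagging accounts that repeatedly interact with the same posts within seconds of each other. The snippet below is a simplified sketch of that idea with an invented event log; real coordination detection combines many more signals (account age, content similarity, posting schedules).

```python
from collections import defaultdict
from itertools import combinations

def coordinated_pairs(events, window=10, min_shared=2):
    # events: (account, post_id, timestamp_seconds) interaction records.
    # Flag account pairs that hit the same posts within `window` seconds,
    # `min_shared` or more times: a trace coordinated amplification leaves behind.
    by_post = defaultdict(list)
    for account, post, ts in events:
        by_post[post].append((account, ts))
    pair_hits = defaultdict(int)
    for interactions in by_post.values():
        for (a1, t1), (a2, t2) in combinations(interactions, 2):
            if a1 != a2 and abs(t1 - t2) <= window:
                pair_hits[tuple(sorted((a1, a2)))] += 1
    return {pair for pair, hits in pair_hits.items() if hits >= min_shared}

# Hypothetical log: bot_a and bot_b hit the same two posts seconds apart,
# while a genuine user engages much later.
events = [("bot_a", "p1", 0), ("bot_b", "p1", 3),
          ("bot_a", "p2", 100), ("bot_b", "p2", 102),
          ("human", "p1", 5000)]
flagged = coordinated_pairs(events)
assert flagged == {("bot_a", "bot_b")}
```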
Civic tech groups can and have developed tools to come in and detect those traces and leads. At the same time, the democracy-corroding coordination is taking place right there on YouTube, Facebook, Twitter, and other platforms. The companies have all the data you could ever need in their server logs. And it’s the social platforms’ content promotion algorithms that are so vulnerable, and so easily manipulated, by state-funded disinfo efforts as well as marketing companies. The reward is clear, and the attention arms race has thus far favored everyone looking to distort public opinion for influence or profit (or both).
The social platforms, Wu says, have the responsibility to make it more transparent who is behind these supposedly-viral messages that the platforms help amplify. Like many other disinfo-fighting civic tech groups, Doublethink struggles with getting research access to quality social media data. Though the official platform APIs are limited or non-existent, they can always build tools to archive social media data unofficially. But this approach presents its own thorny questions. By developing social media archiving utilities, you’re investing a lot of civil society resources into capturing data that the platforms already have readily available on their servers. Hundreds of organizations in our field are archiving the same data every day. This duplication of resources benefits no one, and Wu hasn’t found a reliable or trustworthy mechanism for sharing that data for research purposes.
[Author's note: Meta's efforts to shut down NYU's Ad Observatory might help explain why this is the case].
“Facebook thinks the data belongs to them because it’s generated by their users, but that data is also what’s happening in our country, discussions about our own democracy, taking place on their platform,” Wu says. “The discussion belongs to the public, to our country. We need to know what’s happening in our own information space. It’s directly related to our democracy and the health of our society. They need to find a way to build a mechanism for people to come in and see what’s going on, especially when there’s a lot of inauthentic activity happening. No one can argue the discussions on their platforms aren’t affecting elections, public policy, or democracy.”
AI and the Achilles Heel of Authoritarianism
I asked Wu if we were going to see an arms race between AI-generated content and AI-fueled content takedowns. He says no, pointing out that “Authoritarian regimes are very top-down, a centralized government with a centralized narrative.” They find it easy enough to amplify what they like but are quick to shut down anything they don’t. Wu’s skeptical that they’ll turn to AI to generate content because they insist on maintaining such tight control of their agenda.
Biggest challenges in confronting disinfo
Like many nonprofits, Doublethink struggles to recruit staff given its relatively low salaries. This is also a new field: there’s no college department in Taiwan training students in these specialties, so the career pathways and the necessary training still need to be established. Applicants need a solid political sense, technical savvy, and a detective’s mindset for tracking leads.
Doublethink estimates it will need tremendous resources to achieve its goal. They’ve had good fortune fundraising compared to some other organizations, but given the work they’re doing, they estimate they’ll need ten times their current analyst capacity, with more staff based in more places to develop local contacts and understand local languages.
Another limit on resources is the high overhead that comes with taking on China online. Doublethink has been on the receiving end of very sophisticated cyberattacks on both the organization and its staff and needs resources to bolster security and training.
How do we increase the cost of opportunists spreading disinformation for a living, to the point that it’s no longer an appealing trade? The whole disinformation space moves very quickly, and exchanging knowledge with peer organizations would be helpful.
Doublethink is looking to the civic tech movement for inspiration. The open data movement took off worldwide – could a unified schema and data structure for recording global disinformation operation incidents allow civil society groups to exchange data, detect patterns, identify actors, and alert one another to the spread of new tactics?
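As a thought experiment, such a shared incident schema might look something like the sketch below. Every field name here is hypothetical; an actual standard would need to be negotiated across the organizations exchanging data.

```python
from dataclasses import dataclass, field, asdict
from typing import List
import json

@dataclass
class DisinfoIncident:
    # Hypothetical shared schema for exchanging incident reports between groups.
    incident_id: str
    reported_by: str                       # reporting organization
    platforms: List[str]                   # where the campaign was observed
    narrative: str                         # short summary of the narrative being pushed
    suspected_actor: str                   # e.g. "state-sponsored", "economic", "hybrid"
    tactics: List[str] = field(default_factory=list)
    first_seen: str = ""                   # ISO 8601 date

# An invented example report, serialized to JSON for exchange with peers.
incident = DisinfoIncident(
    incident_id="2022-example-001",
    reported_by="example-lab",
    platforms=["twitter"],
    narrative="Example conspiracy narrative amplified across two communities",
    suspected_actor="hybrid",
    tactics=["amplification", "bridge accounts"],
    first_seen="2022-03-01",
)
payload = json.dumps(asdict(incident))
```

A common serialization like this is what would let civil society groups aggregate reports, detect patterns across borders, and alert one another to new tactics, mirroring how shared open data schemas enabled the open data movement.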
Confronting digital disinfo is inherently interdisciplinary. It will require not just Political Science and Chinese Studies experts, but also technologists, psychologists, storytellers, and everyone who understands how information moves on social media and the internet.
Lastly, how do we know if any of this is working? Is it having the intended effect? The truth is we don’t know. Doublethink would like to work with researchers to understand what effect they’re having and to establish a baseline for online audiences.
Interested in learning more?
Matt Stempeck, the National Democratic Institute, and Code for All worked on a research project to understand how civic tech can help confront disinformation. The project’s goal is to learn from (and share out!) lessons learned from organizations that focus on this area. Check out the Disinformation Research Project to learn more and help us spread the word!