quickcode Use Case: Social Media & Vaccine Misinformation

Research performed by Soubhik Barari, Sophie Hill, and Gary King, Harvard University. Use case written by Soubhik Barari.


In the last few years, we've seen mind-blowing gains in what AI can accomplish with natural language processing. At the same time, identifying and characterizing complex social phenomena from messy data like text documents still requires deep expertise and judgment from human analysts, and that human effort can't realistically or affordably be scaled up. quickcode fills exactly this gap through its analyst-in-the-loop technology for keyword detection.


My team and I – Sophie Hill, PhD candidate in Government at Harvard University, and Gary King, Professor at Harvard University and principal investigator of the project – have been using quickcode to tackle one of the biggest problems the world is facing in real time: combating health misinformation on social media. In particular, we are using quickcode to characterize the subtle (and not-so-subtle) vocabulary adopted by spreaders of vaccine misinformation in a large dataset of Twitter users.


A non-trivial challenge was separating vaccine misinformation from scientifically corroborated information about the vaccine. In machine learning, in order to identify some phenomenon well, you also have to identify everything that appears to be that phenomenon but actually isn't. For example, we realized that phrases referring to vaccine hesitancy such as "anti-vax" or "skeptic" are almost never used by individuals who share vaccine misinformation. We quickly discovered this when quickcode produced these as exclusion phrases for our initial keyword queries.


https://twitter.com/JoRudman/status/1348696046139019264


https://twitter.com/StarshipBased/status/1343894918532640768

Although we would likely have intuited this on our own eventually, quickcode zeroed in on it and helped us capture the online language of vaccine misinformation more precisely.
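To make the idea of inclusion and exclusion phrases concrete, here is a minimal Python sketch of a keyword query. It is not quickcode's implementation: the function name `matches_query`, the phrase lists, and the example sentences are all illustrative assumptions. The only phrases taken from our work are the two exclusion phrases mentioned above and "ivermectin", which comes up later in this write-up.

```python
import re

# Illustrative phrase lists (assumptions, not our actual keyword set).
INCLUDE = ["ivermectin"]
EXCLUDE = ["anti-vax", "skeptic"]

def matches_query(text, include, exclude):
    """Return True if text has at least one include phrase and no exclude phrase."""
    lowered = text.lower()

    def has(phrase):
        # Whole-word, case-insensitive match; re.escape keeps "-" literal.
        pattern = r"\b" + re.escape(phrase.lower()) + r"\b"
        return re.search(pattern, lowered) is not None

    return any(has(p) for p in include) and not any(has(p) for p in exclude)

# Made-up example texts, not tweets from our dataset.
print(matches_query("Ivermectin works, skip the vaccine", INCLUDE, EXCLUDE))
print(matches_query("Anti-vax accounts keep pushing ivermectin", INCLUDE, EXCLUDE))
```

The second example is the kind of text an exclusion phrase is meant to filter out: it mentions a misinformation keyword, but the "anti-vax" label suggests the author is describing misinformation rather than spreading it.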


quickcode also systematically revealed some of the common narratives and slogans in (and not in) the vaccine misinformation in our dataset. For example, as we developed our set of keywords, quickcode showed that the phrase "don't have to worry" was negatively associated with misinformation-sharing in our keyword sets. It turns out the phrase was used in a recurring meme shared by vaccine supporters in the format of "if you've ever done [something gross], you don't have to worry about what's in the vaccine":


https://twitter.com/skinsley26/status/1339933205953191936


On the other hand, quickcode helped us identify, very early on, unsubstantiated narratives about the efficacy of alternative treatments for COVID-19, in particular ivermectin.


https://twitter.com/BradGeyer/status/1346237113214382080


Finally, after iteratively compiling a keyword set through quickcode, we deployed the keyword query to classify vaccine misinformation spreaders in our dataset and evaluated the classification accuracy on a sample of 200 users. We found that the quickcode keyword set produced a 95% accuracy rate, a false positive rate of less than 10%, and a false negative rate of nearly 0%.
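For readers who want to run this kind of check on their own hand-labeled sample, the three rates can be computed as in the sketch below. The function name `classification_rates` and the toy labels are hypothetical, and the code assumes the standard definitions (false positive rate = FP / (FP + TN), false negative rate = FN / (FN + TP)). The toy numbers are not our results.

```python
def classification_rates(predicted, actual):
    """Return (accuracy, false_positive_rate, false_negative_rate).

    predicted: booleans from the keyword query (True = flagged as a spreader)
    actual:    booleans from human labeling   (True = really a spreader)
    """
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    fp = sum(1 for p, a in pairs if p and not a)
    tn = sum(1 for p, a in pairs if not p and not a)
    fn = sum(1 for p, a in pairs if not p and a)

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    # Guard against empty classes so the sketch never divides by zero.
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    fnr = fn / (fn + tp) if (fn + tp) else 0.0
    return accuracy, fpr, fnr

# Toy labels for five hypothetical users (illustration only).
predicted = [True, True, False, False, True]
actual = [True, False, False, False, True]
accuracy, fpr, fnr = classification_rates(predicted, actual)
print(accuracy, fpr, fnr)
```

In this toy sample, four of five users are classified correctly, one of the three true non-spreaders is wrongly flagged, and neither true spreader is missed.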


Altogether, quickcode has proved to be an extremely useful tool that has helped to sharpen, and expose holes in, our own substantive expertise in misinformation rather than replace it with a black-box algorithm. I'm personally very excited to continue using quickcode's innovative technologies for our research in detecting and combating online misinformation.