This network mapping may also expose a particular strategy used by bad actors: splitting their edit histories across a number of accounts to evade detection. The editors put in the effort to build reputation and status within the Wikipedia community, mixing legitimate page edits with more politically sensitive ones.
“The main message that I have taken away from all of this is that the main danger is not vandalism. It’s entryism,” Miller says.
If the theory is correct, however, it also means that state actors could need years of work to mount a disinformation campaign capable of slipping by unnoticed.
“Russian influence operations can be quite sophisticated and go on for a long time, but it’s unclear to me whether the benefits would be that great,” says O’Neil.
Wikipedia has been battling inaccuracies and false information for 21 years. One of the longest-running disinformation attempts went on for more than a decade, after a group of ultra-nationalists gamed Wikipedia's administrator rules to take over the Croatian-language community and rewrote history to rehabilitate the country's World War II fascist leaders. The platform has also been vulnerable to "reputation management" efforts aimed at embellishing powerful people's biographies. Then there are outright hoaxes. In 2021, a Chinese Wikipedia editor was found to have spent years writing 200 articles of fabricated medieval Russian history, complete with imaginary states, aristocrats, and battles.
To fight this, Wikipedia has developed a collection of intricate rules, governing bodies, and public discussion forums wielded by a self-organizing and self-governing body of 43 million registered users across the world.
Nadee Gunasena, chief of staff and executive communications at the Wikimedia Foundation, says the organization "welcomes deep dives into the Wikimedia model and our projects," particularly in the area of disinformation. But she also notes that the research covers only part of the article's edit history.
“Wikipedia content is protected through a combination of machine learning tools and rigorous human oversight from volunteer editors,” says Gunasena. All content, including the history of every article, is public, while sourcing is vetted for neutrality and reliability.
The fact that the research focused on bad actors who were already found and rooted out may also show that Wikipedia’s system is working, adds O’Neil. But while the study did not produce a “smoking gun,” it could be invaluable to Wikipedia: “The study is really a first attempt at describing suspicious editing behavior so we can use those signals to find it elsewhere,” says Miller.
Victoria Doronina, a member of the Wikimedia Foundation’s board of trustees and a molecular biologist, says that Wikipedia has historically been targeted by coordinated attacks by “cabals” that aim to bias its content.
"While individual editors act in good faith, and a combination of different points of view allows the creation of neutral content, off-Wiki coordination of a specific group allows it to skew the narrative," she says. If Miller and his fellow researchers are correct in identifying state strategies for influencing Wikipedia, the next struggle on the horizon could be "Wikimedians versus state propaganda," Doronina adds.
The analyzed behavior of the bad actors, Miller says, could be used to build models that detect disinformation and reveal just how vulnerable the platform is to the forms of systematic manipulation that have been exposed on Facebook, Twitter, YouTube, Reddit, and other major platforms.
The English-language edition of Wikipedia, which has the most articles of any edition, has 1,026 administrators monitoring more than 6.5 million pages. Tracking down bad actors has mostly relied on someone reporting suspicious behavior, but much of this behavior may not be visible without the right tools. From a data science perspective, Wikipedia data is difficult to analyze because, unlike a tweet or a Facebook post, each Wikipedia page exists in many versions of the same text.
As Miller explains it, “a human brain just simply can’t identify hundreds of thousands of edits across hundreds of thousands of pages to see what the patterns are like.”