It’s Time to Open the Black Box of Social Media

Social media companies need to give their data to independent researchers to better understand how to keep users safe

Credit: Kailey Whitman

Social media sites are where billions of people go to make sense of the world, connect with others and get information. These companies, which include Reddit, Instagram, TikTok and Twitter, collect huge amounts of data from every interaction on their platforms.

Despite social media being one of the most important forums for speech, a few people control many of the most important platforms. Mark Zuckerberg controls 58 percent of the voting share of Meta, the parent company of both Facebook and Instagram, effectively giving him sole control of two of the largest social platforms. Elon Musk made a $44-billion offer to take Twitter private (although whether that deal goes through will be determined by a lawsuit). [Editor’s Note: Musk completed his acquisition of Twitter in late October.] These companies have a history of sharing very little data about their platforms with researchers. This prevents us from understanding the impact of social media on individuals and society. We fear that this lockdown on data sharing will continue because of the singular ownership of three of the most powerful social media platforms.

After decades of little regulation, it’s time for more transparency from social media companies.

In 2020 social media was an important mechanism for the spread of false and misleading claims about the election and for mobilization by groups that participated in the January 6, 2021, Capitol insurrection. During the pandemic, misinformation about COVID spread widely online. Today, social media companies are failing to keep their promise to ban Russian propaganda about the war against Ukraine. Social media has been a major channel for spreading false information about nearly every issue of concern to society. We don’t know what the next crisis will be, but we do know that false claims will be made about it on these platforms.

Unfortunately, social media companies are reluctant to release data or publish research, especially when the findings might not be welcome (although there are notable exceptions). Legislators and regulators must require social media companies to release data to independent researchers so that we can understand what is happening on the platforms. In particular, we need access to data on the structures of social media, such as platform features and algorithms, so we can better analyze how they shape the spread of information and affect user behavior.

Platforms have told legislators that they are taking steps against misinformation and disinformation by flagging content and inserting fact-checks. Are these efforts successful? Access to data is required to find out. Without better data, it is impossible to have a substantive discussion about which interventions will be most effective and in line with our values. We run the risk of creating new laws or regulations that don’t adequately address harms, or that worsen existing problems.

Some members of our team have met with legislators in the U.S. and Europe to discuss possible legislative reforms along these lines. The conversation about transparency and accountability for social media businesses has become deeper and more substantive. It has moved from vague generalities toward specific proposals. However, the context is still lacking. Regulators and lawmakers often ask us to explain why we need data access, what research it would allow, and how this research would benefit the public and inform regulation of social networks.

To address this need, here is a list of questions that could be answered if social media companies started to share more data about how their services work and how users interact with them. This research could help platforms create safer and more secure systems, and inform regulators and lawmakers who want to hold them accountable for their promises to the public.

  • Research suggests that misinformation is often more engaging than other types of content. Why is this? What characteristics of misinformation are most closely associated with virality and increased user engagement? Researchers believe novelty and emotionality are key factors. However, more research is needed to confirm this. A better understanding of why misinformation is so engaging will help platforms improve their algorithms and recommend misinformation less often.
  • Research shows that the delivery-optimization techniques companies use to maximize revenue, and even the ad-delivery algorithms themselves, can be discriminatory. Are certain groups more likely than others to see potentially harmful ads, such as scams? Are others less likely to see useful ads, such as job postings? How can ad networks improve delivery to make it less discriminatory?
  • Social media companies attempt to combat misinformation by labeling content of questionable provenance, hoping to push users toward more accurate information. Survey experiments have shown mixed results regarding the effects of labels on beliefs and behavior. We need to find out if labels are effective when people encounter them on platforms. Do labels reduce the spread of misinformation, or do they attract attention to posts that users might otherwise ignore? Are labels becoming more familiar to people?
  • Internal studies at Twitter show that Twitter’s algorithms amplify right-leaning politicians and political news sources more than left-leaning accounts in six of seven countries studied. Are there other social media platforms that use similar algorithms?
  • Because of the central role they now play in public discourse, platforms have a great deal of power over who can speak. Platform moderation decisions can sometimes lead to minorities feeling silenced online. Are some groups disproportionately affected by decisions about what content is allowed on platforms? Do platforms allow some users to silence others by using moderation tools or systemic harassment to suppress certain viewpoints?

Social media firms should welcome independent researchers to help them better understand online harm and inform their policies. Reddit and Twitter have been helpful, but we cannot rely on the goodwill of businesses whose policies may change at the whim of a new owner. We hope that a Musk-led Twitter will be just as open as before. [Editor’s Note: This article was written and posted before Musk took ownership of Twitter.] In our rapidly changing information environment, we should not regulate and legislate based on anecdotes. We need legislators to ensure we have access to the data that we need to keep users safe.

Editor’s Note (11/11/22): This story was edited after posting to include updates about Elon Musk’s acquisition of Twitter.

A version of this article with the title “Social Media Companies Must Share Data” was adapted for inclusion in the December 2022 issue of Scientific American.

Renee DiResta is the technical research manager at Stanford Internet Observatory. Credit: Nick Higgins

Laura Edelson is a postdoc researcher at N.Y.U. Credit: Nick Higgins

Brendan Nyhan is James O. Freedman Presidential Professor of Government at Dartmouth College. Credit: Nick Higgins

Ethan Zuckerman teaches public policy, information and communication at UMass Amherst. Credit: Nick Higgins
