Twitter words of warning

From CommunityData

Twitter research can be heaps of fun, but it does have some pitfalls. Here are a few things to keep in mind...

You are what you sample. When we write a Python script to collect Twitter data, we are only building one part of a scientific collection instrument. Twitter provides the rest. Not knowing how your collection instrument works is a problem for a scientific researcher. For example, when we run a collection we can’t be sure that we got everything we requested. We don’t know what’s missing. Things might be missing because Twitter has a policy on how they release information. Things might be missing because there is something funny going on in Twitter’s technical systems. Either way, we scientists working outside of Twitter cannot interrogate Twitter’s side of the collection directly.
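One modest safeguard is to record what you asked for alongside what came back, so that gaps are at least visible. The following is a minimal sketch under assumptions: the function name `find_missing_ids` and the record layout (dicts with an `id_str` key) are hypothetical stand-ins for whatever your own script saves. It only helps when you requested items by ID; it cannot reveal what Twitter never offered in the first place.

```python
def find_missing_ids(requested_ids, collected_records, id_key="id_str"):
    """Return requested IDs that do not appear in the collected records.

    `collected_records` is assumed to be an iterable of dicts parsed from
    the JSON you saved; `id_key` is an assumed field name.
    """
    received = {str(r[id_key]) for r in collected_records if id_key in r}
    # Preserve the request order so the gap report is easy to read.
    return [str(i) for i in requested_ids if str(i) not in received]


# Toy data standing in for a real collection run.
requested = ["101", "102", "103", "104"]
collected = [{"id_str": "101", "text": "a"}, {"id_str": "103", "text": "b"}]
missing = find_missing_ids(requested, collected)
print(f"requested={len(requested)} received={len(collected)} missing={missing}")
```

Saving a report like this with every run will not explain why something is missing, but it does document the gap so you can describe it honestly in your methods.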

Lots of people use Twitter. Lots don’t. And we don’t know the difference. There are heaps and piles and mounds of research telling us that, so far, no single socio-technical system is used by every person on earth. Rather, every communication system has its own set of users. Twitter is a company. Who Twitter users are, and how those users might be similar to or different from any other group, is proprietary information. Be wary of generalizations made between Twitter users and any other group, such as populations at large.

Love the rainbow. Fear the rainbow. The fun of doing research on Twitter is that there is so much heterogeneity. Twitter breaks up the account and Tweet data into over 200 different categories. Many of these categories are themselves hugely diverse.
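Before analyzing anything, it can help to see which categories your own sample actually fills in. The sketch below assumes your collected Tweets are available as nested Python dicts (for example, parsed from saved JSON); the helper names `flatten_keys` and `field_coverage` and the toy field names are made up for illustration and are not part of any Twitter library.

```python
from collections import Counter


def flatten_keys(record, prefix=""):
    """Yield dotted paths for every non-empty leaf field in a nested dict."""
    for key, value in record.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            yield from flatten_keys(value, prefix=path + ".")
        elif value not in (None, "", [], {}):
            yield path


def field_coverage(records):
    """Count how many records populate each field path."""
    counts = Counter()
    for record in records:
        counts.update(set(flatten_keys(record)))
    return counts


# Toy records with made-up fields, standing in for parsed Tweet JSON.
sample = [
    {"id_str": "1", "text": "hi", "user": {"name": "a", "location": ""}},
    {"id_str": "2", "text": "yo", "user": {"name": "b", "location": "Oslo"}},
    {"id_str": "3", "text": "", "user": {"name": "c"}},
]
coverage = field_coverage(sample)
for path, n in sorted(coverage.items()):
    print(f"{path}: {n}/{len(sample)}")
```

A field that is populated in only a fraction of records is a warning sign: any claim built on it describes that fraction, not the whole sample.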

Scientific Twitter research = Big work, small claims. For the reasons above, expect to do a lot of legwork to get meaningful insights out of Twitter data. Also expect that those insights may be narrow in scope and carefully hedged.

Don’t go easy on other Twitter researchers. The above advice might seem like Research 101, but we see these mistakes over and over in published papers.