
pdickson2.github.io

Review of "Incorporating twitter-based human activity information in spatial analysis of crashes in urban areas" by Bao, Liu, Yu, and Xu

In anticipation of using Twitter data spatially, we surveyed the available academic literature on the subject. I reviewed an article on the correlations between tweet content and location and traffic patterns, specifically crashes. The article can be found here for context.

The work examined the relationship between Twitter check-ins and crash patterns around Los Angeles, focusing on the type of check-in made and how its location related to the spatial patterns of traffic around it. The methodology appeared to be inductive: the authors set out to see whether this kind of research was possible and what would come of it, rather than starting from an established hypothesis or body of research.

The authors of the study were quite open about the sources of their data. Their traffic data is well established and open for public use. Similarly, their Twitter data was taken from a preexisting 2011 study whose authors included the data set with their work. They then describe how they extracted the particular tweets used in this study, which gives reasonable hope for replicability and reproducibility. One question does arise, however: they state that they created a Python crawler to extract locations from the tweet information, but they provide no documentation for this crawler.

Following the data, the paper describes its analytical methodology in great detail, giving the names and equations of the analyses performed and explaining what each one does. However, I would say that this report is more replicable than it is reproducible. While the authors are transparent about the sources of their data and the basic ideas behind their methodology, I think it would be impossible to recreate exactly the same results without reaching out for additional information. The results do seem quite replicable, though, given the right skill set. Certain aspects of the analysis can realistically be recreated in R, such as geographically weighted regression with the spgwr package, Moran’s I, and Gaussian weighting, among others. I don’t know that I personally could replicate their results, given my general lack of familiarity with statistical analysis, but someone experienced in Python and statistical theory and methodology could most likely apply their research to another city.
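To give a sense of what replicating one of these pieces might involve, here is a minimal Python sketch of Moran's I computed with Gaussian distance weights. This is not the authors' code, and it is not how spgwr does it either. The function names, the toy coordinates, the made-up crash counts, and the bandwidth of 1.0 are all assumptions for illustration only.

```python
import math


def gaussian_weights(coords, bandwidth):
    """Build an n x n matrix of Gaussian kernel weights with a zero diagonal.

    The kernel form exp(-0.5 * (d / bandwidth)^2) is one common choice;
    it is an assumption here, not taken from the paper.
    """
    n = len(coords)
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d = math.dist(coords[i], coords[j])
                weights[i][j] = math.exp(-0.5 * (d / bandwidth) ** 2)
    return weights


def morans_i(values, weights):
    """Global Moran's I: (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)  # sum of all weights
    num = sum(weights[i][j] * dev[i] * dev[j]
              for i in range(n) for j in range(n))
    den = sum(z * z for z in dev)
    return (n / s0) * (num / den)


# Hypothetical data: four zones along a line, one unit apart.
coords = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
w = gaussian_weights(coords, bandwidth=1.0)

# Counts that rise steadily from one end to the other (similar neighbors).
trend_i = morans_i([1.0, 2.0, 3.0, 4.0], w)

# Counts that alternate high and low (dissimilar neighbors).
alternating_i = morans_i([4.0, 1.0, 4.0, 1.0], w)

print(trend_i, alternating_i)
```

With these toy inputs, the steadily rising counts give a positive value (neighbors resemble each other) and the alternating counts give a negative one. A real replication would also need a significance test and a principled bandwidth choice, which this sketch leaves out.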

The authors are careful to stress, however, that their research was less about drawing meaningful conclusions and more about establishing a functional means of combining spatial data with social media data. They also took care to mention many of the faults in their methodology and how it could be improved in the future.
