Storywrangler: A journey in visualizing 100 billion tweets in 100 languages, and what it means for future research
Jane Adams, data visualization artist-in-residence at the University of Vermont's Complex Systems Center, describes the Computational Story Lab's process of parsing tweets containing 1 trillion 1-grams, drawn from a 10% sample of all public tweets from 2008 to 2021. The project spanned language identification; parsing 1-, 2-, and 3-grams; computing normalized frequency and usage rank; gathering statistics on language speakers; developing a new instrument for measuring rank-turbulence divergence; and building a real-time visualization pipeline in React. It involved a large team of researchers and many years of stumbling blocks before reaching the web. Storywrangler has since informed research on hurricane awareness and on public discourse on mental health and the #BlackLivesMatter movement, as well as data journalism about the meteoric rise in discussion of Juneteenth, the January 6th Capitol riots, and the verdict in the trial of Derek Chauvin. It also shows promise for topics ranging from the linguistic study of idioms in common parlance to trends in the use of "cashtags" relative to movements in the stock and cryptocurrency markets.
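To make the n-gram steps concrete, here is a minimal Python sketch of counting n-grams and deriving normalized frequency and usage rank. It is an illustration under stated assumptions, not Storywrangler's actual implementation: the function names are hypothetical, tokenization is a plain lowercase whitespace split, and ties in rank are broken alphabetically, whereas a production pipeline may tokenize and rank differently.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the contiguous n-grams of a token list as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def frequency_and_rank(tweets, n=1):
    """Count n-grams across tweets; return count, normalized frequency, and rank.

    Assumption: whitespace tokenization on lowercased text (illustrative only).
    """
    counts = Counter()
    for text in tweets:
        tokens = text.lower().split()
        counts.update(ngrams(tokens, n))

    total = sum(counts.values())
    # Most frequent first; alphabetical order breaks ties so ranks are deterministic.
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))

    table = {}
    for rank, (gram, count) in enumerate(ordered, start=1):
        table[gram] = {"count": count, "freq": count / total, "rank": rank}
    return table


if __name__ == "__main__":
    sample = ["the storm is here", "the storm passed"]
    for gram, stats in frequency_and_rank(sample, n=2).items():
        print(gram, stats)
```

Normalizing by the total n-gram count makes usage comparable across days with very different tweet volumes, and rank gives a scale-free view that is convenient for plotting on a logarithmic axis.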
Jane Adams is the Data Visualization Artist-in-Residence at the University of Vermont Complex Systems Center. She builds interactive visualization tools for exploratory analysis of linguistics on social media platforms as part of the Computational Story Lab, and for exploratory data analysis (EDA) of high-dimensional health information as part of the MassMutual Center of Excellence in Complex Systems and Data Science. This fall, Jane will be joining the Data Visualization Lab at Northeastern University's Khoury College of Computer Sciences. In her spare time, Jane is a practicing artist experimenting with physical computing, data visualization, and machine learning as art media. Website: universalities.com