Although phase 3 projects have only recently started, some have already attracted media interest. The Trees and Tweets project is conducting an analysis of dialect variation based on a corpus of billions of tweets and an analysis of migration patterns based on a dataset consisting of millions of family trees.
At the Methods in Dialectology XV conference in Groningen, the Netherlands, the Project Manager for Trees and Tweets, Jack Grieve, presented some of the first results of the study. He used the data to illustrate the application of some advanced spatial methods for dialectology and produced some quick maps for the popular linguistics blog Language Log. The maps show significant geographical variation in the use of “um” and “uh” across the USA, a good example of something the team can only now do with this type of data.
Following on from this blog post, qz.com produced the following article on the results: Um, here’s an, uh, map that shows where Americans use “um” vs. “uh”.
You can find out more about the project at the Trees and Tweets Dialect Project Blog.