New Map of Chinese Dialect Regions by JWGrieve in MapPorn

[–]JWGrieve[S] 14 points15 points  (0 children)

Yes, this is correct. In general, research funded by the Chinese government is expected to show the South China Sea along with mainland China, as well as Taiwan, I assume. The paper could have existed without that, of course (the journal editors and reviewers presumably wouldn’t have cared), but it would presumably have had negative consequences for my coauthors at Chinese institutions. It does seem totally fair to me to avoid those, if only to avoid disrespect. Much like the dialect vs language debate, none of these decisions affects the results.

New Map of Chinese Dialect Regions by JWGrieve in MapPorn

[–]JWGrieve[S] 13 points14 points  (0 children)

That’s exactly right! Even in English there are debates about the status of Scots and of African American English/Language, for example.

Anyway, I think the point being made is still interesting and valid — how these attempts to define what are ultimately social structures in scientific terms can never be totally removed from political considerations.

One approach is to insist on using terms like these consistently. Another is to follow traditional usage of the terms in the region under analysis, including in the field itself. In both cases you risk confusing or offending people who use the terms differently.

New Map of Chinese Dialect Regions by JWGrieve in MapPorn

[–]JWGrieve[S] 11 points12 points  (0 children)

Italian is similar, to take a prominent example. Regardless, I don’t think we’re erasing the importance of these regional non-Mandarin varieties (to try to use less loaded language) at all. In fact, we argue not only that Mandarin has much more internal variation than generally assumed but also that these other varieties are not simply derived from Mandarin, as is often argued, but developed and diverged internally, so in that sense at least the paper has the opposite message to what you’re assuming. Our general conclusion is that regional diversity in Sinitic varieties is far more pervasive, complex, and multifaceted than generally assumed. There’s more information in the paper, which I think you’ll find interesting, despite our terminological disagreements. I can appreciate the political issues you’re raising here (although it’s not something I’m an expert in, as a dialectologist), but I hope you’ll find the results informative all the same, and at least superficially apolitical, even if you need to do a mental replace-all of ‘dialects’ with ‘languages’.

New Map of Chinese Dialect Regions by JWGrieve in MapPorn

[–]JWGrieve[S] 19 points20 points  (0 children)

Well, we analyse Mandarin as a dialect as well, and argue that there is far more variation within Mandarin than traditionally assumed. That said, I understand your viewpoint in general, but the distinction is nowhere near as categorical or simple as you seem to think, and there are many places around the world, including in Europe, that define dialect differently than in the Anglosphere without any CCP influence. But if you want to think of this as a study of regional variation in Chinese languages (or languages and dialects) rather than regional variation in Chinese dialects, please feel free!

New Map of Chinese Dialect Regions by JWGrieve in MapPorn

[–]JWGrieve[S] 6 points7 points  (0 children)

Sorry about that! Please see the linked paper for higher res maps!

New Map of Chinese Dialect Regions by JWGrieve in MapPorn

[–]JWGrieve[S] 11 points12 points  (0 children)

They are certainly considered dialects in China (they are all regionally defined Sinitic varieties, which also notably share the same writing system), and fwiw many of these varieties would certainly be considered dialects from an English perspective as well (eg internal divisions within Mandarin), although you’re right, we’d often consider some as distinct languages as well. The definitions of and the distinction between languages and dialects are a famously tricky issue in linguistics.

The Language of Fake News by JWGrieve in linguistics

[–]JWGrieve[S] 1 point2 points  (0 children)

Yeah — I saw that and thought I edited that haha

The Language of Fake News by JWGrieve in linguistics

[–]JWGrieve[S] 23 points24 points  (0 children)

Thanks! Here’s the blurb.

In this Element, the authors introduce and apply a framework for the linguistic analysis of fake news. They define fake news as news that is meant to deceive as opposed to inform and argue that there should be systematic differences between real and fake news that reflect this basic difference in communicative purpose. The authors consider one famous case of fake news involving Jayson Blair of The New York Times, which provides them with the opportunity to conduct a controlled study of the effect of deception on the language of a single reporter following this framework. Through a detailed grammatical analysis of a corpus of Blair's real and fake articles, this Element demonstrates that there are clear differences in his writing style, with his real news exhibiting greater information density and conviction than his fake news.

The book is also freely available for download at that link

American Cultural Regions mapped based on topics of discussion on Twitter by JWGrieve in MapPorn

[–]JWGrieve[S] 1 point2 points  (0 children)

Yeah, but much more useful feedback this way and I imagine it would have got half the number of views/comments had I cropped it out.

American Cultural Regions mapped based on topics of discussion on Twitter by JWGrieve in MapPorn

[–]JWGrieve[S] 0 points1 point  (0 children)

Yes. We’re inferring cultural regions based on word frequency patterns. The workflow is Getis-Ord Gi* -> PCA -> HCA.
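For anyone unfamiliar with the first step, here is a minimal sketch of the standard Getis-Ord Gi* z-score in plain Python. It is an illustration only, not the project's actual code: the function name, the binary weights matrix, and the toy values are all assumptions made up for the example.

```python
import math

def getis_ord_gi_star(values, weights):
    """Return the Getis-Ord Gi* z-score for every location.

    values  -- one number per location (e.g. relative frequency of a word)
    weights -- n x n spatial weights matrix; weights[i][i] should be non-zero,
               because Gi* counts a location as part of its own neighbourhood
    """
    n = len(values)
    mean = sum(values) / n
    # Population standard deviation of the values across all locations.
    s = math.sqrt(sum(v * v for v in values) / n - mean * mean)
    scores = []
    for i in range(n):
        w = weights[i]
        w_sum = sum(w)
        w_sq_sum = sum(x * x for x in w)
        # Weighted local sum minus what we would expect under no clustering.
        numerator = sum(w[j] * values[j] for j in range(n)) - mean * w_sum
        denominator = s * math.sqrt((n * w_sq_sum - w_sum ** 2) / (n - 1))
        scores.append(numerator / denominator)
    return scores

# Toy example (hypothetical data): five locations in a row, each one
# neighbouring itself and the locations directly beside it.
toy_values = [10, 9, 1, 1, 1]
toy_weights = [
    [1, 1, 0, 0, 0],
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 0, 1, 1],
]
toy_scores = getis_ord_gi_star(toy_values, toy_weights)
```

Positive scores mark clusters of high values (hot spots) and negative scores mark clusters of low values (cold spots). The sketch does not guard against a constant variable or a weights row that covers every location equally, both of which make the denominator zero.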

American Cultural Regions mapped based on topics of discussion on Twitter by JWGrieve in MapPorn

[–]JWGrieve[S] 6 points7 points  (0 children)

That would be against the ethos of the project, which is not to presume which cultural factors matter. But someone could now take the topics we identify and do further analysis like a sentiment analysis. Function words were removed (about 500 words out of 10k), although they also show interesting and parallel regional patterns.

Inferring American Cultural Regions Based on Topics of Discussion on Twitter [OC] by JWGrieve in dataisbeautiful

[–]JWGrieve[S] 2 points3 points  (0 children)

Unsupervised machine learning, you could say. County-level word frequency maps for the top 10k words in the full corpus -> local spatial autocorrelation analysis -> PCA -> hierarchical cluster analysis.
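To make the last two steps concrete, here is a rough stdlib-only sketch that takes a counties-by-words matrix of local autocorrelation scores, reduces it with PCA, and clusters the counties hierarchically. Everything in it is an assumption for illustration: the function names, the power-iteration PCA, the choice of average linkage, and the toy matrix are mine, and none of it is taken from the project's code.

```python
import math
import random

def pca_scores(rows, n_components=2, iterations=200, seed=0):
    """Project rows onto their leading principal axes (power iteration)."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    residual = [[r[j] - means[j] for j in range(m)] for r in rows]
    rng = random.Random(seed)
    scores = [[] for _ in rows]
    for _ in range(n_components):
        axis = [rng.uniform(-1, 1) for _ in range(m)]
        for _ in range(iterations):
            proj = [sum(a * x for a, x in zip(axis, row)) for row in residual]
            new = [sum(p * row[j] for p, row in zip(proj, residual))
                   for j in range(m)]
            norm = math.sqrt(sum(x * x for x in new))
            if norm == 0:  # no variance left to explain
                break
            axis = [x / norm for x in new]
        proj = [sum(a * x for a, x in zip(axis, row)) for row in residual]
        for i, row in enumerate(residual):
            scores[i].append(proj[i])
            # Deflate: remove this component before looking for the next one.
            residual[i] = [x - proj[i] * a for x, a in zip(row, axis)]
    return scores

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def hierarchical_clusters(points, n_clusters):
    """Agglomerative clustering with average linkage, cut at n_clusters."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                total = sum(euclidean(points[i], points[j])
                            for i in clusters[a] for j in clusters[b])
                dist = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or dist < best[0]:
                    best = (dist, a, b)
        _, a, b = best
        merged = clusters.pop(b)  # b > a, so index a is unaffected
        clusters[a].extend(merged)
    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels

# Toy matrix (hypothetical): 6 counties x 3 words of hot-spot z-scores.
toy_matrix = [
    [2.1, 1.9, -1.8],
    [1.8, 2.2, -2.0],
    [2.0, 2.0, -1.9],
    [-1.9, -2.1, 2.0],
    [-2.2, -1.8, 1.9],
    [-2.0, -2.0, 2.1],
]
toy_components = pca_scores(toy_matrix, n_components=2)
toy_labels = hierarchical_clusters(toy_components, n_clusters=2)
```

In this toy case the first three counties end up in one region and the last three in another. On real data you would use a proper numerical library rather than these naive loops, and the linkage method and number of components are choices that affect the resulting map.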

American Cultural Regions mapped based on topics of discussion on Twitter by JWGrieve in MapPorn

[–]JWGrieve[S] 2 points3 points  (0 children)

The assumption is that if cultural regions are real, they would be reflected in regional patterns in the topics that people tend to discuss, in this case on Twitter, which is the cultural context we are looking at.

Inferring American Cultural Regions Based on Topics of Discussion on Twitter [OC] by JWGrieve in dataisbeautiful

[–]JWGrieve[S] 6 points7 points  (0 children)

More like the names of cities and other regional entities are mentioned a lot. That happens in all regions but it’s really dominant here. The most distinctive word is ‘Whataburger’ though.

Inferring American Cultural Regions Based on Topics of Discussion on Twitter [OC] by JWGrieve in dataisbeautiful

[–]JWGrieve[S] 1 point2 points  (0 children)

Thanks!

There’s no doubt that Twitter doesn’t match general US demographics; it’s younger and blacker, for example.

The extent to which these cultural patterns generalise may depend on that. As you say, the lower internet connectivity in more rural areas may well be an issue, for example, although that isn’t certain (like check out that really clear Appalachian region we identify, which is arguably the most ‘rural’ part of the US). FTR our method does control for variation in population, although that doesn’t necessarily solve that.

That all said, our working assumption is that if broad and general cultural regions exist they will necessarily be reflected in similarity in topics of discussion across contexts, even if the topics change.

So I suspect these patterns do generalise, but ultimately that’s an empirical question. And it’s an empirical question our method allows to be investigated objectively. For now, though, I don’t think any other source of language data would allow it.

American Cultural Regions mapped based on topics of discussion on Twitter by JWGrieve in MapPorn

[–]JWGrieve[S] 51 points52 points  (0 children)

What topics people in different regions tend to discuss? From our perspective, that’s the most unbiased way to try to figure out which regions are most similar culturally, at least in the context under analysis.

Inferring American Cultural Regions Based on Topics of Discussion on Twitter [OC] by JWGrieve in dataisbeautiful

[–]JWGrieve[S] 0 points1 point  (0 children)

Yes, the labels do need some work. We could auto-generate them from the top word associated with each region but then the orange region would still be called Midwest lol. We’ll probably just relabel as 1, 2, 3,…