Language Log
Taiwan(ese) Taiwanese
This has become a hot button issue in recent weeks.
Do we need such a term? What does it signify?
Is there any other kind of Taiwanese?
We have Australian English, British English, and American English; we have Canadian / Quebec French and Belgian French and Louisiana French (I love to hear it), and Swiss French…; Caribbean Spanish, Castilian Spanish, Andean Spanish, Rioplatense Spanish, Canarian Spanish, Central American Spanish, Andalusian Spanish, Mexican Spanish…; Taiwan Mandarin, PRC Modern Standard Mandarin (MSM), Sichuan Mandarin, Northeastern Mandarin….
What's the contrasting / distinguishing term for "Taiwan(ese) Taiwanese"?
Here's an article in Chinese in a Taiwan newspaper that argues that the name of the Minnan language on Formosa should be "rectified (zhèngmíng 正名)" as "Táiwān Táiyǔ 台灣台語" ("Taiwan[ese] Taiwanese"). Here's Chau Wu's reaction to the article:
Oy vey! The news network you cited belongs to the pro-China, pro-"Re-unification" United Daily News organization (Note: the PRC has never controlled Taiwan, and the latter has never been part of the former, so why call it "re-unification"?). Of course, their reporters will seek out opinions from the so-called scholars who would spit out such nonsense.
Please take a look at the following YouTube video on Ayo's YouTube channel, Tâi-lâm muē-á kà lí kóng Tâi-gí (A Tainan Girl Teaches You Taiwanese). She provides some cogent information regarding this controversial issue. She speaks in Taiwanese, but you can read the Mandarin subtitles.
EP0【台語的迷思】台語為什麼不叫閩南語?學台語的重要性是啥?|台南妹仔教你講台語 (EP0 [Myths about Taiwanese] Why isn't Taiwanese called Minnan? Why is it important to learn Taiwanese? | A Tainan Girl Teaches You Taiwanese)
There is a recent article on the BBC, "Tainan: The 400-year-old cradle of Taiwanese culture" (7/10/24), in which the writer mentions his interview with this YouTuber.
[VHM: This is a worthy article, covering many facets about the history and culture of Tainan. What the author, Will Buckingham, has to say about Ayo makes clear that she is a treasure for the preservation of Taiwanese language.]
Ayo summarizes it very nicely: Tai-gi is a proper noun that was developed during the Japanese era, and this term has been in customary use since then. Even the dictator Chiang Kai-shek used this term. The situation is no different from the American usage of "English" in this country. This term is a historic term, and it is a proper noun. Americans never give a thought to its nominal incongruity (a language named for the wrong country: compare Italian spoken in Italy, Icelandic in Iceland, Japanese in Japan, etc. But English in America?).
I think Chau put it very nicely, especially as he added in a subsequent note:
On another aspect: when I first saw the term 臺灣台語 (Taiwanese of Taiwan), I knew it was another example of artificial bureaucratese. My reaction: another "oy vey"! Is it so difficult to simply call it "Taiwanese" without the redundant appendage "of Taiwan"? In the UK, is English called "English of England"? Similarly, Japanese of Japan? Icelandic of Iceland?
Taiwan(ese) Taiwanese: enough already!
Selected readings
* "Mixed script writing in Taiwan" (5/24/24)
* "A crack in the hegemonic edifice of hanzi" (5/23/24)
* Taiwanese, Mandarin, and Taiwan's language situation
* Dozens of Language Log posts touching upon American English, British English, Australian English
[h.t. shaing tai]
➖ @EngSkills ➖
Slang of the Day | Vocabulary | EnglishClub
mosh pit
an area in front of the stage at a rock concert where people dance energetically, or "mosh"
Idiom of the Day
knock Anthony
(obsolete) To knock one's knees together while walking or running (i.e., be "knock-kneed").
Language Log
Reading Old Turkic runiform inscriptions with the aid of 3D simulation
"Augmenting parametric data synthesis with 3D simulation for OCR on Old Turkic runiform inscriptions: A case study of the Kül Tegin inscription", Mehmet Oğuz Derin and Erdem Uçar, Journal of Old Turkic Studies (7/21/24)
Abstract
Optical character recognition for historical scripts like Old Turkic runiform script poses significant challenges due to the need for abundant annotated data and varying writing styles, materials, and degradations. The paper proposes a novel data synthesis pipeline that augments parametric generation with 3D rendering to build realistic and diverse training data for Old Turkic runiform script grapheme classification. Our approach synthesizes distance field variations of graphemes, applies parametric randomization, and renders them in simulated 3D scenes with varying textures, lighting, and environments. We train a Vision Transformer model on the synthesized data and evaluate its performance on the Kül Tegin inscription photographs. Experimental results demonstrate the effectiveness of our approach, with the model achieving high accuracy without seeing any real-world data during training. We finally discuss avenues for future research. Our work provides a promising direction to overcome data scarcity in Old Turkic runiform script.
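The "distance field variations" and "parametric randomization" steps can be made more concrete with a minimal pure-Python sketch. It rests on some loudly stated assumptions:

* The glyph is a hypothetical two-stroke shape, not a real Old Turkic grapheme.
* The distance field is unsigned.
* A "variation" is simply the field thresholded at a randomized stroke radius, with per-pixel jitter standing in for weathering.

The paper's pipeline renders in full simulated 3D scenes, and this is not its implementation. All names here (`HYPOTHETICAL_GLYPH`, `render_variant`, and so on) are illustrative.

```python
import math
import random

# Hypothetical stand-in for a grapheme: straight strokes in a unit square,
# each given as ((x0, y0), (x1, y1)). NOT a real Old Turkic outline.
HYPOTHETICAL_GLYPH = [
    ((0.5, 0.1), (0.5, 0.9)),  # vertical stem
    ((0.5, 0.5), (0.8, 0.2)),  # diagonal arm
]


def dist_to_segment(p, a, b):
    """Euclidean distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        return math.hypot(px - ax, py - ay)
    # Project p onto the line through a and b, clamped to the segment.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def distance_field(glyph, size):
    """Unsigned distance from each pixel centre to the nearest stroke."""
    return [
        [
            min(dist_to_segment(((x + 0.5) / size, (y + 0.5) / size), a, b)
                for a, b in glyph)
            for x in range(size)
        ]
        for y in range(size)
    ]


def render_variant(field, stroke_width, erosion, rng):
    """Binarize the field at a jittered radius to get one glyph variant.

    stroke_width: nominal stroke thickness, in unit-square coordinates.
    erosion: maximum per-pixel jitter of the threshold (crude weathering).
    """
    half = stroke_width / 2
    return [
        [1 if d <= half + rng.uniform(-erosion, erosion) else 0 for d in row]
        for row in field
    ]


# Usage: one field, several randomized variants of the same glyph.
rng = random.Random(0)
field = distance_field(HYPOTHETICAL_GLYPH, size=32)
variants = [render_variant(field, rng.uniform(0.05, 0.15), 0.02, rng)
            for _ in range(5)]
```

Thresholding one precomputed field is a cheap way to get thicker, thinner, and ragged versions of the same glyph without redrawing it. Whether the authors vary their fields this way is not stated here.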
Aside from the Abstract, the lead author also shared with me the following summary paragraph:
For Old Turkic, the problem is that the inscriptions are deformed, among other things, by aging and environmental conditions, and there is not enough data correlating the various angles of a glyph with its value; as you know, data is the oil of AI. To tackle this problem, we developed a system that creates completely random strings and puts them on virtual inscriptions using photorealistic rendering techniques, and it turns out that this works wonders: we have been able to go beyond 80% accuracy on actual photographs without the AI ever seeing one. Although we have had success with this one, and generating images for training has been applied to paper materials and the like, I am also wondering whether it might help with other ancient inscriptions whose systematic nature is more or less known, but where a layer of surface complexity makes the data harder to annotate, and hence makes it harder to train with actual photographs or estampages.
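The "completely random strings" idea can be sketched briefly. The point is that when you generate the text yourself, the label comes for free. The following assumptions are all hypothetical:

* Graphemes are drawn uniformly from the Unicode Old Turkic block (U+10C00 to U+10C48), which may not match the inventory the authors used.
* `SceneParams` is an invented set of rendering knobs.
* The photorealistic 3D rendering step itself is out of scope. The sketch stops at producing (label, scene parameters) pairs that a renderer would consume.

```python
import random
from dataclasses import dataclass

# Assumption: sample uniformly from the Unicode Old Turkic block
# (U+10C00..U+10C48). The paper's grapheme inventory may differ.
OLD_TURKIC = [chr(cp) for cp in range(0x10C00, 0x10C49)]


@dataclass
class SceneParams:
    """Hypothetical knobs for rendering one virtual inscription."""
    rotation_deg: float     # in-plane tilt of the carved line
    light_elevation: float  # degrees above the stone surface
    texture_id: int         # index into some set of stone textures
    erosion: float          # 0 = crisp carving, 1 = heavily weathered


@dataclass
class Sample:
    text: str               # ground-truth label, known because we generated it
    scene: SceneParams


def random_sample(rng: random.Random, min_len: int = 3, max_len: int = 12,
                  n_textures: int = 10) -> Sample:
    length = rng.randint(min_len, max_len)
    # Completely random strings: no word-level context for a model to lean on.
    text = "".join(rng.choice(OLD_TURKIC) for _ in range(length))
    scene = SceneParams(
        rotation_deg=rng.uniform(-15.0, 15.0),
        light_elevation=rng.uniform(10.0, 80.0),
        texture_id=rng.randrange(n_textures),
        erosion=rng.random(),
    )
    return Sample(text, scene)


def make_dataset(n: int, seed: int = 0) -> list[Sample]:
    """Reproducible batch of labeled scene descriptions for a renderer."""
    rng = random.Random(seed)
    return [random_sample(rng) for _ in range(n)]
```

Seeding the generator makes a synthetic training set reproducible, which helps when comparing models trained on it.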
I am hoping that the techniques developed here for reading Old Turkic runiform script may also be adapted for use on other historical scripts.
Selected readings
* "Pugu, boga, beg" (8/11/20)
* "Tocharian, Turkic, and Old Sinitic 'ten thousand'" (4/23/19)
* "Northernmost runic finds in the world" (2/10/20)
* "Turkish written with Latin letters half a millennium ago" (8/29/16)
* "Unknown language #18" (6/3/24)
* "Unknown language #17" (5/2/24)
* "On the etymology of the title Tham of Burusho kings" (5/17/20)
Idiom of the Day
a knife in the back
A grievous or supreme act of treachery or betrayal. (Usually preceding "of/for (someone).")
Language Log
Government dampers on AI in the PRC, part 2
"China deploys censors to create socialist AI: Large language models are being tested by officials to ensure their systems ‘embody core socialist values’", by Ryan McMorrow and Tina Hu in Beijing, Financial Times (July 17 2024)
Chinese government officials are testing artificial intelligence companies’ large language models to ensure their systems “embody core socialist values”, in the latest expansion of the country’s censorship regime.
The Cyberspace Administration of China (CAC), a powerful internet overseer, has forced large tech companies and AI start-ups including ByteDance, Alibaba, Moonshot and 01.AI to take part in a mandatory government review of their AI models, according to multiple people involved in the process.
The effort involves batch-testing an LLM’s responses to a litany of questions, according to those with knowledge of the process, with many of them related to China’s political sensitivities and its President Xi Jinping.
The basic premises under which the testing is being carried out ensure that China's AI efforts will end in abject failure:
Two decades after introducing a “great firewall” to block foreign websites and other information deemed harmful by the ruling Communist party, China is putting in place the world’s toughest regulatory regime to govern AI and the content it generates.
The CAC has “a special team doing this, they came to our office and sat in our conference room to do the audit”, said an employee at a Hangzhou-based AI company, who asked not to be named.
“We didn’t pass the first time; the reason wasn’t very clear so we had to go and talk to our peers,” the person said. “It takes a bit of guessing and adjusting. We passed the second time but the whole process took months.”
So you fail but don't know why you failed, you pass but don't know why you passed. Par for the course with anything ideologically imbued in China. That leaves you guessing and eternally hesitant to do anything truly creative.
Self-censorship: that's the name of the game in the PRC.
The filtering begins with weeding out problematic information from training data and building a database of sensitive keywords. China’s operational guidance to AI companies published in February says AI groups need to collect thousands of sensitive keywords and questions that violate “core socialist values”, such as “inciting the subversion of state power” or “undermining national unity”. The sensitive keywords are supposed to be updated weekly.
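The keyword-database step described above can be roughly illustrated with a minimal sketch of case-insensitive substring filtering over a training corpus. Everything in it is hypothetical. The keyword list is a placeholder, and nothing here reflects how the CAC guidance or any company actually implements matching. Naive substring matching drops any document that merely mentions a listed term, and a "weekly update" amounts to swapping in a new list.

```python
def split_corpus(docs, keywords):
    """Partition docs into (kept, dropped) by case-insensitive keyword match.

    Any document containing any keyword as a substring is dropped.
    """
    needles = [k.lower() for k in keywords if k]
    kept, dropped = [], []
    for doc in docs:
        text = doc.lower()
        (dropped if any(n in text for n in needles) else kept).append(doc)
    return kept, dropped


# Placeholder keywords, purely illustrative.
KEYWORDS = ["placeholder-term-1", "placeholder-term-2"]
docs = ["An essay on rice farming.", "A post mentioning PLACEHOLDER-TERM-1."]
kept, dropped = split_corpus(docs, KEYWORDS)
```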
Users of PRC AI products spot their weaknesses immediately:
The result is visible to users of China’s AI chatbots. Queries around sensitive topics such as what happened on June 4 1989 — the date of the Tiananmen Square massacre — or whether Xi looks like Winnie the Pooh, an internet meme, are rejected by most Chinese chatbots. Baidu’s Ernie chatbot tells users to “try a different question” while Alibaba’s Tongyi Qianwen responds: “I have not yet learned how to answer this question. I will keep studying to better serve you.”
Nauseatingly useless.
It gets even worse when you start to look at the hyper-sensitive matter of the mind of Xi Jinping:
…Beijing has rolled out an AI chatbot based on a new model on the Chinese president’s political philosophy known as “Xi Jinping Thought on Socialism with Chinese Characteristics for a New Era”, as well as other official literature provided by the Cyberspace Administration of China.
Then it gets really funny when the authorities try to think of ways to make the system seem not entirely resistant to inquiries regarding political topics:
The CAC has introduced limits on the number of questions LLMs can decline during the safety tests, according to staff at groups that help tech companies navigate the process. The quasi-national standards unveiled in Fe[...]
Phrasal Verb of the Day | Vocabulary | EnglishClub
throw off
to get rid of something that has been bothering you
Word of the Day
coltish
Definition: (adjective) Lively and playful; frisky.
Synonyms: frolicky, frolicsome, rollicking, sportive.
Usage: The substitute teacher found himself entirely overwhelmed by the energetic seventh-graders, whose coltish antics disrupted the lesson time and time again.
Word of the Day
Word of the Day: credo
This word has appeared in 45 articles on NYTimes.com in the past year. Can you use it in a sentence?
Phrasal Verb of the Day | Vocabulary | EnglishClub
stand for (1)
If letters or symbols stand for something, they represent that thing.
Word of the Day
exceptionable
Definition: (adjective) Open or liable to objection or debate; debatable.
Synonyms: objectionable.
Usage: We can't have perfection; and if I keep him, I must sustain his administration as a whole, even if there are, now and then, things that are exceptionable.
Language Log
Topolect: a Four-Body Problem
From Jeff DeMarco:
The fanfic fourth book in the sāntǐ 三体 ("three-body [problem]") series, translated by Ken Liu, has the following sentence:
http://languagelog.ldc.upenn.edu/~bgzimmer/baoshu.jpg
Women dressed in flowing silk dresses oared elegant barges over the placid waterways, singing folk ditties in the gentle, refined accents of the Wu topolect …
fāngyán 方言 (lit., "place speech", i.e., "topolect; dialect")
Wú fāngyán 吳方言 ("Wu topolect")
Wu (traditional Chinese: 吳語; simplified Chinese: 吴语; Wu romanization and IPA: ngu ngei [ŋu²³³.ŋə̰i²¹⁴], wu6 gniu6 [ɦu˩˩˧.n̠ʲy˩˩˧] (Shanghainese), ghou2 gniu6 [ɦou˨˨˦.n̠ʲy˨˧˩] (Suzhounese), Mandarin Wúyǔ [u³⁵ y²¹⁴]) is a major group of Sinitic languages spoken primarily in Shanghai, Zhejiang Province, and the part of Jiangsu Province south of the Yangtze River, which makes up the cultural region of Wu. Speakers of various Wu languages sometimes labelled their mother tongue as Shanghainese when introduced to foreigners. The Suzhou dialect was the prestige dialect of Wu as of the 19th century, but had been replaced in status by Shanghainese by the turn of the 20th century. The languages of Northern Wu are mutually intelligible with each other, while those of Southern Wu are not.
(Wikipedia)
Selected readings
* "'The Three Body Problem' as rendered by Netflix: vinegar and dumplings" (3/23/24)
* "Ken Liu reinvents Chinese characters" (12/5/16) — translator of The Three Body Problem
* "Ted Chiang uninvents Chinese characters" (5/13/16)
* "Bringing back the Cultural Revolution — in English" (5/28/21)
* "Thought panzers" (2/24/2) — on "River Elegy"
* "The Three-Body Problem: The 'unfilmable' Chinese sci-fi novel set to be Netflix's new hit 3 Body Problem", BBC (3/19/24), by James Balmont
* "'Topolect' is in China!" (4/14/18)
* "'Topolect' is spreading in China" (6/20/19)
* "Tianjin topolect: linguistic diversity in China (and India)" (4/29/24)
* "Crosstalk about topolects" (12/16/19)
* "Concentric circles of language in Beijing, part 2" (6/13/20)
* "Dialectometry" (4/26/24)
* "Topolect writing" (11/23/14)
* "The American Heritage Dictionary of the English Language, 5th edition" (11/14/12) — q.v. "topolect"
* "Mutual intelligibility" (5/28/14): see the long list of posts linked at the bottom
* "What Is a Chinese “Dialect/Topolect”? Reflections on Some Key Sino-English Linguistic Terms," Sino-Platonic Papers, 29 (1991).
Slang of the Day | Vocabulary | EnglishClub
gross
disgusting, very unpleasant
Word of the Day
Word of the Day: futile
This word has appeared in 184 articles on NYTimes.com in the past year. Can you use it in a sentence?
Phrasal Verb of the Day | Vocabulary | EnglishClub
get along
If two people get along, they like each other and are friendly.
Word of the Day
emanation
Definition: (noun) Something that issues from a source.
Synonyms: emission.
Usage: The sulphuretted hydrogen emanations, which Captain Burton mentions, could be distinctly smelt.
Language Log
Deutsche Zungenbrecher
"Some German tongue-twisters", posted on 21/07/2024 by StephenJones.blog
Whereas the mind-boggling “tapeworm words” in my post on Some German mouthfuls are of a practical nature, the realm of fantasy opens up whole new linguistic vistas. In a stimulating article, Deborah Cole introduces the work of the Berlin-based cabaret performer, playwright, and pianist Bodo Wartke.
She begins with some drôle political context:
Annegret Kramp-Karrenbauer, a former defence minister with a dastardly difficult name to say, was long seen as a likely successor to the relatively pronounceable ex-chancellor, Angela Merkel. Kramp-Karrenbauer’s resignation as the conservatives’ party chief came as a relief to news presenters the world over, clearing the way for the tight three-syllabic Olaf Scholz. Sabine Leutheusser-Schnarrenberger, once a federal justice minister and the ultimate double-barrelled tongue-tripper, was not invited to join his cabinet.
Now Bodo Wartke and his musical partner Marti Fischer have gone viral with their rap-tinged Zungenbrecher (“tongue-breakers”)—notably “Barbaras Rhabarberbar” (recorded in 144 takes!), the story of a bar owner named Barbara who enchants all who try her rhubarb cake, including a group of bushy-bearded, beer-swilling barbarians who bring their barber back to try a bite….
The post includes the two-part video of “Barbaras Rhabarberbar”. En passant, I heard "barber shop" and "abracadabra" in it.
The related readings at the bottom include a link to an entertaining post on German compound nouns (Bandwurmwörter “tapeworm words”).
Selected readings
* "Long words" (6/25/18)
* "German lexicographic richness" (10/11/21)
* "The Germans have a word for it" (9/9/09)
* "Verschlimmbessert" (3/13/15)
* "Translating the untranslatable" (10/28/10)
* "TFW" (12/28/16)
* "Googlefreude, Googleschaden, Schadengoogle…" (1/2/07)
* "German wordcraziness rules" (12/18/22)
* "Schadenfreudeful" (4/20/19)
* "Herrgottsbescheisserle" (9/4/20)
* "Five words" (6/30/20) — this comment and several of the following comments, including this one where I introduce a word my Austrian father taught me when I was a little boy: Constantinopolitanischerdudellsackpfeiffenmachergesellschaft (Constantinople Bagpipe Manufacturing Company)
Phrasal Verb of the Day | Vocabulary | EnglishClub
keep in
to make someone stay in a place like a school or a hospital
Word of the Day
precede
Definition: (verb) Furnish with a preface or introduction.
Synonyms: preface, premise, introduce.
Usage: She always precedes her lectures with a joke.
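(Continued from "Government dampers on AI in the PRC, part 2")
The quasi-national standards unveiled in February say LLMs should not reject more than 5 per cent of the questions put to them.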
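The 5 per cent ceiling is simple arithmetic over a batch of test questions. The sketch below rests on a hypothetical assumption: that a refusal can be detected by matching the stock refusal phrases quoted in part 2 from Ernie and Tongyi Qianwen. Real graders presumably use something more robust, and nothing here reflects the CAC's actual procedure. Counts are compared as integers to avoid floating-point edge cases at exactly 5 per cent.

```python
# Hypothetical refusal detector: matches the stock replies quoted in the post.
REFUSAL_MARKERS = (
    "try a different question",
    "i have not yet learned how to answer this question",
)
MAX_REFUSAL_PERCENT = 5  # "not more than 5 per cent", per the FT report


def is_refusal(answer: str) -> bool:
    text = answer.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)


def passes_refusal_cap(answers: list[str]) -> bool:
    """True if at most MAX_REFUSAL_PERCENT of the answers are refusals."""
    if not answers:
        raise ValueError("need at least one answer")
    refusals = sum(is_refusal(a) for a in answers)
    # Integer form of: refusals / total <= 5 / 100.
    return 100 * refusals <= MAX_REFUSAL_PERCENT * len(answers)


# Example: 100 test questions, 5 stock refusals -> exactly at the cap.
batch = ["Here is an answer."] * 95 + ["Please try a different question."] * 5
print(passes_refusal_cap(batch))  # True
```

With a sixth refusal in the same batch of 100, the check fails. That is the pressure described above: a model can no longer simply decline everything sensitive.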
LOL! If, heaven forbid, I had to live in the PRC, I could defeat the system very easily: I would just keep asking difficult political questions, such as the treatment of Uyghurs and Tibetans and policies regarding languages other than Mandarin. But then the system would undoubtedly report ME for being obstreperous, and I would be brought in to drink tea.
The safest policy, one that has been adopted by some LLM companies, is just to reject all questions that touch upon Xi Jinping. Another is to ensure that their chatbots can only supply answers that have been certified as safe by government censors.
AI with socialist characteristics reminds me of mathematics with socialist characteristics, physics with socialist characteristics, chemistry with socialist characteristics, English literature studies with socialist characteristics… — all bound to fail miserably.
Selected readings
* "Government dampers on AI in the PRC" (7/16/24)
* "The perils of AI (Artificial Intelligence) in the PRC" (4/17/23) — with extended bibliography
[Thanks to Mark Metcalf]
Idiom of the Day
knick-knack
Any miscellaneous trinket or toy, especially one that is delicate or dainty.
Slang of the Day | Vocabulary | EnglishClub
hang | hang out
to spend time with
Idiom of the Day
a knee-slapper
A hilarious joke, especially one that evokes loud and prolonged laughter.
Language Log
New horizons in word sense analysis
Today's xkcd:
http://languagelog.ldc.upenn.edu/myl/organ_meanings_2x.png
Mouseover title: "IMO the thymus is one of the coolest organs and we should really use it in metaphors more."
Like all aspects of word meaning, such metaphors come and go. For example, batshit (in the metaphorical meaning "nonsense" or "crazy") came into use in the middle of the 20th century, presumably via a confluence of the older "bats in the belfry" phrase and the proliferation of other (and older) metaphorical "fecal compounds". And medicine has long since left the science of humorism behind, but we've inherited a metaphorical residue when we use phlegmatic to mean "calm, sluggish", or bilious to mean "irascible".
Recent applications of "deep learning" to the analysis of semantic change will open another chapter in the adventure that I described in my 2011 Henry Sweet Lecture, "Towards the Golden Age of Speech and Language Science":
For the sciences of speech and language, the 21st century promises to bring the kind of progress that the 17th century brought to the physical sciences.
Our telescopes and microscopes, our alembics and Pneumatical Engines, are today's vast archives of digital text and speech, along with new analysis techniques and inexpensive networked computation.
However, the scientific use of these new instruments remains mainly exploratory and potential. There are several critical problems for which we have at best partial solutions; and like our 17th-century predecessors, we need to unlearn some old ideas on the way to learning new ones.
Focusing especially on Henry Sweet's own interests in phonetics and in the history of English, this talk will discuss some of the barriers to be overcome, present some successful examples, and speculate about future directions.
Some recent papers (and code) on corpus-based semantic change analysis:
Dominick Schlechtweg et al., "SemEval-2020 task 1: Unsupervised lexical semantic change detection", 2020.
Sinan Kurtyigit et al., "Lexical Semantic Change Discovery", 2021.
Francesco Periti and Stefano Montanelli, "Lexical Semantic Change through Large Language Models: a Survey", 2024.
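For readers new to the area, a deliberately simple count-based stand-in can show the basic idea behind corpus-based semantic change detection. It is not the method of any of the papers above, which use far richer representations. The approach has two steps:

1. Collect the words that co-occur with a target word in an older corpus and in a newer one.
2. Measure how far apart the two context-count vectors are by cosine distance.

The toy corpora below are invented for illustration, borrowing bilious from the paragraph on humorism.

```python
import math
from collections import Counter


def context_vector(sentences, target, window=2):
    """Count words within `window` tokens of each occurrence of `target`."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, token in enumerate(tokens):
            if token != target:
                continue
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            counts.update(tokens[j] for j in range(lo, hi) if j != i)
    return counts


def cosine_distance(u, v):
    """1 - cosine similarity of two sparse count vectors; 1.0 if either is empty."""
    dot = sum(u[w] * v[w] for w in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def change_score(old_corpus, new_corpus, target, window=2):
    """Higher means the target's contexts differ more between the corpora."""
    return cosine_distance(context_vector(old_corpus, target, window),
                           context_vector(new_corpus, target, window))


# Invented toy corpora, purely for illustration.
OLD = ["an excess of bilious humour in the liver",
       "the bilious humour causes fever"]
NEW = ["a bilious editorial attacking the mayor",
       "his bilious temper at the meeting"]
```

On these toy sentences the two context sets share no words, so `change_score(OLD, NEW, "bilious")` is 1.0, the maximum. On real corpora, raw counts are dominated by frequent function words. That is one reason the literature moved to weighted vectors and then to contextual embeddings.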
Word of the Day
Word of the Day: imperceptibly
This word has appeared in 14 articles on NYTimes.com in the past year. Can you use it in a sentence?
Phrasal Verb of the Day | Vocabulary | EnglishClub
iron out
If you iron out the last details of a deal, you sort out the final problems or issues.