Three words that mislead regional sentiment analysis
Contributed by Liu Lijun, Yanshan University
TRANSCRIPT
This is Scientific American's 60-second Science. I'm Karen Hopkin.
You can tell a lot about people's general state of mind based on their social media feeds. Are they always tweeting about their biggest peeves or posting pics of particularly cute kitties? Well, in a similar fashion, researchers are turning to Twitter for clues about the overall happiness of entire geographic communities. What they're finding is that regional variation in the use of common phrases produces predictions that don't always reflect the local state of well-being. But removing from their analyses just three specific terms - good, love and LOL - greatly improves the accuracy of the methods.
"We're living in a crazy COVID-19 era. And now more than ever, we're using social media to adapt to a new normal and reach out to the friends and family that we can't meet face-to-face."
Kokil Jaidka studies computational linguistics at the National University of Singapore.
"But our words aren't useful just to understand what we, as individuals, think and feel. They're also useful clues about the community we live in."
One of the simpler methods that many scientists use to parse the data involves correlating words with positive or negative emotions. But when those tallies are compared with phone surveys that assess regional well-being, Jaidka says, they don't paint an accurate picture of the local zeitgeist.
To find out why, Jaidka and her colleague Johannes Eichstaedt of Stanford University analyzed billions of tweets from around the United States. And they found that among the most frequently used terms on Twitter are LOL, love and good.
"And they actually throw the analysis off. In fact, when we removed these three words alone, we managed to improve upon the simpler word-counting methods - and obtain better, if not perfect, estimates of happiness."
Why the disconnect? Well, Jaidka says one issue is...
"Internet language is really a different beast than regular spoken language. We've adapted words from the English vocabulary to mean different things in different situations."
Take, for example, LOL.
"I've tweeted the word LOL to flirt, express irony, annoyance and sometimes just pure surprise. When the methods for measuring LOL as a marker of happiness were created in the 1990s, it still meant laughing out loud."
There are plenty of terms that are less misleading, says Eichstaedt.
"Our models tell us that words like excited, fun, great, opportunity, interesting, fantastic and those are better words for measuring subjective well-being, just looking at the data."
Their work appears in the Proceedings of the National Academy of Sciences.
Being able to get an accurate read on the mood of the population is no laughing matter.
"That's particularly important now, in the time of COVID, where we're expecting a mental health crisis - and we're already seeing in survey data the largest diminishment in subjective well-being in 10 years at least, if not ever."
No doubt we could all use more fantastic opportunities for great fun and excitement - give or take the LOL.
Thanks for listening for Scientific American's 60-second Science. I'm Karen Hopkin.
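The word-counting approach described in the transcript, and the effect of dropping good, love and LOL, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the tiny lexicon, the sample tweets and the function names are invented, and this is not the researchers' actual model or data. It shows only the scoring step; the study itself compared such scores with phone-survey measures of regional well-being.

```python
import re
from collections import Counter

# Hypothetical toy lexicon, invented for illustration only.
POSITIVE = {"good", "love", "lol", "excited", "fun", "great", "fantastic"}
NEGATIVE = {"sad", "angry", "awful", "annoyed"}

# The three terms the transcript says were removed from the analysis.
MISLEADING = frozenset({"good", "love", "lol"})


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def sentiment_score(tweets, exclude=frozenset()):
    """Return (positive - negative) word count per counted token.

    Tokens listed in `exclude` are skipped entirely.
    """
    counts = Counter()
    for tweet in tweets:
        for token in tokenize(tweet):
            if token in exclude:
                continue
            counts["total"] += 1
            if token in POSITIVE:
                counts["pos"] += 1
            elif token in NEGATIVE:
                counts["neg"] += 1
    if counts["total"] == 0:
        return 0.0
    return (counts["pos"] - counts["neg"]) / counts["total"]


# Made-up example tweets standing in for one region's feed.
tweets = [
    "lol my train is late again, so annoyed",
    "good grief, this weather is awful lol",
    "excited for the weekend, should be fun",
]

naive = sentiment_score(tweets)
filtered = sentiment_score(tweets, exclude=MISLEADING)
print(f"naive: {naive:.3f}  filtered: {filtered:.3f}")
```

In this toy sample, the ironic uses of "lol" and "good" push the naive score upward; excluding the three terms leaves a score driven by the less ambiguous words.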
VOCABULARY
1. peeve n. something that annoys; a grievance
2. kitty n. an informal way of referring to a cat or kitten
3. parse v. to divide a sentence into parts and describe the grammar of each word or part; to analyze grammatically or syntactically
4. tally n. a record of the number or amount of sth., especially one that you can keep adding to; a running count or score. For example: He hopes to improve on his tally of three goals in the past nine games.
5. zeitgeist n. (from German, formal) the general mood or quality of a particular period of history, as shown by the ideas, beliefs, etc. common at the time; the spirit of the age
6. diminish v. to become or to make sth. become smaller, weaker, etc.; to reduce or lessen
QUESTIONS
Read the passage. Then listen to the news and fill in the blanks with the information (words, phrases or sentences) you hear.
This is Scientific American's 60-second Science. I'm Karen Hopkin.
You can tell a lot about people's general state of mind based on their social media feeds. Are they always tweeting about their biggest peeves or posting pics of particularly cute kitties? Well, in a similar fashion, researchers are turning to Twitter for clues about the overall (Q1) ______________ of entire geographic communities. What they're finding is that regional variation in the use of (Q2) _______________________________ produces predictions that don't always reflect the local state of well-being. But removing from their analyses just three specific terms - good, love and LOL - greatly improves the (Q3) _______________________ of the methods.
"We're living in a crazy COVID-19 era. And now more than ever, we're using social media to adapt to a new normal and reach out to the friends and family that we can't meet face-to-face."
Kokil Jaidka studies (Q4) __________________________ at the National University of Singapore.
"But our words aren't useful just to understand what we, as individuals, think and feel. They're also useful clues about the community we live in."
One of the simpler methods that many scientists use to parse the data involves correlating words with (Q5) ______________________________. But when those tallies are compared with phone surveys that assess regional well-being, Jaidka says, they don't paint an accurate picture of the local zeitgeist.
To find out why, Jaidka and her colleague Johannes Eichstaedt of Stanford University analyzed billions of tweets from around the United States. And they found that among the most frequently used terms on Twitter are LOL, love and good.
"And they actually throw the analysis off. In fact, when we removed these three words alone, we managed to improve upon the simpler word-counting methods - and obtain better, if not perfect, estimates of happiness."
Why the (Q6) ____________? Well, Jaidka says one issue is...
"Internet language is really a different beast than regular spoken language. We've adapted words from the English vocabulary to mean different things in different situations."
Take, for example, LOL.
"I've tweeted the word LOL to flirt, express irony, (Q7) _____________________ and sometimes just pure surprise. When the methods for measuring LOL as a marker of happiness were created in the 1990s, it still meant laughing out loud."
There are plenty of terms that are less (Q8) ______________________, says Eichstaedt.
"Our models tell us that words like excited, fun, great, opportunity, interesting, fantastic and those are better words for measuring subjective well-being, just looking at the data."
Their work appears in the Proceedings of the National Academy of Sciences.
Being able to get an accurate read on the mood of the population is no laughing matter.
"That's particularly important now, in the time of COVID, where we're expecting a (Q9) __________ crisis - and we're already seeing in survey data the largest diminishment in subjective well-being in 10 years at least, if not ever."
No doubt we could all use more fantastic (Q10) _______________________ for great fun and excitement - give or take the LOL.
Thanks for listening for Scientific American's 60-second Science. I'm Karen Hopkin.
KEY
This is Scientific American's 60-second Science. I'm Karen Hopkin.
You can tell a lot about people's general state of mind based on their social media feeds. Are they always tweeting about their biggest peeves or posting pics of particularly cute kitties? Well, in a similar fashion, researchers are turning to Twitter for clues about the overall (Q1) happiness of entire geographic communities. What they're finding is that regional variation in the use of (Q2) common phrases produces predictions that don't always reflect the local state of well-being. But removing from their analyses just three specific terms - good, love and LOL - greatly improves the (Q3) accuracy of the methods.
"We're living in a crazy COVID-19 era. And now more than ever, we're using social media to adapt to a new normal and reach out to the friends and family that we can't meet face-to-face."
Kokil Jaidka studies (Q4) computational linguistics at the National University of Singapore.
"But our words aren't useful just to understand what we, as individuals, think and feel. They're also useful clues about the community we live in."
One of the simpler methods that many scientists use to parse the data involves correlating words with (Q5) positive or negative emotions. But when those tallies are compared with phone surveys that assess regional well-being, Jaidka says, they don't paint an accurate picture of the local zeitgeist.
To find out why, Jaidka and her colleague Johannes Eichstaedt of Stanford University analyzed billions of tweets from around the United States. And they found that among the most frequently used terms on Twitter are LOL, love and good.
"And they actually throw the analysis off. In fact, when we removed these three words alone, we managed to improve upon the simpler word-counting methods - and obtain better, if not perfect, estimates of happiness."
Why the (Q6) disconnect? Well, Jaidka says one issue is...
"Internet language is really a different beast than regular spoken language. We've adapted words from the English vocabulary to mean different things in different situations."
Take, for example, LOL.
"I've tweeted the word LOL to flirt, express irony, (Q7) annoyance and sometimes just pure surprise. When the methods for measuring LOL as a marker of happiness were created in the 1990s, it still meant laughing out loud."
There are plenty of terms that are less (Q8) misleading, says Eichstaedt.
"Our models tell us that words like excited, fun, great, opportunity, interesting, fantastic and those are better words for measuring subjective well-being, just looking at the data."
Their work appears in the Proceedings of the National Academy of Sciences.
Being able to get an accurate read on the mood of the population is no laughing matter.
"That's particularly important now, in the time of COVID, where we're expecting a (Q9) mental health crisis - and we're already seeing in survey data the largest diminishment in subjective well-being in 10 years at least, if not ever."
No doubt we could all use more fantastic (Q10) opportunities for great fun and excitement - give or take the LOL.
Thanks for listening for Scientific American's 60-second Science. I'm Karen Hopkin.