Dear parents, please note: thanks to artificial intelligence (AI), you may not need to ask teachers how your child is doing at school. His or her tweets may be enough to gauge whether he or she will make it big in the future.
A team of Russian researchers has used AI-based models to distinguish high academic achievers from low ones based on their social media posts. The prediction model relies on mathematical text analysis that registers users' vocabulary (its range and the semantic fields its concepts come from), characters and symbols, post length and word length. Every word has its own rating (a kind of IQ).
Scientific and cultural topics, English words, and longer words and posts rank highly and serve as indicators of good academic performance. An abundance of emojis, words or whole phrases written in capital letters, and vocabulary related to horoscopes, driving and military service indicate lower grades in school.
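The article lists the surface signals the model registers but not how they are computed. The sketch below, an assumption rather than the study's code, counts a few of them for a single post: post length, average word length, words written in capitals, emojis, and Latin-script words as a rough proxy for English vocabulary in Russian-language posts. The helper name `surface_features` and the Unicode ranges used to spot emojis are hypothetical; according to the article, the study itself relied on learned word vectors rather than hand-written rules like these.

```python
import re

# Rough emoji detector covering the main emoji blocks only (an assumption, not exhaustive).
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")
# Words made of letters only (any script); digits, underscores and emojis are excluded.
WORD_RE = re.compile(r"[^\W\d_]+")
# Words written entirely in Latin letters.
LATIN_RE = re.compile(r"[A-Za-z]+")


def surface_features(post: str) -> dict:
    """Hypothetical helper: simple length, casing, emoji and script features of one post."""
    words = WORD_RE.findall(post)
    n_words = len(words)
    return {
        "post_length": len(post),
        "n_words": n_words,
        "mean_word_length": sum(len(w) for w in words) / n_words if n_words else 0.0,
        # Words of two or more letters written entirely in capitals, e.g. "WOW".
        "caps_words": sum(1 for w in words if len(w) > 1 and w.isupper()),
        "emojis": len(EMOJI_RE.findall(post)),
        # Latin-script words as a crude stand-in for English vocabulary.
        "latin_words": sum(1 for w in words if LATIN_RE.fullmatch(w)),
    }
```

For example, `surface_features("ЗАВТРА ЕДУ на машине!!! 🚗🚗")` counts two all-caps words and two emojis, while an English-language post about physics scores high on Latin-script words. In a real system such counts would be only a small part of the signal.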
“At the same time, posts can be quite short: even tweets are quite informative,” said Ivan Smirnov, leading research fellow at the Institute of Education at HSE University (Higher School of Economics) in Moscow.
The study traces the career paths of 4,400 students in 42 Russian regions. “Since this kind of data, in combination with digital traces, is difficult to obtain, it is almost never used,” Smirnov said. Such a dataset makes it possible to develop a reliable model that can be applied in other settings.
“And the results can be extrapolated to all other students: high school students and middle school students,” Smirnov said. The paper was published in the journal EPJ Data Science.
The researchers noted that the model worked successfully on data from different social media sites, such as VK (a Russian online social networking service) and Twitter, showing that it can be effective in different contexts. In addition, the model can be used to predict very different characteristics, from academic performance to income or depression.
The study data included information about the students' VK accounts (3,483 students consented to provide it). Word vector representations were first learned, using unsupervised machine learning, from a corpus of VK posts totalling 1.9 billion words, 2.5 million of them unique.
These word vectors were then combined with a simpler supervised machine learning model that was trained on individual posts to predict PISA (Programme for International Student Assessment) scores. Posts from publicly viewable VK pages served as the training sample: a total of 130,575 posts from 2,468 subjects who took the PISA test. The test allows researchers to assess a student's academic aptitude as well as their ability to apply knowledge in practice, the authors wrote.
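The article does not spell out how the word vectors and the "simpler supervised model" fit together. One plausible setup, assumed here and not confirmed by the article, is to represent each post as the average of its word vectors, fit a ridge regression from those averages to the author's PISA score, and then rate each word by applying the same model to that word's vector alone. With a linear model, a post's predicted score is then exactly the average of its words' ratings, which is one way every word can carry its own rating. The function names, the ridge penalty, the whitespace tokenizer, the two-dimensional hand-made "embeddings" and the scores are all hypothetical; real vectors would be learned from a large corpus such as the VK posts.

```python
import numpy as np


def post_vector(post, word_vectors):
    """Average the vectors of the words in a post that have a known embedding."""
    vecs = [word_vectors[w] for w in post.lower().split() if w in word_vectors]
    dim = len(next(iter(word_vectors.values())))
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)


def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    b = y_mean - x_mean @ w
    return w, b


def word_ratings(word_vectors, w, b):
    """Rate each word by applying the post-level model to its vector alone."""
    return {word: float(vec @ w + b) for word, vec in word_vectors.items()}


# Toy, made-up data for illustration only: 2-d "embeddings" and invented scores.
word_vectors = {
    "science": np.array([1.0, 0.0]),
    "physics": np.array([0.9, 0.1]),
    "horoscope": np.array([-1.0, 0.0]),
    "army": np.array([-0.9, -0.1]),
}
posts = ["science physics", "physics", "horoscope army", "horoscope"]
scores = np.array([550.0, 540.0, 420.0, 410.0])

X = np.stack([post_vector(p, word_vectors) for p in posts])
w, b = fit_ridge(X, scores, alpha=1.0)
ratings = word_ratings(word_vectors, w, b)
```

On this toy data "science" ends up rated above "horoscope". Averaging keeps the model simple and lets short posts, even single tweets, be scored, at the cost of ignoring word order.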