Gensim model with 984,676 word vectors trained on 10 million blog posts from 487,787 Swedish blogs (2007-2016), released under CC-BY 4.0

Courtesy of Triop, we’ve been able to train a large word2vec model on a sizeable amount of everyday Swedish, for the linguistically inclined to enjoy. It is published on Open Science Framework together with other open data and code from the Society-

Östmar, M Swedish Word Vectors from 10 millions blog posts, 2018-07-31

Can word2vec word embeddings be used to find similar high/low psychological state words?

Summary: Inspired by the book The Listening Society by Hanzi Freinacht, a quick’n’dirty experiment was run with an old word2vec model that was lying around unused. Starting from the example words for high and low psychological states listed in the book, could we get similar words from the model? It appears to work reasonably well for low-state words, but not for high-state words.

Östmar, M Can word2vec word embeddings be used to find similar high and low state words? 2018-03-05 to 2018-03-27
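
The lookup behind the experiment above can be sketched as a nearest-neighbour search by cosine similarity. This is a minimal illustration, not the code used in the experiment: the vectors, the words and the helper names `cosine` and `most_similar` are all invented here. On a real model, Gensim's `KeyedVectors.most_similar` performs a comparable ranking.

```python
import math

# Toy 3-dimensional vectors, invented for illustration only.
# A real word2vec model has hundreds of dimensions and learned values.
toy_vectors = {
    "sad":     [0.9, 0.1, 0.0],
    "gloomy":  [0.8, 0.2, 0.1],
    "anxious": [0.7, 0.3, 0.0],
    "joyful":  [0.0, 0.2, 0.9],
    "table":   [0.1, 0.9, 0.1],
}

def cosine(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def most_similar(seed_words, vectors, topn=3):
    """Rank non-seed words by cosine similarity to the mean seed vector."""
    dims = len(next(iter(vectors.values())))
    centroid = [
        sum(vectors[w][i] for w in seed_words) / len(seed_words)
        for i in range(dims)
    ]
    candidates = [
        (word, cosine(centroid, vec))
        for word, vec in vectors.items()
        if word not in seed_words
    ]
    return sorted(candidates, key=lambda pair: pair[1], reverse=True)[:topn]

# Seed with two "low state" example words and ask for the two nearest words.
neighbours = most_similar(["sad", "gloomy"], toy_vectors, topn=2)
```

With these toy vectors the nearest neighbour of the two seed words is "anxious". Whether the neighbours of real seed words are meaningful depends entirely on the trained model, which is what the experiment set out to check.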

Kan man förutsäga Myers-Briggs personlighetstyp från bloggtexter?

Summary: Can one predict Myers-Briggs personality type from blog texts? In a series of experiments we broke the MBTI types (e.g. INTJ, ESFJ) down into their primary Jungian cognitive functions (sensing, intuition, thinking and feeling), trained a naive Bayes classifier on uClassify and evaluated its accuracy together with some other statistics. In the end we achieved an accuracy of 0.58 for the perceiving dichotomy, but then we ran into some postmodern criticism that spoiled all the fun anyway.

Östmar, M, Huss, M Kan man förutsäga Myers-Briggs personlighetstyp från bloggtexter 2018-03-05 to 2018-03-27
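
To make the classification step above concrete, here is a minimal sketch of a multinomial naive Bayes text classifier for a single dichotomy. It is a local stand-in written for illustration, not the uClassify service used in the experiments. The training texts, the labels "S" and "N", and the function names are assumptions.

```python
import math
from collections import Counter, defaultdict

def train(samples):
    """samples: list of (text, label). Returns the counts the classifier needs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for text, label in samples:
        label_counts[label] += 1
        word_counts[label].update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocab

def predict(text, model):
    """Return the label with the highest log posterior for the text."""
    word_counts, label_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label in label_counts:
        # Log prior plus log likelihoods with add-one (Laplace) smoothing.
        score = math.log(label_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are skipped
                score += math.log(
                    (word_counts[label][word] + 1) / (total_words + len(vocab))
                )
        if score > best_score:
            best_label, best_score = label, score
    return best_label

def accuracy(samples, model):
    """Share of samples whose predicted label equals the true label."""
    hits = sum(predict(text, model) == label for text, label in samples)
    return hits / len(samples)

# Invented toy data for one dichotomy (sensing "S" vs intuition "N").
training = [
    ("today i cooked dinner and cleaned the kitchen", "S"),
    ("we bought new shoes and walked to the store", "S"),
    ("i wonder what the future of ideas might mean", "N"),
    ("imagine a theory that connects every pattern", "N"),
]
held_out = [
    ("i cleaned the store", "S"),
    ("a theory of the future", "N"),
]
model = train(training)
score = accuracy(held_out, model)
```

Accuracy here is the share of held-out texts that receive the correct label. For a two-way split with balanced classes, guessing at random would give about 0.5, which is the baseline a reported accuracy should be read against.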

Dynamics in Swedish Twitter Communities

Summary: A comparative study of the communities identified in tweets collected during 2015 and 2016: how stable the communities are, which communities are born or disappear, and how people move between them. A searchable web app was also created, making the data available under Creative Commons Zero.

Huss, M, Östmar, M Dynamics in Swedish Twitter communities 2017-02-27
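
One simple way to ask whether a community survives from one year to the next is to compare member sets with the Jaccard index. The sketch below is an illustration under stated assumptions, not the method of the study: the community names, the account names, the 0.3 threshold and the function `match_communities` are all hypothetical.

```python
def jaccard(a, b):
    """Overlap between two member sets: |a & b| / |a | b|."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def match_communities(earlier, later, threshold=0.3):
    """Pair each earlier community with its best-overlapping later one.

    Returns (survived, disappeared, born). The threshold is an arbitrary
    illustrative cut-off, not a value taken from the study.
    """
    survived, matched_later = {}, set()
    for name, members in earlier.items():
        best = max(
            later,
            key=lambda other: jaccard(members, later[other]),
            default=None,
        )
        if best is not None and jaccard(members, later[best]) >= threshold:
            survived[name] = best
            matched_later.add(best)
    disappeared = sorted(set(earlier) - set(survived))
    born = sorted(set(later) - matched_later)
    return survived, disappeared, born

# Hypothetical community memberships (account names invented).
communities_2015 = {
    "politics": {"anna", "bo", "cia", "dan"},
    "football": {"eva", "finn"},
}
communities_2016 = {
    "politics": {"anna", "bo", "cia", "gun"},
    "gaming": {"hugo", "ida", "eva"},
}
survived, disappeared, born = match_communities(communities_2015, communities_2016)
```

In this toy data "politics" keeps three of its four members and counts as surviving, "football" falls below the threshold and is reported as disappeared, and "gaming" is reported as born. Movement between communities could be read off the same member sets by checking where each account ends up.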

Quantified Culture – can word embeddings help understand differences in culture?

Summary: An attempt to use word vectors to analyse culture. The author has analysed word2vec word embeddings trained on English and Swedish Wikipedia corpora to examine whether there are particular areas of expression that are enriched or depleted in one language compared to the other.

Whitington, T Using word vectors to decipher Swedish culture 2017-01-01

Automatic clustering of politically meaningful groups on Twitter

Summary: We identify communities in the Swedish Twitterverse by analyzing a large network of millions of reciprocal mentions in a sample of 312,292,997 tweets from 435,792 Twitter accounts in 2015, and show that politically meaningful communities, among others, can be detected without having to read or search for specific words or phrases.

Östmar, M, Huss, M
The big picture of public discourse on Twitter by clustering metadata 2016-12-29
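
The idea of grouping accounts from metadata alone can be sketched as follows: keep only mention pairs that go in both directions, then group the accounts that are linked. The summary does not say which community detection algorithm was used, so plain connected components stand in for it here. On a real network that is far too coarse, since most accounts would fall into one giant component. All account names are invented.

```python
from collections import defaultdict

def reciprocal_edges(mentions):
    """Keep only pairs where both accounts have mentioned each other."""
    directed = set(mentions)
    return {
        tuple(sorted((src, dst)))
        for src, dst in directed
        if src != dst and (dst, src) in directed
    }

def connected_groups(edges):
    """Group accounts into connected components with a depth-first search."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    seen, groups = set(), []
    for start in sorted(graph):
        if start in seen:
            continue
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            group.add(node)
            stack.extend(graph[node] - seen)
        groups.append(group)
    return groups

# Invented (mentioning account, mentioned account) pairs.
mentions = [
    ("anna", "bo"), ("bo", "anna"),
    ("bo", "cia"), ("cia", "bo"),
    ("dan", "eva"), ("eva", "dan"),
    ("finn", "anna"),  # one-way mention, dropped
]
groups = connected_groups(reciprocal_edges(mentions))
```

No tweet text is read at any point. The grouping uses only who mentions whom, which is the property the summary highlights.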

The most grateful people on Swedish Twitter

Summary: We looked through 2,010,781 tweets for expressions of gratitude by matching any form of the word ”tack” (thanks), aggregated the matches by user and calculated a weighted gratitude score. This let us present a top list of the most grateful Twitter accounts. A clear example of our ambition to quantify the qualitative.

Östmar, M
Tacksammast på Twitter – hela listan. 2016-03-10
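
A gratitude score of this kind could be computed roughly as below. The summary does not specify the weighting, so the formula here is a hypothetical choice: the share of a user's tweets that contain thanks, multiplied by log(1 + number of tweets). The regular expression, the function name and the sample tweets are likewise assumptions.

```python
import math
import re
from collections import Counter

# Matches "tack" and longer forms such as "tackar" or "tacksam". This simple
# pattern is an assumption and would also match unrelated words like "tackla".
THANKS = re.compile(r"\btack\w*", re.IGNORECASE)

def gratitude_scores(tweets):
    """tweets: iterable of (user, text). Returns {user: score}.

    Hypothetical weighting: share of a user's tweets that contain thanks,
    multiplied by log(1 + number of tweets) so that one-tweet accounts do
    not dominate the list.
    """
    total, thankful = Counter(), Counter()
    for user, text in tweets:
        total[user] += 1
        if THANKS.search(text):
            thankful[user] += 1
    return {
        user: (thankful[user] / total[user]) * math.log1p(total[user])
        for user in total
    }

# Invented sample tweets.
tweets = [
    ("anna", "Tack för hjälpen!"),
    ("anna", "Tusen tack allihop"),
    ("anna", "Nu blir det kaffe"),
    ("bo", "Tackar!"),
    ("cia", "God morgon"),
]
scores = gratitude_scores(tweets)
top_list = sorted(scores, key=scores.get, reverse=True)
```

The choice of weighting matters: ranking by raw count favours very active accounts, while ranking by share alone favours accounts with a single thankful tweet.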

In progress:

How ideas spread – a theory I long to test

Do you have research ideas you would like help with?
Contact kansli(a)