Stefan Dahlberg, associate professor at DIGSSCORE, has received a grant for a project that studies opinions and societies using online data. As part of the project, the Norwegian Citizen Panel will be used to study open-ended text answers in surveys.
Online text data is being produced at an accelerating pace, as people express their opinions, beliefs and attitudes on traditional websites, blogs and forums, in comment fields, and through social media. The availability of massive amounts of online text data has generated expectations and enthusiasm within the social sciences for new approaches and analytical pathways based on people's unsolicited communicative behavior.
However, vast amounts of text data are not in themselves enough to answer societal questions, and problems of selection and representativeness cannot be solved merely by adding more data. So far, great effort has been put into developing analytical tools, while less attention has been paid to the data itself. For whom is data collected from the Internet representative? To what extent is it comparable across countries, languages and demographic groups? Questions like these, concerning data availability and data validity, have been largely ignored in the recent optimistic Big Data climate.
This project will bring together scholars from computer science and the social sciences to answer these questions, and to validate the use of online text data as a complement to traditional surveys and polls. They will do this by first answering the question "What text data is actually available on the Internet?", and then collecting text data through traditional survey experiments in order to answer the question "What does representative data really look like?". The Norwegian Citizen Panel will be one of the arenas for collecting this text data. Text data will be collected from representative samples in 20 countries.
In the final phase, the project will develop computational methods to answer questions such as "Is this text data relevant for my purposes?", "Is this text data reliable?", and "Is this text data representative of a population?". The project's results will provide important tools for using online text data as a complement to traditional surveys and measures of human behavior, attitudes and opinions.
This project is related to the project "Language Effects in Surveys" at the University of Gothenburg.