"At the peak of the Syria civil war, tens of thousands of Syrians were leaving their home country and seeking refuge in European countries every day. While some countries welcomed the refugees, some didn’t want them to pass or stay. The Syrian refugees tried various routes to avoid being blocked and get to their destination countries safely. Unfortunately, some routes were very dangerous and as a result some refugees, including young children, got injured or killed when they attempted to pass these routes. Some refugees were helping each other by sharing their experience and knowledge of the routes after they had passed them. They usually did that through exchanging online social media messages anonymously using their smartphones because the messages could appear on the social media site instantly and could keep the information up to date. However, as they didn’t want to be identified in case their whereabouts were found by the Syrian authorities or people smugglers, they tended to use a made-up name when they sent social media messages.
"For many years, social media messages have been analysed for discovering useful information for commercial and other uses. This led me to think whether the data analysis techniques could be used to analyse the messages of refugees such that the routes they were taking and the difficulties they were encountering could be discovered. This information would help the authorities of the countries where the refugees were passing or trying to reach to prepare themselves for the sudden influx of people. To test this hypothesis, I led an internally funded pilot project with academics from the OU and University of Bedfordshire to investigate the feasibility of predicting the migration patterns of refugees. The team had expertise in data analysis, human behaviour analysis, policy and Arab culture and language. We employed a consultant to study the relevant literature, collect and analyse the social media messages, which were mainly from Twitter as the messages were publicly available. We investigated the use of various key words for finding relevant messages. After filtering off many irrelevant messages, over 5000 tweets related to Syrian refugees were collected and these messages formed the basis for this pilot study.
"Through the study we find that the refugees are very cautious about revealing their identities and this leads to a difficulty in authenticating the refugees’ messages. Furthermore, there were lots of messages from people who had an interest or opinion of the Syria crisis and these messages were mixed together with the refugees' message and were hard to be filtered off. We therefore decided to revise the goal of this pilot project to categorise the messages into three classes: messages from Syrians who are considering leaving, messages from Syrians who are migrating and messages from the people who have an interest or opinion in the Syria crisis. Using a machine learning technique, we were able to automatically categorise the collected messages in the three classes with an accuracy of up to 62%. We are confident this classification rate can be further improved when more experiments are conducted. The finding of this study is being written and will be submitted to a machine learning conference in June.
"The next goal of this project is to extend the technique to identify the refugees’ locations and to study messages from other social media sites."
Written by Dr Patrick Wong, Lecturer of Intelligent Computer Systems