The purpose of data scraping and visualisation is to collect, visualise and interpret user-generated content on a chosen online network, with a focus on particular key terms. In this case, Instagram Hashtag Explorer has been used to scrape and collect big data. Gephi has then been used to organise, visualise and later analyse this data.
Instagram Hashtag Explorer was chosen to gather big data based on the key terms previously used in this research. This scraper was chosen because the aim is to focus on the frequency and intensity with which consumers include these terms in their online activity. The advantage of this choice is that it enables us to draw correlations and produce relevant visual graphs, which add content and value to the research overall. However, the limitation of scraping data for only one hashtag at a time did not allow us to search for particular connections. Nonetheless, this disadvantage may be seen as evidence of an unbiased and unaltered collection of content, as the researchers could not manipulate the focus of the results, which kept it at a general level. Also, not being able to search by both hashtag and location has kept the research at a global level.
The selection of key hashtags resulted in the following list: #airbnb, #blablacar, #couchsurfing, #uber, #collaborativeconsumption, #shareconomy, #peertopeer. The first four hashtags represent the four companies/platforms previously mentioned in this research. They refer to two key industry fields: accommodation and transport. The purpose of scraping data oriented towards particular companies/platforms is to determine in which context and with what connections these terms are adopted and used by consumers. The last three hashtags refer to the generic names of the phenomenon and relate to the trend in broader terms. This data has been used for a broader overview of the global context of the trend. Other previously used key terms have not been included because they are either too broad or too vague if used alone, e.g. ‘disruption’, ‘customers’, ‘enabling the optimization of resources’.
The scraper collects data that is mainly relevant to the first sub-question, namely: ‘What are the motives that drive customers to use shareconomy businesses?’. In particular, the first four hashtags reveal in what context customers use these mentions, which leads to an understanding of their motives and customer profiles. The data resulting from scraping the activity on the last three terms adds to this by showing how customers understand what the concepts stand for, giving insight into their use in daily customer life. The connections that arise also show how users see value in these topics. Although at first glance the results help to answer the first sub-question, these insights are also highly relevant for providing the industry with recommendations on the opportunities and threats of the current trend, which relates to the second sub-question: ‘What are the opportunities and threats (consequences) of shareconomy?’.
Collecting the Data
Since the data scraping was conducted at a global level, the number of iterations has been kept to 10. This is due to practical considerations, namely collecting data that is manageable in size. Although the number of iterations has been kept relatively low, the search queries still resulted in data sets of thousands of nodes and edges. Even so, the volume was fairly large, and the data visualisation part explains how the most relevant connections and terms have been traced and prioritised with a view to interpreting them. The scraped data traces the Instagram activity related to the respective hashtag over the course of one day.
Apart from the regular disconnections to which this tool is prone, the data scraping process has been conducted without any other problems. However, as mentioned above, one of the limitations of this tool, and therefore one of its potential improvements, is the lack of a simultaneous search on multiple hashtags. In addition, enabling a combined search based on key words and location would lead to better classified data and therefore to the elimination of residual information that might not be of use to the researcher. Overall, the data scraping could be conducted under relatively good conditions.
Visualising the Data
Since data was scraped for seven hashtags, the visualisation is based on seven separate data sets produced by the Instagram Hashtag Explorer.
The GDF files have been processed one at a time, with consistent data visualisation settings. Having only one graph would not be sufficient for the purpose of this research, since its global approach focuses on various platforms and aims at answering two sub-questions. Therefore, it has been concluded that the collection of big data, in this case, has to be rich enough to add real value; otherwise it offers no more than a glimpse. Hence, seven different graphs are discussed in the following sections and, based on the visualisation and interpretation of their data, the report will conclude with recommendations.
The raw data has been laid out with the Force Atlas 2 layout, with overlap prevention enabled. In addition, the graphs have been structured by the node attribute count and the edge attribute weight. Nodes were structured by count because it matters for this research to display the hashtags most frequently mentioned in the posts that contain the key hashtag. This enables us to draw conclusions on how the main hashtag is used, giving insight into users’ preferences and the context of the trend. In addition, all graphs have been coloured so that the main node, the key hashtag, is the largest and most intensely coloured, while the other most frequent nodes have gradually lighter colours. The weight of an edge is portrayed by the thickness of the line that connects the respective nodes: the more strongly two nodes (hashtags) are connected, the thicker and more strongly coloured the line is. It should be noted that the attached graphs do not display all of the data collected on the respective hashtag, but only the most used terms. This way each graph focuses on the key aspects that are worth discussing. The threshold floor for including a node is 10: no graph shows a node with fewer than ten connections, only those with ten or more.
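The filtering step described above can be sketched in a few lines of Python. This is only an illustration, not the procedure actually carried out in Gephi: the column names (name, label, count, node1, node2, weight) are assumed to follow the usual GDF layout, and the hashtags and numbers in SAMPLE_GDF are invented placeholders rather than scraped results.

```python
import csv
import io

# Hypothetical GDF excerpt: names and numbers are invented for illustration only.
SAMPLE_GDF = """nodedef>name VARCHAR,label VARCHAR,count INT
keytag,keytag,50
tag_a,tag_a,20
tag_b,tag_b,10
tag_c,tag_c,3
edgedef>node1 VARCHAR,node2 VARCHAR,weight INT
keytag,tag_a,20
keytag,tag_b,10
keytag,tag_c,3
tag_a,tag_c,2
"""


def _columns(header_row, prefix):
    """Turn a header such as 'nodedef>name VARCHAR' into plain column names."""
    cells = [header_row[0][len(prefix):]] + header_row[1:]
    return [cell.strip().split(" ")[0] for cell in cells]


def parse_gdf(text):
    """Return (nodes, edges): nodes maps name -> count, edges is a list of (a, b, weight)."""
    nodes, edges = {}, []
    section, cols = None, []
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue
        if row[0].startswith("nodedef>"):
            section, cols = "nodes", _columns(row, "nodedef>")
        elif row[0].startswith("edgedef>"):
            section, cols = "edges", _columns(row, "edgedef>")
        elif section == "nodes":
            record = dict(zip(cols, row))
            nodes[record["name"]] = int(record["count"])
        elif section == "edges":
            record = dict(zip(cols, row))
            edges.append((record["node1"], record["node2"], int(record["weight"])))
    return nodes, edges


def filter_by_count(nodes, edges, floor=10):
    """Keep nodes whose count is at least `floor`, and the edges between kept nodes."""
    kept = {name: count for name, count in nodes.items() if count >= floor}
    kept_edges = [(a, b, w) for a, b, w in edges if a in kept and b in kept]
    return kept, kept_edges


nodes, edges = parse_gdf(SAMPLE_GDF)
top_nodes, top_edges = filter_by_count(nodes, edges, floor=10)

# List the surviving hashtags from most to least frequent.
for name, count in sorted(top_nodes.items(), key=lambda item: item[1], reverse=True):
    print(name, count)
```

In this sketch a node with a count of exactly 10 is kept, matching the floor of ten described above, and an edge is dropped as soon as either of its end nodes falls below the floor.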
Interpreting the Data
When it comes to #airbnb, the top connection is with ‘#travel’ (see Graph 1). Within 189 posts, ‘#travel’ occupies second place after the key hashtag, with 62 mentions, followed by ‘#vacation’, ‘#trip’ and ‘#wanderlust’, with 28, 22 and 19 mentions respectively. Airbnb is therefore mentioned in the context of leisure activities rather than business-related purposes. Most of the key connections of #airbnb convey a sense of adventure through the related terms, which gives an insight into the profile of the customers who opt for Airbnb. Since the customers might be considered to fit a particular persona, we can draw conclusions on strong motives that drive customers towards collaborative consumption: a need to try something new, a wish for a change from traditional means of accommodation and, not least, a desire to travel for leisure. This aspect of the profile and motives of shareconomy customers is relevant when it comes to issuing recommendations to the traditional industry.
Graph 2 is based on 191 posts related to #blablacar. In this case as well, the runner-up for most mentions, again ‘#travel’, comes far behind the key hashtag, appearing in about a third of the posts (65). This graph is, however, richer in nodes overall than the previous one, because multiple terms were mentioned the same number of times. In this graph, the next most connected hashtags to blablacar are ‘#rideshare’, ‘#couchsurfing’ and ‘#travelblog’. As seen, couchsurfing is fairly often mentioned together with blablacar, which suggests that customers inclined to opt for shareconomy in terms of transport would do so for accommodation as well. Another relevant node is ‘#budgettravel’, which has been mentioned 44 times. The presence of this hashtag in a good number of the posts endorses the findings of the literature review on why customers opt for the trend: financial reasons.
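As a quick check of the proportions quoted for Graph 1 and Graph 2, the share of posts that also carry ‘#travel’ can be computed directly from the figures given in the text. The small helper below is a hypothetical illustration and assumes that each mention corresponds to exactly one post.

```python
def share_of_posts(mentions, total_posts):
    """Percentage of posts that contain the hashtag, assuming one mention per post."""
    return 100.0 * mentions / total_posts


# Figures taken from the text: 62 of 189 posts (#airbnb), 65 of 191 posts (#blablacar).
airbnb_travel = share_of_posts(62, 189)
blablacar_travel = share_of_posts(65, 191)

print(f"#travel within #airbnb posts: {airbnb_travel:.1f}%")
print(f"#travel within #blablacar posts: {blablacar_travel:.1f}%")
```

Both values come out at roughly one third (about 32.8% and 34.0%), which is consistent with the description of ‘#travel’ as a distant second to the key hashtag in each graph.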
Further, Graph 3 reflects the data collected on #couchsurfing and here as well, the second most used hashtag is ‘#travel’. This graph bears a striking resemblance to Graph 1 and Graph 2. Most of the hashtags mentioned in the other graphs are present here as well, suggesting that the Couchsurfing customer may fit the same profile as the Blablacar and Airbnb customer and may therefore be driven by the same motives. With 22 and 14 mentions, the connections to ‘#backpacking’ and ‘#budgettravel’ reveal that the customers are after adventurous alternatives, sometimes on a tight budget. This means that shareconomy satisfies the needs of the budget traveller, amongst others.
The particularity of Graph 4 and the #uber data is that, for the first time in this series of data analyses, geographical locations are often referenced; although the data could not be scraped based on location as well, the Instagram activity localised itself through hashtags. Although far behind the main hashtag, the runner-up ‘#lyft’, mentioned 38 times, also refers to a peer-to-peer transport company very similar to Uber. This suggests that many of the users who opt for Uber are also open to other companies from the shareconomy sphere, being tied not to one company in particular but to the concept in general. Further, other hashtags, such as ‘#miami’ and ‘#ubernyc’, locate the users in America. The German hashtags likewise indicate where the users are located. These findings tie in closely with the Google Trends analysis, which revealed that the trend is most popular in the US and Germany.
Graph 5 and Graph 6, which visualise the data on #collaborativeconsumption and #shareconomy, present a paradigm shift. While the data resulting from the research on the four companies portrays the users as customers, these two graphs display the users as generators of goods and services themselves. Both graphs have the hashtag ‘#startup’ amongst the most mentioned ones. At the same time, the graph on #collaborativeconsumption and its most connected hashtags gives insight into the category of Instagram users who are interested in the trend: the young or student population. This aspect can be approached as another trait of the market of the trend. In the graph of #shareconomy, the population is further shaped into the persona of the young entrepreneur, who builds on the trend. In Graph 6, Frankfurt appears twice amongst the most used hashtags in connection with the key one, #shareconomy, leaving room for discussing the popularity of the trend in Germany.
Probably the least relevant data came from the search on #peertopeer, visualised in Graph 7. In this case, the concept seems to be mostly used in connection with military service and within a military network. In 189 Instagram posts, ‘#sharingeconomy’ has been mentioned only 20 times and is therefore among the least mentioned of the top hashtags selected for the graph. In addition, the cluster of nodes gravitates around the army-related hashtags, with the trend being fairly isolated. Therefore, the data collected on #peertopeer does not bring any value to the research, apart from showing that it is a broadly used term that extends beyond the trend into other spheres of use.
As already mentioned throughout the data interpretation, the scraped and visualised data endorses the findings of the trend analysis. The main finding of the Google Trends analysis was that the trend is most popular in the US and Germany, and the graphs are in accordance with it. Hashtags locating the posts in American and German cities can be seen in multiple graphs, namely those for #uber, #collaborativeconsumption and #shareconomy. Moreover, the fact that the graphs are based on data collected within one day of Instagram activity and that this resulted in hundreds of posts shows that this is a current and much-discussed topic. Thus shareconomy can indeed be approached as a trend.
In addition, the graphs are consistent with the literature review. As anticipated and endorsed by the findings of a research paper on this matter (discussed earlier in the literature review), there is indeed a change of paradigm, and the graphs have revealed this through the user-generated data as well. According to Black and Cracau (2015), customers turn into generators, thereby enabling an ongoing peer-to-peer exchange as well as the optimisation of intelligence and resources. Graph 5 and Graph 6 emphasise this shift in contrast to Graph 1, Graph 2, Graph 3 and Graph 4, where users were generating trend-related content in their role as consumers only.
Further, the profile of the shareconomy consumers and their traits, as revealed by the graphs, strengthen the findings discussed in the literature review. As deduced from the work of Van de Glind (2013), the customers are driven by practical, social, environmental and financial motives, as well as by curiosity. Since ‘#budgettravel’ was found in the visualisation of more than one of the data sets scraped for the selected hashtags, this supports the view that financial reasons are one of the motives for which customers opt for the trend. In addition, from the choice of correlated hashtags, the profile of the customer has been shaped into that of an adventurous, trip-loving backpacker, attributes that fit under the environmental and curiosity drivers.
It is worth mentioning that the part of the literature review that can most easily be connected to the graphs is the part dedicated to customer motives, which is linked to the first sub-question. As anticipated in the planning of this research method, data scraping, visualisation and analysis can be related to, and therefore bring input to, the first sub-question, namely: ‘What are the motives that drive customers to use shareconomy businesses?’. The motives are not repeated here, since they have already been mentioned in the paragraph above. However, although mainly related to customer motives, the data also gives valuable input on the profile of the customer, which, for the second sub-question, represents an understanding of consumer behaviour. Based on this knowledge, advantages as well as threats can be turned into recommendations for the optimization of the traditional industries, so that they meet the new customer criteria and do not lose market share.
The findings of data scraping and Gephi visualisation can be related to the netnography in terms of data collection. While scraping gives us more quantitative data, netnography provides the research with more qualitative data. Both methods should give insight into customer motives which, connected to the second sub-question with regard to advantages and threats, will support the recommendations that we will provide to the traditional industries.
Graph 1. #airbnb Instagram data
Graph 2. #blablacar Instagram data
Graph 3. #couchsurfing Instagram data
Graph 4. #uber Instagram data
Graph 5. #collaborativeconsumption Instagram data
Graph 6. #shareconomy Instagram data
Graph 7. #peertopeer Instagram data