Personalised Taste Profiling in Short-Text Microblogs
Wandabwa, Herman Masindano
The objective of this thesis is to develop diverse and user-representative methods for taste profiling of short-text microblog users. The proposed methods are based entirely on the disseminated content, the social network structure and their variations over time. Inferring user interests and subsequently formulating taste profiles is pertinent to personalising content recommendations for microblogging services as well as to identifying users with similar preferences. The methods are broadly divided into two categories: i) short-text analytics methods (Part I, Chapter 3) and ii) user interest identification and quantification (taste profiling) over time (Part II, Chapters 4, 5 and 6). With the method proposed in Part I, it is possible to accurately extract knowledge from short texts, a usually difficult process due to the unconventional language on such platforms. As a case study, a semi-supervised modelling framework based on tweet metadata is proposed for extracting better topical representations of short texts. In the findings, topical vectors from semantically relevant long texts made shorter and otherwise noisy texts more interpretable. The resulting models produced better topical classification results than similar approaches. The methods in Part II largely support the detection of user interests and the subsequent modelling of taste profiles. As case studies, several approaches were proposed for identifying and quantifying the interests of short-text microblog users. A neural network-based approach was proposed for computing user interest in a specific topic, as part of a process to identify relevant users for a follow-back feature in certain domains. In addition, a soft clustering method was proposed to identify user interests across several topics and the level of interest in each. Lastly, the time dependency of interest decay and gain in such microblogs was modelled.
This mirrored a conventional short-text microblogging platform, where content is volatile and depends on, for example, the prevailing news at the time. Twitter was used as the testing platform for the proposed approaches mainly because of its popularity, its API accessibility and the temporal dynamism of its overall network structure. This research is fundamental to services, content recommendations and audience measurement.