Sunday, January 31, 2010

How "Birds of a Feather Flock Together" on Online Social Spaces

This post is one of my first attempts to describe my research, and a specific problem within it, to a general audience. Your comments are welcome!

For several decades, a primary interest of social researchers has been the study of interpersonal communication within groups of individuals. Communication is central to the evolution of social systems. The steadily growing interest in it therefore stems from its potential to affect a wide range of social processes, such as the propagation of influence and the evolution of communities.

Until a few years ago, studies aimed at understanding these social processes through communication were mostly cross-sectional, often relying on participant observation and surveys of relatively small groups of people. The advent of the "social web" over the past decade, however, gives researchers new ways to test their hypotheses on large-scale data. Web 2.0 technologies, for example, have enabled a rich variety of platforms that support multifaceted user interactions in shared spaces, and the impact of social websites such as Flickr, YouTube, Twitter, Digg, Facebook and the blogosphere has been widespread. From buying a new car to getting investment advice, choosing the next holiday destination or even planning a meal out, people have come to rely heavily on opinions expressed online and on social resources that offer useful insight into the wide range of available options.

My research covers the study of large-scale social processes on such online platforms, an area that has come to be popularly known as "computational social science". This post deals with the specific problem of understanding, modeling and analyzing how information propagates through a "network" of individuals via a particular mode of communication. Because electronic social data can be collected and maintained at comparatively low cost, can cover diverse populations, and can be gathered over extended periods of time, it provides a rich and broad test bed for understanding the social process of information diffusion.

In this context, I discuss the impact of the "homophily" principle on information diffusion in social media. The homophily principle states that users in a social system tend to bond more with people who are "similar" to them than with those who are dissimilar. Homophily therefore structures networks: people's ego-centric social networks are often homogeneous with respect to diverse social, demographic, behavioral and intrapersonal characteristics, or revolve around social foci such as co-location or shared activities. Such homogeneity is thus likely to affect the information these individuals receive and propagate, as well as the communication activities they engage in.
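
To make the idea concrete, here is a minimal Python sketch of one simple way to check for homophily on a single attribute. It is only an illustration, not the probabilistic framework used in our work: it compares the fraction of ties linking users who share an attribute value with the fraction expected if ties were formed at random between any two users. The users, cities, ties and function names below are hypothetical toy data chosen for the example.

```python
from itertools import combinations

def edge_homophily(ties, attribute):
    """Observed fraction of ties whose two endpoints share the same attribute value."""
    same = sum(1 for u, v in ties if attribute[u] == attribute[v])
    return same / len(ties)

def random_baseline(attribute):
    """Expected fraction if ties were drawn uniformly from all possible user pairs."""
    pairs = list(combinations(attribute, 2))  # iterates over the user names (dict keys)
    same = sum(1 for u, v in pairs if attribute[u] == attribute[v])
    return same / len(pairs)

# Hypothetical toy data: each user's location and an undirected list of ties.
location = {
    "ann": "Phoenix", "bob": "Phoenix", "cat": "Phoenix",
    "dan": "Boston", "eve": "Boston", "fay": "Boston",
}
ties = [
    ("ann", "bob"), ("bob", "cat"), ("ann", "cat"),
    ("dan", "eve"), ("eve", "fay"), ("cat", "dan"),
]

observed = edge_homophily(ties, location)
expected = random_baseline(location)
print(f"observed={observed:.2f} expected={expected:.2f}")
```

In this toy network, 5 of the 6 ties connect users in the same city (about 0.83), while only 6 of the 15 possible pairs (0.4) would do so at random. Same-location ties are over-represented, which is the kind of signal homophily predicts.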

In our work, we consider communication through posts on the popular micro-blogging service Twitter and investigate the relationship between homophily among users and the social process of diffusion. In particular, we study four kinds of contextual attributes on Twitter: location, activity behavior, social role and activity distribution. We then predict diffusion characteristics under homophily on these attributes using a novel probabilistic framework. Our experimental results on a large Twitter dataset are promising and reveal how "similarity breeds connection" in a social network.


Xiaoju said...

This is a very interesting and promising line of research, not to mention the excitement of observing the real-time development of the whole system. "Homophily" is definitely one of the well-observed phenomena and, I think, the key in the concept is the granularity, or dimensions, of homophily. Person A can well be similar to person B on a certain attribute and differ from person B on others. Of course, we can observe correlations among different attributes. At the same time, would different platforms place different weights on the set of attributes? Also, how would online and offline attributes and behaviors interact (e.g. a network based on physical location vs. one based on online ties)? These are all exciting questions. Looking forward to your thesis!!

karan said...

You complicate things.