Matthew Edwards, Stephen Wattam, Paul Rayson and Awais Rashid


Identity resolution capability for social networking profiles is important for a range of purposes, from open-source intelligence applications to forming semantic web connections. Yet replication of research in this area is hampered by the lack of access to ground-truth data linking the identities of profiles from different networks. Almost all data sources previously used by researchers are no longer available, and historic datasets are both of decreasing relevance to the modern social networking landscape and ethically troublesome regarding the preservation and publication of personal data. We present and evaluate a method which provides researchers in identity resolution with easy access to a realistically-challenging labelled dataset of online profiles, drawing on four of the currently largest and most influential online social networks. We validate the comparability of samples drawn through this method and discuss the implications of this mechanism for researchers as well as potential alternatives and extensions.

Date: April 2016
Published: 2016 IEEE International Conference on Big Data (Big Data)
Publisher: IEEE
DOI: 10.1109/BigData.2016.7840645