This wiki service has now been shut down and archived

New Kinds of Social Data: from blogs to administrative data

From ESIWiki


Original Information for this workshop.

Workshop presentations

In summary, the purpose of the workshop is to address tensions between:

a) privacy and disclosure concerns about data linkage

b) voluntary (typically inconsistent) disclosure of personal information e.g. in blogs

c) increasing amounts of administrative and commercial data of interest to researchers

d) worries about how such data might be preserved for future generations of researchers

One of the background questions to be addressed is whether there might be e-Science solutions that allow different types of data (held at different levels of detail, and hence of disclosure risk) to be connected in ways that satisfy privacy requirements yet still allow meaningful research to be undertaken...

Meeting Report

Possible e-Science applications

(Extracted from the report above)

  • A1: Federating databases with 'patchy' data - disparate records, incomplete data and data quality issues.
  • A2: Turning the internet archives into a dynamic database (processor intensive).
  • A3: Secure virtual laboratories or secure data service for dealing with highly sensitive data at a micro level.
  • A4: Text mining of existing Mass Observation data (ca. 8000 essays).
  • A5: Data mining to detect legally contentious statements.
  • A6: Data mining for external validation of blog contents - to assist analysis.
  • A7: Tools for assessing disclosure risk - especially when extended to include matching with datasets external to the information system.
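As a rough illustration of what a tool for A7 might compute, the sketch below counts how many records share each combination of potentially identifying attributes. This is a simple k-anonymity-style check, chosen here for illustration; the report does not name a specific method. The function names, the attribute names and the sample records are all hypothetical and made up for the example.

```python
from collections import Counter


def equivalence_class_sizes(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    keys = [tuple(record[q] for q in quasi_identifiers) for record in records]
    return Counter(keys)


def disclosure_risk_summary(records, quasi_identifiers):
    """Summarise risk: smallest group size (k) and the share of unique records."""
    sizes = equivalence_class_sizes(records, quasi_identifiers)
    unique = sum(1 for n in sizes.values() if n == 1)
    return {
        "k": min(sizes.values()) if sizes else 0,
        "unique_records": unique,
        "share_unique": unique / len(records) if records else 0.0,
    }


# Made-up records, for illustration only (not data from the workshop).
records = [
    {"age_band": "20-29", "sex": "F", "region": "Kent"},
    {"age_band": "20-29", "sex": "F", "region": "Kent"},
    {"age_band": "30-39", "sex": "M", "region": "Kent"},
    {"age_band": "30-39", "sex": "M", "region": "Fife"},
]

summary = disclosure_risk_summary(records, ["age_band", "sex", "region"])
print(summary)
```

A record that is unique on these attributes could in principle be re-identified by matching against an external dataset holding the same attributes, which is why adding an attribute such as region tends to raise the measured risk.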

Research Prospects and Problems

  1. Blogs and data discovery
  2. Validity. How to use the information. Does it represent what it purports to represent?
  3. Blogs as research material: re-use of personal documents as social research data. Are they historical material going into archives, or ephemera with a shelf life that should be destroyed (cf. medical data, according to some ethicists)? See also the legal and ethical issues surrounding social science research.
  4. Blogs as research tools: part of the apparatus helping us do research and document what we have done.
  5. Controls on how we collect information: safe settings.
  6. Data stewardship and data citizens. Changing attitudes to privacy.

Next Steps

Literature Survey

Current Approaches to Data Mining Blogs by Francine Barone from the University of Kent.


Vignette A

Examine attitudes to Islam before and after the 11 September attacks. For the results of data mining blogs to be sociologically revealing, we would need to know the age, sex and religious affiliation of the writers. We would therefore need to link blogs to other information about the writers, although the linked results may only be available in aggregate.
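The linkage described in this vignette could be sketched roughly as follows, assuming that each blog post already carries a numeric attitude score from some earlier text-mining step and that writer attributes are held in a separate lookup. The field names, the `score` value, the minimum cell size and the sample data are all hypothetical. Only group averages are returned, and groups with too few distinct writers are suppressed.

```python
from collections import defaultdict
from datetime import date

CUTOFF = date(2001, 9, 11)
MIN_WRITERS = 3  # assumed suppression threshold, not taken from the report


def aggregate_attitudes(posts, writers, min_writers=MIN_WRITERS):
    """Link posts to writer attributes and return mean scores per group.

    Groups are (period, age_band, sex, religion). Groups with fewer than
    min_writers distinct writers are dropped to limit disclosure risk.
    """
    scores = defaultdict(list)
    members = defaultdict(set)
    for post in posts:
        writer = writers.get(post["writer_id"])
        if writer is None:
            continue  # post cannot be linked, so it is left out
        # Posts dated on the cutoff day itself are counted as "after".
        period = "before" if post["date"] < CUTOFF else "after"
        key = (period, writer["age_band"], writer["sex"], writer["religion"])
        scores[key].append(post["score"])
        members[key].add(post["writer_id"])
    return {
        key: sum(values) / len(values)
        for key, values in scores.items()
        if len(members[key]) >= min_writers
    }


# Made-up data, for illustration only.
profile = {"age_band": "20-29", "sex": "F", "religion": "none"}
writers = {"w1": profile, "w2": profile, "w3": profile}
posts = [
    {"writer_id": "w1", "date": date(2001, 8, 1), "score": 0.5},
    {"writer_id": "w2", "date": date(2001, 8, 15), "score": 0.25},
    {"writer_id": "w3", "date": date(2001, 9, 1), "score": 0.75},
    {"writer_id": "w1", "date": date(2001, 10, 1), "score": -0.5},
    {"writer_id": "w9", "date": date(2001, 10, 2), "score": 0.1},
]

result = aggregate_attitudes(posts, writers)
print(result)
```

In this made-up example the "after" group contains a single linked writer, so it is suppressed and only the "before" average is reported.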

Vignette B

(To be constructed.)

Vignette C

(To be constructed.)

Areas to Explore

  • Collaboration with BBC Online, e.g. on reactions to the BBC website – comments and discussion from around the world that respond or react to BBC activities.
  • Meet online with bloggers so that their questions and reactions feed into the meeting (web cast + blogs + messaging)
This is an archived website, preserved and hosted by the School of Physics and Astronomy at the University of Edinburgh. The School of Physics and Astronomy takes no responsibility for the content, accuracy or freshness of this website. Please email webmaster [at] ph [dot] ed [dot] ac [dot] uk for enquiries about this archive.