Invited Speakers
1 - DBLP: A medium-sized graph
Michael Ley, Department for Databases and Information Systems, University of Trier, Germany
Abstract. DBLP is now one of the most popular portals for computer science publications. At the end of September 2006, the 800,000th bibliographic record was entered into DBLP. The records are very simple descriptions of published papers: usually they contain only the title, the authors, the journal or conference proceedings, and the URL of the online version, if available. The complete collection of bibliographic records may be downloaded as a single XML file. This raw DBLP data has become a "standard" benchmark for testing new algorithms and methods in the database area and beyond. We are aware of more than 150 papers that report experiments with dblp.xml; for example, 11 papers in the proceedings of the VLDB 2006 conference use the DBLP data. The main part of the talk describes the DBLP data in some detail. We view the data as a medium-sized graph with some interesting properties. To keep data quality at a reasonable level, we use the coauthor graph to locate spelling variants of author names [2]. The coauthor graph is a typical example of a social network that may be analysed more deeply; recent examples are [1] and [3].
References:
- [1] L. Backstrom, D. Huttenlocher, J. Kleinberg, X. Lan: Group Formation in Large Social Networks: Membership, Growth, and Evolution. Proc. 12th ACM SIGKDD Intl. Conf. on Knowledge Discovery and Data Mining, 2006.
- [2] Michael Ley, Patrick Reuther: Maintaining an Online Bibliographical Database: The Problem of Data Quality. In: Gilbert Ritschard, Chabane Djeraba (Eds.): Extraction et gestion des connaissances (EGC'2006), Actes des sixièmes journées Extraction et Gestion des Connaissances, Lille, France, 17-20 janvier 2006, 2 Volumes. Revue des Nouvelles Technologies de l'Information RNTI-E-6, Cépaduès-Éditions, 2006, pp. 5-10, ISBN 2-85428-718-5.
- [3] Xiaoxin Yin, Jiawei Han, Philip S. Yu: LinkClus: Efficient Clustering via Heterogeneous Semantic Links. VLDB 2006: 427-438.
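The abstract states only that the coauthor graph is used to locate spelling variants of author names [2]; it does not describe the procedure. The Python sketch below illustrates one plausible heuristic under our own assumptions: two names are flagged as candidate variants when they share a coauthor, have never written a paper together, and have similar spellings according to the standard-library `difflib.SequenceMatcher`. The input format (a list of author lists, one per paper), the function names `coauthor_graph` and `candidate_variants`, and the 0.85 threshold are all hypothetical; this is not the procedure actually used for DBLP.

```python
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def coauthor_graph(papers):
    """Map each author to the set of people they have co-authored with.

    `papers` is assumed (hypothetically) to be a list of author-name lists,
    one per paper, e.g. extracted beforehand from dblp.xml.
    """
    graph = defaultdict(set)
    for authors in papers:
        for a, b in combinations(sorted(set(authors)), 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def candidate_variants(graph, threshold=0.85):
    """Hypothetical heuristic: flag name pairs at distance two in the
    coauthor graph (a shared coauthor, but no joint paper) whose spellings
    are similar. The threshold is an illustrative assumption."""
    pairs = set()
    for neighbours in graph.values():
        for a, b in combinations(sorted(neighbours), 2):
            if b in graph[a]:
                continue  # they wrote together, so probably two distinct people
            if SequenceMatcher(None, a, b).ratio() >= threshold:
                pairs.add((a, b))
    return sorted(pairs)


if __name__ == "__main__":
    # Toy data with invented names; "Ana Schmidt" plays a misspelling.
    papers = [
        ["Anna Schmidt", "Bo Chen"],
        ["Ana Schmidt", "Bo Chen"],
        ["Bo Chen", "Carl Weber"],
    ]
    print(candidate_variants(coauthor_graph(papers)))
```

Restricting the string comparison to names that share a coauthor keeps the number of comparisons small compared with testing every pair of names, and excluding pairs who wrote together avoids merging genuinely different coauthors with similar names. Any flagged pair would still need manual confirmation.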
2 - Stochastic Modelling of the Web
George Loizou, Department of Computer Science, Birkbeck College, University of London, UK
Abstract. Recently, several authors have proposed stochastic evolutionary models for the growth of the Web graph and other networks that give rise to power-law distributions. These models are based on the notion of preferential attachment, leading to the "rich get richer" phenomenon. We present a generalization of the basic model that allows the deletion of individual links, and show that it also gives rise to a power-law distribution. We derive the mean-field equations for this stochastic model and show that, by examining a snapshot of the distribution at the steady state of the model, we can determine the extent to which link deletion has taken place and estimate the probability of deleting a link. Applying our model to actual Web graph data provides evidence of the extent to which link deletion has occurred. We also discuss a problem that frequently arises when estimating the power-law exponent from empirical data, together with a few possible methods for dealing with it, and indicate our preferred approach. Using this approach, our analysis of the data suggests a power-law exponent of approximately 2.15 for the distribution of inlinks in the Web graph, rather than the widely published value of 2.1. The proposed model, based on the classical urn model, is flexible enough to be adapted easily to the P2P paradigm and to many other settings.
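The abstract does not give the model's transition probabilities or name its preferred estimator, so the following Python sketch supplies both as explicit assumptions. `simulate` is a hypothetical growth process with made-up parameters `p_new` and `p_del`, in which link targets are chosen with probability proportional to in-degree plus one (preferential attachment) and a uniformly random link is occasionally deleted. `estimate_exponent` substitutes the widely used discrete maximum-likelihood approximation of Clauset, Shalizi and Newman (2009). Neither is claimed to be the speaker's model or method; the sketch only shows how preferential attachment with link deletion can be simulated and how an exponent can be estimated from the resulting in-degrees.

```python
import math
import random


def simulate(steps, p_new=0.4, p_del=0.1, seed=0):
    """Hypothetical growth process with preferential attachment and link deletion.

    p_new and p_del are illustrative assumptions. In each step:
      - with probability p_new a new node arrives and links to an existing node;
      - with probability p_del a uniformly random live link is deleted;
      - otherwise (or if there is no link to delete) a random existing node
        adds a link, possibly a self-loop in this simplified sketch.
    Targets are drawn with probability proportional to in-degree + 1,
    so well-linked nodes attract more links ("rich get richer").
    Returns the in-degree of every node.
    """
    rng = random.Random(seed)
    indeg = [0]  # start with a single node and no links
    edges = []   # live links as (source, target) pairs

    def pick_target():
        weights = [d + 1 for d in indeg]
        return rng.choices(range(len(indeg)), weights=weights)[0]

    for _ in range(steps):
        r = rng.random()
        if r < p_new:
            target = pick_target()  # drawn before the newcomer joins
            indeg.append(0)
            source = len(indeg) - 1
        elif r < p_new + p_del and edges:
            i = rng.randrange(len(edges))
            _, target = edges[i]
            edges[i] = edges[-1]  # swap-remove in O(1)
            edges.pop()
            indeg[target] -= 1
            continue
        else:
            source = rng.randrange(len(indeg))
            target = pick_target()
        edges.append((source, target))
        indeg[target] += 1
    return indeg


def estimate_exponent(values, x_min=1):
    """Discrete power-law MLE approximation (Clauset, Shalizi & Newman, 2009):
    alpha ~= 1 + n / sum(ln(x / (x_min - 0.5))) over the n values x >= x_min."""
    if x_min < 1:
        raise ValueError("x_min must be at least 1")
    tail = [x for x in values if x >= x_min]
    if not tail:
        raise ValueError("no values at or above x_min")
    total = sum(math.log(x / (x_min - 0.5)) for x in tail)
    return 1 + len(tail) / total


if __name__ == "__main__":
    degrees = simulate(5000, seed=1)
    print(len(degrees), "nodes; estimated exponent:", estimate_exponent(degrees, x_min=3))
```

The estimate depends strongly on the choice of `x_min`, which is one concrete form of the estimation problem the abstract alludes to; the sketch leaves that choice to the caller. Drawing targets with `random.choices` costs time linear in the number of nodes per step; an urn of node identifiers, as in the classical urn model, would make sampling cheaper at the price of more bookkeeping on deletion.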