Asia Web Watch

Methodology

Editor: Dr T.Matthew Ciolek tmciolek@ciolek.com

Est.: 1 Oct 1997. Last updated: 1 Mar 2002





Sources of Data and Methodological Issues

This site makes use of data derived from a number of sources.

Internet Statistics

Statistical information on the growth of the Internet as a whole, as well as the growth of one of its major components, the WWW. These interlocking sets of figures were originally collected and published by Network Wizards (1977) and Zakon (1977) as well as by Gray (1996).

Altavista Statistics

Statistics obtained from a series of systematic English keyword searches directed to the Altavista database (Digital Corporation, 1997). Altavista is the world's second largest database of Web documents. Assuming that an average Web server currently publishes approximately 49.5 documents, one can estimate from Table 1 that the entire universe of Web-based information consists of some 59.4 million online documents or pages. Armed with this information we can see that, while the Excite system appears to be more complete (84% coverage of the world's Web resources), it appears to 'know' about fewer documents on Asia than does the smaller Altavista system (52% coverage).
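The arithmetic behind these estimates can be sketched as follows. This is only an illustration: it assumes that both coverage percentages are fractions of the same 59.4 million document estimate, and the server count it prints is derived here from the two quoted figures rather than quoted from Table 1. All variable and function names are hypothetical.

```python
# Figures quoted in the text; everything else is derived from them.
DOCS_PER_SERVER = 49.5        # assumed average number of documents per Web server
TOTAL_DOCS_MILLIONS = 59.4    # estimated size of the Web, in millions of documents

# Coverage of the world's Web resources, as quoted in the text.
COVERAGE = {"Excite": 0.84, "Altavista": 0.52}

# Server count implied by the two quoted figures (in millions of servers).
implied_servers_millions = TOTAL_DOCS_MILLIONS / DOCS_PER_SERVER


def indexed_docs_millions(engine):
    """Documents an engine would index, assuming its coverage is a share of the total."""
    return TOTAL_DOCS_MILLIONS * COVERAGE[engine]


if __name__ == "__main__":
    print("Implied servers (millions): %.2f" % implied_servers_millions)
    for name in sorted(COVERAGE):
        print("%s: about %.1f million documents" % (name, indexed_docs_millions(name)))
```

On these assumptions the larger index holds roughly 50 million documents and the smaller one roughly 31 million, which is the gap the comparison above refers to.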

Since this study is focused on Asia-related online documents, the Altavista database has been selected as the chief source of intelligence.

It must be noted that statistics derived from Altavista pertain only to English language documents. This is an important issue. The choice of English as the language of enquiry means that this paper excludes from its analyses approximately 10% of Altavista's Asia-related material, simply because it was produced in other languages. On the other hand, the decision to stick to material published in a single (and dominant) language has greatly expedited the task of gathering replicable data.

The final methodological decision related to the use of Altavista was to convert raw statistics on the number of URLs (uncovered via keyword searches) into estimates of equivalent Web servers. In other words, each block of 49.5 pages, regardless of its actual provenance, was treated as a rough equivalent of one Web server. Thus, for example, a figure of 460 servers dealing with Afghanistan was based on the finding that an Altavista query involving the keyword 'Afghanistan' generates links to 22,770 distinct pages (URLs) with that keyword. Also, a decision was made that all server statistics are to be rounded to the nearest ten units.
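The conversion described above can be expressed as a short sketch. The function name is hypothetical, and the handling of exact ties (values ending in 5) is an assumption, since the text states only that figures are rounded to the nearest ten.

```python
import math

DOCS_PER_SERVER = 49.5  # pages treated as the rough equivalent of one Web server


def server_equivalents(page_count, docs_per_server=DOCS_PER_SERVER):
    """Convert a count of matching pages (URLs) into server equivalents.

    The result is rounded to the nearest ten units. Ties are rounded up,
    which is an assumption; the text does not say how ties were resolved.
    """
    raw = page_count / docs_per_server
    return int(math.floor(raw / 10.0 + 0.5)) * 10


if __name__ == "__main__":
    # The Afghanistan example from the text: 22,770 pages.
    print(server_equivalents(22770))
```

For the Afghanistan query, 22,770 pages divided by 49.5 gives exactly 460, so rounding leaves the figure unchanged.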

October Sample

Results of a statistical analysis of the content, provenance, usefulness and other characteristics of a sample of scholarly or factual online information resources relevant to South East Asian studies. A set of 270 web sites was extracted between 23 and 26 October 1997 from a population of 3247 English language online documents known at the time of inquiry to the Altavista database (Digital Corporation, 1997). This relatively large population of potential links was generated through a query containing the string "South East Asian Studies".

The "October Sample" was arrived at by quickly weeding out of the list of 3247 web links any materials which appeared to be

The final sample of 270 documents, or 8.3% of the initial population of "South East Asian Studies" web links, is in fact the outcome of a compromise between the need to finish the data collection before an inflexible deadline and the need to make the sample as large and as diverse as possible. In other words, the "October Sample" data may be interesting, but they do not come from a systematic and comprehensive census.

Maintainer: Dr T.Matthew Ciolek (tmciolek@ciolek.com)

Copyright © 1997 by T.Matthew Ciolek. All rights reserved. This Web page may be freely linked to other Web pages. Contents may not be republished, altered or plagiarized. The www.ciolek.com editors do not control or endorse the content of third party Web Sites.

URL http://www.ciolek.com/Asia-Web-Watch/methodology.html
