Revised on Nov. 23, 2001


Due to the rapid expansion of the Internet, it has become possible for ordinary end users to obtain many kinds of information easily. However, it is still difficult for them to use the network effectively. For example, although mirror servers and cache servers have been provided in order to improve scalability and response times, it is difficult for users to identify the optimal server.

To solve this problem, we previously proposed a "URL Resolver" framework called ARESAIN (Alternative Resource Access Information Navigator), which allows users to select the optimal server from among multiple servers that provide a given service through data storage facilities such as caches or mirror servers [1]. Before a user can select a server, it is first necessary to gather information such as a list of servers that might be useful to that user. Initially, we focused on information about mirror sites and similar sites, and proposed a basic algorithm that detects similar web sites by using the link information embedded in web pages [2]. While verifying the basic effectiveness of that algorithm, we found that some sites are fully adequate as substitutes yet have a degree of similarity of no more than 50%. We call web pages of this type "near-similar" in this research. To identify such pages in practice, a mechanism is needed that automatically judges whether sites with a low detected degree of similarity are mirrors or near-similar web sites that can actually be used in place of mirrors. In this research, we therefore investigate an automatic determination algorithm in which web pages are divided into hub-type and content-type pages, and each page is judged by an algorithm tailored to its type. Initial trials of this approach have yielded favorable detection results. We also consider how this detection methodology is operated, and propose an algorithm for effectively finding candidates for similar web sites by using the user's Internet access history.
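The idea of a link-based "degree of similarity" can be illustrated with a minimal sketch. It is not the algorithm of [2], whose details are not given here. It assumes, purely for illustration, that similarity is the Jaccard coefficient of the two pages' link paths with the host part ignored. The names `link_paths`, `link_similarity`, and `classify`, and both thresholds in `classify`, are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collects the href targets of anchor tags in one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.add(value)


def link_paths(html):
    """Return the set of link paths in a page.

    Assumption: the host part is dropped so that copies hosted on
    different servers can still be compared.
    """
    collector = LinkCollector()
    collector.feed(html)
    paths = {urlparse(link).path for link in collector.links}
    return {path for path in paths if path}


def link_similarity(html_a, html_b):
    """Jaccard coefficient of the two link-path sets (an assumed measure)."""
    a, b = link_paths(html_a), link_paths(html_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def classify(similarity, mirror_threshold=0.9, candidate_threshold=0.2):
    """Hypothetical thresholds: a low score is not rejected outright but
    marked for a second, type-specific judgment."""
    if similarity >= mirror_threshold:
        return "mirror"
    if similarity >= candidate_threshold:
        return "needs further judgment"
    return "unrelated"
```

Under these assumptions, two pages that share one of three distinct link paths score about 0.33. That score falls in the "needs further judgment" band, which is where the hub-type or content-type judgment described above would take over.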


[1] T. Hirotsu, T. Takada, S. Kurihara, and T. Sugawara, "ARESAIN - Alternative Resource Access Information Navigator," Proceedings of the IASTED Int. Conf. on Parallel and Distributed Computing and Systems, Anaheim, California, USA, pp. 7 - 12, Aug. 21 - 24, 2001.

[2] S. Kurihara, T. Hirotsu, T. Takada, and T. Sugawara, "Mirror Site Navigator using Link Information," Proc. of World MultiConf. on Systemics, Cybernetics and Informatics (SCI2000), Vol. IV (Communications Systems and Networks), pp. 283 - 290, 2000.

Project Members

Toshiharu Sugawara
Toshihiro Takada
Osamu Akashi
Satoshi Kurihara
Toshio Hirotsu

