April 30, 1999
Electronic Network Consortium
Development and Operation of the Next-Generation Rating/Filtering System on the Internet
- Toward more effective protection of children from inadequate information -
Since November 1996, the Electronic Network Consortium (ENC) has been promoting the widespread use of filtering as a means of protecting children from inadequate information on the Internet. Together with the New Media Development Association, a nonprofit foundation that functions as its secretariat, the ENC developed a filtering system conforming to the PICS specification (http://www.w3.org/PICS/), and has been operating a database system (a label bureau complying with the PICS specification) for web page rating.
Following development work that started last year, the ENC has recently completed a next-generation rating/filtering system designed to protect children more effectively from inadequate information on the Internet. Drawing on its wide experience in this field, the ENC will begin operating the new system on May 1. The new system was developed by the New Media Development Association as part of the "Project for Developing and Demonstrating Leading-edge Information Systems" being implemented by the Information-technology Promotion Agency with financial aid from the Ministry of International Trade and Industry. The features of the system are as follows:
The system features a semi-automatic rating system that rates and labels content efficiently, using not only keyword analysis of text but also a similar-image recognition method for photographs. With this system, it is possible to compile a database of more than 200,000 URLs (Uniform Resource Locators, i.e., web page addresses) on the Internet.
The system can link with other label bureaus and convert labels based on different labeling standards. With this function, the proportion of Internet content covered by labels can be raised substantially.
The system includes a server-type filtering system complying with the PICS Rules specification (http://www.w3.org/TR/REC-PICSRules/), which is offered as a public proxy server and distributed as free software. These features make the filtering system easier to install for users with a large number of computers, such as teachers at schools, and for users whose browsers do not comply with the PICS specification.
1. Background leading to the system's development
The Internet owes its present expansion to open and free use, and today comprises tremendous quantities of information and tens of millions of users throughout the world. Even now it continues to grow from strength to strength. However, by giving easy access to global information, it also allows the circulation of illegal and harmful information (hereinafter called "inadequate information"), causing serious social issues.
The filtering function enables Internet users to receive information selectively in accordance with their requirements. It is a user-friendly mechanism that guarantees users' right to know and at the same time protects their children from unwanted information, while respecting the rights and freedom of information providers.
To address these social issues, the Electronic Network Consortium has been promoting the use of the filtering function since November 1996. Together with the New Media Development Association, which functions as its secretariat, the ENC developed a filtering system that conforms to the PICS specification, and has been operating a database system (a label bureau complying with the PICS specification) for web page rating since September 1997.
As a measure of the filtering system's success at that time, the database of the label bureau exceeded 20,000 URLs, and the filtering software installed on personal computers was used by more than 2,000 educational institutions, business enterprises, and other organizations.
However, building the database required a page-by-page visual examination of web pages, which made it very difficult to keep up with the daily increase in inadequate information. Moreover, the filtering software depended on the browser used and on the version of the operating system, and this restricted the environments in which the filtering system could be used. In addition, in places where a large number of personal computers were used, such as schools, installing and setting up the filtering software on each computer took teachers a great deal of time and effort. Consequently, there was strong demand from users for a system that resolved these problems.
Resolving these problems and providing more effective protection against inadequate information was the aim in developing the next-generation rating/filtering system, which consists of a rating system and a filtering system. The development work was carried out by the New Media Development Association, which acts as the ENC's secretariat, as part of the "Project for Developing and Demonstrating Leading-edge Information Systems" being implemented by the Information-technology Promotion Agency with financial support from the Ministry of International Trade and Industry.
2. Features of the next-generation rating system
The new system is a semi-automatic rating system that uses not only keyword analysis of text but also a similar-image recognition method for photographs to carry out rating and labeling work efficiently. In addition, it makes it easier to set up a label bureau conforming to the PICS specification, permits linkage with other label bureaus, and includes a function for converting labels based on different rating standards. As a result, the system can respond quickly to new information on a daily basis and can cover a larger share of the inadequate information on the Internet.
(1) Automatic rating of both text and images
Textual information contained in web pages is checked against weighted keyword lists classified by category (at present, about 6,000 inadequate words and about 26,000 relevant words) and analyzed to determine a rating value. The degree of automation can be raised by updating the keyword lists on the basis of visual confirmation work.
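The keyword analysis described above can be illustrated with a minimal sketch. The keyword lists, the weights, the thresholds, and the function names below are all hypothetical and chosen only for illustration; the ENC's actual lists and scoring method are not described in this release.

```python
import re

# Hypothetical weighted keyword lists, one per category.
# The real lists (about 6,000 and 26,000 words) are not published here.
KEYWORD_WEIGHTS = {
    "violence": {"gun": 2.0, "fight": 1.0},
    "gambling": {"casino": 2.0, "bet": 1.0},
}


def score_text(text, keyword_weights=KEYWORD_WEIGHTS):
    """Return a score per category: the summed weights of matching words."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {
        category: sum(weights.get(word, 0.0) for word in words)
        for category, weights in keyword_weights.items()
    }


def to_rating(score, thresholds=(1.0, 3.0, 5.0, 8.0)):
    """Map a raw score to a level from 0 to 4 (assumed scale).

    The level is the number of thresholds the score has reached.
    """
    return sum(1 for threshold in thresholds if score >= threshold)


scores = score_text("A gun fight at the casino")
ratings = {category: to_rating(value) for category, value in scores.items()}
```

In a semi-automatic workflow of the kind described, values such as these would be candidates for a human reviewer to confirm, and the reviewer's corrections would feed back into the keyword lists.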
To rate images on web pages efficiently, a rating value is first calculated by searching a similar-image database (currently about 2,000 images), and the images are then reduced to thumbnails (smaller versions of the originals that can be reviewed at a glance). Finally, a reviewer determines the rating value by visually examining the thumbnails together with the calculated rating values. The quality of automatic rating can be improved by adding more images to the database.
To keep the label database from becoming obsolete, the URLs whose label information is stored in the database are accessed regularly to check their modification dates and any changes in size. If a page has changed, its URL can be submitted to the automatic rating system again.
The list of URLs to be rated automatically can be compiled from the results of major search engines, robot (crawler) searches, and other systems.
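The regular check for changed pages could be sketched as follows. This is only an illustration under stated assumptions: the PageSnapshot type and the fetch_snapshot callback are hypothetical names, and how the ENC system actually obtains modification dates and sizes is not described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PageSnapshot:
    """What is remembered about a page when it was last rated (hypothetical)."""
    url: str
    last_modified: str  # for example, the value of an HTTP Last-Modified header
    size: int           # content size in bytes


def needs_rerating(stored, current):
    """A page is re-rated if its modification date or its size has changed."""
    return (stored.last_modified != current.last_modified
            or stored.size != current.size)


def select_for_rerating(stored_snapshots, fetch_snapshot):
    """Return the URLs that should be rated again.

    fetch_snapshot is a caller-supplied function (an assumption of this
    sketch) that takes a URL and returns the page's current PageSnapshot.
    """
    return [
        stored.url
        for stored in stored_snapshots
        if needs_rerating(stored, fetch_snapshot(stored.url))
    ]
```

Passing the fetch function in as a parameter keeps the comparison logic independent of how pages are retrieved, so it can be exercised without network access.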
(2) Cooperation with other label bureaus
It is not possible for a single label bureau to cover the enormous number of web pages on the Internet throughout the world. When an inquiry is made about the rating of a web page for which the label bureau operated by the ENC has no rating data, the bureau can respond by referring to other label bureaus complying with the PICS specification. Moreover, it features a function for converting labels that use a different rating standard.
In the system, the ENC follows a rating standard (SafetyOnline) that extends RSACi, the standard of the Recreational Software Advisory Council (RSAC), a nonprofit organization in the US, which is publicly used there.
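The fallback to other label bureaus and the conversion of labels could be sketched as below. This is a simplified illustration: plain dictionaries stand in for the bureaus, which in practice are queried over the network, and the category names and mapping are hypothetical rather than taken from SafetyOnline or RSACi.

```python
def convert_label(label, mapping):
    """Translate a label from another rating standard into the local one.

    mapping maps the other standard's category names to local names.
    Categories without a local equivalent are dropped (an assumption).
    """
    return {
        mapping[category]: value
        for category, value in label.items()
        if category in mapping
    }


def lookup_label(url, local_db, partners):
    """Answer from the local database, else ask partner bureaus in turn.

    partners is a list of (partner_db, mapping) pairs.
    Returns None when no bureau has a label for the URL.
    """
    if url in local_db:
        return local_db[url]
    for partner_db, mapping in partners:
        label = partner_db.get(url)
        if label is not None:
            return convert_label(label, mapping)
    return None


# Hypothetical data for illustration only.
local_db = {"http://a.example/": {"violence": 1}}
partner_db = {"http://b.example/": {"v": 3, "x": 2}}
partners = [(partner_db, {"v": "violence"})]
label = lookup_label("http://b.example/", local_db, partners)
```

Here the partner's category "v" is converted to the local "violence", while "x" has no local counterpart and is omitted from the converted label.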
(3) Fast response and high reliability
The label bureau consists of two or more UNIX machines, ensuring a fast response because the load is distributed among the machines. Should one machine fail, the other UNIX machines operate in its place to maintain the overall reliability of the label bureau.
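One common way to combine load distribution with failover is round-robin dispatch that skips machines that do not respond. The sketch below illustrates that idea only; the release does not say which method the ENC's label bureau uses, and the BureauPool class and its callable "servers" are hypothetical.

```python
class BureauPool:
    """Round-robin dispatch over several machines, skipping failed ones."""

    def __init__(self, servers):
        # Each server is modeled as a callable that takes a URL and returns
        # a label, or raises ConnectionError when the machine is down.
        self.servers = list(servers)
        self._next = 0

    def query(self, url):
        count = len(self.servers)
        for offset in range(count):
            index = (self._next + offset) % count
            try:
                result = self.servers[index](url)
            except ConnectionError:
                continue  # this machine is down; try the next one
            # Start the next query at the machine after the one that answered.
            self._next = (index + 1) % count
            return result
        raise ConnectionError("no label bureau machine is available")
```

Successive queries are spread across the healthy machines, and a failed machine costs only one skipped attempt per query that reaches it.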
3. Features of the next-generation filtering system
To meet the needs of schools and relatively large organizations, the next-generation filtering system is a server-type filtering system (SFS). It simplifies installation and use where large numbers of personal computers are involved, as at schools, and it serves users of Netscape and other browsers that are not compliant with the PICS specification. The main features of the filtering system are as follows.
(1) Filtering by a proxy server
To enable schools and relatively large organizations to implement filtering capability easily, filtering is done by a proxy server. With this system, there is no need to set up and manage the filtering function separately for each PC.
The SFS is written in Java and can run in any computing environment that provides JDK 1.1.6. It has been confirmed to operate on the Linux and Solaris operating systems. (It will also operate on Windows NT.)
It filters not only web page viewing but also file transfers and newsgroups.
(2) Provision of a profile bureau for sharing and installing profiles
Because the SFS conforms to the PICS Rules specification, access managers such as teachers and parents must set up profiles that describe the filtering values and policy. Writing these descriptions, however, is not easy. For this reason, a profile bureau for sharing and installing profiles is provided. The profile bureau offers model profiles classified by age and application, which access managers can download for their own use.
(3) Rating by administrators, third parties and information providers
The SFS accepts ratings from the following three kinds of raters: (a) SFS administrators (for instance, school teachers), (b) third parties (for instance, the ENC's label bureau), and (c) information providers themselves (for instance, a page on playboy.com). If two or more ratings exist for a page, the one with the highest priority is adopted, in the order (a), (b), (c).
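The priority order among the three raters can be expressed in a few lines. The source names and the function below are hypothetical labels for (a), (b), and (c); this is a sketch of the selection rule only, not of the SFS implementation.

```python
# Rater kinds in descending priority: (a), (b), (c).
PRIORITY = ("administrator", "third_party", "provider")


def effective_rating(ratings):
    """Pick the rating from the highest-priority rater that supplied one.

    ratings maps a rater kind to its label for a page.
    Returns a (rater_kind, label) pair, or None if the page is unrated.
    """
    for source in PRIORITY:
        if source in ratings:
            return source, ratings[source]
    return None


chosen = effective_rating({
    "provider": {"violence": 0},
    "third_party": {"violence": 2},
})
```

In this example no administrator rating exists, so the third-party label takes precedence over the provider's self-rating.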
4. Future Plans
(1) Expansion of label database and assistance in constructing label bureaus
The ENC plans to expand its database by regularly rating inadequate information with the newly developed semi-automatic rating function. The database currently holds about 30,000 URLs, which the ENC plans to increase to 200,000 URLs by the summer of this year.
Rating inevitably depends on human value judgments, and any rating standard is derived from a subjective system of values. It is therefore desirable to have several label bureaus that follow different systems of values, and private organizations in Japan may launch label bureaus in the future. By supplying software and other technical assistance, the ENC will continue working so that many label bureaus based on different systems of values can be established and provide services.
(2) Provision of a public SFS server and of the SFS as free software
A public SFS proxy server will be provided on the Internet for testing purposes and for users of Netscape and other browsers that are not compliant with the PICS specification (http://pops.pics.enc.or.jp:8180).
In addition, to accelerate the widespread use of this system in educational institutions and other organizations that urgently need it, the New Media Development Association is distributing the SFS software together with tools (redirectors) designed to prevent changes to the browsers' proxy settings. However, because the SFS operates in a UNIX environment and technical support is needed for its installation, the tools will not be distributed on the web page for the time being and can be obtained from the ENC only by mail request.
(3) Linkage with other label bureaus and international cooperation
Because the quantity of information on the Internet is so large, it is impossible for a single label bureau to rate information from throughout the world. It will therefore be necessary to establish an internationally distributed system similar to the Domain Name System. For this purpose, the ENC will call on overseas label bureaus for cooperation.
Furthermore, it is necessary to develop a common international rating standard. The Internet Content Rating Association (ICRA), which inherited the intellectual property of the RSAC, has started studies on a global rating standard that will permit objective content description free from the cultural values of particular countries and will enable users in different countries to apply their own values in selecting content on the Internet. As a founding member of the ICRA, the ENC will endeavor to contribute to the establishment of a rating standard not only for Japan but also for other countries in the Asian region.
The Next-Generation Rating/Filtering System on the Internet
About the Electronic Network Consortium (ENC)
The Electronic Network Consortium was established in October 1992 for the purpose of promoting online services in Japan and comprises 84 corporations, including leading online service providers, Internet service providers, computer manufacturers and software houses, 14 special individual members including experts, and 51 local governments interested in public networks. The New Media Development Association, a nonprofit foundation, acts as the consortium's secretariat.
Please address your inquiries to:
Electronic Network Consortium
23F, Mita Kokusai Bldg., 1-4-28 Mita, Minato-ku, Tokyo 108-0073, Japan
Persons in charge: Shimizu, Kokubu
Tel. +81 3 3457 0672
Fax +81 3 3451 9604