DESIGN AND IMPLEMENTATION OF A COMPUTERIZED LIBRARY CIRCULATORY SYSTEM (CASE STUDY OF IMT ENUGU)


CHAPTER ONE

INTRODUCTION

The number of documents in a computerized library circulatory system usually grows rapidly over time. Storing, managing and searching these documents is a challenging problem. Documents in such a system are stored as semi-structured data, whereas a traditional relational database stores structured data. A relational database management system cannot manage semi-structured data efficiently and cannot satisfy the requirements of content-based text retrieval. Much research has been done on semi-structured data, covering data modeling, query languages for text retrieval, indexing methods, text retrieval algorithms and similarity search algorithms. These results are widely used in computerized library circulatory systems. SSREADER Digital Library, the National Digital Library and WanFang Database are popular online libraries in China. All of these online libraries classify documents into several classes and support querying inside a given class. They also support both metadata search and full-text search using a single keyword or an expression. Other examples of online libraries are the Greenstone Digital Library, the UC Berkeley Digital Library, the Tufts Digital Library, the ACM Digital Library and NCSTRL. These libraries support similar functions, such as metadata searching, full-text searching, document classification and browsing. The Greenstone Digital Library software is a suite that provides management tools for creating and maintaining such a library.
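The difference between metadata search and full-text search mentioned above can be illustrated with a minimal sketch. The sample records, field names and function names below are hypothetical and are not taken from any of the libraries named in this chapter; the inverted index is one common way to support keyword search, assumed here for illustration only.

```python
from collections import defaultdict

# Hypothetical sample records: metadata fields plus free text.
DOCUMENTS = [
    {"id": 1, "title": "Database Systems", "author": "A. Okafor",
     "text": "Relational databases store structured data in tables."},
    {"id": 2, "title": "Text Retrieval", "author": "B. Eze",
     "text": "An inverted index supports full-text retrieval of documents."},
    {"id": 3, "title": "Library Automation", "author": "A. Okafor",
     "text": "Circulation records track loans of library documents."},
]


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in text)
    return cleaned.split()


def build_inverted_index(documents):
    """Map each word to the set of document ids whose text contains it."""
    index = defaultdict(set)
    for doc in documents:
        for word in tokenize(doc["text"]):
            index[word].add(doc["id"])
    return index


def metadata_search(documents, field, value):
    """Return ids of documents whose metadata field equals value, ignoring case."""
    return sorted(d["id"] for d in documents if d[field].lower() == value.lower())


def full_text_search(index, keyword):
    """Return ids of documents whose text contains the keyword."""
    return sorted(index.get(keyword.lower(), set()))


INDEX = build_inverted_index(DOCUMENTS)
```

Metadata search compares a query against a named field such as the author, while full-text search looks the keyword up in an index built from the document body.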

The Tufts Digital Library is designed to integrate collections that exist now or may be developed in the future. Lore, developed at Stanford, is a database management system for managing semi-structured data. NCSTRL at Cornell University is a distributed technical report library developed by the ARPA-sponsored Computer Science Technical Report Project. The NCSTRL collection is distributed among a set of interoperating servers operated by participating institutions. None of the online libraries described above supports the following functions: structure and content-based queries, automatic entry of external documents, and parallel document processing. The CLCS described in this study has the following features.

(1) Generalization. CLCS is essentially a general-purpose document database management system. It can be used to build online libraries that meet user needs, and it provides a suite of tools to maintain them.

(2) Parallelism. CLCS uses multiple processors to execute queries and manage documents, which improves both storage capacity and query efficiency.

(3) Structure and content-based retrieval. Users can query inside a document for a single element, e.g. a chapter of a book. This lets users pose more precise queries and also reduces the amount of data transmitted over the network.

(4) Personalization. CLCS can tailor queries to a user’s interests and recommend documents relevant to that user.

(5) Automatic external data entry. CLCS can work with other search engines to find and add references automatically.

(6) Multi-format support. CLCS collects many document resources, including books, journal papers and proceedings, and supports information retrieval across many document formats.

(7) DLSQL query. CLCS defines a query language similar to standard SQL, named DLSQL. Using DLSQL, users can write programs and perform all operations in CLCS.

(8) Automatic document classification. CLCS builds a classifier from the sample documents loaded by the system manager and uses it to classify documents automatically.
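Feature (3) above, structure and content-based retrieval, can be sketched as follows. This is only an illustration, assuming the book is stored as XML: the element and attribute names (book, chapter, para, number, title) and the function find_chapters are hypothetical and do not describe the actual storage format of CLCS.

```python
import xml.etree.ElementTree as ET

# Hypothetical semi-structured document: a book made of chapters.
BOOK_XML = """
<book title="Introduction to Databases">
  <chapter number="1" title="Data Models">
    <para>Relational and semi-structured data models are compared.</para>
  </chapter>
  <chapter number="2" title="Query Languages">
    <para>Query languages for text retrieval are introduced.</para>
  </chapter>
</book>
"""


def find_chapters(book_xml, keyword):
    """Return (number, title) for each chapter whose body text mentions keyword."""
    root = ET.fromstring(book_xml)
    needle = keyword.lower()
    matches = []
    for chapter in root.iter("chapter"):
        # itertext() yields the text inside the chapter, not its attributes.
        body = " ".join(chapter.itertext()).lower()
        if needle in body:
            matches.append((chapter.get("number"), chapter.get("title")))
    return matches
```

Because the query returns only the matching chapter rather than the whole book, less data has to be sent back to the user, which is the network saving described in feature (3).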

Architecture of CLCS: To meet the need for parallelization, we designed a parallel and extensible architecture, shown in Fig. 1, which is made up of three layers: Client, Mediator and Server. Client: There are two kinds of users, end users and the system manager. End users are any users connected to the Web; they access the web pages of CLCS through a URL, submit queries through the query interface and wait for the corresponding results from the system. The system manager accesses the system over the local area network, and only the system manager is authorized to perform system creation and maintenance. Whether the operation is a query submitted by an end user or a maintenance operation submitted by the system manager, the Client accepts the operation, translates it into DLSQL (the query language of CLCS), sends it to the Mediator, accepts the results returned from the Server and displays them to the user.
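The flow of a query through the three layers can be sketched roughly as below. This is a minimal sketch under stated assumptions, not the CLCS implementation: the class and method names are hypothetical, a plain keyword stands in for a DLSQL statement because the DLSQL syntax is not given here, and a thread pool is assumed as a stand-in for parallel execution on several servers.

```python
from concurrent.futures import ThreadPoolExecutor


class Server:
    """Holds one partition of the collection and answers keyword queries."""

    def __init__(self, name, documents):
        self.name = name
        self.documents = documents  # dict: document id -> text

    def execute(self, keyword):
        needle = keyword.lower()
        return [doc_id for doc_id, text in self.documents.items()
                if needle in text.lower()]


class Mediator:
    """Forwards a query to every server in parallel and merges the answers."""

    def __init__(self, servers):
        self.servers = servers

    def dispatch(self, keyword):
        with ThreadPoolExecutor(max_workers=len(self.servers)) as pool:
            partial_results = list(
                pool.map(lambda server: server.execute(keyword), self.servers))
        merged = [doc_id for part in partial_results for doc_id in part]
        return sorted(merged)


class Client:
    """Accepts the user's operation and hands it to the Mediator."""

    def __init__(self, mediator):
        self.mediator = mediator

    def query(self, keyword):
        # A real client would translate the request into DLSQL at this point.
        return self.mediator.dispatch(keyword.strip())


# Hypothetical partitions of a small collection.
servers = [
    Server("server-a", {1: "Library circulation records",
                        2: "Parallel query processing"}),
    Server("server-b", {3: "Digital library architecture",
                        4: "Document classification"}),
]
client = Client(Mediator(servers))
```

Calling client.query("library") sends the keyword to both servers at once and returns the merged list of matching document ids. Adding a server to the list extends the collection without changing the Client, which is the sense in which such an architecture is extensible.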
