APPLICATION OF FUZZY LOGIC TO DOCUMENT ARCHIVING


CHAPTER ONE

1.0     INTRODUCTION

Logic, in its literal sense, is the ability of a system to make a rational decision; it can be regarded as the theory of reasoning in decision making. Mathematically, classical logic yields one of two results, which may be written as TRUE or FALSE, 0 or 1, ON or OFF, or any other equivalent representation. This concept is referred to as Boolean logic.

Unfortunately, Boolean logic has its limitations. Because its truth values are restricted to the set {0, 1}, a condition can only be either true or false; in this sense Boolean logic is too rigid. For example, Boolean logic cannot differentiate between something that is “good” and something that is “very good”. Fuzzy logic addresses this limitation.

Fuzzy logic is a branch of logical systems and artificial intelligence. Although it had been studied since the 1920s as infinite-valued logic, notably by Łukasiewicz and Tarski, the concept was fully developed in 1965 by Lotfi A. Zadeh in one of his seminal works on “fuzzy set theory”. Fuzzy logic is a kind of logic that allows for imprecise or ambiguous answers to questions, forming the basis of computer programming designed to mimic human intelligence (Microsoft Encarta Encyclopedia, 2009). Unlike Boolean logic, fuzzy logic extends the set of truth values to the interval [0.0, 1.0] and applies a membership function to each of the elements contained in a set.

From the above, it can be seen that fuzzy logic, compared to Boolean logic, is more complex and less rigid, giving a wider range of results for a condition. Rather than merely producing true or false, fuzzy logic can produce very true, true, false or very false. This concept is regarded as the degree of truth, where 0.0 represents absolute falseness and 1.0 represents absolute truth.
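The difference between “good” and “very good” can be illustrated with a small sketch. The membership function `good` and its breakpoints (a 0 to 10 rating scale, ramping between 4 and 9) are hypothetical and chosen only for illustration. The hedge `very` is modelled here by squaring the membership degree, which is one commonly used convention and not the only possible one.

```python
def good(score: float) -> float:
    """Hypothetical membership of a 0-10 rating in the fuzzy set 'good'."""
    # Linear ramp: 0.0 at or below 4, 1.0 at or above 9 (illustrative breakpoints).
    return min(1.0, max(0.0, (score - 4.0) / 5.0))


def very(mu: float) -> float:
    """Model the hedge 'very' by squaring the membership degree (an assumption)."""
    return mu ** 2


for score in (3, 6, 8, 10):
    mu = good(score)
    # A rating can be fairly 'good' while being much less 'very good'.
    print(f"score={score}  good={mu:.2f}  very good={very(mu):.2f}")
```

Under these assumptions a rating of 8 is “good” to degree 0.80 but “very good” only to degree 0.64, a distinction Boolean logic cannot express.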

Before going deeper into fuzzy logic, the concept of defuzzification should be introduced. Defuzzification is the process of producing a quantifiable result in fuzzy logic, given fuzzy sets and their corresponding membership degrees. It is typically needed in fuzzy control systems. Such systems have a number of rules that transform a number of variables into a fuzzy result, that is, a result described in terms of membership in fuzzy sets. For example, rules designed to decide how much pressure to apply might result in "Decrease Pressure (15%), Maintain Pressure (34%), and Increase Pressure (72%)". Defuzzification interprets the membership degrees of the fuzzy sets as a specific decision or real value.
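As a minimal sketch, the pressure example can be defuzzified with a weighted average, which is one of several possible defuzzification methods. The representative output values in `centers` (a pressure change of -10, 0 and +10 units) are hypothetical; only the membership degrees come from the example above.

```python
def defuzzify_weighted_average(memberships: dict, centers: dict) -> float:
    """Collapse fuzzy memberships into one crisp value by weighted average."""
    total = sum(memberships.values())
    if total == 0:
        raise ValueError("no fuzzy set has non-zero membership")
    weighted = sum(memberships[name] * centers[name] for name in memberships)
    return weighted / total


# Membership degrees taken from the pressure example.
memberships = {"decrease": 0.15, "maintain": 0.34, "increase": 0.72}

# Hypothetical representative pressure change for each fuzzy set.
centers = {"decrease": -10.0, "maintain": 0.0, "increase": 10.0}

crisp_change = defuzzify_weighted_average(memberships, centers)
print(f"pressure change: {crisp_change:+.2f} units")
```

With these assumed centers the result is a moderate increase in pressure, since "Increase Pressure" has the largest membership degree but is tempered by the other two sets.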

Fuzzy set theory defines fuzzy operations on fuzzy sets. It models human decision making by using levels of possibility across a number of uncertain (fuzzy) categories. Fuzzy logic therefore uses IF-THEN-ELSE constructs, the basic form of which is:

                        IF variable IS property THEN action
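A rule of this form can be sketched as follows. The rule "IF age IS old THEN archive", the membership function `old` and its breakpoints (30 and 365 days) are all hypothetical and serve only to illustrate the format. Unlike a Boolean IF, the rule fires to a degree between 0.0 and 1.0.

```python
def old(age_days: float) -> float:
    """Hypothetical membership of a file age (in days) in the fuzzy set 'old'."""
    # Ramp from 0.0 at 30 days to 1.0 at 365 days (illustrative breakpoints).
    return min(1.0, max(0.0, (age_days - 30.0) / 335.0))


def apply_rule(value: float, membership, action: str):
    """IF variable IS property THEN action: return the action and its firing strength."""
    return action, membership(value)


# IF age IS old THEN archive, evaluated for a 200-day-old file.
action, strength = apply_rule(200, old, "archive")
print(f"{action} with strength {strength:.2f}")
```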

The AND, OR and NOT operators of Boolean logic are also used in fuzzy logic, where they are usually defined as MINIMUM, MAXIMUM and COMPLEMENT respectively. They are also referred to as the Zadeh operators. These operators are defined as:

-       AND: If X(a) is the degree of membership in set a for one measurable variable, and X(b) is the degree of membership in set b for another measurable variable, then the fuzzy AND is:

X(a AND b) = X(a) ∧ X(b) = min(X(a), X(b))

-       OR: If X(a) is the degree of membership in set a for one measurable variable, and X(b) is the degree of membership in set b for another measurable variable, then the fuzzy OR is:

X(a OR b) = X(a) ∨ X(b) = max(X(a), X(b))

-       NOT: For the degree of membership X(a) in set a, the fuzzy NOT is:

X(NOT a) = ¬X(a) = 1 – X(a)
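The three operators can be sketched directly in code. The function names are illustrative, and the membership degrees 0.7 and 0.4 are arbitrary sample values.

```python
def _check(mu: float) -> float:
    """Reject values outside the valid membership range [0.0, 1.0]."""
    if not 0.0 <= mu <= 1.0:
        raise ValueError(f"membership degree out of range: {mu}")
    return mu


def fuzzy_and(xa: float, xb: float) -> float:
    """Fuzzy AND: the minimum of the two membership degrees."""
    return min(_check(xa), _check(xb))


def fuzzy_or(xa: float, xb: float) -> float:
    """Fuzzy OR: the maximum of the two membership degrees."""
    return max(_check(xa), _check(xb))


def fuzzy_not(xa: float) -> float:
    """Fuzzy NOT: the complement of the membership degree."""
    return 1.0 - _check(xa)


xa, xb = 0.7, 0.4
print("AND:", fuzzy_and(xa, xb))
print("OR: ", fuzzy_or(xa, xb))
print("NOT:", round(fuzzy_not(xa), 2))
```

When the inputs are restricted to 0 and 1, these definitions reduce to the ordinary Boolean operators, so fuzzy logic extends Boolean logic instead of replacing it.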

Fuzzy logic has been applied in many areas, including medicine, engineering equipment, databases and archives. The application of fuzzy logic to archives is a branch of information retrieval.

Archiving is the process of compressing large files or data for long-term storage. Archived data usually consists of compressed files with extensions such as .zip or .rar. Archives mostly contain old files that are not needed for daily processing but are kept for reference purposes. An archive is a collection of records containing primary source documents accumulated over the lifetime of an individual or organization.

Archiving has many advantages, such as improved performance, freed storage space and reduced maintenance costs. However, organizations cannot archive as they please: data may need to remain in the database for a certain period of time before it is archived, in order to meet legal and government requirements. In more detail, the advantages of archiving include the following:

-       An efficient data archiving process can be far more cost effective than using the traditional method of simply adding more storage (disks) and servers.

-       Data archives can be used to retrieve information at a later stage if a misdemeanor or criminal act is suspected. This has become particularly important in recent years due to many incidents of criminal activity, such as drug dealing using companies’ computer resources, and even issues around terrorist activities.

-       Data archiving systems can compress the information thereby reducing the storage requirements of an organization.

-       Data or content archiving systems may automatically ensure that documents or records are not duplicated.  Again, the replication of the same information can be a massive overhead on an organization’s resources.

-       Mitigation of regulatory breaches. Implementing a data archiving system minimizes the risk of being in breach of key codes of practice and other legislation.
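The compression and de-duplication points above can be sketched with a small in-memory store. The class name `Archive` and its methods are hypothetical and do not describe any particular archiving product. The sketch assumes that two documents are duplicates when their contents are byte-for-byte identical, which it detects with a SHA-256 content hash.

```python
import hashlib
import zlib


class Archive:
    """Toy archive that compresses content and stores identical content once."""

    def __init__(self):
        self._blobs = {}  # content digest -> compressed bytes
        self._names = {}  # document name -> content digest

    def add(self, name: str, content: bytes) -> bool:
        """Store a document; return True only if its content was not already held."""
        digest = hashlib.sha256(content).hexdigest()
        is_new = digest not in self._blobs
        if is_new:
            self._blobs[digest] = zlib.compress(content)
        self._names[name] = digest
        return is_new

    def get(self, name: str) -> bytes:
        """Return the original, decompressed content of a document."""
        return zlib.decompress(self._blobs[self._names[name]])

    def stored_bytes(self) -> int:
        """Total size of the compressed content actually held."""
        return sum(len(blob) for blob in self._blobs.values())


archive = Archive()
report = b"quarterly report " * 1000
print(archive.add("report.txt", report))       # first copy is stored
print(archive.add("report_copy.txt", report))  # duplicate content is not stored again
print(archive.stored_bytes() < len(report))    # compressed size is below the original
```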

The archived data can be made available upon request. Traditionally, archived data has to be reloaded into the online database before it can be accessed. With SAP NetWeaver 2004s, however, a new method of archiving called NearLine Storage came into existence. NearLine Storage acts as an intermediate solution between traditional archiving and an online database: it allows access to the archived data without the need to reload the data into the online database. There are two types of archiving:

-       On Line Archiving: a system whereby the archive is physically attached to an organization’s network at all times. Its benefits are that it is efficient, allows fast access to archived material, and allows the archiving process to be automated.

-       Off Line Archiving: a system whereby an IT manager has to archive information from a computer network and then physically move that information to a separate system for retention. The drawback is the time and labor required to complete this task; in addition, if someone needs to access archived data, the whole procedure has to be repeated in reverse.

1.1     PROBLEM DEFINITION

As stated earlier, fuzzy logic for archiving purposes is a branch of information retrieval. In general, we are faced with the problem of selecting documentary information from storage in response to search questions (G. Klir et al, 1995). Since archiving is a very compact way of storing data, which reduces the problem of disk and space management, we shall be concerned with the storage, representation, organization and access of information items. The points below elaborate on the problems likely to be encountered when applying fuzzy logic to archives:

-       Although memory wastage is not much of a concern in an archiving system, archival storage capacity is always a concern, since archived data is generally immutable and cannot be deleted until its retention period expires. This requires careful capacity management to ensure that the archive does not run out of space.

-       Archives can literally contain hundreds of gigabytes of unique data, making the location of files tedious and time consuming. Therefore, a powerful indexing and search capability is required.

-       Data duplication can be a serious obstacle in archives: redundant data can remain in the archive for longer than expected and can lead to data inconsistency.

-       The retrieved documents have to be ranked in order of their significance with respect to the user query.

-       There is no means of expressing the degree of usefulness of a document in an archive.

1.2     AIMS & OBJECTIVES

Given the difficulties encountered in maintaining archives, and the inability to classify documents properly by their level of significance using membership functions, the aims of this research are:

-       To soften the matching mechanism into partial matching, by computing the degree of relevance of each document to the user query on the basis of the membership values of the query terms in the document representations.

-       To represent data properly so that it is clear which data belongs to which set (archive), and to use fuzzy logic operations to determine whether an item is a member, a partial member or not a member of a set.
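The partial matching described in these aims can be sketched as follows. The document names, index terms and membership values are invented for illustration. The sketch assumes that each document is represented as a mapping from index terms to membership values in [0.0, 1.0], and that a query is a non-empty list of terms combined with either the fuzzy AND (minimum) or the fuzzy OR (maximum).

```python
def relevance_and(doc: dict, terms: list) -> float:
    """Degree to which a document matches ALL query terms (fuzzy AND = min)."""
    return min(doc.get(term, 0.0) for term in terms)


def relevance_or(doc: dict, terms: list) -> float:
    """Degree to which a document matches ANY query term (fuzzy OR = max)."""
    return max(doc.get(term, 0.0) for term in terms)


def rank(archive: dict, terms: list, combine) -> list:
    """Return (name, relevance) pairs, most relevant first, dropping non-members."""
    scored = [(name, combine(doc, terms)) for name, doc in archive.items()]
    matches = [pair for pair in scored if pair[1] > 0.0]
    return sorted(matches, key=lambda pair: pair[1], reverse=True)


# Hypothetical archive: term -> membership value of that term in the document.
archive = {
    "invoice_2009.pdf": {"invoice": 0.9, "tax": 0.6},
    "tax_memo.doc": {"tax": 0.8, "invoice": 0.2},
    "minutes.txt": {"meeting": 0.7},
}

print(rank(archive, ["invoice", "tax"], relevance_and))
print(rank(archive, ["invoice", "tax"], relevance_or))
```

Under these assumptions a document is no longer simply retrieved or rejected: "tax_memo.doc" is a partial member of the result set for the AND query with degree 0.2, while "minutes.txt" is not a member at all and is left out.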
