EFFICIENT AND SECURE STORAGE METHOD FOR LARGE SCALE FILE SERVERS UTILIZING CLIENT SIDE DE-DUPLICATION

EFFICIENT AND SECURE STORAGE METHOD FOR LARGE SCALE FILE SERVERS UTILIZING CLIENT SIDE DE-DUPLICATION (ELECTRICAL AND ELECTRONIC PROJECT TOPIC)

Abstract

According to a recent survey by International Data Corporation [63], 75% of today’s digital data are duplicate copies. To reduce these redundant copies, storage servers perform de-duplication, either at the file level or on chunks of data 4 KB and larger. De-duplication can be managed at both the server side and the client side. Identifying duplicate copies requires that files be unencrypted. However, users may be worried about the security of their files and may want their data to be encrypted. Encryption makes ciphertext indistinguishable from random data, so identical plaintexts encrypted under randomly generated cryptographic keys will very likely produce different ciphertexts, which cannot be de-duplicated. This research presents a method that resolves the conflict between de-duplication and encryption.



Chapter One

Introduction

1.1. Background

Currently, commercial large-scale storage services, including Microsoft SkyDrive, Amazon, and Google Drive, have attracted millions of users. While data redundancy was once an accepted part of the backup process, the rapid growth of digital content in the data center has pushed organizations to rethink how they approach this issue and to look for ways to optimize storage capacity utilization across the enterprise. Explosive data growth in recent years has put considerable pressure on infrastructure and storage management.

Systems such as the Flud backup system [4] and Google [6] save on storage costs by removing duplication. According to a recent survey by IDC [63], 75% of today’s digital data are duplicate copies. To reduce these redundant copies, storage servers perform de-duplication (either at the file level or on chunks of data 4 KB and larger) by keeping only one or a few copies of each file and creating a link to that file for every user who asks to store it, regardless of how many copies there are. The copies are replaced by pointers that reference the original block of data in a way that is seamless to the user, who continues to use a file as if all of the blocks of data it contains were his or hers alone.
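The pointer-based storage described above can be illustrated with a minimal sketch. It is not the method proposed in this research: the class `DedupStore`, its methods, and the choice of SHA-256 as the chunk fingerprint are assumptions made purely for illustration, and everything is held in memory.

```python
import hashlib

CHUNK_SIZE = 4096  # 4 KB, the smallest chunk size mentioned in the text


class DedupStore:
    """Hypothetical in-memory store that keeps one copy of each distinct chunk."""

    def __init__(self):
        self.chunks = {}  # chunk digest -> chunk bytes (stored only once)
        self.files = {}   # (user, filename) -> list of chunk digests ("pointers")

    def put(self, user, filename, data):
        """Split data into fixed-size chunks and store only the unseen ones."""
        digests = []
        for offset in range(0, len(data), CHUNK_SIZE):
            chunk = data[offset:offset + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            # setdefault leaves an already stored chunk untouched
            self.chunks.setdefault(digest, chunk)
            digests.append(digest)
        self.files[(user, filename)] = digests

    def get(self, user, filename):
        """Rebuild the file by following its pointers; seamless to the user."""
        return b"".join(self.chunks[d] for d in self.files[(user, filename)])

    def stored_bytes(self):
        """Physical bytes held after de-duplication."""
        return sum(len(chunk) for chunk in self.chunks.values())


store = DedupStore()
report = b"A" * CHUNK_SIZE + b"B" * CHUNK_SIZE
store.put("alice", "report.doc", report)
store.put("bob", "copy-of-report.doc", report)
print(store.stored_bytes())  # two users, but each distinct chunk is kept once
```

Both users can read back the full file, while the store holds each distinct chunk only once. A production system would also need persistent storage, reference counting for deletion, and a policy for hash collisions, none of which are shown here.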

De-duplication can be managed at both the server side and the client side; client-side de-duplication is best known for the following benefits:

  1. Reduced bandwidth requirement
  2. Reduced storage space requirement
  3. Lower electricity consumption (hence a greener environment)
  4. Lower overall cost of storage…
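The bandwidth saving of client-side de-duplication can be sketched as follows. This is a simplified illustration and not the protocol proposed in this research: the `Server` class, the `client_store` function, and the use of a SHA-256 digest as the file identifier are all assumptions. The client sends only the digest first and uploads the file body only when the server does not already hold it.

```python
import hashlib


class Server:
    """Hypothetical storage server that indexes whole files by SHA-256 digest."""

    def __init__(self):
        self.blobs = {}   # digest -> file bytes, stored once
        self.owners = {}  # digest -> set of users linked to that file

    def has(self, digest):
        return digest in self.blobs

    def link(self, user, digest):
        """Record that the user owns a file that is already stored."""
        self.owners.setdefault(digest, set()).add(user)

    def upload(self, user, digest, data):
        self.blobs[digest] = data
        self.link(user, digest)


def client_store(server, user, data):
    """Store data for user and return the number of payload bytes uploaded."""
    digest = hashlib.sha256(data).hexdigest()
    if server.has(digest):
        # Duplicate: only the short digest crosses the network.
        server.link(user, digest)
        return 0
    server.upload(user, digest, data)
    return len(data)


server = Server()
video = b"\x00" * 1_000_000
first = client_store(server, "alice", video)   # new file: full upload
second = client_store(server, "bob", video)    # duplicate: nothing uploaded
print(first, second)
```

The second user's request transfers no file data, which is where the bandwidth and storage savings come from. The sketch deliberately ignores encryption and does not check that a client presenting a digest actually holds the file, so it should not be read as a secure design.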
