ABSTRACT
Performance analysis is the task of monitoring the behaviour of a program's execution. Its main goal is to identify the adjustments that can be made in order to improve the performance of the computer system in use. To obtain that improvement, it is necessary to find the different causes of, and contributors to, overhead. We are already in the multicore era, yet there is a gap between the levels of development of the two main divisions of multicore technology, hardware and software. This project focuses on the issues concerning performance analysis and tuning of applications running specifically on a shared-memory system, and on the development of an application that automatically extracts system characteristics and configurations. The application is designed using OODM, implemented in the C# programming language, and can be used on any Windows operating system. The application developed in this project critically analyses a multicore system, determines various causes of overhead in a multicore environment, extracts system parameters and presents various optimization strategies.
CHAPTER ONE
INTRODUCTION
1.1 Introduction
With computers playing an increasingly critical role in our day-to-day lives, it is important to know their components, how each works, and what impact each has on the performance of the computer system.
According to Arnold (1994), computer performance is characterised by the amount of useful work accomplished by a computer system relative to the time and resources used. Depending on the context, good computer performance is dependent on the available system resources. Most computer users do not know their system's specification and lack knowledge of the conventional ways of extracting system parameters; with the computerised system developed in this thesis (otherwise known as Autospec), every computer user will be able to determine the system configuration simply by installing and running the software.
System development can be likened to building a house: it demands adequate planning and preparation in order to meet the objectives of the proposed design.
The parameters or resources that are of interest in our analysis include the following (a sketch of how some of them can be retrieved programmatically is given after the list):
- Summary
- Operating system
- CPU
- RAM
- Hard drives
- Optical drives
- Motherboard
- Graphics
- Network
- Audio
- Peripherals
- Performance
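On Windows, many of these parameters can be retrieved through Windows Management Instrumentation (WMI), which the C# implementation language can query via the System.Management classes. The following is a minimal sketch, not the actual Autospec implementation, assuming a project reference to the System.Management assembly; it reads a few representative parameters (operating system, CPU and RAM) from standard WMI classes.

// Minimal sketch (not the actual Autospec code): reading a few of the
// parameters listed above from standard WMI classes on Windows.
// Assumes a reference to the System.Management assembly.
using System;
using System.Management;

class SystemSpecSketch
{
    static void Main()
    {
        // Operating system caption and version (Win32_OperatingSystem)
        foreach (ManagementObject os in new ManagementObjectSearcher(
                 "SELECT Caption, Version FROM Win32_OperatingSystem").Get())
            Console.WriteLine("OS:  {0} (version {1})", os["Caption"], os["Version"]);

        // Processor name and core count (Win32_Processor)
        foreach (ManagementObject cpu in new ManagementObjectSearcher(
                 "SELECT Name, NumberOfCores FROM Win32_Processor").Get())
            Console.WriteLine("CPU: {0}, {1} core(s)", cpu["Name"], cpu["NumberOfCores"]);

        // Installed physical memory in bytes (Win32_ComputerSystem)
        foreach (ManagementObject cs in new ManagementObjectSearcher(
                 "SELECT TotalPhysicalMemory FROM Win32_ComputerSystem").Get())
            Console.WriteLine("RAM: {0:F1} GB",
                (ulong)cs["TotalPhysicalMemory"] / (1024.0 * 1024 * 1024));
    }
}

Parameters such as the motherboard, graphics, network and audio devices are available through analogous WMI classes and can be enumerated in the same way.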
Performance analysis is the task of investigating the behaviour of a program's execution (Mario, 2009). The main aim is to find out the possible adjustments that might be made in order to enhance the performance of the computer system. Besides, the hardware architecture and the software platform (operating system) on which a program is executed have an impact on its performance. Workload characterization involves studying the user and machine environment, observing key characteristics, and developing a workload model that can be used repeatedly. Once a workload model is available, the effect of changes in the workload and system can easily be evaluated by changing the parameters of the model. This can be achieved, for example, by using compiler directives such as OpenMP to build a multithreaded application (a rough C# analogue is sketched below). In addition, workload characterization can help to determine what is normal, prepare a baseline for historical comparison, comply with management reporting, and identify candidates for optimization.
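OpenMP itself targets C, C++ and Fortran rather than C#; as a rough analogue in the C# language used for this project, a compute-intensive loop can be spread across the available cores with the Task Parallel Library. The sketch below is illustrative only and uses an assumed array-summation workload, comparable in spirit to an OpenMP parallel for loop with a reduction clause.

// Rough C# analogue of an OpenMP parallel-for reduction, using the
// Task Parallel Library. The array-summation workload is assumed for
// illustration only.
using System;
using System.Threading.Tasks;

class ParallelWorkloadSketch
{
    static void Main()
    {
        double[] data = new double[10000000];
        for (int i = 0; i < data.Length; i++) data[i] = i * 0.5;

        object gate = new object();
        double total = 0.0;

        // Each worker accumulates a thread-local partial sum over its share
        // of the iteration space; the partial sums are merged under a lock,
        // much like OpenMP's reduction(+:total).
        Parallel.For(0, data.Length,
            () => 0.0,                                    // thread-local initial value
            (i, state, local) => local + data[i],         // loop body
            local => { lock (gate) { total += local; } }  // merge step
        );

        Console.WriteLine("Sum = {0}", total);
    }
}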
Presently, multicore processor chips are being introduced in almost all the areas where a computer is needed; for example, many laptop computers have a dual-core processor inside. High Performance Computing (HPC) addresses different issues, one of which is the exploitation of the capacities of multicore architectures (Mario, 2009).
Performance analysis and optimization is a field of HPC responsible for analysing the behaviour of applications that perform large amounts of computation. Such applications often require analysis and tuning; therefore, in order to achieve better performance it is necessary to find the different causes of overhead.
There are a considerable number of studies related to the performance analysis and tuning of applications for supercomputing, but relatively few studies address applications running specifically in a multicore environment.
A multicore system is composed of
two or more independent cores (or CPUs). The cores are typically integrated
onto a single circuit die (known as a chip multiprocessor or CMP), or they may
be integrated onto multiple dies in a single chip package.
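In a shared-memory program, the number of cores (more precisely, logical processors) that the operating system exposes is the natural upper bound on useful thread-level parallelism. A minimal C# sketch of how a program such as Autospec can discover this at run time is shown below; the output is illustrative.

// Minimal sketch: reporting the logical processors and OS version that
// the .NET runtime exposes to a shared-memory application.
using System;

class CoreCountSketch
{
    static void Main()
    {
        Console.WriteLine("Logical processors: {0}", Environment.ProcessorCount);
        Console.WriteLine("Operating system:   {0}", Environment.OSVersion);
    }
}

Note that the logical processor count may exceed the number of physical cores on processors that support simultaneous multithreading.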
This thesis examines the issues involved in the performance analysis and tuning of applications running specifically on a shared-memory system, and the development of a computerized system for retrieving system specifications for possible changes. Multicore hardware is relatively more mature than multicore software, and from that reality arises the necessity of this research. We would like to emphasize that this is an active area of research; there are only some early results in the academic and industrial worlds in terms of established standards and technology, but much more will evolve in the years to come.
For several years, computer technology has been going through a phase of rapid development. In line with Moore's law, the speed of processors has been increasing very fast: every new generation of microprocessors came with a clock rate usually twice as fast as, or even much faster than, that of the previous one. That increase in clock frequency drove increases in processor performance, but at the same time the difference between processor speed and memory speed kept growing. This gap was temporarily bridged by instruction-level parallelism (ILP) (Faxen et al., 2008). Exploiting ILP means executing, in parallel, instructions that occur close to each other in the stream of instructions passing through the processor. However, it soon became apparent that more and more cycles were being spent not in processor core execution but in the memory subsystem, which includes the multilevel caching structure, and the so-called Memory Wall problem grew quite significantly because the increase in memory speed did not match that of the processor cores.
Very soon a new direction for increasing the overall performance of computer systems was proposed, namely changing the structure of the processor subsystem to utilize several processor cores on a single chip. These new computer architectures received the name of Chip Multi Processors (CMP) and provided increased performance for new generations of systems, while keeping the clock rate of individual processor cores at a reasonable level. The result of this architectural change is that it became possible to provide further improvements in performance while keeping the power consumption of the processor subsystem almost constant, a trend which appears essential not only to power-sensitive market segments such as embedded systems, but also to computing server farms, which suffer from power consumption and dissipation problems as well.