CHAPTER ONE
1.0 INTRODUCTION
Bioinformatics is the science of storing, extracting, organizing, analyzing, interpreting and utilizing information from biological sequences and molecules (Khalid, 2010). It is often defined as the application of computational techniques to understand and organize the information associated with biological macromolecules (Luscombe et al., 2001). The field has been fueled mainly by advances in DNA sequencing and mapping techniques (Khalid, 2010). Over the past few decades, rapid developments in genomic and other molecular research technologies, together with advances in information technology, have combined to produce a tremendous amount of information related to molecular biology. The primary goal of bioinformatics is to increase the understanding of biological processes (Khalid, 2010).
As biology increasingly becomes a technology-driven science, databases have become indispensable for storing not only data but also the results of experiments generated by research projects around the world (Hey et al., 2009). A biological database is a collection of information or data from a biological system, stored in a computer-readable format. Some databases are also called data repositories if they function as a place where large biological datasets can be stored and retrieved by users. Sharing of data between scientists accelerates the speed of discoveries and has the potential to greatly advance a scientific field as a whole; this is known as the Fourth Paradigm of data-driven scientific discovery (Hey et al., 2009). There are two types of biological databases: public databases that are freely accessible online, and private databases that require payment before they can be accessed (Dutilh and Keșmir, 2016).
The genome of a species encodes genes and other
functional elements, interspersed with non-functional nucleotides in a single
uninterrupted string of DNA (IHGSC, 2001).
Recognizing protein-coding genes typically relies
on finding Open Reading Frames (ORFs), stretches of nucleotides free of stop
codons, that are too long to have likely occurred by chance. Since stop
codons occur at a frequency of roughly 1 in 20 codons in random sequence, ORFs
of at least 60 amino acids will occur frequently by chance (5% under a simple
Poisson model), and even ORFs of 150 amino acids will appear by chance in a
large genome (0.05%). This poses a huge challenge for higher eukaryotes, in
which genes are typically broken into many small exons (on average 125
nucleotides long for internal exons in mammals) (IHGSC, 2001).
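The chance-ORF figures quoted above can be checked with a short calculation. The following Python sketch is illustrative only: it assumes independent codons, uses the rounded stop-codon frequency of 1 in 20 from the text alongside the value 3/64 that follows if all 64 codons are equally likely, and the function names are hypothetical. The small scanner ignores start codons and the reverse strand, so it is a simplification of real gene finding.

```python
import math

# Assumption: codons are independent; 3 of the 64 codons are stop codons.
STOP_CODONS = {"TAA", "TAG", "TGA"}
P_STOP_UNIFORM = 3 / 64   # about 1 in 21 if all codons are equally likely
P_STOP_ROUNDED = 1 / 20   # rounded figure used in the text


def prob_stop_free_exact(n_codons, p_stop):
    """Probability that n_codons consecutive random codons contain no stop."""
    return (1.0 - p_stop) ** n_codons


def prob_stop_free_poisson(n_codons, p_stop):
    """Poisson approximation: P(zero stops) with expected count n_codons * p_stop."""
    return math.exp(-n_codons * p_stop)


def longest_stop_free_run(dna):
    """Longest stop-free stretch, in codons, over the three forward frames."""
    dna = dna.upper()
    best = 0
    for frame in range(3):
        run = 0
        # Step through complete codons only.
        for i in range(frame, len(dna) - 2, 3):
            if dna[i:i + 3] in STOP_CODONS:
                run = 0
            else:
                run += 1
                best = max(best, run)
    return best


for n in (60, 150):
    print(n,
          prob_stop_free_exact(n, P_STOP_ROUNDED),
          prob_stop_free_poisson(n, P_STOP_ROUNDED))
```

Under the Poisson approximation with a stop frequency of 1 in 20, the probabilities are exp(-3) and exp(-7.5), that is, about 5% and 0.05%, in line with the values cited from the literature.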
Some regions within a protein sequence are more
conserved than others during evolution (Dutilh and Keșmir, 2016). These regions
are generally important for the function of a protein and/or the maintenance of
its three-dimensional structure, or other features related to its localization
or modification. By analyzing constant and variable properties of such groups
of similar sequences, it is possible to derive a signature for a protein family
or domain, which distinguishes its members from other unrelated proteins.
Sequence alignment allows us to discover these signatures (Dutilh and
Keșmir, 2016). Sequence alignment is defined as the bioinformatics task of
locating equivalent regions of two or more sequences, and aligning their
nucleotide or amino acid residues side by side, to maximize their similarity
(Dutilh and Keșmir, 2016). Multiple sequence alignments allow for
identification of conserved sequence regions. This is very useful in designing
experiments to test and modify the function of specific proteins, in predicting
the function and structure of proteins, and in identifying new members of
protein families (Dutilh and Keșmir, 2016).
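To make the definition of sequence alignment concrete, the sketch below aligns two short sequences globally using the classic Needleman-Wunsch dynamic programming scheme. It is a minimal illustration, not the method used in this study: the scoring values (match +1, mismatch -1, gap -2) and the toy sequences are assumptions chosen for clarity, and practical work would normally rely on established alignment software and substitution matrices.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Return (score, aligned_a, aligned_b) for a global alignment of a and b."""
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,   # align two residues
                              score[i - 1][j] + gap,       # gap in b
                              score[i][j - 1] + gap)       # gap in a

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            sub = match if a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


# Toy sequences (hypothetical, not taken from any real gene).
best_score, top, bottom = needleman_wunsch("ACGT", "AGT")
print(best_score)
print(top)
print(bottom)
```

The two returned strings have equal length, with "-" marking positions where a residue in one sequence has no counterpart in the other.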
DNA sequencing is a technique by which the exact order of nucleotides within a DNA molecule is determined (Mayor et al., 2000). Comparative data analysis provides the opportunity to determine what is shared and what is unique to each species (Mayor et al., 2000).
Growth in animals is controlled by a complex
system, in which the somatotropic axis plays a key role. The genes that operate
in the somatotropic axis are responsible for postnatal growth, mainly GH,
which acts on the growth of bones and muscles, mediated by IGF-1 (Sellier, 2000).
The growth hormone (GH) and insulin-like growth factor 1 (IGF-1) genes are
candidate genes for growth in cattle, since they play a key role in growth
regulation and development (Hossner et al., 1997; Tuggle and Trenkle, 1996).
Effects of GH on growth are observed in
several tissues, including bone, muscle and adipose tissue. These effects
result from both direct action of GH on the partition of nutrients and cellular
multiplication and IGF-1-mediated action stimulating cell proliferation and
metabolic processes associated with protein deposition (Boyd and Bauman, 1989).
IGF-1 stimulates protein metabolism and is important for the function of some
organs, being considered a factor of cellular proliferation and differentiation
(Andrea et al., 2005). Polymorphisms in the GH gene have been used as genetic
markers associated with different performance and production traits, such as
body weight, birth weight and weaning weight in goats (Wickramaratne et al.,
2010). The rabbit GH gene has already been sequenced by Wallis and Wallis
(1995) and has been investigated as a gene associated with the market weight of
commercial rabbits (Fontanesi et al., 2012). Mutations of the GH gene that
affect important production traits have been described in goats (Malveiro et
al., 2001) and poultry (Feng et al., 1997).
In chickens divergently selected for high or low growth rates, IGF-1 mRNA levels were significantly higher in the high growth rate line than in the low growth rate line (Beccavin et al., 2001). The growth hormone and insulin-like growth factor-1 (GH-IGF-1) system, which includes the growth hormone receptor (GHR), controls the number of follicles that are recruited to the rapid growth phase in animals (Roberts et al., 1994; Monget et al., 2002). It is also known that the GH-IGF-1 system has been modified as a result of selection for enhanced growth rate (Ballard et al., 1990; Ge et al., 2001). The insulin-like growth factor 1 gene (IGF1) is a candidate gene for
growth, body composition and metabolism, skeletal characteristics and growth of
adipose tissue and fat deposition in chickens (Zhou et al., 2005). Earlier research on GHR, IGF-1 and IGFBP-3 in
cattle, goats and chickens showed genetic polymorphisms and their association
with production traits (Liu et al.,
2010). The IGF1 gene is essential for normal embryonic and postnatal growth in
mammals (Bian et al., 2008).
Myostatin (MSTN), previously called growth differentiation factor 8 (GDF8),
is a member of the transforming growth factor-β (TGF-β) superfamily and a
negative regulator of muscle growth (McPherron et al., 1997). It negatively
regulates both embryonic development and adult homeostasis of skeletal muscle
(Tu et al., 2014).
It is able to negatively control the growth of muscle cells by inhibiting the
transcriptional activity of MyoD family members. Its expression is negatively
correlated with muscle weight (Weber et
al., 2005). Mutations in the myostatin gene have also been shown to cause
double muscling in humans and other species (Clop et al., 2006). These
findings suggest that strategies for inhibiting myostatin function may be
applied to improve animal growth. Homozygous and heterozygous cattle carrying
mutations in the conserved region of the MSTN gene show stronger muscling,
increased birth weight and obvious double-hip muscle characteristics (Casas et
al., 1999). As a candidate gene for the double-hip muscle trait in pigs, the
MSTN gene has an important
impact on the amount of lean meat and fat deposition (Sonstegard et al., 1998).
The rabbit is a high-quality, efficient meat-producing livestock species as well
as a common experimental animal. Therefore, information on the genetic basis and
regulatory mechanisms of its skeletal muscle growth and development has
important theoretical and practical significance (Qiao, 2014). In an F2 resource
population of chickens, SNPs of the myostatin gene were associated with
increases in abdominal fat weight, abdominal fat percentage, birth weight and
breast muscle percentage (Zhiliang et al., 2004). Notably, these data suggest
that myostatin could be an ideal molecular marker for marker-assisted selection
for skeletal muscle and adipose growth in chicken breeding programs. A TTTTA
deletion in the MSTN gene was reported to be unique to goats when compared with
sheep, cattle, water buffalo, domestic yak, pigs, and humans (Grisolia et al.,
2009; Zhang et al., 2013). Khichar et al. (2016) found an important effect of a
5-base pair (bp) deletion on the early body weight and size of goats.
1.1 Justification
Identification of a candidate gene is a powerful method for understanding the direct genetic basis involved in the expression of quantitative traits and their differences between individuals (Rothschild and Soller, 1997; Nagaraja et al., 2000). Mutations in the conserved region of the MSTN gene in chicken, rabbit and goat can activate or inhibit the gene expression product, causing a gain or loss of its function in inhibiting muscle growth; loss of this function results in excessive muscle development (Lee and McPherron, 1999). There have also been several recent examples in which comparative sequence data have led to the discovery and understanding of the function of previously undefined genes. The complete human/mouse orthologous-sequence dataset proved particularly valuable in the characterization of gene families in humans and mice (Dehal et al., 2001). For instance, by comparing olfactory receptor gene families on human chromosome 19, computational analysis indicated
that humans have approximately 49 olfactory receptor genes, but only 22 had
maintained an open reading frame and appeared functional. This contrasts with
the vast majority of the homologous mouse genes that have retained an open
reading frame. This finding of reduced olfactory receptor diversity in humans
is consistent with the reduced olfactory needs and capabilities of humans
relative to rodents (Pennacchio and Rubin, 2003).
Growth hormone (GH), a single polypeptide produced in the anterior pituitary
gland, is encoded by a gene that is a promising candidate marker for improving
milk and meat production in goats and other farm animals (Min et al., 2005).
IGF1 is a mediator of many biological effects: it increases the absorption of
glucose, stimulates myogenesis and the production of progesterone, inhibits
apoptosis, participates in the activation of cell cycle genes, increases the
synthesis of lipids, and intervenes in the synthesis of DNA, RNA and protein,
and in cell proliferation (Mohammadi et al., 2011).
The increasing availability of genomic sequence
from multiple organisms has provided biomedical scientists with a large dataset
for orthologous-sequence comparisons. The rationale for using cross-species
sequence comparisons to identify biologically active regions of a genome is
based on the observation that sequences that perform important functions are
frequently conserved between evolutionarily distant species, distinguishing
them from nonfunctional surrounding sequences (Pennacchio and Rubin, 2003).
Sequence alignment is a good way of predicting the function of a gene or
protein. Moreover, sequences contain a lot more information, such as from which
organism the gene or protein is derived, and what are the evolutionary
relationships of the gene or species with other genes or species. Much of this
information can only be discovered by finding homologs of the gene or protein
in other species (Dutilh and Keșmir, 2016).
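The idea that functionally important sequences stay conserved across species can be illustrated by scoring each column of an existing multiple alignment. The Python sketch below is a simplified example under stated assumptions: the three aligned sequences are invented toy data (not real GH, MSTN or IGF-1 sequences), the function names are hypothetical, and conservation is measured simply as the fraction of sequences sharing the most common residue in a column, with gaps counted as non-matching.

```python
from collections import Counter


def column_conservation(aligned):
    """Fraction of sequences sharing the most common residue in each column."""
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("all aligned sequences must have the same length")
    fractions = []
    for column in zip(*aligned):
        residues = [r for r in column if r != "-"]  # gaps never count as a match
        if not residues:
            fractions.append(0.0)
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        fractions.append(top_count / len(column))
    return fractions


def conserved_positions(aligned, threshold=1.0):
    """Zero-based column indices whose conservation meets the threshold."""
    scores = column_conservation(aligned)
    return [i for i, s in enumerate(scores) if s >= threshold]


# Toy alignment of three hypothetical species (illustrative data only).
toy_alignment = [
    "ATGCCA",
    "ATGACA",
    "ATG-CA",
]
print(column_conservation(toy_alignment))
print(conserved_positions(toy_alignment))
```

Columns that reach the threshold in all species are candidates for functionally constrained positions, whereas variable or gapped columns are more likely to tolerate change.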
This study is justified because a comparative genomics analysis that assesses
the similarities and differences in three growth genes, growth hormone (GH),
myostatin (MSTN) and insulin-like growth factor-1 (IGF-1), among chicken,
rabbit, and sheep will help to identify similarities or differences in the rate
of increase in growth and body size to maturity, final body size at maturity,
and body conformation at maturity. The analysis of sequences conserved between
these three species will further enrich the available information on
biologically active sequences in these species.