Algorithmic analysis of functional pathways affected by
typical and atypical antipsychotics
Arsen Arakelyan, Anna Boyajian, Levon Aslanyan, David Muradian, and Hasmik Sahakyan
"Laboratory of Information Biology" Project of the Institute of Molecular Biology and Institute for Informatics and
Automation Problems
National Academy of Sciences of Republic of Armenia
Yerevan, Armenia
ABSTRACT
The advantages of atypical over typical neuroleptics have been demonstrated in a number of clinical trials. Differences in the functional pathways affected by typical and atypical antipsychotics in the brain have been assessed using Gene Set Enrichment Analysis. The gene expression data, obtained from Gene Expression Omnibus, form a numerical array of size 4x17000, which can be treated directly neither by a statistical approach nor by the regular theory of classification and pattern recognition. An extended logic-combinatorial scheme is designed for the treatment of this kind of data set. The applied results show that atypical neuroleptics have less effect on pathways related to neurodegeneration, cognition and neuronal architectonics, as well as on the stimulation of inflammatory processes.

Keywords
Pattern Recognition, Functional Pathway, Gene Set Enrichment Analysis, Neuroleptics

1. INTRODUCTION
The analysis of functional pathways affected by second-generation atypical antipsychotics (atypical neuroleptics; AN), as compared with those of the first generation (typical neuroleptics; TN), has become a hot research topic. Promising studies suggest that atypical antipsychotics have less pronounced extrapyramidal, anticholinergic, parkinsonian and dystonic side effects [3-5]. However, more detailed studies are needed for a complete assessment of their preponderance over the TN. Since the frontal cortex is one of the most important regions for antipsychotic action, in this study we compare the effects of treatment with typical and atypical neuroleptics on functional pathways in the frontal cortex of the brain. The "Typical and atypical antipsychotic drugs effect on brain" dataset $\Im$ of the Gene Expression Omnibus (GEO) repository has been used. This dataset contains gene expressions from the frontal cortex of 13-week-old male mice treated for 28 days with antipsychotics. Chlorpromazine and thioridazine were used as typical antipsychotics, and olanzapine and quetiapine as atypical antipsychotics. Gene expression profiles in the GEO dataset were obtained using the Agilent 011978 Mouse Microarray G4121A (GEO Platform ID: GPL891, Agilent).

Generally, the basic approach to the various types of analysis of multidimensional experimental data sets is mathematical statistics (MS). Given a satisfactory amount of experimental data (statistics), it helps to conclude that certain properties and postulations hold at some probabilistic level. Simple correlation, regression and hypothesis estimation algorithms are components of the statistical approach.

A different situation appears in the area of pattern recognition (PR). There are no satisfactory statistics in this case. The heuristics are more responsible and conditional. The learning set is given as a limited number of known classifications, but it has to be large enough to describe the class properties in the application area. A number of basic approaches are known in PR: Metric Algorithms, Logic Separation (LS), Neural Networks, etc. One of the well-known classes of metric algorithms is the voting (or estimation calculation) model [1]. This is an algorithmic model with a number of additional parameters that require optimization during the learning stage.

The gene expression dataset under consideration structurally satisfies the requirements of neither statistics nor pattern recognition. Two classes are given, with two learning examples in each, whereas the feature set is very large. All this raises a novel, very specific situation for data analysis, in which it is necessary to recover the limited and valuable knowledge contained in such structures. ANOVA methods are the typical tool proposed to determine the gene sets that are differentially expressed over different experimental conditions. However, only a few studies have been concerned with the use of ANOVA when the number of genes is large and the number of observations is small. The strong normality and independence assumptions that traditional ANOVA imposes make it impractical and not powerful enough. Several improvements and alternative approaches have been developed [8]. Biclustering, or simultaneous clustering [9], in which both genes and conditions are clustered, is challenging, particularly when the goal is to find subgroups of genes and subgroups of conditions where the genes exhibit highly correlated activities over a range of conditions. Next to mention is the branch of pattern recognition named Logical Combinatorial Pattern Recognition [1], which works effectively with nonstandard classification problems. We design and extend a logic-combinatorial scheme to overcome the difficulty raised by our practical problem. Elementary classifiers, cluster analysis, testing and greedy solvers are considered and applied.

2. ALGORITHM
Pattern recognition deals with classes given by limited sets of classified examples and possibly by some hypotheses about the classes themselves. The main goal is to find an algorithm-classifier which extends the known classification to the area of unclassified objects. Formally, the conditions for correct classification of all objects can be composed, and then the problem of maximizing the number of satisfied conditions appears. For linear hyperplane classifiers, for example, we obtain systems of linear inequalities, which are not necessarily compatible in general. The question is to determine the maximal compatible subset of such a system, which is known to be a computationally NP-hard problem. The situation with the classes of typical and atypical antipsychotics given by the data $\Im$ is rather different. $\Im$, containing data on gene expression, is a numerical array of four 17000-long numerical vectors
$\Im = \{S_1, S_2, S_3, S_4\}$. The classes consist of two members each: $S_1, S_2$ are typical, and $S_3, S_4$ are atypical. It is evident that almost any single column $S_1(i), S_2(i), S_3(i), S_4(i)$ of $\Im$ can correctly classify the two drug sets, even using a simple hyperplane, and the number of such columns among the 17000 might be very large. At the same time, it is realistic that different sets of columns classify the classes differently. Formally, a collection of subsets of the set $\overline{1, n}$ is known as a set of support systems $\Omega$ [1]. A support system is the unit used in comparing a pair of object descriptions: a set of distances is defined, one for each member of $\Omega$. The applied counterpart is that a set of features neither smaller nor larger than a support system is very effective in describing a particular classification. This brings us to the problem of determining the proper column subsets (support systems) that provide the maximal difference between the classes (quality vs. accuracy of classification). In doing this we, on the one hand, eliminate columns that are equivalent (in some sense) and, on the other hand, compose sets of columns representing different equivalence subsets as approximations to the proper support systems. A last general note is that, for the classes we consider, each support system is represented by two vectors in each class. We connect these two vectors into an interval and consider the best hyperplane separation of the two intervals. We obtain the simplest geometric separation problem. The advantage is that we are able to compare support systems and find the most effective ones among them.

Classifiers
First we define elementary classifiers. These are hyperplane classifiers over a small number of columns. A 1-classifier is defined through a single column (say, the $i$-th) and its expression values $S_1(i), S_2(i)$ and $S_3(i), S_4(i)$. Denote by $\bar{t}(i)$ and $\bar{a}(i)$ the average values on the intervals $(S_1(i), S_2(i))$ and $(S_3(i), S_4(i))$ respectively, and let $t(i)$ and $a(i)$ be the lengths of these intervals. The 1-classifier $c(i)$ by the $i$-th column, $i \in \overline{1, n}$ ($n$ is the number of gene expressions), is defined as the balanced (by the values $t(i)$ and $a(i)$) middle point of the interval $(\bar{t}(i), \bar{a}(i))$, and $\bar{t}(i) - \bar{a}(i) = f(i)$ is called the power/force of $c(i)$.

A 2-classifier considers a pair of genes and their expression values. Logically, 2-classifiers are to be composed of pairs of genes ranked higher by the corresponding 1-classifiers. Arranging the columns in decreasing order of the values $f(i)$, we rank the gene expressions by their force in differentiating the two drug groups. 2-classifiers, and $k$-classifiers in general, consider any $k$ columns, construct the average values on the corresponding intervals in the classes (intervals by row vector pairs) and define the structures $c(i_1, \ldots, i_k)$ and $f(i_1, \ldots, i_k)$. $c(i_1, \ldots, i_k)$ defines the hyperplane separating the average expressions by drug groups and gene collections, and $f(i_1, \ldots, i_k)$ defines the quality of this separation.

Generally, $k$-classifiers examine $k$ columns, construct convex hulls in the areas of the two considered classes, and consider the geometrical centers and the balanced middle point, which serves as the value for classification. In our case the convex hulls are just intervals of a multidimensional vector space. The force $f(i_1, \ldots, i_k)$ is defined through the projections of the intervals onto the separating hyperplanes. The projection area, divided by the length of the interval projections, provides a measure of the force of separation that is comparable for all $k$.

Future Work – Growing Support Systems.
Among the $2^n$ elementary classifiers defined above, we intend to find those for which the corresponding subsets of genes are most differentially expressed between the drug groups. The simplest way is to start with a 1-classifier and grow it step by step into $k$-classifiers so that the forces are strictly increasing, with interruption at the $k$-th step. Any $k$-classifier may be considered as a composition of one $(k-1)$-classifier with one new column. The concepts $c(i_1, \ldots, i_k)$ and $f(i_1, \ldots, i_k)$ in this way introduce a monotonicity relation between gene sets, which are put into 1-1 correspondence with the vertices of an $n$-dimensional unit cube. However, this might be rather hard to fulfill, because for large values of $k$ it becomes impossible to consider all $2^k$ sub-classifiers. The search area for these subsets is very large, and appropriate heuristics are necessary to combat this complexity. We consider several heuristics:

Sorting 1-classifiers by decreasing forces $f(i)$ and eliminating from further treatment the columns with forces lower than a selected threshold. Let the columns in the sorted sequence be $i_1, i_2, \ldots, i_k, \ldots, i_n$. An important property of this sequence is the first index $i_k$ such that the forces $f(i_1, \ldots, i_j)$ are increasing for $j < k$ and this increase is interrupted at the point $i_k$. Besides, sorting may also be applied to mixed sets of classifiers because of the note on the comparability of forces for different $k$'s.

Consider an arbitrary hyperplane elementary classifier $c(i_1, \ldots, i_k)$. Compose an $n$-dimensional binary vector by setting the coordinates $i_1, \ldots, i_k$ to 1. Completing with 0 all the coordinates not used in $c(i_1, \ldots, i_k)$, we create a 1-1 correspondence between classifiers and $n$-cube vertices. Applying hierarchical clustering in the $n$-cube layers, we split the $k$-classifiers by an equivalence relation (after some cut of the dendrogram). The similarity measure used is a correlation between the hyperplanes (their coefficient vectors). We consider sets of representatives of the clusters. Some of them may give the same force of classifying the drug groups by gene expressions as the whole descriptive table does. In this way we reduce the dimensionality, combating the exponential explosion for large $n$.

As mentioned, 1-classifiers can be directly sorted by their forces. Any $k$-classifier may be considered as a composition of one $(k-1)$-classifier $c(i_1, \ldots, i_{k-1})$ with one new column $i_k$. In terms of class vectors this change means the concatenation of a new dimension in the direction $i_k$. The concepts $c(i_1, \ldots, i_k)$ and $f(i_1, \ldots, i_k)$ in this way introduce a monotonicity relation between gene sets in the same way as between the vertices of the $n$-dimensional unit cube, which are in 1-1 correspondence with the elementary classifiers. Considering subsets of different $n$-cube layers and taking monotonicity into account, we may apply the chain split technology [10] to find the best separating gene sets. It is important to note that chain split (and other known frequent-subset growing algorithms of association rule mining) works either with random objects or with the overall structure of all objects, which is computationally hard. Instead, the sets of representatives mentioned above are a valuable heuristic that may help reduce the computational complexity of growing.
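The sorting and growing heuristics described in this section can be illustrated with a minimal Python sketch. It assumes the data are given as four rows over the same columns (two typical samples followed by two atypical ones). All function names (`one_classifier`, `rank_columns`, `k_force`, `grow_support`) are hypothetical. The force is taken as the absolute difference of the class averages, the balancing rule for the middle point is only one possible reading of the definition, and `k_force` is a placeholder standing in for the projection-based force, which is not specified in enough detail here to implement.

```python
import math
from typing import List, Sequence, Tuple

Rows = Sequence[Sequence[float]]  # four rows: typical, typical, atypical, atypical


def one_classifier(t1: float, t2: float, a1: float, a2: float) -> Tuple[float, float]:
    """Return (threshold, force) of the 1-classifier for a single column."""
    t_mean, a_mean = (t1 + t2) / 2, (a1 + a2) / 2
    t_len, a_len = abs(t1 - t2), abs(a1 - a2)
    force = abs(t_mean - a_mean)  # assumption: absolute difference of averages
    total = t_len + a_len
    # Assumed balancing rule: the gap between the averages is split in
    # proportion to the interval lengths (plain midpoint if both are zero).
    weight = 0.5 if total == 0 else t_len / total
    return t_mean + weight * (a_mean - t_mean), force


def rank_columns(data: Rows, min_force: float) -> List[Tuple[int, float]]:
    """Sort columns by decreasing force, dropping those below min_force."""
    forces = []
    for i in range(len(data[0])):
        _, f = one_classifier(data[0][i], data[1][i], data[2][i], data[3][i])
        if f >= min_force:
            forces.append((i, f))
    return sorted(forces, key=lambda pair: pair[1], reverse=True)


def k_force(data: Rows, cols: Sequence[int]) -> float:
    """Placeholder force of a k-classifier (NOT the projection-based measure):
    distance between the class-average vectors, scaled by sqrt(k)."""
    sq = 0.0
    for i in cols:
        t_mean = (data[0][i] + data[1][i]) / 2
        a_mean = (data[2][i] + data[3][i]) / 2
        sq += (t_mean - a_mean) ** 2
    return math.sqrt(sq) / math.sqrt(len(cols))


def grow_support(data: Rows, ordered_cols: Sequence[int]) -> List[int]:
    """Add columns in ranked order while the force strictly increases."""
    chosen = [ordered_cols[0]]
    best = k_force(data, chosen)
    for i in ordered_cols[1:]:
        candidate = k_force(data, chosen + [i])
        if candidate <= best:  # first interruption of the increase
            break
        chosen.append(i)
        best = candidate
    return chosen
```

A typical use would pass the column indices returned by `rank_columns` to `grow_support`, so that growth proceeds in order of decreasing 1-classifier force and stops at the first step where the force no longer increases. Any other comparable-across-$k$ force can be substituted for `k_force` without changing the growing loop.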
Consider the convex hull $\Xi$ of all classifiers $c_j$, $j \in \overline{1, 2^n}$, in the $n$-dimensional vector space. The volume and shape of $\Xi$ appear as a sophisticated measure of the differences between the drug groups, as characterized by the gene expressions. Approximation of $\Xi$ by smaller groups of genes can be achieved in different ways. Such smaller subsets are effective candidates for separating the drug group-driven expression differences. These subsets can be compared with the functional gene subsets describing the drug influences. A satisfactory approximation of $\Xi$ by gene sets or by classifier sets shows that these subsets preserve the diversity of the drug groups. The approximation we considered is a greedy algorithm, given in [...].

3. APPLIED MODEL
To generate the ranked gene list, the average intragroup (TN and AN) expression levels were first calculated for each gene. Then, for each gene, the average level of the TN group was subtracted from the average level of the AN group. Finally, the gene list was sorted by decreasing average difference between the groups (from largest to smallest). Further, Gene Set Enrichment Analysis (GSEA) [6] was applied to identify the functional pathways (together with the genes involved in each) affected by typical and atypical antipsychotic treatment. Given an a priori defined set of genes (e.g., genes encoding proteins of a metabolic pathway, located in the same cytogenetic band, or sharing the same GO category), the goal of GSEA is to determine whether the members of the set are randomly distributed throughout the ranked gene list or primarily found at the top or bottom. Whenever a gene belonging to the functional set is found, an enrichment statistic (ES) is increased by a certain amount; otherwise the ES is decreased. The enrichment score is the maximum deviation from zero in the random walk and corresponds to a support set elementary classifiers statistic. The minimum and maximum of this enrichment score are used to estimate the significance of the enrichment.

For the GSEA analysis of the generated ranked gene list, the GeneTrail web-based software developed by the Center of Bioinformatics of Saarland University was used [7]. For multiple testing adjustment, Benjamini and Hochberg's false discovery rate (FDR) was used. P values less than 0.05 (after FDR correction) were considered significant. GeneTrail covers a wide variety of biological categories and pathways, from which we chose KEGG.

4. RESULTS AND DISCUSSION
The results of this study showed different patterns of gene expression in the frontal cortex of mice treated with typical and atypical antipsychotics. All identified functional pathways were up-regulated in the TN group compared to the AN group (Table 1).

The results obtained suggest that AN, as compared to TN, have less influence on regulatory pathways contributing to neurodegeneration (Huntington's disease), neuronal architectonics (Axon guidance, Gap junction) and cell proliferation (Glioma). Moreover, according to our findings, TN strongly affect the GnRH signaling pathway, as well as immune response regulatory reactions (Focal adhesion and Cell adhesion molecules), whereas AN have a very weak influence on these processes. In addition, our study revealed that AN have less pronounced effects on cognition, particularly related to learning and memory (Long-term potentiation, Long-term depression), than TN do.

5. CONCLUSION
The benefits of AN, compared to TN, include fewer side effects related to functional pathways of the brain frontal cortex. Extended classification algorithms are designed to analyze the applied data, which are of a very specific structure.

REFERENCES
1. Yu. Zhuravlev, Selected research publications, Magistr, Moscow, 1998, 420p (in Russian).
2. L. Aslanyan, J. Castellanos, "Logic based Pattern Recognition - Ontology content (1)", Int. Journal "Information Technologies and Knowledge", v.1, 2007.
3. R. Galili-Mosberg, et al. "Haloperidol-induced neurotoxicity - possible implications for tardive dyskinesia", J. Neural Transm., 107 (4), pp. 479-490, 2000.
4. I. Gil-ad, et al. "Evaluation of the neurotoxic activity of typical and atypical neuroleptics: relevance to iatrogenic extrapyramidal symptoms", Cell. Mol. Neurobiol., 21 (6), pp.
5. S. Hakobyan et al. "Classical pathway complement activity in schizophrenia", Neuroscience Letters 374, pp. 35-37, 2005.
6. A. Subramanian, et al. "Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles", PNAS, 102(43), pp. 15545-15550, 2005.
7. C. Backes et al. "GeneTrail - advanced gene set enrichment analysis", Nucleic Acids Research, Web Server Issue 2007.
8. G.F. Von Borries, Partition clustering of high dimensional low sampling size data based on p-values, PhD dissertation, Kansas State University, 2008, p. 139.
9. F. Divina and J. S. Aguilar-Ruiz, "Biclustering of expression data with evolutionary computation", IEEE Transactions on Knowledge and Data Engineering, vol. 18, pp. 590–602.
10. L. Aslanyan and H. Sahakyan, Chain split and computation in practical rule mining, Information Science and Computing, International book series no. 8, Classification, forecasting, data mining, 2009, pp. 132-135.
11. H. Sahakyan, L. Aslanyan, Differential Balanced Trees and (0,1)-Matrices, International Journal "Information Theories and Applications", ISSN 1310-0513, 2003, Volume 10, Number 4, pp. 363-369.
Table 1. Functional pathways affected by treatment with typical and atypical neuroleptics
Source: https://csit.am/2009/proceedings/7PRIP/1.pdf