Determination of Quantal Dose-Response Characteristics in Phenotypic
Assays using Supervised Classification
Daniel Asarnow1 and Rahul Singh1*
1Department of Computer Science, San Francisco State University
*Corresponding author R. Singh,
[email protected]
Abstract
We have designed and implemented a fully automatic, high-throughput screen against the causative parasite of the devastating illness schistosomiasis, using computer vision and machine learning. The computer vision component segments (individually recognizes and delineates) schistosomula in bright-field micrographs, including touching and partially overlapping parasites. A learning model employs support vector machines to identify schistosomula which differ significantly from controls. Classification is performed in a high-dimensional feature space, the dimensions of which correspond to measurements of appearance, shape and texture. Because variation between different populations of schistosomula unavoidably creates different baselines for different experiments, classification is conducted in two stages: one in which putatively "normal" parasites are identified within each control image and used to derive an estimated control centroid, and one in which all parasites are classified as "normal" or "degenerate" on the basis of tuples composed of a given parasite's feature vector and the corresponding control centroid. Finally, a continuous measurement of the phenotypic response to a particular experimental condition (such as the concentration of a certain drug) is produced using the notion of quantal response, that is, the proportion of individuals in a particular subpopulation which differ from controls. The learning model is demonstrated to be highly effective, with an accuracy of 0.89 on test data. Dose-response curves produced for four compounds (fluvastatin, niclosamide, praziquantel and simvastatin) with the automated method are also tightly correlated to those produced by human experts, with high statistical significance. Correlation values are > 0.97 in all cases with p-values << 10^-3.
Background
Schistosomiasis is a devastating parasitic illness which is widely considered to be the second most
socio-economically devastating disease (after malaria) [1]. The World Health Organization has declared an
urgent need for development of new drugs for treatment of schistosomiasis. In order to support rapid,
phenotypic screening against schistosomiasis, we have developed a fully automated assay for
determination of the quantal phenotypic responses of schistosomula (juvenile schistosomes) populations
to varying experimental conditions. In contrast to our previous work in automated, image-based
classification of parasites [2,3], as well as other work focused on hit detection within a single
experimental condition [4], the assay presented below produces quantitative measurements of drug
responses across multiple conditions. These data may be used to construct complete dose-response (or
time-response) curves, without sacrificing any functionality with respect to hit detection. In order to
demonstrate the efficacy of the novel automated assay, we present an application of the assay to screening
against four compounds (fluvastatin, niclosamide, praziquantel and simvastatin), including the
determination of complete dose-response curves for each drug.
Challenges and recent advances
Development of automated, phenotypic screens against complex macroparasites, such as schistosomula,
poses several challenges:
Segmentation of individual parasites and tracking individuals across time
Definition and accurate measurements of salient phenotypic features (descriptors)
Definition of phenotype as a function of descriptors
Methodology for automated classification of parasites based on phenotypic descriptors
Validation and/or comparison of the automated assay values vis-à-vis manual assays
Recently, significant progress has been made in addressing these challenges in the context of screens against schistosomiasis [2,3,5], and an introduction to the basic issues has become available [6]. Specifically, we have reported a novel segmentation method for micrographs of wells containing large numbers of schistosomula, which is effective even for touching and partially overlapping parasites [5]. While existing segmentation methods have also been applied to schistosomula, these methods are generally incapable of separating touching or overlapping parasites [4]. This leads directly to two problems. First, schistosomula must be carefully re-suspended to minimize their contact with one another, and second, any touching parasites which remain must be identified (e.g. by anomalous size) and discarded. Sets of visual features for schistosomula, as well as approaches to defining phenotype based on such features, have also been proposed [3,4]. Here we propose a new set of features partially overlapping with those used in past works. These features are designed to effectively represent the appearance, shape and texture of schistosomula. Compared with previously used features, the proposed set is significantly expanded in terms of statistical analysis of images of single (segmented) parasites as well as the multi-scale representation of textures. The above advances are all essential for the automated assay described in this paper.
Data collection
Parasites were harvested, distributed into 96-well plates and exposed to compounds following the method
of [7]. Four compounds capable of generating phenotypic response dynamics were tested:
Praziquantel (PZQ), the current therapy for schistosomiasis (with unknown mechanism) [8].
Two compounds from the statin family of drugs, simvastatin and fluvastatin. Statins have been
found to kill schistosomes by preventing mevalonate synthesis [9].
The anthelmintic salicylanilide, niclosamide, used previously against the disease vector and
secondary host snail [10].
Each drug was prepared in a 10-fold serial dilution from a 20 mM stock in DMSO, starting from 2 mM. One microliter of each dilution was spotted into flat-bottom 96-well plates. Schistosomula (400 units/well) were then added in 200 µL Basch medium 169 [11] containing 5% FBS and 1x penicillin/streptomycin solution (yielding final drug concentrations between 0.001 μM and 10 μM). The final concentration of DMSO was 0.5% and plates were incubated at 37 °C and 5% CO2. Parasites were photographed every 24 h for 4 days [7] using a Zeiss Axiovert 40 C inverted microscope (5X objective) and a Zeiss AxioCam MRc digital camera controlled by AxioVision 40 (version 4.8.1.0) software. This protocol was used to obtain 88 images containing approximately 4,056 schistosomula, representing four complete sets of concentration points for each drug (plus controls).
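As a consistency check on the concentrations quoted above, the following minimal sketch recomputes the final well concentrations and the DMSO fraction. It assumes five 10-fold dilution steps (inferred from the stated range of 10 μM down to 0.001 μM, not stated explicitly in the protocol) and approximates the well volume by the 200 µL of medium. All constant and function names are illustrative only.

```python
# Values taken from the protocol text; the number of steps is an assumption.
START_MM = 2.0            # most concentrated dilution in the series (mM)
SPOT_VOLUME_UL = 1.0      # volume of drug solution spotted per well (uL)
MEDIUM_VOLUME_UL = 200.0  # volume of medium added per well (uL)
N_STEPS = 5               # assumed: 10, 1, 0.1, 0.01 and 0.001 uM


def final_concentrations_um(start_mm=START_MM, steps=N_STEPS):
    """Final well concentrations (uM) for a 10-fold serial dilution."""
    # Approximation: well volume ~ medium volume (ignores the 1 uL spot).
    plate_dilution = SPOT_VOLUME_UL / MEDIUM_VOLUME_UL
    series_mm = [start_mm / (10 ** k) for k in range(steps)]
    return [c * 1000.0 * plate_dilution for c in series_mm]  # mM -> uM


def dmso_percent():
    """Percent DMSO (v/v) per well under the same volume approximation."""
    return 100.0 * SPOT_VOLUME_UL / MEDIUM_VOLUME_UL


print(final_concentrations_um())
print(dmso_percent())
```

Under these assumptions the series spans 10 μM to 0.001 μM at 0.5% DMSO, in agreement with the values given in the text.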
Manual annotation
A human expert manually classified each of the segmented parasites as either "normal" or "degenerate"
relative to controls using a custom, web-based interface. Non-segmented parasites were ignored. This is
the extent of the "supervision" applied during the machine learning approach used herein. The
classifications were entered into a relational database which permitted association of each parasite with
the corresponding set of control parasites. In addition to training and validation for machine learning,
these data allow manually determined quantal responses to be computed by counting the degenerate
annotations.
Image segmentation
We use a previously described method, designed to surmount the difficulties presented by image
segmentation of schistosomula [5]. These include gross variation in morphology, texture and behavior as
well as the tendency for multiple parasites to touch or overlap one another, each of which may be
dramatically and unpredictably modulated by drug exposure. The segmentation algorithm avoids any a priori
shape models in favor of a purely signal-based, bottom-up approach. A novel binary edge
classifier employing phase congruency and grayscale morphological thinning is used to split any
foreground regions containing multiple schistosomula. This method was shown to individually segment
95.9% of parasites, with boundaries deviating from ground truth by an average of 1.3 pixels [5]. This high level of
accuracy is essential for extraction of high-quality features for efficient representation of parasite
phenotype. The reader may note that with the use of this method, data sets need not be culled of
abnormally sized (i.e. erroneously split or merged) parasites, in contrast to [4].
Feature extraction
A feature set partially overlapping with those of [3,4] provides a numerical representation of segmented
parasites. The descriptors compactly represent parasites in terms of appearance, shape and texture, and are
listed in Table 1. The features are briefly described below; more details can be found in the given
references.
Pixel intensity distributions (as well as wavelet texture responses, vide infra) are summarized using the mean, standard deviation and standardized central moments of up to fifth order (i.e. skewness, kurtosis and tail asymmetry). The standardized central moment M_α of order α for a distribution x is calculated from the mean μ_x and standard deviation σ, following (1).

$$M_\alpha = \frac{E\left[(x - \mu_x)^\alpha\right]}{\sigma^\alpha} \qquad (1)$$
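The standardized central moments can be sketched as follows, using the usual definition M_α = E[(x − μ_x)^α] / σ^α. This is a minimal illustration, not the authors' implementation: the population (1/N) normalization and the function names are assumptions, since the report does not specify them.

```python
import math


def standardized_moment(values, order):
    """Standardized central moment of the given order (population form)."""
    n = len(values)
    mean = sum(values) / n
    sigma = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    if sigma == 0.0:
        return 0.0  # convention chosen here for constant input
    central = sum((v - mean) ** order for v in values) / n
    return central / sigma ** order


def intensity_summary(values):
    """Mean, standard deviation and moments of order 3 to 5."""
    n = len(values)
    mean = sum(values) / n
    sigma = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    return {
        "mean": mean,
        "std": sigma,
        "skewness": standardized_moment(values, 3),
        "kurtosis": standardized_moment(values, 4),
        "fifth": standardized_moment(values, 5),
    }


summary = intensity_summary([1, 2, 3, 4, 5])
print(summary)
```

Applied to the pixel intensities of one segmented parasite, this yields five numbers per distribution; the same summary could be applied to each set of wavelet filter responses.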
Recognizing that drug exposure alters internal anatomic features of schistosomula, we represent the appearance and distribution of the visible internal anatomy using 1) a threshold separating pixels occupied by anatomical structures within the parasite from parasite body and 2) the proportion of the parasite area occupied by anatomical structures. Two thresholds are used, one obtained using the EM algorithm [12] and one obtained using Otsu's method [13].
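The Otsu-based variant of these two descriptors can be illustrated with a short, self-contained sketch. It assumes 8-bit integer gray levels and, as a labeled assumption, that internal anatomical structures are the darker pixels (at or below the threshold); the report does not state which side of the threshold corresponds to anatomy. The EM-based threshold is not shown.

```python
def otsu_threshold(pixels, levels=256):
    """Gray level t maximizing between-class variance (classes <= t and > t)."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0, sum0 = 0, 0
    for t in range(levels):
        w0 += hist[t]            # pixels in the lower class
        sum0 += t * hist[t]      # intensity mass of the lower class
        w1 = total - w0          # pixels in the upper class
        if w0 == 0:
            continue
        if w1 == 0:
            break
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        between = w0 * w1 * (mu0 - mu1) ** 2
        if between > best_var:
            best_t, best_var = t, between
    return best_t


def structure_fraction(pixels, threshold):
    """Fraction of parasite pixels at or below the threshold (assumed anatomy)."""
    return sum(1 for p in pixels if p <= threshold) / len(pixels)


parasite_pixels = [10] * 50 + [200] * 50  # toy data, not from the report
t = otsu_threshold(parasite_pixels)
print(t, structure_fraction(parasite_pixels, t))
```

Both the threshold and the resulting area fraction would then enter the feature vector, giving the two descriptors per thresholding method described above.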
Another aspect of parasite appearance is general shape. The relative convexity of a parasite is measured using solidity, the ratio of the area of an object to the area of its convex hull, while the shape of the spatial intensity distribution is captured using the eight invariant image moments of up to order three [14]. Analogous to moments of inertia, but with gray-level intensity taking the place of mass, these moments provide a description of the spatial distribution of image intensities which is invariant to translation, rotation and scaling.
Texture
Two approaches to the extraction and representation of texture are employed, based on gray-level
co-occurrence matrices (GLCM) [15] and log-Gabor wavelet transforms [16–18].
Gray-level co-occurrence matrices hold the joint probability for a given pair of gray levels to be separated by a particular displacement vector. Texture is then represented via measurements of the statistical properties of the GLCM (contrast, correlation, energy, entropy, homogeneity). Texture effects at different image scales are captured using GLCM computed with five displacement scales (3, 7, 15, 29 and 59 pixels), and orientation independence within each scale is obtained by summing the GLCM across four equally spaced orientations. In addition to GLCM entropy, the raw image entropy is also computed.
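The GLCM construction described above can be sketched in plain Python. This is an illustration under stated assumptions rather than the authors' code: the image is a small list of lists already quantized to `levels` gray levels, the four orientations are taken to be 0, 45, 90 and 135 degrees, the matrix is not symmetrized, and homogeneity uses the 1 / (1 + |i − j|) weighting (other definitions exist).

```python
import math

# Row/column steps for 0, 45, 90 and 135 degrees (assumed orientations).
DIRECTIONS = [(0, 1), (-1, 1), (-1, 0), (-1, -1)]


def glcm(image, distance, levels):
    """Normalized co-occurrence matrix summed over four orientations."""
    counts = [[0] * levels for _ in range(levels)]
    rows, cols = len(image), len(image[0])
    for dr, dc in DIRECTIONS:
        for r in range(rows):
            for c in range(cols):
                r2, c2 = r + dr * distance, c + dc * distance
                if 0 <= r2 < rows and 0 <= c2 < cols:
                    counts[image[r][c]][image[r2][c2]] += 1
    total = sum(sum(row) for row in counts)
    if total == 0:
        raise ValueError("distance is too large for this image")
    return [[v / total for v in row] for row in counts]


def glcm_statistics(p):
    """Contrast, correlation, energy, entropy and homogeneity of a GLCM."""
    n = len(p)
    cells = [(i, j, p[i][j]) for i in range(n) for j in range(n)]
    mu_i = sum(i * v for i, _, v in cells)
    mu_j = sum(j * v for _, j, v in cells)
    sd_i = math.sqrt(sum((i - mu_i) ** 2 * v for i, _, v in cells))
    sd_j = math.sqrt(sum((j - mu_j) ** 2 * v for _, j, v in cells))
    cov = sum((i - mu_i) * (j - mu_j) * v for i, j, v in cells)
    return {
        "contrast": sum((i - j) ** 2 * v for i, j, v in cells),
        # Correlation is undefined for a constant image; 0.0 is used here.
        "correlation": cov / (sd_i * sd_j) if sd_i > 0 and sd_j > 0 else 0.0,
        "energy": sum(v * v for _, _, v in cells),
        "entropy": -sum(v * math.log2(v) for _, _, v in cells if v > 0),
        "homogeneity": sum(v / (1 + abs(i - j)) for i, j, v in cells),
    }


checker = [[0, 1], [1, 0]]  # toy image, not from the report
print(glcm_statistics(glcm(checker, distance=1, levels=2)))
```

Repeating the computation with each of the five displacement magnitudes gives five statistics per scale.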
Wavelet response statistics are an effective representation of visual texture [16]. In particular, we use log-Gabor wavelets, which have properties in common with some neurons in the mammalian visual cortex [18]. Such filters optimally localize frequency and phase information in images and provide a natural framework for multiscale analysis. Five scales are used, with center frequencies roughly corresponding to the displacement vector magnitudes used for GLCM construction. These log-Gabor filters are constructed using (2).

$$G(\omega) = \exp\left(-\frac{\left[\ln(\omega/\omega_0)\right]^2}{2\left[\ln(\kappa/\omega_0)\right]^2}\right) \qquad (2)$$

Here ω0 indicates the center frequency, and the scale progression and the constant κ are selected such that the ratio κ/ω0 remains constant.
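A minimal sketch of the radial log-Gabor transfer function follows, assuming the commonly used form G(ω) = exp(−[ln(ω/ω0)]² / (2 [ln(κ/ω0)]²)). The ratio κ/ω0 = 0.65 and the choice of wavelengths equal to the GLCM displacement magnitudes are illustrative assumptions only; the report states neither value.

```python
import math


def log_gabor(omega, omega0, kappa_over_omega0=0.65):
    """Radial log-Gabor response at frequency omega for center frequency omega0."""
    if omega <= 0:
        return 0.0  # the filter has no DC component
    numerator = math.log(omega / omega0) ** 2
    denominator = 2.0 * math.log(kappa_over_omega0) ** 2
    return math.exp(-numerator / denominator)


# Hypothetical wavelengths in pixels (assumed equal to the GLCM displacements).
WAVELENGTHS = [3, 7, 15, 29, 59]
CENTER_FREQUENCIES = [1.0 / w for w in WAVELENGTHS]

for w0 in CENTER_FREQUENCIES:
    # Response one octave above the center frequency is the same at every
    # scale because the ratio kappa / omega0 is held constant.
    print(round(w0, 4), round(log_gabor(2.0 * w0, w0), 4))
```

Holding κ/ω0 fixed gives every scale the same bandwidth in octaves, which is the property the text refers to.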
In a manner similar to the representation of parasite appearance, wavelet textures are summarized using the mean, standard deviation and standardized central moments of the filter responses, as determined by (1). Five scales are used here, with five statistics per scale, leading to 25 wavelet texture dimensions.
In aggregate, the descriptors described above correspond to a feature space with 71 separate dimensions.
Table 1. Features used to represent phenotype.
Area : convex area
Foreground-background
Pixel intensities
ML, Otsu, threshold
occurrence matrix
Internal thresholds
Stat. moments of log-Gabor filter responses
All 8 invariant moments
Automated classification of phenotypes
Schistosomula are automatically classified as "normal" or "degenerate" using the soft-margin SVM [19]
in conjunction with the sequential minimal optimization algorithm [20] and a Gaussian radial basis
function (RBF) kernel [21]. The scale of the Gaussian RBF, σ, and the magnitude of the soft-margin box
constraint, C, constitute the key parameters of this method. We use values of σ = 6.9 and C = 3.28.
Training is conducted using 10-fold cross-validation and all 10 SVMs obtained through cross-validation
training are applied to prediction (using majority voting between the classifiers). Cross-validation folds
were sampled using stratification in order to preserve the distributions of the "normal" and "degenerate"
classes.
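Majority voting across the bank of cross-validation classifiers can be sketched as below. The classifiers are represented by arbitrary callables standing in for trained SVMs, and the tie-breaking rule (needed because ten voters can split 5 to 5) is an assumption; the report does not say how ties are resolved.

```python
from collections import Counter


def majority_vote(labels, tie_break="normal"):
    """Most common label; falls back to tie_break when the top two tie."""
    if not labels:
        raise ValueError("at least one vote is required")
    ranked = Counter(labels).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return tie_break  # assumed rule, not taken from the report
    return ranked[0][0]


def ensemble_predict(classifiers, features, tie_break="normal"):
    """Apply every classifier to one feature vector and vote."""
    votes = [clf(features) for clf in classifiers]
    return majority_vote(votes, tie_break)


# Hypothetical stand-ins for ten trained SVMs: each thresholds feature 0.
bank = [lambda f, cut=c: "degenerate" if f[0] > cut else "normal"
        for c in range(10)]
print(ensemble_predict(bank, [7.5]))
```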
Due to significant natural variations between subpopulations of parasites (e.g. those harvested on different dates), we find that classification using parasite feature vectors directly is insufficiently accurate across such subpopulations. In order to correct for these confounding variations, feature vectors for healthy ("normal") parasites from the control images for each subpopulation are averaged to yield a single vector representative of that subpopulation. When parasites are classified using a tuple of vectors containing their unique features as well as the average features of their corresponding controls, classification performance is significantly enhanced.
Automated classification of parasite phenotypes is thus conducted using a two-step approach. First, "normal" control parasites are identified using a set of SVMs trained on raw feature vectors, and these parasites are then used to construct a subpopulation-specific set of control features. Subsequently, all parasites are classified using a new set of SVMs trained to classify tuples of parasite and control feature vectors.
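The data flow of the two-step approach can be outlined as follows. This sketch covers only the construction of the control centroid and the paired feature tuples; `stage1` and `stage2` are hypothetical callables standing in for the two trained SVM banks, and encoding the tuple as a simple concatenation is an assumption.

```python
def centroid(vectors):
    """Component-wise mean of equal-length feature vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def paired_features(parasite, control_centroid):
    """Tuple of parasite and control features, encoded by concatenation."""
    return list(parasite) + list(control_centroid)


def two_stage_classify(control_vectors, well_vectors, stage1, stage2):
    """Classify parasites relative to the normal controls of their population."""
    normal_controls = [v for v in control_vectors if stage1(v) == "normal"]
    if not normal_controls:
        raise ValueError("no control parasite was classified as normal")
    reference = centroid(normal_controls)
    return [stage2(paired_features(v, reference)) for v in well_vectors]


# Toy stand-ins for the trained classifiers (not the authors' models).
def toy_stage1(v):
    return "normal" if v[0] < 5 else "degenerate"


def toy_stage2(pair):
    half = len(pair) // 2
    return "degenerate" if pair[0] - pair[half] > 2 else "normal"


controls = [[1.0, 0.0], [3.0, 2.0], [9.0, 9.0]]
wells = [[2.5, 1.0], [6.0, 4.0]]
print(two_stage_classify(controls, wells, toy_stage1, toy_stage2))
```

Because the reference vector is recomputed for each subpopulation, a shift in baseline between parasite batches moves both halves of the tuple together, which is the motivation given in the text.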
Quantal phenotypic assay
Automated and manual versions of the phenotypic assay are conducted analogously based on the (manual
or automated) classifications of individual parasites. In either case, numerical phenotypic response values
are determined using the concept of the quantal response. Quantal responses are commonly used in
biology when a binary phenotype is observed (e.g. "normal" versus "degenerate" schistosomula). The
quantal response is defined as the proportion of affected individuals (degenerate parasites) in a
subpopulation (plate well). Given n replicate experiments, each containing D_i degenerate parasites and
N_i parasites in total, the quantal response R is computed from (3).

$$R = \frac{\sum_{i=1}^{n} D_i}{\sum_{i=1}^{n} N_i} \qquad (3)$$
This definition gives a population-weighted average across the imaged plate wells. The standard deviation between these experiments is also calculated. When plotted against compound concentration, these response values (naturally ranging from zero to one) yield complete dose-response curves which are suitable for direct analysis and comparison.
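The quantal response, taken here as the pooled proportion R = ΣD_i / ΣN_i over replicate wells, can be computed with a few lines of Python. The spread between replicates is shown as the sample standard deviation of the per-well proportions, which is an assumption; the report does not state which estimator was used.

```python
import statistics


def quantal_response(degenerate, totals):
    """Pooled proportion of degenerate parasites across replicate wells."""
    if len(degenerate) != len(totals) or not totals:
        raise ValueError("need one (D_i, N_i) pair per replicate")
    return sum(degenerate) / sum(totals)


def replicate_spread(degenerate, totals):
    """Sample standard deviation of per-well proportions (assumed estimator)."""
    proportions = [d / n for d, n in zip(degenerate, totals)]
    return statistics.stdev(proportions) if len(proportions) > 1 else 0.0


# Toy counts, not data from the report.
D = [10, 0]
N = [10, 90]
print(quantal_response(D, N), replicate_spread(D, N))
```

In this toy case the pooled value is 0.1, whereas the unweighted mean of the two well proportions would be 0.5, which illustrates why the definition is described as population-weighted.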
The complete architecture of the automated assay, from imaging and manual annotation to output of complete dose-response curves is depicted in Figure 1.
Figure 1. Architecture of the proposed automated high-throughput assay. The right side of the figure depicts feature extraction
and manual annotation of schistosomula images as used to train and test the dual-layer SVM bank classifier. The SVM bank in
the first layer is used to identify normal control parasites, which are used in the second layer to account for variation between
different populations of schistosomula. The left half depicts feature extraction and phenotype prediction used to generate drug
response information. Black arrows indicate data processing and data input steps, while blue arrows indicate training data used to
construct the classifiers. Solid red arrows indicate prediction by the SVM banks; dashed red arrows indicate simulated prediction
of unknown data using cross-validation. Note micrograph thumbnails have been subjected to lighting correction and while
segmented thumbnails show actual segmentation results, the manual annotations thumbnail is only schematic.
Results
We applied the automated assay presented in this work to the determination of dose-response curves for
fluvastatin, niclosamide, praziquantel and simvastatin based on the 88 images mentioned previously. The
SVM classifier was trained using 44 images representing two complete dose-response series for each
compound, and containing 2,044 segmented parasites. The remaining 44 images, also representing
duplicate sets of experiments, contained 2,012 parasites. These were placed into a test set, and although
expert annotations were available for these images, neither they nor the images were used in any way for
training of the classifier. Assay performance was then quantitated using two general approaches. First,
confusion analysis was applied to classification in the training set (using cross-validation), and also
directly to classification within the test set. Second, dose-response curves generated from test
classifications were compared to those obtained from expert annotations (which were not used for
training).
Confusion analysis
In the confusion analysis, manual classifications are compared to those produced by the SVM classifier.
The agreement between these sets of classifications is estimated using five measures of similarity between
binary vectors (i.e. sets of classifications as "normal" or "degenerate"). These measures, precision, recall,
F1-measure, accuracy and the Matthews correlation coefficient (MCC), are all defined in terms of the
number of true positive (TP), false positive (FP), true negative (TN) and false negative (FN)
classifications in (4)-(8).
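$$\mathrm{recall} = \mathrm{TPR} = \frac{TP}{TP + FN} \qquad (4)$$

$$\mathrm{precision} = \mathrm{PPV} = \frac{TP}{TP + FP} \qquad (5)$$

$$\mathrm{accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (6)$$

$$F_1 = \frac{2 \cdot \mathrm{PPV} \cdot \mathrm{TPR}}{\mathrm{PPV} + \mathrm{TPR}} \qquad (7)$$

$$\mathrm{MCC} = \frac{TP \cdot TN - FP \cdot FN}{\sqrt{(TP + FP)(TP + FN)(TN + FP)(TN + FN)}} \qquad (8)$$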
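The five agreement measures can be computed directly from the four confusion counts using their standard definitions. In this sketch the counts are invented for illustration, and returning 0.0 when a denominator vanishes is a convention chosen here rather than something specified in the report.

```python
import math


def confusion_metrics(tp, fp, tn, fn):
    """Precision, recall, F1, accuracy and MCC from confusion counts."""
    def ratio(numerator, denominator):
        return numerator / denominator if denominator else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    mcc_denominator = math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "precision": precision,
        "recall": recall,
        "f1": ratio(2 * precision * recall, precision + recall),
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "mcc": ratio(tp * tn - fp * fn, mcc_denominator),
    }


# Illustrative counts only; these are not the values behind Table 2 or 3.
print(confusion_metrics(tp=40, fp=10, tn=45, fn=5))
```

Unlike accuracy, the MCC uses all four counts in both numerator and denominator, so it remains informative when the two classes are unbalanced.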
The values of these measures, as estimated for the training set using 10-fold cross-validation, are given in Table 2. The analysis was repeated using the classifications for the test set, and the results are presented in Table 3. Again, these 2,012 parasites were not used in any way for training the classifier.
As demonstrated by Table 2 and Table 3, the classification performance is excellent during both cross-validation and testing. In particular, in the test setting, only 11% of parasites were misclassified (the accuracy was 0.890). These results strongly corroborate the efficacy of the two-stage SVM classification procedure.
Table 2. Cross-validated classification performance (training set).
Matthew's correlation
Table 3. Classification performance (test set).
Matthew's correlation
Generation of dose-response curves
The most important test of the automated assay described in this paper is its ability to reproduce accurate
dose-response curves from data which were not used for training. The dose-response curves for the four
compounds tested (fluvastatin, niclosamide, praziquantel and simvastatin), as determined by both the
automated and manual versions of the assay, are plotted in Figure 3. Visually, there is a close
correspondence between the dose-response curves derived from the manual screen and those obtained
using the fully automated assay developed above. Pearson's correlation coefficients between these curves
across all exposure times, as well as the corresponding p-values, are listed in Table 4. The p-values were
estimated using Student's t-distribution under the alternative hypothesis that the correlation is not zero.
The correlations presented in Table 4 are all ≥ 0.97 and are highly significant (p-values << 10^-3) for all
compounds.
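The comparison between manual and automated curves rests on Pearson's correlation coefficient and the associated t statistic, t = r √((n − 2) / (1 − r²)), which follows Student's t-distribution with n − 2 degrees of freedom under the null hypothesis of zero correlation. The sketch below computes both quantities with the standard library only; converting t to a p-value requires a t-distribution CDF and is omitted. The response values are invented for illustration.

```python
import math


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two sequences of equal length >= 3")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)  # fails for constant input


def t_statistic(r, n):
    """t statistic for testing zero correlation with n paired points."""
    if abs(r) >= 1.0:
        return math.copysign(math.inf, r)
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Hypothetical quantal responses at five concentrations.
manual = [1, 2, 3, 4, 5]
automated = [2, 1, 4, 3, 5]
r = pearson_r(manual, automated)
print(r, t_statistic(r, len(manual)))
```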
It should be noted that the manual and automated dose-response curves for praziquantel deviate qualitatively, because the phenotype induced by this drug includes subtle changes in the parasite's tegument which are not fully recognized using the computer vision approach in this paper. Nevertheless, the correlation and the p-value for praziquantel are still quite good at 0.971 and 5.9 × 10^-4, respectively. Although the algorithm does not identify all of the degenerate parasites, the same relative proportions exist across concentrations, and the automated response values thus represent a scaling of the manually determined ones.
Figure 3. Dose-Response Curves. Manual (dashed) and automated (solid) dose-response curves after 4 days of exposure to
fluvastatin, niclosamide, praziquantel and simvastatin. Error bars represent one standard deviation centered on the median of
duplicate observations.
Table 4. Comparison of manual and automated dose-response curves.
Discussion
The final learning model demonstrates excellent performance for classifying phenotypes of schistosomula
both during cross-validation and in a test setting (Table 2 and Table 3, respectively). When used for
determination of quantal response values, the resultant dose-response curves are very close to those
produced manually (Figure 3). This agreement between manual and automated versions of the assay is
also supported by the tight correlation and low p-values reported in Table 4.
Acknowledgement
The authors would like to sincerely thank Liliana Arreola-Rojo, Brian M. Suzuki and Conor R. Caffrey
for collecting data and providing much needed biological insight.
DA and RS were funded by NIH award 1R01AI089896 and by NSF through grant IIS 0644418 (CAREER).
References
1. World Health Organization, Dept. of Control of Neglected Tropical Diseases, Crompton DWT,
Daumerie D, Peters P, Savioli L (2010) Working to overcome the global impact of neglected tropical diseases: first WHO report on neglected tropical diseases. Geneva, Switzerland: World Health Organization. Available: http://site.ebrary.com/id/10430901. Accessed 20 March 2012.
2. Singh R, Pittas M, Heskia I, Fengyun Xu, McKerrow J, et al. (2009) Automated image-based
phenotypic screening for high-throughput drug discovery. 22nd IEEE International Symposium on Computer-Based
3. Lee H, Moody-Davis A, Saha U, Suzuki BM, Asarnow D, et al. (2012) Quantification and clustering
of phenotypic screening data using time-series analysis for chemotherapy of schistosomiasis. BMC Genomics 13: S4. doi:10.1186/1471-2164-13-S1-S4.
4. Paveley RA, Mansour NR, Hallyburton I, Bleicher LS, Benn AE, et al. (2012) Whole Organism
High-Content Screening by Label-Free, Image-Based Bayesian Classification for Parasitic Diseases. PLoS Negl Trop Dis 6: e1762. doi:10.1371/journal.pntd.0001762.
5. Asarnow DE, Singh R (2013) Segmenting the Etiological Agent of Schistosomiasis for High-Content
Screening. IEEE Trans Med Imaging 32: 1007–1018. doi:10.1109/TMI.2013.2247412.
6. Singh R (2012) Quantitative High-Content Screening-Based Drug Discovery against Helmintic
Diseases. In: Caffrey CR, editor. Parasitic Helminths. Wiley-VCH Verlag GmbH & Co. KGaA. pp. 159–179.
onlinelibrary.wiley.com.opac.sfsu.edu/doi/10.1002/9783527652969.ch10/summary. Accessed 21 July 2013.
7. Abdulla M-H, Ruelas DS, Wolff B, Snedecor J, Lim K-C, et al. (2009) Drug Discovery for
Schistosomiasis: Hit and Lead Compounds Identified in a Library of Known Drugs by Medium-Throughput Phenotypic Screening. PLoS Negl Trop Dis 3: e478. doi:10.1371/journal.pntd.0000478.
8. Caffrey CR, Secor WE (2011) Schistosomiasis: from drug deployment to drug development. Curr
Opin Infect Dis 24: 410–417. doi:10.1097/QCO.0b013e328349156f.
9. Rojo-Arreola L, Long T, Asarnow D, Suzuki BM, Singh R, et al. (2014) Chemical and Genetic
Validation of the Statin Drug Target to Treat the Helminth Disease, Schistosomiasis. PLoS ONE 9: e87594. doi:10.1371/journal.pone.0087594.
10. Jordan P, Webbe G, others (1969) Human schistosomiasis. Hum Schistosomiasis.
11. Basch PF (1981) Cultivation of Schistosoma mansoni In vitro. I. Establishment of Cultures from
Cercariae and Development until Pairing. J Parasitol 67: 179–185. doi:10.2307/3280632.
12. Glasbey CA (1993) An Analysis of Histogram-Based Thresholding Algorithms. CVGIP Graph
Models Image Process 55: 532–537. doi:10.1006/cgip.1993.1040.
13. Otsu N (1979) A Threshold Selection Method from Gray-Level Histograms. IEEE Trans Syst Man
Cybern 9: 62–66. doi:10.1109/TSMC.1979.4310076.
14. Hu M-K (1962) Visual pattern recognition by moment invariants. IRE Trans Inf Theory 8: 179–187.
15. Haralick RM, Shanmugam K, Dinstein I (1973) Textural Features for Image Classification. IEEE
Trans Syst Man Cybern SMC-3: 610–621. doi:10.1109/TSMC.1973.4309314.
16. Manjunath BS, Ma WY (1996) Texture features for browsing and retrieval of image data. IEEE Trans
Pattern Anal Mach Intell 18: 837–842. doi:10.1109/34.531803.
17. Carpenter AE, Jones TR, Lamprecht MR, Clarke C, Kang IH, et al. (2006) CellProfiler: image
analysis software for identifying and quantifying cell phenotypes. Genome Biol 7: R100. doi:10.1186/gb-2006-7-10-r100.
18. Field DJ (1987) Relations between the statistics of natural images and the response properties of
cortical cells. J Opt Soc Am A 4: 2379–2394. doi:10.1364/JOSAA.4.002379.
19. Cortes C, Vapnik V (1995) Support-vector networks. Mach Learn 20: 273–297.
doi:10.1007/BF00994018.
20. Platt JC (1998) Sequential Minimal Optimization: A Fast Algorithm for Training Support Vector
Machines. Advances in Kernel Methods - Support Vector Learning.
21. Boser BE, Guyon IM, Vapnik VN (1992) A training algorithm for optimal margin classifiers.
Proceedings of the fifth annual workshop on Computational learning theory. COLT '92. New York, NY,
doi.acm.org.opac.sfsu.edu/10.1145/130385.130401. Accessed 4 October 2013.
Source: http://cs.sfsu.edu/sites/default/files/technical-reports/QDREC.TechReport.14.01.pdf