Percentile Analysis for Differential Gene Expression (PADGE) is a novel tool for identifying genes that are differentially expressed between two groups of heterogeneous samples. PADGE was designed to compare the expression profiles of sample subgroups at a series of percentile cutoffs and to examine how relative expression between sample classes changes as expression level increases. For comparison, PADGE also implements all-sample statistical tests (t test or Wilcoxon rank sum test), the COPA approach (Tomlins et al., Science, 2005) and kurtosis (Teschendorff et al., Bioinformatics, 2006). Their results are displayed side by side with PADGE's, providing a web-based resource for analyzing heterogeneous expression patterns.
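COPA and kurtosis both score a gene by how strongly a few samples stand out from the rest. The sketch below illustrates the two general techniques as described in the cited papers. It is not PADGE's own implementation. COPA median-centers each gene's expression values, divides them by the median absolute deviation (MAD) and reports upper percentiles (the 75th, 90th and 95th in Tomlins et al.). Kurtosis measures how heavy the tails of the distribution are. The function names, the linear-interpolation percentile and the population (biased) kurtosis estimator are assumptions made for illustration.

```python
import math
import statistics


def upper_percentile(sorted_values, p):
    """Linear-interpolation percentile of an already sorted list (assumed method)."""
    pos = (len(sorted_values) - 1) * p / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_values[lo] + (sorted_values[hi] - sorted_values[lo]) * (pos - lo)


def copa_scores(values, percentiles=(75, 90, 95)):
    """COPA-style transform: median-center, divide by the MAD, report upper percentiles."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        raise ValueError("median absolute deviation is zero; transform undefined")
    transformed = sorted((v - med) / mad for v in values)
    return {p: upper_percentile(transformed, p) for p in percentiles}


def excess_kurtosis(values):
    """Population (biased) excess kurtosis: m4 / m2**2 - 3."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / m2 ** 2 - 3.0
```

For a profile such as `[1, 2, ..., 9, 100]`, the single high value dominates the 95th-percentile COPA score and pushes the excess kurtosis well above zero. An evenly spread profile such as `[1, ..., 10]` has negative excess kurtosis.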
PADGE analyzes expression values from microarray experiments in which samples were classified into two groups, such as "normal" or "cancer". The procedure consists of three components:
1) Subset statistical tests. For each probe set, expression values from normal and cancer samples are stratified by a series of user-defined percentile cutoffs $c_1, c_2, \ldots, c_n$. At each cutoff, a pair of cancer and normal sample subsets is constructed from the samples whose expression values lie above that cutoff. Users choose the statistical test (t test or Wilcoxon rank sum test) that is used to compare expression values both between all normal and all cancer samples and between each pair of sample subsets.
2) Percentile plot to visualize differential expression. The ratios of cancer to normal expression values at each percentile cutoff are used to generate a percentile plot. This plot shows the magnitude and trend of relative expression as expression level increases in both cancer and normal samples (view examples). Like quantile-quantile (Q-Q) plots, percentile plots compare the distributions of normal and cancer expression, but they are more intuitive and easier for biologists to interpret.
3) Summary score to prioritize candidates. To prioritize candidate oncogenes, we designed a summary score S that measures both the significance of over-expression in sample subgroups and the increase in relative expression between cancer and normal samples across percentiles. We define
$$S = \max_{n}\left[-\frac{r_n}{r_1}\log p_n\right]$$
where $r_n$ is the cancer-to-normal expression ratio and $p_n$ is the p value of the subset comparison at percentile cutoff $c_n$, with $r_1$ the ratio at the first cutoff $c_1$. Besides ranking genes by summary score, PADGE allows users to set thresholds for q values, ratios and the fold increase of ratios across percentiles.
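The three components above can be combined into a per-probe-set sketch. It uses a one-sided Wilcoxon rank sum test (cancer > normal) with the tie-corrected normal approximation, which is one of the two tests the form offers. The sketch also makes several assumptions that the description leaves open. Each group is cut at its own c-th percentile, and subsets keep values at or above that cutoff. $r_n$ is taken as the cancer c_n-th percentile divided by the normal one, which requires positive, non-log values. The log in S is taken as base 10. `padge_gene` and its helpers are hypothetical names.

```python
import math
from collections import Counter
from statistics import NormalDist

MIN_SUBSET_SIZE = 5  # minimal subset size stated in the PADGE parameter description


def percentile(values, p):
    """Linear-interpolation percentile (assumed; PADGE's exact method is not stated)."""
    xs = sorted(values)
    pos = (len(xs) - 1) * p / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def average_ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def rank_sum_pvalue(cancer, normal):
    """One-sided Wilcoxon rank-sum p value (cancer > normal), normal approximation."""
    n1, n2 = len(cancer), len(normal)
    n = n1 + n2
    pooled = list(cancer) + list(normal)
    w = sum(average_ranks(pooled)[:n1])  # rank sum of the cancer samples
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    var_w = n1 * n2 / 12 * ((n + 1) - ties / (n * (n - 1)))
    if var_w <= 0:  # every value tied: no evidence either way
        return 1.0
    z = (w - n1 * (n + 1) / 2) / math.sqrt(var_w)
    return 1.0 - NormalDist().cdf(z)


def padge_gene(cancer, normal, cutoffs):
    """Return (ratios, pvalues, S) for one probe set over the percentile cutoffs.

    A p value of None marks a cutoff where a subset has fewer than
    MIN_SUBSET_SIZE samples; such cutoffs are skipped when computing S.
    """
    ratios, pvalues = [], []
    for c in cutoffs:
        ca_cut, no_cut = percentile(cancer, c), percentile(normal, c)
        ratios.append(ca_cut / no_cut)  # assumes positive, non-log values
        ca_sub = [v for v in cancer if v >= ca_cut]
        no_sub = [v for v in normal if v >= no_cut]
        if min(len(ca_sub), len(no_sub)) < MIN_SUBSET_SIZE:
            pvalues.append(None)
        else:
            pvalues.append(rank_sum_pvalue(ca_sub, no_sub))
    # S = max_n [-(r_n / r_1) * log10(p_n)]; the 1e-300 floor guards log10(0).
    terms = [-(r / ratios[0]) * math.log10(max(p, 1e-300))
             for r, p in zip(ratios, pvalues) if p is not None]
    return ratios, pvalues, (max(terms) if terms else None)
```

A cutoff of 0 keeps every sample, so the first entry doubles as the all-sample comparison. For a t test instead, `scipy.stats.ttest_ind(ca_sub, no_sub, alternative="greater").pvalue` (SciPy 1.6 or later) could replace `rank_sum_pvalue`. Genes can then be ranked by sorting on the returned S in descending order.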
PADGE requires users to upload two tab-delimited text files. The first is the data file, which contains the expression values: the first column holds gene names and the first row holds sample names. See example file. The second is the sample file, which contains the sample classification: the first column holds sample names, which must match exactly those in the data file, and the second column holds class labels such as "cancer" or "normal". See example file. PADGE looks for over-expression in subsets of "cancer" samples compared with "normal" samples.
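A minimal reader for the two input files can be sketched with Python's `csv` module. It assumes that the sample file has no header row and that the first cell of the data file's header row labels the gene-name column. Samples whose class label is neither "cancer" nor "normal" are ignored. The function name `load_padge_inputs` and the gene name `GENE1` are hypothetical.

```python
import csv
import io


def load_padge_inputs(data_handle, sample_handle):
    """Return {gene: {"cancer": [...], "normal": [...]}} from the two tab-delimited files.

    Assumptions: the sample file has no header row, and the first cell of the
    data file's header row is a label for the gene-name column.
    """
    labels = {}
    for row in csv.reader(sample_handle, delimiter="\t"):
        if len(row) >= 2 and row[0].strip():
            labels[row[0].strip()] = row[1].strip()

    reader = csv.reader(data_handle, delimiter="\t")
    samples = [s.strip() for s in next(reader)[1:]]
    missing = [s for s in samples if s not in labels]  # names must match exactly
    if missing:
        raise ValueError(f"samples not found in sample file: {missing}")

    genes = {}
    for row in reader:
        if not row:
            continue
        groups = {"cancer": [], "normal": []}
        for name, value in zip(samples, row[1:]):
            if labels[name] in groups:  # other class labels are ignored
                groups[labels[name]].append(float(value))
        genes[row[0]] = groups
    return genes


data = io.StringIO("Gene\tS1\tS2\tS3\nGENE1\t1.0\t2.5\t3.0\n")
sample_info = io.StringIO("S1\tnormal\nS2\tcancer\nS3\tcancer\n")
print(load_padge_inputs(data, sample_info))
# prints {'GENE1': {'cancer': [2.5, 3.0], 'normal': [1.0]}}
```

For files on disk, pass handles opened with `open(path, newline="")`, as the `csv` module documentation recommends.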
A set of parameters is required to run PADGE. If a field is left blank, PADGE uses the default value pre-populated in the form. Percentiles are the cutoffs used to stratify sample groups for the subset statistical tests, and they also form the x-axis of the PADGE plots. Subset statistical tests require a minimum sample size of 5. Users can choose either a t test or a Wilcoxon rank sum test (enter "t" or "wilcox", respectively, in the Statistical test field). The q value cutoff is applied to the minimum q value obtained from the subset statistical tests when filtering gene candidates. Set it to an exceedingly large value, such as 10, if no q value threshold is wanted. The fold change cutoff is applied to the maximum ratio between sample groups across the percentile cutoffs specified above for PADGE, and to the mean ratio between sample groups for the other methods. Set it to 0 if no fold change threshold is wanted. "# top genes to plot" specifies the number of top-ranking genes (ranked by S, defined above) for which a PADGE plot is generated. Besides PADGE, users can choose to run other methods separately on their data, such as the t test, Wilcoxon test, COPA and kurtosis.
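The filtering and ranking controls can be sketched as follows. The form does not say how q values are computed. This sketch assumes Benjamini-Hochberg adjusted p values, which could be computed across genes at each percentile cutoff; each gene's minimum q value would then be compared against the q value cutoff. The `stats` layout and the function names are hypothetical.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p values, used here as q values (assumption)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    q = [0.0] * m
    running = 1.0  # caps q values at 1 and enforces monotonicity
    for rank in range(m, 0, -1):  # walk from the largest p value down
        i = order[rank - 1]
        running = min(running, pvalues[i] * m / rank)
        q[i] = running
    return q


def select_genes(stats, q_cutoff, fold_cutoff, top_n):
    """stats maps gene -> (min_q, max_ratio, S); returns up to top_n genes by S.

    Following the form's conventions, q_cutoff=10 disables the q filter (q values
    never exceed 1) and fold_cutoff=0 disables the fold change filter.
    """
    kept = [
        (gene, s)
        for gene, (min_q, max_ratio, s) in stats.items()
        if min_q <= q_cutoff and max_ratio >= fold_cutoff
    ]
    kept.sort(key=lambda item: item[1], reverse=True)  # highest S first
    return [gene for gene, _ in kept[:top_n]]
```

The genes returned by `select_genes` correspond to those that "# top genes to plot" would pass on for plotting.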
Because the analysis may take a while to finish, a valid email address is required. It is used only to notify the user of the results. Once the analysis is done, a link for viewing the results and PADGE plots is sent to the user and also appears in the interactive web session. The results are presented in the same way as for the example data set.
Please send Li Li your questions and/or feedback.