Tell me more about the statistics displayed in Nephroseq.
p Value: The p value measures the statistical significance of an analysis result. It lies between 0 and 1 and is the probability of obtaining a result at least this extreme if the null hypothesis is true. A small p value indicates that the result is unlikely to have arisen by random chance alone; e.g. a p value of 0.02 means that, under the null hypothesis, a result at least this extreme would occur about 1 in 50 times.
Reporter: Many of the microarray platforms in Nephroseq include multiple probes ("reporters") that measure the same gene. When this occurs in a dataset, Nephroseq computes the statistics for each reporter separately. The reporter with the best (smallest) p value is displayed by default.
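The default reporter selection described above can be sketched in a few lines of Python. This is only an illustration, not Nephroseq's implementation: the gene names, reporter IDs, p values, and the function name best_reporter_per_gene are all hypothetical.

```python
# Hypothetical per-reporter results for one analysis: (gene, reporter ID, p value).
# All identifiers and numbers are made up for illustration.
reporter_results = [
    ("GENE_A", "reporter_1", 0.03),
    ("GENE_A", "reporter_2", 0.004),
    ("GENE_B", "reporter_3", 0.2),
]


def best_reporter_per_gene(results):
    """Return {gene: (reporter, p)} keeping the reporter with the smallest p value."""
    best = {}
    for gene, reporter, p in results:
        if gene not in best or p < best[gene][1]:
            best[gene] = (reporter, p)
    return best


default_display = best_reporter_per_gene(reporter_results)
```

Here GENE_A is measured by two reporters, and the one with the smaller p value (reporter_2) is the one that would be shown by default.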
Differential Expression Analyses
Nephroseq pre-computes differential expression profiles for two-class differential expression analyses (e.g. diseased vs. control tissues), using a two-sided Welch's t-test for microarray datasets and a quasi-likelihood F-test for RNA-Seq datasets.
Fold Change: Fold change describes how much a quantity differs between two groups; in differential expression analysis it compares the means of the two classes. Fold change is calculated as the ratio of the test group mean to the control group mean, so a fold change of 1 means no difference. When measuring gene expression, Nephroseq shows the log2 of the fold change, log2(FC); this transformation makes comparisons more intuitive because increases and decreases of the same magnitude are symmetric around 0:
- log2(FC) = 0 implies no difference between the classes
- log2(FC) = 3 implies the test group is 8 (2³) times higher than the control
- log2(FC) = -3 implies the test group is 8 (2³) times lower than the control
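As a worked example of the values listed above, the following Python sketch computes log2(FC) from two small groups. It assumes the expression values are on a linear (not log-transformed) scale and takes fold change as the ratio of the test group mean to the control group mean; the function name and the numbers are hypothetical.

```python
import math
from statistics import mean


def log2_fold_change(test_values, control_values):
    """log2 of (test group mean / control group mean).

    Assumes positive expression values on a linear scale.
    """
    return math.log2(mean(test_values) / mean(control_values))


control = [10.0, 12.0, 14.0]   # mean = 12
test = [90.0, 96.0, 102.0]     # mean = 96, i.e. 8 times the control mean

lfc_up = log2_fold_change(test, control)     # log2(8) = 3
lfc_down = log2_fold_change(control, test)   # log2(1/8) = -3
lfc_none = log2_fold_change(control, control)  # log2(1) = 0
```

If the expression values are already log2-transformed, the difference between the two group means plays the same role as log2(FC).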
t Statistic (used for microarray datasets): For a single gene, the t statistic considers the mean and variance of gene expression in each of two classes (sample subsets) and characterizes the difference in means. The t statistic is calculated using Welch's t-test: it is the difference between the two class means divided by the square root of the sum of the class variances, each divided by its sample size.
[t = (m1 - m2) / sqrt(var1/n1 + var2/n2)]
Welch's t-test is similar to Student's t-test but does not assume equal variance between the compared classes, which makes it more reliable when the classes differ in variance or sample size.
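The formula above translates directly into code. The sketch below uses only the Python standard library and assumes var1 and var2 are sample variances (n - 1 denominator); the function name welch_t and the expression values are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(group1, group2):
    """Welch's t statistic: (m1 - m2) / sqrt(var1/n1 + var2/n2)."""
    m1, m2 = mean(group1), mean(group2)
    var1, var2 = variance(group1), variance(group2)  # sample variances (n - 1)
    n1, n2 = len(group1), len(group2)
    return (m1 - m2) / math.sqrt(var1 / n1 + var2 / n2)


# Made-up expression values for one gene; the classes differ in size and variance.
diseased = [2.0, 4.0, 6.0]            # mean 4, variance 4
control = [1.0, 2.0, 3.0, 4.0, 5.0]   # mean 3, variance 2.5

t = welch_t(diseased, control)        # (4 - 3) / sqrt(4/3 + 2.5/5)
```

Swapping the two groups flips the sign of t. For real analyses, SciPy's scipy.stats.ttest_ind with equal_var=False computes the same statistic together with a p value.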
F-test (used for RNA-Seq datasets): For each differential expression analysis, the edgeR Bioconductor package compares two classes (sample subsets) by fitting a generalized linear model and evaluates the difference with a quasi-likelihood F-test. For more information on edgeR and F-tests, see: Robinson MD et al. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data (Bioinformatics 2010).
Clinical Correlations
Nephroseq pre-computes Pearson correlations for all genes (in each dataset) against a variety of clinical properties, such as GFR, Proteinuria, Age, and BMI.
r Value: The r value, also called the Pearson correlation coefficient, measures the linear correlation between two variables (in our case, a clinical property and gene expression values). It has a value between −1 and +1; r=1 implies a perfect positive linear correlation, r=0 implies no linear correlation, and r=−1 implies a perfect negative linear correlation.
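A minimal sketch of how r can be computed for one gene against one clinical property, using only the Python standard library. The function name pearson_r and the GFR and expression values are hypothetical and chosen so that the relationship is exactly linear.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    # Sum of products of deviations from the means (unnormalized covariance).
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    # Square roots of the sums of squared deviations.
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


# Made-up values: expression rises in exact proportion to GFR.
gfr = [30.0, 45.0, 60.0, 90.0]
expression = [2.0, 3.0, 4.0, 6.0]

r_positive = pearson_r(gfr, expression)                 # perfect positive correlation
r_negative = pearson_r(gfr, [-e for e in expression])   # perfect negative correlation
```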
r² (r-squared) Value: The r² value, also called the coefficient of determination, measures how close the data are to the fitted regression line; it is the proportion of variation explained by the linear model. r² ranges from 0 to 1; r²=0 implies the linear model does not explain any variation in the data and r²=1 implies the linear model explains the data perfectly.
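To illustrate the "variation explained" reading, the sketch below fits an ordinary least-squares line and computes r-squared as 1 minus the ratio of residual variation to total variation. The function name r_squared and the data are hypothetical; for a simple linear fit like this one, the result equals the square of the Pearson r value.

```python
def r_squared(x, y):
    """Coefficient of determination for a least-squares line fitted to (x, y)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Variation left over after the fit vs. total variation around the mean.
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - mean_y) ** 2 for b in y)
    return 1.0 - ss_res / ss_tot


# Made-up clinical values (x) and expression values (y).
x = [1.0, 2.0, 3.0, 4.0]
y = [1.0, 3.0, 2.0, 4.0]

r2 = r_squared(x, y)  # fitted line y = 0.5 + 0.8x explains 64% of the variation
```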