There are 2 major types of regression models one can specify in DESeq2 to explore the raw count matrices from an RNA-seq experiment -
- Mean-reference model for Factors
- Regression model for Covariates
Mean-reference model for Factors -
Factors typically represent categorical variables such as gender, ethnicity, or race.
The mean-reference model is the most commonly used one in DESeq2. For a single factor 'Treatment' with 2 levels (WT / Mutant), the design is written as: design = ~ Treatment
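DESeq2 automatically converts the above notation to a mean-reference model. Here, we set the reference level to WT. This has to be done explicitly (for example with relevel()), because R orders factor levels alphabetically by default, which would make Mutant the reference. The equation is as follows: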
Expression of a gene = β0 + β1 (Mutant)
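Here (Mutant) is an indicator that is 0 for WT samples and 1 for Mutant samples, and expression is modelled on the log2 scale. β0 represents the mean expression of the gene in the reference (WT), and β1 represents the difference between the Mutant and WT conditions, which DESeq2 reports as the log2FoldChange.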
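The interpretation of β0 and β1 can be checked with a small numerical sketch. The example below is only an illustration under stated assumptions: the expression values are made up, they are treated as already being on the log2 scale, and the fit is ordinary least squares rather than the negative binomial GLM that DESeq2 actually fits. The helper name fit_line is hypothetical.

```python
from statistics import mean

# Hypothetical log2 expression values for one gene (made-up numbers).
samples = [
    ("WT", 5.0), ("WT", 5.5), ("WT", 4.5),
    ("Mutant", 7.0), ("Mutant", 7.5), ("Mutant", 6.5),
]


def fit_line(x, y):
    """Ordinary least squares fit of y = b0 + b1 * x for a single predictor."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Dummy coding: the reference level WT becomes 0, Mutant becomes 1.
x = [0.0 if group == "WT" else 1.0 for group, _ in samples]
y = [value for _, value in samples]

beta0, beta1 = fit_line(x, y)

# Group means computed directly, for comparison with the coefficients.
wt_mean = mean(value for group, value in samples if group == "WT")
mutant_mean = mean(value for group, value in samples if group == "Mutant")

print(f"beta0 = {beta0:.2f}  (WT mean = {wt_mean:.2f})")
print(f"beta1 = {beta1:.2f}  (Mutant mean - WT mean = {mutant_mean - wt_mean:.2f})")
```

With a 0/1 indicator, the intercept equals the mean of the reference group and the coefficient equals the difference between the two group means, which is why the choice of reference level determines the sign of the reported log2FoldChange.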
Regression model for Covariates -
Covariates represent continuous variables such as age, BMI, RIN scores, weight and so on. For a single covariate 'Age', the design is written as: design = ~ Age
We use such a model to investigate how the expression of each gene changes with the age of the samples. Statistically, the equation is as follows:
Expression of a gene = β0 + β1 (Age)
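Looks familiar? This model has the same form as the equation of a straight line, where β0 is the y-intercept and β1 is the slope (the coefficient of the covariate Age). The slope β1 tells us how much the expression changes per unit increase in age (in DESeq2, a log2 fold change per year if age is in years), similar to the linear regression line in a scatter plot. The slope is not a correlation coefficient: it gives the direction and size of the trend, not how tightly the samples follow it. Note: Such models can only capture linear relationships.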
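The straight-line reading of the covariate model can be sketched numerically. As before, this is an illustration under assumptions, not DESeq2's procedure: the ages and expression values are made up, expression is assumed to be on the log2 scale, the fit is ordinary least squares, and fit_line is a hypothetical helper.

```python
from statistics import mean

# Hypothetical ages (years) and log2 expression of one gene (made-up numbers).
ages = [20.0, 30.0, 40.0, 50.0, 60.0]
log2_expr = [4.1, 4.4, 5.0, 5.6, 5.9]


def fit_line(x, y):
    """Ordinary least squares fit of y = b0 + b1 * x for a single predictor."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


beta0, beta1 = fit_line(ages, log2_expr)

# beta1 is the change in log2 expression per additional year of age,
# so a 10-year difference corresponds to 10 * beta1 on the log2 scale.
log2_change_10y = 10 * beta1
fold_change_10y = 2 ** log2_change_10y

print(f"intercept beta0 = {beta0:.3f}")
print(f"slope beta1     = {beta1:.3f} log2 units per year")
print(f"10-year change  = {log2_change_10y:.2f} log2 units "
      f"(fold change {fold_change_10y:.2f})")
```

Because the slope is expressed per unit of the covariate, its size depends on the units used: the same data with age in decades would give a slope ten times larger, while the fitted line itself would be unchanged.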