
Types of models in DESeq2


There are two major types of models one can specify in a DESeq2 design formula to analyse the raw count matrix from an RNA-seq experiment:

  • Mean-reference model for Factors
  • Regression model for Covariates

Mean-reference model for Factors -

Factors typically represent categorical variables such as gender, ethnicity, race, etc.

The mean-reference model is the most commonly used model in DESeq2. For a single factor 'Treatment' with two levels (WT and Mutant), it is written as:

~ Treatment

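DESeq2 automatically converts the above formula into a mean-reference model. Here, we set the reference level to WT. Note that R orders factor levels alphabetically by default, so Mutant would otherwise become the reference; set the reference explicitly (for example with R's relevel() function). The equation is as follows: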

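log2(expression of a gene) = β0 + β1 (Mutant)

Here Mutant is an indicator that equals 1 for Mutant samples and 0 for WT samples. β0 therefore represents the (log2-scale) mean expression of the gene in the reference (WT) group, and β1 represents the difference between the Mutant and WT conditions, which is the log2FoldChange that DESeq2 reports.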
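To make the mean-reference coding concrete, here is a minimal sketch that builds the design matrix by hand for one hypothetical gene and fits it with ordinary least squares on log2 values. That fitting method is an assumption chosen to keep the example short: DESeq2 itself fits a negative binomial GLM to the raw counts (with size factors and dispersions), so its numbers will differ, but the reading of the coefficients (intercept = reference group, β1 = Mutant vs WT on the log2 scale) carries over. The toy data and the helper `mean_reference_design` are hypothetical.

```python
import numpy as np

# Hypothetical toy data: log2 normalized expression of ONE gene in six samples.
treatment = ["WT", "WT", "WT", "Mutant", "Mutant", "Mutant"]
log2_expr = np.array([5.0, 5.2, 4.8, 7.1, 6.9, 7.0])


def mean_reference_design(groups, reference):
    """Hypothetical helper: build a mean-reference (treatment-coded) design.

    Column 0 is the intercept (1 for every sample); each further column is a
    0/1 indicator for one non-reference level, in order of first appearance.
    """
    others = [g for g in dict.fromkeys(groups) if g != reference]
    columns = [np.ones(len(groups))]
    columns += [[1.0 if g == lvl else 0.0 for g in groups] for lvl in others]
    return np.column_stack(columns), others


# Reference level = WT: beta[0] is the WT mean, beta[1] is Mutant minus WT.
X, others = mean_reference_design(treatment, reference="WT")
beta, *_ = np.linalg.lstsq(X, log2_expr, rcond=None)
print(others, beta, 2 ** beta[1])  # last value: linear-scale fold change

# Switching the reference to Mutant flips the sign of the comparison.
X_mut, _ = mean_reference_design(treatment, reference="Mutant")
beta_mut, *_ = np.linalg.lstsq(X_mut, log2_expr, rcond=None)
print(beta_mut)
```

With this toy data the fit gives β0 ≈ 5.0 (the WT mean) and β1 ≈ 2.0, i.e. roughly a fourfold increase in Mutant. With Mutant as the reference the coefficients become ≈ 7.0 and ≈ -2.0, which is why choosing the reference level matters.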

Regression model for Covariates -

Covariates represent continuous variables such as age, BMI, RIN scores, weight and so on. Regression models are denoted as follows:

~ Age

We use such a model to investigate how the expression of each gene changes with the age of the samples. The corresponding equation is:

log2(expression of a gene) = β0 + β1 (Age)

Looks familiar? This is the equation of a straight line: β0 is the y-intercept (the predicted log2 expression at Age = 0) and β1 is the slope, i.e. the coefficient of the covariate Age. The slope tells us how much log2 expression changes for each one-unit increase in age (for example, per year), much like the fitted line in a scatter plot. It is not a correlation coefficient, so a large slope does not by itself imply a tight relationship. Note: such a model can only capture a linear relationship between the covariate and log2 expression.
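The sketch below fits the "~ Age" model for one hypothetical gene to show how the intercept and slope are read. As an assumption for brevity it uses ordinary least squares on log2 values rather than DESeq2's negative binomial GLM, and the ages and expression values are made up.

```python
import numpy as np

# Hypothetical toy data: age in years and log2 normalized expression of one gene.
age = np.array([20.0, 30.0, 40.0, 50.0, 60.0])
log2_expr = np.array([4.0, 4.5, 5.0, 5.5, 6.0])  # exactly linear, for clarity

# Design matrix for "~ Age": an intercept column plus the covariate itself.
X = np.column_stack([np.ones_like(age), age])
(beta0, beta1), *_ = np.linalg.lstsq(X, log2_expr, rcond=None)

print(beta0)              # predicted log2 expression at Age = 0
print(beta1)              # change in log2 expression per additional year
print(2 ** (10 * beta1))  # fold change in expression over a 10-year gap

# The slope is not a correlation: r is 1 here although the slope is small.
r = np.corrcoef(age, log2_expr)[0, 1]
print(r)
```

Here β0 ≈ 3.0 and β1 ≈ 0.05, so expression rises by a factor of about 2^(10 × 0.05) ≈ 1.41 per decade, while the Pearson correlation is ≈ 1. A tiny slope can therefore coexist with a perfect linear relationship.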

Source - A guide to creating design matrices for gene expression experiments