# Types of models in DESeq2

Aarthi Ramakrishnan

There are two major types of regression model that one can specify in DESeq2 to explore the raw count matrices from an RNA-seq experiment:

• Mean-reference model for Factors
• Regression model for Covariates

## Mean-reference model for Factors

Factors typically represent categorical variables such as gender, ethnicity, race, etc.

The mean-reference model is the most commonly used one in DESeq2. In the case of a single factor 'Treatment' with 2 levels (WT / Mutant), this model is denoted as follows:

~ Treatment

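DESeq2 automatically converts the above notation to a mean-reference model. Here, we set the reference level to WT. (R orders factor levels alphabetically by default, so Mutant would otherwise be taken as the reference; the reference can be set explicitly, for example with relevel.) DESeq2 fits the model on the log2 scale, so the equation is as follows:

log2(Expression of a gene) = β0 + β1 (Mutant)

Where (Mutant) is an indicator that is 1 for Mutant samples and 0 for WT samples, β0 represents the log2 mean expression of the gene in the reference (WT), and β1 represents the difference (log2FoldChange) between the Mutant and WT conditions.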
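To make the meaning of β0 and β1 concrete, here is a minimal Python sketch. It is not DESeq2 itself: DESeq2 fits a negative binomial model to raw counts, whereas this sketch simply takes group means of made-up log2 expression values. The function name `mean_reference_fit` and all numbers are hypothetical and only illustrate how a mean-reference model is read.

```python
from statistics import mean

# Hypothetical log2 expression values for one gene (made-up numbers).
log2_expr = {
    "WT": [5.0, 5.2, 4.8],
    "Mutant": [7.1, 6.9, 7.0],
}


def mean_reference_fit(groups, reference):
    """Fit a one-factor mean-reference model by simple group means.

    Returns (beta0, effects), where beta0 is the mean of the reference
    level and effects maps every other level to its difference from beta0.
    """
    beta0 = mean(groups[reference])
    effects = {
        level: mean(values) - beta0
        for level, values in groups.items()
        if level != reference
    }
    return beta0, effects


beta0, effects = mean_reference_fit(log2_expr, reference="WT")
print(f"beta0 (WT mean on the log2 scale): {beta0:.2f}")
print(f"beta1 (Mutant vs WT, log2 fold change): {effects['Mutant']:.2f}")
```

With these toy values the WT mean is about 5 and the Mutant effect is about 2, i.e. roughly a four-fold increase in the Mutant samples. Choosing "Mutant" as the reference instead would flip the sign of the effect, which is why setting the reference level matters.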

## Regression model for Covariates

Covariates represent continuous variables such as age, BMI, RIN scores, weight and so on. Regression models are denoted as follows:

~ Age

We use such a model when we would like to investigate the relationship between the expression of each gene and the age of the samples. DESeq2 fits the model on the log2 scale, so statistically the equation is denoted as follows:

log2(Expression of a gene) = β0 + β1 (Age)

Looks familiar? This model has the same form as the equation of a straight line, where β0 is the y-intercept and β1 is the slope (the coefficient of the covariate). The slope β1 tells us how much the log2 expression changes for each one-unit increase in age (DESeq2 reports it as the log2FoldChange per unit of the covariate), much like the fitted line in a scatter plot. It measures the size and direction of the trend, not the strength of the correlation. Note: Such models can only capture linear relationships.
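The straight-line reading of the model can be sketched in a few lines of Python. This is only an illustration using ordinary least squares on made-up log2 expression values, not the negative binomial fit that DESeq2 performs; the function name `fit_line` and the data are hypothetical.

```python
from statistics import mean

# Hypothetical ages (years) and log2 expression of one gene in six samples.
ages = [20, 30, 40, 50, 60, 70]
log2_expr = [4.0, 4.5, 5.0, 5.5, 6.0, 6.5]


def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    x_bar, y_bar = mean(x), mean(y)
    # Covariation of x and y, and variation of x, around their means.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


beta0, beta1 = fit_line(ages, log2_expr)
print(f"beta0 (intercept, log2 expression at age 0): {beta0:.2f}")
print(f"beta1 (log2 fold change per year of age): {beta1:.3f}")
```

Here the slope is about 0.05 per year, so a 20-year age difference corresponds to roughly one log2 unit, i.e. a doubling of expression. The slope depends on the units of the covariate: measuring age in months instead of years would make β1 twelve times smaller without changing the fit.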
