If you work on an RNA-seq data analysis project, you will most likely end up using the R package DESeq2. Its main function, 'DESeq', is the one you call to obtain the differentially expressed genes. But what exactly does this function do?
In short, the DESeq function runs the following 3 steps in order:
- estimateSizeFactors - This step calculates the size factors for each sample (explained here).
- estimateDispersions - This step estimates the dispersion for each gene. Here, dispersion does not mean the variance. Rather, it describes how much the variance exceeds the mean: under the negative binomial model, variance = mean + dispersion × mean². DESeq needs this value because it assumes a negative binomial distribution for the counts, and dispersion is one of that model's parameters. This link has a good explanation on estimating dispersions.
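- nbinomWaldTest - This step fits a negative binomial model to the counts of each gene, using the size factors to account for differences between samples and the dispersions estimated in the previous step, and then runs a Wald test on the fitted coefficients to find the differentially expressed genes in the dataset.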
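
To see the three steps side by side with the one-call version, here is a minimal sketch. It assumes DESeq2 is installed and uses simulated counts from DESeq2's makeExampleDESeqDataSet; the object names (dds, dds_wrapper, dds_steps) and the dataset dimensions are illustrative choices, not anything from a real project.

```r
library(DESeq2)

set.seed(1)
# Simulated dataset (assumption for illustration): 1000 genes, 6 samples,
# split into two conditions, with the default design ~ condition
dds <- makeExampleDESeqDataSet(n = 1000, m = 6)

# One-call version
dds_wrapper <- DESeq(dds)

# Step-by-step version, in the same order DESeq runs them
dds_steps <- estimateSizeFactors(dds)        # per-sample size factors
dds_steps <- estimateDispersions(dds_steps)  # per-gene dispersions
dds_steps <- nbinomWaldTest(dds_steps)       # model fit + Wald test

# Extract the results tables from both objects
res_wrapper <- results(dds_wrapper)
res_steps <- results(dds_steps)

# Both routes should give the same size factors
all.equal(sizeFactors(dds_wrapper), sizeFactors(dds_steps))

head(res_steps)
```

Running the steps separately is mostly useful when you want to inspect an intermediate result, for example the size factors or the dispersion estimates, before moving on to the test. With default settings and a small design like this one, the two routes are expected to agree; with many replicates per group, DESeq may additionally handle count outliers, so treat the equivalence as a sketch of the default behaviour.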