# Blog

## K-means clustering

I wanted to remind myself how the k-means clustering algorithm works. Following are the steps involved in k-means clustering:

1. Start with a vector of 12 data points, for instance [1, 2, 3, 4, 7, 8, 9, 10, 20, 21, 22, 23].
2. Randomly select 3 data points. These
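The excerpt above stops after the second step, so here is a minimal sketch of how the procedure could continue, assuming the standard Lloyd's iteration (assign each point to its nearest centroid, recompute each centroid as its cluster mean, repeat until nothing changes). The function name `kmeans_1d`, the `seed` and `max_iter` parameters, and the choice of absolute distance are illustrative assumptions, not taken from the post.

```python
import random


def kmeans_1d(points, k, seed=0, max_iter=100):
    """Cluster 1-D points with Lloyd's algorithm (illustrative sketch)."""
    rng = random.Random(seed)
    # Pick k distinct data points as the starting centroids.
    centroids = rng.sample(points, k)
    clusters = [[] for _ in range(k)]

    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)

        # Update step: move each centroid to the mean of its cluster.
        # An empty cluster keeps its previous centroid.
        new_centroids = [
            sum(c) / len(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]

        # Stop once the centroids no longer move.
        if new_centroids == centroids:
            break
        centroids = new_centroids

    return centroids, clusters


data = [1, 2, 3, 4, 7, 8, 9, 10, 20, 21, 22, 23]
centroids, clusters = kmeans_1d(data, k=3)
print(centroids)
print(clusters)
```

Because the starting centroids are chosen at random, different seeds can settle on different groupings; running the sketch with a few seeds and comparing the results is a common way to check how stable the clusters are.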

## Interesting Bioinformatics Articles

Following is a collection of articles which I feel every Bioinformatician must be aware of. I will keep updating this list from time to time:

1. All biology is computational biology
2. Core services: Reward bioinformaticians
3. Importance of stupidity in scientific research

## The Art of Reading a Technical Paper

I came across an interesting Bioinformatics paper recently and wanted to read and understand it in its entirety. Reading the paper seemed intimidating at first, as the technical jargon was quite overwhelming. But reading a paper is not as difficult as it may first seem. Following

## Screen command in UNIX

Screen is a very useful command to have in your toolbox if you frequently run interactive sessions on a supercomputer while logged in through a VPN. A VPN connection typically has a time limit, and you may get disconnected without any warning when you have a poor internet connection. Screen program

## Types of models in DESeq2

There are 2 major types of regression models one can specify in DESeq2 to explore the raw count matrices from an RNA-seq experiment:

* Mean-reference model for Factors
* Regression model for Covariates

Mean-reference model for Factors - Factors typically represent categorical variables such as Gender, Ethnicity, Race etc. The mean-reference

## The Manga Guide to Statistics

I finished the book 'The Manga Guide to Statistics' by Shin Takahashi this weekend. Verdict: This book is a good read for intermediate statisticians wanting to brush up on the basics taught in high school or early in an undergraduate degree. This is not a beginner-friendly book as it requires prior understanding of

## Steps in DESeq function

For any RNA-seq data analysis project, one would most probably end up using the R package DESeq2. The function 'DESeq' is the main function that is called to obtain the differentially expressed genes. But what exactly does this function do? In short, the DESeq function combines the following 3 steps -

## Normalize Counts Matrix in DESeq2

I followed Josh Starmer's YouTube video on how DESeq2 normalizes the raw counts matrix, and attempted to reproduce the steps in R. This normalization procedure accounts for both the library size (total number of reads in each sample) and the library composition (enables us to compare between different
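The full post works through the steps in R. As a rough, language-agnostic illustration, here is a small Python sketch of the median-of-ratios idea that DESeq2 uses for size factors. The function names `size_factors` and `normalize` and the `toy_counts` matrix are hypothetical, made up for this example; the sketch skips genes with a zero count in any sample, since their log is undefined.

```python
import math
from statistics import median


def size_factors(counts):
    """Median-of-ratios size factors.

    counts: list of gene rows, each a list of raw counts per sample.
    """
    n_samples = len(counts[0])
    # Log-transform, keeping only genes with non-zero counts in every sample.
    log_rows = [
        [math.log(c) for c in row]
        for row in counts
        if all(c > 0 for c in row)
    ]
    # Mean of logs per gene = log of that gene's geometric mean.
    log_geo_means = [sum(row) / n_samples for row in log_rows]

    factors = []
    for j in range(n_samples):
        # Log ratio of each gene's count in sample j to its geometric mean.
        log_ratios = [row[j] - g for row, g in zip(log_rows, log_geo_means)]
        # The median ratio, back on the original scale, is the size factor.
        factors.append(math.exp(median(log_ratios)))
    return factors


def normalize(counts):
    """Divide every raw count by the size factor of its sample."""
    factors = size_factors(counts)
    return [[c / f for c, f in zip(row, factors)] for row in counts]


# Hypothetical toy matrix: 4 genes x 2 samples, sample 2 sequenced twice as deep.
toy_counts = [
    [10, 20],
    [100, 200],
    [5, 10],
    [0, 7],
]
factors = size_factors(toy_counts)
normalized = normalize(toy_counts)
print(factors)
print(normalized)
```

Taking the median of the per-gene ratios, rather than the mean, keeps a handful of very highly expressed genes from dominating a sample's scaling factor, which is how the method accounts for library composition as well as library size.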

## Advice for Bioinformaticians

Keith Bradnam has an excellent series of interviews with 39 notable Bioinformaticians at this link. Anyone interested in the field of Bioinformatics should check it out! Following is a summary of the advice from interviewees:

1. There is always something new to learn in Bioinformatics every day. It is an

## Getting Started With Bioinformatics

When I first heard about the field of Bioinformatics, I thought about how cool it would be to work in this field. The idea of analyzing sequencing data using computers fascinated me. I was not good at running microbiology experiments anyway, and the scope for variability in performing such experiments