What is the process of data analysis?
The MMLV reverse transcriptase works just like the reverse transcriptase in the second-strand synthesis method, but it also adds a few extra nucleotides to the 3’ end of the cDNA it has just synthesized. These nucleotides become the anchor for the template-switching oligonucleotide. Once the template-switching oligo is in place, the reverse transcriptase switches templates and continues along the oligo, which is where formation of the second strand begins.
The next step is amplification, which can be done via PCR or IVT (in vitro transcription). PCR is nonlinear (exponential), and because its efficiency depends on the sequence, it introduces bias. IVT is linear, but it produces a stronger 3’ coverage bias because the amplified RNA requires an additional round of reverse transcription, which PCR amplification does not. Finally, the actual sequencing is performed on a sequencing platform; there are many, each with its own strengths and weaknesses. One additional step that is often taken, and that could be performed at nearly every step of the process, is barcoding with UMIs (unique molecular identifiers), which label individual molecules.
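To make the role of UMIs concrete, here is a minimal sketch of how they let molecules be counted instead of reads: reads that share the same cell barcode, gene, and UMI are treated as amplification duplicates of one original molecule. The tuple layout, the example barcodes, and the function name count_umis are assumptions made for illustration, not the format of any particular pipeline, and the sketch ignores UMI sequencing errors, which real tools usually correct.

```python
from collections import defaultdict


def count_umis(reads):
    """Collapse reads into molecule counts per (cell, gene).

    reads: iterable of (cell_barcode, gene, umi) tuples (hypothetical layout).
    Reads with an identical (cell, gene, umi) are counted once.
    """
    seen = defaultdict(set)
    for cell, gene, umi in reads:
        seen[(cell, gene)].add(umi)
    return {key: len(umis) for key, umis in seen.items()}


# Toy input: the first two reads are duplicates of the same molecule.
reads = [
    ("CELL1", "GeneA", "AACG"),
    ("CELL1", "GeneA", "AACG"),
    ("CELL1", "GeneA", "TTGC"),
    ("CELL2", "GeneA", "AACG"),
]
counts = count_umis(reads)
```

In this toy input, CELL1 has three reads for GeneA but only two distinct UMIs, so it is counted as two molecules; the same UMI seen in CELL2 is counted separately because the cell barcode differs.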
Before the data can be analyzed, it needs to be checked for quality. It is difficult to adjust for all the factors that are likely to alter read counts (such as capture inefficiency and differences in total RNA), so when there are no UMIs, sequencing depth ends up being the only source of variance that is normalized. With UMIs, the total RNA content can be measured and then normalized as well. Technical and biological noise also need to be accounted for and reduced. It is tricky, however, to distinguish meaningless noise from actual biological variation, particularly when ‘characterizing sub-populations, identifying highly heterogeneous genes, and comparing expression levels among groups of cells.’ There are ways to tackle this problem. One is to estimate the technical variability and then identify the genes whose variability is much greater than that estimate. Another is to adjust for genes that oscillate, for example throughout the cell cycle.
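Two of the ideas above can be sketched in a few lines: scaling each cell to a common sequencing depth, and flagging genes whose variability is much greater than an estimate of the technical variability. This is only an illustration under loud assumptions: the function names, the scale of 10,000, and the ratio threshold are hypothetical, and a simple Poisson model (technical variance equal to the mean) stands in for a real technical-noise estimate such as one derived from spike-ins. For simplicity the variability check is applied to raw counts and ignores depth differences between cells.

```python
from statistics import mean, pvariance


def normalize_depth(counts, scale=10_000):
    """Rescale each cell so its counts sum to `scale`.

    counts: dict mapping cell -> dict mapping gene -> raw count.
    """
    normalized = {}
    for cell, genes in counts.items():
        total = sum(genes.values())
        normalized[cell] = {
            gene: (value * scale / total if total else 0.0)
            for gene, value in genes.items()
        }
    return normalized


def overdispersed_genes(counts, ratio=2.0):
    """Return genes whose variance across cells exceeds `ratio` times the mean.

    Assumption: technical noise is Poisson-like, so variance ~ mean.
    """
    genes = sorted({g for cell in counts.values() for g in cell})
    flagged = []
    for gene in genes:
        values = [cell.get(gene, 0) for cell in counts.values()]
        m = mean(values)
        if m > 0 and pvariance(values) > ratio * m:
            flagged.append(gene)
    return flagged


# Toy counts: GeneA is stable across cells, GeneB is on in only one cell.
raw = {
    "c1": {"GeneA": 10, "GeneB": 0},
    "c2": {"GeneA": 12, "GeneB": 30},
    "c3": {"GeneA": 8, "GeneB": 0},
}
norm = normalize_depth(raw)
flagged = overdispersed_genes(raw)
```

Here GeneA varies less than the Poisson estimate and is left alone, while GeneB varies far more and is flagged as a candidate for genuine biological heterogeneity. Whether such a gene reflects biology or an unmodeled technical effect still has to be judged separately.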