Fig. 1 | Genome Biology

Fig. 1

From: CHESS 3: an improved, comprehensive catalog of human genes and transcripts based on large-scale expression data, phylogenetic analysis, and protein structure

Computational pipeline used to create CHESS 3. First, 9814 GTEx samples were aligned with HISAT2. Second, the alignments were either assembled directly with StringTie2 or aggregated by tissue with TieBrush. The resulting StringTie2 transcripts were merged and compared to the reference annotation using gffcompare. Low-coverage alignments in the “TieBrush”-ed files were filtered out, and the remaining alignments were assembled with StringTie2. Only transcripts that were assembled both directly from the individual samples and from the “TieBrush”-ed files were retained; these were further filtered with an intron classifier designed to recognize introns that most closely resemble those in the reference annotation. ORFanage [18] and ColabFold were used to assign ORFs to protein-coding transcripts and to score them, and the pLDDT scores produced by ColabFold were used to filter out low-scoring protein-coding transcripts.
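
The two retention steps in the caption (keeping only transcripts found by both assembly routes, then dropping low-scoring protein-coding transcripts) can be illustrated with a minimal Python sketch. This is not the CHESS 3 implementation. The `Transcript` record, the function names, matching by chromosome, strand, and intron chain, and the threshold value are all assumptions made for illustration; the caption does not specify how transcripts were matched or which pLDDT cutoff was used. The intron classifier step is omitted.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Transcript:
    """Hypothetical transcript record, not a CHESS 3 data structure."""
    chrom: str
    strand: str
    intron_chain: Tuple[Tuple[int, int], ...]  # (start, end) of each intron
    mean_plddt: Optional[float] = None  # None for transcripts without an ORF


def _key(t: Transcript):
    # Assumption: two assemblies describe the same transcript when they
    # share chromosome, strand, and the full intron chain.
    return (t.chrom, t.strand, t.intron_chain)


def retain_shared(per_sample: List[Transcript],
                  tiebrushed: List[Transcript]) -> List[Transcript]:
    """Keep per-sample transcripts that also appear in the aggregated set."""
    aggregated_keys = {_key(t) for t in tiebrushed}
    return [t for t in per_sample if _key(t) in aggregated_keys]


def filter_by_plddt(transcripts: List[Transcript],
                    min_plddt: float) -> List[Transcript]:
    """Drop scored transcripts below min_plddt; keep unscored ones."""
    return [t for t in transcripts
            if t.mean_plddt is None or t.mean_plddt >= min_plddt]


# Toy data; coordinates and scores are invented for the example.
a = Transcript("chr1", "+", ((100, 200), (300, 400)), 85.0)
b = Transcript("chr1", "+", ((100, 200),), 40.0)
c = Transcript("chr2", "-", ((10, 20),), None)

shared = retain_shared([a, b, c], [a, b])
kept = filter_by_plddt(shared, min_plddt=70.0)  # 70.0 is an illustrative cutoff
```

In this toy run, `c` is dropped because it is absent from the aggregated assembly, and `b` is dropped because its score falls below the illustrative cutoff.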
