How to Analyze CRISPR Screen Data to Find Targets

Published on: February 24, 2025

CRISPR screening data analysis

CRISPR screening is a high-throughput gene screening method based on the CRISPR/Cas9 system. A cell pool is first transduced with an sgRNA library, cells with the phenotype of interest are then enriched under specific selection conditions, and NGS and bioinformatics analysis are used to identify the genes linked to that phenotype. Many researchers already understand the principles and procedures of CRISPR library screening but still have questions about how to analyze the screening results and identify target genes. This article walks through the CRISPR library analysis process step by step.

CRISPR Screening Analysis Process

1. Quality Control of Sequencing Data

The raw sequencing files (raw reads) obtained from NGS contain some low-quality reads and reads carrying adapter sequence. To ensure the quality of the analysis, the raw reads are filtered to obtain clean reads. The quality of the sequencing data is then assessed using Q20 and Q30, the percentages of bases with a Phred quality score of at least 20 (base-call error rate of 1% or less) and at least 30 (0.1% or less). Typically, if Q20 > 90% or Q30 > 85% (Figure 1), the sequencing data is considered qualified. Values below these thresholds indicate low sequencing quality and a high error rate, and re-sequencing is required.

Figure 1: Sequencing Data Quality Assessment
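
The Q20 and Q30 check described above can be sketched in a few lines of Python. This is a minimal illustration, not the pipeline used for Figure 1: it assumes the common Phred+33 encoding, and the function name and the toy quality strings are made up for the example.

```python
def q20_q30(quality_strings, phred_offset=33):
    """Return (Q20, Q30): the fraction of bases with Phred score >= 20 and >= 30."""
    total = q20 = q30 = 0
    for qual in quality_strings:
        for ch in qual:
            score = ord(ch) - phred_offset  # decode one base quality
            total += 1
            q20 += score >= 20
            q30 += score >= 30
    if total == 0:
        raise ValueError("no bases to score")
    return q20 / total, q30 / total


# Toy quality strings (made up): 'I' = Q40, '5' = Q20, '#' = Q2
reads = ["IIII5", "III#5"]
q20, q30 = q20_q30(reads)
print(f"Q20 = {q20:.0%}, Q30 = {q30:.0%}")
```

In practice the quality strings would be the fourth line of each FASTQ record, and dedicated QC tools report the same two percentages.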

2. Data Alignment to the Corresponding sgRNA Library

Because of factors such as sgRNA library quality and mutations introduced during NGS library construction and sequencing, some clean reads cannot be matched to the corresponding sgRNA library. The clean reads are therefore aligned to the sgRNA library, and only the reads that match (mapped reads) are kept as the valid data from the CRISPR library screen. To ensure the accuracy and reliability of the results, the mean sequencing depth of the mapped reads should also be evaluated. A depth of over 300x is recommended (sequencing depth = mapped reads / number of sgRNAs).

Figure 2: sgRNA Sequencing Depth Analysis
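
As a rough sketch of this step, the snippet below counts reads whose guide region exactly matches a library sgRNA and then applies the depth formula. It makes several assumptions: the guide is taken to sit at a fixed position in the read, only exact matches count, and the sequences, sgRNA names, and function names are invented for illustration. Real tools handle variable guide positions and other details.

```python
def count_sgrnas(reads, library, start=0, length=20):
    """Count reads whose guide region exactly matches a library sgRNA.

    `library` maps guide sequence -> sgRNA id (hypothetical ids here).
    Returns (per-sgRNA counts, number of mapped reads).
    """
    counts = {sgrna_id: 0 for sgrna_id in library.values()}
    mapped = 0
    for read in reads:
        guide = read[start:start + length]
        sgrna_id = library.get(guide)
        if sgrna_id is not None:
            counts[sgrna_id] += 1
            mapped += 1
    return counts, mapped


def mean_depth(mapped_reads, n_sgrnas):
    """Sequencing depth = mapped reads / number of sgRNAs."""
    return mapped_reads / n_sgrnas


# Toy library and reads (made-up sequences)
library = {
    "ACGTACGTACGTACGTACGT": "sgGENE_A_1",
    "TTTTCCCCGGGGAAAATTTT": "sgGENE_B_1",
}
reads = [
    "ACGTACGTACGTACGTACGTGTTTTAGAGC",
    "ACGTACGTACGTACGTACGTGTTTTAGAGC",
    "TTTTCCCCGGGGAAAATTTTGTTTTAGAGC",
    "TTTTCCCCGGGGAAAATTTTGTTTTAGAGC",
    "ACGTACGTACGTACGTACGAGTTTTAGAGC",  # one mismatch, stays unmapped
]
counts, mapped = count_sgrnas(reads, library)
depth = mean_depth(mapped, len(library))
print(counts, mapped, depth)
```

By the same formula, a hypothetical library of 80,000 sgRNAs would need 80,000 x 300 = 24 million mapped reads to reach 300x.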

3. Differential Gene Analysis

For CRISPR library screening results, the RRA (Robust Rank Aggregation) algorithm in the MAGeCK software [1,2] is typically used to compare sgRNA abundance between the experimental and control groups and identify differential genes. RRA aggregates the ranks of the sgRNAs targeting each gene into a single score and ranks the genes by that score. The lower the RRA score, the higher the ranking and the more likely the gene is a target gene. The analysis covers both positive and negative selection: a positive screening result means the gene is significantly enriched in the experimental group, while a negative screening result means it is significantly depleted in the experimental group.

Figure 3: Analysis Results of RRA Algorithm
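
To give a feel for how a rank-aggregation score behaves, here is a simplified sketch of the score from the original RRA formulation [1]. It is not MAGeCK's exact implementation, which uses a modified version of the algorithm and derives p-values by permutation. Each sgRNA of a gene is assumed to have a normalized rank between 0 and 1 (rank divided by the total number of sgRNAs), and the function names are ours.

```python
from math import comb


def order_stat_cdf(r, k, n):
    """P(the k-th smallest of n uniform(0,1) ranks is <= r).

    This equals the Beta(k, n - k + 1) CDF at r, written as a binomial tail.
    """
    return sum(comb(n, j) * r**j * (1 - r) ** (n - j) for j in range(k, n + 1))


def rra_rho(normalized_ranks):
    """Simplified RRA score: the smallest order-statistic probability."""
    ranks = sorted(normalized_ranks)
    n = len(ranks)
    return min(order_stat_cdf(r, k, n) for k, r in enumerate(ranks, start=1))


# Toy normalized ranks (made up): two sgRNAs near the top for the first gene
consistent_gene = [0.01, 0.02, 0.5, 0.9]
scattered_gene = [0.3, 0.45, 0.6, 0.8]
print(rra_rho(consistent_gene), rra_rho(scattered_gene))
```

A gene whose sgRNAs cluster near the top of the list gets a much smaller score than a gene whose sgRNAs are scattered, which is why a lower score corresponds to a higher ranking.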

4. Enrichment Analysis

The identified target genes are then subjected to GSEA (Figure 4) and GO enrichment analysis (Figure 5) to reveal the signaling pathways and biological functions associated with the enriched or depleted genes.

Figure 4: GSEA Enrichment Analysis

Figure 5: GO Enrichment Analysis
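
GO enrichment is commonly run as an over-representation test, which asks whether the hit list contains more genes from a GO term than expected by chance. The sketch below uses a one-sided hypergeometric test for that question. It is an assumption that this matches the method behind Figure 5, the function name is ours, and GSEA itself works differently (it scores the full ranked gene list).

```python
from math import comb


def hypergeom_enrichment_p(n_background, n_in_term, n_hits, n_overlap):
    """One-sided hypergeometric p-value: P(overlap >= n_overlap).

    n_background: genes covered by the library
    n_in_term:    background genes annotated with the GO term
    n_hits:       candidate genes from the screen
    n_overlap:    candidate genes annotated with the GO term
    """
    upper = min(n_in_term, n_hits)
    total = comb(n_background, n_hits)
    tail = sum(
        comb(n_in_term, k) * comb(n_background - n_in_term, n_hits - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total


# Toy numbers (made up): 10 genes, 4 in the term, 3 hits, 2 of them in the term
p_term = hypergeom_enrichment_p(10, 4, 3, 2)
print(round(p_term, 4))
```

Because many terms are tested at once, the resulting p-values are normally corrected for multiple testing before terms are reported.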

CRISPR Library Potential Target Analysis Methods

As a large-scale gene screening method, CRISPR library screens inevitably produce some false-positive results. Therefore, when screening for target genes, it is recommended to select multiple candidate genes and verify them through downstream experiments.

1. Find Targets by RRA Algorithm Rank

Figure 6: RRA algorithm ranking screens target gene Cop1 [3]

As mentioned earlier, CRISPR library screening results are usually analyzed using the RRA algorithm. The higher a gene ranks, the more likely it is a target gene. If the ranking alone does not single out a target gene, the top 20 or 30 genes can be selected as candidates and verified through downstream gene knockout or overexpression experiments. For example, Wang et al. identified the target gene Cop1 through RRA algorithm ranking [3].

2. Screen by p-value, FDR, and LFC values

First, note that in this context FDR, q-value, and adjusted p-value refer to the same quantity. The p-value is the probability of observing a difference between the experimental and control groups at least as large as the one measured if the gene in fact had no effect, so a small p-value means the observation is hard to explain by chance alone. The FDR (false discovery rate) is the expected proportion of false discoveries among all genes called significant. Simply put, p-value < 0.05 means that a gene with no real effect would produce such a difference less than 5% of the time, and selecting genes at FDR < 0.05 means that about 5% or fewer of the selected genes are expected to be false positives.

Genes that pass FDR < 0.05 are therefore more likely to be true target genes. However, because a library tests a large number of genes, the multiple-testing correction is strict: a single gene's p-value usually needs to be extremely small (on the order of 1 × 10^-7) to achieve FDR < 0.05. Screening on FDR alone often leads to the omission of many true positive genes. Therefore, in the vast majority of library screening cases, the p-value rather than the FDR is used to select candidate genes, with the understanding that a p-value cutoff admits more false positives and the candidates need downstream validation.
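
To make the relationship between p-value and FDR concrete, the sketch below applies the Benjamini-Hochberg procedure, a widely used way of turning p-values into adjusted p-values. Whether a given analysis uses exactly this correction is an assumption here, and the function name and p-values are made up.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each adjusted value is monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy p-values (made up) for four genes
p_values = [0.01, 0.04, 0.03, 0.005]
fdr = benjamini_hochberg(p_values)
print([round(q, 4) for q in fdr])
```

The smallest p-value is multiplied by the number of genes tested, so the more genes a library covers, the smaller a p-value must be before its adjusted value drops below 0.05.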

LFC (log2 fold change) is the base-2 logarithm of the ratio of sgRNA abundance (normalized read counts) between the experimental and control groups. LFC = 1 means that the sgRNAs targeting a gene are twice as abundant in the experimental group as in the control group, LFC = 2 means four times as abundant, and so on. Negative values indicate depletion: LFC = -2 means the abundance in the experimental group is one quarter of that in the control group.
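
The arithmetic behind LFC is short enough to show directly. In this sketch the pseudocount of 1, which avoids dividing by zero when an sgRNA has no reads, is our assumption, and the function names and counts are illustrative. Tools also differ in how sgRNA-level values are summarized per gene.

```python
from math import log2


def log2_fold_change(treatment_count, control_count, pseudocount=1.0):
    """LFC of normalized read counts, experimental vs. control."""
    return log2((treatment_count + pseudocount) / (control_count + pseudocount))


def fold_from_lfc(lfc):
    """Convert an LFC back to a plain fold change."""
    return 2.0 ** lfc


# Toy normalized counts (made up)
lfc_up = log2_fold_change(399, 99)    # (399 + 1) / (99 + 1) = 4-fold
lfc_down = log2_fold_change(24, 99)   # (24 + 1) / (99 + 1) = 0.25-fold
print(lfc_up, lfc_down, fold_from_lfc(1), fold_from_lfc(-2))
```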

In addition to the ranking method described above, researchers can combine p-value and LFC to screen for potential target genes. For example, Deng et al. identified the target gene CDC7 using the conditions p < 0.01 and LFC ≤ -2 [4].
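
A combined filter of this kind is simple to apply once gene-level results are in a table. The sketch below uses the cutoffs quoted above (p < 0.01 and LFC ≤ -2) on a small table of placeholder genes. The gene names, values, and field names are invented for illustration and are not taken from the cited study.

```python
def select_candidates(rows, p_cutoff=0.01, lfc_cutoff=-2.0):
    """Keep depleted genes with p < p_cutoff and LFC <= lfc_cutoff, best p first."""
    hits = [r for r in rows if r["p_value"] < p_cutoff and r["lfc"] <= lfc_cutoff]
    return sorted(hits, key=lambda r: r["p_value"])


# Placeholder gene-level results (made up)
gene_table = [
    {"gene": "GENE_A", "p_value": 0.0004, "lfc": -2.6},
    {"gene": "GENE_B", "p_value": 0.0300, "lfc": -3.1},  # fails the p-value cutoff
    {"gene": "GENE_C", "p_value": 0.0020, "lfc": -1.2},  # fails the LFC cutoff
    {"gene": "GENE_D", "p_value": 0.0001, "lfc": -2.0},
]
candidates = [row["gene"] for row in select_candidates(gene_table)]
print(candidates)
```

For a positive selection screen the LFC condition would be flipped to look for enrichment instead of depletion.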

Figure 7: Screening Target Gene CDC7 [4] by Combining p-value and LFC

Conclusion

We hope this introduction to the CRISPR library analysis process clears up some of the questions you may have about analyzing screening results and identifying target genes. If further questions come up in practice, feel free to contact us at any time.

References

[1] Kolde R, Laur S, Adler P, Vilo J. Robust rank aggregation for gene list integration and meta-analysis. Bioinformatics. 2012 Feb 15;28(4):573-80.

[2] Li W, Xu H, Xiao T, Cong L, Love MI, Zhang F, Irizarry RA, Liu JS, Brown M, Liu XS. MAGeCK enables robust identification of essential genes from genome-scale CRISPR/Cas9 knockout screens. Genome Biol. 2014;15(12):554.

[3] Wang X, Tokheim C, Gu SS, Wang B, Tang Q, Li Y, Traugh N, Zeng Z, Zhang Y, Li Z, Zhang B, Fu J, Xiao T, Li W, Meyer CA, Chu J, Jiang P, Cejas P, Lim K, Long H, Brown M, Liu XS. In vivo CRISPR screens identify the E3 ligase Cop1 as a modulator of macrophage infiltration and cancer immunotherapy target. Cell. 2021 Oct 14;184(21):5357-5374.e22.

[4] Deng L, Yang L, Zhu S, Li M, Wang Y, Cao X, Wang Q, Guo L. Identifying CDC7 as a synergistic target of chemotherapy in resistant small-cell lung cancer via CRISPR/Cas9 screening. Cell Death Discov. 2023 Feb 2;9(1):40.

