Introduction of ‘geneNR’

R Markdown

Introduction:

The geneNR package streamlines post-GWAS (Genome-Wide Association Study) and QTL (Quantitative Trait Locus) analysis by automating the identification of candidate genes within a user-defined search window around the identified SNPs (Single Nucleotide Polymorphisms) or QTLs. The package supports candidate gene analysis for wheat and rice, making it a practical tool for researchers working on the genomics of these crops.

Key Features:

Automated Search and Retrieval: geneNR replaces the labor-intensive process of manually searching for and retrieving candidate genes. Researchers can save time and focus on interpreting results rather than conducting repetitive searches.

User-Defined Search Window: The package allows users to define their own search window parameters, providing flexibility and customization to suit various research needs.

Support for Wheat and Rice: With a focus on two of the most important staple crops, wheat and rice, geneNR ensures targeted and relevant candidate gene analysis, aiding in crop improvement and genetic research efforts.

How It Works:

Importing GWAS Results: Users begin by importing their GWAS or QTL results into the geneNR package. The package includes detailed instructions and a sample_data file to guide users through this process.

Identification of Candidate Genes: Once the GWAS or QTL results are imported, geneNR identifies candidate genes within the specified search window based on the SNPs provided. This automated process not only eliminates the need for manual searches but also significantly reduces the time required, ensuring both accuracy and efficiency.

Exporting Results: After the candidate genes are identified, the package exports the results to a .csv file. This format is convenient for further analysis, reporting, and sharing with collaborators.
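
The identification step can be pictured as an interval-overlap query. The sketch below is not geneNR's internal code: it uses a small made-up annotation table in place of the Ensembl Plants lookup, and it assumes that "upstream" means lower coordinates and "downstream" means higher coordinates on the chromosome. The function name `genes_in_window` and the gene IDs are hypothetical.

```r
# Hypothetical gene annotation table; IDs and coordinates are made up for illustration.
genes <- data.frame(
  gene_id = c("geneA", "geneB", "geneC"),
  Chr     = c("7A", "7A", "7B"),
  start   = c(93000000, 93050000, 93005000),
  end     = c(93004000, 93052000, 93009000),
  stringsAsFactors = FALSE
)

# Return genes on the same chromosome that overlap [pos - upstream, pos + downstream].
# Assumption: upstream = lower coordinates, downstream = higher coordinates.
genes_in_window <- function(genes, chr, pos, upstream, downstream) {
  win_start <- max(1, pos - upstream)   # clamp at the first base of the chromosome
  win_end   <- pos + downstream
  hit <- genes$Chr == chr & genes$start <= win_end & genes$end >= win_start
  genes[hit, , drop = FALSE]
}

hits <- genes_in_window(genes, chr = "7A", pos = 93007534,
                        upstream = 10000, downstream = 10000)
print(hits$gene_id)
```

Here only the first gene is reported: the second lies beyond the 10 kb downstream limit and the third is on a different chromosome.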

Installation:

The geneNR package is available on CRAN and can be installed directly from R:

# Install the released version of geneNR from CRAN
install.packages("geneNR")

Alternatively, the package can be installed from GitHub:

# Installation from GitHub
# Uncomment and run the following commands manually:
# if (!requireNamespace("devtools", quietly = TRUE)) {
#   install.packages("devtools")
# }
# devtools::install_github("Nirmalaruban/geneNR")

Note:

Because geneNR is published on CRAN, users are requested to cite the package in any study that uses it. Citation acknowledges the effort behind its development and helps other researchers find the package. For assistance or inquiries, contact the maintainer, R. Nirmalaruban, at nirmalaruban97@gmail.com.

Data:

The geneNR package includes sample data for both wheat and rice to demonstrate its functionality. Users can provide their GWAS result files in .csv format to mine candidate genes.

geneQTL()

Identifies candidate genes based on QTL (Quantitative Trait Loci) analysis results. Users need to provide input data specifying QTLs, their chromosomal positions, and regions of interest.

result <- geneQTL("sample_data_wheat_qtl", crop="wheat")
result <- geneQTL("sample_data_rice_qtl", crop="rice")

Here is a sample data skeleton for the QTL results to be given as input:

| traits | Chr | start    | stop     |
|--------|-----|----------|----------|
| PH     | 7A  | 93007534 | 95007534 |
|        |     |          |          |
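
A QTL input file with this layout can be prepared in base R. The following is a minimal sketch: the file name `my_qtl_results.csv` is hypothetical, and the file is written to a temporary directory only to keep the example self-contained.

```r
# Build a QTL table with the column names shown in the skeleton above.
qtl <- data.frame(
  traits = "PH",
  Chr    = "7A",
  start  = 93007534,
  stop   = 95007534,
  stringsAsFactors = FALSE
)

# Basic sanity check: every interval must have start <= stop.
stopifnot(all(qtl$start <= qtl$stop))

# Written to a temporary directory here; for real use, write to the working directory.
qtl_path <- file.path(tempdir(), "my_qtl_results.csv")  # hypothetical file name
write.csv(qtl, qtl_path, row.names = FALSE)

# Read the file back to confirm the header row.
qtl_check <- read.csv(qtl_path, stringsAsFactors = FALSE)
names(qtl_check)
```

Passing `row.names = FALSE` matters here, because otherwise `write.csv()` adds an unnamed first column of row numbers that is not part of the skeleton.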

Parameters:

The function accepts the following parameters:

data_file: Input QTL data in .csv format. The file must contain the columns shown above (see sample_data_wheat_qtl).

crop: Either “wheat” or “rice”.

Output: A .csv file containing the candidate genes retrieved for the specified QTL regions.

Output Format Example:

| traits | QTL | gene_size | gene_id | gene_type      |
|--------|-----|-----------|---------|----------------|
|        |     |           |         | protein coding |

geneSNP()

Identifies candidate genes based on identified SNPs from GWAS results.

result <- geneSNP("sample_data_wheat", 10000, 10000, crop = "wheat")
result <- geneSNP("sample_data_rice", 10000, 10000, crop = "rice")

Here is a sample data skeleton for GWAS results:

| traits | SNP         | Chr | Pos      |
|--------|-------------|-----|----------|
| PH     | AX-94490431 | 7A  | 93007534 |
|        |             |     |          |
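
Before passing a GWAS result file to the package, it can help to confirm that the table has the expected columns. The helper below is a hypothetical pre-check written in base R, not a geneNR function. It assumes the four column names from the skeleton above and that positions are given in base pairs.

```r
# Check that a GWAS result table has the columns shown in the skeleton above.
# check_gwas_input() is a hypothetical helper, not part of geneNR.
check_gwas_input <- function(df) {
  required <- c("traits", "SNP", "Chr", "Pos")
  missing_cols <- setdiff(required, names(df))
  if (length(missing_cols) > 0) {
    stop("Missing column(s): ", paste(missing_cols, collapse = ", "))
  }
  if (!is.numeric(df$Pos) || any(is.na(df$Pos)) || any(df$Pos < 1)) {
    stop("Pos must contain positive numeric positions in base pairs")
  }
  invisible(TRUE)
}

gwas <- data.frame(traits = "PH", SNP = "AX-94490431", Chr = "7A",
                   Pos = 93007534, stringsAsFactors = FALSE)
check_gwas_input(gwas)
```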

Parameters:

The function accepts the following parameters:

data_file: The input GWAS data in .csv format, with the columns shown above.

upstream: Size of the search window, in base pairs, upstream of the SNP position.

downstream: Size of the search window, in base pairs, downstream of the SNP position.

crop: Either “wheat” or “rice”.

Output: A .csv file containing the candidate genes retrieved for the specified SNP regions.

Note:

Both upstream and downstream can be specified by the user in base pairs. By default, the search window is set to 10^6 bp (1 Mbp).

Output:

Upon running the function, the results will be saved in a filtered_gene_id.csv file in the current working directory. This file will contain the following columns: traits, SNP, search_window, gene_size, gene_id, and gene_type. The candidate genes for the respective regions are retrieved from Ensembl Plants (https://plants.ensembl.org/index.html).

Output Format Example:

| traits | SNP | search_window | gene_size | gene_id | gene_type      |
|--------|-----|---------------|-----------|---------|----------------|
|        |     |               |           |         | protein coding |
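
Once the output file exists, it can be post-processed with base R. The sketch below writes a mock file with the documented column names so that it runs on its own. Every value in it except the column names, the trait, the SNP ID and the "protein coding" label is a placeholder, and the real content of search_window is not reproduced here.

```r
# Mock output using the documented column names; gene IDs and sizes are placeholders.
mock <- data.frame(
  traits        = "PH",
  SNP           = "AX-94490431",
  search_window = NA,                       # real format not shown in this sketch
  gene_size     = c(1500, 2300, 800),
  gene_id       = c("gene_1", "gene_2", "gene_3"),
  gene_type     = c("protein coding", "protein coding", "other"),
  stringsAsFactors = FALSE
)
out_path <- file.path(tempdir(), "filtered_gene_id.csv")
write.csv(mock, out_path, row.names = FALSE)

# Read the results and keep only protein-coding candidates.
res    <- read.csv(out_path, stringsAsFactors = FALSE)
coding <- res[res$gene_type == "protein coding", ]

# Number of protein-coding candidates per SNP.
genes_per_snp <- table(coding$SNP)
genes_per_snp
```

For real results, replace `out_path` with the path to filtered_gene_id.csv in the working directory.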

geneSNPcustom()

Similar to geneSNP(), this function identifies candidate genes but allows finer customization: users can provide a different search window for each SNP.

result <- geneSNPcustom("sample_data_wheat_custom", crop = "wheat")
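
To illustrate what per-SNP windows mean, the sketch below computes window bounds row by row. The `upstream` and `downstream` column names are assumptions made for this example; the actual input layout expected by geneSNPcustom() may differ and should be checked against the bundled sample_data_wheat_custom file. The second SNP ID is a placeholder.

```r
# Hypothetical per-SNP window columns; the real geneSNPcustom() input layout may differ.
snps <- data.frame(
  traits     = c("PH", "PH"),
  SNP        = c("AX-94490431", "snp_example_2"),  # second ID is a placeholder
  Chr        = c("7A", "1A"),
  Pos        = c(93007534, 5000),
  upstream   = c(10000, 50000),
  downstream = c(10000, 20000),
  stringsAsFactors = FALSE
)

# Vectorised window bounds, clamped so a window never starts before base 1.
snps$win_start <- pmax(1, snps$Pos - snps$upstream)
snps$win_end   <- snps$Pos + snps$downstream
snps[, c("SNP", "win_start", "win_end")]
```

The second row shows why clamping is useful: a 50 kb upstream window around position 5000 would otherwise start at a negative coordinate.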

import_hmp()

Imports HapMap genotypic data files into a format usable by the package.

Input: HapMap file (.hmp.txt).

Output: A processed data frame for SNP analysis.
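
For readers unfamiliar with the HapMap layout, the sketch below reads a tiny example file with base R. It is not the implementation of import_hmp(). It assumes the conventional HapMap layout of eleven tab-separated metadata columns followed by one column per genotyped line, and the marker names and genotype calls are invented.

```r
# Write a tiny HapMap-style file so the example is self-contained.
# Marker names and genotype calls are invented for illustration.
hmp_lines <- c(
  paste(c("rs#", "alleles", "chrom", "pos", "strand", "assembly#", "center",
          "protLSID", "assayLSID", "panelLSID", "QCcode", "line1", "line2"),
        collapse = "\t"),
  paste(c("snp1", "A/G", "1A", "1200", "+", "NA", "NA", "NA", "NA", "NA", "NA",
          "AA", "GG"), collapse = "\t"),
  paste(c("snp2", "C/T", "2B", "5400", "+", "NA", "NA", "NA", "NA", "NA", "NA",
          "CT", "TT"), collapse = "\t")
)
hmp_path <- file.path(tempdir(), "example.hmp.txt")
writeLines(hmp_lines, hmp_path)

# comment.char = "" keeps the '#' in "rs#"; check.names = FALSE keeps names unchanged.
hmp <- read.delim(hmp_path, check.names = FALSE, comment.char = "",
                  stringsAsFactors = FALSE)

# Keep the marker ID, chromosome and position for downstream SNP work.
snp_info <- data.frame(SNP = hmp[["rs#"]], Chr = hmp$chrom, Pos = hmp$pos,
                       stringsAsFactors = FALSE)
snp_info
```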

import_vcf()

Imports Variant Call Format (VCF) files.

Input: .vcf file.

Output: A processed data frame for SNP analysis.
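
The same idea applies to VCF files. The base R sketch below is not the implementation of import_vcf(). It assumes a small, uncompressed VCF: lines starting with "##" are metadata, and the single "#CHROM" line is the column header. The records are invented for illustration.

```r
# Write a tiny VCF so the example is self-contained; records are invented.
vcf_lines <- c(
  "##fileformat=VCFv4.2",
  "##source=example",
  paste(c("#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO",
          "FORMAT", "sample1"), collapse = "\t"),
  paste(c("1A", "1200", "snp1", "A", "G", ".", "PASS", ".", "GT", "0/1"),
        collapse = "\t"),
  paste(c("2B", "5400", "snp2", "C", "T", ".", "PASS", ".", "GT", "1/1"),
        collapse = "\t")
)
vcf_path <- file.path(tempdir(), "example.vcf")
writeLines(vcf_lines, vcf_path)

# Drop the "##" meta lines; the remaining "#CHROM" line becomes the header.
raw  <- readLines(vcf_path)
body <- raw[!grepl("^##", raw)]
body[1] <- sub("^#", "", body[1])
vcf <- read.table(text = body, header = TRUE, sep = "\t", comment.char = "",
                  stringsAsFactors = FALSE, check.names = FALSE)

# Keep the marker ID, chromosome and position for downstream SNP work.
snp_info_vcf <- data.frame(SNP = vcf$ID, Chr = vcf$CHROM, Pos = vcf$POS,
                           stringsAsFactors = FALSE)
snp_info_vcf
```

This approach loads the whole file into memory, so it is suited to small files only.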

plot_SNP()

Visualizing SNP Distributions with `plot_SNP`

Generates a chromosome map showing the distribution of SNPs. The aesthetics of the map can be customized.
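
As a rough picture of what a chromosome map conveys, the base R sketch below draws each SNP as a tick along its chromosome. It does not reproduce the output of plot_SNP(), and the positions are placeholders.

```r
# Placeholder SNP positions in base pairs.
snps <- data.frame(
  Chr = c("1A", "1A", "1B", "2A", "2A"),
  Pos = c(1.2e6, 45e6, 30e6, 5e6, 93e6),
  stringsAsFactors = FALSE
)

# One horizontal track per chromosome, in sorted order.
chr_levels <- sort(unique(snps$Chr))
y <- match(snps$Chr, chr_levels)

# Draw each SNP as a vertical tick at its position (in Mbp).
plot(snps$Pos / 1e6, y, pch = "|", yaxt = "n",
     xlab = "Position (Mbp)", ylab = "Chromosome",
     ylim = c(0.5, length(chr_levels) + 0.5))
axis(2, at = seq_along(chr_levels), labels = chr_levels, las = 1)
```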

summariseSNP()

Calculates and summarizes the distribution of SNPs across chromosomes based on the provided data.

##   Chr SNP_Count
## 1  1A         5
## 2  1B         8
## 3  1D         7
## 4  2A        31
## 5  2B        35
## 6  2D        11
## 7  3A         8
## 8  3B        28
## 9  3D        80
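
A table of this shape can be produced in base R with `table()`. The sketch below is an equivalent illustration rather than the code behind summariseSNP(), and the marker names are placeholders.

```r
# Example marker table; marker names are placeholders.
markers <- data.frame(
  SNP = paste0("snp", 1:6),
  Chr = c("1A", "1A", "1B", "2A", "2A", "2A"),
  stringsAsFactors = FALSE
)

# table() counts markers per chromosome; convert to a two-column data frame.
counts <- as.data.frame(table(Chr = markers$Chr), stringsAsFactors = FALSE)
names(counts) <- c("Chr", "SNP_Count")
counts
```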

summariseSNP_vcf()

Similar to summariseSNP, this function processes data from VCF files.

plot_summariseSNP()

Plots a bar chart summarizing SNP distributions across chromosomes.

Parameters

bar_color: Color of the bars.

label_size: Size of the text labels.

label_color: Color of the text labels.
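
To show how these three parameters affect a chart, here is a base R sketch of a labelled bar chart. The function `plot_counts` is hypothetical and its default values are assumptions. Only the parameter names mirror those listed above, and the counts are taken from the first three rows of the summariseSNP() output shown earlier.

```r
# Hypothetical helper mirroring the parameter names above; defaults are assumptions.
plot_counts <- function(counts, bar_color = "steelblue",
                        label_size = 0.8, label_color = "black") {
  # barplot() returns the x midpoints of the bars, used to place the labels.
  mids <- barplot(counts$SNP_Count, names.arg = counts$Chr, col = bar_color,
                  xlab = "Chromosome", ylab = "SNP count",
                  ylim = c(0, max(counts$SNP_Count) * 1.15))
  # Put each count just above its bar.
  text(x = mids, y = counts$SNP_Count, labels = counts$SNP_Count,
       pos = 3, cex = label_size, col = label_color)
  invisible(mids)
}

counts <- data.frame(Chr = c("1A", "1B", "1D"), SNP_Count = c(5, 8, 7),
                     stringsAsFactors = FALSE)
mids <- plot_counts(counts, bar_color = "darkgreen", label_size = 0.9,
                    label_color = "grey20")
```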

Sample Data

Sample Data for Wheat and Rice

Several sample datasets for wheat and rice are included in the package. These data demonstrate the functionality of the geneNR package.

Note:

For smooth functioning of the package, please ensure that the data files are placed in the working directory.
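
A quick way to confirm that an input file is visible from the working directory is sketched below. The file name is hypothetical.

```r
input_file <- "my_gwas_results.csv"   # hypothetical file name

# List the .csv files R can see in the current working directory.
csv_here <- list.files(getwd(), pattern = "\\.csv$")

if (input_file %in% csv_here) {
  message("Found ", input_file, " in ", getwd())
} else {
  message(input_file, " is not in ", getwd(),
          "; copy it there or change directory with setwd()")
}
```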

By using the geneNR package, researchers can significantly enhance the efficiency and accuracy of their post-GWAS analyses, ultimately contributing to the advancement of genomic research and crop improvement.

Glossary:

- GWAS: Genome-Wide Association Studies
- SNP: Single Nucleotide Polymorphism
- API: Application Programming Interface
- Mbp: Megabase Pair (1,000,000 base pairs)
- CSV: Comma-Separated Values
- QTL: Quantitative Trait Loci

Rajamani Nirmalaruban

2025-02-20
