| Type: | Package | 
| Title: | A Modified Fisher’s Method to Test Overall Gene-Level Effect | 
| Version: | 1.0 | 
| Author: | Qi Yan | 
| Maintainer: | Qi Yan <qiy17@pitt.edu> | 
| Description: | The separate p-values of SNPs, RNA expressions and DNA methylations are calculated by kernel machine (KM) regression. The correlation between the different omics data types is taken into account. The method can be applied to samples with all three types of omics data as well as to samples with only two types. | 
| License: | GPL-2 | GPL-3 [expanded from: GPL (≥ 2)] | 
| Encoding: | UTF-8 | 
| LazyData: | true | 
| Depends: | CompQuadForm, stringr | 
| Imports: | survey | 
| RoxygenNote: | 6.0.0 | 
| Collate: | 'OmnibusFisher_iteration_GM.R' 'OmnibusFisher_iteration_GMR.R' 'OmnibusFisher_iteration_GR.R' 'OmnibusFisher_iteration_MR.R' 'OmnibusFisher_outer.R' 'example_data.R' | 
| Packaged: | 2018-12-10 15:52:18 UTC; qiyan | 
| NeedsCompilation: | no | 
| Repository: | CRAN | 
| Date/Publication: | 2018-12-19 08:30:03 UTC | 
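The package is described as a modified version of Fisher's method for combining p-values. For orientation, a minimal base-R sketch of the classical, unmodified method follows. It assumes the p-values are independent, which is exactly the assumption the package's modification relaxes by accounting for correlation between omics types, so this is not the Omnibus-Fisher test itself. The function name `fisher_combine` and the example p-values are hypothetical.

```r
# Classical Fisher's method: under H0 and independence,
# X = -2 * sum(log(p_i)) follows a chi-squared distribution with 2k df.
fisher_combine <- function(p) {
  stopifnot(all(p > 0 & p <= 1))
  stat <- -2 * sum(log(p))   # Fisher's combined statistic
  df <- 2 * length(p)        # 2k degrees of freedom for k p-values
  pchisq(stat, df = df, lower.tail = FALSE)
}

# Hypothetical gene-level p-values for SNPs (G), methylation (M), expression (R)
p_GMR <- c(G = 0.04, M = 0.10, R = 0.02)
fisher_combine(p_GMR)
```

With correlated inputs this classical version tends to be anti-conservative, which is why a correlation-aware modification is needed for omics layers measured on overlapping samples.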
A modified Fisher’s method (Omnibus-Fisher) to combine separate p-values of SNPs, RNA expressions and DNA methylations into an overall gene-level p-value
Description
* A sample does not need to have all three types of omics data, but each gene must have all three types of omics data mapped to it. 
* For example, 1,000 samples have SNPs mapped to 20,000 genes, 500 samples have methylated sites mapped to 18,000 genes, and 300 samples have expression data for 16,000 genes. 
* Then all 1,000 samples contribute to the test (the 500 and 300 samples are subsets of the 1,000). Only the overlapping genes are tested (e.g., the 16,000 genes that have SNPs, methylated sites and expression values all mapped to them).
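The bookkeeping described above (samples are pooled across data types, genes are intersected) can be sketched in base R. All sample IDs and gene names here are hypothetical; the sketch only shows how a `full_id` vector covering every sample and the set of tested genes relate to the per-type data, not how the package computes them internally.

```r
# Hypothetical sample IDs available for each omics type
ids_G <- paste0("s", 1:10)        # genotypes for 10 samples
ids_M <- paste0("s", 1:5)         # methylation for 5 of them
ids_R <- paste0("s", c(1:3, 6))   # expression for 4 of them

# full_id: every sample that has at least one type of data
full_id <- Reduce(union, list(ids_G, ids_M, ids_R))

# Number of omics types available for each sample (1, 2 or 3)
n_types <- (full_id %in% ids_G) + (full_id %in% ids_M) + (full_id %in% ids_R)
table(n_types)

# Hypothetical gene sets: only genes mapped by all three types are tested
genes_G <- c("gene1", "gene2", "gene3", "gene4")
genes_M <- c("gene1", "gene2", "gene3")
genes_R <- c("gene1", "gene3", "gene5")
genes_tested <- Reduce(intersect, list(genes_G, genes_M, genes_R))
genes_tested
```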
Usage
OmnibusFisher(pheno, full_id, G = NULL, M = NULL, R = NULL,
  exprs_G = NULL, exprs_M = NULL, exprs_R = NULL, type, optimal = FALSE,
  perturb_iteration = NULL, method = "kuonen")
Arguments
| pheno | A matrix of sample IDs, the trait (i.e., y) and covariates (class: data.frame). | 
| full_id | A vector of sample IDs. It should include all IDs: samples with all 3 types of omics data, with 2 types and with 1 type should all have their IDs in the vector (class: data.frame). | 
| G | A matrix of genotypes in a gene. The 1st column is the sample ID; each following column is a SNP (class: data.frame). | 
| M | A matrix of methylated sites in a gene. The 1st column is the sample ID; each following column is a methylated site (class: data.frame). | 
| R | A matrix of RNA expression probes in a gene (for microarray, one gene can have multiple probes mapped; for RNAseq, one gene always has one value). The 1st column is the sample ID. For microarray, each following column is a probe; for RNAseq, the 2nd column is the expression value (only two columns) (class: data.frame). | 
| exprs_G | Regression model for SNPs under the null hypothesis (i.e., the SNP effect is zero), given as a formula string of the form "y ~ cov1 + cov2 + ... + covp". | 
| exprs_M | Regression model for DNA methylation under the null hypothesis (i.e., the methylation effect is zero), given as a formula string of the form "y ~ cov1 + cov2 + ... + covp". | 
| exprs_R | Regression model for RNA expression under the null hypothesis (i.e., the RNA expression effect is zero), given as a formula string of the form "y ~ cov1 + cov2 + ... + covp". | 
| type | Either type="binary" or type="continuous" for binary or continuous traits. | 
| optimal | Whether to use the optimal method, which automatically searches for the disease model (involves perturbation). | 
| perturb_iteration | The number of perturbation iterations when using the optimal method. For example, with 1,000,000 iterations, the lowest p-value obtainable by perturbation is 1/1,000,001. | 
| method | Method used to approximate p-values: "kuonen" or "davies". Default is "kuonen". | 
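The expected input layout can be illustrated with hypothetical toy data in base R: each data frame has the sample ID in its first column, and `exprs_G` is a formula string like the `"aff ~ age + sex"` used in the Examples. The `glm()` fit only shows what such a null model contains (covariates, no omics terms); it is an illustration, not the package's internal code, and the column names (`ID`, `aff`, `age`, `sex`, `snp1`, `cg1`, `expr`) are assumptions.

```r
set.seed(1)
n <- 40
ids <- paste0("s", seq_len(n))

# pheno: 1st column sample ID, then the trait (aff) and covariates (age, sex)
pheno <- data.frame(ID = ids,
                    aff = rbinom(n, 1, 0.5),
                    age = round(runif(n, 40, 70)),
                    sex = rbinom(n, 1, 0.5))

# G: 1st column sample ID, then one column per SNP (minor-allele counts 0/1/2)
G <- data.frame(ID = ids,
                snp1 = rbinom(n, 2, 0.3),
                snp2 = rbinom(n, 2, 0.2))

# M: methylation on a subset of samples, one column per methylated site
M <- data.frame(ID = ids[1:25], cg1 = runif(25), cg2 = runif(25))

# R (RNA-seq style): sample ID plus a single expression value
R <- data.frame(ID = ids[11:30], expr = rnorm(20, mean = 8))

# exprs_G: null model as a character string (covariates only, no SNP terms)
exprs_G <- "aff ~ age + sex"
null_fit <- glm(as.formula(exprs_G), family = binomial(), data = pheno)

# Residuals under H0 (observed trait minus fitted probability); score-type
# kernel-machine tests are commonly built on residuals like these
res0 <- pheno$aff - fitted(null_fit)
```

For `type = "continuous"`, the analogous null fit would use `family = gaussian()` (equivalently `lm()`).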
Value
1. pval_GMR_pert:   the overall gene-level p-value from automatically searching for the optimal disease model, when three types of data are input. 
2. pval_GMR_tri:   the overall gene-level p-value assuming all three types of data are in the disease model, when three types of data are input. 
3. pval_GM_pert/pval_GR_pert/pval_MR_pert:   the overall gene-level p-value from automatically searching for the optimal disease model, when two types of data are input. 
4. pval_GM_tri/pval_GR_tri/pval_MR_tri:   the overall gene-level p-value assuming both types of data are in the disease model, when two types of data are input. 
5. pval_G/pval_M/pval_R:   the gene-level p-value for a single type of data. 
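The perturbation-based values above are empirical p-values, and `perturb_iteration` bounds how small they can be. A common estimator that has exactly the 1/(B + 1) floor stated for `perturb_iteration` is sketched below in base R. This is an assumption about the general form, not the package's code; `perturb_pvalue`, `t_obs` and `t_perm` are hypothetical names, and the perturbed statistics are simulated.

```r
# Empirical p-value from B perturbation statistics, counting the observed one:
# p = (1 + #{T_b >= T_obs}) / (B + 1), so the smallest attainable value is 1/(B + 1)
perturb_pvalue <- function(t_obs, t_perm) {
  (1 + sum(t_perm >= t_obs)) / (length(t_perm) + 1)
}

B <- 1000
set.seed(3)
t_perm <- rchisq(B, df = 2)           # hypothetical null statistics
perturb_pvalue(t_obs = 100, t_perm)   # no perturbed value reaches 100 -> 1/(B + 1)
```

Adding one to both numerator and denominator keeps the estimate strictly positive, which is why even a very extreme observed statistic cannot yield a p-value below 1/(B + 1).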
Examples
################
### Examples ###
################
library(OmnibusFisher)
data("example_data")
set.seed(123)
exprs_G = exprs_M = exprs_R = "aff ~ age + sex"
### SNPs (G), DNA methylations (M) and RNA expressions (R) ###
results<-list()
for(i in 1:1){ #change to 1:3 for 3 genes
 results[[i]]<-OmnibusFisher(pheno=pheno, full_id=All_header, G=G[[i]], M=M[[i]],
 R=R[[i]], exprs_G=exprs_G, exprs_M=exprs_M, exprs_R=exprs_R, type="binary")
 # G[[1]] includes SNPs in gene1;
 # M[[1]] includes methylated sites in gene1;
 # R[[1]] includes gene expression probes in gene1 (or single gene1 expression value).
}
### SNPs (G) and DNA methylations (M) ###
results<-list()
for(i in 1:1){
 results[[i]]<-OmnibusFisher(pheno=pheno, full_id=All_header, G=G[[i]], M=M[[i]],
 exprs_G=exprs_G, exprs_M=exprs_M, type="binary")
}
### SNPs (G) and RNA expressions (R) ###
# results[[i]]<-OmnibusFisher(pheno=pheno, full_id=All_header, G=G[[i]], R=R[[i]],
# exprs_G=exprs_G, exprs_R=exprs_R, type="binary")
### DNA methylations (M) and RNA expressions (R) ###
# results[[i]]<-OmnibusFisher(pheno=pheno, full_id=All_header, R=R[[i]], M=M[[i]],
# exprs_R=exprs_R, exprs_M=exprs_M, type="binary")
Data used in the examples
Description
- All_header. IDs of all samples that have at least one type of data 
- pheno. Phenotype data. The 1st column is ID, the 2nd column is disease status, the 3rd column is age and the 4th column is gender 
- G. Genotypes for 3 genes. G[[i]] is for the ith gene. In each G[[i]], the 1st column is ID and the remaining columns are genotypes 
- M. Methylated sites for 3 genes. M[[i]] is for the ith gene. In each M[[i]], the 1st column is ID and the remaining columns are methylated sites 
- R. RNA expression for 3 genes. R[[i]] is for the ith gene. In each R[[i]], the 1st column is ID and the remaining columns (usually one column) are RNA expression values 
Usage
data(example_data)