RHybridFinder is a package for analysing mass spectrometry (MS) data to discover putative hybrid peptides. The workflow proposed in this package consists of two major steps: (1) running the HybridFinder function on the denovo sequencing and database search results, followed by a second database search against the merged proteome it produces, and (2) running either checknetMHCpan or step2_wo_netMHCpan on the results of that second search.
After installing the package, load it before use:
library(RHybridFinder)
For demonstration purposes, the data showcased in this vignette, which are also available in the package (denovo sequencing and database search results, in .csv format), are from the HLA Ligand Atlas (Human liver, Autonomous Donor 17) (Marcu et al., 2020). To download the human proteome database .fasta file, please visit the UniProt website.
In order to access the example denovo sequencing results and database search results through the package:
#retrieve the denovo sequencing results for the example data
data(package="RHybridFinder", denovo_Human_Liver_AUTD17)
#retrieve the database search results for the example data
data(package="RHybridFinder", db_Human_Liver_AUTD17)
The RAW Mass Spectrometry (MS) Files for the dataset are provided by the authors on Proteomics Identifications Database (PRIDE): PXD019643
Step 1 consists of running the HybridFinder function.
The HybridFinder function is based on the workflow proposed by Faridi et al. (2018), with some modifications. Using the denovo sequencing results together with the database search results and the proteome database, HybridFinder extracts high-confidence denovo peptides and then searches for them in the proteome in three steps. In the first step, peptide sequences that match fully within a protein are considered “Linear”, and the Linear peptides within a given spectrum are filtered to keep the one with the highest ALC (Average Local Confidence, a significance score for the sequence). The remaining spectra go through the second step, in which lists of fragment pairs are created from each peptide sequence and searched in the proteome database; if both fragments of a pair match within one protein, the peptide is considered potentially cis-spliced. Again, only the highest-ALC peptide from each spectrum group is kept. The remaining spectra go through the last step, which searches for fragment pairs that match within two different proteins; those that match are considered potentially trans-spliced, and only the highest-ALC peptide within each spectrum group is kept. Finally, the hybrid candidates are concatenated into several ‘fake’ proteins, creating a hybrid proteome that mimics the actual proteome, and this hybrid proteome is merged with the reference proteome.
In order to run HybridFinder, three inputs must be provided: the denovo sequencing results (denovo_candidates), the database search results (db_search) and the path to the proteome database .fasta file (proteome_db).
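The three-step search described above can be pictured with a small base R sketch. This is only an illustration under stated assumptions, not the package's implementation: the toy proteome, the helper names (split_pairs, found_in, classify_peptide) and the absence of any minimum fragment length or ALC filtering are all invented for the example.

```r
# Toy proteome: names and sequences are invented for illustration only
proteome <- c(protA = "MKTAYLERMNYLQQ", protB = "GGSALPVTYVFKK")

# All ways of cutting a peptide into two non-empty fragments
split_pairs <- function(peptide) {
  n <- nchar(peptide)
  cuts <- seq_len(n - 1)
  data.frame(left  = substring(peptide, 1, cuts),
             right = substring(peptide, cuts + 1, n),
             stringsAsFactors = FALSE)
}

# For each protein, TRUE if it contains the fragment
found_in <- function(fragment, proteome) {
  vapply(proteome, function(p) grepl(fragment, p, fixed = TRUE), logical(1))
}

# Hypothetical classifier: Linear first, then cis, then trans
classify_peptide <- function(peptide, proteome) {
  # Step 1: the whole peptide matches within one protein
  if (any(found_in(peptide, proteome))) return("Linear")
  pairs <- split_pairs(peptide)
  cis <- FALSE
  trans <- FALSE
  for (i in seq_len(nrow(pairs))) {
    l <- found_in(pairs$left[i], proteome)
    r <- found_in(pairs$right[i], proteome)
    if (any(l & r)) {
      cis <- TRUE            # Step 2: both fragments in the same protein
    } else if (any(l) && any(r)) {
      trans <- TRUE          # Step 3: fragments in two different proteins
    }
  }
  if (cis) return("cis")
  if (trans) return("trans")
  NA_character_
}

classify_peptide("AYLERMNYL", proteome)  # full match in protA
classify_peptide("MKTNYL", proteome)     # "MKT" + "NYL", both in protA
classify_peptide("MKTTYVF", proteome)    # "MKT" in protA, "TYVF" in protB
```

A real search would also need to handle very short fragments, which match almost anywhere, and to keep only the highest-ALC peptide per spectrum as described above.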
Please note that it is recommended to have a folder structure that looks as follows, as it helps keep all results organized:
folder_Human_Liver_AUTD17 <- file.path("./data/Human_Liver_AUTD17")
denovo_Human_Liver_AUTD17 <- read.csv(file.path(folder_Human_Liver_AUTD17, "first_run","all de novo candidates.csv"), sep=",", head=TRUE,stringsAsFactors = FALSE)
db_Human_Liver_AUTD17 <- read.csv(file.path(folder_Human_Liver_AUTD17, "first_run","DB search psm.csv"), sep=",", head=TRUE,stringsAsFactors = FALSE)
proteome_Human_Liver_AUTD17 <- file.path(folder_Human_Liver_AUTD17, "uniprot-proteome-human_UP000005640-reviewed_validated.fasta")
Once the inputs are loaded, running HybridFinder is a piece of cake. Please note that the HybridFinder function can use parallel computing to obtain results faster; before enabling it, check that the computer used can support it.
It is possible to set the number of cores (customCores) on which the HybridFinder function runs (given that these are >5). Additionally, a custom ALC cutoff can be set through the customALCcutoff parameter; unassigned spectra are then filtered using this cutoff instead of a calculated one. The minimum customALCcutoff that can be set is 85; any lower value is raised to 85.
results_HybridFinder_Human_Liver_AUTD17 <- HybridFinder(denovo_candidates = denovo_Human_Liver_AUTD17, db_search = db_Human_Liver_AUTD17, proteome_db = proteome_Human_Liver_AUTD17, customALCcutoff = NULL, with_parallel = FALSE, customCores = 8, export_files = TRUE, export_dir=folder_Human_Liver_AUTD17)
The function returns a list composed of 3 elements: the HybridFinder step 1 output table, the list of candidate hybrid peptides, and the merged proteome.
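Before choosing these settings, it may help to check the machine and the cutoff in plain R. The sketch below is an assumption-laden illustration: effective_alc_cutoff is a hypothetical helper that only mirrors the rule stated above (values below 85 are raised to 85) and is not part of the package, while parallel::detectCores() comes from the parallel package that ships with R.

```r
# Hypothetical helper mirroring the documented rule: cutoffs below 85 become 85
effective_alc_cutoff <- function(customALCcutoff = NULL) {
  if (is.null(customALCcutoff)) return(NULL)  # NULL: leave the cutoff to be calculated
  max(85, customALCcutoff)
}

effective_alc_cutoff(70)  # raised to 85
effective_alc_cutoff(92)  # kept as 92

# Number of cores reported by the operating system (may be NA on some systems)
available_cores <- parallel::detectCores()

# Only consider a custom core count when more than 5 cores are available
can_use_custom_cores <- !is.na(available_cores) && available_cores > 5
```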
#display HybridFinder(HF) step1 output
print(head(results_HybridFinder_Human_Liver_AUTD17[[1]]))
#> Fraction Scan m/z RT Peptide Length Potential_spliceType ALC
#> 394 3 F3:8061 584.3079 45.51 SLVMTQTPKF 10 trans 86
#> 23 1 F1:16116 575.7930 72.95 SYLEHLFEL 9 Linear 95
#> 375 3 F3:15872 558.8188 70.76 LPVDLATFQL 10 trans 83
#> 130 3 F3:13743 533.3030 64.36 SYLLRPVAF 9 Linear 79
#> 175 1 F1:11624 531.2999 59.30 YPEDKLLLA 9 cis 78
#> 15 1 F1:15777 575.7930 72.95 SYLEHLFEL 9 Linear 92
#> proteome_database_used
#> 394 human_proteome_dropbox_for_HF_validation.fasta
#> 23 human_proteome_dropbox_for_HF_validation.fasta
#> 375 human_proteome_dropbox_for_HF_validation.fasta
#> 130 human_proteome_dropbox_for_HF_validation.fasta
#> 175 human_proteome_dropbox_for_HF_validation.fasta
#> 15 human_proteome_dropbox_for_HF_validation.fasta
#display list of candidate hybrid peptides
print(head(results_HybridFinder_Human_Liver_AUTD17[[2]]))
#> [1] "YPEDKLLLA" "ANAVLARFY" "YNLPWLENL" "LLYYALMPY" "LPVDLQRYL" "MEDLLKLLA"
#display the merged proteome
print(tail(results_HybridFinder_Human_Liver_AUTD17[[3]]))
#> $`sp|Q8TDM6-2|DLG5_HUMAN`
#> [1] "MEPQRRELLAQCQQSLAQAMTEVEAVLGLLEAAGALSPGERRQLDEEAGGAKAELLLKLLLAKERDHFQDLRAALEKTQPHLLPLLYLNGVVGPPQPAEGAGSTYSVLSTMPSDSESSSSLSSVGTTGKAPSPPPLLTDQQVNEKVENLSLQLRLMTRERNELRKRLAFATHGTAFDKRPYHRLNPDYERLKLQCVRAMSDLQSLQNQHTNALKRCEEVAKETDFYHTLHSRLLSDQTRLKDDVDMLRRENGQLLRERNLLQQSWEDMKRLHEEDQKELGDLRAQQQQVLKHNGSSELLNKLYDTAMDKLEVVKKDYDALRKRYSEKVALHNADLSRLEQLGEENQRLLKQTEMLTQQRDTALQLQHQCALSLRRFEALHHELNKATAQNKDLQWEMELLQSELTELRTTQVKTAKESEKYREERDAVYSEYKLLMSERDQVLSELDKLQTEVELAESKLKSSTSEKKAANEEMEALRQLKDTVTMDAGRANKEVELLRKQCKALCQELKEALQEADVAKCRRDWAFQERDKLVAERDSLRTLCDNLRRERDRAVSELAEALRSLDDTRKQKNDVSRELKELKEQMESQLEKEARFRQLMAHSSHDSALDTDSMEWETEVVEFERETEDLDLKALGFDMAEGVNEPCFPGDCGLFVTKVDKGSLADGRLRVNDWLLRLNDVDLLNKDKKQALKALLNGEGALNMVVRRRKSLGGKVVTPLHLNLSGQKDSGLSLENGVYAAAVLPGSPAAKEGSLAVGDRLVALNGLALDNKSLNECESLLRSCQDSLTLSLLKEQKCVPASGELSPELQEWAPYSPGHSSRHSNPPLYPSRPSVGTVPRSLTPSTTVSSLLRNPLYTVRSHRVGPCSSPPAARDAGPQGLHPSVQHQGRLSLDLSHRTCSDYSEMRATHGSNSLPSSARLGSSSNLQFKAERLKLPSTPRYPRSVVGSERGSVSHSECSTPPQSPLNLDTLSSCSQSQTSASTLPRLAVNPASLGERRKDRPYVEEPRHVKVQKGSEPLGLSLVSGEKGGLYVSKVTVGSLAHQAGLEYGDQLLEFNGLNLRSATEQQARLLLGQQCDTLTLLAQYNPHVHQLSSHSRSSSHLDPAGTHSTLQGSGTTTPEHPSVLDPLMEQDEGPSTPPAKQSSSRLAGDANKKTLEPRVVFLKKSQLELGVHLCGGNLHGVFVAEVEDDSPAKGPDGLVPGDLLLEYGSLDVRNKTVEEVYVEMLKPRDGVRLKVQYRPEEFTKAKGLPGDSFYLRALYDRLADVEQELSFKKDDLLYVDDTLPQGTFGSWMAWQLDENAQKLQRGQLPSKYVMDQEFSRRLSMSEVKDDNSATKTLSAAARRSFFRRKHKHKRSGSKDGKDLLALDAFSSDSLPLFEDSVSLAYQRVQKVDCTALRPVLLLGPLLDVVKEMLVNEAPGKFCRCPLEVMKASQQALERGVKDCLFVDYKRRSGHFDVTTVASLKELTEKNRHCLLDLAPHALERLHHMHLYPLVLFLHYKSAKHLKEQRDPLYLRDKVTQRHSKEQFEAAQKLEQEYSRYFTGVLQGGALSSLCTQLLAMVNQEQNKVLWLPACPL"
#> attr(,"name")
#> [1] "sp|Q8TDM6-2|DLG5_HUMAN"
#> attr(,"Annot")
#> [1] ">sp|Q8TDM6-2|DLG5_HUMAN Isoform 2 of Disks large homolog 5 OS=Homo sapiens OX=9606 GN=DLG5"
#> attr(,"class")
#> [1] "SeqFastaAA"
#>
#> $`sp|Q8TDM6-3|DLG5_HUMAN`
#> [1] "MEPQRRELLAQCQQSLAQAMTEVEAVLGLLEAAGALSPGERRQLDEEAGGAKAELLLKLLLAKERDHFQDLRAALEKTQPHLLPLLYLNGVVGPPQPAEGAGSTYSVLSTMPSDSESSSSLSSVGTTGKELKEQMESQLEKEARFRQLMAHSSHDSALDTDSMEWETEVVEFERETEDLDLKALGFDMAEGVNEPCFPGDCGLFVTKVDKGSLADGRLRVNDWLLRLNDVDLLNKDKKQALKALLNGEGALNMVVRRRKSLGGKVVTPLHLNLSGQKDSGLSLENGVYAAAVLPGSPAAKEGSLAVGDRLVALNGLALDNKSLNECESLLRSCQDSLTLSLLKVFPQSSSWSGQNLFENLKDSDKMLSFRAHGPEVQAHNKRNLLQHNNSTQTDLFYTDRLEDRKEPGPPGGSSSFLHKPFPGGPLQVCPQACPSASERSLSSFRSDASGDRGFGLVDVRGRRPLLPFETEVGPCGVGEASLDKADSEGSNSGGTWPKAMLSSTAVPEKLSVYKKPKQRKSLFDPNTFKRPQTPPKLDYLLPGPGPAHSPQPSKRAGPLTPPKPPRRSDSLKFQHRLETSSESEATLVGSSPSTSPPSALPPDVDPGEPMHASPPRKARVRLASSYYPEGDGDSSHLPAKKSCDEDLTSQKVDELGQKRRRPKSAPSFRPKLAPVVLPAQFLEV"
#> attr(,"name")
#> [1] "sp|Q8TDM6-3|DLG5_HUMAN"
#> attr(,"Annot")
#> [1] ">sp|Q8TDM6-3|DLG5_HUMAN Isoform 3 of Disks large homolog 5 OS=Homo sapiens OX=9606 GN=DLG5"
#> attr(,"class")
#> [1] "SeqFastaAA"
#>
#> $`sp|Q8TDM6-4|DLG5_HUMAN`
#> [1] "MPSDSESSSSLSSVGTTGKAPSPPPLLTDQQVNEKVENLSLQLRLMTRERNELRKRLAFATHGTAFDKRPYHRLNPDYERLKLQCVRAMSDLQSLQNQHTNALKRCEEVAKETDFYHTLHSRLLSDQTRLKDDVDMLRRENGQLLRERNLLQQSWEDMKRLHEEDQKELGDLRAQQQQVLKHNGSSELLNKLYDTAMDKLEVVKKDYDALRKRYSEKVALHNADLSRLEQLGEENQRLLKQTEMLTQQRDTALQLQHQCALSLRRFEALHHELNKATAQNKDLQWEMELLQSELTELRTTQVKTAKESEKYREERDAVYSEYKLLMSERDQVLSELDKLQTEVELAESKLKSSTSEKKAANEEMEALRQLKDTVTMDAGRANKEVELLRKQCKALCQELKEALQEADVAKCRRDWAFQERDKLVAERDSLRTLCDNLRRERDRAVSELAEALRSLDDTRKQKNDVSRELKELKEQMESQLEKEARFRQLMAHSSHDSALDTDSMEWETEVVEFERETEDLDLKALGFDMAEGVNEPCFPGDCGLFVTKVDKGSLADGRLRVNDWLLRLNDVDLLNKDKKQALKALLNGEGALNMVVRRRKSLGGKVVTPLHLNLSGQKDSGLSLENGVYAAAVLPGSPAAKEGSLAVGDRLVALNGLALDNKSLNECESLLRSCQDSLTLSLLKVFPQSSSWSGQNLFENLKDSDKMLSFRAHGPEVQAHNKRNLLQHNNSTQTDLFYTDRLEDRKEPGPPGGSSSFLHKPFPGGPLQVCPQACPSASERSLSSFRSDASGDRGFGLVDVRGRRPLLPFETEVGPCGVGEASLDKADSEGSNSGGTWPKAMLSSTAVPEKLSVYKKPKQRKSLFDPNTFKRPQTPPKLDYLLPGPGPAHSPQPSKRAGPLTPPKPPRRSDSLKFQHRLETSSESEATLVGSSPSTSPPSALPPDVDPGEPMHASPPRKARVRLASSYYPEGDGDSSHLPAKKSCDEDLTSQKVDELGQKRRRPKSAPSFRPKLAPVVLPAQFLEEQKCVPASGELSPELQEWAPYSPGHSSRHSNPPLYPSRPSVGTVPRSLTPSTTVSSLLRNPLYTVRSHRVGPCSSPPAARDAGPQGLHPSVQHQGRLSLDLSHRTCSDYSEMRATHGSNSLPSSARLGSSSNLQFKAERLKLPSTPRYPRSVVGSERGSVSHSECSTPPQSPLNLDTLSSCSQSQTSASTLPRLAVNPASLGERRKDRPYVEEPRHVKVQKGSEPLGLSLVSGEKGGLYVSKVTVGSLAHQAGLEYGDQLLEFNGLNLRSATEQQARLLLGQQCDTLTLLAQYNPHVHQLSSHSRSSSHLDPAGTHSTLQGSGTTTPEHPSVLDPLMEQDEGPSTPPAKQSSSRLAGDANKKTLEPRVVFLKKSQLELGVHLCGGNLHGVFVAEVEDDSPAKGPDGLVPGDLLLEYGSLDVRNKTVEEVYVEMLKPRDGVRLKVQYRPEEFTKAKGLPGDSFYLRALYDRLADVEQELSFKKDDLLYVDDTLPQGTFGSWMAWQLDENAQKLQRGQLPSKYVMDQEFSRRLSMSEVKDDNSATKTLSAAARRSFFRRKHKHKRSGSKDGKDLLALDAFSSDSLPLFEDSVSLAYQRVQKVDCTALRPVLLLGPLLDVVKEMLVNEAPGKFCRCPLEVMKASQQALERGVKDCLFVDYKRRSGHFDVTTVASLKELTEKNRHCLLDLAPHALERLHHMHLYPLVLFLHYKSAKHLKEQRDPLYLRDKVTQRHSKEQFEAAQKLEQEYSRYFTGVLQGGALSSLCTQLLAMVNQEQNKVLWLPACPL"
#> attr(,"name")
#> [1] "sp|Q8TDM6-4|DLG5_HUMAN"
#> attr(,"Annot")
#> [1] ">sp|Q8TDM6-4|DLG5_HUMAN Isoform 4 of Disks large homolog 5 OS=Homo sapiens OX=9606 GN=DLG5"
#> attr(,"class")
#> [1] "SeqFastaAA"
#>
#> $`sp|Q8TDM6-5|DLG5_HUMAN`
#> [1] "MRATHGSNSLPSSARLGSSSNLQFKAERLKLPSTPRYPRSVVGSERGSVSHSECSTPPQSPLNLDTLSSCSQSQTSASTLPRLAVNPASLGERRKDRPYVEEPRHVKVQKGSEPLGLSLVSGEKGGLYVSKVTVGSLAHQAGLEYGDQLLEFNGLNLRSATEQQARLLLGQQCDTLTLLAQYNPHVHQLSSHSRSSSHLDPAGTHSTLQGSGTTTPEHPSVLDPLMEQDEGPSTPPAKQSSSRLAGDANKKTLEPRVVFLKKSQLELGVHLCGGNLHGVFVAEVEDDSPAKGPDGLVPGDLLLEYGSLDVRNKTVEEVYVEMLKPRDGVRLKVQYRPEEFTKAKGLPGDSFYLRALYDRLADVEQELSFKKDDLLYVDDTLPQGTFGSWMAWQLDENAQKLQRGQLPSKYVMDQEFSRRLSMSEVKDDNSATKTLSAAARRSFFRRKHKHKRSGSKDGKDLLALDAFSSDSLPLFEDSVSLAYQRVQKVDCTALRPVLLLGPLLDVVKEMLVNEAPGKFCRCPLEVMKASQQALERGVKDCLFVDYKRRSGHFDVTTVASLKELTEKNRHCLLDLAPHALERLHHMHLYPLVLFLHYKSAKHLKEQRDPLYLRDKVTQRHSKEQFEAAQKLEQEYSRYFTGVLQGGALSSLCTQLLAMVNQEQNKVLWLPACPL"
#> attr(,"name")
#> [1] "sp|Q8TDM6-5|DLG5_HUMAN"
#> attr(,"Annot")
#> [1] ">sp|Q8TDM6-5|DLG5_HUMAN Isoform 5 of Disks large homolog 5 OS=Homo sapiens OX=9606 GN=DLG5"
#> attr(,"class")
#> [1] "SeqFastaAA"
#>
#> $`sp|denovo_HF_fake_protein1`
#> [1] "YPEDKLLLANYGELFEKFDYGELFEKFDYGELFQKFANAVLARFYKLADFRLLYKLADFLRLYSSYFLLEAFYNLPWLENLLSLNYCLLLLEPFLLPTLPELFLLPTLPLEFLLPTLLTTSWMSLKMEDLLKLLAMENLLKLLALHLSRLQYFLPVDLQRYLLPVNLQRYLLWDLSLTRLFPYYAPELLLLYYASNYRRFLVGSLPKFENGEWRELQLADLFRLYEHVVKVFSLLYEVLLKNFFNLPWTQRF"
#>
#> $`sp|denovo_HF_fake_protein2`
#> [1] "FEVPWTQRFDQDLRSMATALLYYASNRYLLYYASRNYLLYYAPLMYLLYYALMPYSLVMTQTPKFYKVYTSVSWMMALLTHGLVLPRRFPQFNYLPWLELNKYPSDEFVFYKPSDEFVFLLYYANSRYLLYYARSNYTVVMTQTPKFTVVMTQTPFKVVYPSMTQRFVVYPAMTQRFRYFSTSVSWYRFSTSVSWLPVDLTAFQLLPVNLATFQLLPVDLATFQLLPVNLKSLTMSYLEHLFYTSYLHELFELANVENLGYLTF"
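The ‘fake’ proteins at the end of the merged proteome are concatenations of hybrid candidate peptides. A minimal sketch of that idea follows; build_fake_proteins and the per_protein group size are assumptions made for illustration, and the way the package actually groups and orders candidates may differ.

```r
# Candidate peptides taken from the example output shown earlier
hybrid_candidates <- c("YPEDKLLLA", "ANAVLARFY", "YNLPWLENL",
                       "LLYYALMPY", "LPVDLQRYL", "MEDLLKLLA")

# Hypothetical helper: paste consecutive peptides into 'fake' proteins
build_fake_proteins <- function(peptides, per_protein = 3) {
  groups <- ceiling(seq_along(peptides) / per_protein)
  fake <- vapply(split(peptides, groups), paste, character(1), collapse = "")
  # Names follow the pattern visible in the merged proteome output
  names(fake) <- paste0("sp|denovo_HF_fake_protein", seq_along(fake))
  fake
}

fake_proteins <- build_fake_proteins(hybrid_candidates)
fake_proteins
```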
If export_files is set to TRUE and a valid directory is provided in export_dir, then the results are exported in .csv, .csv and .fasta format, respectively.
Even if the export parameters were not set at the beginning, the returned results can always be exported with the export_HybridFinder_results function, as long as the results obtained from the HybridFinder function are stored; these are passed through the results_list parameter of export_HybridFinder_results.
After this step, a second database search has to be run on the raw MS files, this time using the merged proteome (.fasta) exported from the HybridFinder function results.
The second step in RHybridFinder consists of using either checknetMHCpan or step2_wo_netMHCpan together with the results from step 1 in order to retrieve the final list of peptides, which includes the hybrid candidates and their potential splice types.
The checknetMHCpan function represents step 2 of the workflow of Faridi et al. (2018) and also uses netMHCpan (Jurtz et al., 2017; Reynisson et al., 2020) to obtain the predicted peptide-MHC-I binding affinities. Please note that netMHCpan needs to be installed in order to run this function. The package also contains a function that runs step 2 without netMHCpan (please refer to the step2_wo_netMHCpan part).
In order to run checknetMHCpan, four inputs must be provided:
- the directory in which netMHCpan is installed (netmhcpan_directory)
- the HLA alleles to test (netmhcpan_alleles)
- the results of the second database search (peptide_rerun)
- the HybridFinder output (HF_step1_output)
netmhcpan_dir <- '/usr/bin/'
alleles_Human_liver_AUTD17 <- c("HLA-A*03:01", "HLA-A*24:02", "HLA-B*35:03", "HLA-B*45:01", "HLA-C*04:01", "HLA-C*16:01")
db_rerun_Human_liver_AUTD17 <- read.csv(file.path(folder_Human_Liver_AUTD17, "second_run","DB search psm.csv"), sep=",", head=TRUE,stringsAsFactors = FALSE)
HF_output_Human_liver_AUTD17 <- results_HybridFinder_Human_Liver_AUTD17[[1]]
Once the inputs are loaded, running checknetMHCpan is easier than ABC.
results_checknetMHCpan_Human_Liver_AUTD17 <- checknetMHCpan(netmhcpan_directory = netmhcpan_dir, netmhcpan_alleles = alleles_Human_liver_AUTD17, peptide_rerun = db_rerun_Human_liver_AUTD17, HF_step1_output = HF_output_Human_liver_AUTD17, export_files = TRUE, export_dir=folder_Human_Liver_AUTD17)
The function returns a list composed of 3 elements:
- the netMHCpan results in long format, that is, the binding affinity results are displayed for each peptide with a given allele from those chosen.
- the netMHCpan results in wide format, that is, the binding affinity levels per peptide summarized for all HLA alleles chosen.
- the database results with the respective potential splice types retrieved from step 1.
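The relation between the long and the wide format can be illustrated with a toy table in base R. This is a sketch under assumptions: the level column and its labels ("strong", "weak", "none") and the helper alleles_by_level are invented for the example, and only the wide column names follow the output of this vignette.

```r
# Toy long-format table: one row per peptide/allele pair
long <- data.frame(
  Peptide = rep(c("VVYPSMTQRF", "GMEGANSLFSGF"), each = 3),
  HLA     = rep(c("HLA-A*24:02", "HLA-C*04:01", "HLA-C*16:01"), times = 2),
  level   = c("strong", "weak", "weak", "none", "none", "none"),
  stringsAsFactors = FALSE
)

peptides <- unique(long$Peptide)

# Hypothetical helper: comma-separated alleles per peptide for one binding level
alleles_by_level <- function(lv) {
  hits <- long[long$level == lv, ]
  groups <- split(hits$HLA, factor(hits$Peptide, levels = peptides))
  unname(vapply(groups, paste, character(1), collapse = ","))
}

# Wide format: one row per peptide, alleles summarized by binding level
wide <- data.frame(
  Peptide      = peptides,
  strongBinder = alleles_by_level("strong"),
  weakBinder   = alleles_by_level("weak"),
  noneBinder   = alleles_by_level("none"),
  stringsAsFactors = FALSE
)
wide$strongBinder_count <- lengths(strsplit(wide$strongBinder, ",", fixed = TRUE))
wide
```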
#display netmhcpan output(long version)
print(head(results_checknetMHCpan_Human_Liver_AUTD17[[1]]))
#> Pos HLA Peptide Core Of Gp Gl Ip Il Icore
#> 431 1 HLA-A*03:01 VTTYPQGFKL VTTYPQGFK 0 0 0 0 0 VTTYPQGFK
#> 2846 1 HLA-C*16:01 VTTYPQGFKL VTYPQGFKL 0 1 1 0 0 VTTYPQGFKL
#> 2789 1 HLA-C*16:01 EYLLKVNEL EYLLKVNEL 0 0 0 0 0 EYLLKVNEL
#> 512 1 HLA-A*24:02 GLFEVGAGWLGK GLFAGWLGK 0 3 3 0 0 GLFEVGAGWLGK
#> 804 1 HLA-A*24:02 LPVDLATFQL LVDLATFQL 0 1 1 0 0 LPVDLATFQL
#> 1687 1 HLA-B*45:01 FEVPWTQRF FEVPWTQRF 0 0 0 0 0 FEVPWTQRF
#> Identity Score Aff(nM) %Rank BindLevel strongBinder weakBinder
#> 431 PEPLIST 0.2355170 3911.0 4.0177 Non binder
#> 2846 PEPLIST 0.3118960 1711.5 3.3500 Non binder
#> 2789 PEPLIST 0.1848720 6764.9 9.4126 Non binder
#> 512 PEPLIST 0.0277830 37018.5 36.2242 Non binder
#> 804 PEPLIST 0.0686400 23792.1 14.4521 Non binder
#> 1687 PEPLIST 0.1722820 7752.1 3.4859 Non binder
#> noneBinder Potential_spliceType
#> 431 HLA-A*03:01 Linear
#> 2846 HLA-C*16:01 Linear
#> 2789 HLA-C*16:01 Linear
#> 512 HLA-A*24:02 Linear
#> 804 HLA-A*24:02 trans
#> 1687 HLA-B*45:01 trans
#display netmhcpan output tidied version (wide)
print(head(results_checknetMHCpan_Human_Liver_AUTD17[[2]]))
#> Peptide strongBinder weakBinder
#> 799 GMEGANSLFSGF
#> 2311 VPLDEKLLPQL
#> 2017 SYNNFFRMF HLA-A*24:02,HLA-C*04:01
#> 2341 VVYPSMTQRF HLA-A*24:02 HLA-C*04:01,HLA-C*16:01
#> 1117 LFVPSYLNLF HLA-A*24:02 HLA-C*04:01
#> 529 FAAFEEPEL HLA-B*35:03,HLA-C*16:01
#> noneBinder
#> 799 HLA-A*03:01,HLA-A*24:02,HLA-B*35:03,HLA-B*45:01,HLA-C*04:01,HLA-C*16:01
#> 2311 HLA-A*03:01,HLA-A*24:02,HLA-B*35:03,HLA-B*45:01,HLA-C*04:01,HLA-C*16:01
#> 2017 HLA-A*03:01,HLA-B*35:03,HLA-B*45:01,HLA-C*16:01
#> 2341 HLA-A*03:01,HLA-B*35:03,HLA-B*45:01
#> 1117 HLA-A*03:01,HLA-B*35:03,HLA-B*45:01,HLA-C*16:01
#> 529 HLA-A*03:01,HLA-A*24:02,HLA-B*45:01,HLA-C*04:01
#> %Rank.HLA-C*16:01 %Rank.HLA-B*45:01 %Rank.HLA-B*35:03 %Rank.HLA-A*03:01
#> 799 23.1799 6.9626 27.5516 17.2915
#> 2311 25.8328 84.5494 2.6277 66.2649
#> 2017 2.6310 16.6140 20.2119 19.6749
#> 2341 1.8172 55.9481 15.6904 3.0010
#> 1117 7.2039 39.4305 13.8101 15.3994
#> 529 0.4844 67.7349 0.1084 72.6386
#> %Rank.HLA-C*04:01 %Rank.HLA-A*24:02 strongBinder_count weakBinder_count
#> 799 12.6396 8.4755 0 0
#> 2311 23.1188 37.9233 0 0
#> 2017 0.3332 0.0133 2 0
#> 2341 1.6691 0.0367 1 2
#> 1117 0.7323 0.0734 1 1
#> 529 2.8687 23.6958 2 0
#> noneBinder_count Potential_spliceType
#> 799 6 Linear
#> 2311 6 Linear
#> 2017 4 Linear
#> 2341 3 trans
#> 1117 4 Linear
#> 529 4 Linear
#display the updated database search results with the categorizations from step1
print(head(results_checknetMHCpan_Human_Liver_AUTD17[[3]]))
#> Peptide X.10lgP Mass Length ppm m.z Z RT Area
#> 1758 AELLNGKELSA 33.77 1143.6135 11 0.5 572.8143 2 40.00 114240
#> 4124 AYLERM(+15.99)NYL 32.28 1187.5645 9 0.3 594.7897 2 46.69 600890
#> 3001 VYTVVDEM(+15.99)F 35.37 1117.5001 9 0.3 559.7575 2 69.18 354900
#> 3305 YPWTQRF 23.22 996.4818 7 0.5 499.2484 2 57.02 0
#> 4095 LYQELLHYF 32.28 1224.6179 9 0.5 613.3165 2 70.50 87683
#> 5386 AYPHNLM(+15.99)TF 24.19 1108.5011 9 0.4 555.2581 2 48.85 426960
#> Fraction Id Scan from.Chimera
#> 1758 3 37741 F3:6305 No
#> 4124 1 2628 F1:7714 No
#> 3001 2 25937 F2:15065 No
#> 3305 1 14824 F1:10950 No
#> 4095 3 43871 F3:15652 No
#> 5386 1 3035 F1:8562 No
#> Source.File
#> 1758 171002_AM_BD-ZH17_Liver_W_10%_DDA_#3_400-650mz_msms6.mzML
#> 4124 171002_AM_BD-ZH17_Liver_W_10%_DDA_#1_400-650mz_msms4.mzML
#> 3001 171002_AM_BD-ZH17_Liver_W_10%_DDA_#2_400-650mz_msms5.mzML
#> 3305 171002_AM_BD-ZH17_Liver_W_10%_DDA_#1_400-650mz_msms4.mzML
#> 4095 171002_AM_BD-ZH17_Liver_W_10%_DDA_#3_400-650mz_msms6.mzML
#> 5386 171002_AM_BD-ZH17_Liver_W_10%_DDA_#1_400-650mz_msms4.mzML
#> Accession PTM AScore
#> 1758 P11586
#> 4124 P06241:P06241:P06241:P07947 Oxidation (M) M6:Oxidation (M):1000.00
#> 3001 P53680 Oxidation (M) M8:Oxidation (M):1000.00
#> 3305 P68871
#> 4095 Q14185
#> 5386 Q15629:Q15629 Oxidation (M) M7:Oxidation (M):1000.00
#> Found.By Peptide_no_mods Potential_spliceType
#> 1758 PEAKS DB AELLNGKELSA Linear
#> 4124 PEAKS DB AYLERMNYL Linear
#> 3001 PEAKS DB VYTVVDEMF Linear
#> 3305 PEAKS DB YPWTQRF Linear
#> 4095 PEAKS DB LYQELLHYF Linear
#> 5386 PEAKS DB AYPHNLMTF Linear
If export_files is set to TRUE and a valid directory is provided in export_dir, then the results are exported in .csv, .tsv (tab-separated) and .csv format, respectively.
Even if the export parameters were not set at the beginning, the returned results can always be exported with the export_checknetMHCpan_results function, as long as the results obtained from the checknetMHCpan function are stored; these are passed through the results_list parameter of export_checknetMHCpan_results.
The step2_wo_netMHCpan function removes peptide modifications and prepares a peptide (.pep) file, containing the peptides of 9 to 12 amino acids, for use in the web version of netMHCpan. It is intended for cases where netMHCpan is not installed, the operating system is Windows, or the user would like to run the predictions in another software. Additionally, the function matches the peptide sequences in the database search rerun (the second database search, where the merged proteome was used) with the predicted splice type obtained from step 1.
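The preparation of the peptide list can be sketched in base R as follows. This is an illustration under assumptions rather than the package's code: strip_mods and its regular expression, the file name peptides.pep and the one-peptide-per-line layout are choices made for the example; the peptides are taken from the database search output shown in this vignette.

```r
# Peptides as reported by the database search, some carrying modifications
peptides <- c("AYLERM(+15.99)NYL", "VYTVVDEM(+15.99)F", "YPWTQRF", "AELLNGKELSA")

# Hypothetical helper: drop anything written in parentheses, e.g. "(+15.99)"
strip_mods <- function(x) gsub("\\([^)]*\\)", "", x)

no_mods <- strip_mods(peptides)

# Keep unique peptides of 9 to 12 amino acids
keep <- unique(no_mods[nchar(no_mods) >= 9 & nchar(no_mods) <= 12])

# Write one peptide per line to a temporary .pep file
pep_file <- file.path(tempdir(), "peptides.pep")
writeLines(keep, pep_file)
keep
```

Here the 7-mer YPWTQRF is dropped by the length filter, and the two oxidized peptides are reduced to their unmodified sequences.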
In order to run step2_wo_netMHCpan, two inputs must be provided:
- the results of the second database search (peptide_rerun)
- the HybridFinder output (HF_step1_output)
db_rerun_Human_liver_AUTD17 <- read.csv(file.path(folder_Human_Liver_AUTD17, "second_run","DB search psm.csv"), sep=",", head=TRUE,stringsAsFactors = FALSE)
HF_output_Human_liver_AUTD17 <- results_HybridFinder_Human_Liver_AUTD17[[1]]
Once the inputs are loaded, running step2_wo_netMHCpan is easier than ABC.
results_step2_Human_Liver_AUTD17 <- step2_wo_netMHCpan(peptide_rerun = db_rerun_Human_liver_AUTD17, HF_step1_output = HF_output_Human_liver_AUTD17, export_files = TRUE, export_dir=folder_Human_Liver_AUTD17)
The function returns a list composed of 2 elements:
- a character vector containing the list of unique peptides from the database search rerun, without modifications and of length 9 to 12 amino acids.
- the database results with the respective potential splice types retrieved from step 1.
#display the netmhcpan-ready input / list of all peptides 9-12 aa, without
#modifications
print(head(results_step2_Human_Liver_AUTD17[[1]]))
#> [1] "LYPDSFTVL" "LDFPKPLLA" "YYTPLTPHL" "LYEPNFLFF" "VAHVDDMPNAL"
#> [6] "FLPLTPQFVTE"
#display the updated database search results table with the categorizations from
#step1
print(head(results_step2_Human_Liver_AUTD17[[2]]))
#> Peptide X.10lgP Mass Length ppm m.z Z RT Area
#> 3078 LRVAPEEHPVL 29.89 1258.7034 11 0.0 420.5751 3 43.15 1350800
#> 5633 NEQVKNFVA 23.25 1047.5349 9 -0.7 524.7744 2 29.43 65130
#> 4017 SALPVTYVF 29.62 995.5328 9 0.2 498.7738 2 75.76 101290
#> 707 VVYPWTQRF 31.33 1194.6185 9 0.6 598.3169 2 61.47 3779700
#> 5028 AYLERMNYL 33.65 1171.5696 9 0.6 586.7924 2 51.78 66751
#> 5789 VVYPAMTQRF 24.10 1210.6168 10 -2.4 606.3142 2 56.05 349750
#> Fraction Id Scan from.Chimera
#> 3078 3 38292 F3:7364 No
#> 5633 1 322 F1:3088 No
#> 4017 3 44945 F3:17185 No
#> 707 1 5670 F1:13116 No
#> 5028 3 39990 F3:9972 No
#> 5789 3 40918 F3:10996 No
#> Source.File
#> 3078 171002_AM_BD-ZH17_Liver_W_10%_DDA_#3_400-650mz_msms6.mzML
#> 5633 171002_AM_BD-ZH17_Liver_W_10%_DDA_#1_400-650mz_msms4.mzML
#> 4017 171002_AM_BD-ZH17_Liver_W_10%_DDA_#3_400-650mz_msms6.mzML
#> 707 171002_AM_BD-ZH17_Liver_W_10%_DDA_#1_400-650mz_msms4.mzML
#> 5028 171002_AM_BD-ZH17_Liver_W_10%_DDA_#3_400-650mz_msms6.mzML
#> 5789 171002_AM_BD-ZH17_Liver_W_10%_DDA_#3_400-650mz_msms6.mzML
#> Accession PTM AScore Found.By Peptide_no_mods
#> 3078 P63261:P60709 PEAKS DB LRVAPEEHPVL
#> 5633 P04114 PEAKS DB NEQVKNFVA
#> 4017 Q9BZW5:Q9BZW5 PEAKS DB SALPVTYVF
#> 707 P68871 PEAKS DB VVYPWTQRF
#> 5028 P06241:P06241:P06241:P07947 PEAKS DB AYLERMNYL
#> 5789 |denovo_HF_fake_protein2 PEAKS DB VVYPAMTQRF
#> Potential_spliceType
#> 3078 Linear
#> 5633 Linear
#> 4017 Linear
#> 707 Linear
#> 5028 Linear
#> 5789 trans
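The Potential_spliceType column in the table above can be pictured as a lookup of each rerun peptide in the step 1 output. The sketch below uses base R match() on two toy tables; it is an assumption about the general idea only, and it leaves peptides that are absent from the step 1 table as NA, which is not necessarily how the package labels them.

```r
# Toy step 1 output: peptide and its potential splice type
step1 <- data.frame(
  Peptide = c("VVYPAMTQRF", "YPEDKLLLA"),
  Potential_spliceType = c("trans", "cis"),
  stringsAsFactors = FALSE
)

# Toy rerun table: peptides without modifications
rerun <- data.frame(
  Peptide_no_mods = c("VVYPAMTQRF", "AYLERMNYL"),
  stringsAsFactors = FALSE
)

# Look up each rerun peptide in the step 1 table (NA when it is not found)
idx <- match(rerun$Peptide_no_mods, step1$Peptide)
rerun$Potential_spliceType <- step1$Potential_spliceType[idx]
rerun
```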
If export_files is set to TRUE and a valid directory is provided in export_dir, then the results are exported in .csv, .csv and .csv format, respectively.
Even if the export parameters were not set at the beginning, the returned results can always be exported with the export_step2_results function, as long as the results obtained from the step2_wo_netMHCpan function are stored; these are passed through the results_list parameter of export_step2_results.
Faridi, P., Li, C., Ramarathinam, S. H., Vivian, J. P., Illing, P. T., Mifsud, N. A., Ayala, R., Song, J., Gearing, L. J., Hertzog, P. J., Ternette, N., Rossjohn, J., Croft, N. P., & Purcell, A. W. (2018). A subset of HLA-I peptides are not genomically templated: Evidence for cis- and trans-spliced peptide ligands. Science Immunology, 3(28), eaar3947. /sciimmunol.aar3947
Hanada K, Yewdell JW, Yang JC. Immune recognition of a human renal cancer antigen through post-translational protein splicing. Nature. 2004 Jan 15;427(6971):252-6. DOI /nature02240
Marcu A, Bichmann L, Kuchenbecker L, et al. HLA Ligand Atlas: a benign reference of HLA-presented peptides to improve T-cell-based cancer immunotherapy. Journal for ImmunoTherapy of Cancer 2021;9:e002071. /jitc-2020-002071
Birkir Reynisson, Bruno Alvarez, Sinu Paul, Bjoern Peters, Morten Nielsen, NetMHCpan-4.1 and NetMHCIIpan-4.0: improved predictions of MHC antigen presentation by concurrent motif deconvolution and integration of MS MHC eluted ligand data, Nucleic Acids Research, Volume 48, Issue W1, 02 July 2020, Pages W449–W454, /nar/gkaa379
Jurtz V, Paul S, Andreatta M, Marcatili P, Peters B, Nielsen M. NetMHCpan-4.0: Improved Peptide-MHC Class I Interaction Predictions Integrating Eluted Ligand and Peptide Binding Affinity Data. J Immunol. 2017 Nov 1;199(9):3360-3368. Epub 2017 Oct 4. PMID: 28978689; PMCID: PMC5679736 /jimmunol.1700893
The UniProt Consortium, UniProt: the universal protein knowledgebase in 2021, Nucleic Acids Research, Volume 49, Issue D1, 8 January 2021, Pages D480–D489, /nar/gkaa1100