Given a repository of mutations, allPfamAnalysis analyzes every Pfam
domain and every single sequence that carries at least one mutation.
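The overall split-analyze-aggregate pattern can be sketched in base R as follows. This is a minimal sketch, not LowMACA's implementation: the toy data are made up, the Pfam_ID column in the input is an assumption (LowMACA maps mutations to Pfam domains itself), and analyse_one_pfam is a hypothetical stand-in for a full per-Pfam LowMACA run up to lfm.

```r
# Hypothetical toy repository with made-up positions. The Pfam_ID column is an
# assumption for illustration: LowMACA maps mutations to Pfam domains itself.
muts <- data.frame(
  Gene_Symbol         = c("HOXA9", "HOXA9", "PBX1", "TP53", "TP53"),
  Pfam_ID             = c("Homeobox", "Homeobox", "Homeobox", "P53", "P53"),
  Amino_Acid_Position = c(205L, 205L, 250L, 175L, 248L),
  stringsAsFactors    = FALSE
)

# Hypothetical stand-in for one per-Pfam LowMACA run: it only keeps mutations
# at gene positions hit more than once. The real analysis aligns the
# sequences and calls lfm instead.
analyse_one_pfam <- function(df) {
  key <- paste(df$Gene_Symbol, df$Amino_Acid_Position)
  df[key %in% key[duplicated(key)], , drop = FALSE]
}

# One analysis per Pfam that carries at least one mutation...
per_pfam <- lapply(split(muts, muts$Pfam_ID), analyse_one_pfam)

# ...then aggregate the per-Pfam hits into a single data.frame.
aggregated <- do.call(rbind, per_pfam)
rownames(aggregated) <- NULL
print(aggregated)
```

Splitting first keeps each per-Pfam run independent, which is what makes the per-Pfam work easy to distribute (for example through BPPARAM).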
Arguments
repos
either a data.frame or the name of a file containing the data to analyze
allLowMACAObjects
filename of an RData file in which to save allPfamsLM, the object
collecting all the LowMACA objects produced by the function. It can be useful for plotting
a specific Pfam after the analysis, but it can be a very large object. Default NULL
mutation_type
type of mutation to consider in the analysis.
Default is missense.
NoSilent
logical indicating whether silent mutations should be removed.
Default TRUE
mail
if not NULL, it must be a valid email address, used to access the EBI
Clustal Omega web service. Default (NULL) is to use a local clustalo installation
perlCommand
a character string containing the path to the Perl executable.
If missing, "perl" is used. Only used if mail is set
verbose
logical. Whether to print verbose output
conservation
a number between 0 and 1 representing the minimum level of conservation required for a mutation to be tested
use_hmm
When analysing Pfam sequences, it is possible to
use the Hidden Markov Model (HMM) of the
specific Pfam to align the sequences.
Default is FALSE.
datum
When analysing Pfam sequences, use all the genes
that belong to the Pfam to generate the
alignment. This creates a unique mapping between
individual residues and consensus sequence,
regardless of which sequences are
selected for the analysis.
Default is FALSE.
clustal_cmd
path to the Clustal Omega executable. Default is to look for "clustalo" in the PATH
BPPARAM
An object of class BiocParallelParam specifying parameters related to
the parallel execution of some of the tasks and calculations within this function.
See function bpparam() from the BiocParallel package.
Details
This function takes a data.frame or a tab-delimited text file in LowMACA format (see LowMACA_AML)
and performs a full analysis of the dataset. It divides the mutations according to the Pfam domain
they fall in and runs one LowMACA analysis, up to the lfm step, for each Pfam hit by mutations.
Every position found significant by lfm is then tested at the gene level: a binomial test checks
whether the fraction of the gene's mutations that fall in the significant position is higher than
expected by chance. The significant mutations from all the lfm runs are aggregated into a single
data.frame per analysis type (see Value).
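The gene-level binomial test described above can be sketched with base R's binom.test. This is a minimal sketch, not LowMACA's code: it assumes that "expected by chance" means each mutation falls uniformly on any residue of the protein (p = 1 / protein length), and the helper gene_level_binom and its inputs are hypothetical.

```r
# Hypothetical helper, not part of LowMACA. Assumes that, by chance, each of a
# gene's mutations lands uniformly on one of its prot_len residues.
gene_level_binom <- function(n_position, n_total, prot_len) {
  stopifnot(n_position <= n_total, prot_len >= 1)
  # One-sided test: are there MORE mutations at this position than expected?
  binom.test(x = n_position, n = n_total, p = 1 / prot_len,
             alternative = "greater")$p.value
}

# Made-up example: 6 of a gene's 20 mutations hit one lfm-significant
# position of a 300-residue protein.
gene_level_binom(n_position = 6, n_total = 20, prot_len = 300)
```

The one-sided alternative matches the question asked in Details (a ratio higher than expected), so a position with no mutations always gets a p-value of 1.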
Value
A list of two data.frames named 'AlignedSequence' and 'SingleSequence'.
The first data.frame is the result of the alignment-based analysis.
Every gene is aggregated by its corresponding Pfam domain.
Gene_Symbol
gene symbols of the analyzed genes
Multiple_Aln_pos
positions in the consensus
relative to the analyzed sequence.
Pfam_ID
name of the analyzed Pfam
binomialPvalue
p-value of the single-gene binomial test (see Details)
Amino_Acid_Position
amino acid positions relative to the original protein
Amino_Acid_Change
amino acid changes in HGVS format
Sample
barcode of the sample in which the mutation was found
Tumor_Type
Tumor type of the Sample
Envelope_Start
start of the Pfam domain in the protein
Envelope_End
end of the Pfam domain in the protein
metric
q-value of the position in the multiple alignment of Pfam domains
Entrez
Entrez Gene IDs of the mutated genes
Entry
UniProt entry of the protein
UNIPROT
other UniProt names of the protein
Chromosome
cytobands of the genes
Protein.name
extended protein names
The second data.frame represents the result of LowMACA on every gene-domain pair
when it is not aligned with any other member of the same Pfam ID.
Gene_Symbol
gene symbols of the analyzed genes
Amino_Acid_Position
amino acid positions relative to the original protein
Amino_Acid_Change
amino acid changes in HGVS format
Sample
barcode of the sample in which the mutation was found
Tumor_Type
Tumor type of the Sample
Envelope_Start
start of the Pfam domain in the protein
Envelope_End
end of the Pfam domain in the protein
Multiple_Aln_pos
positions in the consensus
relative to the analyzed sequence. See the Warnings section
Entrez
Entrez Gene IDs of the mutated genes
Entry
UniProt entry of the protein
UNIPROT
other UniProt names of the protein
Chromosome
cytobands of the genes
Protein.name
extended protein names
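As a sketch of how the returned data.frames might be post-processed, the snippet below applies stricter thresholds to a toy AlignedSequence-like table. The rows are made up (not LowMACA output), only a few of the documented columns are included, and the 0.05 cut-offs are arbitrary choices for illustration.

```r
# Made-up rows using a few of the AlignedSequence columns; not real output.
aligned <- data.frame(
  Gene_Symbol         = c("GENE_A", "GENE_A", "GENE_B"),
  Pfam_ID             = c("PF_X", "PF_X", "PF_X"),
  Amino_Acid_Position = c(10L, 10L, 42L),
  binomialPvalue      = c(0.001, 0.001, 0.2),
  metric              = c(0.01, 0.01, 0.01),
  stringsAsFactors    = FALSE
)

# Keep mutations whose consensus position is significant (metric, a q-value)
# AND whose gene-level binomial test is significant too.
sig <- subset(aligned, metric < 0.05 & binomialPvalue < 0.05)

# Collapse to one row per mutated gene position.
hotspots <- unique(sig[, c("Gene_Symbol", "Amino_Acid_Position")])
print(hotspots)
```

Combining metric with binomialPvalue separates positions that are significant across the Pfam alignment from those that are also enriched within the individual gene.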
Author(s)
Stefano de Pretis, Giorgio Melloni
See Also
lfm, LowMACA_AML
Examples
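#Load LowMACA and the Homeobox example
library(LowMACA)
data(lmObj)
#Extract the data inside the object as a toy example
myData <- lmMutations(lmObj)$data
#Run allPfamAnalysis on all the mutations (requires a local clustalo,
#or a valid email address passed through the mail argument)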
significant_muts <- allPfamAnalysis(repos=myData)
#Show the result of alignment based analysis
head(significant_muts$AlignedSequence)
#Show all the genes that harbor significant mutations
unique(significant_muts$AlignedSequence$Gene_Symbol)
#Show the result of the Single Gene based analysis
head(significant_muts$SingleSequence)
#Show all the genes that harbor significant mutations
unique(significant_muts$SingleSequence$Gene_Symbol)
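The example run in Results stops because clustalo is not in the PATH. A minimal guard, assuming only base R's Sys.which, can check for the executable before calling allPfamAnalysis; the helper clustalo_available is hypothetical and not part of LowMACA.

```r
# Hypothetical helper: TRUE when the given Clustal Omega command resolves on
# the PATH (Sys.which returns "" for commands it cannot find).
clustalo_available <- function(cmd = "clustalo") {
  nzchar(unname(Sys.which(cmd)))
}

if (clustalo_available()) {
  message("clustalo found: ", Sys.which("clustalo"))
} else {
  message("clustalo not found: install it, pass its path via clustal_cmd, ",
          "or set mail to use the EBI web service")
}
```

Note that the web-service route also needs Perl with the XML::Simple module, as the startup warnings in Results point out.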
Results
R version 3.3.1 (2016-06-21) -- "Bug in Your Hair"
Copyright (C) 2016 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
> library(LowMACA)
Checking if clustalo is in the PATH...
Checking perl installation...
Checking perl modules XML::Simple and LWP...
Can't locate XML/Simple.pm in @INC (you may need to install the XML::Simple module) (@INC contains: /etc/perl /usr/local/lib/x86_64-linux-gnu/perl/5.22.1 /usr/local/share/perl/5.22.1 /usr/lib/x86_64-linux-gnu/perl5/5.22 /usr/share/perl5 /usr/lib/x86_64-linux-gnu/perl/5.22 /usr/share/perl/5.22 /usr/local/lib/site_perl /usr/lib/x86_64-linux-gnu/perl-base .).
BEGIN failed--compilation aborted.
Warning messages:
1: In .ClustalChecks(ClustalCommand = "clustalo") :
Clustal Omega is not in the PATH:
You can either change clustalo command using lmParams function or use the web service. See ?setup
2: running command '/usr/bin/perl -MXML::Simple -e 1' had status 2
3: In .PerlModuleChecks(stop = FALSE, perl = "perl") :
XML::Simple module for perl is not installed.
If you don't want to install a local clustal omega and use the web service, XML::Simple is required
> ### Name: allPfamAnalysis
> ### Title: Global analysis of a repository of mutations
> ### Aliases: allPfamAnalysis
>
> ### ** Examples
>
> #Load Homeobox example
> data(lmObj)
> #Extract the data inside the object as a toy example
> myData <- lmMutations(lmObj)$data
> #Run allPfamAnalysis on every mutations
> significant_muts <- allPfamAnalysis(repos=myData)
Error in .clustalOAlign(genesData, clustal_cmd, clustalo_filename, mail, :
Clustal Omega command not found. clustalo is not in your PATH or it was not installed
Calls: allPfamAnalysis ... tryCatch -> tryCatchList -> tryCatchOne -> <Anonymous>
Execution halted