bga {made4} | R Documentation |
Discrimination of samples using between group analysis as described by Culhane et al., 2002.
bga(dataset, classvec, type = "coa", ...)
plot.bga(x, axes1=1, axes2=2, arraycol=NULL, genecol="gray25", nlab=10, genelabels= NULL, ...)
dataset |
Training dataset. A matrix, data.frame, exprSet or marrayRaw. If the input is gene expression data in a matrix or data.frame, the rows and columns are expected to contain the variables (genes) and cases (array samples) respectively. |
classvec |
A factor or vector which describes the classes in the training dataset |
type |
Character, "coa", "pca" or "nsc" indicating which data transformation is required. The default value is type="coa" |
x |
An object of class bga. The output from bga or bga.suppl. It contains the projection coordinates from bga, the $ls, $co or $li coordinates to be plotted |
arraycol, genecol |
Character, colour of points on plot. If arraycol is NULL, arraycol will obtain a set of contrasting colours using getcol, one for each class of cases (microarray samples) on the array (case) plot. genecol is the colour of the points for each variable (gene) on the gene plot |
nlab |
Numeric. An integer indicating the number of variables (genes) at the ends of the axes to be labelled on the gene plot. |
axes1 |
Integer, the column number for the x-axis. The default is 1. |
axes2 |
Integer, the column number for the y-axis. The default is 2. |
genelabels |
A vector of variable labels. If genelabels=NULL, the row.names of the input matrix dataset will be used |
... |
further arguments passed to or from other methods |
bga performs a between group analysis on the input dataset. This function calls between. The input format of the dataset is verified using array2ade4.
Between group analysis (BGA) is a supervised method for sample discrimination and class prediction. BGA is carried out by ordinating groups (sets of grouped microarray samples), that is, groups of samples are projected into a reduced dimensional space. This is most easily done using PCA or COA of the group means. The choice between PCA and COA is set by the parameter type.
The user must define microarray sample groupings in advance. These groupings are defined using the input classvec, which is a factor or vector.
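The idea of ordinating group means can be illustrated with a minimal base R sketch. It is not the code that bga or between runs, and it uses PCA only (loosely the idea behind type = "pca"). The toy matrix expr, the class labels and all variable names are assumptions made for illustration.

```r
# Toy expression matrix: 6 genes (rows) x 9 samples (columns), 3 classes.
set.seed(1)
classvec <- factor(rep(c("A", "B", "C"), each = 3))
expr <- matrix(rnorm(6 * 9), nrow = 6,
               dimnames = list(paste0("gene", 1:6), paste0("s", 1:9)))
# Shift two genes so that the classes actually differ.
expr[1, classvec == "B"] <- expr[1, classvec == "B"] + 4
expr[2, classvec == "C"] <- expr[2, classvec == "C"] + 4

# Samples in rows, centred gene by gene.
samples <- scale(t(expr), center = TRUE, scale = FALSE)

# One row of gene means per class.
group_means <- rowsum(samples, classvec) / as.vector(table(classvec))

# PCA of the group means; keep the (number of classes - 1) informative axes.
n_axes <- nlevels(classvec) - 1
pca <- prcomp(group_means, center = TRUE, scale. = FALSE)
axes <- pca$rotation[, seq_len(n_axes), drop = FALSE]

# Project every sample and every class centroid onto those axes.
sample_coords <- sweep(samples, 2, pca$center) %*% axes
centroid_coords <- sweep(group_means, 2, pca$center) %*% axes
```

Because the projection is linear, each row of centroid_coords is the mean of the projected samples of that class, which is what makes the axes suitable for discriminating the predefined groups.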
Cross-validation and testing of bga results:
bga results should be validated using leave-one-out jack-knife cross-validation with bga.jackknife and by projecting a blind test dataset onto the bga axes using suppl.
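The leave-one-out principle can be sketched in base R as follows. This is an illustration under stated assumptions, not a description of how bga.jackknife or suppl work internally: the toy data, the helper predict_held_out and the nearest-centroid assignment rule are all hypothetical.

```r
set.seed(2)
classvec <- factor(rep(c("A", "B", "C"), each = 4))
# Samples in rows (12) and genes in columns (5); classes differ on genes 1-2.
x <- matrix(rnorm(12 * 5), nrow = 12)
x[classvec == "B", 1] <- x[classvec == "B", 1] + 6
x[classvec == "C", 2] <- x[classvec == "C", 2] + 6

# Hypothetical helper: assign one held-out sample to the nearest class centroid
# in the space spanned by a PCA of the training group means.
predict_held_out <- function(train, train_class, held_out) {
  means <- rowsum(train, train_class) / as.vector(table(train_class))
  pca <- prcomp(means)
  axes <- pca$rotation[, seq_len(nlevels(train_class) - 1), drop = FALSE]
  centroids <- sweep(means, 2, pca$center) %*% axes
  point <- (held_out - pca$center) %*% axes
  dists <- rowSums(sweep(centroids, 2, as.vector(point))^2)
  names(which.min(dists))
}

# Leave each sample out in turn, refit on the rest, and predict its class.
predicted <- vapply(seq_len(nrow(x)), function(i) {
  predict_held_out(x[-i, , drop = FALSE], classvec[-i], x[i, ])
}, character(1))

# Proportion of held-out samples assigned to their true class.
accuracy <- mean(predicted == as.character(classvec))
```

The key point is that the axes are refitted without the held-out sample each time, so the sample never contributes to the model that classifies it.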
bga and suppl are combined in bga.suppl, which requires input of both a training and a test dataset. It is important to ensure that the selection of cases for the training and test sets is not biased, and generally many cross-validations should be performed. The function randomiser can be used to randomise the selection of training and test samples.
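An unbiased split can be illustrated with base R's sample. This sketch does not call randomiser, and the sample counts and variable names are hypothetical.

```r
set.seed(42)
n_samples <- 20   # hypothetical number of arrays
n_train <- 15     # hypothetical training-set size

# One random split: shuffle the sample indices, then cut.
shuffled <- sample(seq_len(n_samples))
train_idx <- sort(shuffled[seq_len(n_train)])
test_idx <- sort(shuffled[-seq_len(n_train)])

# Repeat the split several times so that no single partition drives the result.
splits <- replicate(10, sample(seq_len(n_samples), n_train), simplify = FALSE)
```

Each element of splits holds the training indices of one partition; the remaining indices form the matching test set.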
Plotting and visualising bga results:
2D plots:
Use plot.bga to plot results from bga. plot.bga calls the functions s.var and s.groups to draw an xy plot of cases ($ls). s.var and s.groups are modifications of the ADE4 graphing functions s.label and s.class. plotgenes is used to draw an xy plot of the variables (genes).
3D plots:
3D graphs can be generated using do3D and html3D. html3D produces a web page in which a 3D plot can be interactively rotated and zoomed, and in which classes or groups of cases can be easily highlighted.
1D plots, show one axis only:
1D graphs can be plotted using between.graph and graph1D. between.graph is used for plotting the cases, and requires both the co-ordinates of the cases ($ls) and their centroids ($li). It accepts an object of class bga. graph1D can be used to plot either cases (microarrays) or variables (genes) and only requires a vector of coordinates.
Analysis of the distribution of variance among axes:
It is important to know which cases (microarray samples) are discriminated by the axes. The number of axes or principal components from a bga will equal the number of classes - 1, that is length(levels(classvec))-1. The distribution of variance among axes is described in the eigenvalues ($eig) of the bga analysis. These can be visualised using a scree plot, using scatterutil.eigen as is done in plot.bga.
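The axis count and the share of variance per axis can be worked through in a few lines of base R. The class labels and the eigenvalues below are hypothetical stand-ins for classvec and for the $eig component of a fitted analysis.

```r
# Four hypothetical classes, so three axes are expected.
classvec <- factor(c("A", "A", "B", "C", "D", "D", "C", "B"))
n_axes <- length(levels(classvec)) - 1

# Hypothetical eigenvalues, one per axis.
eig <- c(0.52, 0.31, 0.17)

# Share of the variance carried by each axis, and the running total.
prop <- eig / sum(eig)
cum_prop <- cumsum(prop)
```

Passing eig to barplot gives a basic scree plot of the same information.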
It is also useful to visualise the principal components from a bga, a principal components analysis (dudi.pca) or a correspondence analysis (dudi.coa) using a heatmap. In MADE4 the function heatplot will plot a heatmap with nicer default colours.
Extracting list of top variables (genes):
Use topgenes to get a list of variables or cases at the ends of axes. It will return a list of the top n variables (by default n=5) at the positive, negative or both ends of an axis. sumstats can be used to return the angle (slope) and distance from the origin of a list of coordinates.
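What "top variables at the ends of an axis" means can be shown with a small base R sketch. It does not call topgenes or sumstats, whose exact output and definition of slope may differ. The coordinates and names are hypothetical.

```r
# Hypothetical gene coordinates on two axes (rows are genes).
coords <- data.frame(
  Axis1 = c(-2.1, 0.3, 1.8, -0.4, 2.5, -1.7, 0.1, 0.9),
  Axis2 = c(0.5, -1.2, 0.7, 2.2, -0.3, -0.8, 1.5, -2.0),
  row.names = paste0("gene", 1:8)
)

# Top n genes at each end of the first axis.
n <- 2
ord <- order(coords$Axis1)
negative_end <- rownames(coords)[head(ord, n)]        # most negative first
positive_end <- rownames(coords)[rev(tail(ord, n))]   # most positive first

# Distance from the origin and angle (in degrees) of each gene in the plane.
distance <- sqrt(coords$Axis1^2 + coords$Axis2^2)
angle <- atan2(coords$Axis2, coords$Axis1) * 180 / pi
```

Genes far from the origin along an axis contribute most to the separation of the classes on that axis, which is why the ends of the axes are the place to look for discriminating genes.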
For more details see Culhane et al., 2002 and http://bioinf.ucd.ie/research/BGA.
A list with a class bga containing:
ord |
Results of initial ordination. A list of class "dudi" (see dudi ) |
bet |
Results of between group analysis. A list of class "dudi" (see dudi), "between" (see between) |
fac |
The input classvec, the factor or vector which described the classes in the input dataset |
Aedin Culhane
Culhane AC, et al., 2002 Between-group analysis of microarray data. Bioinformatics. 18(12):1600-8.
See Also bga, suppl, suppl.bga, between, bga.jackknife
data(khan)
if (require(ade4, quietly = TRUE)) {
  khan.bga <- bga(khan$train, classvec = khan$train.classes)
}
khan.bga
plot(khan.bga, genelabels = khan$annotation$Symbol)

# Provide a view of the principal components (axes) of the bga
heatplot(khan.bga$bet$ls, dend = FALSE, lowcol = "blue")