Principal Curves: A New Technique For Indirect And Direct Gradient Analysis

Ecology, Oct. 1999, by Glenn De'ath

INTRODUCTION

The principles and objectives of ordination have been widely discussed in the literature (e.g., Dale 1975, Prentice 1977, Orloci 1978, ter Braak and Prentice 1988, Jongman et al. 1995). Ordination techniques use abundance or presence-absence data of species, and often environmental data, to reveal aspects of community structure such as ecological gradients and relationships between species and their environment. In this paper I focus on two common objectives of ordination:

1) Gradient analysis aims to find one or more ecological gradients assumed to influence species abundances in a systematic way, and to locate sites along each gradient (ter Braak and Prentice 1988, Jongman et al. 1995). If only species data are used to determine the gradient, the ordination is referred to as indirect gradient analysis (IGA). Site locations may subsequently be related to environmental variables, usually by linear regression. For direct gradient analysis (DGA), species and environmental data are used jointly to determine the gradient and locate the sites; hence systematic species variation is limited to that explained by the environmental data. IGA can be seen predominantly as an exploratory, hypothesis-generating procedure (ter Braak and Prentice 1988), whereas DGA can test specific hypotheses of how environmental variables determine species abundances.

2) Species composition representation can be achieved by mapping sites from the high dimensional species data onto a low dimensional space, based on some measure of similarity (or, equivalently, dissimilarity) between sites. The arrangement of sites attempts to reflect the species composition, in that distances between sites in the reduced space are proportional to (or ordered as) differences in species composition. The ecological interpretation of the representation is determined by any scalings or transformations of the data, the measure of similarity, and the method of dimension reduction. Though many measures of similarity in species composition are used, I will focus on Euclidean distance. Under this measure, sites are similar if they have equal abundances of all species or, if the data are site standardized (a common practice), equal relative abundances.
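As an illustration of the last point, the following minimal sketch computes Euclidean distances between sites before and after site standardization. The abundance values and the helper names `site_standardize` and `euclidean_distance` are hypothetical and are not taken from the paper; site standardization is assumed here to mean dividing each site's abundances by the site total.

```python
import math

def site_standardize(abundances):
    """Convert one site's abundances to relative abundances (summing to 1)."""
    total = sum(abundances)
    if total == 0:
        return [0.0 for _ in abundances]
    return [a / total for a in abundances]

def euclidean_distance(site_a, site_b):
    """Euclidean distance between two sites in species space."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(site_a, site_b)))

# Hypothetical abundances of three species at three sites.
sites = {
    "A": [10, 20, 30],
    "B": [20, 40, 60],  # same proportions as A, twice the total abundance
    "C": [30, 20, 10],  # same total as A, reversed proportions
}

raw_ab = euclidean_distance(sites["A"], sites["B"])

std = {name: site_standardize(row) for name, row in sites.items()}
std_ab = euclidean_distance(std["A"], std["B"])
std_ac = euclidean_distance(std["A"], std["C"])

print(round(raw_ab, 2))  # about 37.42: A and B differ on raw abundances
print(std_ab)            # 0.0: A and B have identical relative abundances
print(round(std_ac, 3))  # about 0.471: A and C still differ after standardization
```

Sites A and B are far apart on raw abundances but coincide once the data are site standardized, which is the sense in which the choice of scaling determines the ecological interpretation.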

Principal components analysis (PCA); correspondence analysis (CA), including its detrended, canonical, and detrended canonical forms; and metric and nonmetric multidimensional scaling (MDS) are the most often used ordination techniques (Kenkel and Orloci 1986, Birks et al. 1996). All can be expressed as dissimilarity-based methods, whereby a matrix of dissimilarities between sites is mapped into a low dimensional space. The measure of dissimilarity and the mapping method vary between techniques. If used to determine gradients, these three groups of techniques all require either a linear or an ordinal relationship between the chosen (or implied) measure of site dissimilarity and the Euclidean distance between sites along the ecological gradient(s) (Faith et al. 1987, ter Braak and Prentice 1988).

An alternative to the dissimilarity-based methods of gradient analysis was suggested by Orloci (1978), who noted that species data may form a single cluster that curves through high dimensional space. Several methods have been proposed to model such curves (e.g., Shepard and Carroll 1966, Carroll 1969, Gnanadesikan 1977, Phillips 1978, Etezadi-Amoli and McDonald 1983, Hastie and Stuetzle 1989); however, most have significant limitations (Hastie 1984, Hastie and Stuetzle 1989), and none has been widely used by ecologists.

In this paper, I further develop the idea of basing gradient analysis on the premise that species data may form a curve in high dimensional space. To find the curve and locate sites along it, I use the method of principal curves (Hastie 1984, Hastie and Stuetzle 1989, Banfield and Raftery 1992, Tibshirani 1992, Duchamp and Stuetzle 1996). Principal curves (PCs) are smooth one-dimensional curves in high dimensional space and hence, as the following argument shows, are ideally suited for gradient analysis of coenocline data. Consider the ecological model whereby several species are driven by a single ecological gradient [ILLUSTRATION FOR FIGURE 1A OMITTED]. If the individual response curves of the species are smooth, though not necessarily Gaussian, then successive points along the gradient will trace out a smooth curve in the high dimensional space defined by the species (assuming no noise) [ILLUSTRATION FOR FIGURE 1B OMITTED]. Sites will be located along the curve in the same order as they occur along the ecological gradient. If one can trace out this curve in high dimensional space and locate the positions of sites along it, then one can correctly order sites on the gradient.
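The argument above can be sketched numerically. The example below is an illustration under stated assumptions, not the model used in the paper: it assumes five species with Gaussian response curves (any smooth response would serve), hypothetical optima, a common tolerance of 1.0, and no noise. It samples the gradient at regular steps, records the resulting points in species space, and compares the length of the path they trace with the straight-line distance between its end points.

```python
import math

def gaussian_response(x, optimum, tolerance=1.0, maximum=100.0):
    """Abundance of one species at gradient position x (assumed Gaussian)."""
    return maximum * math.exp(-0.5 * ((x - optimum) / tolerance) ** 2)

def dist(p, q):
    """Euclidean distance between two points in species space."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

optima = [1.0, 2.0, 3.0, 4.0, 5.0]       # hypothetical species optima
gradient = [0.1 * i for i in range(61)]  # gradient positions 0.0 to 6.0

# Each gradient position maps to one point in 5-dimensional species space.
curve = [[gaussian_response(x, u) for u in optima] for x in gradient]

# Distance between successive points, and cumulative distance along the path.
steps = [dist(curve[i], curve[i + 1]) for i in range(len(curve) - 1)]
arc_length = [0.0]
for s in steps:
    arc_length.append(arc_length[-1] + s)

# Straight-line distance between the two ends of the gradient.
chord = dist(curve[0], curve[-1])

print("path length:", round(arc_length[-1], 1))
print("end-to-end distance:", round(chord, 1))
```

Because every step is positive, distance along the path increases with gradient position, so ordering sites by their position on the curve reproduces their order on the gradient. The path is much longer than the end-to-end distance, which shows that the curve bends through species space rather than following a straight line.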

PCs are formulated as an explicit model, with the response variables (species abundances, in this case) explained by smooth functions of a latent variable, namely distance along the curve (Hastie and Stuetzle 1989). The curves are fitted by an iterative procedure that alternates between two steps. PCA and CA can also be formulated and fitted in similar ways, with the smooth curves of PCs replaced by more restrictive linear and Gaussian models, respectively (ter Braak 1985, Jongman et al. 1995). Since PCs are based on local smoothers, the linear (or ordinal) relationship between site dissimilarity and the distance between sites along the gradient, as required by PCA, CA, and MDS, is no longer necessary.
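A rough sketch of such an alternating fit is given below. It is a simplified illustration in the spirit of Hastie and Stuetzle (1989), not the procedure used in the paper. The following choices are all assumptions made for brevity: a running-mean smoother with a fixed span, a fixed number of iterations instead of a convergence test, a first principal component start found by power iteration, and simulated three-species data with hypothetical optima and noise. All function names are hypothetical.

```python
import math
import random

def first_pc_scores(X, iters=100):
    """Starting positions: scores on the first principal component."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    C = [[v - m for v, m in zip(row, means)] for row in X]
    v = [float(j + 1) for j in range(p)]  # arbitrary asymmetric start vector
    for _ in range(iters):                # power iteration on C'C
        scores = [sum(c * w for c, w in zip(row, v)) for row in C]
        v = [sum(scores[i] * C[i][j] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(w * w for w in v))
        v = [w / norm for w in v]
    return [sum(c * w for c, w in zip(row, v)) for row in C]

def smooth_step(X, lam, span):
    """Smooth each species against lam with a running mean over span sites."""
    order = sorted(range(len(X)), key=lambda i: lam[i])
    half = span // 2
    fitted = []
    for r in range(len(order)):
        lo, hi = max(0, r - half), min(len(order), r + half + 1)
        window = [X[order[k]] for k in range(lo, hi)]
        fitted.append([sum(col) / len(window) for col in zip(*window)])
    return fitted  # points on the curve, ordered by lam

def project_step(X, curve):
    """Project each site onto the polyline; return distance along the curve."""
    seg_start = [0.0]
    for a, b in zip(curve, curve[1:]):
        seg_start.append(seg_start[-1] + math.dist(a, b))
    lam = []
    for x in X:
        best_d2, best_pos = float("inf"), 0.0
        for k, (a, b) in enumerate(zip(curve, curve[1:])):
            ab = [q - p for p, q in zip(a, b)]
            len2 = sum(d * d for d in ab)
            t = 0.0
            if len2 > 0:
                t = sum((xi - ai) * d for xi, ai, d in zip(x, a, ab)) / len2
                t = min(1.0, max(0.0, t))  # stay within the segment
            foot = [ai + t * d for ai, d in zip(a, ab)]
            d2 = sum((xi - fi) ** 2 for xi, fi in zip(x, foot))
            if d2 < best_d2:
                best_d2, best_pos = d2, seg_start[k] + t * math.sqrt(len2)
        lam.append(best_pos)
    return lam

def principal_curve(X, span=9, iters=10):
    """Alternate the smoothing and projection steps a fixed number of times."""
    lam = first_pc_scores(X)
    for _ in range(iters):
        curve = smooth_step(X, lam, span)
        lam = project_step(X, curve)
    return lam, curve

def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

# Simulated data: three species with assumed Gaussian responses plus noise,
# with sites presented in random order along a gradient from 0.0 to 4.0.
random.seed(1)
optima = [0.0, 2.0, 4.0]
gradient = [0.1 * i for i in range(41)]
random.shuffle(gradient)
X = [[max(0.0, 100.0 * math.exp(-0.5 * ((x - u) / 1.5) ** 2)
          + random.gauss(0.0, 2.0)) for u in optima] for x in gradient]

lam, curve = principal_curve(X)
r = pearson(lam, gradient)
print("correlation of fitted positions with gradient:", round(r, 3))
```

The sign of the correlation is arbitrary, since the curve can be traversed from either end; a value close to 1 in absolute terms would indicate that the fitted positions recover the order of sites on the simulated gradient. A practical implementation would use a better smoother, choose the span from the data, and stop when the fit no longer improves.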