UvA-DARE (Digital Academic Repository)
Chemotaxonomy of domesticated grasses: a pathway to understanding the origins of agriculture

The grass family (Poaceae) is one of the most economically important plant groups in the world today. In particular many major food crops, including rice, wheat, maize, rye, barley, oats and millet, are grasses that were domesticated from wild progenitors during the Holocene. Archaeological evidence has provided key information on domestication pathways of different grass lineages through time and space. However, the most abundant empirical archive of floral change – the pollen record – has been underused for reconstructing grass domestication patterns because of the challenges of classifying grass pollen grains based on their morphology alone. Here, we test the potential of a novel approach for pollen classification based on the chemical signature of the pollen grains measured using Fourier transform infrared (FTIR) microspectroscopy. We use a dataset of eight domesticated and wild grass species, classified using k-nearest neighbour classification coupled with leave-one-out cross validation. We demonstrate a 95 % classification success rate on training data and an 82 % classification success rate on validation data. This result shows that FTIR spectroscopy can provide enhanced taxonomic resolution enabling species level assignment from pollen. This will enable the full testing of the timing and drivers of domestication and agriculture through the Holocene.


Introduction
The transition from a mobile, hunter-gatherer lifestyle to a sedentary lifestyle centred on agriculture was one of the major shifts in the history of human civilisation. While many different species of plants were successfully domesticated over the course of the Holocene, grasses (Poaceae) have been a particular focus of exploitation, both for human consumption and as animal feed. Principal grass crops include wheat (Triticum), rice (Oryza), maize (Zea), barley (Hordeum), rye (Secale), oats (Avena), Sorghum, sugarcane (Saccharum), and the millets (principally Eleusine, Panicum, Pennisetum and Setaria, but also including other genera from within the subfamilies Chloridoideae and Panicoideae) (Fig. 1a) (Kellogg, 1998; Leff et al., 2004; Meyer et al., 2012), which together account for ∼ 63 % of total cultivated land area (Leff et al., 2004). Understanding the initial exploitation and domestication of grasses and the spread of these lineages and agricultural practices is therefore a key research endeavour, including questions of when, where and why particular lineages were domesticated and what impact this had on human societies and the landscapes in which they lived. A fuller understanding of domestication may also play an important role in ensuring future food security (Charmet, 2011).
Evidence for the domestication of grasses and the spread of agriculture has come from a variety of sources. A combination of modern and ancient DNA analysis (Dvorak et al., 2006; International Wheat Genome Sequencing Consortium, 2014; Marcussen et al., 2014; Mascher et al., 2016; Meyer and Purugganan, 2013; Petersen et al., 2006), controlled growth experiments (Cunniff et al., 2014; Preece et al., 2015, 2017, 2018), and archaeobotanical remains of grasses and grain processing (Crowther et al., 2016; Fuller, 2007; Piperno et al., 2004; Savard et al., 2006; Vignola et al., 2017; Willcox et al., 2007) have together provided a detailed picture of when and where different taxa were domesticated, including the hybridisation pathways that produced the modern domesticated types (Fig. 1b), and why certain species may have been selected for cultivation over others. A key outstanding challenge is the need for better spatially and temporally resolved data on the first appearances of domesticated species and their subsequent diffusion to other regions, which will in turn enable a fuller understanding of the role that climatic and environmental changes played in shaping the origins and spread of agriculture (Larson et al., 2014).
Figure 1. (a) [...] Soreng et al. (2015). BOP clade: subfamilies Bambusoideae, Oryzoideae and Pooideae. PACMAD clade: subfamilies Panicoideae, Aristidoideae, Chloridoideae, Micrairoideae, Arundinoideae and Danthonioideae. The genus names in parentheses are crop taxa mentioned in the text, and those in red are included in this study. (b) Hybridisation and domestication pathway for wheat, showing relationships among taxa and their ploidy levels. All taxa except Aegilops speltoides are included in this study (see also Table 1). Redrawn from International Wheat Genome Sequencing Consortium (2014), with additional information from Petersen et al. (2006).
Of the different sources of empirical evidence for historical vegetation change, pollen provides the most abundant and widespread records, but is also one of the most challenging to utilise for tracking grass domestication and agricultural practices through time. This is because pollen morphology is highly similar across the ∼ 12 000 species (Soreng et al., 2015) within Poaceae (Strömberg, 2011). Poaceae pollen grains are all morphologically simple, being more or less spherical with a single, annulate pore and a scabrate to areolate surface sculpture (Köhler and Lange, 1979; Mander et al., 2013). Attempts to distinguish among and between wild grasses and their domesticated relatives have relied on a combination of grain size and shape, pore diameter and position, annulus width and thickness, and exine structure and microsculpture (Andersen, 1979; Beug, 1961, 2004; Bottema, 1992; Dickson, 1988; Firbas, 1937; Joly et al., 2007; Köhler and Lange, 1979; Rowley, 1960; Tweddle et al., 2005), although of these various characters grain size has been most commonly relied upon in routine palynological studies (Bottema, 1992). Pollen grain size varies between 30 and 100 µm among Poaceae species and broadly correlates with genome size (Bennett, 1972). Since domesticated grasses are typically polyploid, they have larger pollen grains than their wild type relatives, and this has led to a size-based circumscription of a "Cerealia" type (Andersen, 1979; Beug, 1961, 2004; Bottema, 1992; Firbas, 1937; Joly et al., 2007; Tweddle et al., 2005). In practice, however, there is often an overlap in size between wild and domesticated pollen types, especially outside of northwest Europe where wild grasses frequently exceed the 40 µm Cerealia size cut-off (Bottema, 1992; Joly et al., 2007; Tweddle et al., 2005).
Pollen size is also known to be influenced by processing procedures, and storage and mounting media, with both acetolysis and storage in glycerol leading to a size increase (Christensen, 1946; Cushing, 1961; Faegri and Deuse, 1960; Reitsma, 1969; Sluyter, 1997; see also Jardine and Lomax, 2017, for review).
Variations in pollen grain surface sculpture have been used as a basis for dividing grains up into broad types, such as the Hordeum type, Triticum type, Avena type and Setaria type of Köhler and Lange (1979). Additional information on grain morphology has then been used to separate out individual taxa, such as Secale cereale L. (rye) being distinguished from other members of the Hordeum type by grain shape and pore position (Dickson, 1988; Köhler and Lange, 1979). These sculpturing types are, however, not phylogenetically or taxonomically meaningful and occur across the grass phylogeny (Mander and Punyasena, 2016). Triticum monococcum (domesticated einkorn) has been assigned to the Hordeum rather than the Triticum type based on scanning electron microscopy (SEM) analysis of its exine microsculpture (for example Köhler and Lange, 1979), and the Hordeum, Triticum, Avena and Setaria types all occur in both the PACMAD and BOP clades (Fig. 1a) (Mander and Punyasena, 2016). Surface and structural micro-features are also challenging to unambiguously identify using light microscopy, even if high magnification and/or phase contrast are employed (Andersen and Bertelsen, 1972; Mander et al., 2013; Rowley, 1960; Tweddle et al., 2005). While sophisticated computational image-based classification approaches have been developed on Poaceae pollen (Mander et al., 2013), they have not yet been applied to crop domestication problems, and since they rely on SEM imaging they would be costly and time consuming to perform on the large numbers of specimens and samples required to gain meaningful insight into the topic (Julier et al., 2016). Finally, all size, shape and sculpturing characters may be influenced by taphonomy (Bottema, 1992), and it is not clear how useful they are, even in combination, for examining the earliest phases of wild grass hybridisation and domestication.
Recent research has demonstrated the potential of using the chemical signature of pollen grains as an alternative means of classification. Non-destructive vibrational spectroscopic methods such as Fourier transform infrared (FTIR) and Raman spectroscopy have been applied to pollen and spore classification problems in both plants and fungi (Bagcıoglu et al., 2015; Dell'Anna et al., 2009; Ivleva et al., 2005; Julier et al., 2016; Pappas et al., 2003; Schulte et al., 2008; Zimmermann, 2010, 2018; Zimmermann and Kohler, 2014; Zimmermann et al., 2015a, b, 2016) and have demonstrated a high degree of accuracy in discriminating between closely related taxa. Zimmermann (2018) applied FTIR to chemically profile and separate two morphologically similar species of pine (Pinus mugo Turra and Pinus sylvestris L.). A further FTIR study based on grasses of tropical West Africa (Julier et al., 2016) revealed a subfamily level classification success rate of 80 %, suggesting that this approach might be useful for distinguishing among different species of domesticated and wild grasses. Here we test this possibility on a dataset of grass crops and their wild progenitors, with a view to developing pollen chemotaxonomy as a viable archaeobotanical tool.

Materials and methods
Anthers from eight species of domesticated and wild grasses (Table 1) were collected from the John Innes Centre, Norwich, UK (52.62° N, 1.22° E), in May and June 2014. All species are from the subfamily Pooideae (Fig. 1a), with Triticum (wheat), Aegilops (goat grass) and Hordeum (barley) from the tribe Triticeae and Avena (oat) from the tribe Poeae. The Triticum and Aegilops plants were grown outdoors as part of the bread wheat domestication demonstration at the John Innes Centre, and the Avena and Hordeum plants were grown and sampled in greenhouses. We sampled three plants from each species, except for T. monococcum (domesticated einkorn) where only two plants were in anthesis at the time of sampling. Four to six anthers were collected from each plant and combined together to give a sufficient quantity of pollen for chemical analysis. This means that each sample represents a whole plant, rather than an individual anther.
We used Fourier transform infrared (FTIR) microspectroscopy to generate chemical data, because it is an efficient method that has demonstrated success in pollen characterisation and classification studies (Bagcıoglu et al., 2015, 2017; Bell et al., 2018; Bernard et al., 2015; Dell'Anna et al., 2009; Depciuch et al., 2018; Domínguez et al., 1998; Fraser et al., 2012, 2014; Gottardini et al., 2007; Jardine et al., 2015; Julier et al., 2016; Lomax et al., 2008; Pappas et al., 2003; Steemans et al., 2010; Watson et al., 2007; Zimmermann, 2010, 2018; Zimmermann and Kohler, 2014; Zimmermann et al., 2015a, b, 2016). The pollen samples were picked onto ZnSe windows for FTIR analysis, which was carried out using a Thermo Scientific (Waltham, MA, USA) Nicolet Nexus FTIR bench unit, attached to a Continuum IR microscope fitted with an MCT-A liquid nitrogen-cooled detector, run in transmission mode using a Reflachromat 15× objective. To remove atmospheric H₂O and CO₂ interference within spectra, the entire system (bench unit, microscope and sample stage) was purged with air that had been dried and scrubbed of CO₂ using a Peak Scientific (Billerica, MA, USA) ML85 purge unit. We collected eight replicate spectra from each sample using an aperture size of 100 µm × 100 µm at 256 scans per replicate and a resolution of 4 cm⁻¹. A background scan of a blank region of the ZnSe window was taken before each sample scan.
Data analyses were carried out in R v.3.4.2 (R Core Team, 2017) using the packages baseline v.1.2-1 (Liland and Mevik, 2015), caret v.6.0-77 (Kuhn, 2017), class v.7.3-14 (Venables and Ripley, 2002), corrplot v.0.84 (Wei and Simko, 2017), e1071 v.1.6-8 (Meyer et al., 2017) and prospectr v.0.1.3 (Stevens and Ramirez-Lopez, 2013). Although some pollen chemistry studies (e.g. Bagcıoglu et al., 2015, 2017; Zimmermann and Kohler, 2014) have cropped FTIR spectra to the 1900 to 700 cm⁻¹ region (i.e. the "fingerprint" region), preliminary analysis showed that with our dataset this decreased classification success by 1 % to 2 %. While this suggests that the majority of relevant information for classification is in the fingerprint region, we proceeded with the full 4000 to 690 cm⁻¹ range to maximise classification success and analyse inter-taxon differences across the entire spectrum. We corrected baseline drift by subtracting a second-order polynomial baseline from each spectrum. Variations in sample thickness will control the absolute height of spectra, so each spectrum was z score standardised by subtracting the mean value and dividing by the standard deviation. We used principal component analysis (PCA) and cluster analysis for data visualisation and exploration (Varmuza and Filzmoser, 2009). The PCA was run on the sample spectra, and the cluster analysis was run on the mean spectrum for each taxon, with the aim of exploring the main inter-taxon chemical relationships. The cluster analysis was run using the Euclidean distance and with the unweighted pair group method with arithmetic mean (UPGMA) linkage algorithm.
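The two spectrum-level preprocessing steps described above can be illustrated with a minimal Python sketch. This is not the R code used in the study (which relied on the baseline package); it assumes that the baseline is a plain least-squares second-order polynomial fitted to the whole spectrum, and that z score standardisation divides by the standard deviation. The function names and the synthetic drift curve are hypothetical.

```python
from statistics import mean, pstdev


def solve_linear(a, b):
    """Solve the small dense system a x = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def subtract_baseline(wavenumbers, absorbance, degree=2):
    """Fit a polynomial by least squares and subtract it (assumed baseline model)."""
    lo, hi = min(wavenumbers), max(wavenumbers)
    # Rescale wavenumbers to [-1, 1] so the normal equations stay well conditioned.
    xs = [2.0 * (w - lo) / (hi - lo) - 1.0 for w in wavenumbers]
    powers = range(degree + 1)
    ata = [[sum(x ** (i + j) for x in xs) for j in powers] for i in powers]
    aty = [sum((x ** i) * y for x, y in zip(xs, absorbance)) for i in powers]
    coefs = solve_linear(ata, aty)
    baseline = [sum(c * x ** i for i, c in enumerate(coefs)) for x in xs]
    return [y - b for y, b in zip(absorbance, baseline)]


def z_score(values):
    """Centre on the mean and scale by the (population) standard deviation."""
    mu, sd = mean(values), pstdev(values)
    return [(v - mu) / sd for v in values]


# Hypothetical example: a spectrum consisting only of quadratic drift.
wavenumbers = [690 + 10 * i for i in range(50)]
drift = [0.5 + 1e-4 * w + 2e-7 * w * w for w in wavenumbers]
flat = subtract_baseline(wavenumbers, drift)
standardised = z_score([1.0, 2.0, 3.0, 4.0])
```

A spectrum that is pure quadratic drift is flattened to approximately zero, and a standardised spectrum has zero mean and unit standard deviation, so spectra from thicker and thinner samples become directly comparable.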
To test the classification potential of the chemical data, the dataset was split randomly into training (two-thirds of the samples) and validation (one-third of the samples) subsets. For the taxa with three plants sampled this involved selecting one plant at random and assigning the spectra from it to the validation set. For T. monococcum, where only two plants were sampled, 5 of the 16 spectra were randomly selected and assigned to the validation set.
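As a hedged illustration of this splitting scheme, the following Python sketch builds an index of the spectra and holds out one plant per taxon, or five spectra for the taxon with only two plants. It assumes that every plant contributed all eight replicate spectra; the dictionary keys, function names and random seed are hypothetical and are not taken from the authors' R code.

```python
import random

# Number of plants sampled per taxon, as described in the text.
TAXA_PLANTS = {
    "Aegilops tauschii": 3, "Avena sativa": 3, "Hordeum vulgare": 3,
    "Triticum aestivum": 3, "Triticum dicoccoides": 3, "Triticum dicoccon": 3,
    "Triticum urartu": 3, "Triticum monococcum": 2,
}
SPECTRA_PER_PLANT = 8  # assumption: all eight replicates retained for every plant


def build_index():
    """One record per spectrum (the spectral values themselves are omitted here)."""
    return [
        {"taxon": taxon, "plant": plant, "replicate": rep}
        for taxon, n_plants in TAXA_PLANTS.items()
        for plant in range(1, n_plants + 1)
        for rep in range(1, SPECTRA_PER_PLANT + 1)
    ]


def split_by_plant(spectra, n_small=5, seed=0):
    """Hold out one whole plant per taxon, or n_small spectra if only two plants exist."""
    rng = random.Random(seed)
    by_taxon = {}
    for s in spectra:
        by_taxon.setdefault(s["taxon"], {}).setdefault(s["plant"], []).append(s)
    training, validation = [], []
    for taxon in sorted(by_taxon):
        plants = by_taxon[taxon]
        if len(plants) >= 3:
            held_out = rng.choice(sorted(plants))
            for plant, items in plants.items():
                (validation if plant == held_out else training).extend(items)
        else:
            pooled = [s for items in plants.values() for s in items]
            picked = set(rng.sample(range(len(pooled)), n_small))
            for i, s in enumerate(pooled):
                (validation if i in picked else training).append(s)
    return training, validation


spectra = build_index()
training, validation = split_by_plant(spectra, seed=1)
print(len(spectra), len(training), len(validation))
```

Under the stated assumption the dataset contains 184 spectra, of which 123 fall in the training set and 61 (7 × 8 + 5, close to one-third) in the validation set.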
We used k nearest neighbour (k-nn) classification, which assigns unknown spectra to groups (species) based on the most chemically similar spectra (the "nearest neighbours") from the training set (Julier et al., 2016;Varmuza and Filzmoser, 2009). The parameter k is the number of nearest neighbours used for classification and is user selected. Where k > 1 the classification is determined by majority vote, and in the case of a tie the group assignment is chosen at random (Venables and Ripley, 2002). We used the Euclidean distance measure to determine between-sample chemical similarity.
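The classification rule just described can be sketched in a few lines of Python. This is an illustrative implementation rather than the class package routine used in the study; in particular it ignores the handling of equidistant neighbours at the k-th position, and the two-variable "spectra" in the example are hypothetical stand-ins for full FTIR spectra.

```python
import math
import random
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two spectra of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_classify(unknown, train_x, train_y, k=1, rng=random):
    """Assign `unknown` to the majority label among its k nearest training spectra."""
    order = sorted(range(len(train_x)), key=lambda i: euclidean(unknown, train_x[i]))
    votes = Counter(train_y[i] for i in order[:k])
    top = max(votes.values())
    winners = sorted(label for label, n in votes.items() if n == top)
    # A tied vote is resolved at random, as described in the text.
    return winners[0] if len(winners) == 1 else rng.choice(winners)


# Hypothetical toy data: two well-separated groups in a two-variable space.
train_x = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]]
train_y = ["A", "A", "B", "B"]
near_a = knn_classify([0.2, 0.1], train_x, train_y, k=1)
near_b = knn_classify([4.9, 5.2], train_x, train_y, k=3)
tied = knn_classify([2.5, 3.0], train_x, train_y, k=2, rng=random.Random(0))
```

With k = 1 the unknown spectrum simply takes the label of its single closest neighbour, which is the setting that performed best in this study.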
In addition to running the k-nn classification on the unprocessed IR spectra, we tested the impact of Savitzky-Golay smoothing and taking first and second derivatives of the smoothed spectra. Savitzky-Golay smoothing fits polynomial curves to successive windows across a data series, with a larger window size increasing the amount of smoothing (Julier et al., 2016;Varmuza and Filzmoser, 2009). Taking derivatives of the spectra can bring out smaller spectral details to aid in classification but also increases the level of noise in the analysis (Julier et al., 2016;Varmuza and Filzmoser, 2009) and so was only carried out on the Savitzky-Golay smoothed spectra.
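To make the smoothing and differentiation step concrete, the sketch below derives Savitzky-Golay weights from a least-squares polynomial fit over a symmetric window and applies them as a moving weighted sum. It is a minimal illustration, not the prospectr routine used in the study; it assumes evenly spaced data and simply drops the (w − 1)/2 points at each end of the spectrum where the window does not fit. Function names are hypothetical.

```python
from math import factorial


def solve_linear(a, b):
    """Solve the small dense system a x = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def savgol_weights(window, poly_degree=3, deriv=0):
    """Weights giving the deriv-th derivative of the fitted polynomial at the window centre."""
    if window % 2 == 0 or window <= poly_degree or deriv > poly_degree:
        raise ValueError("window must be odd and larger than poly_degree; deriv <= poly_degree")
    half = window // 2
    xs = range(-half, half + 1)
    n = poly_degree + 1
    # Normal-equation matrix (A^T A) for the design matrix A[i][j] = x_i ** j.
    ata = [[float(sum(x ** (i + j) for x in xs)) for j in range(n)] for i in range(n)]
    unit = [1.0 if i == deriv else 0.0 for i in range(n)]
    u = solve_linear(ata, unit)
    return [factorial(deriv) * sum(u[j] * x ** j for j in range(n)) for x in xs]


def savgol_filter(values, window, poly_degree=3, deriv=0, spacing=1.0):
    """Apply the weights as a sliding window; the output is shorter by window - 1 points."""
    weights = savgol_weights(window, poly_degree, deriv)
    scale = spacing ** deriv
    return [
        sum(w * v for w, v in zip(weights, values[start:start + window])) / scale
        for start in range(len(values) - window + 1)
    ]


# Hypothetical check on a cubic, which a degree-3 fit reproduces exactly.
xs = list(range(20))
ys = [x ** 3 - 2 * x for x in xs]
smoothed = savgol_filter(ys, window=7)
first_derivative = savgol_filter(ys, window=7, deriv=1)
```

For a five-point window and a cubic fit the smoothing weights reduce to the familiar (−3, 12, 17, 12, −3)/35, and widening the window averages over more points, which is why a larger w gives heavier smoothing.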
We used leave-one-out cross validation (LOOCV) on the training dataset to find the best combination of parameters for classification. LOOCV treats each spectrum as an unknown sample and classifies it based on the group membership of the rest of the dataset. The classification success rate can then be calculated as the percentage of correct classifications from the cross validation procedure (Varmuza and Filzmoser, 2009). We varied k (the number of nearest neighbours in the k-nn classification) and w (the window size of the Savitzky-Golay smoothing), and tested classification success for all combinations of k = 1 to k = 20 and w = 5 to w = 43. For simplicity we kept p, the polynomial degree for the Savitzky-Golay smoothing, fixed to 3. Once the best combination of parameters had been selected from the training set, this was then applied to the validation set and k-nn classification run using the training set to classify the spectra. Again, the classification success rate was calculated as the percentage of correct classifications. We used confusion matrices to investigate among-taxon patterns in classification success rates. The raw data and R code for running the analyses are available from figshare (Jardine et al., 2019).
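The parameter search can be sketched as follows. This is an illustrative Python outline, not the authors' R code from figshare. It assumes that only odd window sizes were tested (Savitzky-Golay windows must be odd), which gives 20 × 20 = 400 combinations, and it breaks tied votes alphabetically rather than at random so that the example is reproducible. The function `spectra_for_window` is a hypothetical placeholder for whatever returns the training spectra processed with window size w.

```python
import math
from collections import Counter


def knn_predict(unknown, train_x, train_y, k):
    """k-nn with Euclidean distance; ties broken alphabetically for reproducibility."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(unknown, train_x[i]))
    votes = Counter(train_y[i] for i in order[:k])
    top = max(votes.values())
    return min(label for label, n in votes.items() if n == top)


def loocv_success(xs, ys, k):
    """Percentage of spectra correctly classified when each is left out in turn."""
    correct = 0
    for i in range(len(xs)):
        rest_x = xs[:i] + xs[i + 1:]
        rest_y = ys[:i] + ys[i + 1:]
        if knn_predict(xs[i], rest_x, rest_y, k) == ys[i]:
            correct += 1
    return 100.0 * correct / len(xs)


def grid_search(ys, spectra_for_window, k_values, w_values):
    """Return the best LOOCV success rate and every (k, w) pair that achieves it."""
    results = {}
    for w in w_values:
        xs = spectra_for_window(w)  # hypothetical: training spectra processed with window w
        for k in k_values:
            results[(k, w)] = loocv_success(xs, ys, k)
    best = max(results.values())
    return best, sorted(pair for pair, rate in results.items() if rate == best)


k_values = range(1, 21)      # k = 1 to 20
w_values = range(5, 44, 2)   # w = 5 to 43, odd sizes only (assumption)

# Hypothetical toy data; the same values are returned for every window size.
toy_x = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [5.0, 5.0], [5.0, 6.0], [6.0, 5.0]]
toy_y = ["A", "A", "A", "B", "B", "B"]
best_rate, best_params = grid_search(toy_y, lambda w: toy_x, k_values, w_values)
```

Several parameter pairs can share the maximum success rate, as happened in this study for w = 27 to 33, so the sketch returns all of them rather than a single winner.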

Results
The first two axes of a PCA on the unprocessed (i.e. z score standardised but not smoothed or differentiated) spectra account for 65 % of the variation in the dataset (Figs. 3a and S1a). Clear within-taxon groupings of samples are discernible in ordination space, although there is considerable overlap among the taxa. Most Triticum samples plot out at the lower end of axis two, along with Avena. Wild emmer and einkorn (T. dicoccoides and T. urartu, respectively) also occur close to their domesticated varieties (T. dicoccon and T. monococcum, respectively). Loading plots (Fig. 3b) show that samples occurring higher on axis one have higher peaks in the 1700 to 1000 cm⁻¹ region, and lower values at higher wavenumbers; this may in part be driven by residual baseline effects. Axis two loadings show less of a clear signal except for a peak at 1000 cm⁻¹, which again likely relates to variations in carbohydrates (Bagcıoglu et al., 2015). Higher PCA axes, each successively accounting for a smaller percentage of the variation in the dataset, demonstrate further separation among taxa (Fig. S2c and e).
A cluster analysis of the unprocessed taxon mean spectra (Fig. 3c) shows two main clusters. One comprises H. vulgare, Avena sativa, T. dicoccon, T. aestivum and T. dicoccoides, and the other comprises Aegilops tauschii, T. monococcum and T. urartu, with the Triticum species forming subclusters within the two main clusters. As with the PCA the wild varieties of einkorn and emmer occur close to the domesticated varieties within the cluster dendrogram.
For the training dataset, the maximum classification success rate was 95 %, with Savitzky-Golay smoothed first derivative spectra, k = 1 and w = 27, 29, 31 or 33 (Figs. 4 and S1a, Table 2). Misclassified samples from this combination of parameters were from T. dicoccon (four samples) and T. monococcum (two samples), with all samples from the remaining six taxa being correctly classified (Fig. 5a). The next best combination of parameters was with Savitzky-Golay smoothed second derivative spectra (94 %; Fig. S1b), then unprocessed spectra (79 %) and finally Savitzky-Golay smoothed spectra (78 %) (Fig. 4 and Table 2). In all cases the choice of k was more critical than the choice of w for maximising the classification success rate, although there was a relatively broad tolerance for both when first or second derivative spectra were used (Fig. 4).
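As a consistency check on the reported figure, the success rate can be recomputed from the misclassification counts. The short Python sketch below assumes that each training plant contributed eight spectra (16 per taxon, and 11 for T. monococcum after five spectra were moved to the validation set); these per-taxon totals are inferred from the sampling design rather than stated directly.

```python
# Training spectra per taxon, inferred from the sampling design (assumption).
training_totals = {
    "Aegilops tauschii": 16, "Avena sativa": 16, "Hordeum vulgare": 16,
    "Triticum aestivum": 16, "Triticum dicoccoides": 16, "Triticum dicoccon": 16,
    "Triticum urartu": 16, "Triticum monococcum": 11,
}
# Misclassified training spectra reported for the best parameter combination.
misclassified = {"Triticum dicoccon": 4, "Triticum monococcum": 2}

total = sum(training_totals.values())
errors = sum(misclassified.values())
success_rate = 100.0 * (total - errors) / total
print(f"{total - errors} of {total} correct = {success_rate:.1f} %")
```

Six misclassified spectra out of 123 give 117/123 ≈ 95.1 %, which agrees with the reported 95 % under these assumptions.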
Based on the results from the training dataset, the validation dataset was classified using Savitzky-Golay smoothed first derivative spectra, with k = 1 and w = 29 (Fig. S1a). A success rate of 82 % was achieved with this combination of parameters. As with the training dataset the majority of misclassifications [...] (Fig. 5b).
A PCA of the whole dataset using the classification parameters shows a pronounced arch effect, with one gradient on axis one curving round onto axis two (Fig. 6a). Samples of H. vulgare and Aegilops tauschii occur at either end of the gradient, with samples from the other taxa appearing in overlapping groups in the middle. Together these two axes account for 45 % of the variation in the data (Fig. S2b), and as with the unprocessed data further separation among taxa occurs on higher PCA axes (Fig. S2d and f). Loading plots for axes one and two (Fig. 6b) confirm that most of the variation among spectra occurs in the fingerprint region, with differences in the 1200 to 900 cm⁻¹ region again being important.
Table 2. Maximum classification success rate on the training dataset, with the best parameter combinations from the different spectral processing approaches. k is the number of nearest neighbours in the k-nn classification algorithm, and w is the window size in the Savitzky-Golay smoothing algorithm. Columns: processing, maximum success rate, k, w.
A cluster analysis of the taxon mean spectra processed with the classification parameters (Fig. 6c) shows one cluster comprising the Triticum and Avena samples, with H. vulgare and Aegilops tauschii branching off from this. The wild varieties of einkorn and emmer occur in subclusters with the domesticated varieties within the main Triticum and Avena cluster.

Discussion
Our results demonstrate a classification success rate of 95 % in the training dataset and 82 % in the validation dataset. These findings show that FTIR-based chemotaxonomy has considerable potential as a means of classifying Poaceae pollen to study grass domestication, and the spread of agriculture and landscape change over the last 10 kyr. The ability to generate species level count data from grass pollen would allow a much fuller use of the palynological record in archaeobotanical studies and would provide an additional tool to complement other lines of evidence such as grass seeds and chaff, starch grains and, more recently, DNA data from sedimentary deposits and ancient grains (Crowther et al., 2016; Fuller, 2007; Mascher et al., 2016; Piperno et al., 2004; Savard et al., 2006; Vignola et al., 2017; Willcox et al., 2007). More generally, these results show that closely related taxa can be successfully classified based on their chemical signature alone. In the present dataset this includes discriminating between ancestors and their direct descendants (e.g. T. urartu and T. monococcum; T. dicoccoides and T. dicoccon; Fig. 1b), and between parent taxa and their hybridised offspring (e.g. T. aestivum as a hybrid of T. dicoccon and Aegilops tauschii; Fig. 1b), although it was also in these cases that the majority of misclassifications occurred (Fig. 5). The decrease in classification success rate from the training dataset to the validation dataset is mostly accounted for by T. dicoccon being misclassified as T. dicoccoides, T. aestivum and Aegilops tauschii. The PCA plots (Figs. 3 and 6) also show that the T. dicoccon spectra are more widely dispersed in ordination space than those from the other taxa, suggesting that T. dicoccon is more chemically variable than the other species considered here and therefore more challenging to classify successfully.
Nevertheless, the classification accuracy demonstrated here is comparable to that of other chemical and morphological grass pollen classification studies (Julier et al., 2016; Mander et al., 2013, 2014), justifying further research in this area. Consistent with previous studies (Julier et al., 2016; Woutersen et al., 2018), we have found that processing the FTIR spectra with smoothing and differentiation improves the classification success rate (Table 2). The taxa in this study are closely related and chemically highly similar (Fig. 2), and working with derivatised spectra (Fig. S1) allows small-scale spectral features to be enhanced for use in multivariate data exploration and classification approaches. Our results also show that not only does processing improve the separation among taxa, it also influences their relative multivariate similarity and dissimilarity, and therefore the position of taxa relative to each other in the PCA ordination and cluster analysis (Figs. 3 and 6). The processing approaches selected therefore have implications for exploring phylogenetic patterns in chemical data (Julier et al., 2016), or even using chemistry as a phylogenetic estimation tool in the fossil record. In the present case processing the spectra with Savitzky-Golay smoothing and differentiation brings all Triticum species together into one group, which makes sense from a classification and a phylogenetic point of view. However, Aegilops tauschii is then fully separated out from Triticum even though the two are very closely related, and Avena sativa is grouped in with Triticum even though it belongs to a separate tribe from all other taxa in the dataset. The presence of a phylogenetic signal in pollen chemical data, and its recoverability in FTIR spectra, whilst intriguing, requires further investigation with larger datasets.
To make this technique fully applicable to palaeoecological and archaeological settings, three relatively simple limitations in the current study will need to be addressed. First, to maximise the quality of the FTIR spectra the data here were generated on groups of grains in a 100 µm × 100 µm window, resulting in identification being made on a number of grains (8-10) rather than on the individual specimens that would need to be classified in fossil and subfossil samples. While FTIR spectra from individual pollen grains have typically suffered from scattering effects or increased noise (Zimmermann et al., 2015a, 2016; Zimmermann, 2018), this may be overcome by specific mounting approaches, such as a layer of soft paraffin between sheets of polyethylene foil (Zimmermann et al., 2016). An alternative to this methodological approach is the application of new instrumentation such as FTIR imaging systems, which combine high sample throughput with a high spatial resolution (achievable pixel resolution of ∼ 0.5 µm²), enabling multiple high-quality spectra to be gathered from individual pollen grains. Raman microspectroscopy might form a valuable alternative data acquisition approach, where the resolution can also be ≤ 2 µm and multiple measurements are obtainable from single specimens (Bagcıoglu et al., 2015; Gottardini et al., 2007; Ivleva et al., 2005; Pummer et al., 2013; Schulte et al., 2008, 2009, 2010; Zimmermann, 2010; Zimmermann et al., 2015a).
Second, the dataset presented here was generated from fresh pollen grains that include proteins, lipids and carbohydrates, as opposed to the isolated pollen walls that are present in sedimentary deposits (Jardine et al., 2015). The success of this technique therefore needs to be tested on isolated sporopollenin, either by acetolysing the pollen grains or via other processing methods (e.g. Domínguez et al., 1998; Gonzalez-Cruz et al., 2018; Loader and Hemming, 2000; Mundargi et al., 2016). Demonstrating the success of the technique on acetolysed grains would be particularly valuable, because this would allow many existing sets of processed pollen samples to be utilised. A recent FTIR study on Nitraria pollen (Woutersen et al., 2018) has shown clear species level taxonomic differentiation on chemically isolated single pollen grains, and our previous work (Jardine et al., 2015) has demonstrated that components of the sporopollenin biomacromolecule are stable after exposure to acetolysis procedures. We are therefore confident that taxonomic signals will be recoverable from processed fossil and subfossil material.
Third, in common with a number of other studies that have tested the potential for classifying pollen using chemical or morphological data (Dell'Anna et al., 2009; Holt and Bebbington, 2014; Holt and Bennett, 2014; Holt et al., 2011; Julier et al., 2016; Mander et al., 2013, 2014; Zimmermann et al., 2016; Woutersen et al., 2018), this research has focused on a relatively small dataset comprising only a few species. For this technique to be widely applied it will need to be tested on a much larger number of taxa, with each ideally being sampled from multiple individuals representing a range of environments (Holt and Bebbington, 2014), including variations in ultraviolet B (UVB) regime since this is known to influence pollen chemistry via the formation of UVB absorbing compounds (UACs) (Lomax et al., 2008). This will provide a more realistic estimate of classification success when mixed, and potentially diverse, subfossil and fossil assemblages are analysed, as well as forming the basis of a chemical library that could be used as a training set for classification in archaeobotanical and palaeoecological applications.
Further enhancements to this approach may be possible by incorporating pollen grain size or surface sculpture information into the classification procedure. As already noted, the size variation across Poaceae pollen grains broadly scales with genome size, with generally larger grains in domesticated types than in wild types (Andersen, 1979; Andersen and Bertelsen, 1972; Bennett, 1972; Beug, 1961, 2004; Bottema, 1992; Firbas, 1937; Tweddle et al., 2005). If combined with chemical data, pollen grain size, along with pore and annulus size, may be useful for improving the classification of wild types and their domesticated descendants. Surface sculpturing, as already utilised by the computational image analysis classification approach of Mander et al. (2013), could together with chemistry data offer a powerful set of tools. A challenge to integrating these different data types, however, is constructing a robust and efficient workflow. FTIR imaging systems allow for measurements of grain size quite readily, but it would be considerably harder to capture SEM images of the same grains for computational feature analysis.

Conclusions
We have used FTIR spectroscopic data from pollen grains to demonstrate that high levels of classification success are obtainable for differentiating among domesticated grasses and their wild relatives. This approach therefore offers much potential for leveraging further information from Holocene pollen data, for reconstructing the spread of agriculture and its impact on ecosystems and environments. It also has the potential to improve on current size-based classifications of domesticated and wild grasses, or sculpture-based classifications of polyphyletic groups of taxa. Future studies need to focus on expanding the number of taxa, and working with isolated sporopollenin from single pollen grains, to provide a more realistic test of classification potential in archaeological settings.