The Helgoland Experiment – assessing the influence of methodologies on Recent benthic foraminiferal assemblage composition

The aim of the present study was to compare preservation, staining and preparation techniques to assess the influence of different sample treatments and analyses on the accuracy of benthic foraminiferal assemblage data from NE Atlantic shelf seas. Replicate surface samples from the SE North Sea were preserved with ethanol–rose Bengal or formalin, some were stained after processing, or foraminifera were concentrated by flotation. Coloration of living specimens was different between samples treated with an ethanol–rose Bengal solution and those stained after washing. In the latter case, only the last two or three chambers were stained. The aliquot sample preserved with formalin showed dissolution features in agglutinated and porcellaneous species. Population density varied between different preservation, picking modes and investigators. The accuracy of picking was in the range of ±2 % (1σ), while external reproducibility ranged from −34 to +16 %. There was no significant difference between wet and dry picking. Samples that were concentrated by flotation generally yielded a lower number of specimens. Agglutinated species were under-represented in samples that were stained after washing and in the flotation concentrate. Size fractions showed a reduction of population density and Fisher alpha diversity index with increasing mesh size. Only half of the specimens and less than two-thirds of the species are captured if the >125 µm rather than >63 µm fraction is analysed. In oxygen minimum zones, where small-sized species dominate the assemblage, the recovery in larger size fractions could be lower.

certain area is required for a reliable environmental assessment in order to monitor these changes. Many benthic foraminiferal studies, however, include preparation steps in which certain parts of the foraminiferal assemblages are excluded from analyses (Schröder et al., 1987; Van Marle, 1988). How this artificial loss of faunal information affects species richness data or diversity indices, which are used to evaluate the environmental status of a location under investigation, has so far been poorly constrained.
The methodology of Recent benthic foraminiferal studies (in particular, sample treatment and preservation, identification of specimens living at the time of sampling, and different preparation techniques) has been debated since the 1920s. A plenary discussion during the FORAMS 2010 Symposium at Bonn, Germany, in September 2010 highlighted the need for a standardized method for environmental surveys using benthic foraminifera. Subsequently, a workshop was held at Fribourg, Switzerland, in June 2011 in order to develop a standardized protocol for field and laboratory methods to be applied in foraminiferal biomonitoring studies. During the preparation of this FOBIMO workshop, and well before it took place, the question arose as to what the consequences for faunal data would be if a sample were preserved, processed and analysed by different methods. Such comparisons have scarcely been made based on aliquots of the same sample and, where they have, they considered only a single aspect, e.g. vital stains. We therefore took a large surface sediment sample, processed and prepared aliquots with different methodological approaches as described in the literature, and discussed the respective faunal assemblages with FOBIMO workshop participants. The results corroborated some of the FOBIMO recommendations but they were not presented in detail.
The aim of the present paper is to compare different preservation, staining and preparation techniques to constrain the internal data variability and bias as inferred by different methodologies. A second objective is to constrain variations produced by different investigators analysing the foraminiferal assemblages. In an experimental approach, hereafter called the Helgoland Experiment, aliquots of the same sample were treated with different methods and studied by different persons. The consequences for the accuracy of foraminiferal data are addressed.

MATERIAL AND METHODS
Sampling operations
The study site was visited with R/V Senckenberg on 28 March 2011. An interface corer had no recovery and thus a box corer of approximately 400 kg weight and box dimensions of 20 × 30 × 46 cm was deployed (Bouma & Marshall, 1964). We took a 0-1 cm sediment sample from a visually intact part of the box core surface by using a plastic tube and graduated ring of 5.7 cm inner diameter (Murray, 2006, p. 10). This test sample served to check the current status of the foraminiferal fauna at the study site. The sample was preserved and stained with a rose Bengal-ethanol solution (Lutze & Altenbach, 1991). After four hours, the test sample was washed through a 63 µm screen. The sample volume was 24 cm³; an aliquot of 7.5 cm³ was wet screened and 67 stained specimens of textulariid, miliolid and rotaliid species were found. The population density was calculated to be 77 living specimens per 10 cm³. The test sample revealed that the study site was suitable and that a 50 cm³ sample would contain a sufficient number of living individuals for a reliable census.
R/V Senckenberg returned to the site on 29 March 2011. Five box cores were taken while the vessel was drifting, starting at 54°5.106' N and 7°37.633' E (Fig. 1). The deployments were 60 to 220 m apart. The surface preservation was different between the deployments: some appeared to be washed, others were seemingly intact. The abundance of macro-organisms on the sediment surface also varied. We therefore decided to blend samples from the different deployments in order to rule out variations due to both natural patchiness and quality of surface preservation. Two surface sediment samples were taken from each deployment. For sampling, a plastic frame of 87.6 cm² was pressed into a less disturbed part of the box core surface and the uppermost centimetre was carefully scooped out with a spoon. The sediment was collected in a graduated 1000 ml plastic beaker. The final volume of 5 × 2 samples was 860 ml, which indicated that levels deeper than 1 cm were indeed not sampled.
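The volume argument in the preceding paragraph can be checked with a few lines of arithmetic. The sketch below is only an illustration: it assumes that all ten frame samples covered the full 87.6 cm² and that 1 ml equals 1 cm³ of wet sediment; the function and variable names are hypothetical.

```python
# Plausibility check of the sampled depth (illustrative sketch only).
FRAME_AREA_CM2 = 87.6      # area of the plastic frame, from the text
N_SAMPLES = 5 * 2          # two surface samples from each of five box cores
RECOVERED_ML = 860.0       # final blended volume, from the text


def mean_sampled_depth_cm(volume_ml, frame_area_cm2, n_samples):
    """Average depth that the recovered volume corresponds to (1 ml = 1 cm^3)."""
    return volume_ml / (frame_area_cm2 * n_samples)


# Volume expected if exactly the uppermost centimetre had been taken everywhere.
max_volume_ml = FRAME_AREA_CM2 * N_SAMPLES * 1.0
depth_cm = mean_sampled_depth_cm(RECOVERED_ML, FRAME_AREA_CM2, N_SAMPLES)

print(f"expected volume for a 1 cm layer: {max_volume_ml:.0f} ml")
print(f"mean sampled depth: {depth_cm:.2f} cm")
```

Because the recovered volume is slightly below the volume of a full 1 cm layer, the mean sampled depth stays just under one centimetre, which is the inference drawn in the text.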
The sample blend was homogenized for several minutes with a 2-blade plaster-stirring paddle mounted to a cordless drill driver. The stirrer was moved up and down several times to ensure that the entire sample was thoroughly mixed. Homogenization was terminated when the blend attained a plain, soft and creamy consistency, and no streaks of different coloration or texture were visible any more.
Twelve samples of the same volume were taken from this blend for different treatments and analyses (Table 1). They were first filled into a 50 cm³ high-density polyethylene (HDPE) measuring cup with a rounded bottom, then scooped out with a small, elastic dough scraper as used in cooking. Ten aliquot samples were transferred into 200 cm³ PVC sample vials. A preservative volume of about 1.5 times the sample volume was added to aliquots preserved with ethanol (98 %, technical quality) or with a solution of 2 g rose Bengal in 1 litre ethanol (98 %, technical quality). The ethanol was diluted by the pore water of the samples. With a given porosity of 31 %, the effective alcohol concentration in the sample vial was estimated to be 83 %. Two aliquots were transferred into 1000 ml HDPE vials and preserved with formaldehyde at a concentration of 4 % in seawater, buffered to a pH of 8.5 with 31 g hexamethylentetramine per litre. About 900 cm³, i.e. 18 times the sample volume, of this buffered formaldehyde solution (formalin) was added. All samples were shaken for more than a minute to ensure complete mixing with the preservative. Additionally, 10 cm³ samples were taken with cut-off syringes for physical properties, grain size, carbonate, and organic carbon analyses.

Laboratory methods
[Fig. 1 caption, beginning not recovered] Rhumbler (1938), X after Jarke (1961, and 15254 through 15256 after Wang (1983). Depth contours and outline of the isles were drawn after nautical charts. Note that sounding depths refer to the chart datum for Helgoland, which is 1.8 m below German Reference Surface or mean tide level.
The present study involved three micropalaeontological laboratories at Kiel, Germany, Fribourg, Switzerland and St Petersburg, Russia. Different methods and devices were used in each laboratory. For instance, when samples were split in a wet stage, a Scott-Splitter was used at Kiel (Scott & Hermelin, 1993) and a glassware riffle-splitter was used at Fribourg (Rupp, 1986). Dry sample residues were split with a Green Geological GG-101 microsplitter at Kiel (modified after Otto, 1933), a glassware riffle-splitter at Fribourg, and by cone-and-quartering at St Petersburg (Brittain, 2002; Gerlach et al., 2002). The splits were weighed at Kiel and Fribourg and their true proportions were calculated. The difference between target value and split weight varied between -2.5 % and 0.8 % and was -0.5 % on average. The foraminiferal tests from one sample were enriched by flotation at GEOMAR. A quarter split was gently tipped into a beaker with trichlorethylene under a fume hood. The liquid was stirred and foraminiferal tests and organic particles rose and floated on the surface or remained in suspension while the mineral grains sank to the bottom of the beaker. The liquid was decanted through a 63 µm sieve. The residue in the beaker and the concentrate on the sieve were left to dry under the fume hood at room temperature over a weekend. Concentrate and residue were both picked for living foraminifers. Splits considered for dry picking were size fractionated with stacked sieves of 76 mm diameter into the grain-size fractions 63-125 µm, 125-150 µm, 150-250 µm and 250-2000 µm at GEOMAR, Kiel, in order to facilitate microscopic work.
In the laboratories at Fribourg and St Petersburg, the residue >63 µm was not subdivided into different size fractions before dry picking. Leica WILD-M3C, Nikon SMZ1500 and Leica M205C dissecting microscopes were used at Kiel, Fribourg and St Petersburg, respectively. A magnification of ×40 was used for dry and wet picking by all investigators. Higher magnifications of up to ×64 were used to examine single specimens or for taxonomic investigations. At Kiel, the residue was submerged in tap water that had been boiled beforehand in order to precipitate scale. Untreated tap water was used at Fribourg and St Petersburg. In the latter laboratory, a dry sample split had been soaked with tap water before wet picking, whereas at Kiel and Fribourg the designated split was obtained by wet splitting and thus had not been dried before. The residue was spread in a petri dish with a paintbrush under immersion in all laboratories. Picking was done with a small paintbrush using incident light at St Petersburg and direct light at Fribourg. A mixture of direct and incident light was applied and a Pasteur pipette was used for picking at Kiel. The picked foraminifera were sorted by species in Plummer cell slides at Kiel and Fribourg, and in a cardboard double-cell slide at St Petersburg. The specimens were fixed with glue and thereafter counted in all laboratories. Only the census of the 250-2000 µm fraction was based on the notes taken after picking at GEOMAR, Kiel.
Light microscopy images were taken with an MPX2051 CCD camera (AOS™) mounted to a Navitar™ 6.5× zoom microscope at GEOMAR in order to visualize the different staining patterns. Test preservation and microstructure were examined with a CamScan CS-44 scanning electron microscope (SEM) at the Institute of Geosciences, Kiel University. The elemental composition of agglutinated grains of Eggerelloides scaber was assessed by EDX electron microprobe analyses at the Department of Geosciences, University of Fribourg. Faunal indices were calculated with the statistics program PAST v1.74 (Hammer et al., 2001).

Sample processing
Six foraminiferal samples were processed at GEOMAR Micropaleontology Laboratory, Kiel, Germany. One sample each was processed and analysed at Fribourg, Switzerland, and St Petersburg, Russia. Three samples preserved with ethanol-rose Bengal and one sample preserved with formalin were not processed and kept as archive materials (Table 1). Five samples preserved with the ethanol-rose Bengal solution were stored for more than a month in order to achieve complete staining of the protoplasm of specimens living at the moment of sampling (Lutze & Altenbach, 1991). Only the sample that was designated to be 'washed too early' was processed six days after the cruise. The samples preserved with ethanol or formalin were stored for 14 or 71 days before processing, respectively (Table 2). Samples preserved with ethanol-rose Bengal solution were processed at GEOMAR following a standard protocol that was developed in the 1970s at the Institute of Geosciences, Kiel University (e.g. Wefer, 1976). A very similar procedure was applied at Fribourg and St Petersburg, even though some steps of the Kiel protocol were skipped (Table 3). For instance, the sediment volume was not determined at Fribourg and St Petersburg (Steps 1 and 8) because the initial volume was well constrained.
The >2000 µm size fraction was not separated (Steps 2 and 3) because larger particles, such as pebbles or mollusc shells, were very rare. Where they occur in substantial amounts, pebbles in particular could damage small and delicate foraminiferal tests if they were washed too long with the sample. Three samples were processed at GEOMAR in a different way, i.e. the sample designated for 'later staining', which was preserved with unstained ethanol, the sample designated for 'wet picking, dry picking, and flotation', which was subdivided with a Scott-Splitter in a wet state after washing, and the sample preserved with formalin (Table 3).

Sample preparation and analyses
The samples were very rich in foraminifera. From samples that were to be picked dry, a fourth or eighth split was made if the 63-2000 µm fraction was to be analysed. The split size was chosen to obtain a target value of 150 to 200 specimens, a number that was considered to provide fairly accurate results with reference to the rather low diversities of North Sea faunas (Fatela & Taborda, 2002). With the exception of the 'washed too early' sample, we consistently used a fourth split in order to keep the yield and time effort uniform between different observers. The splits or their subfractions were picked for well-stained foraminifera that are considered to have been living at the time of sampling (Murray & Bowser, 2000). Some large Quinqueloculina specimens had to be soaked with water to make their staining pattern more visible, or the last chamber had to be punctured to see whether the test contained protoplasm. At GEOMAR, living specimens from the 63-125 µm, 125-150 µm, 150-250 µm, and 250-2000 µm size fractions were collected in single cell slides, and their number and preliminary species names were noted. From these figures, an estimate was made of the split size necessary to analyse the 125-2000 µm or 150-2000 µm size fractions, which was 0.5 or 0.75, respectively. The assemblage composition of the 250-2000 µm size fraction was not obtained from a separate split or sample. It was calculated as the sum of counted specimens from the respective size fractions of the other samples as noted after picking. This procedure was justified because only four different species were encountered in the 250-2000 µm fraction of all samples, and these species were common and easy to identify. They were also recorded in smaller size fractions, and thus no bias of diversity measures was inferred. After the living specimens were sorted out, the size fractions that were separated to facilitate microscopic work were put together again and archived.
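The choice of split size can be illustrated with a short calculation. The sketch below is not the authors' procedure: it assumes a 50 cm³ aliquot and uses the population densities reported in the Results for the dry-picked samples (144.8, 66.6 and 51.1 specimens per 10 cm³ for the >63, >125 and >150 µm fractions); the function name is hypothetical.

```python
# Expected number of living specimens in a split (illustrative sketch only).
SAMPLE_VOLUME_CM3 = 50.0          # aliquot volume, from the text
TARGET_RANGE = (150, 200)         # target count per split, from the text


def expected_count(density_per_10cm3, split_fraction,
                   sample_volume_cm3=SAMPLE_VOLUME_CM3):
    """Specimens expected in a given fraction of the whole sample."""
    return density_per_10cm3 / 10.0 * sample_volume_cm3 * split_fraction


# (size fraction, density per 10 cm^3 from the Results, split size from the text)
cases = [(">63 um", 144.8, 0.25), (">125 um", 66.6, 0.5), (">150 um", 51.1, 0.75)]

for label, density, split in cases:
    n = expected_count(density, split)
    in_range = TARGET_RANGE[0] <= n <= TARGET_RANGE[1]
    print(f"{label}: split {split} -> about {n:.0f} specimens, in target range: {in_range}")
```

Under these assumptions, the quarter, half and three-quarter splits all fall within the target range of 150 to 200 specimens.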
For analysis of the dead assemblage, a further aliquot was made with the microsplitter from the 63-2000 µm fraction of one sample and picked for empty tests.
Living faunas and the dead assemblage obtained from the different size fractions were put together again at GEOMAR before they were sorted by species in Plummer cell slides. In order to determine the range of variability and thus accuracy of faunal census data, each observer picked in a dry state several splits of the 63-2000 µm fraction from one sample. Four 1/8 splits from the 'washed too early' sample were picked at Kiel whereas three 1/4 splits were analysed at Fribourg and St Petersburg. Unless otherwise stated, population densities and species proportions are reported as mean values of these internal replicates.
Species determinations and cell slides were cross-checked between the investigators on the occasion of two meetings at Fribourg in June 2011 and August 2012. The slides and census data were again scrutinized at GEOMAR in October 2012. Slides, counting sheets, sample residues and untreated aliquots are archived at Senckenberg Forschungsinstitut und Naturkundemuseum, Frankfurt am Main, Germany.

RESULTS
Staining patterns
Stained foraminifera from the 63-2000 µm aliquot that was picked dry showed a strong and homogeneous coloration in all specimens (Plate 1). The intensity varied between species. Ammonia batava was consistently lighter in chroma than Elphidium excavatum, the same held true for Bolivina species as compared to Stainforthia fusiformis. Quinqueloculina seminulum was well-stained in and around the aperture only. However, the pink colour of the protoplasm was shimmering through the opaque shell at thinner parts of the chamber walls, near the sutures and close to chamber junctions.
Stained foraminifera from the aliquot that was picked in a wet state showed a complete coloration of all specimens. However, the intensity was lower than in specimens that were picked dry. In particular, the colour of Ammonia batava was dull and the fill of some chambers appeared to be faded out. The coloration tone of Elphidium excavatum was more brownish than in specimens that were picked dry. The staining of Quinqueloculina seminulum was less intense. The pink colour of protoplasm was shimmering through as a faint spot at the chamber junctions only. The coloration of stained foraminifera obtained from concentrate obtained by flotation with trichlorethylene showed no difference to the staining patterns of specimens from the 'dry-picking' aliquot. This applies for agglutinated, porcellaneous and hyaline species. Living foraminifera from the sample that was washed too early showed a slightly different staining pattern than specimens that were stained with ethanol-rose Bengal for more than a month. In particular, the coloration of Ammonia batava appeared fainter than at longer impregnation times. The colour of Elphidium excavatum was more brownish and the chamber fill was granular and more structured as compared to specimens that were exposed to rose Bengal for a longer time. The staining of Quinqueloculina seminulum was also less intense. The pink colour of protoplasm was shimmering through at the chamber junctions only. The arenaceous species Eggerelloides scaber and small buliminid taxa showed a homogeneous and bold staining of all chambers, however.
Foraminifera that were stained after sample processing with an aqueous rose Bengal solution showed a markedly different staining pattern than specimens where the stain was applied dissolved in the preservative (Plate 1). From the sample preserved with ethanol, living specimens of Eggerelloides scaber showed a pink-coloured detritus cyst at the aperture and a well-stained final chamber; in only 4 of 23 specimens was the 2nd or 3rd chamber also stained. The final 2 or 3 chambers of larger Stainforthia, Hopkinsina, Buliminella and Bolivina specimens were consistently well-stained, though smaller specimens were stained completely. Ammonia batava showed bright staining in the spiral suture, umbilical area and sutures on the umbilical side containing detritus, as well as in the protoplasm of the final 1 to 3 chambers. The rest of the test showed an opaque fill of brownish-cream colour. The final chamber was stained strongly pink in larger Elphidium excavatum and up to 3 chambers in juvenile specimens. The other chambers were yellowish-light green in colour. No specific pattern was recognized in Quinqueloculina seminulum, however.
Stained foraminifera from the sample preserved with formalin showed a similar pattern (Plate 1). A pink-coloured detritus cyst at the aperture of previously living Eggerelloides scaber was either reduced or absent. The final 2 to 3 chambers were stained strongly pink, smaller specimens were stained completely. Tests of all Quinqueloculina seminulum specimens, either living or dead at the moment of sampling, were coloured light pink throughout, and the shell was strongly corroded (see below). The final 4 to 5 chambers were stained in larger specimens of buliminid taxa. The earlier chambers showed an opaque infill of light greenish-cream in colour. Smaller specimens were stained throughout. In Ammonia batava, 1 to 2 chambers were faintly stained, which were not necessarily the last two chambers. The spiral suture and sutures or incisions on the umbilical side were brightly stained only in the vicinity of a stained chamber. Other parts of the test showed an opaque fill of whitish cream in colour. Elphidium excavatum showed, in principle, the same pattern with two stained chambers at maximum. The earlier part of their test was light yellowish-green in colour but transparent.

Preservation
Preservation was very good in general. Fragmentation and loss of final chambers was rare in samples that were picked dry. Foraminiferal specimens picked wet did not show indications of dissolution or destruction. This supports the contention that the foraminiferal tests were not damaged through the homogenization process of the bulk sample when the sample was mixed by using a paddle attached to a cordless drill. Only the wet-picked tests of Quinqueloculina seminulum were not as bright as those from samples that were picked dry. Even though we used boiled tap water for wet picking from which scale had been precipitated already, the slightly matt appearance of the Quinqueloculina shell might be due to submicroscopic CaCO₃ deposits precipitated from the water during desiccation. The tests from the sample that had been concentrated with trichlorethylene did not show any signs of dissolution or different shine or colour of the tests. The sample preserved with formalin showed several signs of dissolution affecting arenaceous as well as calcareous species. Eggerelloides scaber was quite well preserved but the agglutination appeared to be coarser in that single large grains were seemingly protruding from the outer surface. The wall finish was much smoother in specimens from samples preserved in ethanol. Tests of Ammonia batava preserved with formalin had a granular shine; they appeared dusty. Such an aspect was not recognized in Ammonia shells from ethanol-preserved samples. Quinqueloculina seminulum was strongly corroded. Exposed parts of the outer chamber walls were worn off or missing. The outer wall face was dull, though the interior was often bright, in particular close to strings of cytoplasm. SEM images of Quinqueloculina seminulum specimens preserved with formalin revealed that the outer parquetry layer of small calcite plates was largely dissolved and that the inner part of the shell with randomly orientated calcite needles was partly worn.
These needles were thinner than in the shell of Quinqueloculina seminulum preserved with ethanol (Fig. 2). In Ammonia batava a significant enlargement of pores, scaling and irregular dissolution scars were recognized on the shells of specimens preserved with formalin. The agglutinated shell of Eggerelloides scaber also experienced surface corrosion (Fig. 2). Small grains were rounded and the voids between them were slightly larger than in tests preserved with ethanol. Eggerelloides scaber builds a wall of particles up to 0.5 mm in size, which are embedded in a fine-grained agglutinated matrix. The matrix particles were bound by a thin film of organic substance where they were in contact with each other (Murray, 1973; Bender, 1989). EDX element mapping performed at the Department of Geosciences, University of Fribourg, revealed that the fine-grained matrix was uniformly rich in Ca and Mg. This indicates that matrix particles contain a higher proportion of carbonate than the larger grains, which on the other hand were unusually rich in heavy minerals, such as zircon and rutile. The fine, carbonate matrix grains were probably corroded by the altered formalin solution, leading to the observed alteration of surface structures.
Preserving marine organisms at sea in a 4 % seawater formalin solution is still common practice in meiofaunal, foraminiferal and zooplankton studies (e.g. Rathburn et al., 2003, 2009; Niehoff et al., 2012) despite formalin being banned in many countries and from research vessels for health reasons. Manuals on marine biology note that formalin has to be buffered with hexamethylentetramine at a concentration of 10 g l⁻¹, or with borax in excess, in order to avoid carbonate dissolution (e.g. Thiel, 1983). An adjustment to a pH of 8.2 is recommended, and the pH of the solution is to be checked at four weeks and six months after sampling (Hemleben et al., 1989, p. 34). We skipped the first inspection date and checked the pH of the formaldehyde-seawater solution of the archive sample on 15 June 2011, 2.5 months after sampling. The pH was 7.6, approximately one unit lower than the initially buffered solution and 0.5 units lower than seawater at the respective salinity. This pH, lowered by the formation of formic acid, is still in the basic range but corrosive to calcareous shells, as described above. It may be speculated whether the presence of carbonate ions in seawater and thus the CO₂ system could have exerted a certain influence and lowered the carbonate saturation state. Dissociation constants of CO₂ species in formaldehyde-seawater solutions are poorly constrained and any calculations of the carbonate saturation state omega (calcite) in the sample preservative, for instance following Lewis & Wallace (1998), would produce unrealistic values.

Population density
The population density was quite variable between different preservations, picking modes, size fractions and also between different examiners. Samples picked at GEOMAR, Kiel, showed population densities ranging from 75.5 to 147.6 living specimens >63 µm per 10 cm³ surface sediment (Table 4). The lowest abundances were recorded in samples that were stained after sample processing with an aqueous rose Bengal solution (75.5) or preserved with formalin (86.9). The population density calculated from specimens that were found in the flotation concentrate was also rather low, at 93.6 specimens >63 µm per 10 cm³. The concentrate contained only 63 % of the living specimens from that aliquot. This recovery was markedly lower than literature data on heavy liquid separation (93 % and 97 %; Gibson & Walker, 1967; Lutze, 1968), and perhaps the flotation should have been repeated as suggested by Wefer (1976, p. 11).
Samples that were preserved with an ethanol-rose Bengal solution and picked dry showed quite similar population densities of 144.1, 142.6 and 147.6 specimens per 10 cm³ (Table 4). The mean value of 144.8 and standard deviation (1σ) of 2.6 specimens >63 µm per 10 cm³ revealed an internal reproducibility of ±2 %. The population density from the aliquot that had been picked in a wet stage was, with 130.9 specimens per 10 cm³, more than 2σ below the mean value of the dry-picked aliquot and other samples and thus was significantly lower. In samples that were analysed at St Petersburg and Fribourg, the average population densities of the dry-picked splits were 167.5 and 95.9 specimens per 10 cm³, respectively. The densities were either higher by 16 % or lower by 34 % than the average population density from samples or aliquots that were dry-picked at Kiel. Aliquots picked in a wet stage at St Petersburg and Fribourg yielded population densities that were lower by 17.9 (-11 %) or higher by 19.4 (20 %) living specimens >63 µm per 10 cm³ than in the dry-picked splits. It has to be noted, however, that these differences were not substantially higher than the variability among the three dry-picked splits of ±10.3 and ±14.3 specimens per 10 cm³, respectively (1σ values).
Population densities were distinctly lower in the larger grain-size fractions due to the loss of small-sized species or juvenile specimens. In particular, the density in the >125 µm fraction was, with 66.6 specimens per 10 cm³, less than half the average population density of the >63 µm fraction (Fig. 3). The density was, with 51.1 specimens, only slightly lower in the >150 µm as compared to the >125 µm fraction. The >250 µm fraction yielded 10.5 living specimens per 10 cm³ and thus the population density was only 7 % of the >63 µm fraction. These figures are in good general agreement with literature data, even though the difference between the >125 µm and >150 µm size fractions was higher in deep-sea faunas (Schönfeld, 2012).
The abundance of empty tests from the dead assemblage was, with 4123 specimens >63 µm per 10 cm³, 29 times as high as the average population density of the living fauna.
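The reproducibility figures quoted above follow from simple statistics on the reported densities. The sketch below recomputes them with Python's standard library; it assumes that the 1σ value is the sample standard deviation of the three Kiel dry-picked densities, and the helper name `percent_difference` is hypothetical.

```python
from statistics import mean, stdev

# Dry-picked population densities at Kiel (specimens >63 um per 10 cm^3), from the text.
kiel_dry = [144.1, 142.6, 147.6]

kiel_mean = mean(kiel_dry)
kiel_sd = stdev(kiel_dry)                      # sample standard deviation (1 sigma)
internal_repro_pct = kiel_sd / kiel_mean * 100.0


def percent_difference(value, reference):
    """Relative deviation of a value from a reference, in percent."""
    return (value - reference) / reference * 100.0


# External reproducibility: dry-picked means from the two other laboratories.
st_petersburg_pct = percent_difference(167.5, kiel_mean)
fribourg_pct = percent_difference(95.9, kiel_mean)

# Wet-picked aliquot at Kiel, expressed in units of the 1 sigma value.
wet_offset_sigma = (kiel_mean - 130.9) / kiel_sd

print(f"mean {kiel_mean:.1f}, 1 sigma {kiel_sd:.1f}, internal +/-{internal_repro_pct:.0f} %")
print(f"St Petersburg {st_petersburg_pct:+.0f} %, Fribourg {fribourg_pct:+.0f} %")
print(f"wet-picked aliquot lies {wet_offset_sigma:.1f} sigma below the dry-picked mean")
```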
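The loss of specimens with increasing mesh size can be expressed as a retention ratio relative to the >63 µm fraction. The sketch below assumes that the reference value is the mean of the dry-picked Kiel samples (144.8 specimens per 10 cm³); the dictionary keys are labels chosen for illustration.

```python
# Share of the >63 um living fauna retained in coarser fractions (illustrative sketch).
densities = {">63": 144.8, ">125": 66.6, ">150": 51.1, ">250": 10.5}  # per 10 cm^3

reference = densities[">63"]
retention = {fraction: value / reference for fraction, value in densities.items()}

for fraction, share in retention.items():
    print(f"{fraction} um: {share * 100:.0f} % of the >63 um population density")
```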

Assemblage composition
Eggerelloides scaber (30 %) and Elphidium excavatum (20 %) were frequent in the >63 µm fraction (dry-picked aliquot). Stainforthia fusiformis (14 %) and Ammonia batava (10 %) were common. Quinqueloculina seminulum (5 %) and the other species were common to rare (Fig. 3). [...] comparisons, the confidence interval of the census as given by the 1σ binomial standard error has to be considered (Patterson & Fishbein, 1989; Fatela & Taborda, 2002). In particular, Ammonia batava was slightly more frequent in the samples that were stained after processing or preserved with formalin, and slightly less frequent in the aliquot that was picked in a wet stage as well as in the flotation concentrate as compared to the sample aliquot that was picked dry (Fig. 4). Eggerelloides scaber was more frequent in the sample that had been washed too early, significantly enriched in the assemblage preserved with formalin, but strongly reduced in the flotation concentrate as compared to the dry-picked sample. Quinqueloculina seminulum was slightly enriched in the wet-picked sample. Stainforthia fusiformis showed lower proportions in the samples that were stained after processing or preserved with formalin, significantly lower in the latter; it was strongly enriched in the flotation concentrate. In contrast, Elphidium excavatum showed a varied pattern with slightly higher percentages in the samples that were washed too early or stained after processing. The assemblage composition was different in the larger grain-size fractions (Fig. 3). In the >125 µm fraction, the proportion of Eggerelloides scaber increased from 30 % to 47 % and Ammonia batava increased from 10 % to 13 %. Stainforthia fusiformis and other small-sized buliminids, which were abundant in the >63 µm fraction, decreased from 14 % to 5 % and 15 % to 8 %, respectively. This trend continued in the >150 µm fraction, even though the relative changes were much smaller.
The proportions of Eggerelloides scaber, Quinqueloculina seminulum and Ammonia batava increased by 1 % to 3 % while the proportion of Stainforthia fusiformis decreased by 4 %. In the >250 µm fraction, Ammonia batava (52 %) and Quinqueloculina seminulum (37 %) became the dominant faunal elements. Eggerelloides scaber (8 %) and Elphidium excavatum (4 %) were low. The dead assemblage >63 µm showed a slightly different composition. Dominant species were Elphidium excavatum and Ammonia batava, though with proportions higher by 16 % and 12 % than in the living assemblage. Nonion depressulus and Eggerelloides scaber were also frequent. The latter comprised only 8 % as compared to 30 % of the living assemblage. Quinqueloculina seminulum and Stainforthia fusiformis were not recorded in the dead assemblage. It has to be noted, however, that 11 species comprising 16.2 % of the dead assemblage were not recorded in the living assemblages of any split or subsample. Some of them were common in the deeper part of the North Sea, for instance Elphidium incertum or Cassidulina crassa. Others were probably re-deposited from intertidal or shallow subtidal areas closer to the mainland shore or the isle of Helgoland (Fig. 1), e.g. Cibicides lobatulus and Elphidium williamsoni.
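The 1σ binomial standard error mentioned above is commonly written as sqrt(p(1 − p)/n) for a species proportion p in a census of n specimens. The sketch below applies it to two of the reported proportion changes. The census sizes (181 and 166 specimens) are hypothetical values chosen for illustration, not counts taken from this study, and the overlap test is a simple convention rather than a formal significance test.

```python
import math


def binomial_se(p, n):
    """1 sigma binomial standard error of a proportion p counted in n specimens."""
    return math.sqrt(p * (1.0 - p) / n)


def intervals_overlap(p1, n1, p2, n2):
    """True if the 1 sigma intervals of two proportions overlap."""
    se1, se2 = binomial_se(p1, n1), binomial_se(p2, n2)
    return (p1 - se1) <= (p2 + se2) and (p2 - se2) <= (p1 + se1)


# Hypothetical census sizes for the >63 um and >125 um fractions.
N_63, N_125 = 181, 166

# Eggerelloides scaber: 30 % versus 47 % (proportions from the text).
print(intervals_overlap(0.30, N_63, 0.47, N_125))
# Ammonia batava: 10 % versus 13 % (proportions from the text).
print(intervals_overlap(0.10, N_63, 0.13, N_125))
```

With these assumed census sizes, the change in Eggerelloides scaber exceeds the combined 1σ intervals whereas the change in Ammonia batava does not, which illustrates why small percentage differences need to be read against the counting error.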

Faunal indices
The faunal diversity showed considerable variability which mirrored the effects of preparation techniques on the assemblage composition and population density. The species richness, expressed as expected number of species in a sample of 100 individuals (ES(100); Hurlbert, 1971; Olabarria, 2005), varied between 11 and 16, average 13 in the living fauna >63 µm. The ES(100) differed only slightly from the observed species richness, which varied between 10 and 18, average 14. The Fisher alpha index presented a more detailed picture (Table 4). Samples preserved with ethanol-rose Bengal solution and picked dry had a mean Fisher alpha of 4.47 whereas the index of samples that were stained after washing was 4.00, slightly lower. The wet-picked aliquot had a Fisher alpha of 2.99, which indicated that a few species were not found when applying this picking technique. The alpha indices of samples that were analysed at St Petersburg and Fribourg also showed a marked difference between wet and dry picking. Even though the alpha index from dry picking was the mean value of three splits, it was 4.04 and 3.55, thus substantially higher than 2.86 and 2.94 as obtained from wet picking of one split at St Petersburg and Fribourg, respectively. In larger size fractions, the Fisher alpha decreased from 4.63 in the >63 µm fraction (dry-picked aliquot) to 2.04 and 1.96 in the >125 µm and >150 µm fractions, respectively (Fig. 3). At 0.89 it was lower by about a half in the >250 µm grain-size fraction. It has to be noted, however, that in the case of samples from this study, variations in Fisher alpha values of less than ±0.7 are difficult to interpret (see below).
Mirroring the increasing proportions of frequent species as described above, the Dominance expressed by the 1-Simpson index, where very low values characterize a fauna in which all taxa are equally present and values close to 1 indicate that one species dominates the community, increases from an average of 0.2 in the >63 µm fraction to 0.5 in the >125 and >150 µm fractions, and further to 0.7 in the >250 µm fraction (Table 4).
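The two indices discussed above can be illustrated with a minimal Python sketch. It assumes the standard definition of Fisher alpha, S = α ln(1 + N/α), with S species among N specimens, and it assumes that Dominance is computed as the sum of squared proportions, which matches the behaviour described above (low when all taxa are equally present, close to 1 when one species dominates). The function names and the census are hypothetical and are not data from this study.

```python
import math

def fisher_alpha(n_individuals, n_species):
    """Solve S = alpha * ln(1 + N / alpha) for alpha by bisection."""
    if not 0 < n_species < n_individuals:
        raise ValueError("requires 0 < S < N")

    def residual(alpha):
        return alpha * math.log(1.0 + n_individuals / alpha) - n_species

    lo, hi = 1e-9, float(n_individuals)
    while residual(hi) < 0:      # widen the bracket until it contains the root
        hi *= 2.0
    for _ in range(200):         # residual() increases monotonically with alpha
        mid = 0.5 * (lo + hi)
        if residual(mid) < 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def dominance(counts):
    """Dominance D = sum(p_i^2); close to 1 when one species dominates."""
    total = sum(counts)
    return sum((c / total) ** 2 for c in counts)

# Hypothetical census of one split (specimens per species), not data from this study.
counts = [30, 20, 14, 10, 5, 4, 3, 2, 1, 1]
alpha = fisher_alpha(sum(counts), len(counts))
print(f"Fisher alpha = {alpha:.2f}, dominance = {dominance(counts):.2f}")
```

Because Fisher alpha depends only on S and N, the loss of a few rare species in a split of about 100 specimens is enough to shift the index noticeably.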

Parallel splits
The number of living specimens found in the parallel splits showed a variability of +19 % to −18 %, +14 % to −16 % and +7 % to −5 % with reference to mean values of 77, 112 and 209 specimens from samples examined at Kiel, Fribourg and St Petersburg, respectively (Table 5). A strong negative relationship between variability range and number of recorded specimens (r = 0.96) suggests random probability as the main reason for the observed variations in the number of living specimens encountered in parallel splits (e.g. van der Plas & Tobi, 1965).
The proportions of most species varied only slightly between the parallel splits. Indeed, the variability was well within the range of the binomial 1σ standard error from probability statistics, slightly less in most cases (Table 5). In cases where a species is morphologically very distinctive, such as Eggerelloides scaber, or where a species is very familiar to the investigator, such as Ammonia batava, the range of variability in the count data was markedly lower. Training effects are considered to have exerted only a limited influence on the proportions of some species. For instance, Quinqueloculina seminulum and Stainforthia fusiformis were found with more specimens in sample splits that were picked later at Fribourg and Kiel. The increase in the latter species was paralleled by Hopkinsina pacifica, indicating that the capability to recognize small, elongated specimens had successively improved. None the less, Elphidium excavatum was substantially less abundant in some splits. This pattern was recognized in all three laboratories, and it induced a comparatively high standard deviation to the average proportion of this species. The stochastic drawdown of Elphidium excavatum could neither be attributed to random probability nor to individual training effects and remained enigmatic.
The census data of Ammonia batava as a common and Eggerelloides scaber as a frequent species from the parallel splits were added consecutively and plotted as proportion with the binomial 1σ standard error. The data depicted a successive attenuation of the percentages to a near-asymptotic value at high numbers of counted specimens (Fig. 5). The proportion of Eggerelloides scaber in the first one or two splits was outside the 1σ range of the near-asymptotic value, in particular when the number of counted specimens was below 150. With Ammonia batava, this pattern was not recognizable with certainty. The data suggest that for splitting North Sea samples with the given diversity to a manageable size, a target value of more than 150 specimens is appropriate to capture the proportions of frequent species with sufficient accuracy.
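The consecutive addition of splits can be sketched in Python as follows. The sketch assumes the usual form of the 1σ standard error of a proportion, σ = sqrt(p(1 − p)/n), for a species with proportion p among n counted specimens; the function names and the counts are hypothetical and are not taken from Table 5.

```python
import math

def binomial_se(proportion, n_counted):
    """1-sigma binomial standard error of a proportion (as a fraction)."""
    return math.sqrt(proportion * (1.0 - proportion) / n_counted)

def cumulative_census(species_counts, split_totals):
    """Add splits consecutively; return (n, proportion, 1-sigma error) per step."""
    rows, k, n = [], 0, 0
    for k_split, n_split in zip(species_counts, split_totals):
        k += k_split
        n += n_split
        p = k / n
        rows.append((n, p, binomial_se(p, n)))
    return rows

# Hypothetical counts of one species and of all living specimens in three splits.
for n, p, se in cumulative_census([20, 25, 30], [80, 75, 100]):
    print(f"n = {n:3d}: {100 * p:.1f} % ± {100 * se:.1f} %")
```

The error term shrinks with the square root of the number of counted specimens, which is why the proportions settle towards a near-asymptotic value as splits are added.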
Rarefaction curves from the same data demonstrated that even with 600 counted specimens a level was not reached where almost every species had been recorded (Fig. 5). Instead, the curves suggested a target value of about 300 specimens, which has been commonly recommended in the literature (e.g. Murray, 2006; Schönfeld et al., 2012). If 150 more specimens were counted, an increase in species richness by approximately 10 % would be achieved, i.e. 2 or 3 more and supposedly very rare species. The Fisher alpha indices of living assemblages from the parallel splits varied by ±0.5 to ±0.7 (Table 4). Diversity variations between samples of less than ±0.7 Fisher alpha units are, therefore, considered as not significant.
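A rarefaction curve of this kind can be sketched with the expected number of species ES(n), the same quantity quoted above as ES(100). The Python sketch below assumes the standard hypergeometric formula, ES(n) = Σ [1 − C(N − N_i, n)/C(N, n)], where N_i is the count of species i and N the total count. The function name and the census are hypothetical and do not reproduce the data behind Fig. 5.

```python
from math import comb

def expected_species(counts, n):
    """Expected number of species in a random subsample of n specimens."""
    total = sum(counts)
    if not 0 < n <= total:
        raise ValueError("n must be between 1 and the number of specimens")
    denom = comb(total, n)
    # comb() returns 0 when a species cannot be missed entirely (n > N - N_i).
    return sum(1.0 - comb(total - c, n) / denom for c in counts)

# Hypothetical census: 300 specimens in 19 species, not data from this study.
counts = [90, 60, 42, 30, 15, 12, 9, 6, 6, 6, 6, 3, 3, 3, 3, 2, 2, 1, 1]
for n in range(50, sum(counts) + 1, 50):
    print(f"ES({n}) = {expected_species(counts, n):.1f}")
```

The curve flattens as n grows because only the rarest species remain to be found, which is the behaviour that motivates a target value for the number of counted specimens.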

Influences of rose Bengal staining
There is hardly any aspect of methodology in Recent foraminiferal studies which has fuelled such a controversial debate as the reliability of cytoplasm stains. At present, rose Bengal staining is widely applied (Schönfeld, 2012). This method was introduced by Walton (1952) to distinguish benthic foraminifera that were living at the moment of sampling from empty tests. Rose Bengal stains any proteins, not only foraminiferal protoplasm but also adherent bacteria or algae, small nematodes dwelling in empty tests, and other metazoans. Furthermore, decaying protoplasm of foraminifers that are considered as being already dead is also stained (Walker et al., 1974; Bernhard, 2000). The colour and intensity of the staining varies among species (Lutze & Altenbach, 1991; Schönfeld et al., 2012). Therefore, the assessment of whether or not a specimen was living at the time of sampling requires certain experience and a critical view in order to minimize bias introduced by subjectivity (Murray & Bowser, 2000).
Our observations revealed that rose Bengal coloration intensity was not only different between species but also between different sample treatments (Plate 1). This may interfere with species-specific patterns or the natural colour of their cytoplasm. For instance, Elphidium excavatum exhibited a brownish staining whereas Bolivina species were generally light rose. None the less, well-stained specimens considered as living at the time of sampling were clearly discernible from empty tests or those containing remains of decaying cytoplasm or bacteria (Lutze & Altenbach, 1991). This applies to samples that were preserved with ethanol or formalin, stained before or after washing, and picked wet or dry. The recognition was consistent between the investigators working at Kiel, St Petersburg and Fribourg. The clear coloration pattern, and the fact that the original cytoplasm was fairly well recognizable in calcareous tests where only the first chambers were stained (Plate 1), again demonstrated the reliability of rose Bengal as cytoplasm stain.
A striking difference in staining patterns was recognized between samples where the stain was applied dissolved in ethanol during preservation and samples that were stained after washing with an aqueous rose Bengal solution. In the latter case, only the last few chambers were impregnated. The number of stained chambers was species specific. It was higher in buliminid taxa that were bathed in the rose Bengal solution for 24 hours than for 30 minutes. With respect to specimens documented by Bernhard et al. (2006, figs 1a-d) that were treated for 24 to 48 hours and completely stained, it is justifiable to assume that the proportion of stained chambers increased with time. This also holds true for samples that were stained with a rose Bengal-ethanol solution. In a given sample, the proportion of chambers stained with an aqueous rose Bengal solution was generally higher in species with small-sized tests. These relationships suggest that rose Bengal staining is effected by random diffusion through the protoplasm, which has species-specific properties. Seeping through the foramina between successive chambers, as previously suggested by Schönfeld (2012), may be of secondary importance for the spread of the stain. The foramina are apparently smaller in minute tests and thus the stained portion should be less, which in fact is not the case. The completeness and intensity of staining has important implications for species recognition. The population density and thus the number of specimens identified as living was lower by 46 % in the sample that had been stained with an aqueous rose Bengal solution after washing and lower by 37 % in the sample that was preserved with formalin, both with reference to the average population density of dry-picked samples that were stained with a rose Bengal-ethanol solution during preservation.
The later-stained samples were processed and picked by the same person, at the same times of the day and on the same days of the week as all the other samples. A systematic bias caused by lows in the circadian rhythm, weariness or distraction by other office activities is therefore unlikely. Hence, more than a third of the specimens have been overlooked due to their incomplete staining, far more than the 14-19 % variability attributable to random probability. The assemblage composition of samples that were stained after processing or preserved with formalin was not completely different, indicating that all species were affected. Systematically lower proportions of Stainforthia fusiformis suggest that small, elongated taxa were preferentially missed. On the other hand, Ammonia batava showed higher proportions, though not entirely counterbalancing the reduction of Stainforthia fusiformis. Its chroma was more intense than in specimens stained with a rose Bengal-ethanol solution.
Although only a small part of the test was stained, the later-stained specimens were securely identified as living individuals. None the less, our results corroborate that the highest population densities and thus complete and most reliable faunal census data can be obtained only from samples stained with a rose Bengal-ethanol solution during preservation. A complete and strong staining is effected after an impregnation time of more than two weeks.

Wet and dry picking
Wet and dry picking have both been applied in Recent foraminiferal studies (Schönfeld, 2012, table 1). Dry picking has more commonly been used as it has been considered less time-consuming and easier to perform (e.g. Wissing & Herrig, 1999; Murray, 2006). Wet picking has been recommended for samples rich in organic debris, or when soft-shelled or fragile arenaceous species are to be recorded (Brodniewicz, 1965; Bernhard & Sen Gupta, 1999; Scott et al., 2004). Detailed investigations suggested that both methods provide benthic foraminiferal diversities that are accurate for environmental assessments, and that dry-picked samples only lack fragile or non-fossilizable taxa (Bouchet et al., 2012).
In samples from the Helgoland Experiment one individual of the soft-walled Leptohalysis scotti was found in the wet-picked sample split. The species was also recorded in dry-picked samples, in the flotation concentrate, as well as in one parallel split picked at St Petersburg. The other species were also found in both dry- and wet-picked samples. However, the population densities and thus the number of recognized living specimens varied substantially between splits that were picked in a wet or dry stage. None the less, population densities were consistently lower in wet-picked samples analysed in all three laboratories. According to personal experience, this pattern could reflect the inclination to overlook stained specimens when they are submerged. In dry-picked samples, tests containing bacteria-rich detritus or metazoans could be mistaken for specimens containing cytoplasm more easily. However, all specimens obtained by both picking modes were carefully cross-checked for their staining pattern. Therefore, the results presented here cannot be used with certainty to determine whether a picking mode introduces a systematic under- or overestimation. This also applies to the use of incident or direct light for wet picking. If the species proportions are compared, all data are within the 95 % confidence interval as defined by the 2σ binomial standard error. Substantial differences from the 1:1 target line were depicted by some frequent species only (Fig. 6). The comparison revealed that a lower proportion of the agglutinated Eggerelloides scaber had been recorded by wet picking at Kiel, and by dry picking at Fribourg and St Petersburg. Lower values of Elphidium excavatum have been recorded with wet picking at Fribourg and with dry picking at Kiel. This species showed a stochastic, varied pattern in the parallel splits and, therefore, no systematic offsets in species proportions between the different picking modes were recognized.

Aliquots and accuracy
All subsamples considered in this study were taken from one large sample that was blended from samples taken on five cruise deployments. The question remains: how homogeneous was the mixture and to what extent can the variability described above be attributed to bias introduced by subsampling, sample processing and splitting? These are physical treatments. In mechanics and engineering, accuracy and error estimates are commonly performed with the root-mean-square (rms) method based on residuals from the mean, thus taking into account the total variance of measurements (e.g. Taylor, 1997). The number of aliquot samples was six in the present study and thus too low for a sound statistical treatment of the data, for instance by using ANOVA. The total data variability may, therefore, provide only a rough estimate of the accuracy magnitude.
Sample volumes were determined with a graduated cylinder with a precision of ±0.5 cm³. The measured volumes were 49.5 cm³ on average with a variance of 1.9 cm³, i.e. 4 %. The mean volume was slightly less than the target volume of 50 cm³. This could be due to the fact that after transferring the subsample from the measuring cup to the sample vial, some sample material was left sticking to the cup walls and dough scraper, and was thus missing from the sample. The mean weight of the sample residues after washing on a >63 µm screen was 59.1496 g with a variance of 2.4184 g or 4 %. Residue weight and sample volume did not co-vary. As such, it is conceivable that clogging of the sieve during washing induced most of the variability in residue weight. All samples were washed for the same length of time but clogging happened at different times, after 7-10 minutes. Once a considerable portion of the meshes was plugged with elongate quartz crystals, the probability of washing through grains <63 µm decreased. If a certain degree of clogging occurred earlier, the residue amount would then be higher provided this sample was washed as long as other samples.
Sample splitting also introduces certain variability (Guptill et al., 1976). The mean residual variance as difference between the true and target proportions was 0.0216 for subsamples that were split dry and 0.0037 for wet splits, i.e. 2.2 % and 0.4 %. This accuracy was better than the range of 10-20 % and 5-30 %, respectively, as given in the literature (Van Guelpen et al., 1982; Tennant & Baker, 1992; Scott & Hermelin, 1993) but in the range of current laboratory practice (Schönfeld, 2012).
If the uncertainties introduced by subsampling (4 %), sample processing (4 %) and splitting (2 %) were added with the rms method, the total error before faunal analysis was 6 %. This value was in good agreement with the picking accuracy of dry-picked samples, which was ±2 % (1σ). If variations in the recovery of living specimens were about in the same range as the accuracy of laboratory procedures, it is reasonable to assume that the blend had been sufficiently homogenized and that subsamples were representative aliquots. Any offsets in recovery higher than ±3 % are, therefore, considered to be due to species-recognition skills of the individual investigator. It has to be emphasized that this uncertainty is inherent to sample treatment and does not depend on the finding probability, which depends on the number of counted specimens (van der Plas & Tobi, 1965; Dennison & Hay, 1967; Fatela & Taborda, 2002). The standing stock variability of external replicates in field studies may be different, however. For instance, at Station B, 550 m water depth in the Bay of Biscay, the standard deviation of population density was ±7.8 % relative to the average of four samples (Barras et al., 2010). Earlier duplicate samples from the same location suggested an even higher range of ±4.6 and ±28.7 % of the mean (deployments OB9B and OB10B; Fontanier et al., 2003). On tidal flats of the Dutch Wadden Sea, 1σ values of 22.7-77.1 % of mean population densities were reported from parallel samples (de Nooijer, 2007). This variability mirrors local patchiness at various scales. None the less, the variations in population density of external replicate samples were one order of magnitude higher than among the aliquots from our study.
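The total of 6 % follows if the three components are combined as the square root of the sum of their squares, which is what the quoted figures imply: sqrt(4² + 4² + 2²) = sqrt(36) = 6. A minimal Python sketch, assuming the three error sources are independent (the function name is hypothetical):

```python
import math

def rms_total(*errors_percent):
    """Combine independent error components as the root of the summed squares."""
    return math.sqrt(sum(e ** 2 for e in errors_percent))

total = rms_total(4.0, 4.0, 2.0)  # subsampling, sample processing, splitting
print(f"total error before faunal analysis: {total:.0f} %")
```

Because the components enter as squares, the two 4 % terms dominate; the 2 % splitting error raises the total only from about 5.7 % to 6 %.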

Inter-laboratory variations
If our aliquots from the Helgoland Experiment were homogeneous and well mixed, and if any deviation in population density beyond ±3 % was specific to the investigator, the question arises whether stained specimens have been overlooked or their staining pattern was assessed differently in the participating laboratories. From the dry-picked aliquot, a further 1 / 128 split was made at GEOMAR for analysis of the dead assemblage after the living fauna had been picked. When picking the dead assemblage, some well-stained specimens were recognized that have been overlooked. At Fribourg, successively more living specimens were found when the picking tray was occasionally screened again after having finished. On the other hand, a thorough comparison of the living faunas revealed no systematic differences in sizes, preservation or staining patterns between specimens picked at the individual laboratories. Therefore, no different perception of the stained/not-stained criterion could be implied, so the individual surveillance and finding skills most likely account for the differences in population densities as recorded by the different laboratories.
To test whether species proportions vary substantially between the laboratories we compared the dry-picked faunas. Dry picking is far more common and thus all investigators should have roughly the same skills in species recognition when using this technique. Counting of parallel splits has shown that the internal variability of the individual examiners was slightly less than the 1σ binomial standard error. The comparison between the laboratories revealed that all species proportions are within the 95 % confidence interval as defined by the 2σ binomial standard error. Considerable offsets from the 1:1 target line were depicted by some frequent species (Fig. 6). In particular, Ammonia batava was recorded with a lower proportion at St Petersburg than at Fribourg and Kiel, with 8 % instead of 11 % and 10 %, respectively. Stainforthia fusiformis was less frequent at Kiel: 14 % instead of 20 % at St Petersburg. On the other hand, Eggerelloides scaber was recorded with a higher proportion of 30 % at Kiel than 23 % at St Petersburg. All differences appear to be low at 2-7 %. If these differences were related to the maximum number of specimens of a given species to be expected in the sample, however, the deviations would range from 16-34 %. It, therefore, appears reasonable to contend that about every third to sixth specimen has been overlooked in those laboratories where the proportion was substantially lower.
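The step from differences of 2-7 % to deviations of 16-34 % can be illustrated by relating each difference to the higher of the two proportions. The Python sketch below uses only the rounded percentages quoted above and assumes that the higher proportion stands for the maximum to be expected; the function name is hypothetical, and the published range was presumably derived from the unrounded counts, so the figures need not match exactly.

```python
def relative_shortfall(lower_percent, higher_percent):
    """Difference between two proportions as a percentage of the higher one."""
    return 100.0 * (higher_percent - lower_percent) / higher_percent

# Rounded species proportions (%) as quoted in the text: (lower, higher).
pairs = {
    "Ammonia batava (St Petersburg vs Fribourg)": (8, 11),
    "Stainforthia fusiformis (Kiel vs St Petersburg)": (14, 20),
    "Eggerelloides scaber (St Petersburg vs Kiel)": (23, 30),
}
for label, (low, high) in pairs.items():
    print(f"{label}: {relative_shortfall(low, high):.0f} %")
```

A shortfall of roughly a quarter to a third of the expected specimens corresponds to about every third or fourth specimen of that species having been missed.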

CONCLUSIONS
In the Helgoland Experiment sampling, processing and faunal analyses followed the usual practices that have been applied in the majority of Recent benthic foraminiferal studies in NE Atlantic shelf seas. Aliquots of the same sample were treated with different methods and studied by different persons in three European laboratories. The faunal data revealed substantial differences in population density and assemblage composition between different sample treatments, staining modes and picking techniques.
Benthic foraminifera that were living at the moment of sampling were assessed by rose Bengal staining. The specimens were strongly and completely stained in samples where rose Bengal was applied dissolved in ethanol during preservation. In samples that were stained after washing with an aqueous rose Bengal solution, only the last few chambers were stained. As a result, the number of specimens identified as living was lower by 37-46 % in samples that were stained with an aqueous rose Bengal solution after washing. Lower proportions of Stainforthia fusiformis in these samples showed that small, elongate taxa were preferentially overlooked. The chroma might also exert a certain influence on species recognition. Ammonia batava showed higher proportions in later-stained samples where its coloration was more intense. Therefore, the highest population densities and thus complete and accurate faunal census data can be obtained from rose Bengal-stained samples only when the stain is applied in ethanol solution during preservation.
Foraminifera from the sample preserved with formalin showed signs of dissolution even though the pH was still in the basic range. Porcellaneous species were most affected as their test wall is composed of minute and, in part, loosely arranged crystallites. The preservation of textulariid and rotaliid species was moderate to good. Dissolution phenomena on rotaliid tests mirrored features that have been observed in CO₂-rich habitats. Therefore, the influence of formic acid dissociation on the carbonate system in the supernatant seawater-formaldehyde solution has to be taken into consideration. The addition of bicarbonate or aragonite, for instance polished coral beads, could inhibit or delay the corrosion. In any case, the buffering agent hexamethylenetetramine or sodium tetraborate should be used in large quantities, and the pH needs to be checked at monthly intervals if the samples are to be stored for years.
The population densities of samples that were picked dry by a single investigator revealed a picking accuracy of ±2 % (1σ). This value was in good agreement with an error estimate of 6 % that was obtained with the rms method and considered the physical sample treatment before faunal analysis. This match suggested that our sample had been sufficiently homogenized and that subsamples were indeed representative aliquots. From the literature it can be seen that variations in population density of external replicate samples are sometimes higher by more than one order of magnitude. None the less, the individual surveillance and finding skills of different persons may confer substantial variability to population densities. Census data of three or four parallel splits picked by the same examiner revealed that the variability of percentages was in the range of the binomial 1σ standard error from probability statistics for most species, except Elphidium excavatum which showed a stochastic variability in all three laboratories. Considerable differences in recovery and species proportions were recognized between wet and dry picking. The comparisons revealed, however, that all data were within the 95 % confidence interval as defined by the 2σ binomial standard error and no significant differences from the 1:1 target line were recognized. Systematic offsets in species proportions between different picking modes or laboratories were not found. None the less, the proportions of frequent species may differ by 2-7 %. The results showed that every third to sixth specimen had been overlooked in those laboratories where a substantially lower proportion of the respective species was recorded. More parallel investigations involving a larger number of specialists and training sessions are needed to achieve a better accuracy of faunal census data in large monitoring projects.
The faunal indices showed a considerable variability, reflecting the effects of preparation techniques but more importantly the effects of different size fractions. In particular the Fisher alpha index was lower by a half in the >125 µm and >150 µm grain-size fractions as compared to the >63 µm fraction, and in the >250 µm fraction it was again lower by about a half. The decrease in diversity with increasing mesh size was on the same scale as reported from deep-sea faunas, even though our samples represent a sandy-bottom shelf fauna. The data confirm that only half of the living specimens were captured if the >125 µm fraction is used instead of the >63 µm fraction. At more than a third, the loss in species is slightly higher than in deep-sea environments, where about a quarter of the inventory was missing from the >125 µm fraction. However, the proportions could be substantially different in faunas from oxygen minimum zones where small-sized Bolivina species dominate the assemblages.

ACKNOWLEDGEMENTS
André Freiwald and Achim Wehrmann, Senckenberg am Meer, Wilhelmshaven, provided ship time, logistical assistance and supporting information. Robin Hinz, Captain Karl Baumann and his crew helped the first author during sampling on board R/V Senckenberg. Thorsten Garlichs, Ulrike Lomnitz, Sebastian Meier and Ute Schuldt, Kiel University, made SEM images of surface structures; Volker Liebetrau, GEOMAR, gave access to photomicroscope facilities. We are indebted to John Murray, Southampton, an anonymous reviewer, and Bridget Wade, Leeds, as Handling Editor for their thorough and very constructive reviews of an earlier version of this paper. Joachim Schönfeld acknowledges funding by the Deutsche Forschungsgemeinschaft (grant SCHO605/9-1) and by the Schweizerischer Nationalfonds (grant IZ32Z0-135895/1 to Silvia Spezzaferri).

APPENDIX A: BENTHIC FORAMINIFERAL SPECIES CONSIDERED
Taxonomic references are given in Ellis & Messina (1940); they are not included in the Reference list.