Predicting plant attractiveness to pollinators with passive crowdsourcing

Global concern regarding pollinator decline has intensified interest in enhancing pollinator resources in managed landscapes. These efforts frequently emphasize restoration or planting of flowering plants to provide pollen and nectar resources that are highly attractive to the desired pollinators. However, determining exactly which plant species should be used to enhance a landscape is difficult. Empirical screening of plants for such purposes is logistically daunting, but could be streamlined by crowdsourcing data to create lists of plants most likely to attract the desired pollinator taxa. People frequently photograph plants in bloom, and the Internet has become a vast repository of such images. A proportion of these images also capture floral visitation by arthropods. Here, we test the hypothesis that the abundance of floral images containing identifiable pollinators and other beneficial insects is positively associated with the observed attractiveness of the same species in controlled field trials from previously published studies. We used Google Image searches to determine the correlation between pollinator visitation captured by photographs on the Internet and the attractiveness of the same species in common-garden field trials for 43 plant species. From the first 30 photographs that successfully identified the plant, we recorded the number of Apis (managed honeybees), non-Apis (exclusively wild bees) and bee-mimicking syrphid flies. We used these observations from search hits, as well as bloom period (BP), as predictor variables in Generalized Linear Models (GLMs) for field-observed abundances of each of these groups. We found that non-Apis bees observed in controlled field trials were positively associated with observations of these taxa in Google Image searches (pseudo-R² of 0.668). Syrphid fly observations in the field were also associated with the frequency with which they were observed in images, but this relationship was weak. Apis bee observations were not associated with Internet images, but were slightly associated with BP. Our results suggest that passively crowdsourced image data can potentially be a useful screening tool to identify candidate plants for pollinator habitat restoration efforts directed at wild bee conservation. Increasing our understanding of the attractiveness of a greater diversity of plants increases the potential for more rapid and efficient research in creating pollinator-supportive landscapes.

[...] geotagged vacation photos shared on the popular photo-sharing site Flickr (www.flickr.com) and user-reported profile data, to understand how lake quality affected how far a vacationer was willing to travel [34]. Similarly, floral photography is a common pastime among amateur and professional photographers. A Flickr search conducted in April 2014 yielded 92 103 photo-sharing groups and over 23 million individual photographs tagged with the term 'flower' (C.A.B. 2014, personal observation). Many of these photos depict flowers occurring in natural or semi-natural habitats and, in some cases, capture relevant ecological information within the photo (such as insect visits) or within the photo caption (such as species identification). Google's Image search engine (www.google.com/imghp) provides a much broader database and additional search functionality. Image databases such as these represent a 'passive' crowdsourced data resource, which has the potential to provide insights into ecological patterns and direct future experimental research efforts.
In this study, we use a 'passive' crowdsourced data resource to accelerate the search for pollinator-attractive plants. Specifically, we hypothesized that the abundance of Internet images of flowers with visiting insects may correspond to their attractiveness to insects under controlled experimental conditions. To test this hypothesis, we ask: 'Is the frequency of observation of various pollinator taxa on plants in search engine results positively associated with the attractiveness of these plants under field conditions?'

Plant list and field observations
We used data produced by Tuell et al. [20] and Fiedler [35] to test the association between crowdsourced data and experimental results. In these previous studies, we and our co-workers empirically measured the attractiveness of flowering plants to bee-mimic flower flies [15,35] and pollinators [20] in common-garden experiments using vacuum sampling (table 1). Specifically, these studies contrasted the attractiveness of five exotic plants that are widely recommended for their attractiveness to beneficial insects with that of 43 species of perennial native plants [15]. Tuell et al. [20] summarized their observations as the number of Apis (honeybees) and non-Apis bees visiting each plant species at peak bloom. Because insect activity differed significantly over the course of the growing season, plants were grouped into three flowering categories for analysis: early, middle and late season blooming [20].
Fiedler [35] used a similar vacuum sampling methodology, but focused instead on beneficial predators and parasitoids, including flower flies (Diptera: Syrphidae). Although syrphids have a predatory larval phase, they feed on nectar as adults and many are important pollinators [36,37]. Additionally, adult syrphids are superficially similar to bees and are often mistaken for bees in photographs. Thus, we used the Tuell et al. [20] Apis and non-Apis bee observations and the Fiedler [35] syrphid observations by plant species as response variables.

Determining search engine and search terms
All searches were performed between December 2013 and April 2014 using the Google Chrome v. 33.0.x web browser. The search engines evaluated were Google Images (www.google.com/imghp) and Bing Images (www.bing.com/images/). To evaluate which search engine and search terms performed best (i.e. yielded the most relevant results by returning images with the correct plant species and visible insects in the photo), we used the list of the five highly recommended exotic species from Fiedler & Landis [15], as these species are relatively common, frequently photographed and known to be attractive to beneficial insects. These species were Vicia faba (fava bean), Fagopyrum esculentum (buckwheat), Coriandrum sativum (coriander), Lobularia maritima (sweet alyssum) and Anethum graveolens (dill). Initial evaluation indicated that Latin names yielded more relevant search results (i.e. a greater number of photographs with correctly identified plants) than common names. Latin names were combined with the following search terms: 'bee', 'beneficial insect', 'honeybee' or 'insect'. For the first 30 image results that captured the correct plant species, the number of results showing blooming flowers with insects present was recorded. If incorrectly identified plants or irrelevant images appeared, subsequent images were examined until a total of 30 images meeting these criteria was reached. The search term structure and search engine that yielded the most relevant results (Google Images, search term '[Plant Latin name] bee') were used for all subsequent data collection (

Frequency of occurrence of pollinators in flower photos
Using Google Image search, we conducted searches for '[Plant Latin name] bee' using the list of native Michigan plants that were used in our group's common-garden studies (table 1; [15,20,35]). Images were evaluated sequentially, in the order in which they appeared in the search results, by the following criteria: (i) each search procedure received one tally for each image containing the correct plant species, shown in bloom, and with sufficient image quality that target insect taxa could be reliably identified (i.e. the image was not blurry and the inflorescence was clearly visible); (ii) within the set of images meeting criterion (i), the numbers of images in which Apis bees, non-Apis bees and syrphid flies were visible were tallied and recorded (figure 1). For each search procedure, photos were evaluated sequentially until 30 images meeting criterion (i) had been evaluated or until 200 images had been examined. The number of images evaluated for each search was recorded. Duplicate images and differently cropped shots of previously counted images were excluded from evaluation and not counted towards the total images searched.
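The tallying criteria and stopping rule described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the procedure's actual implementation (images in the study were scored by eye): the `Image` record, its field names and the `tally_search` function are hypothetical, and duplicate or re-cropped images are assumed to share an identifier.

```python
from dataclasses import dataclass

TARGET_QUALIFYING = 30   # stop once this many images meet criterion (i)
MAX_EXAMINED = 200       # or once this many images have been examined


@dataclass(frozen=True)
class Image:
    """Hypothetical record of one manually scored search result."""
    image_id: str          # assumed shared by duplicates and re-crops
    correct_plant: bool
    in_bloom: bool
    identifiable: bool     # sharp enough to identify target insect taxa
    apis: bool = False
    non_apis: bool = False
    syrphid: bool = False


def tally_search(results):
    """Walk results in rank order and tally taxa on qualifying images."""
    seen = set()
    tally = {"examined": 0, "qualifying": 0,
             "apis": 0, "non_apis": 0, "syrphid": 0}
    for img in results:
        if (tally["qualifying"] >= TARGET_QUALIFYING
                or tally["examined"] >= MAX_EXAMINED):
            break
        if img.image_id in seen:
            continue  # duplicates do not count towards images examined
        seen.add(img.image_id)
        tally["examined"] += 1
        # Criterion (i): correct species, in bloom, adequate image quality.
        if not (img.correct_plant and img.in_bloom and img.identifiable):
            continue
        tally["qualifying"] += 1
        # Criterion (ii): record which taxa are visible (True counts as 1).
        tally["apis"] += img.apis
        tally["non_apis"] += img.non_apis
        tally["syrphid"] += img.syrphid
    return tally
```

For a given search, `tally_search(results)["qualifying"]` is the number of images meeting criterion (i), and the per-taxon entries are the raw counts for that search.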

Analysis
Search data (S) were compared with data from the field studies [15,20,35]. Because some searches did not yield 30 images meeting the criteria described above, search results were scaled to account for lower search success rates by multiplying the number of images in which a given taxon was observed by 30 and dividing by the number of images meeting the criteria in that category. Then, a model selection approach was used to determine which parameters were in the best model to predict field observations (O) for a given pollinator taxon (all bees, Apis bees, non-Apis bees and syrphids). Because net bee abundance varied dramatically by bloom period (BP) (as defined for Michigan native plants in Tuell et al. [20]), this variable (BP) was also included in the model selection procedure. The field observations took the form of counts, so models with Poisson or negative binomial error structure are most appropriate [38]. As models fit reasonably well (i.e. the ratio of residual deviance to residual degrees of freedom was less than 1 for all models), Poisson structures were used for all analyses. The global model was a GLM with S and BP as predictors of O. For each pollinator group-based model set, variables were dropped singly from the global model to determine the simplest model with the best performance. Akaike's Information Criterion (AIC) [39] was used to rank models. If two models had equivalent performance (i.e. produced AIC values that were within two units of each other), the model with the fewest parameters was selected as the best model. All analyses were performed in R v. 3.0.3 [40]. Figures were generated with ggplot2 [41]. An α = 0.05 was used to determine statistical significance, where appropriate.
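The scaling step and the model-ranking rule described above can be illustrated with a short Python sketch. The analyses themselves were run in R, so this is not the original code: all function names are hypothetical, the AIC values in the usage note are invented for illustration, and 'equivalent performance' is interpreted here as lying within two AIC units of the lowest-AIC model.

```python
import math

TARGET_IMAGES = 30  # common base to which every search is rescaled


def scale_search_count(taxon_images, qualifying_images, target=TARGET_IMAGES):
    """Rescale a taxon tally to a common base of `target` qualifying images."""
    if qualifying_images <= 0:
        raise ValueError("no qualifying images; the scaled count is undefined")
    return taxon_images * target / qualifying_images


def poisson_aic(observed, fitted, n_params):
    """AIC = 2k - 2 log L for Poisson counts, given fitted means > 0."""
    log_lik = sum(y * math.log(mu) - mu - math.lgamma(y + 1)
                  for y, mu in zip(observed, fitted))
    return 2 * n_params - 2 * log_lik


def select_best_model(candidates, tolerance=2.0):
    """Pick the simplest model among those within `tolerance` of the best AIC.

    candidates: list of (name, n_params, aic) tuples.
    Assumption: equivalence is judged against the lowest AIC in the set.
    """
    best_aic = min(aic for _, _, aic in candidates)
    equivalent = [c for c in candidates if c[2] - best_aic <= tolerance]
    # Fewest parameters wins; remaining ties go to the lower AIC.
    return min(equivalent, key=lambda c: (c[1], c[2]))
```

For example, a taxon seen in 6 of only 20 qualifying images scales to `scale_search_count(6, 20)`, i.e. 9 per 30 images. With made-up candidates `[("S + BP", 4, 101.2), ("S", 2, 102.5), ("BP", 3, 110.0)]`, `select_best_model` returns the `"S"` model, because it lies within two AIC units of the best model and has fewer parameters.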

Frequency of occurrence of pollinators in flower photos
We

Relationship to field data
Model selection favoured the inclusion of image search results in models for field observations of all bees, non-Apis bees and syrphids, and all of these regressions produced positive regression coefficients (table 3). The best model for Apis bees included only BP. BP was also included in the best models for non-Apis bees and all bees, but not in the model for syrphids. Only the models for all bees and non-Apis bees produced statistically significant regression parameters, although effects in the 'all bee' model were largely due to responses of non-Apis bees, as Apis bees represented a minority of those observed. Observations of both Apis and non-Apis bees were more variable by BP in search result data than in field-collected data (figure 2). Non-Apis bee field observations had the strongest relationship with search result data (figure 3).

Discussion
We detected positive associations between the frequency with which non-Apis bees were photographed on a given plant and its relative attractiveness to non-Apis bees in controlled field trials (figure 3). To a lesser and much more variable extent, a similar positive association was observed for total bees and syrphid flies (table 3). We did not observe this relationship for Apis bees. The stronger association observed for non-Apis bees compared with other taxa may, at least in part, be due to sample size effects: non-Apis bees were observed nearly twice as often as Apis bees in the field and more than 10 times as often as syrphid flies [20,35], thus relationships for the less frequently observed taxa may not be consistent enough to be statistically detectable. However, model selection suggested that, unlike non-Apis bees and syrphids, Apis bees were only associated with BP of flowers (table 3 and figure 2). The model for total bees, defined as the sum of non-Apis and Apis bees, although statistically significant, had a substantially weaker effect (i.e. slope) relating the number of bees observed in the field to their frequency of observation in photos. This result suggests that conflicting responses essentially masked the strong association observed in the non-Apis bee model and highlights the importance of striking a balance between taxonomic resolution and available sample size.
Both honeybee behaviour and human manipulation of their colonies may play a role in the differentiation of patterns we observed between Apis and non-Apis bees. Model selection favoured a model containing only BP to predict Apis bee abundance, suggesting that seasonality, potentially related to cropping practices and not the specific attributes of a particular flower species, is the primary factor driving honeybee visitation, at least in Michigan field trials. Honeybees are generalist foragers, which are moved from crop to crop as pollination needs dictate [2,43]. This management practice adds an element of unpredictability to their foraging patterns: colonies of Apis bees are physically moved throughout the season, thus their use of plants adjacent to croplands would be a function of colony placement and attractiveness of their target crop. This seasonality effect would vary with region, crop and local apicultural practice, and thus could obscure patterns in image search results, which draw from a global range. The social behaviour of Apis bees also influences the foraging behaviour of the colony. Scout bees inform nest-mates of the direction and distance to flowering resources [44], and honeybees tend to have high fidelity to specific resources where they have previously found significant reward [45]. In combination, these behaviours may influence bee abundance at floral resources that are less abundant in the landscape.
The results of this study have potential application in the development of locally targeted pollinator enhancement habitats, particularly those that emphasize supporting wild pollinator populations. Locally targeted pollinator plantings, particularly those emphasizing native plants, are desirable from a wide variety of perspectives. In addition to supporting restoration of native plant diversity, habitat enhancements emphasizing native plants help to restore local biodiversity. Floral resources can increase local biodiversity by supporting specialist insects that may be endemic to the area and plants can provide non-floral resources year-round, such as nesting and overwintering sites [11,46]. Furthermore, using native plants that tolerate local environmental conditions can help to lower establishment and maintenance costs of these habitats [13].
Although we did not observe any generalizable trends in Apis bee plant preferences, honeybees also benefit from well-designed pollinator habitats in landscapes. Honeybees have high energy requirements and habitats with an abundance of nectar available season-long are better able to support larger honeybee colonies [47,48]. Even if honeybee colonies are being moved through the landscape for crop pollination, honeybees can and do forage within wild plant communities embedded in agricultural matrices [20,48]. Locally targeted pollinator enhancement habitats can support greater communities of natural enemies, as well as supporting conservation biological control and potentially mitigating pesticide risk [11–15,49]. Our methodology serves as a complement to strategies already in place for developing pollinator habitats and helps to refine efforts for creating locally adapted plant communities. Using our methodology, plant lists with a particular set of attributes (i.e. adapted to a particular soil type, endemic to a specific region) can be evaluated for further screening under field conditions.
Crowdsourced data allow us to draw on a collective intelligence that can outperform individual studies or experts [50]. Crowdsourcing usually capitalizes on the intent of the participants to produce data for a specific purpose, but incidental observations of casual Web users, mined for patterns, can be regarded as a 'passive' crowdsourcing approach. Using passively crowdsourced data and the methodology outlined in this study may have applications in other systems. Internet images can represent a random sample of events and, as we have shown, at least for certain interactions, the frequency with which an event is observed in Internet photos corresponds to the frequency of events occurring in the field under controlled conditions. Yet, it is important that findings based on this methodology be 'ground-truthed'. Not all patterns will be captured because of localized variability. If the geographical range of a particular interaction is wide and patterns in the interaction vary over the range, this decreases the likelihood that a usable trend will be detected. Geographical biases affecting data quality would also include cultural and economic factors (e.g. the availability of photographic equipment and Internet access in a given region, the local cultural precedent for collecting images of organisms and sharing them on the Internet, the time of year people are most likely to use leisure time to photograph insects or flowers). Additionally, citizen scientists are more likely to document rare events [27], possibly due to cognitive biases associated with the recall of unusual occurrences [22,51]. Comparing the results of searches to experimental results is essential to develop an understanding of which interactions are captured in images and which are not. Yet, as we have shown, our methodology has potential application in capturing a subset of ecological interactions, with potential implications for management.