Science podcasts: analysis of global production and output from 2004 to 2018

Since 2004, podcasts have emerged as a decentralized medium for science communication to the global public. However, to date, there have been no large-scale quantitative studies of the production and dissemination of science podcasts. This study identified 952 English language science podcasts available between January and February 2018, analysed online textual and visual data related to the podcasts, and classified key production parameters. It was found that the total number of science podcast series available grew linearly between 2004 and 2010, and then exponentially between 2010 and 2018. Sixty-five per cent of science podcast series were hosted by scientists and 77% were targeted to public audiences. Although a wide range of primarily single-subject science podcast series were noted, 34% of science podcast series were not dedicated to a science subject. Compared to biology and physics, chemistry may be under-represented by science podcasts. Only 24% of science podcast series had any overt financial income. Sixty-two per cent of science podcast series were affiliated to an organization, producing a greater number of episodes (median = 24, average = 96) than independent science podcast series (median = 16, average = 48). This study provides the first ‘snapshot’ of how science podcasts are being used to communicate science to public audiences around the globe.


Introduction
Since 2004, podcasts have emerged as a new decentralized medium for free and independent communication to global audiences. Podcasts are typically audio-only, hosted online and distributed to audiences via direct, on-demand audio and video downloads to personal computers, MP3 players, interactive media devices and smartphones [1]. For app-enabled devices, episodes of a podcast series can be automatically downloaded via free opt-in subscription to particular podcast series [...]

© 2019 The Authors. Published by the Royal Society under the terms of the Creative [...]

[...] the incorporation of supplementary show notes. All data associated with this study are available as a supplementary dataset in the form of a Microsoft Excel spreadsheet.

Information sources
All information used in this study was sourced from public websites that were dedicated to the promotion of podcasts. Information was gleaned exclusively from visual and textual 'metadata' relating to each podcast series, including the description of each podcast series on 'iTunes', the websites of podcasts and the social media content associated with podcast series, i.e. on 'Twitter' [20], 'Facebook' [21] and 'Patreon' [22]. The audio and video content of podcasts themselves was not used due to the impracticalities associated with listening to and transcribing the tens of thousands of hours of audio content that science podcasts provide [23]. Producers and other individuals associated with the production of podcast series were not contacted for information about this study to avoid methodological disparity between podcast series with responsive producers and those without responsive producers. In all cases, information was accessed between 5 January 2018 and 5 February 2018. The associated supplementary database contains the specific date on which each website URL was accessed. All data were manually coded and categorized by the author.

Identification of podcast series
Owing to the decentralized nature of the podcast medium, there is not a single podcast database or website that lists all podcast series. However, the closest thing to a 'de facto' centralized podcast series database is the 'iTunes' podcast directory, which, as of 2015, was estimated to list over 200 000 podcast series [24]. The 'iTunes' podcast directory's search function is available cross-platform: i.e. it can be used by podcast apps running on non-Apple platforms, e.g. Android devices [26,27]. If a podcast series is not listed on the 'iTunes' podcast directory, then it is considerably less likely to be found by listeners [28]. Therefore, in line with other studies [12], the 'iTunes' podcast directory was selected as the primary directory from which to source podcasts.
A systematic review of the 'iTunes' podcasts 'Natural Sciences' directory was conducted to identify potential podcast series for inclusion in this study [29]. All podcast series in the 'Natural Sciences' section were examined between 5 January 2018 and 5 February 2018 by proceeding through the section in reverse alphabetical order. However, it should be noted that the category that a podcast series is assigned to within the 'iTunes' podcast directory is based entirely on the category nominated by the uploader of said podcast series [28]; consequently, there are many non-scientific podcast series spuriously listed in the 'Natural Sciences' 'iTunes' category [29]. Therefore, to ensure that only valid podcast series covering scientific topics were examined in this study, a stringent set of inclusion criteria were developed and applied (see 'Categorical definitions'). The inclusion criteria were applied after analysis of the textual and visual information associated with each podcast series and they are defined in the subsection 'Inclusion/exclusion criteria'. Additionally, during the study, some podcast series that were not listed on the 'iTunes' podcast directory were found incidentally. These incidental podcasts were also considered for inclusion. Of these 'non-iTunes' listed podcasts, 18 met the inclusion criteria, representing approximately 2% of the 952 science podcast series included in this study.

Inclusion/exclusion criteria
To ensure that only legitimate science podcast series were included in this study, the following set of inclusion/exclusion criteria was developed and applied:
- Only English language podcast series were considered in this study. If a podcast series was available in multiple languages via separate podcast feeds, then only the English language podcast feed was considered for analysis to avoid duplicating content.

Categorical definitions
Podcast series, their production methods and their production outputs were manually classified by the author in accordance with the definitions provided in table 1 and the methods detailed herein. Science podcast series were typically found to be focused on either a single distinct topic or to cover many different topics across a wide range of scientific disciplines. Therefore, an exclusive single-category system was used to classify the topics of podcast series; i.e. podcast series were either classified as a single subject, or if they covered many topics, they were classified as 'general science'. Similarly, an exclusive one-category classification system was deemed sufficient for organizational affiliations, target audiences, and whether or not a podcast series was video or audio format. Three non-exclusive categories were devised for classifying supplementary income: 'donations', 'merchandise' and 'advertising/sponsorship'. These categories were not exclusive because individual podcast series may employ some or all of these income mechanisms.
'Country of podcast production' was defined as the country primarily associated with a podcast series and its hosts. For this category, an exclusive one-category classification system was adopted; if two or more countries were associated with a podcast series, then it was classed as 'multinational'.
Science podcast hosts were classified according to a ranked classification system consisting of 'scientific researchers/educators' (Rank 5); 'media/journalism professionals' (Rank 4); 'other professionals' (Rank 3); 'amateurs' (Rank 2); and 'unclear' (Rank 1), where the ranking is related to general expertise/scientific authority, i.e. the higher the rank, the higher the authority (table 1). Where podcasts had multiple hosts (or a single host with several areas of expertise), the highest-ranked category corresponding to one of the hosts was recorded, even if that host was in an overall minority of hosts. The limitations of this method are discussed in the 'Methodology and associated limitations' subsection of the discussion.
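The ranked host classification described above can be sketched in a few lines of Python. This is only an illustrative sketch: the rank values follow the text, but the function name `classify_hosts` and the input format (a list of category labels, one per host) are assumptions and are not taken from the study's own spreadsheet workflow.

```python
# Rank values follow the ordering given in the text; everything else
# (names, input format) is a hypothetical illustration.
HOST_RANKS = {
    "scientific researchers/educators": 5,
    "media/journalism professionals": 4,
    "other professionals": 3,
    "amateurs": 2,
    "unclear": 1,
}


def classify_hosts(host_categories):
    """Return the highest-ranked category among a podcast's hosts.

    A single high-ranked host determines the result, even when that
    host is in a minority. An empty list is treated as 'unclear'.
    """
    if not host_categories:
        return "unclear"
    return max(host_categories, key=lambda category: HOST_RANKS[category])


# Two amateur hosts and one scientist: the series is recorded under
# the scientist's category.
example_hosts = ["amateurs", "amateurs", "scientific researchers/educators"]
print(classify_hosts(example_hosts))
```

Because only the maximum rank is kept, the composition of the host group is discarded, which is the source of the over-representation risk noted for this method.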
Podcast activity and podcast lifespans were determined by the objective definitions described in table 1.

Data analysis
All relevant information and resultant categorical analysis were recorded within a spreadsheet database (Microsoft Excel 2016, .xlsx format), which is available as a supplementary dataset to this manuscript. Basic categorical analysis was undertaken with Microsoft Excel; however, advanced categorical and data analysis (such as analysis of podcast series lifespan) was carried out using custom-written [...]

Table 1 (partial).
[...] 'Nature' [47], 'PLOS' [48] and 'SAGE' [49].
conventional media body: an organization which primarily disseminates conventional media, such as TV/radio broadcasts, or print media. For example: 'BBC Radio 4' [50], 'ABC Radio National' [51], 'Scientific American' [52] and 'NPR' [53].
podcast network: an Internet-only media organization solely dedicated to releasing podcasts. For example, 'The Naked Scientists' [33], 'Relay FM' [54] and 'StarTalk Radio' [55].
amateur organization: any amateur organization. For example, local astronomy groups and 'sceptics' societies.
podcast media types (figure 4)
audio podcast: a podcast that directly incorporates only audio information [not including media within show notes].
video podcast: a podcast that directly incorporates both visual and audio information [not including media within show notes].
show notes: media or information which is supplementary to a podcast episode and which is available to audiences via podcast apps or related websites. Show notes may include images, videos, hyperlinks, scientific references and audio transcripts. However, simple descriptions of a podcast episode are not classified as 'show notes'.
countries (figure 5)
country of podcast production: the country primarily associated with a podcast and its hosts. N.B. If a podcast is clearly associated with two or more countries, then that podcast is classified as 'multinational'.

To estimate mean lifespan of podcast series, single-term and two-term exponential decays were fitted to podcast series lifespan data by least-squares regression (see footnote 3). The equations describing these fits are respectively:

$$ y(t) = a\,e^{-bt} \qquad (2.1) $$

$$ y(t) = a\,e^{-bt} + c\,e^{-dt} \qquad (2.2) $$

where t is the podcast series lifespan and a, b, c and d are the recovered best-fit parameters with associated 95% confidence intervals. The mean lifespan (T) was then calculated by

$$ T = \frac{\ln(2)}{b} \qquad (2.3) $$

where ln(2) is the natural logarithm of 2 (approx. 0.693). For estimation of long and short mean lifespan components from two-term exponential decay fits, d was substituted for b in equation (2.3). 95% confidence intervals for the upper and lower bounds of T were also estimated. The statistical significance of the difference between the best-fit estimates of T for long-duration and short-duration components was estimated by the method described in Altman & Bland [56], which is based upon the 95% confidence intervals. In all cases (including the case of non-normally distributed 95% confidence intervals), the larger confidence interval was used to assess statistical significance. The statistical significance of the difference in the number of episodes produced by 'affiliated' and 'independent' podcast series was calculated via a two-sample t-test [57].
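As a worked illustration of the lifespan calculation, the following Python sketch evaluates a two-term exponential decay and converts each decay constant into a lifespan T. It assumes the decay is written with positive rate constants, i.e. a·exp(−b·t) + c·exp(−d·t), so that T = ln(2)/b; the parameter values are hypothetical and are not the fitted values from this study.

```python
import math


def two_term_decay(t, a, b, c, d):
    """Two-term exponential decay, assuming positive rate constants b and d."""
    return a * math.exp(-b * t) + c * math.exp(-d * t)


def lifespan_from_rate(rate):
    """Time for one decay component to fall to 50% of its initial value."""
    if rate <= 0:
        raise ValueError("decay rate must be positive")
    return math.log(2) / rate


# Hypothetical fit parameters (rates in units of 1/year), for illustration only.
a, b, c, d = 0.6, 1.5, 0.4, 0.15

short_T = lifespan_from_rate(b)  # fast-decaying (short lifespan) component
long_T = lifespan_from_rate(d)   # slow-decaying (long lifespan) component
print(f"short component T = {short_T:.2f} years")
print(f"long component T = {long_T:.2f} years")
```

By construction, each component has dropped to half of its starting amplitude after its own T, which is the 50% criterion used in the definition of T.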

Results
The inclusion criteria for this study were met by 952 science podcast series. A similar number (i.e. many hundreds of podcast series) were excluded as per the inclusion/exclusion criteria, but the details of these individual excluded podcasts were not recorded.
Between 2004 and 2010, the total number of science podcast series grew in a linear manner (see linear fit in figure 1a, R² = 0.99). By contrast, between 2010 and 2018, the total number of available science podcast series grew exponentially (figure 1a, R² = 0.99), rising to 952 podcast series by the sampling period (5 January to 5 February 2018). Prior to 2004, 11 science podcasts were available as Internet radio shows, which have subsequently become available as podcast series.
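The distinction between linear and exponential growth, and the R² values used to judge each fit, can be illustrated with a minimal Python sketch. The study does not state how the fits in figure 1a were computed; the sketch below assumes ordinary least squares for the linear case and a log-linear regression (a straight-line fit to ln(y)) for the exponential case. The counts are made-up numbers, not data from the study.

```python
import math


def linear_fit(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def exponential_fit(xs, ys):
    """Fit y = A * exp(k * x) via a linear fit to ln(y) (all y must be > 0)."""
    k, ln_a = linear_fit(xs, [math.log(y) for y in ys])
    return math.exp(ln_a), k


def r_squared(ys, predictions):
    """Coefficient of determination for a set of predictions."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, predictions))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Made-up cumulative counts growing by a factor of 1.5 per year.
years = [0, 1, 2, 3, 4]
counts = [100, 150, 225, 337.5, 506.25]

A, k = exponential_fit(years, counts)
predicted = [A * math.exp(k * x) for x in years]
print(f"A = {A:.1f}, k = {k:.3f}, R^2 = {r_squared(counts, predicted):.4f}")
```

Note that a log-linear regression weights small counts more heavily than a direct nonlinear least-squares fit would, so the two approaches can give slightly different parameters on real data.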
As of their individual sampling dates (see footnote 4), 46% of total science podcast series were 'active', meaning that they released an episode in the three months prior to their specific sampling date. Of the remaining 'inactive' podcast series, 14% released an episode between 3 and 12 months prior to their sampling date, and 40% had not released an episode for over a year prior to their sampling date (figure 1b).
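The activity categories above amount to a simple date comparison, sketched here in Python. The cut-offs of 91 days for 'three months' and 365 days for 'a year' are assumptions made for this illustration; the study defines the categories in months and does not specify day counts. The function name and category labels are likewise hypothetical.

```python
from datetime import date


def activity_status(last_episode, sampled_on):
    """Classify a podcast series by the gap between its most recent
    episode and the date on which it was sampled."""
    gap_days = (sampled_on - last_episode).days
    if gap_days <= 91:       # assumed equivalent of 'three months'
        return "active"
    if gap_days <= 365:      # assumed equivalent of 'twelve months'
        return "inactive (3-12 months)"
    return "inactive (over 1 year)"


sampling_date = date(2018, 1, 5)
print(activity_status(date(2017, 12, 1), sampling_date))
print(activity_status(date(2017, 6, 1), sampling_date))
print(activity_status(date(2016, 6, 1), sampling_date))
```

Because each series has its own sampling date, the comparison is made per series rather than against a single global cut-off.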
Table 1 (partial).
mean lifespan (T): the timespan in which 50% of a given population of podcasts will become 'inactive'. The mean lifespan is estimated by fitting an exponential decay to the lifespan data of a population of podcasts, and is therefore analogous to the concept of 'mean lifetime' within the context of radioactive decay.
short lifespan podcasts: the population of podcasts with a 'mean lifespan' of less than 1 year.
long lifespan podcasts: the population of podcasts with a 'mean lifespan' of more than 1 year.

Footnote 3: Two-term exponential fits were necessary because single-term exponential decays were found to fit the data poorly, as shown in figure 8.
Footnote 4: The exact sampling date for each podcast is provided in the associated supplementary dataset.

The number of episodes released by each science podcast series was found to be highly variable: 33% of science podcast series produced fewer than 10 episodes, and 72% of science podcast series produced [...] and table 2). From figure 1d,e, it is apparent that a high proportion of science podcast series did not produce podcast episodes for more than a year.

A wide variety of science podcast series topics/themes were recorded, with 66% of science podcast series themed around discipline-specific topics (figure 2a). Of particular note, 'chemistry' was the topic for only 3% of science podcast series, compared to 18% for 'physics and astronomy', and 14% for 'biology'. Thirty-four per cent of science podcast series were categorized as 'general science', i.e. science podcasts focusing on no single discipline-specific theme.
The majority of science podcast series (77%) were targeted to public audiences, 16% were targeted towards scientists or specialists, and 6% were provided as academic lectures, research seminars/conferences or as secondary education learning aids (figure 2b).
Nearly two-thirds (65%) of science podcast series were hosted by 'scientists'; 10% were hosted by 'media professionals', 7% by 'other professionals' and 5% by 'amateurs' (figure 3a). Host categories could not be identified for 13% of science podcast series.
Fifty-seven per cent of science podcast series did not follow a regular episode release schedule (figure 3c). The most popular release schedule was 'weekly' (15%), followed by 'monthly' (8%) and 'fortnightly' (6%). Only 3% of science podcasts released more than one episode per week, and 1% released an episode daily. Only 2% of science podcast series explicitly acknowledged a seasonal release format, i.e. periods of scheduled episode releases followed by an extended period where no episodes are released.
While podcasts can contain both audio and visual information, 87% of science podcast series were audio-only, with the remaining 13% being video podcast series (so-called vodcasts; figure 4a). Fifty-one per cent of science podcast series provided additional non-audio supplementary material in the form of show notes (e.g. hyperlinks, images and references; figure 4b). From figure 4c, it is clear that the proportion of new video science podcast series produced each year, as a fraction of overall science podcast series, has declined from a peak of approximately 30% of science podcast series in 2007 to approximately 5% of science podcast series in 2017. However, the absolute number of new video science podcast series produced each year has been relatively constant, at around 9 ± 3 (mean ± s.d.). This long-term decline in video podcasts may reflect changing behaviour, i.e. that audiences consume podcasts while undertaking activities incompatible with watching video content [19,58-60].
Global production of science podcast series, to date, is shown in figure 5: 57% of the available English language science podcast series were produced in the United States of America (USA); 17% were produced in the United Kingdom (UK); 5% in Australia; 3% in Canada and 1% in the Republic of Ireland. Other countries produce a combined total of 7% of English language science podcast series. A country of production could not be identified for 10% of science podcast series.
Seventy-six per cent of science podcast series were observed to have no overt supplementary income mechanisms and are thus seemingly independently financed by their producers (figure 6a). 'Advertising' was the least commonly used supplementary income mechanism (figure 6b), but it was common for science podcasts to mix 'voluntary donations', 'merchandise' and 'advertising' to various degrees.

Podcast topic categories (figure 2a axis labels): general science; physics and astronomy; biology; ecology/zoology/conservation; oceanography/marine biology; psychology and neuroscience; chemistry; climate change/atmospheric science; geology/earth science; mathematics; paleontology/anthropology/archaeology; medical/pharmacology; computer science; engineering; statistics/data science.

The differences between 'independent' science podcast series and 'affiliated' science podcast series in relation to various production outputs are shown in figure 7. In terms of podcast activity, there is only a marginal difference between the percentage of active 'affiliated' and 'independent' science podcast series (48% and 45%, respectively; figure 7a). However, a larger proportion of 'independent' podcast series (84%) are targeted to the public, compared to 'affiliated' podcast series (73%) (figure 7b). A slightly smaller proportion of 'independent' podcast series (14%) are targeted towards 'scientist/specialist' audiences compared with 'affiliated' podcast series (17%) (figure 7b). Nearly all science podcast series billed as academic seminars, student lectures or secondary education aids are produced as 'affiliated' podcast series (figure 7b). Roughly 75% of both 'independent' and 'affiliated' podcast series had no overt supplementary income (figure 7c). However, a considerably greater proportion of 'independent' podcast series solicited 'voluntary donations' and sold 'merchandise' (figure 7c). 'Advertising' was much more prevalent for 'affiliated' podcast series (25%) than 'independent' podcast series (11%) (figure 7c); this is probably due to many 'affiliated' podcast series being associated with commercial broadcast networks, where 'advertising' was assumed. 'Affiliated' podcast series produced more podcast episodes (median = 24, average = 96) than 'independent' podcast series (median = 16, average = 48). A two-tailed t-test found that the difference in the overall number of episodes released was statistically significant (p = 0.01) and that the greater average number of podcast episodes released by 'affiliated' podcast series was also statistically significant (p < 0.01).
The lifespan of both 'independent' and 'affiliated' podcast groupings was best fitted by a two-term exponential. This indicates that both 'affiliated' and 'independent' podcast groupings contain subsets of 'short lifespan' and 'long lifespan' podcast series (figure 8a,b). Extraction of fit parameters enables the estimation of the podcast 'mean lifespan' (T) for each of these podcast subsets. T is analogous to the concept of 'mean lifespan' in radioactive decay; i.e. T is the elapsed timespan in which 50% of the podcasts in a population become inactive. The best-fit and 95% confidence interval values for T are shown in figure 8c,d. For short-duration podcast series subsets, the difference in the best estimates of T for 'affiliated' and 'independent' podcast series was not statistically significant (p > 0.33). However, for long-duration podcast series subsets, the difference in the best estimates of T for 'affiliated' and 'independent' podcast series (5.5 years and 4.3 years, respectively) was statistically significant (p < 0.02).
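The comparison of two best-fit estimates via their 95% confidence intervals can be sketched as follows. This is a hedged illustration rather than the study's exact procedure: it assumes that a standard error is recovered from the wider of the two intervals as width / (2 × 1.96), that the difference between the estimates is divided by that standard error to give a z-score, and that a two-sided p-value is then read from the standard normal distribution. All numbers in the example are synthetic.

```python
import math


def p_value_from_cis(estimate_a, ci_a, estimate_b, ci_b):
    """Two-sided p-value for the difference between two estimates.

    Assumption for this sketch: the standard error is taken from the
    wider of the two 95% confidence intervals, given as (lower, upper).
    """
    widest = max(ci_a[1] - ci_a[0], ci_b[1] - ci_b[0])
    standard_error = widest / (2 * 1.96)
    z = abs(estimate_a - estimate_b) / standard_error
    # Two-sided tail probability of the standard normal distribution.
    return math.erfc(z / math.sqrt(2))


# Synthetic example: a difference of 1.96 standard errors sits at the
# conventional 5% significance threshold.
p = p_value_from_cis(1.96, (0.0, 3.92), 0.0, (-1.0, 1.0))
print(f"p = {p:.3f}")
```

Using the wider interval makes the test conservative: a larger standard error gives a smaller z-score and therefore a larger p-value.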

Methodology and associated limitations
This is the first study to analyse the global production and outputs of a large group of science podcast series. As such, the findings here provide fundamental and novel insight into who is producing science  podcast series and their target audiences. However, before detailed discussion of results, it is important to acknowledge the limitations of the methodology employed in this study. Firstly, in this study, only English language science podcast series were surveyed and analysed. It is highly probable that non-English language science podcast series would demonstrate different trends due to different listener and producer demographics.
Secondly, it is important to note that the data generated in this study were analysed (coded) by only a single researcher (the author). This is a shortcoming of the study design because different individuals may categorize qualitative data differently. Best practice in such research would have been to follow 'multiple coding' procedures, i.e. for multiple researchers to evaluate and analyse the data, subsequently resolving any discrepancies arising, while also maximizing robustness in data coding [61]. Also relevant to data coding and interpretation of the results is that a host classification based on a notional ranking of scientific authority was used. The rationale of this system was that having even a single scientist in a podcast host group will tend to elevate the scientific content of a podcast; therefore, such instances should be highlighted. However, this host classification system has several limitations: (i) it is based on the analysis of textual and visual data, (ii) it may overly simplify the data in a manner that over-represents higher-ranked host classifications (i.e. scientists and media professionals), and (iii) it does not consider the expertise of guests on podcasts. For future studies, a classification system that better represents the myriad possibilities of podcast host backgrounds should be implemented.
Thirdly, science podcast series were primarily identified by survey of only a single 'iTunes' category: i.e. the 'Natural Sciences' category [29]. This is similar to the methodology of a previous study by Birch & Weitkamp [12], which defined science podcasts as 'the natural sciences and mathematics'. However, constraining this study to the 'Natural Sciences' category limits the podcasts examined for two reasons: (i) listing a podcast on 'iTunes' is not mandatory; (ii) the category in which a podcast is listed on 'iTunes' is self-selected by the uploader, and therefore, many science podcasts may have been listed in 'iTunes' categories not examined. The most obvious category that was not analysed was the 'Science and Medicine' category [62]. However, a large number of podcast series that covered dubious/harmful pseudo-medical practices and advice were prevalent within the 'Science and Medicine' category. Therefore, an extremely stringent and in-depth set of inclusion/exclusion criteria would have to be developed and applied, along with deep content analysis (e.g. actually listening to individual episodes of each podcast), to ensure that only legitimate scientific podcast series are included in any such study. Unfortunately, this was beyond the scope of the current study. Moreover, some science podcast series are not listed on 'iTunes' at all; an example of such a science podcast is 'BioLogic Podcast', which is hosted on the video sharing website 'YouTube' [63]. Additionally, it should be noted that some podcast series may voluntarily restrict the number of podcast episodes that are freely available to the public via 'iTunes' or other websites, but only freely available episodes were included for analysis within this study. Therefore, this study provides a lower bound on the number of science podcast series available during the sampling period.

Fourthly, this study exclusively examined the visual and textual online presence of podcast series. Owing to practical constraints, it was not possible to examine the extensive audio data associated with science podcasts. Therefore, it is possible that various aspects of podcast production were not fully categorized. This could affect all studied podcast categories, but most likely affects the capture of any audio-only advertisements or sponsorships that were not acknowledged in textual or visual web content of science podcasts. Therefore, it is possible that a greater proportion of science podcasts contain advertisements or sponsorships than is explicitly reported by this study. With regard to hosts, it is possible that podcast hosts and production teams fit multiple categories, but this is not captured by the relatively shallow nature of this study; as Picardi & Regina [5] note in their detailed comment on podcasting: 'defining who is inside and who is outside [sic: the podcast] control room is not an easy task'.
Fifthly, podcast episode length data and podcast download statistics were not available for analysis. Such data would be desirable for a more complete analysis of the consumption and production of science podcasts.
A notable limitation of this study is that the original podcast upload dates for radio shows broadcast prior to 2004 are not known; instead, the original air dates of episodes (as provided on iTunes or another relevant website) are used as a compromise. This accounts for the 11 podcast series available prior to 2004 (see supplementary database for full details). Of these 11 podcast series, 10 are affiliated to an organization. Considering that 586 'affiliated' podcast series were analysed and that the mean lifespan, T, is calculated from robust curve-fitting models, the influence of these 10 podcast series on the results of lifespan fitting calculations can be considered negligible for the purposes of this study.

Science podcasts versus general podcasts
Large-scale studies of podcast production have not been published in peer-reviewed literature; therefore, it is necessary to look beyond the peer-reviewed literature to glean large-scale podcast production insights. In 2015, Morgan published a semi-formal study of podcasts of many different topics as a blog post on 'medium.com' [24]. While not published in a peer-reviewed journal, all data associated with Morgan's study are publicly available. Morgan's study sampled a subset of podcast series available on 'iTunes' in June 2015. Morgan estimated that there were 206 000 unique podcast series available on 'iTunes' at that time. Morgan then selected a random subset of podcast series for further analysis. This subset consisted of a total of 2500 podcast series, with 100 random podcast series drawn from each of the 25 'most popular' 'iTunes' categories (N.B. this did not include any category themed around science). Morgan's sampling and analysis were fully automated, so manual categorization of podcast production outputs was not conducted. Importantly, Morgan defined 'active podcast series' as podcast series that had released an episode within the six months prior to the sampling date [24]; this is a less stringent definition than that used in the present study, which defines 'active podcast series' as podcast series that had released an episode within three months prior to the sampling date. Morgan found that the number of podcast series available on 'iTunes' had grown from approximately 10 000 in 2007 to approximately 206 000 in 2015. When graphed, the trends in growth of total number of podcast series calculated by Morgan (not shown here) appear broadly similar to the trends shown in figure 1a, i.e. displaying distinct linear growth up to 2010, and exponential growth thereafter. This indicates that trends in the growth of science podcast series probably reflect the overall growth of the podcast medium.
Additionally, Morgan found that roughly 40% of podcast series were 'active' by his less stringent definition [24]. This is lower than the comparable population of 'active' science podcast series (46%) found by the present study (figure 1b). This comparison suggests that science podcast series may be more inclined to continue to release episodes compared to the wider population of podcast series. However, this comparison may not necessarily be valid because Morgan did not exclude podcast series that had not released a single episode. Furthermore, Morgan found that the average lifespan of podcast series was around six months, and that podcasts, on average, released 12 episodes, at a rate of two episodes per month. Additionally, Morgan estimated that around 20% of podcast series listed on 'iTunes' at the time were not English language podcasts.

Insights into the production of science podcasts
The predominance of scientists as hosts for science podcast series (figure 3a), combined with the fact that most science podcast series (57%) are released on an irregular schedule (figure 3c), may indicate that a significant majority of science podcast series are being produced by scientists as an extra commitment beyond their regular duties as a scientific researcher, science educator or science communicator. However, the limitations of the study methodology must be considered in that this study may possibly over-represent scientists as podcast hosts (see 'Methodology and associated limitations'). The result that most science podcasts do not have any overt supplementary income mechanisms (figure 6a) is of note when considering that there can be substantial costs associated with hosting a podcast (e.g. high-quality audio equipment and editing software, as well as branded websites for advertisement and podcast hosting). The lack of overt supplementary income mechanisms suggests that independent science podcast hosts are paying these costs 'out of their own pocket'. These results combine to give a broad impression that many science podcast series are being produced by scientists with no financial recompense. The obvious exception is the science podcast series 'affiliated' to organizations that can provide undisclosed financial support. However, the fundamental validity of this interpretation requires further research and study before firm conclusions can be made.

Figure 2a shows that only 3% of science podcast series cover 'chemistry' as their main topic. When compared with the two other primary science subjects typically taught in schools, i.e. 'biology' (14% of science podcast series) and 'physics and astronomy' (18% of science podcast series), it appears that chemistry is under-represented in science podcasts. There are several potential explanations as to why this may be.
A 2011 editorial in the journal Nature Chemistry suggested that chemistry 'is a central science', meaning that aspects of chemistry are incorporated into other disciplines (e.g. biochemistry and materials research); therefore, chemistry is often not distinctly represented in public-facing science communication [64]. Similarly, Hartings and Fahy [65] noted that popular science involving chemistry may not be labelled as chemistry; that chemistry is complex; and that chemistry lacks unifying themes and public narratives that may be present in biology and physics. Additionally, a review of chemistry communication in 2016 noted that concepts in chemistry are well served by dynamic visual representations [66]; therefore, chemistry may not be well suited to the primarily audio format of podcasts. Indeed, chemistry content is very well received in more visual Internet mediums, e.g. the video series: 'Periodic Videos' on 'YouTube' [67]. Velden & Lagoze [68] note that chemistry has been slow to adopt 'new web-based models of scholarly communication' when compared with physics and biology. While this may be true for scholarly communications, it is not clear if this is true for chemistry and digital science communication practices. All these reasons are likely to play into the apparent lack of chemistry science podcast series. This reinforces a 2016 recommendation from the National Academies of Sciences, Engineering, and Medicine that science funding agencies should support digital media for chemistry communication as a priority [69].
The statistically significant difference in best-estimate mean lifespan between 'affiliated' podcast series (5.5 years) and 'independent' podcast series (4.3 years; figure 8d) could be explained by the hypothesis that 'independent' podcast series may be more likely to be produced by individuals or small groups with limited time and resources, whereas 'affiliated' podcast series are produced by organizations with dedicated staff with defined duties. Such dedicated staff could take over podcasting duties when necessary, therefore extending the overall lifespan of the 'affiliated' podcast series compared to 'independent' podcast series. However, no firm conclusions with regard to the causes of podcast series sustainability can be drawn from this study, and it should be noted that there are exceptionally long-running podcast series within both the 'independent' and 'affiliated' subsets. In a 2011 study titled 'Why podcasters keep going', Markman found that creator-audience community, engagement (e.g. via e-mails, discussion forums and social media), audience appreciation and enjoyment were key drivers of podcast longevity. Markman notes that further study is required into the phenomena of podcast longevity and so-called podfading, where podcasts are no longer produced [70].

Open questions and future directions
This study provides the first large-scale overview of the production of English language science podcast series, yet many open questions remain. For example, does the general content of science podcasts differ across different cultures and languages? [7] What level of prior knowledge is required to understand science podcasts? [71] Are science podcasts helping to change non-representative stereotypes of scientists? [72] Do science podcasts promote and foster trust in science? [13] Are podcasts considered in long-term science communication and impact strategies? [73]
The motivations of podcast hosts and creators have previously been explored in two studies: Markman [70], and Markman & Sawyer [14]. However, the motivations for the creation of science podcast series may be rather different from the motivations of podcast producers for other topics. For example, how do factors such as career recognition (or lack thereof) and time constraints motivate science podcasters [74], and how do podcast creators use social media to engage with their audiences? [75] In recent years, new methods of analysis have been developed for other new online media such as blogs and online news sources [71,76]. While metrics such as listener numbers and attention are not available for large-scale analysis of podcasts, other techniques could be adapted to the study of science podcasts. For example, analysis of hyperlinks included in blogs has been used to provide a measure of 'content diversity' [76]. Similarly, hyperlink analysis could be applied to science podcast show notes to ascertain the diversity of sources and content that audiences are referred to.
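As a minimal sketch of what such a hyperlink analysis might look like, the Python example below extracts the linked domains from the text of a set of show notes and summarizes their spread as a Shannon entropy. This is an illustration only: the choice of Shannon entropy as the diversity measure, the function names and the example show notes (which use placeholder domains) are assumptions made here, and are not the method of the blog study cited above [76].

```python
import math
import re
from collections import Counter
from urllib.parse import urlparse

# Simple pattern for http(s) links in plain text (an assumption: real show
# notes may be HTML and would then need a proper HTML parser instead).
URL_PATTERN = re.compile(r"https?://[^\s<>()\[\]]+")


def linked_domains(show_notes: str) -> list[str]:
    """Return the host name of every http(s) link found in the show notes."""
    domains = []
    for url in URL_PATTERN.findall(show_notes):
        host = urlparse(url).hostname
        if host:
            # Treat 'www.example.org' and 'example.org' as the same source.
            domains.append(host.rstrip(".").removeprefix("www."))
    return domains


def shannon_diversity(items) -> float:
    """Shannon entropy (in bits) of the item frequencies; 0.0 if empty."""
    counts = Counter(items)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum((n / total) * math.log2(total / n) for n in counts.values())


# Hypothetical show notes with placeholder domains, for illustration only.
example_notes = """
Paper discussed: https://www.example.org/paper
Dataset: https://data.example.net/set
Follow-up interview: https://example.org/followup
"""

domains = linked_domains(example_notes)
diversity = shannon_diversity(domains)
print(domains)
print(f"diversity = {diversity:.3f} bits")
```

Under this measure, a series whose show notes always link to the same source scores zero, while a series that links evenly to many different sources scores higher. Any real study would need to decide how to treat link shorteners, social media links and links to the podcast's own website.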
Audiobooks are an increasingly popular medium [77] that could be used as a direct comparison between the written word and audio forms of science communication. Audiobooks, like podcasts, are a portable and convenient audio-only format. Audiobooks are typically narrated by a single voice actor or by the authors themselves. However, because they are typically direct adaptations of the written word, science audiobooks are formal, not conversational [78]. A further distinction of audiobooks from podcasts is that audiobooks are nearly exclusively produced by for-profit media and publishing companies, not by independent, decentralized content creators. As an example of the potential richness of audiobooks as a data source: at the time of writing, Audible (a major for-profit audiobook content provider) has over 2000 science audiobooks available across 'science', 'astronomy', 'physics' and 'biology' categories [79]. Therefore, audiobooks could serve as a 'test-bed' for studies comparing how media formats may alter the effectiveness of science communication.

Conclusion
This study has revealed large-scale trends in science podcasting for the first time. Overall, the total number of science podcast series grew linearly between 2004 and 2010, and then exponentially between 2010 and 2018. A total of 952 science podcast series met the inclusion criteria for this study, giving a lower bound on the number of English language science podcasts available at the start of 2018. Most science podcast series (87%) are audio-only, with the proportion of new science podcast series in video format declining from a peak of approximately 30% in 2007 to only 5% in 2017. This may reflect that podcast audiences are choosing to listen to podcasts while undertaking activities incompatible with consuming video content.
One third of science podcast series were found to cover many aspects of science, but many individual subjects were well represented by dedicated podcast series. Notably, 'chemistry' as a topic appears to be under-represented, with only 3% of podcast series compared to 18% for 'physics and astronomy' and 13% for 'biology'. This apparent under-representation in podcasting may mirror similar long-term trends in science communication where chemistry has been under-represented as a distinct subject. This may also reflect the idea that chemistry is best represented by visual mediums, i.e. not audio podcasts.
Most science podcasts appear to be targeted towards the audience of the general public (77%), with fewer science podcast series serving educational purposes (6%), serving specialist audiences (16%) or dedicated to science communication for children (less than 1%). Fifty-one per cent of science podcast series included extra information to audiences in the form of supplementary show notes, containing text, images or hyperlinks.
Almost two-thirds of science podcast series have at least one host with a background in scientific research, science communication or science education. This indicates that scientists are using podcasts to communicate with the public. The exact reasons as to why podcasting is attractive to science communicators are still to be ascertained, but it is likely to be due to the simplicity of producing podcasts, the low amount of equipment required, the global audience reach, the ability to receive feedback via social media, the intimate nature of the medium and the lack of format constraints.
Thirty-eight per cent of science podcast series appeared to be produced independently; the remaining 62% of science podcast series had an overt affiliation to some sort of organization, e.g. a university, funding agency or media network. Generally, most science podcast series appeared not to have any overt form of supplementary income, i.e. through advertising, selling merchandise or soliciting audience donations. This indicates that a large portion of science podcast series are being financed by independent content creators or by organizations. Of podcasts with overt supplementary income, podcasts 'affiliated' with an organization were more likely to have adverts, and 'independent' science podcast series were more likely to sell merchandise or solicit audience donations. Whether a science podcast series is independent or affiliated to an organization appears to make key differences in several production outputs. Most notably, 'independent' podcast series produce fewer episodes on average (median 16, average 48) than 'affiliated' podcast series (median 24, average 90) (p ≤ 0.01). Furthermore, the long-term mean lifespan of 'independent' podcasts (4.3 years) appears to be significantly less than the long-term mean lifespan of 'affiliated' podcasts (5.5 years) (p < 0.02).
While this study has provided the first insights into the large-scale production of science podcasts, there are still many ongoing questions about how science podcasts are being used to communicate science. Metrics for download and listener attention were not available for the podcasts studied, but content analysis of show-note hyperlinks could be used in future as a proxy for content diversity. Audiobooks could serve as a medium for comparative studies between written and spoken science communication, without the conversational nature of podcasts. In future, a combination of quantitative and qualitative approaches may be required to yield further insights into the motivations of science podcasters, why they choose to produce the podcasts that they do, and how science podcasts are meeting the need for science communication without geographic barriers.