Abstract
Large Language Models (LLMs) have revolutionized the field of natural language processing. These models can analyze vast amounts of data, extract meaningful insights, and provide a basis for informed conservation decisions. This paper identifies the main applications of LLMs for ecology and biodiversity conservation: generation of ecological data, prediction using coding by LLMs, providing insights into public opinion and sentiment, and the potential application of Ecology-specialized LLMs. We discuss the potential challenges and limitations associated with the use of LLMs, such as biases in LLM-generated code and data, and the need for careful evaluation and interpretation of LLM-generated results.
Biodiversity and ecosystem conservation are crucial for the long-term sustainability of our planet (Turner et al. 2007; Rands et al. 2010; Haddad et al. 2015; King et al. 2019; Malhi et al. 2020). The loss of species and habitats has far-reaching consequences, including impacts on food and water security (Díaz et al. 2006; Hanski 2011; Haddad et al. 2015; Bellard et al. 2022), climate change (Hooper et al. 2012; Mantyka-pringle et al. 2012; Weiskopf et al. 2020), and disease transmission (Pongsiri et al. 2009; Keesing et al. 2010; Petrovan et al. 2021). Given the complexity of these issues, there is an urgent need for general ecological knowledge and innovative and effective conservation strategies.
In recent times, the concepts of “big data”, “data-driven”, and “data integration” have gained widespread popularity in various sectors of human society, including those related to general ecology and biodiversity conservation (LaDeau et al. 2017; Runting et al. 2020; Heberling et al. 2021). Runting et al. (2020) have demonstrated how these advancements in data growth can aid in the discovery, analysis, and understanding of environmental changes at varying scales, ranging from micro to macro. Nevertheless, the authors have also highlighted the need to overcome various barriers, such as usage fees, substantial delays, and incomplete data releases, to enable effective analysis of big data and access to its derived outputs (Runting et al. 2020).
Recent advances in machine learning and artificial intelligence have led to the development of increasingly sophisticated language models. Among these, large language models (LLMs), such as the Generative Pre-trained Transformer (GPT) (Brown et al. 2020), are revolutionizing the field of natural language processing. These models can generate human-like text, understand complex language structures, and even translate between languages with remarkable accuracy. The potential applications of these models are vast, ranging from chatbots and virtual assistants to content creation and language learning. One of the most significant benefits of LLMs is their ability to generate text that is often difficult to distinguish from that written by humans, which has enabled the development of chatbots and virtual assistants that can interact with humans more naturally. Their translation ability also has the potential to change the way we communicate globally. In recent years, LLMs have emerged as a promising tool for addressing some of the challenges in ecology and conservation described above. LLMs can analyze vast amounts of data, extract meaningful insights, and provide a basis for informed conservation decisions. In the following sections, we outline the applications of LLMs in ecology and biodiversity conservation.
Ecological data generation by LLMs
LLMs can process large volumes of documents related to ecology, including scientific papers, reports, and online news articles, to extract relevant information. For instance, researchers can use LLMs to identify new or endangered species, track changes in population size, and detect emerging threats to biodiversity. Additionally, LLMs can analyze ecosystem data, such as climate and soil data, to develop statistical models that help understand the complex relationships between species and their environment. LLMs can also be used to develop automatic image recognition systems for identifying species from large amounts of image data. This technology can aid in monitoring and surveys of biodiversity, making data collection and analysis more efficient and accurate. Additionally, LLMs can assist in developing simulation models of ecosystems that predict the effects of environmental changes or species migration. These models can help inform conservation decision-making by providing insights into how ecosystems may respond to different scenarios.
Many LLMs possess the capability to translate between languages, enabling researchers to collect relevant information from non-English resources. A recent study indicated that non-English resources could fill conservation gaps (Amano et al. 2021). Although natural history information is the foundation of ecology, evolution, and conservation (https://www.esa.org/wp-content/uploads/2022/05/Submission-Types-Ecology.pdf, accessed 9 June 2023), many natural history resources in non-English languages are not widely shared. LLMs could facilitate the integration of natural history information across languages, easing barriers for researchers when choosing the language for publication.
However, a potential drawback of using LLMs to collect data is that the data may contain false information. LLMs rely on the quality of the data they are trained on and may not be able to distinguish between real and false data. Therefore, it is important to ensure that the data used to train LLMs is of high quality and not contaminated with false information. To address this potential drawback, researchers can use techniques such as data preprocessing and data augmentation to improve the quality of the training data. In addition, researchers can develop algorithms to detect and remove fake data from the training data set. With these measures in place, LLMs can be a powerful tool for biodiversity data collection, providing valuable insights into the complex relationships between species and their environment.
Additionally, LLMs contribute to image analysis by automatically recognizing and classifying species based on visual data. This capability enhances biodiversity monitoring and species identification efforts, particularly in large-scale surveys. Moreover, LLMs can process satellite imagery, extracting valuable environmental data such as land cover types, vegetation indices, and habitat connectivity. This satellite-based information aids in understanding ecosystem dynamics, monitoring deforestation, and assessing the impacts of land use changes on biodiversity. The combined use of text mining, image analysis, and satellite data processing by LLMs holds great potential for generating ecological data, enabling researchers and conservation practitioners to gain valuable insights for effective biodiversity conservation and management strategies.
Prediction in ecology and biodiversity conservation
The application of LLMs in biodiversity conservation has the potential to revolutionize research practices and enhance conservation efforts. LLMs offer a unique opportunity to standardize and share common workflows across research, for example the process from SQL queries and GBIF data to RStan code. By leveraging LLMs, researchers can shift their focus towards using the rich field data available through platforms like GBIF. This enables a more comprehensive understanding of biodiversity patterns and trends. Additionally, LLMs facilitate simulation research by promoting reproducibility and interpretability, ensuring that findings can be validated and understood by the scientific community. This emphasis on reproducibility aligns with the broader movement towards open science. Moreover, the LLM-centered research approach provides a solid foundation for integrating logic support and employing case studies to address complex conservation challenges. While this approach may be considered radical, it offers immense potential to advance our understanding of biodiversity and guide effective conservation strategies. By embracing LLMs, researchers can harness their capabilities to generate valuable insights, inform evidence-based decision-making, and foster collaborative and interdisciplinary approaches in biodiversity conservation.
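As a rough illustration of the GBIF-to-model workflow mentioned above, the following Python sketch builds a query for the public GBIF occurrence search API and arranges yearly record counts as a data list of the kind a Stan count model might take. The function names, the choice of query parameters, and the layout of the data list are assumptions made for illustration; they are not a workflow prescribed by this paper, and the sketch does not call an LLM or contact GBIF.

```python
from collections import Counter
from urllib.parse import urlencode

# Public GBIF occurrence search endpoint (assumed here; the text names only GBIF).
GBIF_OCCURRENCE_SEARCH = "https://api.gbif.org/v1/occurrence/search"


def gbif_search_url(scientific_name, country=None, limit=300, offset=0):
    """Build a GBIF occurrence search URL for one taxon."""
    params = {"scientificName": scientific_name, "limit": limit, "offset": offset}
    if country is not None:
        params["country"] = country  # ISO 3166-1 alpha-2 code, e.g. "JP"
    return f"{GBIF_OCCURRENCE_SEARCH}?{urlencode(params)}"


def counts_per_year(records):
    """Count occurrence records per year, skipping records without a year."""
    counts = Counter(r["year"] for r in records if r.get("year") is not None)
    return dict(sorted(counts.items()))


def stan_data(counts):
    """Arrange yearly counts as a data list for a hypothetical Stan count model."""
    years = list(counts)
    return {"N": len(years), "year": years, "y": [counts[y] for y in years]}


# Toy records standing in for the "results" list of a GBIF response (not real data).
toy_records = [{"year": 2020}, {"year": 2021}, {"year": 2020}, {"year": None}, {}]
print(gbif_search_url("Parus minor", country="JP", limit=5))
print(stan_data(counts_per_year(toy_records)))
```

In such a workflow, an LLM would be asked to draft the model code that consumes this data list, while the data preparation itself stays in a small, reviewable script.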
However, there are potential limitations and challenges associated with the use of LLMs for code generation in ecological modeling. A major concern is the potential for bias in the generated code, which could lead to inaccurate or misleading predictions (Shah et al. 2020; Weidinger et al. 2021; Albrecht et al. 2022). LLMs are trained on large datasets that may contain biases and limitations that are not always apparent to the user. In addition, LLMs can sometimes generate code that is difficult to understand or modify, which can limit the flexibility and adaptability of ecological models. To address these challenges, it is important to carefully evaluate the performance and accuracy of LLM-generated code in ecological modeling applications.
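One simple way to evaluate LLM-generated code is to run it against cases whose answers are known in advance. The Python sketch below assumes, purely for illustration, that an LLM has produced a function for the Shannon diversity index; the function `shannon_index` is a stand-in written for this example, and the checker compares it with values that can be derived by hand (for S equally abundant species the index equals ln S).

```python
import math


def shannon_index(abundances):
    """Stand-in for an LLM-generated function: Shannon index H' = -sum(p * ln p)."""
    total = sum(abundances)
    proportions = [n / total for n in abundances if n > 0]
    return -sum(p * math.log(p) for p in proportions)


def check_against_known_answers(func, tolerance=1e-9):
    """Return the failed cases; an empty list means every check passed."""
    cases = [
        ([10], 0.0),                  # one species: no diversity
        ([5, 5], math.log(2)),        # two equally common species: ln 2
        ([3, 3, 3, 3], math.log(4)),  # four equally common species: ln 4
        ([7, 0, 7], math.log(2)),     # an absent species must not change the result
    ]
    failures = []
    for abundances, expected in cases:
        result = func(abundances)
        if not math.isclose(result, expected, abs_tol=tolerance):
            failures.append((abundances, expected, result))
    return failures


failures = check_against_known_answers(shannon_index)
print("all checks passed" if not failures else f"failed cases: {failures}")
```

Known-answer checks of this kind do not prove that generated code is correct, but they catch common errors such as a wrong logarithm base or mishandled zero counts before the code is used in a model.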
Providing insights into public opinion and sentiment
LLMs have considerable potential in providing valuable insights into public opinion and sentiment regarding biodiversity conservation. One application involves leveraging AI to generate summaries of intricate environmental reports, enabling easier comprehension by the general public. This approach enhances communication and promotes wider engagement with ecological and conservation issues. LLMs can analyze not only published materials but also social media data, which has become a significant platform for public discourse. Although LLMs can be used to create fake news, spread propaganda, and manipulate public opinion (Civelek et al. 2016), they can also, by processing large volumes of text, help identify instances of fake news, propaganda, and the manipulation of public opinion. Such analysis is crucial as it can shed light on the narratives being propagated and their potential impact on public perception and policy-making. Recognizing the power of false narratives, especially concerning topics like climate change, is essential to avoid a lack of action or misguided policies. By examining reports and social media content, LLMs enable researchers to gain insights into public sentiment and opinion regarding ecological issues, empowering more informed decision-making and the development of targeted communication strategies.
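The aggregation step of such a sentiment analysis can be sketched in a few lines of Python. In this sketch the classifier is passed in as a function so that an LLM call could be substituted; `keyword_classifier` is a toy stand-in based on a tiny hand-made word list, not an LLM, and the labels, word lists, and example posts are all assumptions made for illustration.

```python
from collections import Counter

LABELS = ("positive", "neutral", "negative")


def keyword_classifier(text):
    """Toy stand-in for an LLM call; returns one of LABELS from a tiny word list."""
    words = set(text.lower().split())
    if words & {"protect", "support", "love", "restore"}:
        return "positive"
    if words & {"oppose", "waste", "hoax", "against"}:
        return "negative"
    return "neutral"


def sentiment_shares(posts, classify=keyword_classifier):
    """Classify each post and return the share of each label (0.0 to 1.0)."""
    if not posts:
        return {label: 0.0 for label in LABELS}
    counts = Counter(classify(post) for post in posts)
    return {label: counts[label] / len(posts) for label in LABELS}


# Invented example posts, used only to show the shape of the output.
example_posts = [
    "We must protect the wetland",
    "This reserve is a waste of money",
    "The meeting is on Tuesday",
    "I support the new marine park",
]
print(sentiment_shares(example_posts))
```

Keeping the classifier separate from the aggregation makes it straightforward to compare an LLM against a simpler baseline on the same posts, which is one way to check the LLM's labels before drawing conclusions from them.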
One of the most important advantages of using LLMs for such analyses lies in their immediacy. A systematic review serves as a pivotal method for summarizing and synthesizing the extant knowledge and perspectives on a particular subject. However, this method requires significant effort and largely omits contemporaneous sources, such as news articles and social media content. By contrast, LLM-based analysis of a variety of recent reports and information can mitigate these limitations of the systematic review approach.
However, the use of LLMs for ecology and biodiversity conservation also poses several challenges. For instance, the accuracy and reliability of LLMs depend on the quality of data used to train them. Furthermore, LLMs can perpetuate biases and inequalities present in the data, which can negatively impact management efforts. As such, it is important to use LLMs responsibly and consider the environmental impact of using these models for conservation activities. LLMs should supplement human expertise in conservation, not replace it, as there may be unique insights, intuition, and contextual understanding that models cannot grasp. Implementing large language models in ecology and conservation requires navigating complex legal landscapes, including compliance with various international and national regulations.
Development of Ecology-specialized LLMs
Recently, LLMs specialized for specific purposes have been developed. Examples include BioMedLM (https://www.mosaicml.com/blog/introducing-pubmed-gpt) for biomedicine and agriGPT (https://agri-gpt.com) for agriculture. There is no doubt that the development of specialized LLMs in ecology and biodiversity science would accelerate the advancement of the field.
Ecology-specialized LLMs have the potential to revolutionize our understanding of complex ecological systems and play a crucial role in decision-making for biodiversity and ecosystem conservation. One effective approach to harness the full capabilities of LLMs is to develop topic-specific models by fine-tuning them with relevant academic papers that offer peer-reviewed information on various aspects of ecological systems. These papers provide comprehensive insights into the components, interactions, functions, threats, and solutions related to ecosystems. By fine-tuning LLMs with this rich academic literature, their accuracy and reliability in generating and analyzing ecological knowledge can be significantly enhanced. This specialized training enables LLMs to better comprehend ecological concepts, identify relevant patterns and trends, and provide valuable information for scientific research, conservation planning, and environmental policy development. Ecology-specialized LLMs have the potential to become indispensable tools for ecologists, helping them uncover hidden relationships, predict ecosystem responses, and guide effective conservation strategies to safeguard our planet’s biodiversity and the services it provides.
Negative impacts of LLMs on the environment
LLMs also have negative impacts on the environment (Rillig et al. 2023). The direct impact of LLMs on the environment is related to the energy consumption required to train and run them. LLMs require massive amounts of computational power, which in turn requires large amounts of electricity. The training process for GPT-3, for example, is estimated to have consumed about 1287 MWh of energy (Rillig et al. 2023). The carbon footprint of training GPT-3 was estimated to be around 552.1 t of CO2 (Strubell et al. 2019; Patterson et al. 2021). The carbon footprint of running these models is also significant, as they require large amounts of energy to operate. Furthermore, as LLMs become more powerful and require even larger data sets, the energy requirements will only increase. Therefore, it is important to consider the environmental impact of LLMs and work towards developing more sustainable approaches to their development and use. This could include developing more energy-efficient hardware or exploring alternative sources of energy to power LLMs, as well as investigating ways to reduce the computational requirements of training and operating these models.
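The two GPT-3 figures quoted above can be combined into an implied carbon intensity of the electricity used, which makes the effect of the energy source explicit. The short Python sketch below does this arithmetic; the alternative grid intensity of 0.1 kg CO2 per kWh is a hypothetical value chosen only for illustration and is not taken from the cited studies.

```python
TRAINING_ENERGY_MWH = 1287.0   # training energy quoted in the text
TRAINING_EMISSIONS_T = 552.1   # training emissions quoted in the text, in tonnes of CO2


def implied_carbon_intensity(emissions_t, energy_mwh):
    """Tonnes of CO2 per MWh, numerically equal to kg of CO2 per kWh."""
    return emissions_t / energy_mwh


def emissions_for(energy_mwh, intensity_t_per_mwh):
    """Emissions in tonnes of CO2 for a given energy use and grid intensity."""
    return energy_mwh * intensity_t_per_mwh


intensity = implied_carbon_intensity(TRAINING_EMISSIONS_T, TRAINING_ENERGY_MWH)
print(f"Implied intensity: {intensity:.3f} kg CO2 per kWh")

# Hypothetical lower-carbon grid at 0.1 kg CO2 per kWh (assumed value, for illustration).
lower_carbon = emissions_for(TRAINING_ENERGY_MWH, 0.1)
print(f"Same training run on that grid: {lower_carbon:.1f} t CO2")
```

Dividing 552.1 t by 1287 MWh gives roughly 0.43 kg of CO2 per kWh, so the same training run on the assumed lower-carbon grid would emit roughly 129 t of CO2, which illustrates why the choice of energy source matters alongside hardware efficiency.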
Conclusion
In this paper, we have explored the potential of LLMs in ecology and biodiversity conservation. LLMs are powerful tools that can help to consolidate, evaluate, and synthesize the existing knowledge on ecology and biodiversity conservation, potentially through Ecology-specialized LLMs. This can facilitate evidence-based policy-making in biodiversity conservation. Researchers can focus more on discovering new insights based on the current ecological facts, rather than exploring and processing existing information. However, there are also potential challenges and limitations associated with the use of LLMs, such as biases in LLM-generated code and data and the need for careful evaluation and interpretation of LLM-generated results, which may include false information. We conclude that LLMs have the potential to transform the methods for ecology and biodiversity conservation, but their successful integration into management practice will require collaboration between researchers, practitioners and stakeholders, as well as ongoing evaluation and refinement of LLM applications.
Acknowledgments
We would like to thank Akira S. Mori for his comments on our manuscript. We would like to express our sincere gratitude to the teams behind GPT-3.5, Notion AI, DeepL, DeepL Write, Perplexity AI, Elicit, Bing AI, and Grammarly for providing us with exceptional language tools that have significantly improved the quality and efficiency of our work. We generated the first-draft paragraphs with GPT-3.5 and Notion AI, which allowed us to generate high-quality text from our ideas. Grammarly, DeepL, and DeepL Write were valuable in helping us improve the English writing of this paper.
References:
Albrecht, J., Kitanidis, E. & Fetterman, A.J. (2022). Despite “super-human” performance, current LLMs are unsuited for decisions about ethics and safety. arXiv, 2212.06295.
Amano, T., Berdejo-Espinola, V., Christie, A. P., Willott, K., Akasaka, M., Baldi, A., … & Sutherland, W. J. (2021). Tapping into non-English-language science for the conservation of global biodiversity. PLoS Biology, 19(10), e3001296.
Bellard, C., Marino, C. & Courchamp, F. (2022). Ranking threats to biodiversity and why it doesn’t matter. Nat. Commun., 13, 2616.
Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J.D., Dhariwal, P., et al. (2020). Language models are few-shot learners. Adv. Neural Inf. Process. Syst., 33, 1877–1901.
Civelek, M.E., Çemberci, M. & Eralp, N.E. (2016). The Role of Social Media in Crisis Communication and Crisis Management. International Journal of Research in Business and Social Science (2147-4478), 5, 111–120.
Costanza, R., Wainger, L., Folke, C. & Mäler, K.-G. (1993). Modeling Complex Ecological Economic Systems: Toward an evolutionary, dynamic understanding of people and nature. Bioscience, 43, 545–555.
Díaz, S., Fargione, J., Chapin, F.S., 3rd & Tilman, D. (2006). Biodiversity loss threatens human well-being. PLoS Biol., 4, e277.
Haddad, N.M., Brudvig, L.A., Clobert, J., Davies, K.F., Gonzalez, A., Holt, R.D., et al. (2015). Habitat fragmentation and its lasting impact on Earth’s ecosystems. Sci Adv, 1, e1500052.
Hanski, I. (2011). Habitat loss, the dynamics of biodiversity, and a perspective on conservation. Ambio, 40, 248–255.
Heberling, J.M., Miller, J.T., Noesgaard, D., Weingart, S.B. & Schigel, D. (2021). Data integration enables global biodiversity synthesis. Proc. Natl. Acad. Sci. U. S. A., 118.
Hooper, D.U., Adair, E.C., Cardinale, B.J., Byrnes, J.E.K., Hungate, B.A., Matulich, K.L., et al. (2012). A global synthesis reveals biodiversity loss as a major driver of ecosystem change. Nature, 486, 105–108.
Keane, R.E., Loehman, R.A., Holsinger, L.M., Falk, D.A., Higuera, P., Hood, S.M., et al. (2018). Use of landscape simulation modeling to quantify resilience for ecological applications. Ecosphere, 9, e02414.
Keesing, F., Belden, L.K., Daszak, P., Dobson, A., Harvell, C.D., Holt, R.D., et al. (2010). Impacts of biodiversity on the emergence and transmission of infectious diseases. Nature, 468, 647–652.
King, S., Ferrier, S., Turner, K. & Badura, T. (2019). Discussion paper 11: Research paper on habitat and biodiversity related ecosystem services. Paper submitted to the Expert Meeting on Advancing the Measurement of Ecosystem Services for Ecosystem Accounting.
LaDeau, S.L., Han, B.A., Rosi-Marshall, E.J. & Weathers, K.C. (2017). The Next Decade of Big Data in Ecosystem Science. Ecosystems, 20, 274–283.
Lauenroth, W.K., Canham, C.D., Kinzig, A.P., Poiani, K.A., Kemp, W.M. & Running, S.W. (1998). Simulation Modeling in Ecosystem Science. In: Successes, Limitations, and Frontiers in Ecosystem Science (eds. Pace, M.L. & Groffman, P.M.). Springer New York, New York, NY, pp. 404–415.
Malhi, Y., Franklin, J., Seddon, N., Solan, M., Turner, M.G., Field, C.B., et al. (2020). Climate change and ecosystems: threats, opportunities and solutions. Philos. Trans. R. Soc. Lond. B Biol. Sci., 375, 20190104.
Mantyka-pringle, C.S., Martin, T.G. & Rhodes, J.R. (2012). Interactions between climate and habitat loss effects on biodiversity: a systematic review and meta-analysis. Glob. Chang. Biol., 18, 1239–1252.
Patterson, D., Gonzalez, J., Le, Q., Liang, C., Munguia, L.-M., Rothchild, D., et al. (2021). Carbon Emissions and Large Neural Network Training. arXiv [cs.LG].
Petrovan, S.O., Aldridge, D.C., Bartlett, H., Bladon, A.J., Booth, H., Broad, S., et al. (2021). Post COVID-19: a solution scan of options for preventing future zoonotic epidemics. Biol. Rev. Camb. Philos. Soc., 96, 2694–2715.
Pongsiri, M.J., Roman, J., Ezenwa, V.O., Goldberg, T.L., Koren, H.S., Newbold, S.C., et al. (2009). Biodiversity Loss Affects Global Disease Ecology. Bioscience, 59, 945–954.
Rands, M.R.W., Adams, W.M., Bennun, L., Butchart, S.H.M., Clements, A., Coomes, D., et al. (2010). Biodiversity conservation: challenges beyond 2010. Science, 329, 1298–1303.
Rillig, M.C., Ågerstrand, M., Bi, M., Gould, K.A. & Sauerland, U. (2023). Risks and Benefits of Large Language Models for the Environment. Environ. Sci. Technol., 57, 3464–3466.
Runting, R.K., Phinn, S., Xie, Z., Venter, O. & Watson, J.E.M. (2020). Opportunities for big data in conservation and sustainability. Nat. Commun., 11, 2003.
Shah, D.S., Schwartz, H.A. & Hovy, D. (2020). Predictive Biases in Natural Language Processing Models: A Conceptual Framework and Overview. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, Online, pp. 5248–5264.
Strubell, E., Ganesh, A. & McCallum, A. (2019). Energy and Policy Considerations for Deep Learning in NLP. arXiv [cs.CL].
Turner, W.R., Brandon, K., Brooks, T.M., Costanza, R., da Fonseca, G.A.B. & Portela, R. (2007). Global Conservation of Biodiversity and Ecosystem Services. Bioscience, 57, 868–873.
Van Nes, E.H. & Scheffer, M. (2005). A strategy to improve the contribution of complex simulation models to ecological theory. Ecol. Modell., 185, 153–164.
Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P.-S., et al. (2021). Ethical and social risks of harm from Language Models. arXiv, 2112.04359.
Weiskopf, S.R., Rubenstein, M.A., Crozier, L.G., Gaichas, S., Griffis, R., Halofsky, J.E., et al. (2020). Climate change effects on biodiversity, ecosystems, ecosystem services, and natural resource management in the United States. Sci. Total Environ., 733, 137782.