Abstract
Large Language Models (LLMs) have revolutionized the field of natural language
processing. These models can analyze vast amounts of data, extract
meaningful insights, and provide a basis for informed conservation
decisions. This paper identifies the main applications of LLMs for
ecology and biodiversity conservation: generation of ecological data,
prediction using LLM-generated code, insights into public opinion and
sentiment, and the potential of Ecology-specialized LLMs. We discuss
the potential challenges and limitations associated with
the use of LLMs, such as biases in LLM-generated code and data, and the
need for careful evaluation and interpretation of LLM-generated results.
Biodiversity and ecosystem conservation are crucial for the long-term
sustainability of our planet (Turner et al. 2007; Rands et al. 2010;
Haddad et al. 2015; King et al. 2019; Malhi et al. 2020). The loss of
species and
habitats has far-reaching consequences, including impacts on food and
water security (Díaz et al. 2006; Hanski 2011; Haddad et
al. 2015; Bellard et al. 2022), climate change (Hooper et al. 2012;
Mantyka-pringle et al. 2012; Weiskopf et al. 2020), and disease
transmission (Pongsiri et al. 2009; Keesing et al. 2010; Petrovan et
al. 2021). Given the complexity
of these issues, there is an urgent need for general ecological
knowledge and innovative and effective conservation strategies.
In recent times, the concepts of “big data”, “data-driven”, and “data
integration” have gained widespread popularity in various sectors of
human society, including those related to general ecology and
biodiversity conservation (LaDeau et al. 2017; Runting et
al. 2020; Heberling et al. 2021). Runting et al. (2020) have
demonstrated how these advancements in data growth can aid in the
discovery, analysis, and understanding of environmental changes at
varying scales, ranging from micro to macro. Nevertheless, the authors
have also highlighted the need to overcome various barriers, such as
usage fees, substantial delays, and incomplete data releases, to enable
effective analysis of big data and access to its derived outputs
(Runting et al. 2020).
Recent advances in machine learning and artificial intelligence have led
to the development of increasingly sophisticated language models. Among
these models, large language models (LLMs), such as the Generative
Pre-trained Transformer (GPT) (Brown et al. 2020), are
revolutionizing the field of natural language processing. These models
can generate human-like text, understand complex language structures,
and even translate between languages with remarkable accuracy. The
potential applications of these models are vast, ranging from chatbots
and virtual assistants to content creation and language learning. One of
the most significant benefits of LLMs is their ability to generate text
that is often difficult to distinguish from that written by humans. This
has enabled
the development of chatbots and virtual assistants that can interact
with humans more naturally. In addition, these models have shown a
remarkable ability to translate between languages, which has the
potential to revolutionize the way we communicate globally. In recent
years, LLMs have also emerged as a promising tool for addressing some of
the challenges in ecology and biodiversity conservation described above.
LLMs can analyze vast amounts of data, extract meaningful insights, and
provide a basis for informed conservation decisions. In the following
sections, we outline the applications of LLMs to ecology and
biodiversity conservation.
Ecological data generation by LLMs
LLMs can process large volumes of documents related to ecology,
including scientific papers, reports, and online news articles, to
extract relevant information. For instance, researchers can use LLMs to
identify new or endangered species, track changes in population size,
and detect emerging threats to biodiversity. Additionally, LLMs can
analyze ecosystem data, such as climate and soil data, to develop
statistical models that help understand the complex relationships
between species and their environment. LLMs can also be used to develop
automatic image recognition systems for identifying species from large
amounts of image data. This technology can aid in monitoring and surveys
of biodiversity, making data collection and analysis more efficient and
accurate. Additionally, LLMs can assist in developing simulation models
of ecosystems that predict the effects of environmental changes or
species migration. These models can help inform conservation
decision-making by providing insights into how ecosystems may respond to
different scenarios.
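As an illustration of the text-extraction step described above, the following minimal Python sketch shows one way to request structured records from an LLM and to check the reply before it enters a dataset. It makes no call to any particular model. The field names, the prompt wording, and the example reply are assumptions introduced here for illustration only.

```python
import json

# Hypothetical record schema; a real project would define its own fields.
FIELDS = ("species", "location", "year", "threat")

def build_extraction_prompt(passage: str) -> str:
    """Ask for a JSON list of records so the reply can be checked by code."""
    return (
        "Extract every species observation from the passage below.\n"
        f"Return a JSON list of objects with the keys {list(FIELDS)}; "
        "use null for anything the passage does not state.\n\n"
        f"Passage:\n{passage}"
    )

def parse_records(reply: str) -> list[dict]:
    """Keep only well-formed records; malformed replies yield an empty list."""
    try:
        data = json.loads(reply)
    except json.JSONDecodeError:
        return []
    if not isinstance(data, list):
        return []
    return [
        {key: item.get(key) for key in FIELDS}
        for item in data
        if isinstance(item, dict) and item.get("species")
    ]

# Made-up reply standing in for model output (placeholder values only).
example_reply = (
    '[{"species": "Species A", "location": "Site 1", "year": 2019, "threat": null},'
    ' {"location": "Site 2"}]'
)
records = parse_records(example_reply)  # the second item lacks a species and is dropped
```

Requesting a fixed schema and discarding records that do not conform is one simple guard against malformed output. It does not verify that the extracted facts are true, so human checking against the source documents is still needed.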
Many LLMs possess the capability to translate between languages,
enabling researchers to collect relevant information from non-English
resources. A recent study indicated that non-English resources could
fill conservation gaps (Amano et al. 2021). Although natural history
information is the foundation of ecology, evolution, and conservation
(https://www.esa.org/wp-content/uploads/2022/05/Submission-Types-Ecology.pdf,
accessed 9 June 2023), there are several natural history resources in
non-English languages that are not shared. LLMs could facilitate the
integration of natural history information across languages, easing
barriers for researchers when choosing the language for publication.
However, a potential drawback of using LLMs to collect data is that the
data may contain false information. LLMs rely on the quality of the data
they are trained on and may not be able to distinguish between real and
false data. Therefore, it is important to ensure that the data used to
train LLMs is of high quality and not contaminated with false
information. To address this potential drawback, researchers can use
techniques such as data preprocessing and data augmentation to improve
the quality of the training data. In addition, researchers can develop
algorithms to detect and remove fake data from the training data set.
With these measures in place, LLMs can be a powerful tool for
biodiversity data collection, providing valuable insights into the
complex relationships between species and their environment.
Additionally, LLMs contribute to image analysis by automatically
recognizing and classifying species based on visual data. This
capability enhances biodiversity monitoring and species identification
efforts, particularly in large-scale surveys. Moreover, LLMs can process
satellite imagery, extracting valuable environmental data such as land
cover types, vegetation indices, and habitat connectivity. This
satellite-based information aids in understanding ecosystem dynamics,
monitoring deforestation, and assessing the impacts of land use changes
on biodiversity. The combined use of text mining, image analysis, and
satellite data processing by LLMs holds great potential for generating
ecological data, enabling researchers and conservation practitioners to
gain valuable insights for effective biodiversity conservation and
management strategies.
Prediction in ecology and biodiversity conservation
The application of LLMs in biodiversity conservation has the potential
to revolutionize research practices and enhance conservation efforts.
LLMs offer a unique opportunity to standardize and share common
workflows across various aspects of research, for example the workflow
from SQL and GBIF data to RStan code. By leveraging LLMs, researchers
can shift their focus towards utilizing the rich field data available
through platforms like GBIF. This enables a more comprehensive
understanding of
biodiversity patterns and trends. Additionally, LLMs facilitate
simulation research by promoting reproducibility and interpretability,
ensuring that findings can be validated and understood by the scientific
community. This emphasis on reproducibility aligns with the broader
movement towards open science. Moreover, the LLM-centered research
approach provides a solid foundation for integrating logic support and
employing case studies to address complex conservation challenges. While
this approach may be considered radical, it offers immense potential to
advance our understanding of biodiversity and guide effective
conservation strategies. By embracing LLMs, researchers can harness
their capabilities to generate valuable insights, inform evidence-based
decision-making, and foster collaborative and interdisciplinary
approaches in biodiversity conservation.
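To make the SQL step of such a workflow concrete, the sketch below shows the kind of short, checkable query code a researcher might ask an LLM to draft. It is written in Python with the standard sqlite3 module rather than R, uses an in-memory table with made-up rows, and assumes column names (species, year, countryCode) modelled on the Darwin Core terms used in GBIF occurrence downloads. It is an illustration under those assumptions, not a prescribed pipeline.

```python
import sqlite3

# In-memory table with a few made-up rows (placeholder values only).
# Column names are assumed to mirror Darwin Core terms in GBIF downloads.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE occurrence (species TEXT, year INTEGER, countryCode TEXT)"
)
conn.executemany(
    "INSERT INTO occurrence VALUES (?, ?, ?)",
    [
        ("Species A", 2019, "JP"),
        ("Species A", 2019, "JP"),
        ("Species A", 2020, "JP"),
        ("Species B", 2020, "JP"),
    ],
)

def yearly_counts(connection, species):
    """Return (year, n_records) pairs for one species, ordered by year."""
    cursor = connection.execute(
        "SELECT year, COUNT(*) FROM occurrence "
        "WHERE species = ? GROUP BY year ORDER BY year",
        (species,),
    )
    return cursor.fetchall()

counts = yearly_counts(conn, "Species A")  # [(2019, 2), (2020, 1)]
```

A table of yearly record counts like this could then be passed to a statistical model. Note that raw occurrence counts reflect sampling effort as well as abundance, which is one reason the downstream model and its assumptions still need expert review.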
However, there are potential limitations and challenges associated with
the use of LLMs for code generation in ecological modeling. A major
concern is the potential for bias in the generated code, which could
lead to inaccurate or misleading predictions (Shah et al. 2020;
Weidinger et al. 2021; Albrecht et al. 2022). LLMs are
trained on large datasets that may contain biases and limitations that
are not always apparent to the user. In addition, LLMs can sometimes
generate code that is difficult to understand or modify, which can limit
the flexibility and adaptability of ecological models. To address these
challenges, it is important to carefully evaluate the performance and
accuracy of LLM-generated code in ecological modeling applications.
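One simple way to carry out such an evaluation is to run generated code against cases whose answers are known analytically. The Python sketch below does this for the Shannon diversity index, which equals ln(S) when S species are equally abundant. The function `shannon_index` stands in for LLM-generated code, and the test cases and tolerance are choices made here for illustration.

```python
import math

def shannon_index(counts):
    """Stand-in for an LLM-generated function: Shannon diversity H'."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def check_against_known_cases(func, tol=1e-9):
    """Return the cases where func disagrees with an analytically known value."""
    cases = [
        ([10], 0.0),                  # one species: no diversity
        ([5, 5], math.log(2)),        # two equally abundant species
        ([3, 3, 3, 3], math.log(4)),  # four equally abundant species
        ([7, 0, 7], math.log(2)),     # an absent species must not change H'
    ]
    failures = []
    for counts, expected in cases:
        observed = func(counts)
        if abs(observed - expected) > tol:
            failures.append((counts, expected, observed))
    return failures

failures = check_against_known_cases(shannon_index)  # empty list if all cases pass
```

Passing such checks does not prove the code is correct, but a failure is a clear signal that the generated code should not be used as is.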
Providing insights into public opinion and sentiment
LLMs have considerable potential in providing valuable insights into
public opinion and sentiment regarding biodiversity conservation. One
application involves leveraging AI to generate summaries of intricate
environmental reports, enabling easier comprehension by the general
public. This approach enhances communication and promotes wider
engagement with ecological and conservation issues. LLMs can analyze not
only published materials but also data from social media, which has
become a significant platform for public discourse. Although LLMs can be
used to create fake news, spread propaganda, and manipulate public
opinion (Civelek et al. 2016), they can also process large volumes of
text to help identify instances of fake news, propaganda, and the
manipulation of public opinion. Such analysis is crucial as it can shed
light on the narratives being propagated and their potential impact on
public perception and policy-making. Recognizing the power of false
narratives, especially concerning topics like climate change, is
essential to avoid a lack of action or misguided policies. By examining
reports and social media content, LLMs enable researchers to gain
insights into public sentiment and opinion regarding ecological issues,
empowering more informed decision-making and the development of targeted
communication strategies.
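Once an LLM has assigned a sentiment label to each post or report, the labels still have to be summarized before they can inform communication strategies. The Python sketch below shows one minimal way to do so, computing the share of each label per month. The (month, label) pairs are made-up placeholders standing in for model output, and the three label names are an assumption made for illustration.

```python
from collections import Counter, defaultdict

# Made-up (month, label) pairs standing in for labels an LLM assigned to posts.
labelled_posts = [
    ("2023-01", "positive"),
    ("2023-01", "negative"),
    ("2023-01", "negative"),
    ("2023-02", "positive"),
    ("2023-02", "neutral"),
]

def sentiment_share(posts):
    """Return {month: {label: share of that month's posts}}."""
    by_month = defaultdict(Counter)
    for month, label in posts:
        by_month[month][label] += 1
    return {
        month: {label: n / sum(tally.values()) for label, n in tally.items()}
        for month, tally in by_month.items()
    }

shares = sentiment_share(labelled_posts)
```

Reporting shares rather than raw counts keeps months with different posting volumes comparable, although it does not correct for who is posting or for labelling errors by the model.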
One of the most important advantages of utilizing LLMs for such
analyses lies in their immediacy. A systematic review serves as a pivotal
method for summarizing and synthesizing the extant knowledge and
perspectives on a particular subject. However, this method necessitates
significant effort and largely omits contemporaneous sources, such as
news articles and social media content. Conversely, analyzing a variety
of recent reports and information can ameliorate the limitations
inherent in the systematic review approach.
However, the use of LLMs for ecology and biodiversity conservation also
poses several challenges. For instance, the accuracy and reliability of
LLMs depend on the quality of data used to train them. Furthermore, LLMs
can perpetuate
biases and inequalities present in the data, which can negatively impact
management efforts. As such, it is important to use LLMs responsibly and
consider the environmental impact of using these models for conservation
activities. LLMs should supplement human expertise in conservation, not
replace it, as there may be unique insights, intuition, and contextual
understanding that models cannot grasp. Implementing large language
models in ecology and conservation requires navigating complex legal
landscapes, including compliance with various international and national
regulations.
Development of Ecology-specialized LLMs
Recently, LLMs specialized for specific purposes have been developed.
Examples include BioMedLM
(https://www.mosaicml.com/blog/introducing-pubmed-gpt) for biomedicine
and agriGPT (https://agri-gpt.com) for agriculture. There is no doubt
that the development of specialized LLMs
in ecology and biodiversity science would accelerate the advancement of
the field.
Ecology-specialized LLMs have the potential to revolutionize our
understanding of complex
ecological systems and play a crucial role in decision-making for
biodiversity and ecosystem conservation. One effective approach to
harness the full capabilities of LLMs is to develop topic-specific
models by fine-tuning them with relevant academic papers that offer
peer-reviewed information on various aspects of ecological systems.
These papers provide comprehensive insights into the components,
interactions, functions, threats, and solutions related to ecosystems.
By fine-tuning LLMs with this rich academic literature, their accuracy
and reliability in generating and analyzing ecological knowledge can be
significantly enhanced. This specialized training enables LLMs to better
comprehend ecological concepts, identify relevant patterns and trends,
and provide valuable information for scientific research, conservation
planning, and environmental policy development. Ecology-specialized LLMs
have the potential to become indispensable tools for ecologists, helping
them uncover hidden relationships, predict ecosystem responses, and
guide effective conservation strategies to safeguard our planet’s
biodiversity and the services it provides.
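Fine-tuning of this kind starts with converting the literature into training records. The Python sketch below shows one possible preparation step that writes one JSON object per line (JSONL). The "prompt" and "completion" keys, the prompt wording, and the placeholder papers are all assumptions made for illustration. The record format actually required depends on the fine-tuning framework used.

```python
import json

# Placeholder entries; a real corpus would come from peer-reviewed papers.
papers = [
    {"title": "Title of paper 1", "abstract": "Abstract text of paper 1."},
    {"title": "Title of paper 2", "abstract": ""},  # missing abstract
]

def to_jsonl(records):
    """Serialize papers with non-empty abstracts, one JSON object per line."""
    lines = []
    for rec in records:
        abstract = rec.get("abstract", "").strip()
        if not abstract:
            continue  # skip entries that carry no text to learn from
        example = {
            "prompt": f"Summarize the findings of: {rec['title']}",
            "completion": abstract,
        }
        lines.append(json.dumps(example, ensure_ascii=False))
    return "\n".join(lines)

jsonl_text = to_jsonl(papers)  # one line, because paper 2 is skipped
```

Filtering out empty or unusable entries at this stage is a small example of the data-quality checks on which the reliability of a specialized model depends.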
Negative impacts of LLM use on the environment
LLMs also have negative impacts on the environment (Rillig et al. 2023).
The direct impact of LLMs on the environment is related to the energy
consumption required to train and run these LLMs. LLMs require massive
amounts of computational power, which in turn requires large amounts of
electricity. The training process for GPT-3, for example, is estimated
to have consumed 1,287 MWh of energy (Rillig et al. 2023).
The carbon footprint of training GPT-3 was estimated to be around 552.1
t of CO2 (Strubell et al. 2019; Patterson et al. 2021). The carbon
footprint of running these models is also significant, as operating them
requires large amounts of energy. Furthermore, as LLMs become more
powerful and require even larger
data sets, the energy requirements will only increase. Therefore, it is
important to consider the environmental impact of LLMs and work towards
developing more sustainable approaches to their development and use.
This could include developing more energy-efficient hardware or
exploring alternative sources of energy to power LLMs, as well as
investigating ways to reduce the computational requirements of training
and operating these models.
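The two GPT-3 estimates quoted above can be combined to show why the energy source matters. The Python sketch below divides the emissions by the energy to obtain the implied carbon intensity, and then recomputes the emissions for a grid with a lower intensity. The value of 0.1 kg CO2 per kWh is a hypothetical figure chosen only for illustration.

```python
TRAINING_ENERGY_MWH = 1287.0   # GPT-3 training energy quoted in the text
TRAINING_EMISSIONS_T = 552.1   # GPT-3 training emissions quoted in the text (t CO2)

def implied_carbon_intensity(emissions_t, energy_mwh):
    """Emissions per unit energy in kg CO2 per kWh (numerically equal to t/MWh)."""
    return emissions_t / energy_mwh

def emissions_at_intensity(energy_mwh, kg_per_kwh):
    """Tonnes of CO2 for the same energy use at a different grid intensity."""
    return energy_mwh * kg_per_kwh

# About 0.429 kg CO2 per kWh is implied by the two quoted figures.
intensity = implied_carbon_intensity(TRAINING_EMISSIONS_T, TRAINING_ENERGY_MWH)

# Hypothetical cleaner grid at 0.1 kg CO2 per kWh (illustrative value only).
cleaner = emissions_at_intensity(TRAINING_ENERGY_MWH, 0.1)  # about 128.7 t CO2
```

Under this hypothetical intensity, the same training run would emit roughly a quarter of the quoted amount, which illustrates how much the choice of energy source can matter alongside reductions in computational requirements.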
Conclusion
In this paper, we have explored the potential of LLMs to address
challenges in ecology and biodiversity conservation.
LLMs are powerful tools that can help to consolidate, evaluate, and
synthesize the existing knowledge on ecology and biodiversity
conservation, potentially through the use of Ecology-specialized LLMs.
This
can facilitate evidence-based policy-making in biodiversity
conservation. Researchers can focus more on discovering new insights
based on the current ecological facts, rather than exploring and
processing existing information. However, there are also potential
challenges and limitations associated with the use of LLMs, such as
biases in LLM-generated code and data and the need for careful
evaluation and interpretation of LLM-generated results, which may
include false information. We conclude that LLMs have the potential to
transform the
methods for ecology and biodiversity conservation, but their successful
integration into management practice will require collaboration between
researchers, practitioners and stakeholders, as well as ongoing
evaluation and refinement of LLM applications.
Acknowledgments
We would like to thank Akira S. Mori for his comments on our manuscript.
We would like to express our sincere gratitude to the teams behind
GPT-3.5, Notion AI, DeepL, DeepL Write, Perplexity AI, Elicit, Bing AI,
and Grammarly for providing us with exceptional language tools that have
significantly improved the quality and efficiency of our work. We
generated the first-draft paragraphs with GPT-3.5 and Notion AI, which
allowed us to generate high-quality text from our ideas. Grammarly,
DeepL and DeepL Write have been valuable in helping us to improve the
English writing of this paper.
References:
Albrecht, J., Kitanidis, E. & Fetterman, A.J. (2022). Despite
“super-human” performance, current LLMs are unsuited for decisions
about ethics and safety. arXiv, 2212.06295.
Amano, T., Berdejo-Espinola, V., Christie, A.P., Willott, K., Akasaka,
M., Baldi, A., et al. (2021). Tapping into
non-English-language science for the conservation of global
biodiversity. PLoS Biol., 19, e3001296.
Bellard, C., Marino, C. & Courchamp, F. (2022). Ranking threats to
biodiversity and why it doesn’t matter. Nat. Commun., 13, 2616.
Brown, T., Mann, B., Ryder, N., Subbiah, M., Kaplan, J.D., Dhariwal,
P., et al. (2020). Language models are few-shot learners. Adv.
Neural Inf. Process. Syst., 33, 1877–1901.
Civelek, M.E., Çemberci, M. & Eralp, N.E. (2016). The Role of Social
Media in Crisis Communication and Crisis Management. International
Journal of Research in Business and Social Science (2147-4478), 5,
111–120.
Costanza, R., Wainger, L., Folke, C. & Mäler, K.-G. (1993). Modeling
Complex Ecological Economic Systems: Toward an evolutionary, dynamic
understanding of people and nature. Bioscience, 43, 545–555.
Díaz, S., Fargione, J., Chapin, F.S., 3rd & Tilman, D. (2006).
Biodiversity loss threatens human well-being. PLoS Biol., 4,
e277.
Haddad, N.M., Brudvig, L.A., Clobert, J., Davies, K.F., Gonzalez, A.,
Holt, R.D., et al. (2015). Habitat fragmentation and its lasting
impact on Earth’s ecosystems. Sci. Adv., 1, e1500052.
Hanski, I. (2011). Habitat loss, the dynamics of biodiversity, and a
perspective on conservation. Ambio, 40, 248–255.
Heberling, J.M., Miller, J.T., Noesgaard, D., Weingart, S.B. & Schigel,
D. (2021). Data integration enables global biodiversity synthesis.
Proc. Natl. Acad. Sci. U. S. A., 118.
Hooper, D.U., Adair, E.C., Cardinale, B.J., Byrnes, J.E.K., Hungate,
B.A., Matulich, K.L., et al. (2012). A global synthesis reveals
biodiversity loss as a major driver of ecosystem change. Nature,
486, 105–108.
Keane, R.E., Loehman, R.A., Holsinger, L.M., Falk, D.A., Higuera, P.,
Hood, S.M., et al. (2018). Use of landscape simulation modeling
to quantify resilience for ecological applications. Ecosphere, 9,
e02414.
Keesing, F., Belden, L.K., Daszak, P., Dobson, A., Harvell, C.D., Holt,
R.D., et al. (2010). Impacts of biodiversity on the emergence and
transmission of infectious diseases. Nature, 468, 647–652.
King, S., Ferrier, S., Turner, K. & Badura, T. (2019). Discussion paper
11: Research paper
on habitat and biodiversity related ecosystem services. Paper submitted
to the Expert Meeting on Advancing the Measurement of Ecosystem Services
for Ecosystem Accounting.
LaDeau, S.L., Han, B.A., Rosi-Marshall, E.J. & Weathers, K.C. (2017).
The Next Decade of Big Data in Ecosystem Science. Ecosystems, 20,
274–283.
Lauenroth, W.K., Canham, C.D., Kinzig, A.P., Poiani, K.A., Kemp, W.M. &
Running, S.W. (1998). Simulation Modeling in Ecosystem Science. In:
Successes, Limitations, and Frontiers in Ecosystem Science (eds.
Pace, M.L. & Groffman, P.M.). Springer New York, New York, NY, pp.
404–415.
Malhi, Y., Franklin, J., Seddon, N., Solan, M., Turner, M.G., Field,
C.B., et al. (2020). Climate change and ecosystems: threats,
opportunities and solutions. Philos. Trans. R. Soc. Lond. B Biol.
Sci., 375, 20190104.
Mantyka-pringle, C.S., Martin, T.G. & Rhodes, J.R. (2012). Interactions
between climate and habitat loss effects on biodiversity: a systematic
review and meta-analysis. Glob. Chang. Biol., 18, 1239–1252.
Patterson, D., Gonzalez, J., Le, Q., Liang, C., Munguia, L.-M.,
Rothchild, D., et al. (2021). Carbon Emissions and Large Neural
Network Training. arXiv [cs.LG].
Petrovan, S.O., Aldridge, D.C., Bartlett, H., Bladon, A.J., Booth, H.,
Broad, S., et al. (2021). Post COVID-19: a solution scan of
options for preventing future zoonotic epidemics. Biol. Rev. Camb.
Philos. Soc., 96, 2694–2715.
Pongsiri, M.J., Roman, J., Ezenwa, V.O., Goldberg, T.L., Koren, H.S.,
Newbold, S.C., et al. (2009). Biodiversity Loss Affects Global
Disease Ecology. Bioscience, 59, 945–954.
Rands, M.R.W., Adams, W.M., Bennun, L., Butchart, S.H.M., Clements, A.,
Coomes, D., et al. (2010). Biodiversity conservation: challenges
beyond 2010. Science, 329, 1298–1303.
Rillig, M.C., Ågerstrand, M., Bi, M., Gould, K.A. & Sauerland, U.
(2023). Risks and Benefits of Large Language Models for the Environment.
Environ. Sci. Technol., 57, 3464–3466.
Runting, R.K., Phinn, S., Xie, Z., Venter, O. & Watson, J.E.M. (2020).
Opportunities for big data in conservation and sustainability.
Nat. Commun., 11, 2003.
Shah, D.S., Schwartz, H.A. & Hovy, D. (2020). Predictive Biases in
Natural Language Processing Models: A Conceptual Framework and Overview.
In: Proceedings of the 58th Annual Meeting of the Association for
Computational Linguistics. Association for Computational Linguistics,
Online, pp. 5248–5264.
Strubell, E., Ganesh, A. & McCallum, A. (2019). Energy and Policy
Considerations for Deep Learning in NLP. arXiv [cs.CL].
Turner, W.R., Brandon, K., Brooks, T.M., Costanza, R., da Fonseca,
G.A.B. & Portela, R. (2007). Global Conservation of Biodiversity and
Ecosystem Services. Bioscience, 57, 868–873.
Van Nes, E.H. & Scheffer, M. (2005). A strategy to improve the
contribution of complex simulation models to ecological theory.
Ecol. Modell., 185, 153–164.
Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang,
P.-S., et al. (2021). Ethical and social risks of harm from
Language Models. arXiv, 2112.04359.
Weiskopf, S.R., Rubenstein, M.A., Crozier, L.G., Gaichas, S., Griffis,
R., Halofsky, J.E., et al. (2020). Climate change effects on
biodiversity, ecosystems, ecosystem services, and natural resource
management in the United States. Sci. Total Environ., 733,
137782.