Mining Biodiversity

Project Title: Mining Biodiversity
Tweets: @mibio_did
Project email address:
Start Date: April 2014
End Date: October 2015
UK Project Manager: Professor Sophia Ananiadou, National Centre for Text Mining, School of Computer Science, University of Manchester
Project Team:
  Dr Riza Batista-Navarro, National Centre for Text Mining, School of Computer Science, University of Manchester, +441613063090
  Dr Rafal Rak, National Centre for Text Mining, School of Computer Science, University of Manchester, +441613063090
Lead Institution: National Centre for Text Mining, School of Computer Science, University of Manchester
Project Partners:
  Big Data Analytics Institute, Dalhousie University
  Social Media Lab, Dalhousie University (Canada)
  Missouri Botanical Garden (USA)
Project Plan
Progress Report


The Mining Biodiversity project aims to transform the Biodiversity Heritage Library (BHL)
into a next-generation social digital library resource. Through social media integration, the
resource is intended to help a worldwide community study and discuss legacy science
documents on biodiversity, and to raise public awareness of changes in biodiversity over
time. The project integrates novel text mining methods, visualisation, crowdsourcing and
social media into the BHL. The resulting digital resource will provide fully interlinked and
indexed access to the full content of BHL library documents, via semantically enhanced
and interactive browsing and searching capabilities, allowing users to locate precisely the
information of interest to them in an easy and efficient manner.


By promoting the development of capabilities that will foster collaboration amongst
researchers from the fields of History of Science, Environmental History, Environmental
Studies, Library and Information Science and Social Media, the proposed project will make
a significant impact on the above disciplines by (1) enriching a large-scale library, i.e., the
BHL, via innovative application of text mining techniques to produce semantic metadata
and a term inventory; (2) providing improved access to biodiversity-related digital artefacts
via an enhanced search engine and visualisation of results; and (3) stimulating increased
collaboration, interaction and sharing of information amongst BHL users via the social
media environment.

Anticipated Outputs and Outcomes

The specific outputs of this project include:

  1. a tool for the automatic correction of errors in text extracted from legacy biodiversity literature via optical character recognition (OCR);
  2. a “gamified” crowdsourcing facility that will encourage users to annotate legacy texts with semantic metadata;
  3. text mining tools for automatically extracting metadata (i.e., terminology, entities and events);
  4. a search engine allowing users to search the BHL according to different information dimensions or facets;
  5. tools integrated into the BHL for visualising search results; and
  6. the expansion of the BHL with a social media layer to enable users to share materials, hold discussions and collaborate.

The project’s main outcomes include the efficient delivery of more informative BHL
content, an enhanced experience for BHL end-users, and increased interaction and
collaboration amongst members of the biodiversity community.
