Spaces: jackkuo/Automated-Enzyme-Kinetics-Extractor

jackkuo committed on Nov 22, 2024

Commit 71d5a29 (verified) · 1 Parent(s): 85ff0d6

Update app.py

Files changed (1)

app.py +1 -1

@@ -417,7 +417,7 @@ with gr.Blocks(title="Automated Enzyme Kinetics Extractor") as demo:
             gr.Markdown(
                 '''<h1 align="center"> Leveraging Large Language Models for Automated Extraction of Enzyme Kinetics Data from Scientific Literature </h1>
                 <p><strong>Abstract:</strong>
-                <br>Enzyme kinetics data reported in the literature is essential for guiding biomedical research. However, their extraction has traditionally been performed manually through a process that is both time-consuming and prone to errors. Though Large Language Models (LLMs) have witnessed a significant advancement in information extraction in recent years, the inherent capabilities of processing comprehensive scientific data, both precise extraction and objective evaluation, have been less-investigated. Hence achieving fully automated extraction with satisfactory accuracy and offering a comprehensive performance evaluation standard remain a challenging task. This research introduces a novel framework leveraging LLMs for automatic information extraction from academic literature on enzyme kinetics. Our work integrated OCR conversion, content extraction, and output formatting through prompt engineering, marking a significant advancement in automated data extraction for scientific research. We contributed a meticulously curated golden benchmark of 156 research articles, which serves as both an accurate validation tool and a valuable resource for evaluating LLM capabilities in extraction tasks. This benchmark enables a rigorous assessment of LLMs in scientific language comprehension, biomedical concept understanding, and tabular data interpretation. The best-performing model achieved a recall rate of 92% and a precision rate of 88%.  Our approach culminates in the LLM Enzyme Kinetics Archive (LLENKA), a comprehensive dataset derived from 3,435 articles, offering the research community a structured, high-quality resource of enzyme kinetics data that will facilitate future research endeavors. Our work leveraged the comprehensive inherent capabilities of LLMs and successfully developed an automated information extraction pipeline that enhances productivity, surpasses manual curation, and serves as a paradigm in various fields.
+                <br>Enzyme kinetics data reported in the literature are essential for guiding biomedical research, yet their extraction is traditionally performed manually, a process that is both time-consuming and prone to errors, while there is no automatic extraction pipeline available for enzyme kinetics data. Though Large Language Models (LLMs) have witnessed a significant advancement in information extraction in recent years, the inherent capabilities of processing comprehensive scientific data, both precise extraction and objective evaluation, have been less-investigated. Hence achieving fully automated extraction with satisfactory accuracy and offering a comprehensive performance evaluation standard remain a challenging task. This research introduces a novel framework leveraging LLMs for automatic information extraction from academic literature on enzyme kinetics. It integrated OCR conversion, content extraction, and output formatting through prompt engineering, marking a significant advancement in automated data extraction for scientific research. We contributed a meticulously curated golden benchmark of 156 research articles, which serves as both an accurate validation tool and a valuable resource for evaluating LLM capabilities in extraction tasks. This benchmark enables a rigorous assessment of LLMs in scientific language comprehension, biomedical concept understanding, and tabular data interpretation. The best-performing model achieved a recall rate of 92% and a precision rate of 88%.  Our approach culminates in the LLM Enzyme Kinetics Archive (LLENKA), a comprehensive dataset derived from 3,435 articles, offering the research community a structured, high-quality resource for enzyme kinetics data facilitating future research endeavors. Our work leveraged the comprehensive inherent capabilities of LLMs and successfully developed an automated information extraction pipeline that enhances productivity, surpasses manual curation, and serves as a paradigm in various fields.
                 <br>Figure 1: Pipeline for Enzyme Kinetics Data Extraction
                 </p>'''
             )
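
The abstract text in this diff reports a recall and a precision figure measured against a golden benchmark. As a rough illustration of how such figures are commonly computed for an extraction task, here is a minimal sketch. It is not the evaluation code from this repository: the function name `precision_recall`, the tuple record layout, and the sample records are all assumptions made up for the example.

```python
def precision_recall(extracted, gold):
    """Return (precision, recall) for extracted records against a gold set.

    Both arguments are iterables of hashable records; a record counts as
    correct only if it matches a gold record exactly (an assumption here).
    """
    extracted, gold = set(extracted), set(gold)
    true_positives = len(extracted & gold)
    # Guard against empty inputs to avoid division by zero.
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical (enzyme, substrate, parameter, value) records, illustration only.
gold = {
    ("EnzA", "S1", "Km", "1.2 mM"),
    ("EnzA", "S1", "kcat", "30 s^-1"),
    ("EnzB", "S2", "Km", "0.5 mM"),
}
extracted = {
    ("EnzA", "S1", "Km", "1.2 mM"),
    ("EnzB", "S2", "Km", "0.5 mM"),
    ("EnzB", "S2", "kcat", "7 s^-1"),  # not in the gold set: a false positive
}

p, r = precision_recall(extracted, gold)
print(f"precision={p:.2f} recall={r:.2f}")
```

Exact matching is the strictest choice; an evaluation of numeric kinetics values might instead normalize units or allow a tolerance, which would change both figures.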