RyanTietjen committed
Commit f343cdb · verified · 1 Parent(s): ea34069

Update app.py

Files changed (1)
  1. app.py +182 -182
app.py CHANGED
@@ -1,183 +1,183 @@
- """
- Ryan Tietjen
- Sep 2024
- Demo application for paper abstract fragmentation demonstration
- """
- import gradio as gr
- import tensorflow as tf
- from tensorflow import keras
- from keras import layers
- from timeit import default_timer as timer
- from process_input import split_abstract
- from process_input import split_abstract_original
- from process_input import split_sentences_by_characters
- import pandas as pd
- import tensorflow_hub as hub
- from model import EmbeddingLayer
- from process_input import encode_labels
-
-
- sample_list = []
- example1 = f"""The aim of this study was to describe the electrocardiographic ( ECG ) evolutionary changes after an acute myocardial infarction ( AMI ) and to evaluate their correlation with left ventricular function and remodeling.
- The QRS complex changes after AMI have been correlated with infarct size and left ventricular function.
- By contrast , the significance of T wave changes is controversial.
- We studied 536 patients enrolled in the GISSI-3-Echo substudy who underwent ECG and echocardiographic studies at 24 to 48 h ( S1 ) , at hospital discharge ( S2 ) , at six weeks ( S3 ) and six months ( S4 ) after AMI.
- The number of Qwaves ( nQ ) and QRS quantitative score ( QRSs ) did not change over time.
- From S2 to S4 , the number of negative T waves ( nT NEG ) decreased ( p < 0.0001 ) , wall motion abnormalities ( % WMA ) improved ( p < 0.001 ) , ventricular volumes increased ( p < 0.0001 ) while ejection fraction remained stable.
- According to the T wave changes after hospital discharge , patients were divided into four groups : stable positive T waves ( group 1 , n = 35 ) , patients who showed a decrease > or = 1 in nT NEG ( group 2 , n = 361 ) , patients with no change in nT NEG ( group 3 , n = 64 ) and those with an increase > or = 1 in nT NEG ( group 4 , n = 76 ).
- The QRSs and nQ remained stable in all groups.
- Groups 3 and 4 showed less recovery in % WMA , more pronounced ventricular enlargement and progressive decline in ejection fraction than groups 1 and 2 ( interaction time x groups p < 0.0001 ).
- The analysis of serial ECG can predict postinfarct left ventricular remodeling.
- Normalization of negative T waves during the follow-up appears more strictly related to recovery of regional dysfunction than QRS changes.
- Lack of resolution and late appearance of new negative T predict unfavorable remodeling with progressive deterioration of ventricular function."""
- sample_list.append(example1)
-
- def format_non_empty_lists(objective, background, methods, results, conclusion):
-     """
-     This function checks each provided list and formats a string with the list name and its contents
-     only if the list is not empty.
-
-     Parameters:
-     - objective (list): List containing sentences classified as 'Objective'.
-     - background (list): List containing sentences classified as 'Background'.
-     - methods (list): List containing sentences classified as 'Methods'.
-     - results (list): List containing sentences classified as 'Results'.
-     - conclusion (list): List containing sentences classified as 'Conclusion'.
-
-     Returns:
-     - str: A formatted string that contains the non-empty list names and their contents.
-     """
-
-     output = ""
-     lists = {
-         'Objective': objective,
-         'Background': background,
-         'Methods': methods,
-         'Results': results,
-         'Conclusion': conclusion
-     }
-
-     for name, content in lists.items():
-         if content:  # Check if the list is not empty
-             output += f"{name}:\n"  # Append the category name followed by a newline
-             for item in content:
-                 output += f" - {item}\n"  # Append each item in the list, formatted as a list
-
-             output += "\n"  # Append a newline for better separation between categories
-
-     return output.strip()
-
- def fragment_single_abstract(abstract):
-     """
-     Processes a single abstract by fragmenting it into structured sections based on predefined categories
-     such as Objective, Methods, Results, Conclusions, and Background. The function utilizes a pre-trained Keras model
-     to predict the category of each sentence in the abstract.
-
-     The process involves several steps:
-     1. Splitting the abstract into sentences.
-     2. Encoding these sentences using a custom embedding layer.
-     3. Classifying each sentence into one of the predefined categories.
-     4. Grouping the sentences by their predicted categories.
-
-     Parameters:
-     abstract (str): The abstract text that needs to be processed and categorized.
-
-     Returns:
-     tuple: A tuple containing two elements:
-     - A formatted string grouping the sentences under the non-empty category names ('Objective', 'Background', 'Methods', 'Results', 'Conclusion').
-     - The time taken to process the abstract (in seconds).
-
-     Example:
-     ```python
-     abstract_text = "This study aims to evaluate the effectiveness of..."
-     categorized_abstract, processing_time = fragment_single_abstract(abstract_text)
-     print("Categorized Abstract:", categorized_abstract)
-     print("Processing Time:", processing_time)
-     ```
-
-     Note:
-     - This function assumes that a Keras model '20k_5_epochs.keras' and a custom embedding layer 'EmbeddingLayer'
-     are available and correctly configured to be loaded.
-     - The function uses pandas for data manipulation, TensorFlow for machine learning operations,
-     and TensorFlow's data API for batching and prefetching data for model predictions.
-     """
-     start_time = timer()
-
-     original_abstract = split_abstract_original(abstract)
-     df_original = pd.DataFrame(original_abstract)
-     sentences_original = df_original["text"].tolist()
-
-     abstract_split = split_abstract(abstract)
-     df = pd.DataFrame(abstract_split)
-     sentences = df["text"].tolist()
-     labels = encode_labels(df["target"])
-
-     objective = []
-     background = []
-     methods = []
-     results = []
-     conclusion = []
-
-     embed_layer = EmbeddingLayer()
-     model = tf.keras.models.load_model("20k_5_epochs.keras", custom_objects={'EmbeddingLayer': embed_layer})
-
-     data_by_character = split_sentences_by_characters(sentences)
-     line_numbers = tf.one_hot(df["line_number"].to_numpy(), depth=15)
-     total_line_numbers = tf.one_hot(df["total_lines"].to_numpy(), depth=20)
-
-     sentences_dataset = tf.data.Dataset.from_tensor_slices((line_numbers, total_line_numbers, sentences, data_by_character))
-     labels_dataset = tf.data.Dataset.from_tensor_slices(labels)
-     dataset = tf.data.Dataset.zip((sentences_dataset, labels_dataset)).batch(32).prefetch(tf.data.AUTOTUNE)
-
-     predictions = tf.argmax(model.predict(dataset), axis=1)
-
-     for i, prediction in enumerate(predictions):
-         if prediction == 0:
-             objective.append(sentences_original[i])
-         elif prediction == 1:
-             methods.append(sentences_original[i])
-         elif prediction == 2:
-             results.append(sentences_original[i])
-         elif prediction == 3:
-             conclusion.append(sentences_original[i])
-         elif prediction == 4:
-             background.append(sentences_original[i])
-
-     end_time = timer()
-
-     return format_non_empty_lists(objective, background, methods, results, conclusion), end_time - start_time
-
-
-
- title = "Paper Abstract Fragmentation With TensorFlow by Ryan Tietjen"
- description = f"""
- This app will take the abstract of a paper and break it down into five categories: objective, background, methods, results, and conclusion.
- The dataset used can be found in the [PubMed 200k RCT](https://arxiv.org/abs/1710.06071) and in [this repo](https://github.com/Franck-Dernoncourt/pubmed-rct). The model architecture
- was based on ["Neural Networks for Joint Sentence Classification in Medical Paper Abstracts."](https://arxiv.org/pdf/1612.05251)
-
- This project achieved a testing accuracy of 88.12% and an F1 score of 87.92%. For the whole project, please visit [my GitHub](https://github.com/RyanTietjen/Paper-Fragmentation).
-
- How to use:
-
- -Paste the given abstract into the box below.
-
- -Make sure to separate each sentence by a new line (this helps avoid ambiguity).
-
- -Click submit, and allow the model to run!
- """
-
- demo = gr.Interface(
-     fn=fragment_single_abstract,
-     inputs=gr.Textbox(lines=10, placeholder="Enter abstract here..."),
-     outputs=[
-         gr.Textbox(label="Fragmented Abstract"),
-         gr.Number(label="Time to process (s)"),
-     ],
-     examples=sample_list,
-     title=title,
-     description=description,
- )
-
-
+ """
+ Ryan Tietjen
+ Sep 2024
+ Demo application for paper abstract fragmentation demonstration
+ """
+ import gradio as gr
+ import tensorflow as tf
+ from tensorflow import keras
+ from keras import layers
+ from timeit import default_timer as timer
+ from process_input import split_abstract
+ from process_input import split_abstract_original
+ from process_input import split_sentences_by_characters
+ import pandas as pd
+ import tensorflow_hub as hub
+ from model import EmbeddingLayer
+ from process_input import encode_labels
+
+
+ sample_list = []
+ example1 = f"""The aim of this study was to describe the electrocardiographic ( ECG ) evolutionary changes after an acute myocardial infarction ( AMI ) and to evaluate their correlation with left ventricular function and remodeling.
+ The QRS complex changes after AMI have been correlated with infarct size and left ventricular function.
+ By contrast , the significance of T wave changes is controversial.
+ We studied 536 patients enrolled in the GISSI-3-Echo substudy who underwent ECG and echocardiographic studies at 24 to 48 h ( S1 ) , at hospital discharge ( S2 ) , at six weeks ( S3 ) and six months ( S4 ) after AMI.
+ The number of Qwaves ( nQ ) and QRS quantitative score ( QRSs ) did not change over time.
+ From S2 to S4 , the number of negative T waves ( nT NEG ) decreased ( p < 0.0001 ) , wall motion abnormalities ( % WMA ) improved ( p < 0.001 ) , ventricular volumes increased ( p < 0.0001 ) while ejection fraction remained stable.
+ According to the T wave changes after hospital discharge , patients were divided into four groups : stable positive T waves ( group 1 , n = 35 ) , patients who showed a decrease > or = 1 in nT NEG ( group 2 , n = 361 ) , patients with no change in nT NEG ( group 3 , n = 64 ) and those with an increase > or = 1 in nT NEG ( group 4 , n = 76 ).
+ The QRSs and nQ remained stable in all groups.
+ Groups 3 and 4 showed less recovery in % WMA , more pronounced ventricular enlargement and progressive decline in ejection fraction than groups 1 and 2 ( interaction time x groups p < 0.0001 ).
+ The analysis of serial ECG can predict postinfarct left ventricular remodeling.
+ Normalization of negative T waves during the follow-up appears more strictly related to recovery of regional dysfunction than QRS changes.
+ Lack of resolution and late appearance of new negative T predict unfavorable remodeling with progressive deterioration of ventricular function."""
+ sample_list.append(example1)
+
+ def format_non_empty_lists(objective, background, methods, results, conclusion):
+     """
+     This function checks each provided list and formats a string with the list name and its contents
+     only if the list is not empty.
+
+     Parameters:
+     - objective (list): List containing sentences classified as 'Objective'.
+     - background (list): List containing sentences classified as 'Background'.
+     - methods (list): List containing sentences classified as 'Methods'.
+     - results (list): List containing sentences classified as 'Results'.
+     - conclusion (list): List containing sentences classified as 'Conclusion'.
+
+     Returns:
+     - str: A formatted string that contains the non-empty list names and their contents.
+     """
+
+     output = ""
+     lists = {
+         'Objective': objective,
+         'Background': background,
+         'Methods': methods,
+         'Results': results,
+         'Conclusion': conclusion
+     }
+
+     for name, content in lists.items():
+         if content:  # Check if the list is not empty
+             output += f"{name}:\n"  # Append the category name followed by a newline
+             for item in content:
+                 output += f" - {item}\n"  # Append each item in the list, formatted as a list
+
+             output += "\n"  # Append a newline for better separation between categories
+
+     return output.strip()
+
+ def fragment_single_abstract(abstract):
+     """
+     Processes a single abstract by fragmenting it into structured sections based on predefined categories
+     such as Objective, Methods, Results, Conclusions, and Background. The function utilizes a pre-trained Keras model
+     to predict the category of each sentence in the abstract.
+
+     The process involves several steps:
+     1. Splitting the abstract into sentences.
+     2. Encoding these sentences using a custom embedding layer.
+     3. Classifying each sentence into one of the predefined categories.
+     4. Grouping the sentences by their predicted categories.
+
+     Parameters:
+     abstract (str): The abstract text that needs to be processed and categorized.
+
+     Returns:
+     tuple: A tuple containing two elements:
+     - A formatted string grouping the sentences under the non-empty category names ('Objective', 'Background', 'Methods', 'Results', 'Conclusion').
+     - The time taken to process the abstract (in seconds).
+
+     Example:
+     ```python
+     abstract_text = "This study aims to evaluate the effectiveness of..."
+     categorized_abstract, processing_time = fragment_single_abstract(abstract_text)
+     print("Categorized Abstract:", categorized_abstract)
+     print("Processing Time:", processing_time)
+     ```
+
+     Note:
+     - This function assumes that a Keras model '20k_5_epochs.keras' and a custom embedding layer 'EmbeddingLayer'
+     are available and correctly configured to be loaded.
+     - The function uses pandas for data manipulation, TensorFlow for machine learning operations,
+     and TensorFlow's data API for batching and prefetching data for model predictions.
+     """
+     start_time = timer()
+
+     original_abstract = split_abstract_original(abstract)
+     df_original = pd.DataFrame(original_abstract)
+     sentences_original = df_original["text"].tolist()
+
+     abstract_split = split_abstract(abstract)
+     df = pd.DataFrame(abstract_split)
+     sentences = df["text"].tolist()
+     labels = encode_labels(df["target"])
+
+     objective = []
+     background = []
+     methods = []
+     results = []
+     conclusion = []
+
+     embed_layer = EmbeddingLayer()
+     model = tf.keras.models.load_model("20k_5_epochs.keras", custom_objects={'EmbeddingLayer': embed_layer})
+
+     data_by_character = split_sentences_by_characters(sentences)
+     line_numbers = tf.one_hot(df["line_number"].to_numpy(), depth=15)
+     total_line_numbers = tf.one_hot(df["total_lines"].to_numpy(), depth=20)
+
+     sentences_dataset = tf.data.Dataset.from_tensor_slices((line_numbers, total_line_numbers, sentences, data_by_character))
+     labels_dataset = tf.data.Dataset.from_tensor_slices(labels)
+     dataset = tf.data.Dataset.zip((sentences_dataset, labels_dataset)).batch(32).prefetch(tf.data.AUTOTUNE)
+
+     predictions = tf.argmax(model.predict(dataset), axis=1)
+
+     for i, prediction in enumerate(predictions):
+         if prediction == 0:
+             objective.append(sentences_original[i])
+         elif prediction == 1:
+             methods.append(sentences_original[i])
+         elif prediction == 2:
+             results.append(sentences_original[i])
+         elif prediction == 3:
+             conclusion.append(sentences_original[i])
+         elif prediction == 4:
+             background.append(sentences_original[i])
+
+     end_time = timer()
+
+     return format_non_empty_lists(objective, background, methods, results, conclusion), end_time - start_time
+
+
+
+ title = "Paper Abstract Fragmentation With TensorFlow by Ryan Tietjen"
+ description = f"""
+ This app will take the abstract of a paper and break it down into five categories: objective, background, methods, results, and conclusion.
+ The dataset used can be found in the [PubMed 200k RCT](https://arxiv.org/abs/1710.06071) and in [this repo](https://github.com/Franck-Dernoncourt/pubmed-rct). The model architecture
+ was based on ["Neural Networks for Joint Sentence Classification in Medical Paper Abstracts."](https://arxiv.org/pdf/1612.05251)
+
+ This model achieved a testing accuracy of 88.12% and an F1 score of 87.92%. For the whole project, please visit [my GitHub](https://github.com/RyanTietjen/Paper-Fragmentation).
+
+ How to use:
+
+ -Paste the given abstract into the box below.
+
+ -Make sure to separate each sentence by a new line (this helps avoid ambiguity).
+
+ -Click submit, and allow the model to run!
+ """
+
+ demo = gr.Interface(
+     fn=fragment_single_abstract,
+     inputs=gr.Textbox(lines=10, placeholder="Enter abstract here..."),
+     outputs=[
+         gr.Textbox(label="Fragmented Abstract"),
+         gr.Number(label="Time to process (s)"),
+     ],
+     examples=sample_list,
+     title=title,
+     description=description,
+ )
+
+
  demo.launch(share=False)
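
For anyone exercising the updated app from Python rather than the web UI, here is a minimal client-side sketch. It is illustrative only: it assumes the Space is running, that the `gradio_client` package is installed, and that the Space id shown is a stand-in rather than something stated in this commit.

```python
# Illustrative sketch -- the Space id below is a placeholder; point Client at
# the actual Space (or local URL) where this app.py is running.
from gradio_client import Client

client = Client("RyanTietjen/Paper-Fragmentation")  # hypothetical Space id

abstract = (
    "The aim of this study was to evaluate X.\n"
    "We enrolled 100 patients and measured Y.\n"
    "Y improved significantly in the treatment group.\n"
    "These results suggest Z."
)

# gr.Interface exposes a single "/predict" endpoint by default and returns the
# outputs in order: the fragmented abstract and the processing time in seconds.
fragmented, seconds = client.predict(abstract, api_name="/predict")
print(fragmented)
print(f"Processed in {seconds:.2f} s")
```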