Article
Peer-Review Record

GeoKnowledgeFusion: A Platform for Multimodal Data Compilation from Geoscience Literature

by Zhixin Guo 1, Chaoyang Wang 2, Jianping Zhou 1, Guanjie Zheng 1,*, Xinbing Wang 1 and Chenghu Zhou 1,3
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3:
Submission received: 17 March 2024 / Revised: 17 April 2024 / Accepted: 21 April 2024 / Published: 23 April 2024
(This article belongs to the Section Earth Observation Data)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

Overall, the paper is very interesting, and I really enjoyed the time spent reading it as well as exploring the web platform. The authors present the GeoKnowledgeFusion platform, which is publicly accessible, for the fusion of text, visual, and tabular knowledge extracted from the geoscience literature. The work has clearly stated objectives and a detailed presentation and explanation of the current challenges.

However, the authors should consider some major changes, listed below.

First, I think the section numbering is wrong: the Introduction is numbered 0.

A general comment that should be addressed concerns the metadata. The PDF pre-processing consists of metadata extraction and keyword filtering. What happens if a file's metadata is not openly available, is partly missing, or is closed? Would the process fail, or would it be only partially completed? The authors should provide this information somewhere in the manuscript.

Figure 3 is very difficult to read. The text in the image is too small to be legible, yet it is connected to other boxes. I suggest improving the quality of this image.

In Lines 217-218, how are the extracted data transformed into XML documents? What is the process?

Furthermore, in Lines 218-219 the authors mention that they establish a relational data table for each PDF. What are the structure and the size of this table? I suggest that the authors elaborate further.

The authors mention that the index generated for each PDF includes the title, author list, abstract, venue, and year. How did the authors select this information for the index?

In Line 274, what is this threshold? I suggest explaining it further in the manuscript.

From my point of view, Figure 4 is perhaps the most crucial figure in this manuscript. However, it is very difficult to understand the proposed process. This image should be redrawn, and the different steps should be explained further (perhaps in Section 3.3), along with the potential interlinkages among the steps. The image caption should also describe the steps, e.g., (a) NER annotation, (b) image (table) recognition. As stated by the authors, the human-in-the-loop annotation is very critical to the overall GeoKnowledgeFusion platform process, but from my point of view it could be presented better in the manuscript.

In Lines 304-305, it would be very useful if the authors could present an example of this process. It could be text, an image, a table, etc.

From my point of view, up to Section 4 everything flows satisfactorily and is well connected. Although the authors have an “introduction” paragraph for this section, the transition is too abrupt for the reader. The authors must rework the first paragraph and/or add information connecting it with the previous sections. In my opinion, this paragraph should also include relevant references for the authors' statements.

In Section 5, the authors provide the conclusion of this work along with its limitations and some future work. However, this is a platform, so the authors should definitely discuss its sustainability.

All the images in the manuscript could have better quality.

Author Response

Thanks for your comments. Following your advice, we have reorganized our manuscript, and the PDF document is our response.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

The paper deals with multimodal knowledge extraction and its fusion in the field of geosciences. A new tool has been designed and implemented to extract information such as metadata, keywords, images and tabular data from PDF.

 The paper is well designed and written; the tool implemented and available via GitHub could be interesting for a large number of researchers in the field of geosciences.

 

Comments in detail:

Abstract: contains enough information about the article. Two cases are used for the quantitative test of the prototype: Debris flows/mountains and Sm-Nd isotope data for age determination.  The former should also be mentioned in the summary.

The introduction gives a comprehensive and understandable overview and mentions the main contribution of the work carried out.

The chapter "Related work" is quite short, only the first paragraph (about 16 lines) really deals with related work. Here the authors should add a more comprehensive description of the state of the art in this field. The second and third paragraphs in this chapter focus on the method used in the actual research and therefore seem out of place in this chapter. A reorganisation of this chapter is recommended.

A clearer separation of method (mixed in chapters 2 and 3) and results (mixed in chapters 3 and 4) would be helpful for the understanding of the article. In principle, two cases are analysed: Debris flows/mudslides and Sm-Nd isotope data for age determination. The first case is mainly used for a quantitative test of NER and table detection, while the second case is discussed in detail in Chapter 4. Here it is recommended to either discuss both use cases in detail or to deal with only one. In this context, the concluding paragraphs of chapter 4 seem out of place. This assessment has nothing to do with the topic of the article and should be deleted. In this context, [50] seems to be a self-citation and should be deleted.

Author Response

Thanks for your comments. We have reorganized our manuscript following your advice, and the PDF document is our response.

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

I find the research content to be of significant practical value in extracting multimodal data from Geoscience literature and compiling data to provide a platform for collecting, updating, and processing Geoscience data.

However, I have some suggestions for improvement:

Citation Format in the Introduction: The citation format is inconsistent in multiple places in the introduction, such as lines 47-48 and line 144. Please ensure uniformity in citation style throughout the manuscript.

Figure and Table Formatting: The background color settings in the figures reduce readability. I suggest adjusting the proportion between text and images to ensure clarity and coherence. Furthermore, the formatting of Table 2 and Table 3 needs improvement, and it might be beneficial to merge these two tables if possible.

Depth of Research and Methodology: One significant concern is the lack of depth in the research methodology and implementation process. The explanation of the fusion model could be enriched by providing comparisons with other models. Additionally, the validation section only includes one application scenario. I recommend adding comparative analyses across multiple industries and content scenarios. Evaluating efficiency improvements solely based on comparison with manual compilation time is insufficient. It would be beneficial to compare the time consumption of different fusion methods to validate the effectiveness of your research approach. Moreover, the discussion section should include an analysis of the practical value of the platform in various application scenarios.

 

Author Response

Thanks for your comments. We have reorganized our manuscript following your advice and the PDF document is our response.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

All my comments were properly addressed by the authors. From my point of view, the work should be published.
