Improving Table Extraction Accuracy and Automation for PDF-based Journal Articles

Authors

  • Pankaj Pachauri, University of Rajasthan, Jaipur

DOI:

https://doi.org/10.64758/1v4fqs68

Keywords:

detection accuracy, element extraction, data accessibility, table structure restoration

Abstract

This paper examines the obstacles to automatic table extraction from PDF-based journal articles, with a focus on optimizing detection accuracy and minimizing information loss. It studies how text size, border length, absolute location, and hierarchical clustering affect performance compared with previously developed solutions. A quantitative research approach is used to explore how changes in these independent variables influence detection accuracy and extraction efficiency. The results show that optimized text size and flexible border length greatly improve the detection and restoration of table structures, while hierarchical clustering improves the accuracy of table structures. The proposed method outperforms previous techniques in reducing information loss and improving efficiency, which makes it promising for automated data extraction from academic documents.

Published

2025-10-25