Extracting Information from PDF Documents
Jun 26, 2017, Daniel F.
Portfolio
Online form for water jet cutting
When the user enters the shape and dimensions of the pieces to be cut, a scale drawing is displayed. Predefined shapes are discs, rings and rectangles, with optional drilling. Specific drawings for other shapes can be uploaded.
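As an illustration only, the step from entered dimensions to a scale drawing could be sketched as below. This is not the form's actual code: the function name `scale_drawing_svg`, the parameter names, the SVG output format, the fixed scale of 2 pixels per millimetre and the single centred drill hole are all assumptions made for the example.

```python
def scale_drawing_svg(shape, px_per_mm=2.0, hole_diameter=0.0, **dims):
    """Return an SVG string showing the piece at a fixed scale (hypothetical helper)."""
    s = px_per_mm
    stroke = 'fill="none" stroke="black"'
    if shape == "rectangle":
        # Dimensions are given in millimetres and converted to pixels.
        w, h = dims["width"] * s, dims["height"] * s
        parts = [f'<rect x="0" y="0" width="{w:g}" height="{h:g}" {stroke}/>']
    elif shape in ("disc", "ring"):
        w = h = dims["diameter"] * s
        parts = [f'<circle cx="{w / 2:g}" cy="{h / 2:g}" r="{w / 2:g}" {stroke}/>']
        if shape == "ring":
            # A ring is drawn as an outer circle plus a concentric inner circle.
            r_in = dims["inner_diameter"] * s / 2
            parts.append(f'<circle cx="{w / 2:g}" cy="{h / 2:g}" r="{r_in:g}" {stroke}/>')
    else:
        raise ValueError(f"unsupported shape: {shape}")
    if hole_diameter > 0:
        # Optional drilling, assumed here to be one hole at the centre of the piece.
        r_hole = hole_diameter * s / 2
        parts.append(f'<circle cx="{w / 2:g}" cy="{h / 2:g}" r="{r_hole:g}" {stroke}/>')
    header = f'<svg xmlns="http://www.w3.org/2000/svg" width="{w:g}" height="{h:g}">'
    return header + "".join(parts) + "</svg>"
```

For example, `scale_drawing_svg("ring", diameter=100, inner_diameter=40)` produces a 200 by 200 pixel drawing with two concentric circles. Because every length is multiplied by the same factor, the proportions of the piece are preserved.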
Thousands of PDF text documents had to be downloaded and parsed to extract pieces of information, which were then stored in a database.
The challenge was that the pieces of information to be extracted were not always in the same place in the document or in the same order, and each could cover part of a paragraph, an entire paragraph or several paragraphs.
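One way to cope with fields that move around is to anchor on labels rather than on positions. The sketch below assumes that the text has already been extracted from the PDF and that each piece of information is introduced by a known label followed by a colon. The labels shown and the function name `extract_fields` are hypothetical, and the project's real markers and method are not described here.

```python
import re

# Hypothetical labels; the markers used in the real documents are not known.
LABELS = ["Applicant", "Summary", "Decision"]

def extract_fields(text, labels=LABELS):
    """Return {label: value}; each value runs from its label to the next label found."""
    # One pattern that matches any of the labels followed by a colon, anywhere in the text.
    alternation = "|".join(re.escape(label) for label in labels)
    pattern = re.compile(r"\b(" + alternation + r")\s*:", re.IGNORECASE)
    matches = list(pattern.finditer(text))
    fields = {}
    for i, match in enumerate(matches):
        # The value ends where the next label starts, or at the end of the text.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        # Collapse line breaks so a value spanning several paragraphs becomes one string.
        value = " ".join(text[match.end():end].split())
        key = next(l for l in labels if l.lower() == match.group(1).lower())
        fields.setdefault(key, value)  # keep the first occurrence of each label
    return fields
```

Since the values are delimited by whichever label comes next, the order of the fields in the document does not matter, and a value may be a fragment of a paragraph or run across several of them.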