Andrew Friedman
afriedman412 [at] gmail [dot] com
OPEN SOURCE CONTRIBUTIONS
PDFPlumber
Plumb a PDF for detailed information about each char, rectangle, line, et cetera, and easily extract text and tables.
- Implemented recognition of character spacing as a fraction of font size (instead of total pixels)
- Improved and streamlined code for interpreting rotated text and text written in any direction (vertically or horizontally)
FPDF2
Simple PDF generation for Python
- Expanded capability to convert SVG images to PDF
- Added support for SVG clipping paths
- Improved SVG variable interpretation
- Added proper formatting for multi-index tables
Textacy
Python library for performing a variety of natural language processing (NLP) tasks, built on the high-performance spaCy library. With the fundamentals (tokenization, part-of-speech tagging, dependency parsing, etc.) delegated to another library, textacy focuses primarily on the tasks that come before and follow after.
- Improved quote detection to look for text within pairs of specific characters, instead of text between any sequential quotation mark-like characters
- Improved accuracy of attribution and reduced false positives by expanding and adjusting the attribution window
- Added code to prepare and standardize text for quote detection