My web scraping tends to start with xidel. If I need a little bit more power I'll use xmlstarlet. If neither of those is enough, I'll use Python's beautifulsoup package :)
I like xmlstarlet too, if only because it's old enough that I can reliably get it from package repositories and the dependency footprint is tiny (less of an issue now with this tool written in Rust, but previously I was comparing against NPM- and PyPI-based affairs).
lxml is one of the most pleasing-to-use Python libraries ever, managing to wrap a hot mess of XML APIs in a consistent and Pythonic fashion that you rarely need to escape. IIRC I used beautifulsoup to parse the HTML of a site, then used lxml to either find items and fields by CSS selector in IPython for quick-and-dirty data munging, or knock up an XSLT file to transform what I'd scraped into good data in an XML file :)