I'm an rx geek and can often craft what I'm looking to get without a lot of help, but I have used this tool many times before -- it's very slick. It would be nice if it supported something other than JavaScript[0], but hey, it's on the web, it probably makes a lot of sense to be that way (and it's nice that it's all client-side and I don't have to wait for it to ship my regular expression back to a server for processing).
Regular expressions are simply awesome and I'm continually surprised at how frequently I run into developers who have next to no understanding of them. Case in point: I ran into some code a few months ago that spanned two methods and 20 lines to do something that a 6-character regular expression could have solved (and would have done so more performantly[1]); the best part was that a bug I was responsible for handling turned out to reside right inside one of those methods. And then there are all of the things related to "dealing with strings" that many regular expression libraries just handle, such as "\d" vs "[0-9]" in a world with Unicode strings[3]. The syntax feels cryptic[4] when you encounter it and you're not familiar with it, but to learn the "80% most useful parts", you needn't study much more than content that would fit on a single printed sheet of paper (and to get the last 20%, you'd need, maybe 2, ... 3?).
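To make the "\d" vs "[0-9]" point concrete, here's a minimal sketch in Python rather than C# (my choice for illustration; the sample string is made up). Python's `re` module also treats "\d" as "any Unicode decimal digit" for ordinary string patterns, so the difference shows up as soon as a non-ASCII digit appears:

```python
import re

# "\u0663" is ARABIC-INDIC DIGIT THREE; "3" is the plain ASCII digit.
text = "room 3, floor \u0663"

# For str patterns, \d matches any Unicode decimal digit by default.
unicode_digits = re.findall(r"\d", text)

# [0-9] matches only the ten ASCII digits.
ascii_digits = re.findall(r"[0-9]", text)

# The re.ASCII flag restricts \d to the same set as [0-9].
ascii_flag_digits = re.findall(r"\d", text, flags=re.ASCII)

print(unicode_digits)     # two matches: the ASCII 3 and the Arabic-Indic 3
print(ascii_digits)       # one match: the ASCII 3
print(ascii_flag_digits)  # one match: the ASCII 3
```

Which behavior is "right" depends on what the digits are for; if the match is going to be handed to an integer parser that only understands ASCII, "[0-9]" (or the ASCII flag) is probably the safer choice.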
All of that said, there's also the other side of the coin; if ever the saying "If all you have is a hammer, everything looks like a nail" had application, it's with regular expressions. I'm not sure how many times the question "How do I write a regular expression to parse HTML" has to be answered with "don't" before folks quit trying[2]. Regex tends to be the first thing I reach for when I need to process text, even when there are better tools; heck, every find/replace dialog in every application that supports it has the "Regex" box checked by default (and it really throws me off when I hit "Find" in the browser and need to search for something with a ( or ) in it, which I escape out of muscle memory).
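For what it's worth, the "look for a small string, then filter a second time" approach from footnote [2] can be sketched roughly like this. The real thing is a shell script using awk; this is a Python stand-in, and the page fragment, the labels, and the `extract_value` helper are all hypothetical:

```python
import re
from typing import Optional

# Hypothetical status-page fragment; a real page will look different.
page = """
<html><body>
<table>
<tr><td>Uptime</td><td>12 days</td></tr>
<tr><td>Load average</td><td>0.42</td></tr>
</table>
</body></html>
"""

def extract_value(html: str, label: str) -> Optional[str]:
    """Return the first number that follows `label` on the same line, if any."""
    # Pass 1: keep only the lines that mention the label at all.
    lines = [line for line in html.splitlines() if label in line]
    if not lines:
        return None
    # Pass 2: skip non-digits after the label, then capture the first number.
    m = re.search(re.escape(label) + r"\D*(\d+(?:\.\d+)?)", lines[0])
    return m.group(1) if m else None

load = extract_value(page, "Load average")
uptime = extract_value(page, "Uptime")
missing = extract_value(page, "Disk usage")
print(load, uptime, missing)
```

Note that nothing here understands tags or nesting; it only cares that the label and the value sit on one line, which is also why this kind of thing tends to survive cosmetic page changes and falls over on structural ones.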
[0] I have an occasional need for PCRE and .NET style, and I really miss named groups when I have to do something complex in JavaScript.
[1] While it's easy to accidentally end up in hell, à la https://blog.codinghorror.com/regex-performance/, poorly written string-search code can be worse once the complexity of the pattern you're searching for reaches a certain point, and that's to say nothing of the errors per x lines of code and readability (not that rx is particularly readable under complexity).
[2] And hey, I've got a shell script that downloads a few status pages on my server at home that uses awk with regular expressions to extract values from a web page. I wouldn't say it necessarily qualifies as "parsing HTML" since it's really only concerned with looking for a small string which it filters a second time to get the value -- horribly inefficient, but it's worked for 5 years through page changes without requiring adjustment.
[3] At least in the C# world, "\d" is about twice as slow as "[0-9]" because it handles digits "correctly", i.e. it matches every Unicode digit rather than just 0-9: https://stackoverflow.com/questions/16621738/d-is-less-effic...
[4] While it's usually written cryptically, many (most?) implementations support flags to ignore whitespace and support comment features. I've had a few crazy-ugly rx's that I had to use to extract data from a ticketing system's "blob field" to insert into a structured format; were it not for that feature, it would have been impossible to write and support.
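As a small illustration of footnotes [0] and [4] together, here's a sketch using Python's `re.VERBOSE` flag (whitespace ignored, `#` comments allowed) with named groups. The "blob" text and its format are invented for the example and have nothing to do with the actual ticketing system:

```python
import re

# Hypothetical free-text "blob field"; the layout is made up for illustration.
blob = "Ticket: ABC-1234 | Priority: high | Assigned: jdoe"

TICKET_RX = re.compile(
    r"""
    Ticket:\s*   (?P<ticket>[A-Z]+-\d+)   # project key, dash, number
    \s*\|\s*                              # field separator
    Priority:\s* (?P<priority>\w+)        # single-word priority
    \s*\|\s*                              # field separator
    Assigned:\s* (?P<assignee>\w+)        # login name
    """,
    re.VERBOSE,
)

match = TICKET_RX.search(blob)
# groupdict() maps each named group to the text it captured.
fields = match.groupdict() if match else {}
print(fields)
```

The same pattern squeezed onto one line with numbered groups works just as well for the machine; the verbose form is purely for whoever has to maintain it six months later.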