Regular expressions performance: Boost vs. Perl


Question

I'm looking for a performance comparison between Perl and Boost regular expressions.
I need to design a piece of code that relies very heavily on regular expressions, and I can choose between:

  1. running it through a boost regex
  2. launching a Perl interpreter and doing the work in Perl

I know Perl is known for its optimized string processing. However, I can't find a performance comparison with the Boost regex library.
Do you know of any such comparison?
Thanks

11/18/2009 11:59:11 PM

Accepted Answer

The startup cost of running a Perl interpreter from within your application (via the system function, I presume) will outweigh any benefit you gain from using Perl's regex engine. The exception would be a VERY complicated regular expression that Perl's regex implementation happens to be optimised for but Boost's regex engine isn't.

The real answer is that I do not know of any such comparison, but Perl's regular expression facilities are not necessarily the fastest. See here for some information about an algorithm that beats Perl's regular expression engine for some expressions.

EDIT: It is possible to avoid the cost of starting a full Perl interpreter by linking to libperl or using libPCRE. Using Boost will probably give you more flexibility and performance tuning options if you need them.

Final Note: There are no known direct performance comparisons between Boost.Regex and Perl's regex engine. The solution is to try both and see which is faster for the OP's specific situation.
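To make "try both and measure" concrete, here is a minimal timing sketch for the C++ side. It is only an illustration under some assumptions: it uses std::regex as a stand-in so that it compiles without Boost (boost::regex and boost::regex_search have a very similar interface, so swapping the header and namespace should be close to mechanical), and the names BenchResult and time_regex are made up for this example. The pattern is compiled once, outside the timed loop, so only matching is measured.

```cpp
#include <chrono>
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Result of one timing run: total number of matches and elapsed time.
// (Hypothetical helper type, not part of Boost or the standard library.)
struct BenchResult {
    std::size_t matches;
    double milliseconds;
};

// Runs `pattern` against every string in `inputs`, `rounds` times over.
// std::regex is used here as a stand-in for boost::regex (assumption).
BenchResult time_regex(const std::string& pattern,
                       const std::vector<std::string>& inputs,
                       int rounds) {
    const std::regex re(pattern);  // compile once, before timing starts
    std::size_t matches = 0;

    const auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < rounds; ++r) {
        for (const std::string& s : inputs) {
            if (std::regex_search(s, re)) {
                ++matches;
            }
        }
    }
    const auto stop = std::chrono::steady_clock::now();

    const std::chrono::duration<double, std::milli> elapsed = stop - start;
    return {matches, elapsed.count()};
}
```

Call time_regex with your real patterns and a representative sample of your input, then time the equivalent Perl script on the same data (including interpreter startup if you plan to launch it per request). Numbers from toy inputs will not tell you much.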

(Edit: There is now a good comparison between Boost and PCRE. See http://www.boost.org/doc/libs/1_41_0/libs/regex/doc/gcc-performance.html)

2/27/2012 9:55:45 PM

If you haven't seen it yet, there's a regexp benchmark in the Great Language Shootout. It doesn't rank Perl very high at all. A Boost implementation using boost::xpressive, which compiles the expression at compile time, is ranked first. This is a microbenchmark, so it is probably not representative of general regular expression speed, but it is still worth a look.

Surprisingly enough, the fastest regular expression engine by far appears to be Google Chrome's V8 JavaScript JIT (it almost beats GCC in wall-clock time while using just a single CPU core).


Licensed under: CC-BY-SA with attribution
Not affiliated with: Stack Overflow