The DNR API currently has three primary ways to match requests by their URL:
- urlFilter - match URL by a literal substring with optional wildcards; see Chrome's docs for urlFilter, and also Firefox's source code docs for some undocumented cases.
- requestDomains - match domain or superdomain by a literal string; see Chrome's docs for requestDomains.
- regexFilter - match URL by a regular expression; see Chrome's docs for regexFilter.
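For illustration, here is a rough sketch of one rule per matching mode. The `Rule` and `RuleCondition` interfaces are a hand-written subset of the rule shape, not the full API typing, and `example.com` with its paths is a placeholder.

```typescript
// Minimal, hand-written subset of the DNR rule shape (assumption: only the
// fields needed for this example are modelled).
interface RuleCondition {
  urlFilter?: string;
  requestDomains?: string[];
  regexFilter?: string;
}

interface Rule {
  id: number;
  action: { type: "block" | "allow" };
  condition: RuleCondition;
}

// One rule per matching mode; example.com and the paths are placeholders.
const exampleRules: Rule[] = [
  // Literal substring with a wildcard.
  { id: 1, action: { type: "block" }, condition: { urlFilter: "example.com/ads/*" } },
  // Literal domain; also matches its subdomains.
  { id: 2, action: { type: "block" }, condition: { requestDomains: ["example.com"] } },
  // Regular expression over the whole URL.
  { id: 3, action: { type: "block" }, condition: { regexFilter: "^https://example\\.com/ads/[0-9]+$" } },
];
```

In an extension these objects would typically live in a static ruleset JSON file or be registered dynamically; that part is omitted here.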
The format of regexFilter is poorly specified. This issue is about coming up with what regexFilter should support, potentially beyond what the current implementations offer, following the 2023-01-19 WECG meeting (meeting notes will be available once #343 is merged).
In Chrome, regexFilter is basically the syntax of the underlying RE2 library plus the additional implementation-dependent constraint that the memory usage of an individual regex may not exceed 2 KB (source). Chrome relies heavily on the RE2 library for its implementation, not just for matching, but also for extra optimizations (source code comment in regex_rules_matcher.h). Chrome limits the number of regexFilter rules to 1000.
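As a rough sketch, an extension could check a ruleset against the 1000-rule limit before registering it. The constant and helper names below are hypothetical and the limit is hard-coded from the figure above. The per-regex memory limit cannot be derived from the pattern text, so it is not checked here.

```typescript
// Limit on regexFilter rules in Chrome, as stated above (hard-coded for illustration).
const CHROME_MAX_REGEX_RULES = 1000;

// Only the field this check needs; a hypothetical minimal shape.
interface RuleLike {
  condition: { regexFilter?: string };
}

// Count the rules that use regexFilter.
function countRegexRules(rules: RuleLike[]): number {
  return rules.filter((rule) => rule.condition.regexFilter !== undefined).length;
}

// True if the ruleset has more regexFilter rules than Chrome accepts.
function exceedsChromeRegexLimit(rules: RuleLike[]): boolean {
  return countRegexRules(rules) > CHROME_MAX_REGEX_RULES;
}
```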
In Safari, all DNR rules are internally converted to regular expressions and the maximum number of supported rules is 150k, without a specific smaller limit of regexFilter rules. Its supported regexp syntax is documented at https://developer.apple.com/documentation/safariservices/creating_a_content_blocker#3030754 . This linked documentation is not Safari's DNR docs, but that of the underlying content blocker API that Safari uses. There is a human-readable high-level description of this content blocker API at https://webkit.org/blog/4062/targeting-domains-with-content-blockers/ . Internally, Safari compiles the regular expressions to a set of deterministic finite automata (DFAs). These DFAs are optimized and compiled to bytecode. An interpreter uses this bytecode to find the desired actions for a given URL. This implementation is not a generic regexp engine, so any regexp feature needs to be carefully examined before it can be supported.
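To make the DFA approach more concrete, here is a toy sketch that builds a table-driven DFA for a single literal substring and runs it over a URL. It is not Safari's implementation: it handles no regexp operators, does no optimization, and produces no bytecode, and all names in it are hypothetical. It only illustrates why matching is a single pass over the input with no backtracking.

```typescript
// A DFA as a transition table: transitions[state][char] gives the next state.
// Characters missing from a row send the automaton back to state 0.
interface Dfa {
  transitions: Record<string, number>[];
  accept: number;
}

// Build a DFA that reaches `accept` once `pattern` has appeared in the input.
function buildSubstringDfa(pattern: string): Dfa {
  const transitions: Record<string, number>[] = [];
  for (let state = 0; state < pattern.length; state++) {
    const row: Record<string, number> = {};
    for (let i = 0; i < pattern.length; i++) {
      const ch = pattern.charAt(i);
      // Next state = length of the longest prefix of `pattern` that is a
      // suffix of the text consumed so far plus `ch`.
      const seen = pattern.slice(0, state) + ch;
      let next = seen.length;
      while (next > 0 && seen.slice(seen.length - next) !== pattern.slice(0, next)) {
        next--;
      }
      row[ch] = next;
    }
    transitions.push(row);
  }
  return { transitions, accept: pattern.length };
}

// Run the DFA over the input: one table lookup per character.
function dfaMatches(dfa: Dfa, input: string): boolean {
  let state = 0;
  for (let i = 0; i < input.length; i++) {
    if (state === dfa.accept) return true;
    const next = dfa.transitions[state][input.charAt(i)];
    state = next === undefined ? 0 : next;
  }
  return state === dfa.accept;
}
```

For example, `dfaMatches(buildSubstringDfa("/ads/"), url)` reports whether `url` contains `/ads/`. Features such as backreferences have no equivalent finite table, which is one reason each regexp feature has to be examined individually in an engine of this kind.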
In Firefox, regexFilter is not implemented yet. Ideally we can reach a resolution here so that extensions don't have to encounter breaking changes.
In this issue, the goal is to determine the desired syntax of regexFilter, in a way that is useful to extension developers and feasible to implement optimally in web browsers.