PatternCaptureGroupFilterFactory vs. PatternTokenizerFactory

Question:

I want to facet on a field of part_number_type. The problem is that when the source data contains values that do not match the pattern, PatternCaptureGroupFilterFactory emits the original token. I don't want the original token emitted in that case.

Answer:

In this case the client was using PatternCaptureGroupFilterFactory but, as described above, whenever the pattern didn't match, the original token went through. Stepping back for a minute, it helps to compare what PatternCaptureGroupFilterFactory does with what PatternTokenizerFactory does.

PatternCaptureGroupFilterFactory works on individual tokens that the tokenizer has already produced, and may emit new tokens from the capture groups that the pattern matches within each token. It does not block tokens that fail to match, however; it lets them through unchanged.
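
To make the pass-through behavior concrete, here is a minimal Python sketch that imitates the filter's logic outside of Solr. It is not Solr code. The function name, the sample tokens, and the part-number pattern are all assumptions made up for illustration, and the real filter has details (token positions, overlapping groups) that this sketch ignores.

```python
import re

def capture_group_filter(tokens, pattern, preserve_original=False):
    """Rough imitation of a capture-group token filter (illustrative only)."""
    regex = re.compile(pattern)
    out = []
    for token in tokens:
        # Collect every non-empty capture group matched inside this token.
        groups = [g for m in regex.finditer(token) for g in m.groups() if g]
        # The original token survives when asked for, or when nothing matched.
        if preserve_original or not groups:
            out.append(token)
        out.extend(groups)
    return out

# Hypothetical pattern: a two-letter type prefix followed by a dash and digits.
PART_TYPE = r"^([A-Z]{2})-\d+$"

print(capture_group_filter(["AB-1234", "unknown"], PART_TYPE))
# prints ['AB', 'unknown']
```

The non-matching value "unknown" still comes out the other end, which is exactly what pollutes the facet.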

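PatternTokenizerFactory is what the client wanted in this case. When it is configured to emit a match group (its group attribute set to 0 or higher, rather than the default splitting mode), it creates tokens only from text that matches the pattern. So if you don't want the non-matching tokens created in the first place, this is what you should be looking for.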
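
For comparison, the following Python sketch imitates the tokenizer in its group-matching mode. Again, this is not Solr code. The function name, the inputs, and the pattern are hypothetical, and the splitting mode (a negative group) is deliberately not modeled.

```python
import re

def pattern_tokenizer(text, pattern, group=0):
    """Rough imitation of a pattern tokenizer in match mode (illustrative only)."""
    if group < 0:
        raise ValueError("split mode (negative group) is not modeled in this sketch")
    regex = re.compile(pattern)
    # Only text that matches the pattern ever becomes a token.
    return [m.group(group) for m in regex.finditer(text) if m.group(group)]

# Hypothetical pattern: a two-letter type prefix followed by a dash and digits.
PART_TYPE = r"^([A-Z]{2})-\d+$"

print(pattern_tokenizer("AB-1234", PART_TYPE, group=1))  # prints ['AB']
print(pattern_tokenizer("unknown", PART_TYPE, group=1))  # prints []
```

Because a non-matching value yields no tokens at all, nothing is left over to show up as a facet value.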
