I want to be able to search for C++ and C#, but it looks like the + and # characters are being removed.
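By default, WordDelimiterFilter assigns a type to each character, computed from its Unicode properties.
Based on these types and the options provided, it splits and concatenates the text. Symbols such as + and # get the SUBWORD_DELIM type, so they are treated as delimiters and dropped from the tokens.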
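To make the splitting concrete, here is a deliberately simplified model in Java. It is not Lucene's implementation. It assigns default types from java.lang.Character.getType roughly the way the filter does, lets a hypothetical override map replace those defaults (the role the types file plays), and splits only on SUBWORD_DELIM characters, ignoring case-change splits, letter/digit splits and all catenate options. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class WdfSketch {
    // Simplified character types, named after the ones the real filter uses.
    enum Type { LOWER, UPPER, ALPHA, DIGIT, SUBWORD_DELIM }

    // Default type derived from Unicode properties; anything that is not a
    // letter or digit (e.g. '+' or '#') falls through to SUBWORD_DELIM.
    static Type defaultType(char c) {
        switch (Character.getType(c)) {
            case Character.LOWERCASE_LETTER: return Type.LOWER;
            case Character.UPPERCASE_LETTER: return Type.UPPER;
            case Character.TITLECASE_LETTER:
            case Character.MODIFIER_LETTER:
            case Character.OTHER_LETTER: return Type.ALPHA;
            case Character.DECIMAL_DIGIT_NUMBER:
            case Character.LETTER_NUMBER:
            case Character.OTHER_NUMBER: return Type.DIGIT;
            default: return Type.SUBWORD_DELIM;
        }
    }

    // Split a token into word parts at SUBWORD_DELIM characters.
    // Entries in 'overrides' take precedence over the Unicode-derived type.
    static List<String> wordParts(String token, Map<Character, Type> overrides) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : token.toCharArray()) {
            Type t = overrides.getOrDefault(c, defaultType(c));
            if (t == Type.SUBWORD_DELIM) {
                // Delimiters are dropped; they only end the current part.
                if (current.length() > 0) {
                    parts.add(current.toString());
                    current.setLength(0);
                }
            } else {
                current.append(c);
            }
        }
        if (current.length() > 0) {
            parts.add(current.toString());
        }
        return parts;
    }

    public static void main(String[] args) {
        Map<Character, Type> none = Map.of();
        Map<Character, Type> custom = Map.of('+', Type.ALPHA, '#', Type.ALPHA);
        System.out.println(wordParts("C++", none));   // [C]
        System.out.println(wordParts("C#", none));    // [C]
        System.out.println(wordParts("C++", custom)); // [C++]
        System.out.println(wordParts("C#", custom));  // [C#]
    }
}
```

With the default types, both "C++" and "C#" collapse to the single part "C", which is why the two searches cannot be told apart; mapping the symbols to ALPHA keeps them inside the token.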
To keep these characters, explicitly map '+' and '#' to the ALPHA type so that they are not filtered out and can be matched in search queries:
1. Edit schema.xml and find the solr.TextField field type you are using (e.g. text_en).
2. In both the "index" and "query" analyzers, modify the WordDelimiterFilterFactory and add types="wdfftypes.txt":
<filter catenateAll="0" catenateNumbers="0" catenateWords="0" class="solr.WordDelimiterFilterFactory" generateNumberParts="1" generateWordParts="1" splitOnCaseChange="1" types="wdfftypes.txt"/>
3. Create a wdfftypes.txt file with the following content and place it in the same folder as schema.xml.
NOTE: the # character must be written as its Unicode escape (\u0023), because lines starting with # are treated as comments in this file.
# A customized type mapping for WordDelimiterFilterFactory
# the allowable types are: LOWER, UPPER, ALPHA, DIGIT, ALPHANUM, SUBWORD_DELIM
# the default for any character without a mapping is always computed from
# Unicode character properties
+ => ALPHA
\u0023 => ALPHA
4. Reload the core or restart Solr.
5. Re-index the data so that the + and # characters are included in the index.
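The types file from step 3 can be read with a small parser like the following. This is a hypothetical sketch, not Solr's actual loader: it assumes, as the comment lines in the file suggest, that blank lines and lines beginning with # are skipped, that each rule has the form "<char> => <TYPE>", and it decodes only backslash-u hex escapes. The class and method names are made up for illustration.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TypesFileSketch {
    // One rule per line: "<char> => <TYPE>", whitespace around the arrow optional.
    private static final Pattern RULE = Pattern.compile("^(\\S+)\\s*=>\\s*(\\w+)\\s*$");

    // Decode one character, written either literally or as a backslash-u hex escape.
    static char decodeChar(String s) {
        if (s.length() == 6 && s.startsWith("\\u")) {
            return (char) Integer.parseInt(s.substring(2), 16);
        }
        if (s.length() != 1) {
            throw new IllegalArgumentException("expected a single character: " + s);
        }
        return s.charAt(0);
    }

    // Build a character -> type-name map from the lines of the file.
    static Map<Character, String> parse(List<String> lines) {
        Map<Character, String> types = new HashMap<>();
        for (String raw : lines) {
            String line = raw.trim();
            // Blank lines and '#' comment lines are skipped, so a literal
            // "# => ALPHA" rule would silently never take effect.
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            Matcher m = RULE.matcher(line);
            if (!m.matches()) {
                throw new IllegalArgumentException("malformed rule: " + line);
            }
            types.put(decodeChar(m.group(1)), m.group(2));
        }
        return types;
    }

    public static void main(String[] args) {
        List<String> file = List.of(
            "# A customized type mapping for WordDelimiterFilterFactory",
            "+ => ALPHA",
            "\\u0023 => ALPHA");
        System.out.println(parse(file));
        System.out.println(parse(List.of("# => ALPHA")).isEmpty()); // true
    }
}
```

The last line illustrates the NOTE in step 3: written literally, the rule for # is indistinguishable from a comment, while the escaped form decodes to the same character and is kept.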