I’ve just uploaded the 1.1 update of the Lemmagen lemmatizer for Solr. It is now a pure Java JAR library and does not require installing any additional files on your server. The new version also changes the package name and configuration attribute to be more consistent.
1. Download library
Download the library JAR from BitBucket: Lemmatizer
2. Add library to Solr’s Java path
Copy the library JAR to your application server’s lib dir, or to your core’s lib dir. For example, if your core is located in /var/solr/core, create a lib folder next to the core’s conf and data folders and copy the JAR into it.
3. Add lemmatizer to schema.xml
Add the lemmatizer filter to your Solr schema and pass the desired language to it:
<filter class="si.virag.solr.LemmagenLemmatizerFactory" language="slovenian" />
That’s it. Supported languages are: english, french, estonian, bulgarian, czech, slovakian, slovenian, serbian, russian, romanian, hungarian, macedonian and polish.
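For context, the filter line from step 3 has to sit inside an analyzer chain of a field type in schema.xml. Here is a rough sketch of how that might look. The field type name text_sl and the field name content_sl are made up for illustration, and the choice of tokenizer and the placement of the lowercase filter before the lemmatizer are my assumptions, so adjust them to your setup:

```xml
<!-- Hypothetical field type; only the last filter comes from this library -->
<fieldType name="text_sl" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <!-- Standard Solr tokenizer and lowercase filter (assumed, not required by the lemmatizer) -->
    <tokenizer class="solr.StandardTokenizerFactory" />
    <filter class="solr.LowerCaseFilterFactory" />
    <!-- Lemmagen lemmatizer, configured with one of the supported languages -->
    <filter class="si.virag.solr.LemmagenLemmatizerFactory" language="slovenian" />
  </analyzer>
</fieldType>

<!-- Hypothetical field that uses the field type above -->
<field name="content_sl" type="text_sl" indexed="true" stored="true" />
```

Because the analyzer has no type attribute, the same chain runs at both index and query time, so indexed words and search terms are reduced to the same lemmas. Remember to reindex after changing the analysis chain of an existing field.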
This version is based on Michal Hlaváč’s excellent jLemmaGen, a Java port of the Lemmagen library. It was tested with Solr 4.3 and newer.