Hi,
I think there are several issues going on.
1. Some content is not found when searching. This could be due to a lack of support for the Czech language in Lucene.NET; it would be worth reviewing which languages are supported to confirm or rule this out. I have seen this problem before with other languages that are not yet supported. For example, text with certain Swedish characters is also known not to be searchable.
2. The Intro content is sometimes truncated in the middle of a word or character entity.
The intro text is for display only, so it is not connected to problem 1: it is not used in the actual search, only when displaying results.
When we retrieve the results from the search index, we apply some security filtering to prevent cross-site scripting. If you look in the markup of SearchResults.aspx you'll see:
SecurityHelper.PreventCrossSiteScripting(DataBinder.Eval(Container.DataItem, "Intro").ToString())
Since the actual content is stored raw, we use a whitelist approach to filter out or encode anything that doesn't match the list of allowed markup.
So to solve this, I think we first need to find out whether the language is supported, and if it is not, perhaps you can look into what is needed to support it in Lucene.NET. The other thing is that we may need a smarter way of truncating the string to produce an intro for display without breaking in the middle of a character entity or word. The intro text is created in mojoPortal.Business.WebHelpers.IndexHelper.RebuildIndex(...) at the time of content indexing, so that is where we would plug in a better solution.
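To give an idea of what a smarter truncation might look like, here is a rough sketch. IntroTruncator and Truncate are hypothetical names, not existing mojoPortal code, and the sketch assumes the text is plain text that may contain character entities such as &amp;amp; or &amp;#233;. It backs the cut point up so it never lands inside an entity, then backs up again to the previous whitespace so it never splits a word.

```csharp
using System;

// Hypothetical helper for illustration only; not part of mojoPortal.
public static class IntroTruncator
{
    // Longest entity body we bother to recognise (an assumption, e.g. "thetasym").
    private const int MaxEntityBodyLength = 10;

    public static string Truncate(string text, int maxLength)
    {
        if (string.IsNullOrEmpty(text) || maxLength <= 0) { return string.Empty; }
        if (text.Length <= maxLength) { return text; }

        int cut = maxLength;

        // If the cut falls inside an entity (&name; or &#123;),
        // move it back to just before the '&'.
        int amp = text.LastIndexOf('&', cut - 1);
        if (amp >= 0)
        {
            int semi = text.IndexOf(';', amp);
            if (semi >= cut && IsEntityBody(text, amp + 1, semi))
            {
                cut = amp;
            }
        }

        // If the cut falls in the middle of a word, move it back
        // to the last whitespace before it (when there is one).
        if (cut > 0 && !char.IsWhiteSpace(text[cut]))
        {
            int space = text.LastIndexOfAny(new[] { ' ', '\t', '\r', '\n' }, cut - 1);
            if (space > 0) { cut = space; }
        }

        return text.Substring(0, cut).TrimEnd();
    }

    // True when text[start..end) looks like an entity body:
    // short, and made up only of letters, digits or '#'.
    private static bool IsEntityBody(string text, int start, int end)
    {
        int length = end - start;
        if (length < 1 || length > MaxEntityBodyLength) { return false; }
        for (int i = start; i < end; i++)
        {
            char c = text[i];
            if (!char.IsLetterOrDigit(c) && c != '#') { return false; }
        }
        return true;
    }
}
```

For example, Truncate("Tom &amp;amp; Jerry", 6) would return "Tom" rather than "Tom &amp;a". A single word longer than maxLength is still cut hard, and the sketch does not try to handle HTML tags in the raw content, so both of those would need more thought before using something like this in RebuildIndex.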
Hope it helps,
Joe