Problem with rve.SearchText

Post by **toolwiz** » Sat Dec 17, 2005 11:20 am

I've got the following code:

        tot_kwds := 0;
        for wd_ndx := 0 to cklist.Count-1 do begin
            // 1
            if not CurrFileData.ContainsWord( cklist.Items[wd_ndx] ) then
                continue;

            rve.SetSelectionBounds(0, rve.GetOffsBeforeItem(0), 0, rve.GetOffsBeforeItem(0));
            while rve.SearchText( cklist.Items[wd_ndx], [rvseoDown,rvseoWholeWord]) do begin
                inc( tot_kwds );
                // . . .
            end;
        end;

What I'm basically doing is going through a list of words and looking for each word in the rve using rve.SearchText. If a word is found, I look at its RVData properties and do some things based on them.

I ran into a problem where SearchText got hung up searching for a word that wasn't in the text, so I added a test (below the line marked "// 1") to check whether the word was there at all first. This worked in that case (maybe?).

But after a while, I ran into two situations where SearchText would get hung up searching for a word that DOES exist.

By "hung up" I mean it never returns false. After a few seconds of 100% CPU activity, I put a breakpoint at the line "inc( tot_kwds )" and it invariably shows that tot_kwds is > 10,000, which in the data I'm using is simply impossible -- the input data consists of text files that are 400-800 words in length and there aren't more than a dozen or so instances of any given keyword. Most of the keywords have zero occurrences, and most of the rest have 1 or 2. The word list itself has around 200 entries.

I tried to trace it through and track down the problem, but it seems to be intermittent. So I figured I'd just report the symptoms and ask if there's something I can test within the loop to see if it's finished and I can break out of the loop anyway.

Thanks
David

Post by **toolwiz** » Sun Dec 18, 2005 2:27 am

Actually, it appears that I can test to see if rve.RVData.ItemNo changes from one iteration to the next. If it doesn't, then I break out of the loop. Is that a good idea?

Post by **Sergey Tkachenko** » Sun Dec 18, 2005 2:29 pm

I cannot see why ItemNo would change - your code does not make any modifications in the document.

Try to add the condition

if tot_kwds>1000 then begin
  // display word and document with the selection
end;

Post by **toolwiz** » Sun Dec 18, 2005 7:21 pm

Ok, upon further reflection, I can see why ItemNo might not change and it would be ok. Isn't there anything I can test to see if the internal code isn't going any further?

Checking for some arbitrary number of tot_kwds gives me a useless number for that value -- it is a meaningful value that's used in a statistical calculation later on. That's why I'm counting them.
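
One thing that could be tested without touching the count is whether each hit lies strictly after the previous one, comparing the item number and the offset together rather than the item number alone. The following is only a minimal sketch under assumptions: TDocPos and MovedForward are hypothetical names, and the positions are simulated in a constant array so the program runs on its own. In the real loop they would have to be read from the editor after each successful SearchText (TRichView's GetSelectionBounds looks like the natural source, but check its documentation for the exact parameters).

```pascal
program SearchGuard;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

type
  // Hypothetical helper type: a position in the document
  TDocPos = record
    ItemNo, Offs: Integer;
  end;

// True only if Curr lies strictly after Prev in document order
function MovedForward(const Prev, Curr: TDocPos): Boolean;
begin
  Result := (Curr.ItemNo > Prev.ItemNo) or
            ((Curr.ItemNo = Prev.ItemNo) and (Curr.Offs > Prev.Offs));
end;

const
  // Simulated hit positions (assumption, stands in for the editor);
  // the last entry repeats the previous one, as a stuck search would
  Hits: array[0..3] of TDocPos = (
    (ItemNo: 0; Offs: 5),
    (ItemNo: 0; Offs: 40),
    (ItemNo: 2; Offs: 7),
    (ItemNo: 2; Offs: 7));

var
  prev: TDocPos;
  i, tot_kwds: Integer;
begin
  prev.ItemNo := -1;  // sentinel placed before any real item
  prev.Offs := 0;
  tot_kwds := 0;
  for i := Low(Hits) to High(Hits) do begin
    if not MovedForward(prev, Hits[i]) then
      Break;          // same or earlier position: stop counting
    Inc(tot_kwds);
    prev := Hits[i];
  end;
  WriteLn(tot_kwds);  // prints 3: the repeated hit is not counted
end.
```

Because the check breaks out before the increment, tot_kwds stays usable for the later statistics. It also stops the loop if the search wraps around to the top of the text, since the new position would then be earlier than the previous one.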

Post by **Sergey Tkachenko** » Sun Dec 18, 2005 7:48 pm

An enormous value of this counter lets you detect that something went wrong.
We need to know which word it happened on and what state the document was in, and you can inspect both when the counter reaches 1000, for example.

PS: probably the devil is in // . . .

Post by **toolwiz** » Sun Dec 18, 2005 8:55 pm

The first time I saw this problem, it happened when it was searching for a word that was NOT in the text. The later times it happened looking for a word that WAS in the text. In one case the word was about 2/3 of the way through the text; I didn't check how many instances of it there were. In the second I was able to determine that there was exactly one instance of the word and it was about 15 words from the end of the text. The SearchText function just never moved past it. I couldn't tell if it simply started over at the top of the text buffer and kept finding it, or if it didn't advance past it and stop at the end of the text.

Is there a way to easily parse a text into individual words, like a kind of Split function that would break the text into individual Items? (In my case, I'm only interested in strings consisting of alphanumerics, i.e. common English words.) That way I could simply iterate over the list of items myself.

Any string of alphabetic chars could be tagged with one style, e.g. 0, and everything else could be tagged with another style, e.g. 1. That way, when I iterate through the list of items, I could first check the style and only examine the ones that are known to contain only alphabetic characters.
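
To make the idea concrete, here is a rough sketch of such a split on a plain string, independent of TRichView. All names in it (TRun, SplitIntoRuns, STYLE_WORD, STYLE_OTHER) are made up for illustration, and it treats only ASCII letters and digits as word characters, which is an assumption about the input. It does not create real document items; it only shows the tagging scheme.

```pascal
program SplitRuns;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

const
  STYLE_WORD  = 0;  // run of letters and digits
  STYLE_OTHER = 1;  // spaces, punctuation, everything else

type
  // Hypothetical stand-in for a document item: text plus a style tag
  TRun = record
    Text: string;
    StyleNo: Integer;
  end;
  TRuns = array of TRun;

// ASCII-only test (assumption: plain English input)
function IsWordChar(c: Char): Boolean;
begin
  Result := ((c >= 'A') and (c <= 'Z')) or
            ((c >= 'a') and (c <= 'z')) or
            ((c >= '0') and (c <= '9'));
end;

// Splits s into maximal runs of word / non-word characters
function SplitIntoRuns(const s: string): TRuns;
var
  i, start, n: Integer;
  inWord: Boolean;
begin
  SetLength(Result, 0);
  n := 0;
  i := 1;
  while i <= Length(s) do begin
    start := i;
    inWord := IsWordChar(s[i]);
    // extend the run while the character class stays the same
    while (i <= Length(s)) and (IsWordChar(s[i]) = inWord) do
      Inc(i);
    SetLength(Result, n + 1);
    Result[n].Text := Copy(s, start, i - start);
    if inWord then
      Result[n].StyleNo := STYLE_WORD
    else
      Result[n].StyleNo := STYLE_OTHER;
    Inc(n);
  end;
end;

var
  runs: TRuns;
  k: Integer;
begin
  runs := SplitIntoRuns('The quick, brown fox.');
  // visit only the runs tagged as words
  for k := 0 to High(runs) do
    if runs[k].StyleNo = STYLE_WORD then
      WriteLn(runs[k].Text);
end.
```

Run as is, this prints The, quick, brown and fox on separate lines. Counting keywords would then be a matter of comparing each word run against the list, with no search position to get stuck on.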