the large file challenge
Sadhunathan Nadesan
sadhu at castandcrew.com
Sun Nov 10 14:04:01 EST 2002
|
| I'm pretty sure the speed problem here comes from reading in the entire
| file. Unless of course you have enough free RAM, but that's hard to
| imagine when the files are 300MB+.
|
| How about this, which you can adjust to read any given number of lines
| at a time. Try it with 10, 1000, 10000, etc., and see what gives you the
| best performance! It hasn't been tested, but hopefully it'll run with a
| tweak or less.
|
| #!/usr/local/bin/mc
| on startup
| ## initialize variables: try adjusting numLines
| put "/gig/tmp/log/xaa" into the_file
| put 1000 into numLines
| put 0 into counter
| put false into isEOF
|
| open file the_file for read
|
| repeat until (isEOF = TRUE)
| ## read the specified number of lines, then check whether we have
| ## reached the end of the file
| read from file the_file for numLines lines
| put it into thisChunk
| put (the result = "eof") into isEOF
|
| ## count the number of matches in this chunk
| ## (note: a match that spans two chunks will be missed)
| put offset("mystic_mouse", thisChunk) into theOffset
| repeat until (theOffset = 0)
| add 1 to counter
| put offset("mystic_mouse", thisChunk, theOffset) into tempOffset
| if (tempOffset > 0) then add tempOffset to theOffset
| else put 0 into theOffset
| end repeat
|
| end repeat
|
| close file the_file
|
| put counter
| end startup
|
| HTH,
| Brian
-------------------
Hey Brian, thanks, excellent.
I tried it with 10, 1000, and 10000, and it got slightly faster (by just a
few seconds) with each increase, so I'll leave it at 10000, compare it
against the other suggested algorithms, and let everyone know the results.
Sadhu
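
For readers outside MetaTalk, the same chunked-scan idea can be sketched in
Python. The path, search string, and chunk size below are illustrative, not
from the original script; this sketch also carries a short tail between
chunks so a match split across a chunk boundary is still found, which the
line-based script above does not attempt:

```python
def count_occurrences(path, needle, chunk_size=1 << 20):
    """Count non-overlapping occurrences of needle in a large file,
    reading fixed-size chunks so the whole file never sits in RAM."""
    count = 0
    tail = ""  # last len(needle)-1 chars of the previous chunk, kept so
               # a match split across a chunk boundary is still found
    with open(path, "r", errors="replace") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # empty read means end of file
                break
            data = tail + chunk
            count += data.count(needle)
            # keep just enough of this chunk's end to complete a match
            tail = data[-(len(needle) - 1):] if len(needle) > 1 else ""
    return count

# e.g. count_occurrences("/gig/tmp/log/xaa", "mystic_mouse")
```

One caveat: because the tail is re-scanned, a needle that can overlap
itself (like "aba") may be counted once extra at a boundary; a string
such as "mystic_mouse" has no such self-overlap, so the count is exact.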
More information about the metacard mailing list