SEX contributions anyone
Ben Rubinstein
benr_mc at cogapp.com
Thu Sep 4 13:06:00 EDT 2003
on 4/9/03 5:17 pm, MisterX wrote:
> Has anyone got a "Fast" remove duplicate lines script?
> Best I get is 12ms per line... Any line being a word.
If split worked the way I think it should (and still could, quite
compatibly; perhaps I'll file in Bugzilla a suggestion I made long ago),
then split/combine would probably do this almost instantaneously.
Even without that, I've found Rev/MC's hashed arrays fantastically
efficient. Have you tried simply:
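    -- use the array keys as a set: each distinct line becomes one key
    put empty into aTemp
    repeat for each line t in tManyLines
      put true into aTemp[t]
    end repeat
    -- the keys come back one per line, in no particular order
    put the keys of aTemp into tFewerLines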
Of course that will lose the order, but I'd expect it to be very fast. If
you want to keep sequence (first appearance) then
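    put empty into tFewerLines
    put empty into aTemp
    repeat for each line t in tManyLines
      -- an unset element reads as empty, so this is true only on first sight
      if aTemp[t] = empty then
        put t & return after tFewerLines
        put true into aTemp[t]
      end if
    end repeat
    delete last char of tFewerLines -- drop the trailing return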
should work, albeit a bit more slowly.
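
If this is needed in more than one place, the order-preserving loop can be wrapped in a function. The following is only a sketch: the names uniqueLines, pLines, tSeen and tResult are made up for illustration and are not part of any library.

```livecode
-- Hypothetical helper: returns pLines with duplicate lines removed,
-- keeping the first appearance of each line in its original position.
function uniqueLines pLines
  local tSeen, tResult
  repeat for each line tLine in pLines
    -- an element that has never been set reads as empty
    if tSeen[tLine] is empty then
      put tLine & return after tResult
      put true into tSeen[tLine]
    end if
  end repeat
  delete last char of tResult -- drop the trailing return
  return tResult
end uniqueLines
```

It would be called as, for example, "put uniqueLines(tManyLines) into tFewerLines". Keeping the array local to the function means it starts empty on every call, so there is no need to clear it by hand first.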
Ben Rubinstein | Email: benr_mc at cogapp.com
Cognitive Applications Ltd | Phone: +44 (0)1273-821600
http://www.cogapp.com | Fax : +44 (0)1273-728866