Comparing big lists
Gregory Lypny
gregory.lypny at videotron.ca
Sat Apr 27 16:14:01 EDT 2002
Thanks for the suggestion, Scott. I'll give it a shot. I've also tried
looping over the lines of bigList (i.e., a nested repeat) and simply using
the 'in' operator: if x is in y, then... It takes about 6 minutes on a
modest (300 MHz) iBook running OS X, but I'm hoping for an improvement.
Regards,
Greg
On 27/4/2002 12:08 PM, metacard-request at lists.runrev.com wrote:
>Message: 2
>Date: Fri, 26 Apr 2002 12:48:53 -0600 (MDT)
>From: Scott Raney <raney at metacard.com>
>To: metacard at lists.runrev.com
>Subject: Re: Comparing big lists
>Reply-To: metacard at lists.runrev.com
>
>On: Thu, 25 Apr 2002 Gregory Lypny <gregory.lypny at videotron.ca> wrote:
>
>> Thought I would pick your brains on the topic of comparing two big
>> lists. Both are tab delimited. bigList has about 100,000 lines and
>> 6 items (columns) per line. smallList is about 15,000 lines and 2
>> items per line. I want to identify the lines in bigList in which
>> the third item is the same as the second item in a line in
>> smallList, and then pull out the intersection. I used something
>> like this, which works fine.
>
>> set the itemDelimiter to tab
>> repeat for each line j of smallList
>>   put lineOffset(item 2 of j, bigList) into thisLine
>>   if thisLine is not 0 then put j & tab & \
>>       line thisLine of bigList & return after mergedList
>> end repeat
>> delete last character of mergedList -- Get rid of the trailing Return
>
>> Using the lineOffset function seemed the obvious choice to me, but I'm
>> also interested in other approaches.
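For comparison, here is a rough Python transliteration of the lineOffset loop above. It is Python rather than MetaTalk only so it can be run stand-alone, and `line_offset` and `merge_by_scan` are hypothetical names. It makes the cost visible: every lookup walks bigList from its first line, so 15,000 lookups against 100,000 lines can mean up to 1.5 billion line checks. It also assumes lineOffset's default behaviour of matching the string anywhere in a line, not specifically in item 3, which the usage line exercises.

```python
from __future__ import annotations


def line_offset(target: str, lines: list[str]) -> int:
    """Rough stand-in for lineOffset (assumed semantics): 1-based number of
    the first line containing target anywhere in it, or 0 if none does."""
    for number, line in enumerate(lines, start=1):
        if target in line:  # substring test, not a column comparison
            return number
    return 0


def merge_by_scan(small_list: str, big_list: str) -> str:
    """Transliteration of the lineOffset loop: one linear scan per smallList line."""
    big_lines = big_list.splitlines()
    merged = []
    for j in small_list.splitlines():
        items = j.split("\t")
        if len(items) < 2 or not items[1]:
            continue  # skip lines without a usable item 2
        this_line = line_offset(items[1], big_lines)
        if this_line != 0:
            merged.append(j + "\t" + big_lines[this_line - 1])
    return "\n".join(merged)  # no trailing newline left to delete


# "AB" is item 3 of line 2, but line 1 is returned because "AB" also
# appears there (as item 2).
print(repr(merge_by_scan("s1\tAB", "b1\tAB\tQQ\nb2\tx\tAB")))  # 's1\tAB\tb1\tAB\tQQ'
```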
>
>LineOffset on such a big variable is going to be pretty expensive.
>Another option would be to use split to build an array out of smallList
>and then loop over each line in bigList and see if there is an array
>index for it. Split takes a while and will use up a good bit of
>memory, but makes the lookups *much* faster. You could save some of
>that space by building up an array of just the relevant items in one
>list or the other by looping over the lines and creating one array
>index for each.
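A minimal sketch of the array-lookup suggestion, written in Python for a stand-alone run, with a dict standing in for the MetaCard array; `merge_lists` and the sample data are hypothetical. smallList is indexed once by item 2 (built by looping over the lines rather than with split), then bigList is read in a single pass and item 3 of each line is looked up in that index, so the work grows roughly with the sum of the two list lengths rather than their product.

```python
def merge_lists(small_list: str, big_list: str) -> str:
    """Return smallList/bigList line pairs where item 2 of the smallList line
    equals item 3 of the bigList line (tab-delimited, newline-separated)."""
    # Index smallList by its key column (item 2 -> Python index 1).
    # As with assigning into an array element, a later duplicate key
    # overwrites an earlier one.
    small_by_key = {}
    for line in small_list.splitlines():
        fields = line.split("\t")
        if len(fields) >= 2:
            small_by_key[fields[1]] = line

    # One pass over bigList; each lookup is a hash probe, not a scan.
    merged = []
    for line in big_list.splitlines():
        fields = line.split("\t")
        if len(fields) >= 3 and fields[2] in small_by_key:
            merged.append(small_by_key[fields[2]] + "\t" + line)
    # Joining with "\n" leaves no trailing Return to delete.
    return "\n".join(merged)


small = "s1\tAAA\ns2\tBBB\ns3\tZZZ"
big = "b1\tx\tBBB\tq\tr\ts\nb2\tx\tCCC\tq\tr\ts\nb3\tx\tAAA\tq\tr\ts"
print(merge_lists(small, big))
```

Unlike the lineOffset loop, this reports every bigList line whose item 3 matches (not only the first), compares the key column exactly, and emits results in bigList order. If smallList repeated a key, only its last line would be kept.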
> Regards,
> Scott
>
>> Regards,
>> Greg
>
>********************************************************
>Scott Raney raney at metacard.com http://www.metacard.com
>MetaCard: You know, there's an easier way to do that...