in_array is quite slow
So, anyhow, we have this huge array of ids accumulated during the import. An IN clause with 2 million parts would suck. So, we pull back all the ids that exist in the database and stick them into an array. We then compared the two arrays by looping over one array and using in_array() to check whether each value was in the second array. Here is a pseudo example that shows the idea:
[sourcecode language='php']
foreach ($arr1 as $key => $i) {
    if (in_array($i, $arr2)) {
        unset($arr1[$key]);
    }
}
[/sourcecode]
So, that was running for hours with about 400k items. Our data did not use the value as the array key, but it could, since each value was unique. So, I added it. Now the code looks like:
[sourcecode language='php']
foreach ($arr1 as $key => $i) {
    if (isset($arr2[$i])) {
        unset($arr1[$key]);
    }
}
[/sourcecode]
Yeah, that runs in 0.8 seconds. Much better.
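The gap comes from how each check works: in_array() walks the second array from the start on every call, while isset() on a key is a single hash lookup. Here is a rough timing sketch of the two loops side by side. The data is synthetic and much smaller than the real import, and the variable names are made up for illustration, so treat the numbers as relative only.

```php
<?php
// Synthetic ids; the sizes are arbitrary and far smaller than the real import.
$arr1 = range(1, 5000);
$arr2 = range(2500, 7500);

// Version 1: in_array() scans $arr2 linearly on every iteration.
$slow = $arr1;
$start = microtime(true);
foreach ($slow as $key => $i) {
    if (in_array($i, $arr2)) {
        unset($slow[$key]);
    }
}
$slowSeconds = microtime(true) - $start;

// Version 2: the values of $arr2 become keys, so each check is one hash lookup.
$lookup = array_fill_keys($arr2, true);
$fast = $arr1;
$start = microtime(true);
foreach ($fast as $key => $i) {
    if (isset($lookup[$i])) {
        unset($fast[$key]);
    }
}
$fastSeconds = microtime(true) - $start;

printf("in_array: %.4fs, isset: %.4fs\n", $slowSeconds, $fastSeconds);
```

Both loops leave the same elements behind; only the cost of each membership check differs.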
So, why were we using in_array to start with if in_array is clearly not the right solution to this problem? Well, it was basic code evolution. Originally, these imports would be maybe 100 items. But, things changed.
FWIW, I tried array_diff() as well. It took 25 seconds. Way better than looping and calling in_array, but still not as quick as a simple isset check. There was refactoring needed to put the values into the keys of the array.
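For the curious, here is a minimal sketch of what "putting the values into the keys" can look like. The sample ids and variable names are hypothetical, not the real import code. It assumes the ids are integers or strings, since those are the only types PHP allows as array keys. Once both arrays are keyed by id, array_diff_key() is another option, because it compares keys only.

```php
<?php
// Hypothetical sample ids standing in for the two real arrays.
$arr1 = [100, 101, 102, 103, 104];
$arr2 = [101, 102, 103];

// Key each array by id. In an import loop this could happen as the ids
// are collected, e.g. $seen[$id] = true; instead of $seen[] = $id;
$keyed1 = array_fill_keys($arr1, true);
$keyed2 = [];
foreach ($arr2 as $id) {
    $keyed2[$id] = true;
}

// array_diff_key() keeps the entries of $keyed1 whose keys are not in $keyed2.
$remaining = array_keys(array_diff_key($keyed1, $keyed2));

print_r($remaining);
```

With this sample data, $remaining holds 100 and 104, the ids that appear only in the first array.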
UPDATE: I updated this post to properly reflect that there is nothing wrong with in_array, but simply that it was not the right solution to this problem. I wrote this late and did not properly express this. Thanks to all those people in the comments that helped explain this.
Johny Says:
Well, why do you want to compare the data in PHP anyway?
You said you would like to disable any data you did not update.
Why do it in PHP if you have a database?
Give the updated entries an update flag or timestamp,
then:
UPDATE your_table SET active_flag='0' WHERE update_timestamp < 'last_update_round'
I think it's the smarter solution.
The best way to generate the update timestamp is to take it once at the start of the cron job, so there is one time for all updates of one run.