PHP Appalachia Corrections

I finally got home from PHP Appalachia. I enjoyed meeting all the great people.

I presented on how we import large amounts of CSV data into MySQL and what I learned along the way. I threw my idea onto the wiki at the last minute and made the slides while everyone ate breakfast. I had planned on researching it all first (it has been a few years since I wrote the code), but we had no reliable internet. Here are some claims I made, along with their corrections.

I said our largest file is about 1.8 million lines. WRONG. It is actually about 4.6 million. I was correct, however, that it finishes importing and indexing in about 5 minutes.
I claimed that I LOAD DATA INFILE into a MyISAM table first and then "insert into ... select from" into an InnoDB table for speed reasons. WRONG. In fact, I do that because I sometimes need to merge several fields from the file into one field in the database, and I could not find a way to do that with LOAD DATA INFILE. As for speed, I can't say either way because I have no solid data. Sounds like a good test. Based on my experience, MyISAM probably still wins on a LOAD DATA INFILE into a blank, fresh table.
The total number of rows currently indexed is 7.2 million. I did not make a claim about this, but I wanted to include it in the talk and could not because I did not have Internet. (Damn you Hughes)
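
For anyone who missed the talk, the two-step import described above looks roughly like the sketch below. This is not my production code. The table names, column names and file path are all made up for illustration, and it assumes an InnoDB table named products already exists.

```sql
-- Hypothetical names throughout: import_staging, products, sku, title,
-- subtitle, name and the file path are placeholders, not the real schema.

-- Step 1: a blank MyISAM staging table whose columns mirror the CSV columns.
CREATE TABLE import_staging (
    sku      VARCHAR(32),
    title    VARCHAR(255),
    subtitle VARCHAR(255)
) ENGINE=MyISAM;

-- Step 2: bulk load the raw file, one CSV column per staging column.
LOAD DATA INFILE '/tmp/products.csv'
INTO TABLE import_staging
FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
LINES TERMINATED BY '\n'
(sku, title, subtitle);

-- Step 3: copy into the InnoDB table (assumed to exist already),
-- merging two file fields into one database field along the way.
-- CONCAT_WS skips NULL arguments, so a missing subtitle is harmless.
INSERT INTO products (sku, name)
SELECT sku, CONCAT_WS(' ', title, subtitle)
FROM import_staging;
```

The merge happens in the SELECT, which is the whole reason for the staging table. Once the copy is done, the staging table can be dropped or truncated for the next file.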

1 comment

David Weingart Says: