I finally got home from PHP Appalachia.  I enjoyed meeting all the great people.

I presented on what I have learned about importing large amounts of CSV data into MySQL and how we handle it.  I threw my idea onto the wiki at the last minute and made the slides while everyone ate breakfast.  I had planned on researching it all first (it has been a few years since I wrote the code), but we had no reliable Internet.  Here are some claims I made, along with their corrections.

  1. I said our largest file is about 1.8 million lines.  WRONG.  It is actually about 4.6 million.  I was correct, however, that it finishes importing and indexing in about 5 minutes.

  2. I claimed I LOAD DATA INFILE into a MyISAM table first and then "insert into ... select from" into an InnoDB table for speed reasons.  WRONG.  In fact, I do that because I sometimes need to merge several fields from the file into one field in the database, and I could not find a way to do that with LOAD DATA INFILE.  As for speed, I can't say either way because I have no solid data.  Sounds like a good test.  Based on my experience, MyISAM probably still wins on a LOAD DATA INFILE into a blank, fresh table.

  3. The total number of rows currently indexed is 7.2 million.  I did not make a claim about this, but I thought I would mention it.  I wanted to include it in the talk, but did not have Internet.  (Damn you, Hughes)
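
For anyone wondering what the two-step import in item 2 looks like, here is a minimal sketch of the pattern, not the actual production code.  It uses Python with an in-memory SQLite database as a stand-in so it runs anywhere.  In MySQL the staging load would be a LOAD DATA INFILE into the MyISAM table, and the merge would use CONCAT() in the "insert into ... select from" step.  The table names, column names, and sample data are all hypothetical.

```python
import csv
import io
import sqlite3

# Hypothetical sample file: the phone number arrives split across two columns.
CSV_TEXT = "name,area_code,number\nAlice,304,5550100\nBob,540,5550199\n"


def import_with_staging(csv_text):
    """Load CSV rows into a staging table, then merge fields into the final table."""
    conn = sqlite3.connect(":memory:")

    # Staging table mirrors the file layout, one column per CSV field.
    # (Stand-in for the MyISAM table that LOAD DATA INFILE would fill.)
    conn.execute("CREATE TABLE staging (name TEXT, area_code TEXT, number TEXT)")

    # Final table stores the merged field.
    # (Stand-in for the InnoDB table.)
    conn.execute("CREATE TABLE contacts (name TEXT, phone TEXT)")

    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip the header row
    conn.executemany("INSERT INTO staging VALUES (?, ?, ?)", reader)

    # The merge step: two file fields become one database field.
    # SQLite concatenates with ||; MySQL would use CONCAT(area_code, '-', number).
    conn.execute(
        "INSERT INTO contacts (name, phone) "
        "SELECT name, area_code || '-' || number FROM staging"
    )

    conn.execute("DROP TABLE staging")
    conn.commit()
    return conn


conn = import_with_staging(CSV_TEXT)
rows = conn.execute("SELECT name, phone FROM contacts ORDER BY name").fetchall()
print(rows)
```

The staging table keeps the load step dumb and fast, and all the reshaping happens in a single SQL statement afterwards.  Whether this is faster or slower than loading straight into the final table is exactly the open question from item 2, so treat this as an illustration of the shape of the approach and nothing more.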