PHP Appalachia Corrections
Tue, Oct 14, 2008 11:03 PM
Just got home finally from PHP Appalachia. I enjoyed meeting
all the great people.
I presented about what I learned and how we deal with importing large amounts of CSV data into MySQL. I threw my idea onto the wiki at the last minute, made the slides while everyone ate breakfast and I had planned on researching it all (been a few years since I wrote it), but we had no reliable internet. Some claims I made and their corrections.
I presented about what I learned and how we deal with importing large amounts of CSV data into MySQL. I threw my idea onto the wiki at the last minute, made the slides while everyone ate breakfast and I had planned on researching it all (been a few years since I wrote it), but we had no reliable internet. Some claims I made and their corrections.
-
- I said our largest file is about 1.8 million lines. WRONG. Actually it is about 4.6 million. I was correct however that it does finish importing and indexing in about 5 minutes.
-
- I claimed I LOAD DATA INFILE to MyISAM first and then "insert into ... select from" into an InnoDB table for speed reasons. WRONG. In fact, I do that because I need to merge fields from the file sometimes into one field in the databaes. I could not find a way to do that with LOAD DATA INFILE. As to speed. I can't say either way as I have no solid data. Sounds like a good test. MyISAM probably still wins on a LOAD DATA INFILE into a blank, fresh table based on my experience.
-
- Total rows currently indexed is 7.2 million. I did not make a claim, but I thought I would just mention that. I wanted to include that, but did not have Internet. (Damn you Hughes)
-

David Weingart Says:
Regarding #2, I've recently been working on a project that loads CSV data from various sources into MySQL, and I discovered that starting with version 5 you can use user variables to customize how columns are imported. You are able to discard columns or merge columns on the fly.<br />
<br />
Check the manual for LOAD DATA INFILE for more details.