PHP Appalachia Corrections
Tue, Oct 14, 2008 11:03 PM
Just got home finally from PHP Appalachia. I enjoyed meeting
all the great people.
I presented about what I learned and how we deal with importing
large amounts of CSV data into MySQL. I threw my idea onto
the wiki at the last minute, made the slides while everyone ate
breakfast and I had planned on researching it all (been a few years
since I wrote it), but we had no reliable internet. Some
claims I made and their corrections.
-
- I said our largest file is about 1.8 million lines.
WRONG. Actually it is about 4.6 million. I was
correct however that it does finish importing and indexing in
about 5 minutes.
-
- I claimed I LOAD DATA INFILE to MyISAM first and then
"insert into ... select from" into an InnoDB table for speed
reasons. WRONG. In fact, I do that because I need
to merge fields from the file sometimes into one field in the
databaes. I could not find a way to do that with LOAD
DATA INFILE. As to speed. I can't say either way as
I have no solid data. Sounds like a good test.
MyISAM probably still wins on a LOAD DATA INFILE into a blank,
fresh table based on my experience.
-
- Total rows currently indexed is 7.2 million. I did
not make a claim, but I thought I would just mention
that. I wanted to include that, but did not have
Internet. (Damn you Hughes)
-
Comments are disabled for this post.
David Weingart Says:
Regarding #2, I've recently been working on a project that loads CSV data from various sources into MySQL, and I discovered that starting with version 5 you can use user variables to customize how columns are imported. You are able to discard columns or merge columns on the fly.
Check the manual for LOAD DATA INFILE for more details.