mysqldump-to-csv
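https://stackoverflow.com/a/28168617/895245 pointed me to https://github.com/jamesmishra/mysqldump-to-csv, which semi-hackily converts the dumps to CSV and so lets one bypass the MySQL slowness. I pin a known commit and patch it so that invalid UTF-8 in the input is ignored rather than aborting the run: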
git clone https://github.com/jamesmishra/mysqldump-to-csv
cd mysqldump-to-csv
git checkout 24301dfa739c13025844ed3ff5a8abe093ced6cc
patch <<'EOF'
diff --git a/mysqldump_to_csv.py b/mysqldump_to_csv.py
index b49cfe7..8d5bb2a 100644
--- a/mysqldump_to_csv.py
+++ b/mysqldump_to_csv.py
@@ -101,7 +101,8 @@ def main():
     # listed in sys.argv[1:]
     # or stdin if no args given.
     try:
-        for line in fileinput.input():
+        sys.stdin.reconfigure(errors='ignore')
+        for line in fileinput.input(encoding="utf-8", errors="ignore"):
             # Look for an INSERT statement and parse it.
             if is_insert(line):
                 values = get_values(line)
EOF
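The patch relies on Python's `errors="ignore"` decoding mode (the `encoding` and `errors` arguments of `fileinput.input` need Python 3.10 or newer, and `sys.stdin.reconfigure` needs 3.7 or newer). As a minimal sketch of what that mode does, using a made-up byte string rather than real dump data:

```python
# Made-up sample: valid UTF-8 text with one invalid byte (0xff) in the middle.
raw = b"caf\xc3\xa9 \xff ok"

# Strict decoding (the default) raises on the invalid byte.
try:
    raw.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# errors="ignore" silently drops the invalid byte and keeps everything else.
cleaned = raw.decode("utf-8", errors="ignore")
print(strict_failed, repr(cleaned))
```

The trade-off is that the offending bytes are silently lost, which is acceptable here only because a few mangled characters matter less than getting through the whole dump.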
Then I convert the compressed dump. csvtool (as per "How to extract one column of a csv file") and awk can be appended to the pipeline to filter the columns:
zcat enwiki-latest-page.sql.gz | python mysqldump-to-csv/mysqldump_to_csv.py
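The converter writes ordinary CSV to stdout, so its output can also be post-processed from Python when quoted fields make line-oriented tools awkward. A rough sketch, where `select_columns` is a hypothetical helper and the column positions are assumptions that should be checked against the `CREATE TABLE` statement in the dump:

```python
import csv


def select_columns(lines, indices):
    """Yield the chosen 0-based columns from each CSV row in `lines`."""
    for row in csv.reader(lines):
        # Skip rows that are too short instead of raising IndexError.
        if len(row) > max(indices):
            yield [row[i] for i in indices]


# Made-up sample rows; in real use, pass sys.stdin instead of this list.
sample = ['1,0,"Foo, bar"', "2,0,Baz"]
for fields in select_columns(sample, [0, 2]):
    print(fields)
```

Using `csv.reader` rather than splitting on commas keeps fields such as `"Foo, bar"` intact.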
You can then use the CSV with several standard tools like csvtool, e.g.: