Thursday 22 August 2013

Howto: Extract all email addresses from Google Contacts

Let's say you want to extract all the email addresses from your contacts in Google Contacts and export them somewhere else. Yes, you can just export the CSV file and upload it; the other service will do the import and cleanup for you. But if you only want to share the email addresses, why give them other info that they are not supposed to have?

OK. Here are the steps to extract only the email addresses using the command line. I'm using Cygwin, but it should be similar if you are using Linux or another Unix-based operating system.

Follow these steps:

1. Select Contacts in Gmail


2. Your contacts will be shown. Click More > Export


3. Select All Contacts and Google CSV format, then click Export


4. Save the file somewhere. I saved it as C:\google.csv . If you are using Cygwin, the file is accessible using the path /cygdrive/c/google.csv

5. Now comes the interesting part: extracting the email addresses. The data is separated by commas, and we don't really know which columns hold the email addresses. So we must iterate over all the columns and extract anything that resembles an email address. Here is the command:

$ grep @ google.csv | awk -F, '{for(i=1;i<=NF;i++) if ($i ~ /@.*\./) {printf "%s\n", $i};}' | awk -F" ::: " '{for (i=1;i<=NF;i++) {print $i};}' | sort | uniq

6. Let's break the command apart:

7. grep @ google.csv

This command selects the lines that contain the "@" character. Each resulting line still holds multiple columns, separated by commas ",".

8. awk -F, '{for(i=1;i<=NF;i++) if ($i ~ /@.*\./) {printf "%s\n", $i};}'

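This command tells awk that the field separator is a comma (-F,). It loops through all the fields (for(i=1;i<=NF;i++)). If a field matches a rough email pattern ($i ~ /@.*\./), meaning an "@" followed somewhere later by a dot, it prints that field. Note that this simple split does not honor CSV quoting, so a quoted field that itself contains a comma gets cut into pieces.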

9. awk -F" ::: " '{for (i=1;i<=NF;i++) {print $i};}'

Some of the fields will hold multiple email addresses separated by " ::: ", because the export groups those addresses together. This command splits each line using the " ::: " separator (-F" ::: "), then loops through the resulting fields and prints each one on its own line.

10. sort

This command sorts the output. The sort is needed before the next step, because uniq only removes duplicate lines that are adjacent.

11. uniq

This command removes the duplicates.
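
To see the whole pipeline at work, here is a minimal sketch run against a tiny made-up file. The file name sample.csv, its column headers, and the addresses are my assumptions for illustration only; a real Google CSV export has many more columns.

```shell
# Hypothetical sample data, not a real Google export.
cat > sample.csv <<'EOF'
Name,Notes,E-mail 1 - Value,E-mail 2 - Value
Alice,,alice@example.com,
Bob,no address on file,,
Carol,,carol@example.com ::: carol@example.org,alice@example.com
EOF

# Same pipeline as above, split over several lines for readability.
emails=$(grep @ sample.csv \
  | awk -F, '{for(i=1;i<=NF;i++) if ($i ~ /@.*\./) {printf "%s\n", $i};}' \
  | awk -F" ::: " '{for (i=1;i<=NF;i++) {print $i};}' \
  | sort | uniq)

# Print the result, one address per line.
printf '%s\n' "$emails"
```

The header line and Bob's line are dropped by grep because they contain no "@". Carol's grouped field is split in two by the second awk, and the repeated alice@example.com is collapsed by sort and uniq, so three distinct addresses remain.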

12. In the end you will get a list of emails. But you must understand that the output might not be 100% clean. Some of your contacts might have their name stored together with their email address, or the notes area of a contact might contain additional text that resembles an email address. You will need to clean up the output, but the effort should be small.
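
Part of that cleanup can probably be automated. The following is only a rough sketch, assuming your grep supports the -o option (GNU grep on Cygwin and Linux does). The file name raw.txt and its contents are made up for illustration, and the pattern is deliberately loose; it is not a full email address validator.

```shell
# Hypothetical noisy output from the extraction step.
cat > raw.txt <<'EOF'
alice@example.com
Bob Smith <bob@example.org>
"carol@example.com"
see http://example.com/@page.html
EOF

# -E enables extended regular expressions; -o prints only the matching
# part of each line instead of the whole line.
# sort -u sorts and removes duplicates in one step.
clean=$(grep -Eo '[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}' raw.txt | sort -u)

printf '%s\n' "$clean"
```

This strips the surrounding name, angle brackets, and quotes, and it skips the URL line because nothing address-like sits directly in front of its "@". You should still eyeball the result, since a loose pattern like this can keep things that are not real addresses.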