Sunday, 28 January 2018

How to get only unique rows from a CSV (comparing only some of the columns)?

I'm wondering if there's a simple approach to this problem:

I have a CSV file with 30+ columns to be imported into a database. There will be lots of rows (10,000+). I need to filter out duplicates, but taking only a few of the columns into account.

So, having for example:

email;name;surname;male
1@example.com;name1;surname1;woman
1@example.com;name1;surname2;woman
2@example.com;name2;surname2;man
3@example.com;name3;surname6;woman
3@example.com;name3;surname1;man
4@example.com;name7;surname2;man
4@example.com;name4;surname1;man
4@example.com;name4;surname2;man

I need to keep only the first row for each distinct combination of the 'email' and 'name' columns. So the expected result is:

   [['email' => '1@example.com', 'name' => 'name1', 'surname' => 'surname1', 'male' => 'woman'],
    ['email' => '2@example.com', 'name' => 'name2', 'surname' => 'surname2', 'male' => 'man'],
    ['email' => '3@example.com', 'name' => 'name3', 'surname' => 'surname6', 'male' => 'woman'],
    ['email' => '4@example.com', 'name' => 'name7', 'surname' => 'surname2', 'male' => 'man'],
    ['email' => '4@example.com', 'name' => 'name4', 'surname' => 'surname1', 'male' => 'man']]

I'd be really grateful if you could share your ideas.
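
One possible approach, sketched here in Python since the question does not name a language: read the file row by row, build a key from the columns that matter, and keep a row only if its key has not been seen before. The function name `unique_rows` and the in-memory `sample` string are illustrative assumptions; the same seen-set idea carries over to an associative array keyed on the joined column values in other languages.

```python
import csv
import io


def unique_rows(lines, key_columns, delimiter=";"):
    """Yield the first row seen for each distinct combination of key_columns.

    `lines` is any iterable of text lines, e.g. a file opened with
    open(path, newline="") or an io.StringIO (hypothetical helper, not
    from the original post).
    """
    reader = csv.DictReader(lines, delimiter=delimiter)
    seen = set()  # holds only the key tuples, not whole rows
    for row in reader:
        key = tuple(row[col] for col in key_columns)
        if key not in seen:
            seen.add(key)
            yield row


# The sample data from the question, held in memory for the demonstration.
sample = """email;name;surname;male
1@example.com;name1;surname1;woman
1@example.com;name1;surname2;woman
2@example.com;name2;surname2;man
3@example.com;name3;surname6;woman
3@example.com;name3;surname1;man
4@example.com;name7;surname2;man
4@example.com;name4;surname1;man
4@example.com;name4;surname2;man
"""

rows = list(unique_rows(io.StringIO(sample), ["email", "name"]))
for row in rows:
    print(dict(row))
```

Because the function is a generator and the set stores only the key tuples, the whole file never has to sit in memory at once, which should be comfortable for 10,000+ rows with 30+ columns.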



via Chebli Mohamed
