OSDir


[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Insert from Select - CQL


I’ve never been a big fan of the “COPY” statement.

My preference for stuff like this (though I am definitely in the minority I think!) — particularly for the amount of data you’re talking about — is to use the open source tool “cassandradump” — which is similar to mysqldump but for cassandra.

Original:

My version which changes a number of things to make it a little easier to work with (and more mysqldump-like):

- Updated for Python 3 (only works on Python 3; tested on Python 3.5)
- Added progress bar
- Added default reading/writing to stdin/stdout
- Added ability to prompt for password
- --protocol-version no longer required for C* 2.x+
- Added capability to export/import into a different keyspace
- Split commandline args into two subparsers "import" and "export"
- Added --truncate option
- Added shorter CLI params, similar to mysqldump (-h for --host, -u for --username, -p for --password, etc)

- Max

On Oct 25, 2018, at 5:07 am, Philip Ó Condúin <philipoconduin@xxxxxxxxx> wrote:

Hi Alain,

That is exactly what I did yesterday in the end.  I ran the selects and output the results to a file, I ran some greps on that file to leave myself with just the data rows removing any white space and headers.
I then copied this data into a notepad on my local machine and saved it as a csv.  Luckily the results of the selects were delimited by pipe "|" so I imported the csv into a spreadsheet and was able to separate the values into columns.

From here I was able to build up the insert statements and now have 4K insert statements as a backup.

Thanks a lot for your reply.

Kind regards,
Phil

On Thu, 25 Oct 2018 at 11:59, Alain RODRIGUEZ <arodrime@xxxxxxxxx> wrote:

Does anyone have any ideas of what I can do to generate inserts based on primary key numbers in an excel spreadsheet?

A quick thought:

What about using a column of the spreadsheet to actually store the SELECT result and generate the INSERT statement (and I would probably do the DELETE too) corresponding to each row using the power of the spreadsheet to write the query once and have it for all the partitions with the proper values?

The spreadsheet would then be your backup somehow.

We are a bit far from any Cassandra advice, but that's my first thought on your problem, use the spreadsheet :).
Another option is probably to SELECT these rows and INSERT them into some other Cassandra table (same cluster or not). Here you would have to code it I think (client app of any kind)
This might not a good fit, but just in case, you might want to check at the 'COPY' statement: https://stackoverflow.com/questions/21363046/how-to-select-data-from-a-table-and-insert-into-another-table
I'm not too sure what suits you the best.

C*heers,
-----------------------
Alain Rodriguez - alain@xxxxxxxxxxxxxxxxx
France / Spain

The Last Pickle - Apache Cassandra Consulting

Le mer. 24 oct. 2018 à 12:46, Philip Ó Condúin <philipoconduin@xxxxxxxxx> a écrit :
Hi All,

I have a problem that I'm trying to work out and can't find anything online that may help me.

I have been asked to delete 4K records from a Column Family that has a total of 1.8 million rows.  I have been given an excel spreadsheet with a list of the 4K PRIMARY KEY numbers to be deleted.  Great, the delete will be easy anyway.

But before I delete them I want to take a backup of what I'm deleting before I do, so that if the customer comes along and says they got the wrong numbers then I can quickly restore one or all of them.
I have been trying to figure out how I can generate inserts from a select but it looks like this is not possible.

I'm using centos and Cassandra 2.11

Does anyone have any ideas of what I can do to generate inserts based on primary key numbers in an excel spreadsheet?

Kind Regards,
Phil




--
Regards,
Phil