I think that something unique along the lines of `REGISTER EXTERNAL DATA SOURCE` is probably fine, as it doesn't conflict with existing behaviors of other dialects.
> There is a lot of value in making sure our common operations closely map to the equivalent common operations in other SQL dialects.
We're trying to make opposite points using the same arguments :) A lot of popular dialects distinguish between CREATE TABLE and CREATE EXTERNAL TABLE (or similar):
My understanding is that the behavior of CREATE TABLE is broadly similar in all of the above dialects: from a high-level perspective, it usually creates a persistent table in the current storage context (database). That's not what Beam SQL's CREATE TABLE does right now, and in my opinion it should not be called CREATE TABLE for this reason.
> I think users will be more confused to find that 'CREATE TABLE' doesn't exist than to learn that it might not always create a table.
I think that having CREATE TABLE do something unexpected, or not do something expected (or do opposite things depending on the table type or some flag), is worse than having users look up the correct way of creating a data source in Beam SQL without expecting something we don't promise.
> (For example, a user guessing at the syntax of CREATE TABLE would have a better experience with the error being "field LOCATION not specified" rather than "operation CREATE TABLE not found".)
They have to look it up anyway (what format does LOCATION take for a Pubsub topic? Or is it a subscription?), and when doing so I think it would be less confusing to read that, to get data from Pubsub/Kafka/... in Beam SQL, you have to use something like `REGISTER EXTERNAL DATA SOURCE` rather than `CREATE TABLE`.
External tables and schemas don't have a standard approach, and I don't have a strong preference among the options above.
Adding dev@ back now.
Did we drop the dev list from this on purpose? (I haven't added it back, but we probably should.)
I'm in favor of sticking with the simple 'CREATE TABLE' and 'CREATE SCHEMA' if there is only to be one option. Sticking with those names minimizes both our deviation from other implementations and user surprise. There is a lot of value in making sure our common operations closely map to the equivalent common operations in other SQL dialects. I think users will be more confused to find that 'CREATE TABLE' doesn't exist than to learn that it might not always create a table. This minimizes the overhead of learning our dialect of SQL and maximizes the odds that a user will be able to guess at the syntax of something and have it work. (For example, a user guessing at the syntax of CREATE TABLE would have a better experience with the error being "field LOCATION not specified" rather than "operation CREATE TABLE not found".)
If the goal is clarity of the operation, how about 'REGISTER EXTERNAL DATA SOURCE' and 'REGISTER EXTERNAL DATA SOURCE PROVIDER'? Those names remove the ambiguity about whether the operation creates anything and whether the data source is a table.
My preference is to make `EXTERNAL` mandatory and only support `CREATE EXTERNAL TABLE` for existing semantics. My main reasons are:
- user friendliness, matching expectations, readability. Current `CREATE TABLE` is basically a `CREATE EXTERNAL TABLE`. It is confusing to users familiar with SQL who expect that `CREATE TABLE` will actually create a table;
- forward-compatibility. We could potentially support non-external `CREATE TABLE` at some point in the future, whatever semantics it might have. It would be wrong to use the same syntax for external and non-external CREATEs;
I agree that typing an extra word each time is not ideal, but I'm of the opinion that readability of code (including SQL) is important (think how much time you spend reading and understanding code vs. writing it), and we should try to improve it if we can. In the case of DDL, every non-trivial statement will already have a ton of unavoidable words (field names, types, location, options), so I would argue that adding one extra word would not noticeably reduce your happiness in writing it :) But it would improve readability and reduce ambiguity, which I think is worth it.
I think that making it optional only introduces more confusion (e.g. what's the difference between the two DDL statements, without reading the docs?) and would make the situation worse.
I prefer `CREATE EXTERNAL TABLE`. My question is: do you plan to support both `CREATE TABLE` and `CREATE EXTERNAL TABLE` by making `EXTERNAL` optional?