BigQuery's user-friendly SQL: Elevating analytics, data quality, and security
SQL is used daily by approximately 7 million people worldwide to manage and analyze data. Whether you are a data engineer or an analyst, managing your data effectively to deliver business-driven insights has become more important than ever.
BigQuery is an industry-leading, fully managed cloud data warehouse that simplifies the end-to-end analytics experience, from data ingestion and preparation through analysis, all the way to ML training and inference using SQL. Today, we are excited to bring new SQL capabilities to BigQuery that extend our support for data quality, security, and flexibility. These new capabilities include:
Schema operations for better data quality: create/alter views with column descriptions, flexible column names, the LOAD DATA SQL statement
Sharing and managing data in a more secure way: authorized stored procedures
Analyzing data with more flexibility: LIKE ANY/SOME/ALL, ANY_VALUE (HAVING), index support for arrays and structs
Extending schema support for better data quality
Here’s an overview of how we’ve extended schema support in BigQuery to make it easier for you to work with your data.
Create/alter views with column descriptions (preview)
We hear from customers that they frequently use views to provide data access to others, and that the ability to document what each column contains would be very useful. Similar to column descriptions on tables, we’ve extended the same capability to views. Instead of relying on Terraform to pre-create views and populate column details, you can now create and modify column descriptions on views directly using CREATE VIEW and ALTER VIEW statements.
```sql
-- Create a view with a column description
CREATE VIEW view_name (column_name OPTIONS(description="col x")) AS ...

-- Alter a column description on a view
ALTER VIEW view_name ALTER COLUMN column_name
SET OPTIONS(description="col x");
```
Flexible column name (preview)
To help you improve data accessibility and usability, BigQuery now supports more flexible column naming, including names in your preferred international language and names containing special characters such as the ampersand (&) and percent sign (%). This is especially important for customers with migration needs and international business data. Here is a partial list of the supported special characters:
Any letter in any language
Any numeric character in any language
Any connector punctuation character
Any kind of hyphen or dash
Any character intended to be combined with another character
You can find a full detailed list of the supported characters here.
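As a sketch, a table definition using flexible column names might look like the following (the dataset, table, and column names here are hypothetical; note that flexible names must be quoted with backticks):

```sql
-- Hypothetical table using flexible column names (preview feature)
CREATE TABLE dataset_name.products (
  `item & category` STRING,  -- ampersand in the column name
  `discount %` FLOAT64,      -- percent sign in the column name
  `单价` NUMERIC              -- non-Latin (Chinese) column name
);
```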
LOAD DATA SQL statement (GA)
“In the past we mainly used the load API to load data into BigQuery, which required engineer expertise to learn about the API and do configurations. Since LOAD DATA was launched, we are now able to load data with SQL only statements, which made it much simpler, more compact and convenient.” – Steven Yampolsky, Director of Data Engineering, Northbeam
Rather than using the load API or the CLI, many BigQuery users prefer the compatibility and convenience of the SQL interface to load data as part of their SQL data pipelines. To make it even easier to load data into BigQuery, we have extended support to a few new use cases:
- Load data with flexible column name (preview)
```sql
LOAD DATA INTO dataset_name.table_name
(`flexible column name 列` INT64)
FROM FILES (uris=["file_uri"], format="CSV");
```
- Load data into tables with renamed columns, or with columns that were dropped and re-added within a short time window
```sql
-- Create a table
CREATE TABLE dataset_name.table_name (col_name_1 INT64);

-- Rename a column in the table
ALTER TABLE dataset_name.table_name RENAME COLUMN col_name_1 TO col_name_1_renamed;

-- Load data into the table with the renamed column
LOAD DATA INTO dataset_name.table_name
(col_name_1_renamed INT64)
FROM FILES (uris=["file_uri"], format="CSV");
```
- Load data into a table partitioned by ingestion time
```sql
LOAD DATA INTO dataset_name.table_name
(col_name_1 INT64)
PARTITION BY _PARTITIONDATE
FROM FILES (uris=["file_uri"], format="CSV");
```
- Load data into or overwrite one selected partition
```sql
LOAD DATA INTO dataset_name.table_name
PARTITIONS(_PARTITIONTIME = TIMESTAMP('2023-01-01'))
(col_name_1 INT64)
PARTITION BY _PARTITIONDATE
FROM FILES (uris=["file_uri"], format="CSV");
```
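To overwrite rather than append to the selected partition, the same statement can be written with LOAD DATA OVERWRITE; a sketch with placeholder table and file names:

```sql
-- Replace the contents of a single ingestion-time partition
LOAD DATA OVERWRITE dataset_name.table_name
PARTITIONS(_PARTITIONTIME = TIMESTAMP('2023-01-01'))
(col_name_1 INT64)
PARTITION BY _PARTITIONDATE
FROM FILES (uris=["file_uri"], format="CSV");
```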
Sharing and managing data in a more secure way
Authorized stored procedures (preview)
A stored procedure is a collection of statements that can be called from other queries. If you need to share query results from stored procedures with specific users without giving them read access to the underlying table, the newly introduced authorized stored procedure provides you with a convenient and secure way to share data access.
How does it work?
Data engineers craft specific queries and grant permission on authorized stored procedures to specific analyst groups, who can then run them and view the query results without read permission on the underlying tables.
Analysts can then use authorized stored procedures to create entities (tables, views, UDFs, etc.), call other procedures, or perform DML operations.
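As a minimal sketch (all dataset, table, and procedure names here are hypothetical), a data engineer might define a procedure that aggregates a private table:

```sql
-- Hypothetical procedure that exposes only aggregated results
CREATE OR REPLACE PROCEDURE analytics_dataset.get_daily_totals()
BEGIN
  -- Reads a table that callers of the procedure cannot query directly
  SELECT order_date, SUM(amount) AS total
  FROM private_dataset.orders
  GROUP BY order_date;
END;
```

The procedure is then authorized on the dataset containing the private table (for example, through the dataset’s sharing settings), and analysts are granted permission to call it — giving them the aggregated results without read access to private_dataset.orders.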
Extended support to analyze data with more flexibility
LIKE ANY/SOME/ALL (preview)
Analysts frequently need to search business information stored in string columns, e.g., customer names, reviews, or inventory names. With the newly introduced LIKE qualifiers ANY/SOME/ALL, you can check a field against multiple patterns in a single condition, instead of chaining multiple LIKE predicates with OR or AND in the WHERE clause. This makes it more efficient for analysts to filter data and generate insights based on their search criteria.
LIKE ANY (synonym for LIKE SOME): filters rows where the field matches at least one of the specified patterns
```sql
-- Filter rows that match any of the patterns 'Intend%', '%intention%'
WITH Words AS (
  SELECT 'Intend with clarity.' AS value UNION ALL
  SELECT 'Secure with intention.' UNION ALL
  SELECT 'Clarity and security.')
SELECT * FROM Words WHERE value LIKE ANY ('Intend%', '%intention%');

/*------------------------+
 | value                  |
 +------------------------+
 | Intend with clarity.   |
 | Secure with intention. |
 +------------------------*/
```
LIKE ALL: filters rows where the field matches every one of the specified patterns
```sql
-- Filter rows that match all of the patterns '%ity%', '%ith%'
WITH Words AS (
  SELECT 'Intend with clarity.' AS value UNION ALL
  SELECT 'Secure with identity.' UNION ALL
  SELECT 'Clarity and security.')
SELECT * FROM Words WHERE value LIKE ALL ('%ity%', '%ith%');

/*-----------------------+
 | value                 |
 +-----------------------+
 | Intend with clarity.  |
 | Secure with identity. |
 +-----------------------*/
```
ANY_VALUE (HAVING MAX | MIN) (GA)
It’s common for customers to query for a value associated with the max or min value of a different column in the same row, e.g., to find the SKU of the best-selling product. Previously, you needed a combination of ARRAY_AGG() with ORDER BY and LIMIT, or LAST_VALUE() in a window function, to get the result — more complicated and less efficient, especially when there are duplicate records.
With ANY_VALUE(x HAVING MAX/MIN y), as well as its synonyms MAX_BY and MIN_BY, you can now easily query a column associated with the max/min value of another column, with a much cleaner and readable SQL statement.
Example: find the most recent contract value for each of your customers.
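A minimal sketch of that query, using a hypothetical Contracts table:

```sql
-- Hypothetical contracts data
WITH Contracts AS (
  SELECT 'Acme' AS customer, DATE '2023-01-10' AS signed_on, 500 AS contract_value UNION ALL
  SELECT 'Acme', DATE '2023-03-02', 750 UNION ALL
  SELECT 'Zen Corp', DATE '2023-02-14', 300)
SELECT
  customer,
  -- Picks the contract_value from the row with the latest signed_on per customer
  ANY_VALUE(contract_value HAVING MAX signed_on) AS latest_contract_value
FROM Contracts
GROUP BY customer;
```

MAX_BY(contract_value, signed_on) would express the same thing as a synonym.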
Index support for arrays & struct (GA)
An array is an ordered list of values of the same data type. Currently, to access elements in an array, you can use either OFFSET(index) for zero-based indexes (counting from 0) or ORDINAL(index) for one-based indexes (counting from 1). To make this more concise, BigQuery now supports a[n] as a synonym for a[OFFSET(n)], which will feel familiar to users who already know this array-indexing convention.
```sql
WITH Sequences AS (
  SELECT [0, 1, 1, 2, 3, 5] AS some_numbers UNION ALL
  SELECT [2, 4, 8, 16, 32] UNION ALL
  SELECT [5, 10])
SELECT
  some_numbers,
  some_numbers[1] AS index_1,            -- index starting at 0
  some_numbers[OFFSET(1)] AS offset_1,   -- index starting at 0
  some_numbers[ORDINAL(1)] AS ordinal_1  -- index starting at 1
FROM Sequences;

/*--------------------+---------+----------+-----------*
 | some_numbers       | index_1 | offset_1 | ordinal_1 |
 +--------------------+---------+----------+-----------+
 | [0, 1, 1, 2, 3, 5] | 1       | 1        | 0         |
 | [2, 4, 8, 16, 32]  | 4       | 4        | 2         |
 | [5, 10]            | 10      | 10       | 5         |
 *--------------------+---------+----------+-----------*/
```
A struct is a data type that represents an ordered tuple of fields of various data types. Today, if a struct has anonymous fields or duplicate field names, it can be challenging to access field values. As with arrays, we are introducing OFFSET(index) for zero-based indexes (counting from 0) and ORDINAL(index) for one-based indexes (counting from 1). With this index support, you can easily get the value of the field at a given position in a struct.
```sql
WITH Items AS (
  SELECT STRUCT<INT64, STRING, BOOL>(23, "tea", FALSE) AS item_struct)
SELECT
  item_struct[0] AS field_index,            -- index starting at 0
  item_struct[OFFSET(0)] AS field_offset,   -- index starting at 0
  item_struct[ORDINAL(1)] AS field_ordinal  -- index starting at 1
FROM Items;

/*-------------+--------------+---------------*
 | field_index | field_offset | field_ordinal |
 +-------------+--------------+---------------+
 | 23          | 23           | 23            |
 *-------------+--------------+---------------*/
```
More BigQuery features that are now GA
Finally, several BigQuery features have recently moved from preview to GA and are now fully supported by Google Cloud.
Among them, support for renamed and dropped columns has been further extended to table copies and copy jobs: if a table has a column that was previously renamed or dropped, you can now copy that table using the CREATE TABLE COPY statement, or run a copy job, with the most up-to-date schema information.
Case-insensitive string collation
Today, you can compare or sort strings case-insensitively by specifying the ‘und:ci’ collation. This means [A, a] will be treated as equivalent characters and will precede [B, b] in string value operations. In GA, we have extended this support to aggregate functions (MIN, MAX, COUNT DISTINCT), view creation, materialized views, BI Engine, and many others. See more details here.
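A quick sketch of case-insensitive comparison with the ‘und:ci’ collation, using the COLLATE function:

```sql
-- Strings collated with 'und:ci' compare without regard to case,
-- so this equality check evaluates to true
SELECT COLLATE('BigQuery', 'und:ci') = 'bigquery' AS is_equal;
```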
We will continue this journey of building user-friendly SQL capabilities to help you load, analyze, and manage data in BigQuery in the most efficient way. We would love to hear how you plan to use these features in your day-to-day work. If there are specific SQL features you want to see, please file a feature request here. To get started, try BigQuery for free.