How to manage/handle schema changes when loading JSON files into a BigQuery table
Question:
This is what my input file looks like:
{"Id": 1, "Address": {"Street":"MG Road","City":"Pune"}}
{"Id": 2, "Address": {"City":"Mumbai"}}
{"Id": 3, "Address": {"Street":"XYZ Road"}}
{"Id": 4}
{"Id": 5, "PhoneNumber": 12345678, "Address": {"Street":"ABCD Road", "City":"Bangalore"}}
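In my Dataflow pipeline, how can I dynamically determine which fields are present in each row so that every row conforms to the BigQuery table schema? For example, in row 2, Street is missing. I want the Address.Street column entry in BigQuery to be "N/A" or null.
I don't want the pipeline to fail because of a schema change or missing data.
How can I handle this logic in a Dataflow job, written in Python, before writing to BigQuery?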
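Answer:
I recommend writing the data to a temporary table that has just one field, line, of type string.
Once the data is in the BigQuery temp table, you can apply the schema logic in SQL and query the data from the temp table into the final table.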
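On the pipeline side, the staging step only needs to wrap each raw input line in a single-column row. Below is a minimal Python sketch of that idea; the function name to_staging_row is hypothetical, and the column name line simply matches the field suggested above. In an Apache Beam pipeline, a function like this could be applied with beam.Map before the BigQuery write, but that wiring is an assumption and is not shown here.

```python
def to_staging_row(raw_line):
    """Wrap one raw JSON line as a row for a single-column staging table.

    Hypothetical helper: the line is stored as-is (not parsed), so a new or
    missing field in the input cannot break the load.
    """
    text = raw_line.strip()
    if not text:
        return None  # blank lines carry no data, so skip them
    return {"line": text}


raw_lines = [
    '{"Id": 1, "Address": {"Street":"MG Road","City":"Pune"}}\n',
    '\n',
    '{"Id": 4}\n',
]

# Keep only the rows that actually contain data.
staging_rows = [
    row for row in (to_staging_row(line) for line in raw_lines) if row is not None
]
```

Because the row is never parsed at this stage, the staging table schema stays fixed no matter how the input evolves.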
The example below is for BigQuery Standard SQL and shows how to apply schema logic to a table that holds the whole row in a single field:
#standardSQL
WITH t AS (
SELECT '{"Id": 1, "Address": {"Street":"MG Road","City":"Pune"}}' line UNION ALL
SELECT '{"Id": 2, "Address": {"City":"Mumbai"}}' UNION ALL
SELECT '{"Id": 3, "Address": {"Street":"XYZ Road"}}' UNION ALL
SELECT '{"Id": 4} ' UNION ALL
SELECT '{"Id": 5, "PhoneNumber": 12345678, "Address": {"Street":"ABCD Road", "City":"Bangalore"}}'
)
SELECT
JSON_EXTRACT_SCALAR(line, '$.Id') id,
JSON_EXTRACT_SCALAR(line, '$.PhoneNumber') PhoneNumber,
JSON_EXTRACT_SCALAR(line, '$.Address.Street') Street,
JSON_EXTRACT_SCALAR(line, '$.Address.City') City
FROM t
The result is as follows:
Row  id  PhoneNumber  Street     City
1    1   null         MG Road    Pune
2    2   null         null       Mumbai
3    3   null         XYZ Road   null
4    4   null         null       null
5    5   12345678     ABCD Road  Bangalore
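If you would rather fill in the missing fields inside the Python pipeline instead of in SQL, the same extraction can be sketched with the standard json module. This is an illustrative alternative, not part of the approach above: the function name normalize and its default parameter are hypothetical, and the flat output columns simply mirror the query result. Unlike JSON_EXTRACT_SCALAR, which returns strings, this sketch keeps the JSON values in their native types.

```python
import json


def normalize(raw_line, default=None):
    """Parse one JSON line and return a row with every expected column present.

    Hypothetical helper: fields missing from the input get `default`
    (None, or a placeholder such as "N/A").
    """
    record = json.loads(raw_line)
    # "Address" may be absent entirely, so fall back to an empty dict.
    address = record.get("Address") or {}
    return {
        "Id": record.get("Id", default),
        "PhoneNumber": record.get("PhoneNumber", default),
        "Street": address.get("Street", default),
        "City": address.get("City", default),
    }


lines = [
    '{"Id": 1, "Address": {"Street":"MG Road","City":"Pune"}}',
    '{"Id": 2, "Address": {"City":"Mumbai"}}',
    '{"Id": 3, "Address": {"Street":"XYZ Road"}}',
    '{"Id": 4}',
    '{"Id": 5, "PhoneNumber": 12345678, "Address": {"Street":"ABCD Road", "City":"Bangalore"}}',
]

rows = [normalize(line) for line in lines]
for row in rows:
    print(row)
```

Calling normalize(line, default="N/A") would give the "N/A" placeholder from the question instead of null. Note that a field not listed in the function (a brand new column, for instance) is silently dropped here, whereas the staging-table approach keeps the full original line.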