json - Can't Query AWS Athena Presto Table Because of Dash Character in Column name - Stack Overflow

I have a file in S3 with the following contents:{"foo-bar": {"name":"Mercury&

I have a file in S3 with the following contents:

{"foo-bar": {"name":"Mercury","distanceFromSun":0.39,"orbitalPeriod":0.24,"dayLength":58.65}}{"foo-bar": {"name":"Venus","distanceFromSun":0.72,"orbitalPeriod":0.62,"dayLength":243.02}}

The Glue table definition is:

CREATE EXTERNAL TABLE `planets_ion_2`(
  `foo-bar` struct<name:string,distancefromsun:decimal(38,18),orbitalperiod:decimal(38,18),daylength:decimal(38,18)> COMMENT '')
ROW FORMAT SERDE 
  'com.amazon.ionhiveserde.IonHiveSerDe' 
STORED AS INPUTFORMAT 
  'com.amazon.ionhiveserde.formats.IonInputFormat' 
OUTPUTFORMAT 
  'com.amazon.ionhiveserde.formats.IonOutputFormat'
LOCATION
  's3://<some_bucket>/AthenaDataStore/Planets2'
TBLPROPERTIES (
  'transient_lastDdlTime'='1740775321')

I get null values for the "name" key when I query Athena like so:

select "foo-bar".name from planets_ion_2 

If I remove the dash from the top level key in the S3 json, from "foo-bar" to "foobar", it works fine: {"foobar": {"name":"Mercury","distanceFromSun":0.39,"orbitalPeriod":0.24,"dayLength":58.65}}{"foobar": {"name":"Venus","distanceFromSun":0.72,"orbitalPeriod":0.62,"dayLength":243.02}}

CREATE EXTERNAL TABLE `planets_ion_2`(
  `foobar` struct<name:string,distancefromsun:decimal(38,18),orbitalperiod:decimal(38,18),daylength:decimal(38,18)> COMMENT '')
ROW FORMAT SERDE 
  'com.amazon.ionhiveserde.IonHiveSerDe' 
STORED AS INPUTFORMAT 
  'com.amazon.ionhiveserde.formats.IonInputFormat' 
OUTPUTFORMAT 
  'com.amazon.ionhiveserde.formats.IonOutputFormat'
LOCATION
  's3://<some_bucket>/AthenaDataStore/Planets2'
TBLPROPERTIES (
  'transient_lastDdlTime'='1740775321')

So I can't seem to escape the dash. I've tried a lot of things without success:

select `foo-bar`.name from planets_ion_2 
select `foo-bar.name` from planets_ion_2 

Any time I try to escape with backtick, I get the error Queries of this type are not supported

I'm not sure if this a Athena or Presto issue. But thanks in advance for any help.

I have a file in S3 with the following contents:

{"foo-bar": {"name":"Mercury","distanceFromSun":0.39,"orbitalPeriod":0.24,"dayLength":58.65}}{"foo-bar": {"name":"Venus","distanceFromSun":0.72,"orbitalPeriod":0.62,"dayLength":243.02}}

The Glue table definition is:

CREATE EXTERNAL TABLE `planets_ion_2`(
  `foo-bar` struct<name:string,distancefromsun:decimal(38,18),orbitalperiod:decimal(38,18),daylength:decimal(38,18)> COMMENT '')
ROW FORMAT SERDE 
  'com.amazon.ionhiveserde.IonHiveSerDe' 
STORED AS INPUTFORMAT 
  'com.amazon.ionhiveserde.formats.IonInputFormat' 
OUTPUTFORMAT 
  'com.amazon.ionhiveserde.formats.IonOutputFormat'
LOCATION
  's3://<some_bucket>/AthenaDataStore/Planets2'
TBLPROPERTIES (
  'transient_lastDdlTime'='1740775321')

I get null values for the "name" key when I query Athena like so:

select "foo-bar".name from planets_ion_2 

If I remove the dash from the top level key in the S3 json, from "foo-bar" to "foobar", it works fine: {"foobar": {"name":"Mercury","distanceFromSun":0.39,"orbitalPeriod":0.24,"dayLength":58.65}}{"foobar": {"name":"Venus","distanceFromSun":0.72,"orbitalPeriod":0.62,"dayLength":243.02}}

CREATE EXTERNAL TABLE `planets_ion_2`(
  `foobar` struct<name:string,distancefromsun:decimal(38,18),orbitalperiod:decimal(38,18),daylength:decimal(38,18)> COMMENT '')
ROW FORMAT SERDE 
  'com.amazon.ionhiveserde.IonHiveSerDe' 
STORED AS INPUTFORMAT 
  'com.amazon.ionhiveserde.formats.IonInputFormat' 
OUTPUTFORMAT 
  'com.amazon.ionhiveserde.formats.IonOutputFormat'
LOCATION
  's3://<some_bucket>/AthenaDataStore/Planets2'
TBLPROPERTIES (
  'transient_lastDdlTime'='1740775321')

So I can't seem to escape the dash. I've tried a lot of things without success:

select `foo-bar`.name from planets_ion_2 
select `foo-bar.name` from planets_ion_2 

Any time I try to escape with backtick, I get the error Queries of this type are not supported

I'm not sure if this a Athena or Presto issue. But thanks in advance for any help.

Share Improve this question edited Mar 4 at 17:31 Samer A. asked Mar 3 at 17:01 Samer A.Samer A. 658 bronze badges
Add a comment  | 

1 Answer 1

Reset to default 1

You need to add the following after ROW FORMAT SERDE

WITH SERDEPROPERTIES ("ion.foo-bar.path_extractor" = "('foo-bar')")

With the dash in the name it can't find the path to the data. Use the property path_extractor to create an alias for it with single quotes around the key name since it has a dash in it.

So the create table statement would look something like:

CREATE EXTERNAL TABLE `planets_ion_2`(
  `foo-bar` struct<name:string,distancefromsun:decimal(38,18),orbitalperiod:decimal(38,18),daylength:decimal(38,18)> COMMENT '')
ROW FORMAT SERDE 
  'com.amazon.ionhiveserde.IonHiveSerDe'
WITH SERDEPROPERTIES ("ion.foo-bar.path_extractor" = "('foo-bar')")
STORED AS INPUTFORMAT 
  'com.amazon.ionhiveserde.formats.IonInputFormat' 
OUTPUTFORMAT 
  'com.amazon.ionhiveserde.formats.IonOutputFormat'
LOCATION
  's3://<some_bucket>/AthenaDataStore/Planets2'
TBLPROPERTIES (
  'transient_lastDdlTime'='1740775321')

You should then be able to query name using

 select "foo-bar".name from from planets_ion_2

Side note - Amazon Ion Hive SerDe is used to query Amazon Ion format data, but can be used to query non-Amazon Ion JSON datasets.

See this note in the link

Because Amazon Ion is a superset of JSON, you can use the Amazon Ion Hive SerDe to query non-Amazon Ion JSON datasets. Unlike other JSON SerDe libraries, the Amazon Ion SerDe does not expect each row of data to be on a single line. This feature is useful if you want to query JSON datasets that are in "pretty print" format or otherwise break up the fields in a row with newline characters.

If you explore the other JSON SerDe libraries you would still need to account for the dash in the key name.

As an example using OpenX JSON SerDe you would have to provide the property like

WITH SERDEPROPERTIES ("mapping.foo-bar" = "foo-bar")

to "remap" that path to account for the dash.

    CREATE EXTERNAL TABLE `planets_ion_2`(
      `foo-bar` struct<name:string,distancefromsun:decimal(38,18),orbitalperiod:decimal(38,18),daylength:decimal(38,18)> COMMENT '')
ROW FORMAT SERDE 
  '.openx.data.jsonserde.JsonSerDe' 
WITH SERDEPROPERTIES ("mapping.foo-bar" = "foo-bar")
    LOCATION
      's3://<some_bucket>/AthenaDataStore/Planets2'

发布者:admin,转转请注明出处:http://www.yc00.com/questions/1745082086a4610191.html

相关推荐

发表回复

评论列表(0条)

  • 暂无评论

联系我们

400-800-8888

在线咨询: QQ交谈

邮件:admin@example.com

工作时间:周一至周五,9:30-18:30,节假日休息

关注微信