Spark read csv with semicolon inside cell - Stack Overflow


My Spark application reads a CSV file with the following options:

    sparkSession.read
      .format("com.databricks.spark.csv")
      .option("quote", "\ufffd")
      .option("delimiter", ";")
      .load("/sample.csv")

I use "\ufffd" as the "quote" value to avoid problems with data values in which quote characters appear as part of the value.

For example, without the ("quote", "\ufffd") option, this sample:

 value1;"value2;value3

will be read as:

+--------+---------------+
| value1 | value2;value3 |
+--------+---------------+

instead of:

 +--------+--------+--------+
 | value1 | value2 | value3 |
 +--------+--------+--------+
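This matches standard CSV semantics: with quoting enabled, the lone quote opens a quoted field that swallows everything up to the next quote or the end of the line. A minimal sketch using Python's standard csv module (not Spark, but the parsing rules are the same) shows the identical behavior:

```python
import csv
import io

# Default quoting: the unbalanced quote before "value2" opens a quoted
# field, so the rest of the line is merged into a single cell.
row = next(csv.reader(io.StringIO('value1;"value2;value3'), delimiter=";"))
print(row)  # ['value1', 'value2;value3']
```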

But when the semicolon is inside a value (enclosed in quotes), I face a new problem: the cell is split into two values.

So, this sample:

value1;"val;ue";value3

will be read as:

+--------+------+-----+--------+
| value1 | "val | ue" | value3 |
+--------+------+-----+--------+

Is there any way in the Spark API to read a CSV that contains both quotes and semicolons inside cell values, when the semicolon is set as the delimiter?
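To make the trade-off explicit: setting the quote character to "\ufffd" effectively disables quoting, so a delimiter inside a quoted cell is no longer protected; with quoting enabled, the same line parses correctly. A minimal sketch with Python's csv module (not Spark, but any RFC 4180 style parser behaves this way) contrasts the two modes:

```python
import csv
import io


def parse(line, quoting):
    # Parse one semicolon-delimited line under the given quoting mode.
    reader = csv.reader(io.StringIO(line), delimiter=";",
                        quotechar='"', quoting=quoting)
    return next(reader)


# Quoting enabled (the default): the embedded ";" stays inside the cell.
print(parse('value1;"val;ue";value3', csv.QUOTE_MINIMAL))
# ['value1', 'val;ue', 'value3']

# Quoting disabled (analogous to the "\ufffd" trick): the quote characters
# become literal text and the quoted cell is split on the inner ";".
print(parse('value1;"val;ue";value3', csv.QUOTE_NONE))
# ['value1', '"val', 'ue"', 'value3']
```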
