Welcome to WuJiGu Developer Q&A Community for programmer and developer-Open, Learning and Share

scala - How to let Spark parse a JSON-escaped String field as a JSON Object to infer the proper structure in DataFrames?

My input is a set of files formatted as a single JSON object per line. The problem, however, is that one field on these JSON objects is a JSON-escaped string. Example:

{
  "id":1,
  "name":"some name",
  "problem_field": "{\"height\":180,\"weight\":80}"
}

As expected, sqlContext.read.json creates a DataFrame with the 3 columns id, name and problem_field, where problem_field is a String.

I have no control over the input files, and I'd prefer to solve this problem within Spark. Is there any way to get Spark to read that String field as JSON and infer its schema properly?

Note: the JSON above is just a toy example. In my case problem_field contains a varying set of fields, and it would be great if Spark could infer them so that I don't have to make any assumptions about which fields exist.


1 Answer


Would this be an acceptable solution?

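import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, SQLContext}

val sc: SparkContext = ...
val sqlContext = new SQLContext(sc)

val escapedJsons: RDD[String] = sc.parallelize(Seq("""{"id":1,"name":"some name","problem_field":"{\"height\":180,\"weight\":80}"}"""))
// Drop the quotes wrapping the embedded object, then turn each \" back into ".
val unescapedJsons: RDD[String] = escapedJsons.map(_.replace("\"{", "{").replace("\"}", "}").replace("\\\"", "\""))
val dfJsons: DataFrame = sqlContext.read.json(unescapedJsons)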

dfJsons.printSchema()

// Output
root
|-- id: long (nullable = true)
|-- name: string (nullable = true)
|-- problem_field: struct (nullable = true)
|    |-- height: long (nullable = true)
|    |-- weight: long (nullable = true)
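
One caveat with the string replacement above: it rewrites every "{ and "} sequence in the line, so an ordinary string value that happens to start with { or end with } would be altered too. If a newer Spark is available, a possible alternative is to leave the raw text alone and let Spark infer the schema of the embedded JSON in a separate pass, then parse the column with from_json. The following is only a sketch under assumptions: it assumes Spark 2.2 or later (SparkSession and the Dataset[String] overload of read.json), and the names EmbeddedJson and parseEmbedded are hypothetical, introduced here for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, from_json}

object EmbeddedJson {
  // Hypothetical helper (not part of Spark): replaces the JSON text held in
  // `column` with a struct whose schema is inferred from the data itself.
  def parseEmbedded(spark: SparkSession, df: DataFrame, column: String): DataFrame = {
    import spark.implicits._
    // Extra pass over the data: collect the non-null embedded JSON strings
    // and let Spark infer a schema from them.
    val embedded = df.select(col(column)).where(col(column).isNotNull).as[String]
    val schema = spark.read.json(embedded).schema
    // Parse the string column using the inferred schema.
    df.withColumn(column, from_json(col(column), schema))
  }

  def main(args: Array[String]): Unit = {
    // Assumption: a local session, for demonstration only.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("embedded-json")
      .getOrCreate()
    import spark.implicits._

    // Same sample record as in the answer above, one JSON object per line.
    val lines = Seq(
      """{"id":1,"name":"some name","problem_field":"{\"height\":180,\"weight\":80}"}"""
    ).toDS()

    val df = spark.read.json(lines) // problem_field is read as a string here
    parseEmbedded(spark, df, "problem_field").printSchema()

    spark.stop()
  }
}
```

With this approach problem_field should be reported as a struct rather than a string, with its fields inferred from whatever keys appear in the data. The trade-off is a second scan of the input to infer the embedded schema, so caching df may be worthwhile on large inputs.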
