Apache Spark SQL - loading and saving data using the JSON & CSV format

Connect with me or follow me at https://www.linkedin.com/in/durga0gadiraju https://www.facebook.com/itversity https://github.com/dgadiraju https://www.youtube.com/c/TechnologyMentor https://twitter.com/itversity
Text Comments (24)
itversity (1 year ago)
For technical discussions or questions, please use our forum: discuss.itversity.com. To practice on a state-of-the-art big data cluster, please sign up at labs.itversity.com. The lab is in free preview until 12/31/2016; after that, subscription charges are $14.99 per 31 days, $34.99 per 93 days, and $54.99 per 185 days.
ravi kiran (1 year ago)
The video was easy to follow, but where are you saving the data? I only see the output on the console. Could you show how to save the generated output in different formats, such as a table, Parquet, or Avro? It would be useful for us. Thanks.
Anitha Yannam (1 year ago)
If we add the com.databricks:spark-csv_2.11:1.5.0 jar file explicitly, in which path do we have to add it? Currently my pom.xml shows an error, and I also get an error when I try to run through the shell.
karthik golagani (1 year ago)
Anitha Yannam, what is the error?
jeremyz23 (2 years ago)
Thanks for the video! Can you provide a link to your git repository?
jeremyz23 (2 years ago)
+itversity Thank you! You are the best!
itversity (2 years ago)
Lauma (2 years ago)
How would this look in Spark 2.0.0?
val NyseDF = sqlc.load("com.databricks.spark.csv", Map("path" -> args(0), "header" -> "true"))
NyseDF.registerTempTable("NYSE")
NyseDF.printSchema()
It has problems with the "load" part...
Lauma (2 years ago)
load(String source, scala.collection.immutable.Map<String,String> options) Deprecated. As of 1.4.0, replaced by read().format(source).options(options).load(). I think this is more specific.
Moncef Ansseti (2 years ago)
Thank you for the video. Can Spark SQL and DataFrames cache a table in memory? Your reply is much appreciated. Thank you.
itversity (2 years ago)
+Moncef Ansseti Spark creates an in-memory structure called an RDD whether you use Spark transformations, Spark SQL, or DataFrames. In Spark, caching is a different concept.
Vijay Das (2 years ago)
Thanks for the video. Is Spark able to recognize formatted JSON? In your example, each line in the input file (person.json) represents one record in JSON format. What if the JSON file is a formatted one, where each record is split across multiple lines? Can Spark or Hive recognize a formatted JSON file? If not, how can we convert a formatted JSON file into lines that each represent one record in JSON format? Your reply is much appreciated. Thanks again.
Vijay Das (2 years ago)
Thank you for clarifying :)
itversity (2 years ago)
Yes, you can see it here. http://spark.apache.org/docs/latest/sql-programming-guide.html#json-datasets
Vijay Das (2 years ago)
OK, thank you for the confirmation. Most JSON files are in a formatted structure. It seems we need extra effort to make them processable by Hive/Spark :(
itversity (2 years ago)
No, I think you need to have one JSON record per line. The Spark documentation says so explicitly.
Vijay Das (2 years ago)
The second format is not showing up well. It is:
{"firstName":"John", "lastName":"Doe"}
{"firstName":"Anna", "lastName":"Smith"}
{"firstName":"Peter", "lastName":"Jones"}
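A minimal sketch of the conversion asked about in this thread, assuming the formatted file holds a single JSON array (or a single object) that is small enough to parse in memory. It runs as a pre-processing step outside Spark and uses only Python's standard `json` module. The helper name `to_json_lines` is hypothetical, and the sample records reuse the field names from the comment above.

```python
import json


def to_json_lines(formatted_text):
    """Return one JSON record per line from pretty-printed JSON.

    Accepts either a JSON array of records or a single JSON object.
    """
    data = json.loads(formatted_text)
    records = data if isinstance(data, list) else [data]
    # json.dumps without indent never emits raw newlines,
    # so each record stays on exactly one line.
    return "\n".join(json.dumps(record) for record in records)


# Sample input: pretty-printed records spanning multiple lines each.
formatted = """
[
  {
    "firstName": "John",
    "lastName": "Doe"
  },
  {
    "firstName": "Anna",
    "lastName": "Smith"
  },
  {
    "firstName": "Peter",
    "lastName": "Jones"
  }
]
"""

json_lines = to_json_lines(formatted)
print(json_lines)
```

Writing the returned string to a file gives the one-record-per-line layout described in the replies above. For files too large to parse in one call, a streaming parser would be needed instead; that case is not covered here.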
Rushikesh Veni (2 years ago)
I did not see any options for writing to CSV files anywhere in the video.
kumar reddy (2 years ago)
What if we use a text file instead of loading JSON or CSV? It is not allowing me to load the data.
kumar reddy (2 years ago)
Hi, could you please provide an example of a join between stream data and static data using streamsql?
Vivekanand sahay (3 years ago)
I am not able to understand where you loaded the file from. Is it in a database?
itversity (3 years ago)
+Vivekanand sahay: The files are in the src/test/resources folder.
