Read a CSV File with Spark

—————Spark Interview Questions————
📕 How to read a CSV file in Spark?
Method 1:
—————
# Syntax: spark.read.csv("path")
# With no options, every column is read as a string and named _c0, _c1, ...
df = spark.read.csv("dbfs:/FileStore/small_zipcode.csv")
df.show()

+---+-------+--------+-------------------+-----+----------+
|_c0|    _c1|     _c2|                _c3|  _c4|       _c5|
+---+-------+--------+-------------------+-----+----------+
| id|zipcode|    type|               city|state|population|
|  1|    704|STANDARD|               null|   PR|     30100|
|  2|    704|    null|PASEO COSTA DEL SUR|   PR|      null|
|  3|    709|    null|       BDA SAN LUIS|   PR|      3700|
|  4|  76166|  UNIQUE|  CINGULAR WIRELESS|   TX|     84000|
|  5|  76177|STANDARD|               null|   TX|      null|
+---+-------+--------+-------------------+-----+----------+
Method 2:
————–
df = spark.read.format("csv").option("inferSchema", True).option("header", True).option("sep", ",").load("dbfs:/FileStore/small_zipcode.csv")
df.show()
+---+-------+--------+-------------------+-----+----------+
| id|zipcode|    type|               city|state|population|
+---+-------+--------+-------------------+-----+----------+
|  1|    704|STANDARD|               null|   PR|     30100|
|  2|    704|    null|PASEO COSTA DEL SUR|   PR|      null|
|  3|    709|    null|       BDA SAN LUIS|   PR|      3700|
|  4|  76166|  UNIQUE|  CINGULAR WIRELESS|   TX|     84000|
|  5|  76177|STANDARD|               null|   TX|      null|
+---+-------+--------+-------------------+-----+----------+
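
Both methods go through the same DataFrameReader, so the options used in Method 2 can also be passed to the csv() shortcut as keyword arguments. A minimal sketch, assuming a SparkSession is already available as spark (as it is in a Databricks notebook). The helper name read_csv_with_header and the CSV_OPTIONS dictionary are hypothetical names used only for illustration:

```python
# Reader options from Method 2, collected in one place (hypothetical name).
CSV_OPTIONS = {"header": True, "inferSchema": True, "sep": ","}


def read_csv_with_header(spark, path):
    """Read a CSV with the csv() shortcut, passing options as keyword arguments.

    `spark` is an existing SparkSession; `path` is the file location.
    """
    # Intended to match chaining .option(...) calls before .load(path).
    return spark.read.csv(path, **CSV_OPTIONS)
```

Calling read_csv_with_header(spark, "dbfs:/FileStore/small_zipcode.csv").show() should print the same table as Method 2.
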
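If we set option("header", False), Spark does not treat the first row as a header: the header line is read as an ordinary data row, and the columns get the default names _c0, _c1, and so on.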

df = spark.read.format("csv").option("inferSchema", True).option("header", False).option("sep", ",").load("dbfs:/FileStore/small_zipcode.csv")
df.show()
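
Because the header line is read as data here, this call can be expected to produce the default column names _c0 to _c5. Each column then also contains a text value (id, zipcode, ...), so inferSchema will most likely type every column as a string. A rough sketch of the same read, again assuming an existing SparkSession named spark. The helpers read_csv_without_header and default_column_names are hypothetical, not part of the Spark API:

```python
def read_csv_without_header(spark, path):
    """Read a CSV treating every line, including the first, as data."""
    return (
        spark.read.format("csv")
        # options(...) sets several reader options in one call.
        .options(header=False, inferSchema=True, sep=",")
        .load(path)
    )


def default_column_names(n):
    """Names Spark assigns to n CSV columns when no header row is used."""
    return [f"_c{i}" for i in range(n)]
```

For the six-column zipcode file, df.columns on the result should match default_column_names(6), and df.printSchema() shows which types were actually inferred.
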
