DEV Community


Streaming Data in Databricks Delta Tables

Will Velida on July 23, 2018

Databricks Delta uses both Apache Spark and Databricks File System (DBFS) to provide a transactional storage layer that can do incredible things fo...
Swati Arora • Edited

Hi Will,
Thanks for the amazing write-up.
But I am facing an issue while executing the command:
tweets.write.format("delta").mode("append").saveAsTable("tweets")
The first time, the data is stored in the Delta table, but executing it again gives me the error:
"org.apache.spark.sql.AnalysisException: Cannot create table ('default.tweets'). The associated location ('dbfs:/user/hive/warehouse/tweets') is not empty.;"

How can I make sure the data continuously gets stored in table format as well?

Thanks in advance
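One way around this error is to only let saveAsTable create the table on the first write and append thereafter. Below is a minimal sketch, assuming a `spark` session and a `tweets` DataFrame are in scope and Spark 3.3+ (for `spark.catalog.tableExists`); the helper name is hypothetical.

```python
def append_tweets(spark, tweets, table_name="tweets"):
    # Sketch: append if the table is already registered, create it otherwise.
    if spark.catalog.tableExists(table_name):
        tweets.write.format("delta").mode("append").saveAsTable(table_name)
    else:
        # First write creates the table. If a previous table was dropped but
        # its warehouse directory was left behind, that directory must be
        # cleared (e.g. with dbutils.fs.rm) before this create will succeed.
        tweets.write.format("delta").saveAsTable(table_name)
```

The "location is not empty" error typically means the metastore entry is gone (e.g. the table was dropped) but files remain at `dbfs:/user/hive/warehouse/tweets`, so the create step refuses to reuse the directory.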

Muhammad Bilal Shafqat

Hey Will, nice post. I think I would write the data directly to the Delta table instead of writing it to Parquet files first. If I write it as Parquet and then read it into the Delta table, only the rows present in the Parquet files on DBFS at that first read get ingested into the table; rows arriving after that are not ingested, and I would have to manually run read.saveAsTable() again to get them into the Delta table. Try it, and please share your thoughts. Thanks.
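The direct-write approach described above can be sketched as a single streaming write into the Delta table, with no intermediate Parquet step. This is only an illustration: the source stream and checkpoint path are assumptions, not from the original post.

```python
def stream_tweets_to_delta(tweets_stream,
                           checkpoint_path="/tmp/checkpoints/tweets"):
    # Continuously append the streaming DataFrame into the managed Delta
    # table "tweets"; the checkpoint tracks progress across restarts.
    return (tweets_stream.writeStream
            .format("delta")
            .outputMode("append")
            .option("checkpointLocation", checkpoint_path)
            .table("tweets"))
```

Because the stream writes straight to the table, new rows are ingested as they arrive, rather than only on an explicit re-read of the Parquet files.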

Parath kumar Sabesan

Hi Will,

When I try to write streaming data to a partitioned managed Delta table, it doesn't load any data into the table and also doesn't show any error. The same thing works fine with a non-partitioned managed Delta table. What am I missing here?
dfWrite.writeStream \
    .partitionBy("submitted_yyyy_mm") \
    .format("delta") \
    .outputMode("append") \
    .queryName("orders") \
    .option("checkpointLocation", orders_checkpoint_path) \
    .table(user_db + "." + orders_table)
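Two things commonly cause this silent no-op: the target table already exists with a different (or no) partitioning than the stream's `partitionBy`, or the checkpoint directory is left over from an earlier run, so the stream resumes from old offsets and finds nothing new to write. A hedged sketch of one fix, pre-creating the table with a matching partition column and a fresh checkpoint; the column schema here is an assumption for illustration only:

```python
def start_partitioned_orders_stream(spark, dfWrite, user_db, orders_table,
                                    checkpoint_path):
    full_name = user_db + "." + orders_table
    # Create the table up front so its partitioning matches the stream's
    # partitionBy column (schema below is a placeholder, not from the post).
    spark.sql(f"""
        CREATE TABLE IF NOT EXISTS {full_name} (
            order_id STRING,
            submitted_yyyy_mm STRING
        )
        USING DELTA
        PARTITIONED BY (submitted_yyyy_mm)
    """)
    # Use a checkpoint path that has not been used by a previous,
    # differently-configured query.
    return (dfWrite.writeStream
            .partitionBy("submitted_yyyy_mm")
            .format("delta")
            .outputMode("append")
            .queryName("orders")
            .option("checkpointLocation", checkpoint_path)
            .table(full_name))
```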