

By JP

Understanding Delta Lake Time Travel in 2 minutes


Delta Lake



Delta Lake versions the data in a table for operations such as merge, update and delete. This makes the life cycle of the data inside Delta Lake transparent.


Each operation increments the table version, so a table that has gone through several operations has several versions. Delta Lake offers a mechanism called Time Travel for navigating between these versions. It's a temporary way to access data from the past: reading an older version does not change the current table.
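To make the idea of "one version per operation" concrete, here is a minimal in-memory sketch in plain Java. It is a toy model, not how Delta Lake is implemented: the class VersionedTableSketch, its methods and the sample rows are all hypothetical names made up for illustration. Each append stores a new snapshot, and reading an older version simply returns the snapshot recorded at that point.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of a versioned, append-only table (hypothetical, not Delta Lake's implementation). */
public class VersionedTableSketch {
    // snapshots.get(n) holds the full table contents as of version n
    private final List<List<String>> snapshots = new ArrayList<>();

    /** Appends rows, records a new snapshot and returns the new version number. */
    public int append(List<String> rows) {
        List<String> next = snapshots.isEmpty()
                ? new ArrayList<>()
                : new ArrayList<>(snapshots.get(snapshots.size() - 1));
        next.addAll(rows);
        snapshots.add(List.copyOf(next)); // immutable copy, so old versions never change
        return snapshots.size() - 1;
    }

    /** Latest version number, or -1 if nothing has been written yet. */
    public int currentVersion() {
        return snapshots.size() - 1;
    }

    /** A plain read always returns the most recent version. */
    public List<String> read() {
        return readVersion(currentVersion());
    }

    /** "Time travel": returns the table contents as of the given version. */
    public List<String> readVersion(int version) {
        if (version < 0 || version >= snapshots.size()) {
            throw new IllegalArgumentException("Unknown version: " + version);
        }
        return snapshots.get(version);
    }

    public static void main(String[] args) {
        VersionedTableSketch people = new VersionedTableSketch();
        people.append(List.of("id=1 Ana"));   // creates version 0 (sample row is made up)
        people.append(List.of("id=2 Bruno")); // creates version 1
        people.append(List.of("id=3 Carla")); // creates version 2
        System.out.println(people.currentVersion()); // prints 2
        System.out.println(people.readVersion(0));   // only the first row
        System.out.println(people.read());           // all three rows
    }
}
```

The sketch keeps a full copy per version only to stay short. What matters here is the behavior: every write produces a new version number, and older versions stay readable.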


In this post we'll use this feature to look at different versions of a table. Below we have a Delta table called people, whose versions were all generated by write operations in append mode.




Current version


When we perform a simple read on a table, the current version is always the most recent one. In this scenario, the current version is 2 (two). Note that we don't need to specify which version we want because we're not using Time Travel yet.


session.read().format("delta")
        .load("table/people")
        .orderBy("id").show();


Nothing changes so far, so let's move on to the next steps.


Working with Time Travel


This is where we start working with Time Travel. In the next steps, we'll read the people table at different versions to understand how Time Travel works.


Reading Delta table - Version 0 (zero)


Now we're going to work with different versions, starting from version 0 (zero). Let's read the table again, this time adding a new parameter. Take a look at the code below.

session.read().format("delta")
        .option("versionAsOf", 0)
        .load("table/people")
        .orderBy("id").show();

Notice that we added a new parameter called versionAsOf. This parameter sets the number of the table version you want to read temporarily. In this scenario we configured the read for version zero (0) of the Delta table, the first version generated by Delta Lake after the first write operation.



Reading Delta table - Version 1 (one)


For this last step we're using version one (1). Note that the data from the previous version is still there because the write was executed in append mode.


session.read().format("delta")
        .option("versionAsOf", 1)
        .load("table/people")
        .orderBy("id").show();
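To check the claim that an append keeps the previous version's data, we can compare two versions row by row. The sketch below does this in plain Java on in-memory lists. The class VersionDiffSketch, its helper methods and the sample rows are hypothetical and only stand in for the results of the two reads above. With real Spark Datasets, a similar comparison could probably be done with Dataset.except, but that is outside this sketch.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper that compares two versions of a table held as in-memory row lists. */
public class VersionDiffSketch {

    /** Returns the rows of `newer` that do not occur in `older`, in their original order. */
    public static List<String> addedRows(List<String> older, List<String> newer) {
        List<String> added = new ArrayList<>(newer);
        added.removeAll(older); // drops every row that also occurs in the older version
        return added;
    }

    /** True when every row of `older` still appears in `newer`, as expected after an append. */
    public static boolean isPreserved(List<String> older, List<String> newer) {
        return newer.containsAll(older);
    }

    public static void main(String[] args) {
        // Made-up rows standing in for versions 0 and 1 of the people table
        List<String> version0 = List.of("id=1 Ana");
        List<String> version1 = List.of("id=1 Ana", "id=2 Bruno");

        System.out.println(isPreserved(version0, version1)); // prints true
        System.out.println(addedRows(version0, version1));   // the row added by the second write
    }
}
```

If a version had been produced by an update or delete instead of an append, isPreserved could return false, which is a quick way to spot what kind of operation created it.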

Delta Lake has a lot of benefits, and Time Travel gives us flexibility in a Big Data architecture. For more details, I recommend reading the Delta Lake docs.


 

Books to study and read


If you want to learn more and reach a high level of knowledge, I strongly recommend reading the following book(s):



AWS Cookbook is a practical guide containing 70 familiar recipes about AWS resources and how to solve different challenges. It's a well-written, easy-to-understand book covering key AWS services through practical examples. AWS, or Amazon Web Services, is the most widely used cloud service in the world today. If you want to understand the subject better and be well positioned in the market, I strongly recommend studying it.













Well, that's it. I hope you enjoyed it.



