3 Essential Questions to Address When Building an API-Involved Incremental Data Loading Script | by Daniel Khoa Le | Jun, 2024

Now, let’s say you have already extracted a batch of data by making API requests with the above-mentioned params. It’s time to decide how to write it to the destination table.

👉 Answer: Merge/Dedup mode (recommended)

This question concerns the choice of write disposition, or sync mode. The quick answer is that, since you want to load your data incrementally, you will probably choose to write your extracted data in either append mode or merge mode (also known as deduplication mode).

However, let’s step back, look at our options more closely, and determine which strategy is best suited to incremental loading.

Here are the popular write dispositions.

🟪 overwrite/replace: drop all existing records in the destination tables and then insert the extracted records.
🟪 append: simply append the extracted records to the destination tables.
🟪 merge / dedup: insert new(*) records and update(**) existing records.
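To make the three dispositions concrete, here is a minimal in-memory sketch, assuming each table is a list of dicts and that `id` is the primary key. The function names are hypothetical, and this is not how dlt or any particular database implements them; it only mirrors the definitions in the list above.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def overwrite(destination: List[Record], batch: List[Record]) -> List[Record]:
    """Drop everything already in the destination and keep only the new batch."""
    return list(batch)


def append(destination: List[Record], batch: List[Record]) -> List[Record]:
    """Add the batch after the existing rows; nothing is updated or removed."""
    return destination + batch


def merge(destination: List[Record], batch: List[Record], key: str = "id") -> List[Record]:
    """Update rows whose key already exists and insert rows with unseen keys."""
    by_key = {row[key]: row for row in destination}
    for row in batch:
        # Replaces the existing row with the same key, or inserts a new one.
        by_key[row[key]] = row
    return list(by_key.values())


destination = [{"id": 1, "sales": 100}, {"id": 2, "sales": 200}]
batch = [{"id": 1, "sales": 150}, {"id": 3, "sales": 300}]
print(merge(destination, batch))
# [{'id': 1, 'sales': 150}, {'id': 2, 'sales': 200}, {'id': 3, 'sales': 300}]
```

Because Python dicts keep insertion order, an updated row stays in its original position and new keys are added at the end.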

(*) How do we know which records are new? Usually, we use a primary key to determine that. If you use dlt, its merging strategy can be more sophisticated than that, including the distinction between merge_key and primary_key (one is used for merging and one is used for deduplication before merging) or dedup_sort (which records sharing the same key are deleted during the dedup process). I’ll leave that part for another tutorial.
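The deduplication-before-merging step mentioned above can be sketched as follows. This is an illustration under assumptions, not dlt's implementation: `dedup_batch` is a hypothetical helper, `id` is assumed to be the primary key, and `updated_at` plays the role of the sort column, so the newest record per key is kept and the others are dropped.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def dedup_batch(batch: List[Record], primary_key: str = "id",
                sort_key: str = "updated_at") -> List[Record]:
    """Keep one record per primary key: the one with the largest sort_key value."""
    latest: Dict[Any, Record] = {}
    for row in batch:
        pk = row[primary_key]
        current = latest.get(pk)
        # Keep the incoming row if the key is unseen or the row is strictly newer;
        # on a tie, the first row seen wins.
        if current is None or row[sort_key] > current[sort_key]:
            latest[pk] = row
    return list(latest.values())


batch = [
    {"id": 1, "sales": 120, "updated_at": "2024-06-04"},
    {"id": 1, "sales": 150, "updated_at": "2024-06-05"},
    {"id": 3, "sales": 300, "updated_at": "2024-06-04"},
]
print(dedup_batch(batch))
# [{'id': 1, 'sales': 150, 'updated_at': '2024-06-05'}, {'id': 3, 'sales': 300, 'updated_at': '2024-06-04'}]
```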

(**) This is a simplified explanation; if you want to find out more about how dlt handles this merging strategy, read more here.

👁️👁️ Here is an example to help us understand the results of the different write dispositions.

↪️ On 2024.06.19: We make the primary sync.

🅰️ Data in the source application

🅱️ Data loaded to our destination database

No matter which sync strategy you choose, the table at the destination is essentially a copy of the source table.

Saved state: updated_at = 2024-06-03, which is the most recent updated_at among the two records we synced.
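The saved state is simply the maximum cursor value seen so far. Here is a minimal sketch, assuming `updated_at` is stored as an ISO 8601 date string, so that string comparison matches chronological order. The helper name `next_state` and the date for record id=1 are hypothetical, since the original tables are not reproduced in the text.

```python
from typing import Any, Dict, List, Optional

Record = Dict[str, Any]


def next_state(records: List[Record], previous: Optional[str] = None,
               cursor: str = "updated_at") -> Optional[str]:
    """Return the newest cursor value across the synced records and the old state."""
    values = [r[cursor] for r in records]
    if previous is not None:
        values.append(previous)
    # ISO 8601 dates (YYYY-MM-DD) sort lexicographically in chronological order.
    return max(values) if values else None


first_sync = [
    {"id": 1, "updated_at": "2024-06-01"},  # hypothetical date
    {"id": 2, "updated_at": "2024-06-03"},
]
print(next_state(first_sync))  # 2024-06-03
```

Passing the previous state back in means the cursor never moves backwards, even if a later batch happens to be empty.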

↪️ On 2024.06.2: We make the second sync.

🅰️ Data in the source application

✍️ Changes in the source table:

Record id=1 was updated (sales figure).
Record id=2 was deleted.
Record id=3 was inserted.

In this sync, we ONLY extract records with updated_at > 2024-06-03 (the state saved from the last sync). Therefore, we extract only records id=1 and id=3. Since record id=2 was removed from the source data, there is no way for us to detect this change.
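A small simulation of the second sync's extraction makes this blind spot visible. In practice, the filter would typically be sent to the API as a request parameter; here it is applied in memory, and the sales figures and dates are made up for illustration. `extract_incremental` is a hypothetical helper name.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]

# Hypothetical source table after the changes listed above
# (figures and dates are illustrative, not taken from the original tables).
source_after_changes = [
    {"id": 1, "sales": 150, "updated_at": "2024-06-05"},  # updated
    # id=2 was deleted, so it simply no longer appears in the source
    {"id": 3, "sales": 300, "updated_at": "2024-06-04"},  # inserted
]


def extract_incremental(source: List[Record], state: str,
                        cursor: str = "updated_at") -> List[Record]:
    """Return only records whose cursor value is strictly greater than the saved state."""
    return [r for r in source if r[cursor] > state]


batch = extract_incremental(source_after_changes, state="2024-06-03")
print([r["id"] for r in batch])  # [1, 3]
# Nothing in `batch` says that id=2 disappeared: a cursor-based filter
# only ever returns rows that still exist, so deletions are invisible to it.
```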

With the second sync, you will now see the difference among the write strategies.

🅱️ Data loaded to our destination database

❗ Scenario 1: Overwrite

The destination table will be overwritten by the two records extracted this time.

❗ Scenario 2: Append

The two extracted records will be appended to the destination table; the existing records are not affected.

❗ Scenario 3: Merge or dedup

The extracted record id=1 replaces the existing record with the same key at the destination, and record id=3 is inserted as a new record. This processing is what we call merging or deduplicating. Record id=2 in the destination table stays intact.
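To see all three scenarios side by side, the sketch below replays both syncs against an in-memory SQLite table using Python's built-in `sqlite3` module. The merge branch uses delete-then-insert by `id`, which is one simple way to emulate a merge and not necessarily what dlt does. The records are hypothetical stand-ins for the tables shown in the original figures.

```python
import sqlite3

# Hypothetical records: (id, sales, updated_at)
FIRST_SYNC = [(1, 100, "2024-06-01"), (2, 200, "2024-06-03")]
SECOND_BATCH = [(1, 150, "2024-06-05"), (3, 300, "2024-06-04")]


def load(disposition: str) -> list:
    """Run both syncs with one write disposition and return the final table."""
    con = sqlite3.connect(":memory:")
    # No PRIMARY KEY constraint, so the append scenario can store duplicate ids.
    con.execute("CREATE TABLE sales (id INTEGER, sales INTEGER, updated_at TEXT)")
    con.executemany("INSERT INTO sales VALUES (?, ?, ?)", FIRST_SYNC)

    if disposition == "overwrite":
        con.execute("DELETE FROM sales")  # drop all existing rows
        con.executemany("INSERT INTO sales VALUES (?, ?, ?)", SECOND_BATCH)
    elif disposition == "append":
        con.executemany("INSERT INTO sales VALUES (?, ?, ?)", SECOND_BATCH)
    elif disposition == "merge":
        # Remove rows whose id appears in the batch, then insert the batch.
        con.executemany("DELETE FROM sales WHERE id = ?",
                        [(row[0],) for row in SECOND_BATCH])
        con.executemany("INSERT INTO sales VALUES (?, ?, ?)", SECOND_BATCH)
    else:
        raise ValueError(f"unknown disposition: {disposition}")

    rows = con.execute(
        "SELECT id, sales, updated_at FROM sales ORDER BY id, updated_at"
    ).fetchall()
    con.close()
    return rows


for mode in ("overwrite", "append", "merge"):
    print(mode, load(mode))
```

Only the merge result keeps record id=2 while holding exactly one row per id; the append result contains two rows for id=1, and the overwrite result has lost id=2 entirely.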

🟢 Takeaways: The merge (dedup) strategy can be effective in an incremental data loading pipeline, but if your table is very large, the dedup process may take a considerable amount of time.
