Hello friend! This is a newsletter for builders.
What do you like to build?
I hope the ideas in this week’s newsletter can help you get it done.
In this week’s newsletter:
🤔 Your mind garden
💻 Tutorial: Checking for missing timeseries data with SQL
📜 Quote of the week
Your mind garden
I love coming up with ideas.
At some points in my life I’ve used a journal to save them. In university I would jot them down on sticky notes and post them on my wall. More recently I’ve been using Google Keep.
I also love taking notes, and I’ve been learning just how insanely valuable a practice it can be.
Now comes the interesting part: making connections. You can picture a detective building that classic “crime web wall” on TV, a graph database with nodes and edges, or even the hyperlinks of the World Wide Web!
I was inspired to start building my idea / knowledge graph in some tangible way, as a means of cultivating the “garden in my mind.”
The hard truth is that taking notes and coming up with ideas is the easy part.
99% of the ideas I’ve jotted down over the years have gone nowhere and been lost or forgotten.
My hope is that in cultivating my mind garden, I might be able to connect ideas to my knowledge graph, build on past inspiration, and ultimately decide which ones are worth pursuing.
Tutorial: Checking for missing timeseries data with SQL
I noticed some missing data today and went to work backfilling it.
Most of my pipelines have two versions in Airflow. One runs daily and does the usual thing: pulling recent data and updating the relevant tables. The other runs on a manual trigger and reads Airflow variables to know which days to backfill.
When I noticed the missing data, I was looking at a chart on a dashboard and saw the lines plunging to zero in that classic, unnatural “something is terribly wrong” kinda way.
I could tell from that chart that a few days were missing, but where there’s smoke, there’s fire. In other words, related data sources could be f*#&ed up as well.
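Under the hood, the manual backfill run mostly needs to turn a start and end date into the list of days to reprocess. Here is a minimal stdlib sketch, assuming the backfill DAG stores two hypothetical Airflow variables, backfill_start and backfill_end, as ISO date strings (in a real DAG you might fetch them with Airflow’s Variable.get; here they are passed in directly so the sketch runs without Airflow).

```python
from datetime import date, timedelta


def backfill_days(start: str, end: str) -> list[date]:
    """Return every day from start to end, inclusive.

    start/end are ISO dates (YYYY-MM-DD), e.g. the values of the
    hypothetical Airflow variables backfill_start and backfill_end.
    """
    first = date.fromisoformat(start)
    last = date.fromisoformat(end)
    if last < first:
        raise ValueError(f"end {end} is before start {start}")
    # One entry per calendar day, including both endpoints.
    return [first + timedelta(days=i) for i in range((last - first).days + 1)]


if __name__ == "__main__":
    # Hard-coded here; in the manual-trigger DAG these strings would
    # come from Airflow variables.
    for day in backfill_days("2020-10-01", "2020-10-03"):
        print(f"backfilling {day.isoformat()}")
```

Each day in the returned list could then map to one backfill task, so a failed day can be retried on its own.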
So I hopped over to my SQL database tool and ran this query on all the related datasets:
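-- Find days between 2020-01-01 and 2020-10-10 that have no rows in target_table
with date_range as (
    -- one row per calendar day, cast back to date
    select generate_series('2020-01-01'::date, '2020-10-10'::date, interval '1 day')::date as date
)
select date_range.date, count(tab.date) as cnt
from date_range
left join target_table tab on tab.date = date_range.date
group by date_range.date
having count(tab.date) = 0  -- keep only days with no data
order by date_range.date asc;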
What this query does:
Generates a series of days between 2020-01-01 and 2020-10-10
Joins that series with target_table on the date column
Counts the number of target_table rows matched to each date
Filters down to the days with a count of 0, i.e. the missing data
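The same steps can be sketched in plain Python, which can be handy for sanity-checking the query’s output or for data that doesn’t live in Postgres. This is a minimal in-memory sketch, assuming the row dates have already been loaded into a list; the function name find_missing_days and the sample dates are hypothetical.

```python
from collections import Counter
from collections.abc import Iterable
from datetime import date, timedelta


def find_missing_days(row_dates: Iterable[date], start: date, end: date) -> list[date]:
    """Return the days in [start, end] that have zero rows."""
    # Count rows per date (like GROUP BY date with a count).
    counts = Counter(row_dates)
    missing = []
    day = start
    # Walk every day in the range (like generate_series) ...
    while day <= end:
        # ... keeping days that no row matched (like count = 0).
        # Counter returns 0 for absent keys without inserting them.
        if counts[day] == 0:
            missing.append(day)
        day += timedelta(days=1)
    return missing


if __name__ == "__main__":
    rows = [date(2020, 10, 1), date(2020, 10, 1), date(2020, 10, 4)]
    missing = find_missing_days(rows, date(2020, 10, 1), date(2020, 10, 4))
    print([d.isoformat() for d in missing])  # ['2020-10-02', '2020-10-03']
```

Loading every row date into memory only makes sense for modest tables; for large ones, letting the database do the counting, as the SQL query does, is the better fit.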
Quote of the week
“When you arise in the morning, think of what a precious privilege it is to be alive - to breathe, to think, to enjoy, to love.”
Marcus Aurelius
Do you agree with this? I don’t think of life as a privilege so much as a condition underlying my consciousness.
I believe gratitude is the most natural path to happiness. Regardless of your current situation, there are things that you can be grateful for.
Aurelius conveys that idea in its most basic form, identifying four things we can be grateful for every day: drawing breath, thinking, enjoying and expressing love.
Subscribe to the Datapipe weekly newsletter ⬇️