Datapipe Weekly #4

Mind gardens, life and timeseries

Hello friend! This is a newsletter for builders.

What do you like to build?

I hope the ideas in this week’s newsletter can help you get it done.

In this week’s newsletter

  • 🤔 Your mind garden

  • 💻 Tutorial: Checking for missing timeseries data with SQL

  • 📜 Quote of the week


Your mind garden

I love coming up with ideas.

At some points in my life I’ve used a journal to save them. In university I would jot them down on sticky notes and post them on my wall. More recently I’ve been using Google Keep.

I also love taking notes, and I’ve been learning about how insanely valuable a practice that can be.

Put simply:

Now comes the interesting part - making connections. You can picture a detective building that classic “crime web wall” on TV, a graph database with nodes and edges, or even hyperlinks on the World Wide Web!

I was inspired to start building my idea / knowledge graph in some tangible way, as a means of cultivating the “garden in my mind”.


The hard truth is that taking notes and coming up with ideas is the easy part.

99% of the ideas I’ve jotted down over the years have gone nowhere and been lost or forgotten.

My hope is that in cultivating my mind garden, I might be able to connect ideas to my knowledge graph, build on past inspiration, and ultimately decide which ones are worth pursuing.


Tutorial: Checking for missing timeseries data with SQL

I noticed some missing data today and went to work backfilling it.

Most of my pipelines have two versions in Airflow. One runs daily and does the usual thing - pulling recent data and updating the relevant tables - while the other runs on a manual trigger and references Airflow variables to know which days to backfill.
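
For illustration only, here is a minimal Python sketch of how a manually triggered backfill could turn two variables into a list of days to re-run. The variable names (backfill_start, backfill_end) and the plain dict standing in for Airflow variables are assumptions, not the actual setup.

```python
from datetime import date, timedelta

# Hypothetical stand-in for Airflow variables; the real names are not given here.
variables = {"backfill_start": "2020-10-01", "backfill_end": "2020-10-03"}


def days_to_backfill(variables):
    """Expand an inclusive start/end pair into the list of days to re-run."""
    start = date.fromisoformat(variables["backfill_start"])
    end = date.fromisoformat(variables["backfill_end"])
    if end < start:
        raise ValueError("backfill_end is before backfill_start")
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]


backfill_days = days_to_backfill(variables)
print(backfill_days)
```

Keeping the date expansion in a small function like this means the daily and manual versions could share the same per-day logic and differ only in which days they are handed.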

When I noticed the missing data I was looking at a chart on a dashboard and saw the lines branching down to zero in that classic unnatural “something is terribly wrong” kinda way.

Now I could tell from that chart that a few days were missing, but I know that where there’s smoke there’s fire. In other words, other similar data sources could be f*#&ed up as well.

So I hopped over to my SQL database tool and ran this query on all the related datasets:

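-- List every day in the range that has no rows in target_table
select * from (
  with date_range as (
    -- one row per day between the start and end dates
    select generate_series(
      '2020-01-01', '2020-10-10', interval '1 day'
    ) date
  )
  select
    date_range.date, count(tab.*) as cnt
  from target_table tab
    -- right join keeps every day, even when no rows match
    right join date_range
    on tab.date = date_range.date
  group by date_range.date
  order by date_range.date asc
) tab2
-- keep only the days with nothing in target_table
where tab2.cnt = 0;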
 

What this does:

  • Generate a series of days, one row per day, between 2020-01-01 and 2020-10-10

  • Right join that onto target_table, so every day is kept even when no rows match

  • Count the number of rows that get joined from target_table for each date

  • Filter on days that have 0 row counts, i.e. missing data
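
If the dates are already in memory rather than in a database, the same check can be sketched in plain Python. This is only an illustration of the steps above, not part of the original workflow; the function name missing_days and the sample rows are made up for the example.

```python
from collections import Counter
from datetime import date, timedelta


def missing_days(row_dates, start, end):
    """Return every day in [start, end] that has no rows in row_dates."""
    counts = Counter(row_dates)  # rows per day, like count(tab.*)
    n_days = (end - start).days + 1
    all_days = (start + timedelta(days=i) for i in range(n_days))
    # Counter returns 0 for days it has never seen, i.e. missing data
    return [day for day in all_days if counts[day] == 0]


# Hypothetical sample: one date per row, with gaps on Jan 3 and Jan 5.
rows = [date(2020, 1, 1), date(2020, 1, 2), date(2020, 1, 2), date(2020, 1, 4)]
gaps = missing_days(rows, date(2020, 1, 1), date(2020, 1, 5))
print(gaps)
```

The list of days plays the role of the generated date series, and the Counter lookup plays the role of the join plus count.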


Quote of the week

“When you arise in the morning, think of what a precious privilege it is to be alive - to breathe, to think, to enjoy, to love.”
-Marcus Aurelius

Do you agree with this? I don’t think of life as a privilege but rather a condition underlying my consciousness.

However, I believe gratitude is the most natural path to happiness. Regardless of your current situation, there are things that you can be grateful for.

Aurelius conveys that idea in the most basic sense, identifying 4 things we can be grateful for every day: drawing breath, thinking, enjoying and expressing love.

-Alex

