Datapipe Weekly #16

Hello friend! This is a newsletter for builders.

What do you like to build?

I hope the ideas in this week’s newsletter can help you get it done.

In this week’s newsletter

  • 💻 Trick: Checking number of rows in CSV zipfile

  • 📜 Quote of the week


Trick: Checking number of rows in CSV zipfile

At work we have SFTP client data feeds which I ingest into our data warehouse using Airflow. For one such feed, the client sends daily CSV files in zip format.

If I do

ls -lrt

(a command I find myself running multiple times a day in one terminal or another)

then I get something like this:

Nov  8 16:03 seo_20201107.zip
Nov  9 16:19 seo_20201108.zip
Nov 10 16:03 seo_20201109.zip

Today I wanted to know how many lines are in one of these zipfiles. In the past I had figured this out by unzipping the file and then doing a wc -l on the result.

But there must be a better way, I thought. And I was right.

Here’s the one-liner using seo_20201107.zip as an example:

$ unzip -p seo_20201107.zip | wc -l
222902

The -p option tells unzip to extract the file to stdout (normally the screen), and then I pipe that into the word count utility wc with the usual -l flag to get the number of lines.

In this case it’s 222,902. This is roughly the number of rows in the CSV, give or take one: wc -l counts newline characters, so the header adds one to the count, and a last row with no trailing newline goes uncounted.
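
If you would rather do the same check from Python (inside an Airflow task, say), here is a minimal sketch using only the standard library. The function name count_csv_lines and the assumption that the archive holds a single CSV are mine, not part of the feed described above.

```python
import zipfile


def count_csv_lines(zip_path, member=None):
    """Count newline characters in one zip member, like `unzip -p ... | wc -l`."""
    with zipfile.ZipFile(zip_path) as archive:
        # Assumption: the archive holds a single CSV, so default to the first member.
        name = member or archive.namelist()[0]
        with archive.open(name) as raw:
            # Read in 1 MiB chunks so the decompressed file never sits fully in memory.
            chunks = iter(lambda: raw.read(1 << 20), b"")
            return sum(chunk.count(b"\n") for chunk in chunks)
```

For an archive with a single file, count_csv_lines("seo_20201107.zip") should report the same number as the one-liner, since both count newline characters in the decompressed data.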


Quote of the week

“If all you have is a hammer, everything looks like a nail”
- Abraham Maslow

This quote reminds me to focus on solutions first and tools second.

Last week at work a request was made to report on fiscal timeframes, instead of calendar months.

I brainstormed a dimensional model with my colleague (fiscal dates would be handled in a dimension table to complement the records in the fact table), and we started thinking about the data sources we needed to fetch and warehouse.

But today we found a solution in our front-end BI tool that would provide equivalent benefit to the client, at significantly reduced time investment from us on the Data Engineering team.

I like my hammer. I love Airflow and Python. But it’s not all I have, and there’s more than just nails out there.

-Alex


Thank you for reading Datapipe 👋


Subscribe to the Datapipe weekly newsletter ⬇️