Datapipe Weekly #25

Redshift IF EXISTS and slowing down

Hello friend! This is a newsletter for builders.

What do you like to build?

I hope the ideas in this week’s newsletter can help you get it done.

In this weeks newsletter

  • 💻 Builders: Redshift IF EXISTS for ad-hoc commands

  • 📜 Philosophers: Quote of the week


Redshift IF EXISTS for ad-hoc commands

If I want to create a table I can run:

CREATE TABLE fav_foods (
  food varchar(256),
  rank int,
  last_eaten date
);

Now- say my application want’s to create this table, but only if it doesn’t exist yet. This can be done very easily:

CREATE TABLE IF NOT EXISTS fav_foods (
  ...
);

I use this quite a lot, usually so that new pipelines can be instantiated from existing templates without manually creating tables for them. It’s also useful when a schema change occurs. I can just drop the old table and let the application create the new one.

Unfortunately IF NOT EXISTS cannot be used on every SQL command. Last week I was getting errors trying to DELETE rows from a table that did not exist. In a perfect world I would simply write DELTE FROM fav_foods IF EXISTS ... but that syntax is not supported.

The solution is issue two queries from the application. The first checks if the table exists and then second runs the delete if that’s the case.

Using my DataBase abstraction class I was able to accomplish this very elegantly actually. The code goes something like this:

class DataBase:
    ...
    def table_exists(self, table_name, table_schema="public"):
        sql = f"""
        select exists (
            select * from information_schema.tables
            where
                table_name = '{table_name}'
                and table_schema = '{table_schema}'
        );
        """
        result = self.run(query=sql, return_vals=True)[0][0]
        if type(result) is not bool:
            raise ValueError(
                'Unexpected return value from exists '
                'query ({})'.format(result)
            )
        return result
    ...

Then in my pipeline I do:

with DataBase() as db:
    ...
    if db.table_exists("fav_foods"):
        db.run("DELETE from fav_foods ...")
    ...

This sidesteps the problem of no IF EXISTS in the DELETE command for Redshift.


Quote of the week

“Anything worth doing is worth doing slowly”
- Mae West

I’ve been pondering this recently.

Doing things slowly allows more time to think and take better action. Living this way we can also move through the world with more elegance and be mindful of the pride we should take in this.

Of course, slow is a relative term. The hummingbird flaps their wings about 50 times a second, whereas the albatross can soar hundreds of meters on the wind without even one flap of their wings.

The point is - slow for you means something different than slow for me. But both of us should probably slow down at least a little bit.

-Alex


Thank you for reading Datapipe 👋


Subscribe the Datapipe weekly newsletter ⬇️