Posts by Category

til

Nix: Streaming gsutil transfers

less than 1 minute read

Cloud Storage supports streaming transfers, which allow you to stream data to and from your Cloud Storage account without requiring that the data first be...

SQL: Calculate percentage of column

less than 1 minute read

```sql
-- Starting with a table.
select * from sales;
+-------+------+
| rep   | sale |
+-------+------+
| Bob   |   15 |
| Sally |   30 |
| Peter |   15 |
+-------+------+
```
Use a ...

Python: Get a notification via knockknock

less than 1 minute read

There is a Python library called knockknock that allows you to get a notification when your training is complete or when it crashes during the process. Al...

Nix: Make a noise!

less than 1 minute read

If you’re writing a script, and want it to make a noise to notify you when it’s done, look no further than SoX:

Pandas: Some notes on groupby

less than 1 minute read

The count() aggregation function counts only non-null values. To count all values, whether null or non-null, use size.

Website: How to create drafts in Jekyll

less than 1 minute read

In case you haven’t noticed, I use Jekyll to create my blog from markdown files. And I typically write short posts, mostly TILs like this one. But I am tryin...

Pandas: Three new functions

1 minute read

Towards Data Science on Medium has been a great source of tips so far, and this article is a great example that highlights the following functions, which were new to me.

Git: Dynamic identity

less than 1 minute read

Sometimes I use my terminal for personal work, like this article. In those cases, I like my git commits to use my personal email address and not my work e...

Git: Sort branches by recency

less than 1 minute read

When you type git branch, your branch list is sorted alphabetically by default. This isn’t super helpful. To sort your branches by their last commit date,...

Bash: Run entire shell script as root

less than 1 minute read

Placing sudo in the shebang line of a shell script runs the entire thing as root. Useful for scripts designed to, e.g. automate system upgrades or package...

Nix: Convert reStructuredText to Markdown

less than 1 minute read

I’ve used pandoc a lot before to convert Markdown files to PDFs. I just found out it can also convert reStructuredText text files to Markdown format:

Bash: Escaping strings easily

less than 1 minute read

Enter a line of Bash starting with a # comment, then run !:q on the next line to see what that would be with proper Bash escaping applied.

Python: Get the most of floats

less than 1 minute read

Similar to the int data type, floats also have several additional methods useful in various scenarios:
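
A minimal sketch of a few such methods, using only built-in `float` methods (`is_integer`, `as_integer_ratio`, `hex`, and `float.fromhex`). The variable names are illustrative and not taken from the original post:

```python
# Illustrative values; the names below are assumptions, not from the post.
whole = 3.0
fraction = 0.75

# is_integer() reports whether the float has no fractional part.
whole_is_int = whole.is_integer()
fraction_is_int = fraction.is_integer()

# as_integer_ratio() returns an exact (numerator, denominator) pair.
ratio = fraction.as_integer_ratio()

# hex() and float.fromhex() round-trip a float without losing precision.
round_tripped = float.fromhex(fraction.hex())

print(whole_is_int, fraction_is_int, ratio, round_tripped)
```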

Python: The SimpleNamespace Utility Class

less than 1 minute read

The SimpleNamespace type from the types library provides an alternative to an empty class (class MyClass: pass) from which one can add and remove attribut...
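
As a rough illustration, here is how attributes can be attached to and removed from a `SimpleNamespace`. The `config` object and its attribute names are made up for this sketch:

```python
from types import SimpleNamespace

# Hypothetical settings object; attribute names are invented for illustration.
config = SimpleNamespace(host="localhost", port=8080)

# Attributes can be added after creation...
config.debug = True

# ...and removed again with del.
del config.port

# Unlike an empty class, SimpleNamespace also comes with a readable repr.
print(config)
```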

Mac: Ask user for password via GUI

less than 1 minute read

This function will use AppleScript to present a password entry dialog to make your scripts a little more user friendly:

Nix: Stty - sane terminal settings

less than 1 minute read

Restore sane shell settings, in case your shell session went insane because some script or application turned it into a garbled mess:

Python: Salted Hash

less than 1 minute read

The salt is just a randomly derived bit of data that you prefix or postfix your data with to dramatically increase the complexity of a dictionary atta...
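
A minimal sketch of the idea, assuming SHA-256 from `hashlib` and a 16-byte salt from `os.urandom`. The helper name `salted_sha256` is hypothetical, and for real password storage a dedicated key-derivation function such as `hashlib.scrypt` or `hashlib.pbkdf2_hmac` is usually the better choice:

```python
import hashlib
import os


def salted_sha256(data: bytes, salt: bytes) -> str:
    """Return the hex SHA-256 digest of salt + data (salt used as a prefix)."""
    return hashlib.sha256(salt + data).hexdigest()


# Two different random salts for the same input.
salt_a = os.urandom(16)
salt_b = os.urandom(16)

digest_a = salted_sha256(b"correct horse", salt_a)
digest_b = salted_sha256(b"correct horse", salt_b)

# Same data, different salts: the digests differ, so a precomputed
# dictionary of unsalted hashes no longer matches either of them.
print(digest_a != digest_b)
```

The salt has to be stored next to the digest so the same hash can be recomputed when checking the data later.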

Pandas: Make Data Frame

less than 1 minute read

pandas has a built-in function makeDataFrame() to return a DataFrame containing random floats. Note that this is using the private API, and the exact details...

Pandas: Read Clipboard

less than 1 minute read

The pandas.read_clipboard() method is as simple as it sounds: it reads copy-pasted tabular data and parses it into a DataFrame. For instance, try running...

Pandas: Pipe function

less than 1 minute read

Pandas introduced pipe() starting from version 0.16.2. pipe() enables user-defined methods in method chains.

Sklearn: Tree diagram

less than 1 minute read

The plot_tree() function allows you to create a diagram of steps present in a decision tree model:

Sklearn: Pipeline diagram

less than 1 minute read

Estimators can be displayed with an HTML representation when shown in a Jupyter notebook. This can be useful to diagnose or visualize a Pipeline with many ...

Sklearn: Column transformations

less than 1 minute read

Scikit-learn provides a class called ColumnTransformer, which allows you to easily specify which columns to apply the most appropriate preproces...

Pandas: Named Aggregation

1 minute read

pandas>=0.25 supports named aggregation, allowing you to specify the output column names when you aggregate a groupby, instead of renaming. This will be e...

Mac: Remove quarantine flag from app

less than 1 minute read

I encountered an issue with my favorite macOS Markdown editor MacDown where macOS Catalina was reporting the file as damaged. It turns out that Catalina has ...

Mac: Software Update from the Command Line

less than 1 minute read

There are lots of Terminal commands that you can use to change or update your Mac’s OS. My favorite is this quick tip to download macOS updates and installat...

Science: Light v Sound

less than 1 minute read

Roughly: Light travels about a foot per nanosecond, sound travels about a foot per millisecond. A factor of almost exactly a million. — Colin Wright ...
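
The rule of thumb can be checked with a quick calculation. This sketch assumes sound travelling through dry air at about 20 °C (roughly 343 m/s); the constant names are my own:

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # exact, by definition of the metre
SPEED_OF_SOUND_M_PER_S = 343.0        # assumption: dry air at about 20 °C
METERS_PER_FOOT = 0.3048

# Distance covered per nanosecond (light) and per millisecond (sound), in feet.
light_ft_per_ns = SPEED_OF_LIGHT_M_PER_S / METERS_PER_FOOT * 1e-9
sound_ft_per_ms = SPEED_OF_SOUND_M_PER_S / METERS_PER_FOOT * 1e-3

# Ratio of the two speeds.
ratio = SPEED_OF_LIGHT_M_PER_S / SPEED_OF_SOUND_M_PER_S

print(f"light: {light_ft_per_ns:.3f} ft/ns")
print(f"sound: {sound_ft_per_ms:.3f} ft/ms")
print(f"ratio: {ratio:,.0f}")
```

Under these assumptions both speeds land close to one foot per unit, and the ratio comes out somewhat below a million, which is in line with the "roughly" in the quote.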

Git: Blocked ssh port

less than 1 minute read

I mostly clone GitHub and Bitbucket repositories using SSH URLs, so that I can protect this access with an SSH private/public keypair. Unfortunately, some fi...

Git: Stash tips

1 minute read

I’m a big fan of using git stash to shelve some changes in my repository so that I can move on to some other task. Here are some advanced git stash commands ...

TSQL: Basic T-SQL

less than 1 minute read

```sql
-- Show all databases.
select name from master.sys.databases;
```

Spark: Date Arithmetic with Multiple Columns

less than 1 minute read

Say you have a timestamp column created_at, and an integer column number that represents a number of weeks, how do you use the date_add function to calculate...

Spark: Count number of duplicate rows

less than 1 minute read

To count the number of duplicate rows in a PySpark DataFrame, you want to groupBy() all the columns and count(), then select the sum of the counts for the ro...

Docker: Set Timezone

less than 1 minute read

To set which timezone your Docker container should use, add the following to your Dockerfile:

Git: Untrack A File Without Deleting It

less than 1 minute read

Generally when I invoke git rm <filename>, I do so with the intention of removing a file from the project entirely. git-rm does exactly that, removing ...

Git: Stashing Untracked Files

less than 1 minute read

Normally when stashing changes, using git stash, git is only going to stash changes to tracked files. If there are any new files in your project that aren’t ...

Git: Snapshot

less than 1 minute read

To save a snapshot of your current work in git, try this command:

Git: Stashing Only Unstaged Changes

less than 1 minute read

If you have both staged and unstaged changes in your project, you can perform a stash on just the unstaged ones by using the -k flag. The staged changes will...

Git: Two ways of squashing commits

less than 1 minute read

It is handy to squash down your commits before merging your PR with my-new-cool-feature. You can either squash them down by doing an interactive rebase like ...

Git: Show The diffstat Summary Of A Commit

less than 1 minute read

Use the --stat flag when running git show on a commit to see the diffstat summary of that commit. For instance, this is what I get for a recent commit to del...

Git: Interactively Unstage Changes

less than 1 minute read

I often use git add --patch to interactively stage changes for a commit. Git takes me through changes to tracked files piece by piece to check if I want to s...

Git: Undo a Git Mistake

less than 1 minute read

git reflog is a record of your actions in Git. With this command, you can undo almost any Git mistake.

Git: Resetting A Reset

less than 1 minute read

Sometimes we run commands like git reset --hard HEAD~ when we shouldn’t have. We wish we could undo what we’ve done, but the commit we’ve reset is gone forev...

Git: Accessing A Lost Commit

less than 1 minute read

If you have lost track of a recent commit (perhaps you did a reset), you can generally still get it back. Run git reflog and look through the output to see i...

Git: Git Log With Authors

less than 1 minute read

In my never-ending quest to better summarize my work at the end of the day using computers, I discovered today the Git --author flag. It works like this:

Git: Git Log since

less than 1 minute read

At the end of each day, I try to record what I did, to jog my memory during the next morning’s standup. This is a helpful aid:

Git: List Filenames Without The Diffs

less than 1 minute read

The git show command will list all changes for a given reference including the diffs. With diffs included, this can get rather verbose at times. If you just ...

Git: Last Commit A File Appeared In

less than 1 minute read

In my project, I have a README.md file that I haven’t modified in a while. I’d like to take a look at the last commit that modified it. The git log command c...

Git: Ignore Changes To A Tracked File

less than 1 minute read

Files that should never be tracked are listed in your .gitignore file. But what about if you want to ignore some local changes to a tracked file?

Git: Determine The Hash Id For A Blob

less than 1 minute read

Git’s hash-object command can be used to determine what hash id will be used by git when creating a blob in its internal file system.

Git: LFS Track

less than 1 minute read

When you add a new type of large file to your repository, you’ll need to tell Git LFS to track it by specifying a pattern using the git lfs track command:

Git: LFS Pull

less than 1 minute read

You can pull from a Git LFS repository using a normal git pull. No explicit commands are needed to retrieve Git LFS content. However, if the checkout fails f...

Git: LFS Prune

1 minute read

You can delete files from your local Git LFS cache with the git lfs prune command. This will delete any local Git LFS files that are considered ‘old’. An old...

Git: Migrate LFS hosting provider

less than 1 minute read

To migrate a Git LFS repository from one hosting provider to another, you can use a combination of git lfs fetch and git lfs push with the --all option speci...

Git: LFS Fetch

1 minute read

Git LFS typically only downloads the files needed for commits that you actually check out locally. However, you can force Git LFS to download extra content fo...

Git: LFS Clone

less than 1 minute read

Once Git LFS is installed, you can clone a Git LFS repository as normal using git clone. At the end of the cloning process Git will check out the default bran...

Git: Clean Up Old Remote Tracking References

less than 1 minute read

After working on a Git-versioned project for a while, you may find that there are a bunch of references to remote branches in your local repository. You know...

Git: Delete Remote Git Tags

less than 1 minute read

Tagging releases with Git is a good idea. In case your tags get off track, here is how you delete a Git tag locally and on a remote:

Git: Diffing With Patience

less than 1 minute read

The default diff algorithm used by Git is pretty good, but it can get misled by larger, complex changesets. The result is a noisier, misaligned diff output.

Git: Delete All Untracked Files

less than 1 minute read

Git provides a command explicitly intended for cleaning up (read: removing) untracked files from a local copy of a repository.

Git: Checkout Old Version Of A File

less than 1 minute read

When you want to return to a past version of a file, you can reset to a past commit. When you don’t want to abandon a bunch of other changes, this isn’t goin...

Git: Use a file from another branch

less than 1 minute read

Sometimes you just need one file from another branch. Sure you could git cherry-pick but then you’re dealing with commits. That sort of thing gets sticky fas...

Git: Clean Out All Local Branches

less than 1 minute read

Sometimes a project can get to a point where there are so many local branches that deleting them one by one is too tedious. This one-liner can help:

Git: Intent To Add

less than 1 minute read

Git commands like git diff and git add --patch are awesome, but their little caveat is that they only work on files that are currently tracked in the reposit...

Stats: Better descriptive statistics

less than 1 minute read

Instead of the mean, use the median and/or the mode. Instead of the standard deviation, use the mean absolute deviation, the median absolute deviation, ...

Docker: Attach/Detach

less than 1 minute read

To detach from a container, you hold Ctrl and press P, then Q. This only works if the container was started with both -t and -i.

AWS CLI: Modify volume size

less than 1 minute read

Here’s how to modify the size of the volume attached to an EC2 instance “my_ec2”:

Travis: Skip unnecessary builds

less than 1 minute read

Especially when you’re working with a large team with multiple Travis-enabled repositories, you’ll want to avoid running any unnecessary builds. The most ...

Jupyter: Output of all variables

less than 1 minute read

If you try to see the output of more variables without explicitly writing print in front of each, only the last one gets outputted. With this, you get the...

Pandas: sort_index

less than 1 minute read

Dataframes have a new sort_index method to sort a dataframe by index. This is equivalent to the deprecated sort method with the columns argument set to None.

Pandas: Options

less than 1 minute read

You can use the following functions to interact with the options in pandas:

Pandas: Option Context

less than 1 minute read

If you want to temporarily change pandas options, instead of doing so manually as follows:

Pandas: Speed up merges

less than 1 minute read

You can improve the speed of a merge by first specifying the key column of the merge as the index of your dataframes, and then using join instead of merge:

Pandas: Count number of non-NaN entries

less than 1 minute read

The count() method returns the number of non-NaN values in each column. Similarly, count(axis=1) returns the number of non-NaN values in each row.

CompSci: GUIDs are not strings

less than 1 minute read

GUIDs are not strings. They are numbers. We render them as strings for readability. We should not process them as strings. We should not pass them around ...
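
To make the point concrete, here is a small sketch using Python's standard `uuid` module. The sample GUID is just an illustrative value:

```python
import uuid

# The same 128-bit value written in two different textual styles.
guid = uuid.UUID("a8098c1a-f86e-11da-bd1a-00112444be1e")
same_guid = uuid.UUID("{A8098C1A-F86E-11DA-BD1A-00112444BE1E}")

# Comparing the raw strings would say "different"; comparing the parsed
# values (128-bit numbers underneath) says "equal".
print(guid == same_guid)

# The underlying number and its 16-byte form.
print(guid.int)
print(len(guid.bytes))
```

Parsing once at the boundary and passing the `UUID` object around avoids bugs caused by case, braces, or hyphen differences in the string form.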

AWS CLI: List account aliases

less than 1 minute read

If you want the URL for your sign-in page to contain your company name (or other friendly identifier) instead of your AWS account ID, you can create an al...

Jupyter: High-res plots

less than 1 minute read

You can enable high-resolution plots in Jupyter Notebook using the following configuration:

Mac: Emoji

less than 1 minute read

Press Command-Control-Space to launch the character palette. Then click on the Emoji icon in the sidebar on the left side of the Character window. You c...

Mac: Copy files intelligently with ditto

less than 1 minute read

ditto is slightly more advanced than cp and can be preferable for several reasons, as it not only preserves ownership attributes and permissions but als...

Git: Using multiple worktrees

less than 1 minute read

When working with multiple branches at the same time, people often clone the whole git repository again.

Travis: Why is my build not running?

less than 1 minute read

Sometimes you push to Travis CI and there is no new build. What should you do in such a case? Has Travis CI got your commits? Is the branch you were using disabled?...

Mac: Power shortcuts

less than 1 minute read

Control-Command-Power/Eject will reboot the Mac instantly. Command-Option-Control-Power/Eject will shut it down. Command-Shift-Q will log off. Shift...

Python: SpooledTemporaryFile

less than 1 minute read

The tempfile.SpooledTemporaryFile function operates exactly as TemporaryFile() does, except that data is spooled in memory until the file size exceeds the pa...

Python: Private variables

less than 1 minute read

To make Python treat a variable as pseudo-private, follow the convention of putting two underscores (i.e., __) at the beginning of the variable’s name, e.g.:
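
A minimal sketch of what the double underscore does in practice (name mangling). The `Account` class is a made-up example:

```python
class Account:
    def __init__(self, balance):
        # Inside the class body, __balance is rewritten to _Account__balance.
        self.__balance = balance

    def balance(self):
        return self.__balance


account = Account(100)

# Outside the class, the unmangled name does not exist.
try:
    account.__balance
    direct_access_failed = False
except AttributeError:
    direct_access_failed = True

# The attribute is still reachable under its mangled name, which is why
# this is only "pseudo-private".
mangled_value = account._Account__balance

print(direct_access_failed, mangled_value)
```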

Python: Print without newline

less than 1 minute read

To print a string without appending the usual newline, use the end parameter of the print function:
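
A short sketch of the `end` parameter in action. It writes into an in-memory `io.StringIO` buffer (my choice, so the result can be inspected) rather than straight to the terminal:

```python
import io

buffer = io.StringIO()

# end="" suppresses the newline that print() normally appends.
print("Loading", end="", file=buffer)
print("...", end="", file=buffer)

# A regular print() call finishes the line.
print(" done", file=buffer)

output = buffer.getvalue()
print(repr(output))
```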

Python: Flushing while printing

less than 1 minute read

Since Python 3.3, you can force the normal print() function to flush without the need to use sys.stdout.flush(); just set the flush keyword argument to Tr...

Python: Pickle in Python2 and Python3

less than 1 minute read

The python3 pickle.load() function has optional keyword arguments that can be used to control compatibility support for pickle streams generated by Python 2:
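
A small sketch of those keyword arguments (`fix_imports`, `encoding`, `errors`), shown with `pickle.loads` on an in-memory payload. The payload is a hand-written protocol 0 pickle of a Python 2 byte string, chosen for illustration rather than taken from the original post:

```python
import pickle

# Protocol 0 pickle of the Python 2 str 'caf\xe9' (illustrative payload).
py2_pickle = b"S'caf\\xe9'\np0\n."

# encoding tells Python 3 how to decode 8-bit str instances pickled by Python 2.
as_text = pickle.loads(
    py2_pickle, fix_imports=True, encoding="latin1", errors="strict"
)

# encoding="bytes" keeps those instances as bytes objects instead.
as_bytes = pickle.loads(py2_pickle, encoding="bytes")

print(repr(as_text), repr(as_bytes))
```

With the default `encoding="ASCII"`, the same payload would fail to decode because of the non-ASCII byte.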

Python: An improved tuple

1 minute read

A downside of plain tuples is that the data you store in them can only be pulled out by accessing it through integer indexes. You can’t give names to individ...
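
One way around this is `collections.namedtuple`, sketched below. The `Point` type is a made-up example:

```python
from collections import namedtuple

# A tuple subclass whose fields can be accessed by name.
Point = namedtuple("Point", ["x", "y"])

p = Point(x=1, y=2)

# Fields are available by name and still by integer index.
print(p.x, p[0])

# Named tuples stay immutable; _replace() returns a modified copy.
moved = p._replace(x=10)
print(moved)
```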

Python: MyPy variable annotations

less than 1 minute read

In Python 3.6, variables (in global, class or local scope) can now have type annotations using either of the following two forms:

Python: Mutable default arguments

1 minute read

One of the most confusing moments for new developers is when they discover how Python treats default arguments in function definitions.
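
The surprise is easiest to see in a tiny example. The function names `append_bad` and `append_good` are hypothetical:

```python
def append_bad(item, bucket=[]):
    # The default list is created once, when the function is defined,
    # and is then shared by every call that omits bucket.
    bucket.append(item)
    return bucket


def append_good(item, bucket=None):
    # Common workaround: use None as a sentinel and build a fresh list.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket


first_bad = append_bad(1)
second_bad = append_bad(2)    # same list object as first_bad

first_good = append_good(1)
second_good = append_good(2)  # a new list each time

print(second_bad, first_good, second_good)
```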

Python: Lambdas as lexical closures

less than 1 minute read

A ‘lexical closure’ is a fancy name for a function that remembers the values from the enclosing lexical scope even when the program flow is no longer in that...
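
A minimal sketch of a lambda acting as a closure, plus a common gotcha: closures capture variables, not the values they held at creation time. All names here are illustrative:

```python
def make_adder(n):
    # The lambda remembers n even after make_adder has returned.
    return lambda x: x + n


add_3 = make_adder(3)
add_5 = make_adder(5)
print(add_3(10), add_5(10))

# Gotcha: every lambda below refers to the same variable i,
# which holds its final value by the time the lambdas are called.
late = [lambda: i for i in range(3)]

# Binding i as a default argument freezes the value per lambda.
bound = [lambda i=i: i for i in range(3)]

print([f() for f in late], [f() for f in bound])
```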

Python: Parallel for loops

1 minute read

Joblib provides a simple helper class to write parallel for loops using multiprocessing. The core idea is to write the code to be executed as a generator ...

Python: Cache function output

1 minute read

Joblib traces parameters passed to a function, and if the function has already been called with the same parameters, it returns the return value cached on disk.

Python: Function disassembler

less than 1 minute read

You can use Python’s built-in dis module to disassemble functions and inspect their CPython VM bytecode:
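
A short sketch, with `greet` as a made-up function to inspect. The exact opcodes printed vary between CPython versions, so no output is shown here:

```python
import dis


def greet(name):
    return "Hello, " + name + "!"


# Print a human-readable listing of the function's bytecode.
dis.dis(greet)

# The same instructions are available programmatically.
opnames = [instruction.opname for instruction in dis.get_instructions(greet)]
print(opnames)
```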

Python: Deep copy a compound object

2 minute read

Assignment statements in Python do not copy objects; they create bindings between a target and an object. For collections that are mutable or contain muta...
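
The difference between a binding, a shallow copy, and a deep copy can be sketched with the standard `copy` module. The nested list is an illustrative value:

```python
import copy

original = [[1, 2], [3, 4]]

alias = original                 # new binding, same object
shallow = copy.copy(original)    # new outer list, shared inner lists
deep = copy.deepcopy(original)   # fully independent copy

# Mutate one of the inner lists through the original.
original[0].append(99)

print(alias[0])    # sees the change (same object)
print(shallow[0])  # sees the change (inner list is shared)
print(deep[0])     # unaffected
```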

Python: Collect garbage

less than 1 minute read

If you have a variable with a large memory footprint, you can force garbage collection using the gc Garbage Collector module:

Python: Class inheritance

less than 1 minute read

You can check for class inheritance relationships with the issubclass() built-in:
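
For instance, with two made-up classes `Animal` and `Dog`:

```python
class Animal:
    pass


class Dog(Animal):
    pass


dog_is_animal = issubclass(Dog, Animal)
animal_is_dog = issubclass(Animal, Dog)

# A class counts as a subclass of itself.
dog_is_dog = issubclass(Dog, Dog)

# The second argument may also be a tuple of candidate classes.
dog_matches_any = issubclass(Dog, (int, Animal))

print(dog_is_animal, animal_is_dog, dog_is_dog, dog_matches_any)
```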

AWS CLI: Assuming a role

less than 1 minute read

You can configure the AWS Command Line Interface to use a role by creating a profile for the role in the ~/.aws/config file. The following example shows a ro...

Matplotlib: Get current axis

less than 1 minute read

matplotlib.pyplot.gca(**kwargs) gets the current Axes instance on the current figure matching the given keyword args, or creates one.

Nix: Check If A Port Is In Use

less than 1 minute read

The lsof command is used to list open files. This includes listing network connections. This means I can check if a particular port is in use and what proces...

Nix: Check Ubuntu Version

less than 1 minute read

Are you on Ubuntu? Want to know what version (release) of Ubuntu you are using?

Nix: Disk Speed Benchmark

less than 1 minute read

In Linux, the dd command can be used for simple I/O performance measurements as follows:

Nix: CPU Benchmark

less than 1 minute read

dd in conjunction with any stream-processing CPU-intensive program can be used as a simple CPU benchmark!

Nix: Saying Yes

less than 1 minute read

Tired of being prompted for confirmation by command-line utilities? Wish you could blindly respond ‘yes’ to whatever it is they are bugging you about? The ye...

Nix: Max out CPU with Yes

less than 1 minute read

If you want a quick and easy method to max out the usage of a CPU core, just use yes:

Nix: Watch That Program

less than 1 minute read

Have you ever been working in the terminal and found yourself repeating the same command many times? Delegate that work to the computer.

Nix: Monitor System Memory with vmstat

less than 1 minute read

vmstat allows the user to monitor virtual memory statistics such as processes, memory, paging, block IO, traps, disks, and CPU activity.

Nix: Duplicate pipe content

less than 1 minute read

To duplicate the content while piping you can use the tee utility. One straightforward and useful example is that tee can be used to write to multiple fil...

Nix: Sort In Numerical Order

less than 1 minute read

By default, the sort command will sort things alphabetically. If you have numerical input though, you may want a numerical sort. This is what the -n flag is ...

Nix: Search Man Page Descriptions

less than 1 minute read

You can use the apropos command with a keyword argument to search for that word’s occurrences throughout all the man pages on your system. For instance, invoki...

Nix: Search Files Specific To A Language

less than 1 minute read

The ack command makes it easy to narrow the set of searched files to those of a specific programming language. For instance, if you have a rails project and ...

Nix: SSH pipes

less than 1 minute read

One of the benefits of piping is that you can use it over networks and it does wonders for data transfer. Note that half of the command is executed locally, ...

Nix: Killing A Frozen SSH Session

less than 1 minute read

Whenever an SSH session freezes, I usually mash the keyboard in desperation and then kill the terminal session. This can be avoided though. SSH will listen f...

Nix: Merge pdf files

less than 1 minute read

You can use qpdf to merge PDF files into a single file as follows:

Nix: List parent pid with ps

less than 1 minute read

The ps command, which stands for process status, is a great way to find different processes running on a machine. Information like their pid (process id) is ...

Nix: PID Of The Current Shell

less than 1 minute read

The special parameter $ expands to the process ID of the shell. So, you can see the PID of the current shell with echo $$.

Nix: Printing with lpr

less than 1 minute read

Recently while trying to fix a printer I used lpr a bunch of times. It’s not exactly new to me, but never fails to surprise people when I use it.

Nix: Search History

less than 1 minute read

Often times there is a very specific command you have entered into your bash prompt that you need to run again. You don’t want to have to type it again and s...

Nix: Last Argument Of The Last Command

less than 1 minute read

You can use !$ as a way to reference the last argument in the last command. This makes for an easy shortcut when you want to switch out commands for the same...

Nix: Hexdump A Compiled File

2 minute read

The hexdump unix utility allows you to dump the contents of a compiled/executable file in a readable hexadecimal format. Adding the -C flag includes a side...

Nix: Only Show The Matches

less than 1 minute read

Tools like grep, ack, and ag make it easy to search for lines in a file that contain certain text and patterns. They all come with the -o flag which tells th...

Nix: List Names Of Files With Matches

less than 1 minute read

I often use grep and ag to search for patterns in a group or directory of files. Generally I am interested in looking at the matching lines themselves. Howev...

Nix: Grep For Multiple Patterns

less than 1 minute read

You can use the -e flag with the grep command to search for a pattern. Additionally, you can use multiple -e flags to search for multiple patterns. For insta...

Nix: Grep For Files Without A Match

less than 1 minute read

The grep command is generally used to find files whose contents match a pattern. With the -L (--files-without-match) flag, grep can be used to find files tha...

Nix: Find Newer Files

less than 1 minute read

Use the -newer flag with the name of a file to find files that have a newer modification date than the named file.

Nix: Exclude A Directory With Find

less than 1 minute read

Using find is a handy way to track down files that meet certain criteria. However, if there are directories full of irrelevant files, you may end up with a l...

Nix: Upgrading Ubuntu

less than 1 minute read

I recently discovered that my Linode box was running a fairly old version of Ubuntu. Because it is a remote box that I SSH into, there is no graphical user i...

Nix: Determine The IP Address Of A Domain

less than 1 minute read

The dig (domain information groper) command can be used to get more information about a domain name. To discover the IP address for a given domain, invoke d...

Nix: Curling For Headers

less than 1 minute read

If you want to inspect the headers of a response from some endpoint, look no further than a quick curl command. By including the -I flag, curl will return ju...

Nix: Curling With Basic Auth Credentials

less than 1 minute read

I often use curl to take a quick look at the responses of particular endpoints. If I try to curl a URL that is secured with HTTP Basic Authentication, this i...

Nix: Convert tabs to/from spaces

less than 1 minute read

The command expand in GNU coreutils converts tabs in each input file to spaces. The command unexpand does the reverse, converting spaces in each input file ...

Nix: Change Default Shell For A User

less than 1 minute read

You can change the default shell program for a particular unix user with the chsh command. Just tell it what shell program you want to use (e.g. bash or zsh)...

Spark: Orderby Partitioning

less than 1 minute read

Remember that orderBy uses the number of partitions specified by spark.conf.get("spark.sql.shuffle.partitions"). The default for this is 200. Can change manu...

Tmux: tmux in your tmux

less than 1 minute read

If you are running tmux locally and you shell into another machine to access tmux remotely, you will suddenly find yourself in tmux inception. You will have ...

Tmux: Adjusting Window Pane Size

less than 1 minute read

In tmux, the size of window panes can be adjusted incrementally with the resize-pane command. For instance, to resize a pane in any direction (left, down, up...

Tmux: Rename The Current Session

less than 1 minute read

If you’ve created an unnamed tmux session or you no longer like the original name, you can open a prompt to change it by hitting

Tmux: Pane Killer

less than 1 minute read

The current pane can be killed (closed) using the following key binding:

Tmux: Paging Up And Down

less than 1 minute read

When in copy mode (<prefix>[), you can move the cursor around like you would in vim with the directional keys (hjkl). This works fine until you want to...

Tmux: Create A New Session In A New Server

less than 1 minute read

Any tmux command will, by default, be invoked against the default server. You can instruct tmux to perform commands against a different server with the -L fl...

Tmux: List Sessions

less than 1 minute read

Not sure if tmux is running or, if it is, which sessions are available? You can list all the currently running sessions right from the command-line.

Tmux: List All Key Bindings

less than 1 minute read

There are a couple ways to list all the tmux key bindings. If you are not currently in a tmux session, you can still access the list from the terminal with

Tmux: Kill The Current Session

less than 1 minute read

When you are done with the current tmux session and you no longer need it, you can simply kill it. You can do so within the session with the following comman...

SQL: Day Of Week By Name For A Date

less than 1 minute read

By using the to_char() function with a date or timestamp, we can determine the day of the week by name (e.g. Monday). For instance, to determine what day tod...

SQL: Count Records By Type

less than 1 minute read

If you have a table with some sort of type column on it, you can come up with a count of the records in that table by type. You just need to take advantage o...

PSQL: Terminating A Connection

less than 1 minute read

Consider the scenario where you are trying to drop a database, but there are existing connections.

PSQL: List Connections To A Database

less than 1 minute read

The pg_stat_activity view can be used to determine what connections there currently are to the PostgreSQL server and to a particular database. To see the pr...

PSQL: Sleeping

less than 1 minute read

Generally you want your SQL statements to run against your database as quickly as possible. For those times when you are doing some sort of debugging or just...

PSQL: Get The Size Of A Table

less than 1 minute read

With the pg_relation_size() function, we can get the size of a given table. For instance, if we’d like to see the size of the reservations table, we can by e...

PSQL: Dump a database

less than 1 minute read

Using pg_dump with the -Fc flag will create a dump of the given database in a custom format. The output of this command can be redirected into a file (th...

PSQL: Get The Size Of A Database

less than 1 minute read

If you have connect access to a PostgreSQL database, you can use the pg_database_size() function to get the size of a database in bytes.

PSQL: Change The Current Directory

less than 1 minute read

When you start a psql session, your current directory is what psql will use as its current directory. This is important for meta-commands that use relative p...

PSQL: Auto Expanded Display

less than 1 minute read

By default, postgres has expanded display turned off. This means that results of a query are displayed horizontally. At times, the results of a query can be ...

PSQL: Use Argument Indexes

less than 1 minute read

In Postgres, each of the arguments you specify in a select statement has a 1-based index tied to it. You can use these indexes in the order by and group by p...

PSQL: Types By Category

less than 1 minute read

Postgres has many types, each of which falls into a particular category. These categories include Array, Boolean, String, Numeric, Composite, etc. Each of the...

PSQL: Truncate Tables With Dependents

less than 1 minute read

If you have tables A and B where B has a foreign key referencing A, then trying to truncate A will result in something like this:

PSQL: Truncate All Rows

less than 1 minute read

Given a postgres database, if you want to delete all rows in a table, you can use the DELETE query without any conditions.

PSQL: Turn Timing On

less than 1 minute read

When digging around your database and running queries, it is helpful to have an eye on the speed of those queries. This can give insight into where there are...

PSQL: Limit Execution Time Of Statements

less than 1 minute read

You can limit the amount of time that postgres will execute a statement by setting a hard timeout. By default the timeout is 0 (see show statement_timeout;) ...

PSQL: Find The Data Directory

less than 1 minute read

Where does postgres store all of the data for a database cluster? Well, in its data directory. Where exactly that data directory is can depend on how the dat...

PSQL: Configure The Timezone

less than 1 minute read

Running show timezone; will reveal the timezone for your postgres connection. If you want to change the timezone for the duration of the connection, you can ...

PSQL: Use a psqlrc File For Common Settings

less than 1 minute read

There are a handful of settings that I inevitably turn on or configure each time I open up a psql session. I can save myself a little time and sanity by conf...

PSQL: A Better Null Display Character

less than 1 minute read

By default, psql will display null values with whitespace. This makes it difficult to quickly identify null values when they appear amongst a bunch of other ...

PSQL: Salt And Hash A Password With pgcrypto

less than 1 minute read

The pgcrypto extension that ships with PostgreSQL can be used to do a number of interesting things. This includes functions for doing salted password hashing...

PSQL: Compute Hashes With pgcrypto

less than 1 minute read

The pgcrypto extension that comes with PostgreSQL adds access to some general hashing functions. Included are md5, sha1, sha224, sha256, sha384 and sha512. A...

PSQL: List Various Kinds Of Objects

less than 1 minute read

Our PostgreSQL database can end up with all kinds of objects: tables, sequences, views, etc. We can use a variety of psql meta-commands to list the different...

PSQL: List Database Users

less than 1 minute read

Within psql, type \du to list all the users for a database and their respective permissions.

PSQL: List Database Objects With Disk Usage

less than 1 minute read

I’ll often use \d or \dt to check out the tables in my database. This shows the schema, object name, object type (e.g. table), and owner for each.

PSQL: Insert Just The Defaults

less than 1 minute read

If you are constructing an INSERT statement for a table whose required columns all have default values, you may just want to use the defaults. In this situat...

PSQL: Generate Series Of Numbers

less than 1 minute read

Postgres has a generate_series function that can be used to, well, generate a series of something. The simplest way to use it is by giving it start and stop ...

PSQL: Export Query Results To A CSV

less than 1 minute read

Digging through the results of queries in Postgres’s psql is great if you are a programmer, but eventually someone without the skills or access may need to c...

PSQL: Clear The Screen In psql

less than 1 minute read

The psql interactive terminal does not have a built-in way of clearing the screen. What I usually do if I really need the screen cleared is quit, run clear f...

PSQL: Storing Emails With citext

less than 1 minute read

Email addresses should be treated as case-insensitive because they are. If a user is trying to sign in with their email address, we shouldn’t care if they ty...

PSQL: Getting A Slice Of An Array

less than 1 minute read

Postgres has a very natural syntax for grabbing a slice of an array. You simply add brackets after the array declaring the lower and upper bounds of the slic...

PSQL: Defining Arrays

less than 1 minute read

In postgres, an array can be defined using the array syntax like so:

PSQL: Renaming A Table

less than 1 minute read

Using the alter table command in PostgreSQL, you can rename an existing table. This command will also update any references to the table such as via foreign ...

PSQL: Restart A Sequence

less than 1 minute read

In postgres, if you are truncating a table or doing some other sort of destructive action on a table in a development or testing environment, you may notice ...

GitHub: Link to headers in READMEs

less than 1 minute read

Anytime you add a header to a markdown file, GitHub attaches an href with its downcased name. ‘JavaScript’ receives a link to #javascript, for instance.

Bash: Directional Commands

less than 1 minute read

You can move the cursor without arrow keys. Here is the keyboard equivalent for each.

Bash: Jump To The Ends Of Your Shell History

less than 1 minute read

There are all sorts of ways to do things in your shell environment without reaching for the arrow keys. For instance, if you want to move up to the previous ...

Homebrew: Switch Versions of a Brew Formula

less than 1 minute read

If you’ve installed a couple versions of a program via brew and you’d like to switch from the currently linked version to the other installed version, you ca...

Back to Top ↑

posts

The evolution of US gun violence

1 minute read

Heather Cox Richardson is an American historian and professor of history at Boston College. She has a wonderful newsletter Letters from an American worth sub...

The social contract of open source

1 minute read

I’m a huge fan of open source software. Much of my work and play involve using it, either directly or indirectly. And so I’m a big proponent of people giving...

Entropy Explained, With Sheep

less than 1 minute read

All physics students learn the Second Law of Thermodynamics, that entropy always increases. But not all such students understand why this is. To do this, let...

Debunk Flat Earthers

less than 1 minute read

Carl Sagan debunks Flat Earthers using nothing more than a piece of cardboard:

How to talk to children

1 minute read

As a parent of two small children, I have to constantly remind myself that some positive or negative event that seems trivial to me may be incredible or deva...

How to create a healthy society

less than 1 minute read

Microsoft Research’s danah boyd has been given an award by the Electronic Frontier Foundation, and gave a magnificent speech on her experience as a woman in ...

How to be Black

less than 1 minute read

I just finished Baratunde Thurston’s How to be Black, a wonderful “satirical guide to race issues – written for black people and those who love them”.

SHAttered

less than 1 minute read

Awesome work to demonstrate how to deliberately cause a SHA-1 collision.

The Truth About Bad Science

less than 1 minute read

This wired.com article on ‘bad science’ speaks to me on so many levels. It hurts my soul to see how many published studies are not reproducible.

Death by Diagnosis on Freakonomics

less than 1 minute read

Q: What’s the number-one problem in healthcare? A: I think the number-one problem is we don’t measure performance. We don’t measure the outcomes of patien...

Making machine learning models interpretable

less than 1 minute read

This month, the 20th European Symposium on Artificial Neural Networks, Computational Intelligence and Machine Learning held a special session on “Interpretab...

The relativity of raw data

1 minute read

Data scientists often say that they want access to the ‘raw data’ – but what does that term mean?

The Friendliest of Fire

less than 1 minute read

How on earth could an American pilot get a Distinguished Flying Cross for shooting down an American plane? The Friendliest of Fire (via Now I Know) tells all.

Hello World!

less than 1 minute read

In the proud tradition of programmers everywhere, I use my first blog post to say “Hello World!”.

Back to Top ↑

tips

Machine Learning for Product Managers

less than 1 minute read

In a previous post, I discussed the importance of learning how to properly communicate Data Science to maximize the impact of your work. Product Managers are...

Communicating Data Science with impact

less than 1 minute read

One of the major differentiators between a new Data Scientist and a more experienced one is how the more senior practitioner spends a lot of time understandi...

Data problems

less than 1 minute read

Another nice Medium post from Benjamin Obi Tayo has a good summary of the types of issues you should always be mindful of when you get a new data set:

Model error quantification

less than 1 minute read

When we train a machine-learning model, we almost always report some performance metric, such as accuracy, recall, or F1-score.

Probabilistic interpretation of AUC

less than 1 minute read

The area under the ROC (receiver operating characteristic) curve, or AUC, is a popular and robust metric for machine learning classification. However, one is...

The Modal American

less than 1 minute read

When trying to characterize a dataset, we often reach for the old standby: the mean of each property. If we give it some more thought, we might consider usin...

The Data Visualisation Catalogue

less than 1 minute read

Looking for inspiration for your data viz project? Can’t remember what a particular visualization is called? Check out the Data Visualisation Catalogue.

Back to Top ↑

tres

Minnow Telescope Finds Massive Planet

8 minute read

Since ancient times, mankind has studied the sky and wondered what the ‘wandering stars’ (planets) might be. In the last two decades, we have found hundreds ...

Back to Top ↑
