Recent Posts

(TIL) Spark: Count number of duplicate rows

less than 1 minute read

To count the number of duplicate rows in a PySpark DataFrame, you want to groupBy() all the columns and count(), then select the sum of the counts for the ro...
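The group-then-sum idea can be illustrated in plain Python, so it runs without a Spark cluster. This is a minimal sketch that mirrors the logic with `collections.Counter` rather than Spark itself. The helper name `count_duplicate_rows` is hypothetical. The sketch also assumes that "duplicate rows" means every row whose full set of column values appears more than once, so it sums the counts of groups with count > 1. The excerpt above is cut off before it states the exact filter.

```python
from collections import Counter
from typing import Hashable, Iterable, Sequence


def count_duplicate_rows(rows: Iterable[Sequence[Hashable]]) -> int:
    """Count rows whose full tuple of column values occurs more than once.

    Hypothetical helper mirroring groupBy(all columns).count() followed by
    summing the counts of groups with count > 1 (an assumed filter).
    """
    # "Group by all columns": each distinct row tuple becomes one key with its count.
    group_counts = Counter(tuple(row) for row in rows)
    # Sum the counts of the groups that appear more than once.
    return sum(count for count in group_counts.values() if count > 1)


rows = [
    ("alice", 1),
    ("bob", 2),
    ("alice", 1),
    ("carol", 3),
    ("bob", 2),
    ("alice", 1),
]
print(count_duplicate_rows(rows))  # alice appears 3 times, bob 2 times -> 5
```

In PySpark, a rough equivalent (untested here) would be `df.groupBy(*df.columns).count().where(F.col("count") > 1).agg(F.sum("count")).collect()[0][0]`, with `F` being `pyspark.sql.functions`. If you only want the extra copies beyond the first of each row, sum `count - 1` instead of `count`.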

(TIL) Docker: Set Timezone

less than 1 minute read

To set which timezone your Docker container should use, add the following to your Dockerfile: