You can quickly view the contents of a file from the command line using cat.
Remember that orderBy uses the number of partitions specified by spark.conf.get("spark.sql.shuffle.partitions"). The default is 200. You can change it manually with spark.conf.set("spark.sql.shuffle.partitions", n), which is often worth doing for small datasets where 200 shuffle partitions add unnecessary overhead.
Awesome work demonstrating how to deliberately cause a SHA-1 collision.
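For context, a collision means two different inputs hash to the same digest. A minimal sketch of comparing SHA-1 digests with Python's hashlib (the byte strings here are ordinary placeholders, not actual colliding inputs):

```python
import hashlib

def sha1_hex(data: bytes) -> str:
    """Return the SHA-1 digest of data as a hex string."""
    return hashlib.sha1(data).hexdigest()

a = b"hello"
b = b"world"

# Distinct inputs normally give distinct digests; a collision attack
# deliberately crafts two *different* inputs with identical digests.
print(sha1_hex(a))
print(sha1_hex(a) == sha1_hex(b))
```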
If you want to reuse a DataFrame df without recomputing it each time, you can call df.cache() to tell Spark to keep it in memory after the first action materializes it.
Top tips on better descriptive statistics: