Pandas: Use chunksize to iterate through files with read_csv

November 9, 2017

Suppose you wish to iterate through a (potentially very large) file lazily rather than reading the entire file into memory.

If you pass a chunksize to read_csv or read_table, the return value is not a DataFrame but an iterable object of type TextFileReader, which yields DataFrames of at most chunksize rows each:

In [163]: reader = pd.read_table('tmp.sv', sep='|', chunksize=4)

In [164]: reader
Out[164]: <pandas.io.parsers.TextFileReader at 0x7ff27e15a450>

In [165]: for chunk in reader:
   .....:     print(chunk)
   .....:
   Unnamed: 0         0         1         2         3
0           0  0.469112 -0.282863 -1.509059 -1.135632
1           1  1.212112 -0.173215  0.119209 -1.044236
2           2 -0.861849 -2.104569 -0.494929  1.071804
3           3  0.721555 -0.706771 -1.039575  0.271860
   Unnamed: 0         0         1         2         3
4           4 -0.424972  0.567020  0.276232 -1.087401
5           5 -0.673690  0.113648 -1.478427  0.524988
6           6  0.404705  0.577046 -1.715002 -1.039268
7           7 -0.370647 -1.157892 -1.344312  0.844885
   Unnamed: 0         0        1         2         3
8           8  1.075770 -0.10905  1.643563 -1.469388
9           9  0.357021 -0.67460 -1.776904 -0.968914
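
Printing each chunk is mostly useful for a quick look. In practice you would usually reduce each chunk to something small and combine the results. Here is a minimal sketch of that pattern, assuming a pipe-separated input with hypothetical columns named id and value (an in-memory buffer stands in for a large file on disk):

```python
import io

import pandas as pd

# Hypothetical sample data; in real use this would be a path to a large file.
data = io.StringIO(
    "id|value\n" + "\n".join(f"{i}|{i * 10}" for i in range(10))
)

total = 0
rows = 0
chunk_sizes = []

# With chunksize set, read_csv yields DataFrames of at most 4 rows each,
# so only one chunk is held in memory at a time.
for chunk in pd.read_csv(data, sep="|", chunksize=4):
    chunk_sizes.append(len(chunk))
    total += chunk["value"].sum()
    rows += len(chunk)

mean_value = total / rows
print(chunk_sizes, rows, total, mean_value)
```

Only the running totals are kept between iterations, so memory use depends on the chunk size rather than on the size of the file.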

Via pandas-docs.



Francis T. O'Donovan