
My laptop's memory is 8 GB and I was trying to read and process a big CSV file, and ran into memory issues. I found a solution, which is using chunksize to process the file chunk by chunk, but apparently when using chunksize the returned object is a TextFileReader instead of a DataFrame, so the code I was using to process normal CSVs with it doesn't work anymore. This is the code I'm trying to use to count how many sentences are inside the CSV file:

wdata = pd.read_csv(fileinput, nrows=0).columns[0]
skip = int(wdata.count(' ') == 0)
wdata = pd.read_csv(fileinput, names=['sentences'], skiprows=skip, chunksize=1000)
data = wdata.count()
print(data)

The error I'm getting is:

Traceback (most recent call last):
  File "table.py", line 24, in <module>
    data = wdata.count()
AttributeError: 'TextFileReader' object has no attribute 'count'
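For reference, a minimal check (with a hypothetical example.csv) shows that read_csv returns this TextFileReader iterator, not a DataFrame, as soon as chunksize is set:

import pandas as pd

reader = pd.read_csv("example.csv", chunksize=1000)  # hypothetical small file
print(type(reader))  # pandas.io.parsers.TextFileReader (exact module path varies by version)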

I tried another way around as well by running this code:

TextFileReader = pd.read_csv(fileinput, chunksize=1000)  # the number of rows per chunk
dfList = []
for df in TextFileReader:
    dfList.append(df)
df = pd.concat(dfList, sort=False)
print(df)

and it gives this error

    data = self._reader.read(nrows)
  File "pandas/_libs/parsers.pyx", line 881, in pandas._libs.parsers.TextReader.read
  File "pandas/_libs/parsers.pyx", line 908, in pandas._libs.parsers.TextReader._read_low_memory
  File "pandas/_libs/parsers.pyx", line 950, in pandas._libs.parsers.TextReader._read_rows
  File "pandas/_libs/parsers.pyx", line 937, in pandas._libs.parsers.TextReader._tokenize_rows
  File "pandas/_libs/parsers.pyx", line 2132, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 2 fields in line 3, saw 4
asked Jan 20, 2020 at 6:46 by programming freak

1 Answer


You have to iterate over the chunks:

csv_length = 0
for chunk in pd.read_csv(fileinput, names=['sentences'], skiprows=skip, chunksize=10000):
    csv_length += chunk.count()
print(csv_length)
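Note that chunk.count() returns a per-column Series rather than a plain integer, so csv_length ends up as a one-element Series here. If only the row count is needed, len(chunk) is a simpler accumulator; a minimal variant (fileinput and skip are placeholders carried over from the question):

import pandas as pd

fileinput = "sentences.csv"  # hypothetical path, standing in for the question's file
skip = 1                     # assumed result of the question's header-detection logic

csv_length = 0
for chunk in pd.read_csv(fileinput, names=['sentences'], skiprows=skip, chunksize=10000):
    csv_length += len(chunk)  # len() gives the number of rows in this chunk
print(csv_length)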
answered Jan 20, 2020 at 6:51 by gbruenjes

Comments

this is printing 1000, 1000, 1000, 1000 for more than 200 times
@programmingfreak yes, obviously: you are reading chunks with a length of 1000 and printing the length of each one
@programmingfreak you have to add the count of each chunk to a variable to get the full length
I know, I tried to append all of them to print the length of the file, but it killed the process automatically for some reason. Is there a way around it?
the other attempt doesn't really make sense. Your memory is too small to process the full CSV: you can't read the chunks and append them together. You have to process each chunk and clear it out of memory
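A minimal sketch of that pattern, aggregating per chunk so that only a running total stays in memory (the file name and the notna-based sentence count are illustrative assumptions, not the asker's actual logic):

import pandas as pd

fileinput = "sentences.csv"  # hypothetical path

total_sentences = 0
for chunk in pd.read_csv(fileinput, names=['sentences'], chunksize=10000):
    # Count non-empty rows in this chunk, then let the chunk be discarded
    # when the loop variable is rebound on the next iteration.
    total_sentences += chunk['sentences'].notna().sum()
print(total_sentences)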