
My laptop's memory is 8 GB and I was trying to read and process a big CSV file, and ran into memory issues. I found a solution, which is using chunksize to process the file chunk by chunk, but apparently when using chunksize the returned object is a TextFileReader instead of a DataFrame, so the code I was using to process normal CSVs with it doesn't work anymore. This is the code I'm trying to use to count how many sentences are inside the CSV file:

wdata = pd.read_csv(fileinput, nrows=0).columns[0]
skip = int(wdata.count(' ') == 0)
wdata = pd.read_csv(fileinput, names=['sentences'], skiprows=skip, chunksize=1000)
data = wdata.count()
print(data)

The error I'm getting is:

Traceback (most recent call last):
  File "table.py", line 24, in <module>
    data = wdata.count()
AttributeError: 'TextFileReader' object has no attribute 'count'
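For reference, a minimal check (with a hypothetical example.csv) shows that read_csv returns this TextFileReader iterator, not a DataFrame, as soon as chunksize is set:

import pandas as pd

reader = pd.read_csv("example.csv", chunksize=1000)  # hypothetical small file
print(type(reader))  # pandas.io.parsers.TextFileReader (exact module path varies by version)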

I tried another way around as well by running this code:

TextFileReader = pd.read_csv(fileinput, chunksize=1000)  # the number of rows per chunk
dfList = []
for df in TextFileReader:
    dfList.append(df)
df = pd.concat(dfList, sort=False)
print(df)

and it gives this error

    data = self._reader.read(nrows)
  File "pandas/_libs/parsers.pyx", line 881, in pandas._libs.parsers.TextReader.read
  File "pandas/_libs/parsers.pyx", line 908, in pandas._libs.parsers.TextReader._read_low_memory
  File "pandas/_libs/parsers.pyx", line 950, in pandas._libs.parsers.TextReader._read_rows
  File "pandas/_libs/parsers.pyx", line 937, in pandas._libs.parsers.TextReader._tokenize_rows
  File "pandas/_libs/parsers.pyx", line 2132, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 2 fields in line 3, saw 4
asked Jan 20, 2020 at 6:46 by programming freak

1 Answer


You have to iterate over the chunks:

csv_length = 0
for chunk in pd.read_csv(fileinput, names=['sentences'], skiprows=skip, chunksize=10000):
    csv_length += chunk.count()
print(csv_length)
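Note that chunk.count() returns a per-column Series rather than a plain integer, so csv_length ends up as a one-element Series here. If only the row count is needed, len(chunk) is a simpler accumulator; a minimal variant (fileinput and skip are placeholders carried over from the question):

import pandas as pd

fileinput = "sentences.csv"  # hypothetical path, standing in for the question's file
skip = 1                     # assumed result of the question's header-detection logic

csv_length = 0
for chunk in pd.read_csv(fileinput, names=['sentences'], skiprows=skip, chunksize=10000):
    csv_length += len(chunk)  # len() gives the number of rows in this chunk
print(csv_length)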
answered Jan 20, 2020 at 6:51 by gbruenjes

Comments

this is printing 1000, 1000, 1000, 1000 for more than 200 times
@programmingfreak yes, obviously: you are reading chunks with a length of 1000 and printing the length of each one
@programmingfreak you have to add the count of each chunk to a variable to get the full length
I know, I tried to append all of them to print the length of the file, but it killed the process automatically for some reason. Is there a way around it?
the other attempt doesn't really make sense. Your memory is too small to process the full CSV: you can't read the chunks and append them together. You have to process each chunk and clear it out of memory
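A minimal sketch of that pattern, aggregating per chunk so that only a running total stays in memory (the file name and the notna-based sentence count are illustrative assumptions, not the asker's actual logic):

import pandas as pd

fileinput = "sentences.csv"  # hypothetical path

total_sentences = 0
for chunk in pd.read_csv(fileinput, names=['sentences'], chunksize=10000):
    # Count non-empty rows in this chunk, then let the chunk be discarded
    # when the loop variable is rebound on the next iteration.
    total_sentences += chunk['sentences'].notna().sum()
print(total_sentences)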