My laptop has 8 GB of memory and I was trying to read and process a big CSV file when I ran into memory issues. I found a solution, which is to use `chunksize` to process the file chunk by chunk, but apparently when using `chunksize` the returned object is a `TextFileReader`, not a DataFrame, and the code I was using to process normal CSVs no longer works. This is the code I'm trying to use to count how many sentences are inside the CSV file:
```
wdata = pd.read_csv(fileinput, nrows=0).columns[0]
skip = int(wdata.count(' ') == 0)
wdata = pd.read_csv(fileinput, names=['sentences'], skiprows=skip, chunksize=1000)
data = wdata.count()
print(data)
```

The error I'm getting is:
```
Traceback (most recent call last):
  File "table.py", line 24, in <module>
    data = wdata.count()
AttributeError: 'TextFileReader' object has no attribute 'count'
```

I tried another way around as well, by running this code:
```
TextFileReader = pd.read_csv(fileinput, chunksize=1000)  # the number of rows per chunk
dfList = []
for df in TextFileReader:
    dfList.append(df)
df = pd.concat(dfList, sort=False)
print(df)
```

and it gives this error:
```
    data = self._reader.read(nrows)
  File "pandas/_libs/parsers.pyx", line 881, in pandas._libs.parsers.TextReader.read
  File "pandas/_libs/parsers.pyx", line 908, in pandas._libs.parsers.TextReader._read_low_memory
  File "pandas/_libs/parsers.pyx", line 950, in pandas._libs.parsers.TextReader._read_rows
  File "pandas/_libs/parsers.pyx", line 937, in pandas._libs.parsers.TextReader._tokenize_rows
  File "pandas/_libs/parsers.pyx", line 2132, in pandas._libs.parsers.raise_parser_error
pandas.errors.ParserError: Error tokenizing data. C error: Expected 2 fields in line 3, saw 41
```

1 Answer
You have to iterate over the chunks:
```
csv_length = 0
for chunk in pd.read_csv(fileinput, names=['sentences'], skiprows=skip, chunksize=10000):
    csv_length += chunk.count()
print(csv_length)
```
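Note that `chunk.count()` returns a Series of non-null counts per column rather than a plain integer, so the printed total is a one-element Series. If all you need is the number of rows, `len(chunk)` is simpler. A minimal sketch, reusing the asker's `fileinput` and `skip` variables:

```
import pandas as pd

# Sum plain row counts per chunk; len(chunk) is the number of rows
# in the current chunk, so csv_length ends up an ordinary int.
csv_length = 0
for chunk in pd.read_csv(fileinput, names=['sentences'], skiprows=skip,
                         chunksize=10000):
    csv_length += len(chunk)
print(csv_length)
```

The `ParserError` from the concat attempt is a separate problem: without `names=['sentences']`, pandas infers the field count from the first lines of the file, so a later sentence containing more commas than expected breaks the tokenizer ("Expected 2 fields in line 3, saw 41"). If the file is really free text with one sentence per line, you can sidestep the CSV tokenizer entirely and count lines directly, which also never loads the whole file into memory (again reusing `skip` to exclude a possible header line):

```
# Assuming one sentence per line; subtract skip to ignore a header row.
with open(fileinput, encoding='utf-8') as f:
    sentence_count = sum(1 for _ in f) - skip
print(sentence_count)
```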