Description
Bug Report for https://neetcode.io/problems/gpt-dataset
Please describe the bug below and include any steps to reproduce the bug or screenshots if possible.
```python
from typing import List, Tuple

import torch


class Solution:
    def batch_loader(self, raw_dataset: str, context_length: int, batch_size: int) -> Tuple[List[List[str]]]:
        torch.manual_seed(0)
        tokenized = raw_dataset.split()
        # Draw batch_size random starting positions for the context windows.
        indices = torch.randint(low=0, high=len(tokenized) - context_length, size=(batch_size,)).tolist()
        X = []
        Y = []
        for idx in indices:
            X.append(tokenized[idx:idx + context_length])
            # Y is X shifted right by one token (next-token targets).
            Y.append(tokenized[idx + 1:idx + 1 + context_length])
        return X, Y
```
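For concreteness, a hypothetical invocation with a tiny sample string (the text and parameters below are illustrative, not from the problem):

```python
raw = "the quick brown fox jumps over the lazy dog"

# Two (x, y) window pairs of three tokens each; every y window is its
# x window shifted right by one token.
X, Y = Solution().batch_loader(raw, context_length=3, batch_size=2)
print(X)
print(Y)
```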
In the provided solution, `high=len(tokenized) - context_length` can result in an invalid index when generating the `Y` output vector. If the random index equals `len(tokenized) - context_length`, then the end index for the slice `idx+1:idx+1+context_length` will be `len(tokenized) - context_length + 1 + context_length - 1`, which equals `len(tokenized)` and is out of bounds.
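A minimal sketch of that boundary case, with the index set manually to the value described above (isolated from `torch.randint` for illustration):

```python
tokenized = "a b c d e f g h".split()  # 8 tokens
context_length = 3
idx = len(tokenized) - context_length  # 5, the index described above

x = tokenized[idx:idx + context_length]          # ['f', 'g', 'h']
y = tokenized[idx + 1:idx + 1 + context_length]  # wants indices 6, 7, 8

# The last index y needs is idx + context_length == len(tokenized), one past
# the final valid index. Python slicing clamps at the end of the list, so y
# comes back silently truncated to 2 tokens instead of raising an IndexError.
print(x, y)  # ['f', 'g', 'h'] ['g', 'h']
```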