Duplicate objects returned from list #2979

Closed as not planned
@ericfrederich

Description of the problem, including code/CLI snippet

At least several REST resources are returning duplicate objects. I have noticed this on both projects and users.
This may be the expected behavior of GitLab itself, but since this Python package already handles pagination, perhaps it could also handle deduplication based on `id`.

Expected Behavior

I would expect no duplicate objects when using a `.list(get_all=True, iterator=True)` call, even if objects of that type are created while the pages are still being fetched.

Actual Behavior

When calling `gl.projects.list(get_all=True, iterator=True)`, if a project is created while the iteration is in progress, you'll get a duplicate object. The same happens with users, and likely with all other object types as well.

end-user mitigation and thoughts

It would be nice if end users didn't have to dedupe themselves.
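For reference, the simplest end-user workaround is a small generator that remembers every `id` it has yielded. This is only a sketch: `unique_by_id` is a hypothetical helper name, not part of python-gitlab, and it assumes every object exposes an `id` attribute. The demo uses stand-in objects rather than a live GitLab server.

```python
from types import SimpleNamespace
from typing import Any, Iterable, Iterator


def unique_by_id(items: Iterable[Any]) -> Iterator[Any]:
    """Yield each item once, keyed on its ``id`` attribute (hypothetical helper)."""
    seen = set()
    for item in items:
        if item.id in seen:
            # Same id appeared earlier in the stream: skip the repeat.
            continue
        seen.add(item.id)
        yield item


# Stand-in objects for the demo; real code would pass the iterator returned by
# something like gl.projects.list(get_all=True, iterator=True).
stream = [SimpleNamespace(id=n) for n in (1, 2, 2, 3)]
deduped_ids = [obj.id for obj in unique_by_id(stream)]
print(deduped_ids)
```

The cost is that the set of seen ids grows with the total number of objects listed.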

The `get_stuff` function further below is overkill, but it records the information I was using while trying to understand the problem.

What I have found is that I do get the warning log about an exact match being returned, but I have never seen the `AssertionError` raised. I also tracked the indices for information: in every instance the duplicate was at indices x99 and x00 (right on a page boundary).
This makes sense: when a new project or user is created mid-iteration, we have already passed its position, so we miss it and everything else shifts by one index.

    WARNING  Duplicate project id 31393 at index 1099 and 1100
    WARNING  Duplicate project id 30028 at index 2099 and 2100
    WARNING  Duplicate project id 22457 at index 7899 and 7900
    WARNING  Duplicate user id 222 at index 10299 and 10300
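The page-boundary pattern can be reproduced without a server. The toy simulation below assumes offset-based pages and a newest-first ordering, so a newly created object lands at the front of the list; both are assumptions about the server, and `fetch_page` and `list_all` are hypothetical names used only for illustration.

```python
def fetch_page(items, page, per_page):
    """Return one offset-based page (1-indexed) of the list as it is right now."""
    start = (page - 1) * per_page
    return items[start:start + per_page]


def list_all(items, per_page, insert_before_page=None, new_item=None):
    """Walk every page; optionally create new_item just before a given page is fetched."""
    seen = []
    page = 1
    while True:
        if page == insert_before_page:
            # Newest-first ordering (assumed): the new object goes to the front,
            # pushing every existing object one position later.
            items.insert(0, new_item)
        chunk = fetch_page(items, page, per_page)
        if not chunk:
            break
        seen.extend(chunk)
        page += 1
    return seen


server = list(range(10, 0, -1))  # ids 10..1, newest first
result = list_all(server, per_page=5, insert_before_page=2, new_item=11)
duplicate_positions = [
    (i - 1, i) for i in range(1, len(result)) if result[i] == result[i - 1]
]
print(result)
print(duplicate_positions)
```

In this run the last object of page 1 reappears as the first object of page 2, and the newly created object (id 11) is never returned at all.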

If deduplication is implemented within python-gitlab itself it wouldn't need to keep track of all object ids, just the previous page's object ids, since this only occurs on page boundaries.
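A rough sketch of that idea follows. It assumes the pagination loop can see whole pages; `dedupe_across_pages` and its `pages` argument are hypothetical, and this is not how python-gitlab is implemented today. Only the ids from the immediately preceding page are kept in memory.

```python
from types import SimpleNamespace
from typing import Any, Iterable, Iterator, List


def dedupe_across_pages(pages: Iterable[List[Any]]) -> Iterator[Any]:
    """Yield items page by page, dropping any whose id was on the previous page."""
    previous_ids = set()
    for page in pages:
        current_ids = set()
        for item in page:
            current_ids.add(item.id)
            if item.id in previous_ids:
                # Already yielded from the previous page: skip the repeat.
                continue
            yield item
        # Only the most recent page's ids are remembered.
        previous_ids = current_ids


def make_page(ids):
    """Build a page of stand-in objects for the demo."""
    return [SimpleNamespace(id=n) for n in ids]


# Page 2 starts with id 6 again, as happens when an object is created mid-listing.
pages = [make_page([10, 9, 8, 7, 6]), make_page([6, 5, 4, 3, 2]), make_page([1])]
page_deduped_ids = [obj.id for obj in dedupe_across_pages(pages)]
print(page_deduped_ids)
```

This catches repeats as long as fewer objects than one page size are created between two consecutive page fetches; a larger shift would need a longer memory.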

    import json
    import logging
    import tempfile
    from pathlib import Path

    import click
    from gitlab.mixins import CRUDMixin

    logger = logging.getLogger(__name__)


    def get_stuff(manager: CRUDMixin, **kwargs):
        things = []
        things_by_id = {}  # id -> (position in the deduplicated list, object)
        obj_type = manager.__class__.__name__.removesuffix("Manager").lower()
        for i, thing in enumerate(manager.list(iterator=True, **kwargs)):
            if thing.id in things_by_id:
                existing_idx, existing_thing = things_by_id.get(thing.id)
                if existing_thing == thing:
                    # Exact repeat of an object already collected: log and skip it.
                    logger.warning(
                        "Duplicate %s id %s at index %d and %d",
                        obj_type, thing.id, existing_idx, i,
                    )
                    continue
                else:
                    # Same id but different content: dump both versions for comparison.
                    p1 = Path(tempfile.gettempdir()) / f"{obj_type}_{existing_thing.id}_idx_{existing_idx}"
                    p2 = Path(tempfile.gettempdir()) / f"{obj_type}_{thing.id}_idx_{i}"
                    with p1.open("wt") as f:
                        print(json.dumps(existing_thing.attributes, indent=2, sort_keys=True), file=f)
                    with p2.open("wt") as f:
                        print(json.dumps(thing.attributes, indent=2, sort_keys=True), file=f)
                    raise AssertionError(
                        f"Duplicate {obj_type} id {thing.id} at index {existing_idx} and {i}; look at {str(p1)} and {str(p2)}"
                    )
            things_by_id[thing.id] = (len(things), thing)
            things.append(thing)
            # TODO: this would be better done w/ rich or something
            if len(things) % 100 == 0:
                if len(things) % 200 == 0:
                    click.secho("...", nl=False, fg="yellow", bold=True)
                else:
                    click.secho("...", nl=False, fg="green", bold=True)
            if len(things) % 1000 == 0:
                click.secho(f"\n{len(things)} {obj_type}s", fg="blue")
        click.secho(f"\n{len(things)} {obj_type}s total", fg="blue", bold=True)
        return things

Specifications

  • python-gitlab version: python-gitlab==4.10.0
  • API version you are using (v3/v4): v4
  • GitLab server version (or gitlab.com): 16.11.6-ee
