Following the discussion in #56301, we (a group of folks working on the Scientific Python grant) have been working to set up infrastructure and translate the contents of the pandas website. As of this moment, the pandas website is 100% translated into Spanish and Brazilian Portuguese, with other languages available for translation (depending on volunteer translators).

To build, the command remains the same:

python pandas_web.py pandas/content --target-path build

If you want to check out other related work, please take a look at scipy/scipy.org#617.

You can read more about how the translation process works at https://scientific-python-translations.github.io/docs/

What does this PR do?

Downloads and extracts the latest available translations (over 90% completion) from https://github.com/Scientific-Python-Translations/pandas-translations. The setting can be changed here.
Adds a language switcher (thanks @melissawm ❤️ 🚀).
Adds a new section to the config to store additional translation information.
Handles site generation for each language.
Leaves everything in the same script.

Supersedes #61220

Demo

cc @mroeschke @datapythonista

goanpeca force-pushed the translations branch from 59c82e1 to 987b544

April 30, 2025 03:25

goanpeca mentioned this pull request

Apr 30, 2025

ENH: Create infrastructure for translations #61220

Closed

5 tasks

Member

datapythonista commented Apr 30, 2025

/preview

datapythonista added the Docs label

Apr 30, 2025

Contributor

github-actions bot commented Apr 30, 2025

Website preview of this PR available at: https://pandas.pydata.org/preview/pandas-dev/pandas/61380/

Member

datapythonista commented Apr 30, 2025

Thanks @goanpeca for this. Do you mind adding some more context here? I can't see much information in https://github.com/Scientific-Python-Translations/pandas-translations, like what languages are available, or how to fix a bad translation, which would be useful to know.

Also, in the docs generated from this PR I can't see any language dropdown or anything different from our current docs. What are we expecting?

Contributor

melissawm commented May 1, 2025

Hi @datapythonista - this is a follow up to #61220, a proof-of-concept CI job to build the website with translations that don't live in this repo. This PR and #61220 are meant to work together and I'm happy to incorporate one into the other once we agree on the general direction and workflow for this.

Let us know if we can answer any other questions. Unfortunately I'm not sure how to get the preview for the other PR, I relied on building locally to test that things were working.

Member

datapythonista commented May 1, 2025

Sorry, I missed #61220 and the issue discussion.

I don't fully understand what you're doing here, but below I describe how to add translations without adding too much complexity to this repo, which I don't think any core dev would be on board with.

1. You decide on how to generate translations and manage it independently from this repo, and end up with a structure like this with the translated documents:

+ es/
  - index.md
  + about/
    - team.md
    - ...
  - ...
+ pt/
  - index.md
  + about/
    - team.md
    - ...
  - ...

2. In our CI, before calling pandas_web.py you download this directory structure to the web/ directory. No other changes needed, this will create all translated pages.
3. We add a dropdown with the languages to the website (you can add the language list to web/pandas/config.yml)
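The language dropdown described above could be driven by a small helper like the following. This is only a sketch: the function name `switcher_links`, the language codes, and the assumption that English pages live at the site root while translations live under a `<lang>/` prefix are illustrative, not taken from pandas_web.py or config.yml.

```python
def switcher_links(page, current, languages, default="en"):
    """Map each language code to the URL of the same page in that language.

    Hypothetical helper: assumes the default language is served from the
    site root and every other language from a "<lang>/" prefix.
    """
    prefix = f"{current}/"
    # Remove the current language prefix to get a language-neutral path.
    if current != default and page.startswith(prefix):
        neutral = page[len(prefix):]
    else:
        neutral = page
    return {
        lang: f"/{neutral}" if lang == default else f"/{lang}/{neutral}"
        for lang in languages
    }


links = switcher_links("es/about/team.html", "es", ["en", "es", "pt"])
print(links)
```

A template could then loop over the resulting mapping to render one dropdown entry per language.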

I think this makes everyone's life easy, and we get the expected result.

Contributor

melissawm commented May 2, 2025

Thanks @datapythonista!

Can you clarify what is missing from #61220 to match your description? That is pretty much what is done in that PR. Maybe this is confusing because we chose to do it in two parts exactly because we wanted to decouple the reorganization of the repo + switcher (in #61220) from the actual translations (this PR).

Happy to follow up with any feedback in the other PR as well. Cheers!

Member

datapythonista commented May 2, 2025

In #61220 you are moving all the current website pages; that should be undone. You are adding the translated pages to this repo, which we don't want. You are making changes to pandas_web.py, which is not needed based on what I described above.

The only changes in a PR to this pandas repo should be adding a CI step as per step 2, and editing the website template with the language dropdown as per step 3.

Contributor

melissawm commented May 2, 2025

I see! I will rework what I have there to match your proposal. Thanks!

goanpeca force-pushed the translations branch 3 times, most recently from cd72e5c to 2b85ad4

May 8, 2025 01:47

Add script to import translations

d7f0545

goanpeca force-pushed the translations branch from 2b85ad4 to d7f0545

May 8, 2025 01:50

datapythonista reviewed

May 8, 2025

Member

datapythonista left a comment

Great improvement to the PR, this makes a lot of sense to me.

I'd personally still simplify things more here in two ways. And feel free to disagree, as it's something opinionated.

First, if I understand correctly, you download the translations of the web, and the percentage of the translated content, and then you check for each language if it's translated enough to be published. Personally, I think you should take care of this logic in your repo when generating the tar, not here. First to simplify the code here, and second to avoid downloading translations that are not going to be used.

Just as a suggestion, I wouldn't use this approach, even in your repo. Imagine you publish translations that are at least 90%, and we have Spanish at 100%. Then I add new content that is 11% of the website. And automatically the Spanish translations that are already indexed by search engines, in user bookmarks, in links in blog posts... are deleted from our website. Not great in my opinion, much better to simply get the new content in English and hope it will eventually be translated.
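The arithmetic behind that scenario can be checked with a few lines of Python. This is only a worked example: the helper names are hypothetical, and it assumes "11% of the website" means the new, untranslated content makes up 11% of the site after it is added.

```python
def completion_after_addition(translated_fraction, new_fraction):
    """Translated share of the site after untranslated content is added.

    new_fraction is the share of the updated site that is new content.
    """
    return translated_fraction * (1.0 - new_fraction)


def is_published(completion, threshold=0.90):
    """Apply a hard cut-off: publish only at or above the threshold."""
    return completion >= threshold


spanish = completion_after_addition(1.0, 0.11)
print(round(spanish, 2))      # 0.89
print(is_published(spanish))  # False
```

Under a hard 90% cut-off, a fully translated language drops below the threshold and disappears as soon as that much new English content lands.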

Another thing I would do is to extract the tar file as it is downloaded. So the tar file is passed to gzip/tarfile in memory, with the io module, not as a path on disk. With this you can get all the code here in a single short function. Or we could even create a GitHub action in your repo with this, as it's generic, and just use it here. So only the CI step would live in this PR.
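A minimal sketch of the in-memory approach suggested here, using only the standard library. The function names and the split into two functions are assumptions made for illustration; the URL and target directory would come from wherever the project decides to configure them.

```python
import io
import tarfile
import urllib.request
from pathlib import Path


def extract_tar_bytes(data, target_dir):
    """Extract a gzip-compressed tar archive held in memory into target_dir."""
    Path(target_dir).mkdir(parents=True, exist_ok=True)
    # Use the safer "data" extraction filter when this Python version has it.
    kwargs = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
    with tarfile.open(fileobj=io.BytesIO(data), mode="r:gz") as tar:
        names = tar.getnames()
        tar.extractall(target_dir, **kwargs)
    return names


def download_and_extract(url, target_dir):
    """Download a .tar.gz and unpack it without writing the archive to disk."""
    with urllib.request.urlopen(url) as response:
        return extract_tar_bytes(response.read(), target_dir)
```

Keeping the extraction separate from the download makes the logic easy to exercise with an archive built in memory, without any network access.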

Finally, we already have a configuration file for the website. We could save the url of the translations tar there. Also good in the script, just a question of preference. Or if you go for the github action approach, it could simply be a parameter in the CI step. Then you would need another one for the target dir in this repo.

In any case, the approach here is also very reasonable, all above are suggestions that personally I think would make things simpler.

goanpeca force-pushed the translations branch 3 times, most recently from 4a6532b to 04f9259

May 12, 2025 02:38

goanpeca marked this pull request as ready for review

May 12, 2025 02:47

Author

goanpeca commented May 12, 2025 (edited)

Great improvement to the PR, this makes a lot of sense to me.

Thanks for the review, @datapythonista.

First, if I understand correctly, you download the translations of the web, and the percentage of the translated content, and then you check for each language if it's translated enough to be published. Personally, I think you should take care of this logic in your repo when generating the tar, not here. First to simplify the code here, and second to avoid downloading translations that are not going to be used.

Fixed!

Just as a suggestion, I wouldn't use this approach, even in your repo. Imagine you publish translations that are at least 90%, and we have Spanish at 100%. Then I add new content that is 11% of the website. And automatically the Spanish translations that are already indexed by search engines, in user bookmarks, in links in blog posts... are deleted from our website. Not great in my opinion, much better to simply get the new content in English and hope it will eventually be translated.

This is also fixed!

Another thing I would do is to extract the tar file as it is downloaded. So the tar file is passed to gzip/tarfile in memory, with the io module, not as a path on disk. With this you can get all the code here in a single short function. Or we could even create a GitHub action in your repo with this, as it's generic, and just use it here. So only the CI step would live in this PR.

Did not follow this one as I think things are now simpler and in a single script.

Finally, we already have a configuration file for the website. We could save the url of the translations tar there. Also good in the script, just a question of preference. Or if you go for the github action approach, it could simply be a parameter in the CI step. Then you would need another one for the target dir in this repo.

Added the information to the config file as requested and updated the scripts to handle site generation for languages. Moved all logic to the existing script.

In any case, the approach here is also very reasonable, all above are suggestions that personally I think would make things simpler.

Please let me know what you think about the current changes.

This PR now supersedes #61220.

Thanks @melissawm!

goanpeca requested a review from datapythonista

May 12, 2025 02:50

goanpeca changed the title from "Update CI to include Translations from Scientific Python Repo" to "Implement translations infrastructure"

May 12, 2025

goanpeca force-pushed the translations branch 3 times, most recently from 09aaca5 to 7e68731

May 12, 2025 03:04

Update scripts to handle tranlsations

17063a7

goanpeca force-pushed the translations branch from 7e68731 to 17063a7

May 12, 2025 03:04

Merge branch 'main' of github.com:pandas-dev/pandas into translations

8eec135

datapythonista reviewed

May 20, 2025

Member

datapythonista left a comment

Thanks @goanpeca for the updates. This is getting much simpler and cleaner. I added a few more comments that I think should simplify this PR even more, but this starts to be much more reasonable in my opinion.

web/pandas/_templates/layout.html (outdated, resolved)

web/pandas/config.yml (outdated, resolved)

web/pandas_web.py (outdated, resolved)

Author

goanpeca commented May 21, 2025 (edited)

Hi @datapythonista, I implemented most of the additional suggestions. Please see comments.

goanpeca requested a review from datapythonista

May 21, 2025 09:01

goanpeca force-pushed the translations branch 2 times, most recently from 4a99e10 to 0964c17

May 21, 2025 09:33

Create preprocessor and fix review comments

ac0b8f0

goanpeca force-pushed the translations branch from 0964c17 to ac0b8f0

May 21, 2025 09:35

datapythonista reviewed

May 21, 2025

Member

datapythonista left a comment

Great job. This looks very clear now. I'd split the preprocessors in a slightly different way as suggested in the comments, but I think the way the code is now is very simple and easy to understand and maintain. Thanks a lot for all the updates here.

web/pandas_web.py (resolved)

web/pandas_web.py (outdated, resolved)

.gitignore (outdated, resolved)

Author

goanpeca commented May 21, 2025

I'd split the preprocessors in a slightly different way as suggested in the comments, but I think the way the code is now is very simple and easy to understand and maintain. Thanks a lot for all the updates here.

Made some new changes based on your suggestions @datapythonista.

goanpeca force-pushed the translations branch 2 times, most recently from 9bc7032 to 5245445

May 21, 2025 13:02

goanpeca requested a review from datapythonista

May 21, 2025 13:54

Member

datapythonista commented May 21, 2025

/preview

Contributor

github-actions bot commented May 21, 2025

Website preview of this PR available at: https://pandas.pydata.org/preview/pandas-dev/pandas/61380/

Split preprocessor logic and more code review changes

99e9635

goanpeca force-pushed the translations branch from 5245445 to 99e9635

May 21, 2025 14:19

Member

datapythonista commented May 21, 2025

The sponsor logos in the home page don't render correctly. I guess the problem is not in this PR, but in the translation of the html file, no?

Author

goanpeca commented May 21, 2025 (edited)

The sponsor logos in the home page don't render correctly. I guess the problem is not in this PR, but in the translation of the html file, no?

Would it be ok to use absolute URLs? The English pages live in the root, but the translated pages live in es/something. That would not work on the preview, though.

I could use /static/img... instead of ../static as is currently used. Frameworks rely on filters like relative_url / absolute_url to handle these cases and prepend the appropriate base_folder or base_url to the link.

Either that, or copying the assets folder into each language.
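A `relative_url`-style helper of the kind mentioned above can be sketched with the standard library. This is an illustration only, not the filter of any particular framework; the function name, page paths, and asset path are hypothetical.

```python
import posixpath


def relative_url(page, asset):
    """Return the link to a site-root asset, relative to the page's directory.

    Both arguments are paths relative to the site root.
    """
    start = posixpath.dirname(page) or "."
    return posixpath.relpath(asset, start=start)


print(relative_url("index.html", "static/img/logo.svg"))
# static/img/logo.svg
print(relative_url("es/index.html", "static/img/logo.svg"))
# ../static/img/logo.svg
print(relative_url("es/about/team.html", "static/img/logo.svg"))
# ../../static/img/logo.svg
```

Because the result depends only on how deep the page sits, the same template would work for root pages, translated pages, and a preview served from a subfolder.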

Member

datapythonista commented May 21, 2025

The images of the books should be implemented in the same exact way, and those seem to be working fine in the translated pages. Doesn't seem like we need to change the links, feels more like a problem in the translated content for those images, no?

Author

goanpeca commented May 21, 2025

The images of the books should be implemented in the same exact way, and those seem to be working fine in the translated pages. Doesn't seem like we need to change the links, feels more like a problem in the translated content for those images, no?

I will look into the content.

Author

goanpeca commented May 21, 2025 (edited)

I guess the problem is not in this PR, but in the translation of the html file, no?

Correct. This is an issue in the translations. (Working on those fixes)

Besides that, is there anything else you think needs revision?

Merge branch 'main' into translations

eadb5c9

Member

datapythonista commented Jun 1, 2025

/preview

Contributor

github-actions bot commented Jun 1, 2025

Website preview of this PR available at: https://pandas.pydata.org/preview/pandas-dev/pandas/61380/

Member

datapythonista commented Jun 1, 2025

Thanks for the updates @goanpeca. The PR seems reasonable now.

Seeing how the translations are implemented, like having a copy of the whole home page for each language, I'm a bit concerned about how these translations are going to be maintained.

It would be good first to know more details of the grant. Like, for how long there are funds to keep the translations up to date after we merge this.

With approaches like how Django translates content, if a translation is not maintained, updated texts will default back to the base language. With this approach, if tomorrow we change the styles of the website, all the translated pages will immediately appear broken. This will make it very hard for us to make any change to the website. If this was a community effort it would be a problem. But if this is a time-limited grant, which I guess is the case, this is a much bigger problem.

It'd be good to get your feedback, but if we can't use an approach with .po files, maybe it's better to keep the translated pages in an unofficial domain that we can link from our website. Otherwise it feels like we'll be merging this, and in a couple of years when the translations are outdated and broken we'll have to revert this.

Author

goanpeca commented Jun 1, 2025 (edited)

Hi @datapythonista!

Seeing how the translations are implemented, like having a copy of the whole home page for each language, I'm a bit concerned about how these translations are going to be maintained.

Volunteers will maintain translations as they keep coming, and new volunteers will be added as more translation sprints are organized within the Scientific Python organization.

There will also be periodic maintenance to ensure all the infrastructure keeps running smoothly.

It would be good first to know more details of the grant. Like, for how long there are funds to keep the translations up to date after we merge this.

@trallard can provide more details on this.

With approaches like how Django translates content, if a translation is not maintained, updated texts will default back to the base language. With this approach, if tomorrow we change the styles of the website, all the translated pages will appear immediately broken. This will make it very hard for us to make any change to the website. If this was a community effort it would be a problem. But if this is a time limited grant, which I guess it's the case, this is a much bigger problem.

The same approach works with the Crowdin infrastructure: any untranslated content will use the original language.

This has been running smoothly for numpy.org for many months (a year now?)!

It'd be good to get your feedback, but if we can't use an approach with .po files, maybe it's better to keep the translated pages in an unofficial domain that we can link from our website. Otherwise feels like we'll be merging this, and in a couple of years when the translations are outdated and broken we'll have to revert this.

md and po files work in more or less the same way, as the segmentation in Crowdin stores sentences/paragraphs, so even if the files are moved, a specific phrase that was already translated will be automatically available.
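The segment-level fallback described here can be pictured with a toy translation memory. This is a conceptual sketch only, not how Crowdin is implemented; the function name and the sample strings are made up for illustration.

```python
def translate_page(paragraphs, memory):
    """Translate a page segment by segment.

    Any segment without an entry in the translation memory is kept in the
    original language instead of failing or dropping the page.
    """
    return [memory.get(segment, segment) for segment in paragraphs]


# Hypothetical Spanish translation memory keyed by the English source segment.
memory_es = {
    "Getting started": "Primeros pasos",
    "Install pandas": "Instalar pandas",
}

page = ["Getting started", "Install pandas", "A brand new paragraph"]
print(translate_page(page, memory_es))
# ['Primeros pasos', 'Instalar pandas', 'A brand new paragraph']
```

Because the memory is keyed by the source text rather than by file, moving a paragraph to another page would still find its existing translation.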

As new files are added or changed, the infrastructure and bot will pick up the changes on a weekly basis and inform translators that new strings are available.

In any case, any big change will require some extra attention in case the whole site infrastructure changes, but even then, already translated content will be available to be reused.

I hope this answers most of your questions.

Member

datapythonista commented Jun 1, 2025

Given what you say, I don't fully understand how the Spanish and Portuguese home pages are broken (the sponsor logos), but let's give it a try.

Can you fix the broken home pages, please? Then we can introduce a change to the website in this same PR to see what really happens when content changes.

Contributor

github-actions bot commented Jul 2, 2025

This pull request is stale because it has been open for thirty days with no activity. Please update and respond to this comment if you're still interested in working on this.

github-actions bot added the Stale label

Jul 2, 2025

Contributor

trallard commented Jul 2, 2025

I thought I had replied here @datapythonista, but it seems I did not. Apologies, as things have been way too chaotic on my end.
Anyway, our grant has ended, but we can commit to some level of maintenance for the foreseeable future.
If we are still interested in moving this forward, I can coordinate internally and have someone look at your questions or outstanding actions to get this over the finish line.

Labels

Docs Stale

4 participants

ENH: Implement translations infrastructure #61380