This is a web scraper that produces publicly accessible, static JSON feeds directly and automatically from the public COS directory website.
jlumbroso/princeton-scraper-cos-people
This is a web scraper that produces machine-processable JSON feeds of Princeton University's Department of Computer Science directory, sourced from the official, publicly available directory.
You can see the main JSON feed by clicking here.
There are also sub-feeds by category of persons (faculty, grad students, staff, etc.). These feeds are all updated every week on Saturday. Read on to learn more.
You can access the main (regularly updated) JSON feed directly from this URL:
https://jlumbroso.github.io/princeton-scraper-cos-people/feeds/
There are sub-feeds available for the different categories of people:
admin-staff
affiliated-faculty
emeritus-faculty
faculty
grad-students
research-instructors
researchers
technical-staff
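The same categories can also be selected from the main feed. The sketch below assumes that each record's `type` field holds one of the category names listed above (the sample record further down shows `"type": "faculty"`); the constant `CATEGORIES` and the helper `select_category` are hypothetical names introduced for illustration, not part of this repository.

```python
# Category names copied from the list of sub-feeds above.
CATEGORIES = (
    "admin-staff",
    "affiliated-faculty",
    "emeritus-faculty",
    "faculty",
    "grad-students",
    "research-instructors",
    "researchers",
    "technical-staff",
)


def select_category(people, category):
    """Return the records whose "type" field equals the given category.

    Assumption: the "type" values in the feed match the sub-feed names.
    """
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {category!r}")
    return [person for person in people if person.get("type") == category]


# Usage with made-up records (not real directory data).
people = [
    {"netid": "a", "type": "faculty"},
    {"netid": "b", "type": "grad-students"},
]
faculty = select_category(people, "faculty")
```

Rejecting unknown category names early makes a typo such as `"students"` fail loudly instead of silently returning an empty list.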
For example, using Python, you can use the requests package to get the JSON feed:
    import requests

    r = requests.get("https://jlumbroso.github.io/princeton-scraper-cos-people/feeds/")
    if r.ok:
        data = r.json()["data"]
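If the requests package is not installed, the feed can probably be read with the standard library as well. This is a minimal sketch that relies only on what the snippet above shows, namely that the response is a JSON object with a `data` key; treating the value of `data` as a list of per-person dictionaries is an assumption, and `extract_people` and `fetch_people` are hypothetical helper names.

```python
import json
import urllib.request

FEED_URL = "https://jlumbroso.github.io/princeton-scraper-cos-people/feeds/"


def extract_people(payload):
    """Return the person records from a decoded feed payload.

    Assumption: payload["data"] is a list of dictionaries, one per person.
    Anything that is not a dictionary is skipped.
    """
    people = payload.get("data", [])
    return [person for person in people if isinstance(person, dict)]


def fetch_people(url=FEED_URL, timeout=10):
    """Download the feed and return its person records."""
    # urlopen raises an exception on HTTP errors or timeouts;
    # callers can catch urllib.error.URLError if they need to recover.
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return extract_people(json.load(response))
```

Keeping the parsing step in `extract_people` separate from the download makes it possible to check the handling of the payload without network access.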
This feed provides most people in the directory as a JSON dictionary with the following fields:
    {
      "email": "lumbroso@cs.princeton.edu",
      "office": "035 Corwin Hall",
      "degree": "Ph.D., Universit\u00e9 Pierre et Marie Curie, 2012",
      "title": "Lecturer",
      "name": "J\u00e9r\u00e9mie Lumbroso",
      "research-interests": "Probabilistic algorithms, data streaming, data structures, analysis of algorithms, analytic combinatorics.",
      "profile-url": "https://www.cs.princeton.edu/people/profile/lumbroso",
      "image-url": "https://www.cs.princeton.edu/sites/all/modules/custom/cs_people/generate_thumbnail.php?id=2488&thumb=",
      "image": "<base 64 encoded JPEG of the image>",
      "netid": "lumbroso",
      "first": "J\u00e9r\u00e9mie",
      "last": "Lumbroso",
      "type": "faculty"
    }
Other categories of people may have other fields, such as leave, advisers, website, etc.
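Because some fields only appear for certain categories, code that consumes the feed should not assume every key is present. The sketch below reads optional fields defensively and decodes the `image` field; it assumes that `image` holds plain base64 text without a `data:` URI prefix, and `summarize` and `decode_image` are hypothetical helper names.

```python
import base64


def summarize(person):
    """Collect a few fields, using None for any that are missing."""
    return {
        "name": person.get("name"),
        "email": person.get("email"),
        # "website" and "advisers" are optional and depend on the category.
        "website": person.get("website"),
        "advisers": person.get("advisers"),
    }


def decode_image(person):
    """Return the decoded image bytes, or None if there is no image.

    Assumption: the "image" field is plain base64-encoded JPEG data.
    """
    encoded = person.get("image")
    if not encoded:
        return None
    return base64.b64decode(encoded)


def looks_like_jpeg(raw):
    """Check for the two-byte marker that starts every JPEG file."""
    return raw is not None and raw[:2] == b"\xff\xd8"
```

The decoded bytes can then be written to a file opened in binary mode, for example with `open("photo.jpg", "wb")`.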
Previously, I had implemented JSON feeds to programmatically obtain the faculty of Princeton's School of Engineering and Applied Sciences, to build the web portal for the BSE 2024 First Year Advising program.
This time, I needed to access the directory information of the Department of Computer Science graduate students. Unfortunately, as with the SEAS faculty, there is no programmatically available data source that also contains important information such as photos; the only such source is the Department of Computer Science official website.
Despite having had conversations with @sckarlin about not scraping the contents of the directory, it appeared that this was the easiest way to obtain up-to-date grad student information.
The first application for this feed will be to configure and provision the Slack profiles of the CS grad student Slack.
This repository is licensed under The Unlicense. This means I have no liability, but you can do absolutely anything you want with it.