


William W. Cohen

Visiting Professor

[Bio | Announcements and FAQs | Teaching | Publications (recent, all) | Software | Datasets | Talks | Students & Colleagues | Other Stuff]

Prospective visitors/students: see announcements

Biography

William Cohen is a Visiting Professor at Carnegie Mellon University in the Machine Learning Department. He also holds a position as a Principal Scientist at Google, where he worked full-time between May 2018 and March 2024. He received his bachelor's degree in Computer Science from Duke University in 1984, and a PhD in Computer Science from Rutgers University in 1990. From 1990 to 2000 Dr. Cohen worked at AT&T Bell Labs and later AT&T Labs-Research, and from April 2000 to May 2002 Dr. Cohen worked at Whizbang Labs, a company specializing in extracting information from the web. From 2002 to 2018, Dr. Cohen worked at Carnegie Mellon University in the Machine Learning Department, with a joint appointment in the Language Technology Institute.

Dr. Cohen is a past president of the International Machine Learning Society. In the past he has also served as an action editor for the AI and Machine Learning series of books published by Morgan Claypool, for the journal Machine Learning, the journal Artificial Intelligence, the Journal of Machine Learning Research, and the Journal of Artificial Intelligence Research. He was General Chair for the 2008 International Machine Learning Conference, held July 6-9 at the University of Helsinki, in Finland; Program Co-Chair of the 2006 International Machine Learning Conference; and Co-Chair of the 1994 International Machine Learning Conference. Dr. Cohen was also the co-Chair for the 3rd Int'l AAAI Conference on Weblogs and Social Media, which was held May 17-20, 2009 in San Jose, and was the co-Program Chair for the 4th Int'l AAAI Conference on Weblogs and Social Media. He is a AAAI Fellow, and was a winner of the 2008 SIGMOD "Test of Time" Award for the most influential SIGMOD paper of 1998, the 2014 SIGIR "Test of Time" Award for the most influential SIGIR paper of 2002-2004, and the 2023 Semantic Web Science Association's Ten-Year Award for the most influential paper of the ISWC-2013 conference.

Dr. Cohen's research interests include question answering, machine learning for NLP tasks, and neuro-symbolic reasoning, and he has a long-standing interest in statistical relational learning. He holds seven patents related to learning, discovery, information retrieval, and data integration, and is the author of more than 300 publications.

Announcements and FAQs

Teaching

For now my old course notes and lectures are available through CMU.

Software and demos

  • Enron email dataset (400Mb, once you get there) contains 800,000+ emails from 150+ users, organized into 4700+ folders.
  • classify.tar.gz (0.4Mb) contains nine problems in which the goal is to classify short entity names. This data was used in Joins that Generalize: Text Classification Using WHIRL (KDD-98).
  • match.tar.gz (0.7Mb) contains a suite of labeled entity-name matching and clustering problems (i.e., problems for which the correct matches/clusters are provided), in a single consistent format. In most cases WHIRL's performance is given as a benchmark. (These are also distributed in the RIDDLE Repository. Extraction-oriented versions of some of this data, i.e., represented as a problem of extracting data from a website rather than matching two datasets, are available on the RISE Repository.)
  • whirl-bench.tgz (1.1Mb) contains some more WHIRL-format entity name matching problems.

Talks and presentations

Publications