|
| 1 | +{ |
| 2 | +"cells": [ |
| 3 | + { |
| 4 | +"cell_type":"markdown", |
| 5 | +"metadata": {}, |
| 6 | +"source": [ |
| 7 | +"# Parse an RSS feed\n", |
| 8 | +"In this Python snippet we use the feedparser package to parse an RSS feed from 'Medium'.\n", |
| 9 | +"- https://medium.com/feed/tag/machine-learning" |
| 10 | + ] |
| 11 | + }, |
| 12 | + { |
| 13 | +"cell_type":"markdown", |
| 14 | +"metadata": {}, |
| 15 | +"source": [ |
| 16 | +"Let's get the RSS feed and parse the content." |
| 17 | + ] |
| 18 | + }, |
| 19 | + { |
| 20 | +"cell_type":"code", |
| 21 | +"execution_count":107, |
| 22 | +"metadata": { |
| 23 | +"collapsed":false |
| 24 | + }, |
| 25 | +"outputs": [], |
| 26 | +"source": [ |
| 27 | +"import feedparser\n", |
| 28 | +"\n", |
| 29 | +"url = 'https://medium.com/feed/tag/machine-learning'\n", |
| 30 | +"\n", |
| 31 | +"resp = feedparser.parse(url)" |
| 32 | + ] |
| 33 | + }, |
| 34 | + { |
| 35 | +"cell_type":"markdown", |
| 36 | +"metadata": {}, |
| 37 | +"source": [ |
| 38 | +"We need a function to extract _\"urls\"_ from the text. One of the URLs will link to the original article." |
| 39 | + ] |
| 40 | + }, |
| 41 | + { |
| 42 | +"cell_type":"code", |
| 43 | +"execution_count":108, |
| 44 | +"metadata": { |
| 45 | +"collapsed":false |
| 46 | + }, |
| 47 | +"outputs": [], |
| 48 | +"source": [ |
| 49 | +"import re\n", |
| 50 | +"\n", |
| 51 | +"def extract_url(text):\n", |
| 52 | +"    urls = re.findall(r'href=[\\'\"]?([^\\'\" >]+)', text)\n", |
| 53 | +"    return urls" |
| 54 | + ] |
| 55 | + }, |
| 56 | + { |
| 57 | +"cell_type":"markdown", |
| 58 | +"metadata": {}, |
| 59 | +"source": [ |
| 60 | +"Now we iterate over all entries in the feed and print each title, followed by the first URL found in its summary." |
| 61 | + ] |
| 62 | + }, |
| 63 | + { |
| 64 | +"cell_type":"code", |
| 65 | +"execution_count":109, |
| 66 | +"metadata": { |
| 67 | +"collapsed":false |
| 68 | + }, |
| 69 | +"outputs": [ |
| 70 | + { |
| 71 | +"name":"stdout", |
| 72 | +"output_type":"stream", |
| 73 | +"text": [ |
| 74 | +"Computer Vision: Why is This So Difficult?\n", |
| 75 | +"--> https://medium.com/@anishchopra/computer-vision-why-is-this-so-difficult-2b4f22e94efe?source=rss------machine_learning-5\n", |
| 76 | +"\n", |
| 77 | +"‘messaging first’, and the era of just-in-time user experiences\n", |
| 78 | +"--> https://medium.com/@jdevados/messaging-first-and-the-era-of-just-in-time-user-experiences-256f751e35e2?source=rss------machine_learning-5\n", |
| 79 | +"\n", |
| 80 | +"Qu’est ce que le Machine Learning ?\n", |
| 81 | +"--> https://medium.com/@redouanechafi/data-science-0-0-quest-ce-que-le-machine-learning-fde2b3c5f19f?source=rss------machine_learning-5\n", |
| 82 | +"\n", |
| 83 | +"$0.53\n", |
| 84 | +"--> https://medium.com/@vw4motion/0-53-32f819753a47?source=rss------machine_learning-5\n", |
| 85 | +"\n", |
| 86 | +"The symbolic approach to computerization in healthcare PART 1\n", |
| 87 | +"--> https://medium.com/@CheckDoctor/the-symbolic-approach-to-computerization-in-healthcare-part-1-45f9ae32c517?source=rss------machine_learning-5\n", |
| 88 | +"\n", |
| 89 | +"Cognitive Computing and the Global Building Industry\n", |
| 90 | +"--> https://medium.com/cognitivebusiness/cognitive-computing-and-the-global-building-industry-1172e375738d?source=rss------machine_learning-5\n", |
| 91 | +"\n", |
| 92 | +"News — At The Edge — 12/17\n", |
| 93 | +"--> https://medium.com/a-passion-to-evolve/news-at-the-edge-12-17-7d6d780e948e?source=rss------machine_learning-5\n", |
| 94 | +"\n", |
| 95 | +"IBM Watson ……. Modern day Genghis khan\n", |
| 96 | +"--> https://medium.com/@Cayno_Sadler/ibm-watson-modern-day-genghis-khan-add9b1a58c0?source=rss------machine_learning-5\n", |
| 97 | +"\n", |
| 98 | +"Machine Learning progress update\n", |
| 99 | +"--> https://medium.com/@laimis/one-month-into-machine-learning-69c041cf2b5a?source=rss------machine_learning-5\n", |
| 100 | +"\n", |
| 101 | +"Don’t replace your old NVR! Enhance it with oZone!\n", |
| 102 | +"--> https://medium.com/ozone-security/dont-replace-your-old-nvr-enhance-it-with-ozone-14ab2ebd007d?source=rss------machine_learning-5\n", |
| 103 | +"\n" |
| 104 | + ] |
| 105 | + } |
| 106 | + ], |
| 107 | +"source": [ |
| 108 | +"for r in resp['entries']:\n", |
| 109 | +"    print(r['title'])\n", |
| 110 | +"    urls = extract_url(r['summary'])\n", |
| 111 | +"    if urls:\n", |
| 112 | +"        print('-->', urls[0], '\\n')\n", |
| 113 | +"" |
| 114 | + ] |
| 115 | + }, |
| 116 | + { |
| 117 | +"cell_type":"code", |
| 118 | +"execution_count":null, |
| 119 | +"metadata": { |
| 120 | +"collapsed":true |
| 121 | + }, |
| 122 | +"outputs": [], |
| 123 | +"source": [] |
| 124 | + } |
| 125 | + ], |
| 126 | +"metadata": { |
| 127 | +"kernelspec": { |
| 128 | +"display_name":"Python 3", |
| 129 | +"language":"python", |
| 130 | +"name":"python3" |
| 131 | + }, |
| 132 | +"language_info": { |
| 133 | +"codemirror_mode": { |
| 134 | +"name":"ipython", |
| 135 | +"version":3 |
| 136 | + }, |
| 137 | +"file_extension":".py", |
| 138 | +"mimetype":"text/x-python", |
| 139 | +"name":"python", |
| 140 | +"nbconvert_exporter":"python", |
| 141 | +"pygments_lexer":"ipython3", |
| 142 | +"version":"3.5.2" |
| 143 | + } |
| 144 | + }, |
| 145 | +"nbformat":4, |
| 146 | +"nbformat_minor":2 |
| 147 | +} |
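
To see what the notebook's `extract_url` helper returns without fetching the live feed, here is a minimal offline sketch. The `sample_summary` string and its `example.com` URLs are made up for illustration and are not taken from the Medium feed; the regular expression is the same one used in the notebook.

```python
import re


def extract_url(text):
    # Same pattern as the notebook: capture the value of every href
    # attribute, whether it is double-quoted, single-quoted, or unquoted.
    return re.findall(r'href=[\'"]?([^\'" >]+)', text)


# Hypothetical summary HTML (an assumption, not real feed content).
sample_summary = (
    '<p><a href="https://example.com/article?source=rss">Continue reading</a> '
    "or visit <a href='https://example.com/author'>the author</a>.</p>"
)

urls = extract_url(sample_summary)
print(urls)
```

The helper returns the URLs in the order they appear, which is why the loop in the notebook prints only `urls[0]`. This relies on the first link in each summary being the article link, which holds for the output shown above but is an assumption about how the feed formats its summaries.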