release 2016.06.30

[meta] Add support for pladform embeds
[pladform] Improve embed detection
2026-01-24 00:00:10 -05:00 · 2016-06-30 23:56:55 +07:00 · 2016-06-30 23:20:44 +07:00 · 2016-06-30 23:19:29 +07:00 · 2016-06-30 23:06:13 +07:00 · 2016-06-30 23:04:18 +07:00
21 changed files with 550 additions and 108 deletions
--- a/.github/ISSUE_TEMPLATE.md
+++ b/.github/ISSUE_TEMPLATE.md
@@ -6,8 +6,8 @@

 ---

-### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2016.06.26*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2016.06.26**
+### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2016.06.30*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
+- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2016.06.30**

 ### Before submitting an *issue* make sure you have:
 - [ ] At least skimmed through [README](https://github.com/rg3/youtube-dl/blob/master/README.md) and **most notably** [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
@@ -35,7 +35,7 @@ $ youtube-dl -v <your command line>
 [debug] User config: []
 [debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
 [debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
-[debug] youtube-dl version 2016.06.26
+[debug] youtube-dl version 2016.06.30
 [debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
 [debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
 [debug] Proxy map: {}
--- a/docs/supportedsites.md
+++ b/docs/supportedsites.md
@@ -45,7 +45,6 @@
 - **archive.org**: archive.org videos
 - **ARD**
 - **ARD:mediathek**
- - **ARD:mediathek**: Saarländischer Rundfunk
 - **arte.tv**
 - **arte.tv:+7**
 - **arte.tv:cinema**
@@ -273,6 +272,7 @@
 - **Helsinki**: helsinki.fi
 - **HentaiStigma**
 - **HistoricFilms**
+ - **history:topic**: History.com Topic
 - **hitbox**
 - **hitbox:live**
 - **HornBunny**
@@ -359,6 +359,7 @@
 - **MatchTV**
 - **MDR**: MDR.DE and KiKA
 - **media.ccc.de**
+ - **META**
 - **metacafe**
 - **Metacritic**
 - **Mgoon**
@@ -588,8 +589,10 @@
 - **Shared**: shared.sx and vivo.sx
 - **ShareSix**
 - **Sina**
+ - **SixPlay**
+ - **skynewsarabia:article**
 - **skynewsarabia:video**
- - **skynewsarabia:video**
+ - **SkySports**
 - **Slideshare**
 - **Slutload**
 - **smotri**: Smotri.com
@@ -621,6 +624,7 @@
 - **SportBoxEmbed**
 - **SportDeutschland**
 - **Sportschau**
+ - **sr:mediathek**: Saarländischer Rundfunk
 - **SRGSSR**
 - **SRGSSRPlay**: srf.ch, rts.ch, rsi.ch, rtr.ch and swissinfo.ch play sites
 - **SSA**
@@ -721,6 +725,7 @@
 - **UDNEmbed**: 聯合影音
 - **Unistra**
 - **Urort**: NRK P3 Urørt
+ - **URPlay**
 - **USAToday**
 - **ustream**
 - **ustream:channel**
--- a/test/test_all_urls.py
+++ b/test/test_all_urls.py
@@ -6,6 +6,7 @@ from __future__ import unicode_literals
 import os
 import sys
 import unittest
+import collections
 sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))


@@ -130,6 +131,15 @@ class TestAllURLsMatching(unittest.TestCase):
            'https://screen.yahoo.com/smartwatches-latest-wearable-gadgets-163745379-cbs.html',
            ['Yahoo'])

+    def test_no_duplicated_ie_names(self):
+        name_accu = collections.defaultdict(list)
+        for ie in self.ies:
+            name_accu[ie.IE_NAME.lower()].append(type(ie).__name__)
+        for (ie_name, ie_list) in name_accu.items():
+            self.assertEqual(
+                len(ie_list), 1,
+                'Multiple extractors with the same IE_NAME "%s" (%s)' % (ie_name, ', '.join(ie_list)))
+

 if __name__ == '__main__':
    unittest.main()
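The new `test_no_duplicated_ie_names` groups extractors by lower-cased `IE_NAME` and fails when any name is shared by more than one extractor. The sketch below reproduces that grouping outside the test suite. It uses hypothetical stand-in classes rather than the real extractors, and classes rather than instances (the real test uses `type(ie).__name__` on instances). It also shows how the `ARD:mediathek` collision removed from `supportedsites.md` can arise: a subclass that does not set its own `IE_NAME` inherits its parent's.

```python
import collections


def find_duplicate_ie_names(extractor_classes):
    """Map each lower-cased IE_NAME shared by 2+ classes to those class names."""
    name_accu = collections.defaultdict(list)
    for ie in extractor_classes:
        name_accu[ie.IE_NAME.lower()].append(ie.__name__)
    return {name: classes for name, classes in name_accu.items() if len(classes) > 1}


# Hypothetical stand-ins; the real classes live in youtube_dl/extractor/.
class ARDMediathekIE(object):
    IE_NAME = 'ARD:mediathek'


class SRMediathekIE(ARDMediathekIE):
    pass  # no IE_NAME of its own, so it inherits 'ARD:mediathek'


class FixedSRMediathekIE(ARDMediathekIE):
    IE_NAME = 'sr:mediathek'  # the explicit name this release adds


print(find_duplicate_ie_names([ARDMediathekIE, SRMediathekIE]))
# {'ard:mediathek': ['ARDMediathekIE', 'SRMediathekIE']}
print(find_duplicate_ie_names([ARDMediathekIE, FixedSRMediathekIE]))
# {}
```

The same check catches the `skynewsarabia:video` clash, where `SkyNewsArabiaArticleIE` reused the video extractor's name.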
--- a/youtube_dl/extractor/aenetworks.py
+++ b/youtube_dl/extractor/aenetworks.py
@@ -7,18 +7,123 @@ from ..utils import (
    smuggle_url,
    update_url_query,
    unescapeHTML,
+    extract_attributes,
+    get_element_by_attribute,
+)
+from ..compat import (
+    compat_urlparse,
 )


-class AENetworksIE(InfoExtractor):
+class AENetworksBaseIE(InfoExtractor):
+    def theplatform_url_result(self, theplatform_url, video_id, query):
+        return {
+            '_type': 'url_transparent',
+            'id': video_id,
+            'url': smuggle_url(
+                update_url_query(theplatform_url, query),
+                {
+                    'sig': {
+                        'key': 'crazyjava',
+                        'secret': 's3cr3t'
+                    },
+                    'force_smil_url': True
+                }),
+            'ie_key': 'ThePlatform',
+        }
+
+
+class AENetworksIE(AENetworksBaseIE):
    IE_NAME = 'aenetworks'
    IE_DESC = 'A+E Networks: A&E, Lifetime, History.com, FYI Network'
-    _VALID_URL = r'https?://(?:www\.)?(?:(?:history|aetv|mylifetime)\.com|fyi\.tv)/(?P<type>[^/]+)/(?:[^/]+/)+(?P<id>[^/]+?)(?:$|[?#])'
+    _VALID_URL = r'https?://(?:www\.)?(?:(?:history|aetv|mylifetime)\.com|fyi\.tv)/(?:shows/(?P<show_path>[^/]+(?:/[^/]+){0,2})|movies/(?P<movie_display_id>[^/]+)/full-movie)'
+    _TESTS = [{
+        'url': 'http://www.history.com/shows/mountain-men/season-1/episode-1',
+        'md5': '8ff93eb073449f151d6b90c0ae1ef0c7',
+        'info_dict': {
+            'id': '22253814',
+            'ext': 'mp4',
+            'title': 'Winter Is Coming',
+            'description': 'md5:641f424b7a19d8e24f26dea22cf59d74',
+            'timestamp': 1338306241,
+            'upload_date': '20120529',
+            'uploader': 'AENE-NEW',
+        },
+        'add_ie': ['ThePlatform'],
+    }, {
+        'url': 'http://www.history.com/shows/ancient-aliens/season-1',
+        'info_dict': {
+            'id': '71889446852',
+        },
+        'playlist_mincount': 5,
+    }, {
+        'url': 'http://www.mylifetime.com/shows/atlanta-plastic',
+        'info_dict': {
+            'id': 'SERIES4317',
+            'title': 'Atlanta Plastic',
+        },
+        'playlist_mincount': 2,
+    }, {
+        'url': 'http://www.aetv.com/shows/duck-dynasty/season-9/episode-1',
+        'only_matching': True
+    }, {
+        'url': 'http://www.fyi.tv/shows/tiny-house-nation/season-1/episode-8',
+        'only_matching': True
+    }, {
+        'url': 'http://www.mylifetime.com/shows/project-runway-junior/season-1/episode-6',
+        'only_matching': True
+    }, {
+        'url': 'http://www.mylifetime.com/movies/center-stage-on-pointe/full-movie',
+        'only_matching': True
+    }]

+    def _real_extract(self, url):
+        show_path, movie_display_id = re.match(self._VALID_URL, url).groups()
+        display_id = show_path or movie_display_id
+        webpage = self._download_webpage(url, display_id)
+        if show_path:
+            url_parts = show_path.split('/')
+            url_parts_len = len(url_parts)
+            if url_parts_len == 1:
+                entries = []
+                for season_url_path in re.findall(r'(?s)<li[^>]+data-href="(/shows/%s/season-\d+)"' % url_parts[0], webpage):
+                    entries.append(self.url_result(
+                        compat_urlparse.urljoin(url, season_url_path), 'AENetworks'))
+                return self.playlist_result(
+                    entries, self._html_search_meta('aetn:SeriesId', webpage),
+                    self._html_search_meta('aetn:SeriesTitle', webpage))
+            elif url_parts_len == 2:
+                entries = []
+                for episode_item in re.findall(r'(?s)<div[^>]+class="[^"]*episode-item[^"]*"[^>]*>', webpage):
+                    episode_attributes = extract_attributes(episode_item)
+                    episode_url = compat_urlparse.urljoin(
+                        url, episode_attributes['data-canonical'])
+                    entries.append(self.url_result(
+                        episode_url, 'AENetworks',
+                        episode_attributes['data-videoid']))
+                return self.playlist_result(
+                    entries, self._html_search_meta('aetn:SeasonId', webpage))
+        video_id = self._html_search_meta('aetn:VideoID', webpage)
+        media_url = self._search_regex(
+            r"media_url\s*=\s*'([^']+)'", webpage, 'video url')
+
+        info = self._search_json_ld(webpage, video_id, fatal=False)
+        info.update(self.theplatform_url_result(
+            media_url, video_id, {
+                'mbr': 'true',
+                'assetTypes': 'medium_video_s3'
+            }))
+        return info
+
+
+class HistoryTopicIE(AENetworksBaseIE):
+    IE_NAME = 'history:topic'
+    IE_DESC = 'History.com Topic'
+    _VALID_URL = r'https?://(?:www\.)?history\.com/topics/(?:[^/]+/)?(?P<topic_id>[^/]+)/videos(?:/(?P<video_display_id>[^/?#]+))?'
    _TESTS = [{
        'url': 'http://www.history.com/topics/valentines-day/history-of-valentines-day/videos/bet-you-didnt-know-valentines-day?m=528e394da93ae&s=undefined&f=1&free=false',
        'info_dict': {
-            'id': 'g12m5Gyt3fdR',
+            'id': '40700995724',
            'ext': 'mp4',
            'title': "Bet You Didn't Know: Valentine's Day",
            'description': 'md5:7b57ea4829b391995b405fa60bd7b5f7',
@@ -31,57 +136,39 @@ class AENetworksIE(InfoExtractor):
            'skip_download': True,
        },
        'add_ie': ['ThePlatform'],
-        'expected_warnings': ['JSON-LD'],
    }, {
-        'url': 'http://www.history.com/shows/mountain-men/season-1/episode-1',
-        'md5': '8ff93eb073449f151d6b90c0ae1ef0c7',
-        'info_dict': {
-            'id': 'eg47EERs_JsZ',
-            'ext': 'mp4',
-            'title': 'Winter Is Coming',
-            'description': 'md5:641f424b7a19d8e24f26dea22cf59d74',
-            'timestamp': 1338306241,
-            'upload_date': '20120529',
-            'uploader': 'AENE-NEW',
+        'url': 'http://www.history.com/topics/world-war-i/world-war-i-history/videos',
+        'info_dict':
+        {
+            'id': 'world-war-i-history',
+            'title': 'World War I History',
        },
-        'add_ie': ['ThePlatform'],
+        'playlist_mincount': 24,
    }, {
-        'url': 'http://www.aetv.com/shows/duck-dynasty/video/inlawful-entry',
-        'only_matching': True
-    }, {
-        'url': 'http://www.fyi.tv/shows/tiny-house-nation/videos/207-sq-ft-minnesota-prairie-cottage',
-        'only_matching': True
-    }, {
-        'url': 'http://www.mylifetime.com/shows/project-runway-junior/video/season-1/episode-6/superstar-clients',
-        'only_matching': True
+        'url': 'http://www.history.com/topics/world-war-i-history/videos',
+        'only_matching': True,
    }]

    def _real_extract(self, url):
-        page_type, video_id = re.match(self._VALID_URL, url).groups()
+        topic_id, video_display_id = re.match(self._VALID_URL, url).groups()
+        if video_display_id:
+            webpage = self._download_webpage(url, video_display_id)
+            release_url, video_id = re.search(r"_videoPlayer.play\('([^']+)'\s*,\s*'[^']+'\s*,\s*'(\d+)'\)", webpage).groups()
+            release_url = unescapeHTML(release_url)

-        webpage = self._download_webpage(url, video_id)
-
-        video_url_re = [
-            r'data-href="[^"]*/%s"[^>]+data-release-url="([^"]+)"' % video_id,
-            r"media_url\s*=\s*'([^']+)'"
-        ]
-        video_url = unescapeHTML(self._search_regex(video_url_re, webpage, 'video url'))
-        query = {'mbr': 'true'}
-        if page_type == 'shows':
-            query['assetTypes'] = 'medium_video_s3'
-        if 'switch=hds' in video_url:
-            query['switch'] = 'hls'
-
-        info = self._search_json_ld(webpage, video_id, fatal=False)
-        info.update({
-            '_type': 'url_transparent',
-            'url': smuggle_url(
-                update_url_query(video_url, query),
-                {
-                    'sig': {
-                        'key': 'crazyjava',
-                        'secret': 's3cr3t'},
-                    'force_smil_url': True
-                }),
-        })
-        return info
+            return self.theplatform_url_result(
+                release_url, video_id, {
+                    'mbr': 'true',
+                    'switch': 'hls'
+                })
+        else:
+            webpage = self._download_webpage(url, topic_id)
+            entries = []
+            for episode_item in re.findall(r'<a.+?data-release-url="[^"]+"[^>]*>', webpage):
+                video_attributes = extract_attributes(episode_item)
+                entries.append(self.theplatform_url_result(
+                    video_attributes['data-release-url'], video_attributes['data-id'], {
+                        'mbr': 'true',
+                        'switch': 'hls'
+                    }))
+            return self.playlist_result(entries, topic_id, get_element_by_attribute('class', 'show-title', webpage))
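`AENetworksIE._real_extract` now picks its result type from the number of path segments after `/shows/`:

- one segment is a series page, returned as a playlist of seasons;
- two segments are a season page, returned as a playlist of episodes;
- anything else falls through to single-video extraction.

`/movies/<id>/full-movie` URLs also go straight to video extraction. The sketch below isolates that dispatch using the `_VALID_URL` from the diff. The function name `classify_aenetworks_url` and its string labels are illustrative only.

```python
import re

# _VALID_URL copied from AENetworksIE in this release.
AENETWORKS_VALID_URL = (
    r'https?://(?:www\.)?(?:(?:history|aetv|mylifetime)\.com|fyi\.tv)/'
    r'(?:shows/(?P<show_path>[^/]+(?:/[^/]+){0,2})|'
    r'movies/(?P<movie_display_id>[^/]+)/full-movie)')


def classify_aenetworks_url(url):
    """Return (kind, display_id) following the branches in _real_extract."""
    mobj = re.match(AENETWORKS_VALID_URL, url)
    if mobj is None:
        return None
    show_path, movie_display_id = mobj.groups()
    if not show_path:
        return 'video', movie_display_id  # full-movie pages are single videos
    depth = len(show_path.split('/'))
    if depth == 1:
        return 'series', show_path  # playlist of season pages
    if depth == 2:
        return 'season', show_path  # playlist of episode pages
    return 'video', show_path  # episode page
```

History.com `/topics/...` URLs do not match this pattern. They are left to the separate `HistoryTopicIE`.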
--- a/youtube_dl/extractor/arte.py
+++ b/youtube_dl/extractor/arte.py
@@ -419,6 +419,7 @@ class ArteTVPlaylistIE(ArteTVBaseIE):
        'info_dict': {
            'id': 'PL-013263',
            'title': 'Areva & Uramin',
+            'description': 'md5:a1dc0312ce357c262259139cfd48c9bf',
        },
        'playlist_mincount': 6,
    }, {
--- a/youtube_dl/extractor/eagleplatform.py
+++ b/youtube_dl/extractor/eagleplatform.py
@@ -50,6 +50,14 @@ class EaglePlatformIE(InfoExtractor):
        'skip': 'Georestricted',
    }]

+    @staticmethod
+    def _extract_url(webpage):
+        mobj = re.search(
+            r'<iframe[^>]+src=(["\'])(?P<url>(?:https?:)?//.+?\.media\.eagleplatform\.com/index/player\?.+?)\1',
+            webpage)
+        if mobj is not None:
+            return mobj.group('url')
+
    @staticmethod
    def _handle_error(response):
        status = int_or_none(response.get('status', 200))
--- a/youtube_dl/extractor/extractors.py
+++ b/youtube_dl/extractor/extractors.py
@@ -20,7 +20,10 @@ from .adobetv import (
    AdobeTVVideoIE,
 )
 from .adultswim import AdultSwimIE
-from .aenetworks import AENetworksIE
+from .aenetworks import (
+    AENetworksIE,
+    HistoryTopicIE,
+)
 from .afreecatv import AfreecaTVIE
 from .aftonbladet import AftonbladetIE
 from .airmozilla import AirMozillaIE
@@ -422,6 +425,7 @@ from .makerschannel import MakersChannelIE
 from .makertv import MakerTVIE
 from .matchtv import MatchTVIE
 from .mdr import MDRIE
+from .meta import METAIE
 from .metacafe import MetacafeIE
 from .metacritic import MetacriticIE
 from .mgoon import MgoonIE
@@ -706,10 +710,12 @@ from .shahid import ShahidIE
 from .shared import SharedIE
 from .sharesix import ShareSixIE
 from .sina import SinaIE
+from .sixplay import SixPlayIE
 from .skynewsarabia import (
    SkyNewsArabiaIE,
    SkyNewsArabiaArticleIE,
 )
+from .skysports import SkySportsIE
 from .slideshare import SlideshareIE
 from .slutload import SlutloadIE
 from .smotri import (
@@ -891,6 +897,7 @@ from .udn import UDNEmbedIE
 from .digiteka import DigitekaIE
 from .unistra import UnistraIE
 from .urort import UrortIE
+from .urplay import URPlayIE
 from .usatoday import USATodayIE
 from .ustream import UstreamIE, UstreamChannelIE
 from .ustudio import (
--- a/youtube_dl/extractor/generic.py
+++ b/youtube_dl/extractor/generic.py
@@ -64,6 +64,8 @@ from .liveleak import LiveLeakIE
 from .threeqsdn import ThreeQSDNIE
 from .theplatform import ThePlatformIE
 from .vessel import VesselIE
+from .kaltura import KalturaIE
+from .eagleplatform import EaglePlatformIE


 class GenericIE(InfoExtractor):
@@ -920,6 +922,24 @@ class GenericIE(InfoExtractor):
            },
            'add_ie': ['Kaltura'],
        },
+        {
+            # Kaltura embedded via quoted entry_id
+            'url': 'https://www.oreilly.com/ideas/my-cloud-makes-pretty-pictures',
+            'info_dict': {
+                'id': '0_utuok90b',
+                'ext': 'mp4',
+                'title': '06_matthew_brender_raj_dutt',
+                'timestamp': 1466638791,
+                'upload_date': '20160622',
+            },
+            'add_ie': ['Kaltura'],
+            'expected_warnings': [
+                'Could not send HEAD request'
+            ],
+            'params': {
+                'skip_download': True,
+            }
+        },
        # Eagle.Platform embed (generic URL)
        {
            'url': 'http://lenta.ru/news/2015/03/06/navalny/',
@@ -1225,6 +1245,22 @@ class GenericIE(InfoExtractor):
                'uploader': 'www.hudl.com',
            },
        },
+        # twitter:player embed
+        {
+            'url': 'http://www.theatlantic.com/video/index/484130/what-do-black-holes-sound-like/',
+            'md5': 'a3e0df96369831de324f0778e126653c',
+            'info_dict': {
+                'id': '4909620399001',
+                'ext': 'mp4',
+                'title': 'What Do Black Holes Sound Like?',
+                'description': 'what do black holes sound like',
+                'upload_date': '20160524',
+                'uploader_id': '29913724001',
+                'timestamp': 1464107587,
+                'uploader': 'TheAtlantic',
+            },
+            'add_ie': ['BrightcoveLegacy'],
+        }
    ]

    def report_following_redirect(self, new_url):
@@ -1908,18 +1944,14 @@ class GenericIE(InfoExtractor):
            return self.url_result(mobj.group('url'), 'Zapiks')

        # Look for Kaltura embeds
-        mobj = (re.search(r"(?s)kWidget\.(?:thumb)?[Ee]mbed\(\{.*?(?P<q1>['\"])wid(?P=q1)\s*:\s*(?P<q2>['\"])_?(?P<partner_id>[^'\"]+)(?P=q2),.*?(?P<q3>['\"])entry_?[Ii]d(?P=q3)\s*:\s*(?P<q4>['\"])(?P<id>[^'\"]+)(?P=q4),", webpage) or
-                re.search(r'(?s)(?P<q1>["\'])(?:https?:)?//cdnapi(?:sec)?\.kaltura\.com/.*?(?:p|partner_id)/(?P<partner_id>\d+).*?(?P=q1).*?entry_?[Ii]d\s*:\s*(?P<q2>["\'])(?P<id>.+?)(?P=q2)', webpage))
-        if mobj is not None:
-            return self.url_result(smuggle_url(
-                'kaltura:%(partner_id)s:%(id)s' % mobj.groupdict(),
-                {'source_url': url}), 'Kaltura')
+        kaltura_url = KalturaIE._extract_url(webpage)
+        if kaltura_url:
+            return self.url_result(smuggle_url(kaltura_url, {'source_url': url}), KalturaIE.ie_key())

        # Look for Eagle.Platform embeds
-        mobj = re.search(
-            r'<iframe[^>]+src="(?P<url>https?://.+?\.media\.eagleplatform\.com/index/player\?.+?)"', webpage)
-        if mobj is not None:
-            return self.url_result(mobj.group('url'), 'EaglePlatform')
+        eagleplatform_url = EaglePlatformIE._extract_url(webpage)
+        if eagleplatform_url:
+            return self.url_result(eagleplatform_url, EaglePlatformIE.ie_key())

        # Look for ClipYou (uses Eagle.Platform) embeds
        mobj = re.search(
@@ -2065,6 +2097,11 @@ class GenericIE(InfoExtractor):
                'uploader': video_uploader,
            }

+        # https://dev.twitter.com/cards/types/player#On_twitter.com_via_desktop_browser
+        embed_url = self._html_search_meta('twitter:player', webpage, default=None)
+        if embed_url:
+            return self.url_result(embed_url)
+
        def check_video(vurl):
            if YoutubeIE.suitable(vurl):
                return True
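The generic extractor now falls back to the `twitter:player` meta tag, which Twitter's player card uses to point at an embeddable player URL. It passes that URL back through `url_result` so another extractor can resolve it. youtube-dl does the lookup with its regex-based `_html_search_meta` helper. The sketch below performs a comparable lookup with the standard-library `html.parser` instead. Accepting either a `name` or a `property` attribute is an assumption of this sketch, not a statement about `_html_search_meta`.

```python
from html.parser import HTMLParser


class TwitterPlayerParser(HTMLParser):
    """Record the content of the first <meta> whose name/property is twitter:player."""

    def __init__(self):
        super().__init__()
        self.player_url = None

    def handle_starttag(self, tag, attrs):
        if tag != 'meta' or self.player_url is not None:
            return
        attrs = dict(attrs)
        key = attrs.get('name') or attrs.get('property')
        if key == 'twitter:player' and attrs.get('content'):
            self.player_url = attrs['content']


def find_twitter_player(webpage):
    """Return the twitter:player URL, or None if the page has none."""
    parser = TwitterPlayerParser()
    parser.feed(webpage)
    parser.close()
    return parser.player_url
```

Self-closing tags such as `<meta ... />` are covered, because `HTMLParser.handle_startendtag` calls `handle_starttag` by default.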
--- a/youtube_dl/extractor/kaltura.py
+++ b/youtube_dl/extractor/kaltura.py
@@ -64,6 +64,32 @@ class KalturaIE(InfoExtractor):
        }
    ]

+    @staticmethod
+    def _extract_url(webpage):
+        mobj = (
+            re.search(
+                r"""(?xs)
+                    kWidget\.(?:thumb)?[Ee]mbed\(
+                    \{.*?
+                        (?P<q1>['\"])wid(?P=q1)\s*:\s*
+                        (?P<q2>['\"])_?(?P<partner_id>[^'\"]+)(?P=q2),.*?
+                        (?P<q3>['\"])entry_?[Ii]d(?P=q3)\s*:\s*
+                        (?P<q4>['\"])(?P<id>[^'\"]+)(?P=q4),
+                """, webpage) or
+            re.search(
+                r'''(?xs)
+                    (?P<q1>["\'])
+                        (?:https?:)?//cdnapi(?:sec)?\.kaltura\.com/.*?(?:p|partner_id)/(?P<partner_id>\d+).*?
+                    (?P=q1).*?
+                    (?:
+                        entry_?[Ii]d|
+                        (?P<q2>["\'])entry_?[Ii]d(?P=q2)
+                    )\s*:\s*
+                    (?P<q3>["\'])(?P<id>.+?)(?P=q3)
+                ''', webpage))
+        if mobj:
+            return 'kaltura:%(partner_id)s:%(id)s' % mobj.groupdict()
+
    def _kaltura_api_call(self, video_id, actions, *args, **kwargs):
        params = actions[0]
        if len(actions) > 1:
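The second Kaltura pattern now accepts the `entry_id` key either bare or quoted, which is the case the new O'Reilly test in `generic.py` exercises. The sketch below runs that pattern, copied verbatim from `KalturaIE._extract_url`, against two hand-written snippets. The sample HTML, partner IDs and entry IDs are invented for illustration.

```python
import re

# Second pattern from KalturaIE._extract_url, copied verbatim (verbose mode).
KALTURA_CDNAPI_RE = re.compile(r'''(?xs)
    (?P<q1>["\'])
        (?:https?:)?//cdnapi(?:sec)?\.kaltura\.com/.*?(?:p|partner_id)/(?P<partner_id>\d+).*?
    (?P=q1).*?
    (?:
        entry_?[Ii]d|
        (?P<q2>["\'])entry_?[Ii]d(?P=q2)
    )\s*:\s*
    (?P<q3>["\'])(?P<id>.+?)(?P=q3)
''')


def kaltura_url_from_cdnapi(webpage):
    """Build the internal kaltura:<partner_id>:<entry_id> URL, or None."""
    mobj = KALTURA_CDNAPI_RE.search(webpage)
    if mobj:
        return 'kaltura:%(partner_id)s:%(id)s' % mobj.groupdict()
    return None
```

`GenericIE` then wraps the returned `kaltura:` URL with `smuggle_url(..., {'source_url': url})` before delegating to `KalturaIE`.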
--- a/youtube_dl/extractor/m6.py
+++ b/youtube_dl/extractor/m6.py
@@ -1,8 +1,6 @@
 # encoding: utf-8
 from __future__ import unicode_literals

-import re
-
 from .common import InfoExtractor


@@ -23,34 +21,5 @@ class M6IE(InfoExtractor):
    }

    def _real_extract(self, url):
-        mobj = re.match(self._VALID_URL, url)
-        video_id = mobj.group('id')
-
-        rss = self._download_xml('http://ws.m6.fr/v1/video/info/m6/bonus/%s' % video_id, video_id,
-                                 'Downloading video RSS')
-
-        title = rss.find('./channel/item/title').text
-        description = rss.find('./channel/item/description').text
-        thumbnail = rss.find('./channel/item/visuel_clip_big').text
-        duration = int(rss.find('./channel/item/duration').text)
-        view_count = int(rss.find('./channel/item/nombre_vues').text)
-
-        formats = []
-        for format_id in ['lq', 'sd', 'hq', 'hd']:
-            video_url = rss.find('./channel/item/url_video_%s' % format_id)
-            if video_url is None:
-                continue
-            formats.append({
-                'url': video_url.text,
-                'format_id': format_id,
-            })
-
-        return {
-            'id': video_id,
-            'title': title,
-            'description': description,
-            'thumbnail': thumbnail,
-            'duration': duration,
-            'view_count': view_count,
-            'formats': formats,
-        }
+        video_id = self._match_id(url)
+        return self.url_result('6play:%s' % video_id, 'SixPlay', video_id)
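`M6IE` no longer queries the old `ws.m6.fr` RSS endpoint. It now returns a `6play:<id>` pseudo-URL with the `SixPlay` extractor key. `SixPlayIE._VALID_URL` accepts that prefix as well as regular 6play.fr clip pages. The sketch below checks that both URL forms yield the same ID. `sixplay_video_id` is an illustrative name, not part of youtube-dl.

```python
import re

# _VALID_URL copied from SixPlayIE in this release.
SIXPLAY_VALID_URL = r'(?:6play:|https?://(?:www\.)?6play\.fr/.+?-c_)(?P<id>[0-9]+)'


def sixplay_video_id(url):
    """Return the numeric clip ID if SixPlayIE's pattern accepts the URL."""
    mobj = re.match(SIXPLAY_VALID_URL, url)
    return mobj.group('id') if mobj else None


# The pseudo-URL shape M6IE now hands over.
video_id = '11495320'
delegated = '6play:%s' % video_id
```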
--- a/youtube_dl/extractor/meta.py
+++ b/youtube_dl/extractor/meta.py
@@ -0,0 +1,72 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from .pladform import PladformIE
+from ..utils import (
+    unescapeHTML,
+    int_or_none,
+    ExtractorError,
+)
+
+
+class METAIE(InfoExtractor):
+    _VALID_URL = r'https?://video\.meta\.ua/(?:iframe/)?(?P<id>[0-9]+)'
+    _TESTS = [{
+        'url': 'http://video.meta.ua/5502115.video',
+        'md5': '71b6f3ee274bef16f1ab410f7f56b476',
+        'info_dict': {
+            'id': '5502115',
+            'ext': 'mp4',
+            'title': 'Sony Xperia Z camera test [HQ]',
+            'description': 'Xperia Z shoots video in FullHD HDR.',
+            'uploader_id': 'nomobile',
+            'uploader': 'CHЁZA.TV',
+            'upload_date': '20130211',
+        },
+        'add_ie': ['Youtube'],
+    }, {
+        'url': 'http://video.meta.ua/iframe/5502115',
+        'only_matching': True,
+    }, {
+        # pladform embed
+        'url': 'http://video.meta.ua/7121015.video',
+        'only_matching': True,
+    }]
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        webpage = self._download_webpage(url, video_id)
+
+        st_html5 = self._search_regex(
+            r"st_html5\s*=\s*'#([^']+)'", webpage, 'uppod html5 st', default=None)
+
+        if st_html5:
+            json_str = ''
+            for i in range(0, len(st_html5), 3):
+                json_str += '&#x0%s;' % st_html5[i:i + 3]
+            uppod_data = self._parse_json(unescapeHTML(json_str), video_id)
+            error = uppod_data.get('customnotfound')
+            if error:
+                raise ExtractorError('%s said: %s' % (self.IE_NAME, error), expected=True)
+
+            video_url = uppod_data['file']
+            info = {
+                'id': video_id,
+                'url': video_url,
+                'title': uppod_data.get('comment') or self._og_search_title(webpage),
+                'description': self._og_search_description(webpage, default=None),
+                'thumbnail': uppod_data.get('poster') or self._og_search_thumbnail(webpage),
+                'duration': int_or_none(self._og_search_property(
+                    'video:duration', webpage, default=None)),
+            }
+            if 'youtube.com/' in video_url:
+                info.update({
+                    '_type': 'url_transparent',
+                    'ie_key': 'Youtube',
+                })
+            return info
+
+        pladform_url = PladformIE._extract_url(webpage)
+        if pladform_url:
+            return self.url_result(pladform_url)
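`METAIE` decodes the uppod player's `st_html5` variable (after stripping its leading `#`) in four steps:

1. Read the string three characters at a time.
2. Wrap each chunk as an HTML hex entity (`&#x0...;`).
3. Unescape the result.
4. Parse it as JSON.

The sketch below reproduces that, using the standard library's `html.unescape` in place of youtube-dl's `unescapeHTML`. The `encode_st_html5` helper is a hypothetical inverse used only to build test input. It assumes every character fits in three hex digits, which holds here because `json.dumps` escapes non-ASCII by default.

```python
import html
import json


def decode_st_html5(st_html5):
    """Turn 3-hex-digit chunks into HTML entities, unescape, then parse JSON."""
    json_str = ''.join(
        '&#x0%s;' % st_html5[i:i + 3] for i in range(0, len(st_html5), 3))
    return json.loads(html.unescape(json_str))


def encode_st_html5(data):
    """Hypothetical inverse for testing: each character as 3 hex digits."""
    # json.dumps escapes non-ASCII by default, so every code point is < 0x80.
    return ''.join('%03x' % ord(ch) for ch in json.dumps(data))


print(decode_st_html5('07b07d'))  # '07b' -> '{', '07d' -> '}', so this prints {}
```

If the decoded object carries a `customnotfound` message, the extractor raises an expected `ExtractorError` instead of returning a result.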
--- a/youtube_dl/extractor/msn.py
+++ b/youtube_dl/extractor/msn.py
@@ -38,6 +38,9 @@ class MSNIE(InfoExtractor):
        # geo restricted
        'url': 'http://www.msn.com/en-ae/foodanddrink/joinourtable/the-first-fart-makes-you-laugh-the-last-fart-makes-you-cry/vp-AAhzIBU',
        'only_matching': True,
+    }, {
+        'url': 'http://www.msn.com/en-ae/entertainment/bollywood/watch-how-salman-khan-reacted-when-asked-if-he-would-apologize-for-his-‘raped-woman’-comment/vi-AAhvzW6',
+        'only_matching': True,
    }]

    def _real_extract(self, url):
--- a/youtube_dl/extractor/pbs.py
+++ b/youtube_dl/extractor/pbs.py
@@ -516,9 +516,14 @@ class PBSIE(InfoExtractor):
                # https://projects.pbs.org/confluence/display/coveapi/COVE+Video+Specifications
                if not bitrate or bitrate not in ('400k', '800k', '1200k', '2500k'):
                    continue
+                f_url = re.sub(r'\d+k|baseline', bitrate, http_url)
+                # This may produce invalid links sometimes (e.g.
+                # http://www.pbs.org/wgbh/frontline/film/suicide-plan)
+                if not self._is_valid_url(f_url, display_id, 'http-%s video' % bitrate):
+                    continue
                f = m3u8_format.copy()
                f.update({
-                    'url': re.sub(r'\d+k|baseline', bitrate, http_url),
+                    'url': f_url,
                    'format_id': m3u8_format['format_id'].replace('hls', 'http'),
                    'protocol': 'http',
                })
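The PBS change derives progressive HTTP URLs by substituting each allowed bitrate into the HLS-derived `http_url`. It now drops any candidate that fails a validity check, because, as the code comment notes, the substitution sometimes yields invalid links. In the sketch below, `is_valid` is a hypothetical stand-in for youtube-dl's `_is_valid_url`, which makes a network request. The example URLs are invented. Note that `re.sub` rewrites every `\d+k` or `baseline` occurrence in the URL, not only the bitrate field.

```python
import re

# Bitrates the COVE video specification allows (from the diff).
ALLOWED_BITRATES = ('400k', '800k', '1200k', '2500k')


def http_variants(http_url, bitrates, is_valid=lambda url: True):
    """Build candidate HTTP URLs, keeping only those is_valid() accepts."""
    urls = []
    for bitrate in bitrates:
        if not bitrate or bitrate not in ALLOWED_BITRATES:
            continue
        f_url = re.sub(r'\d+k|baseline', bitrate, http_url)
        if not is_valid(f_url):  # stand-in for a network probe
            continue
        urls.append(f_url)
    return urls
```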
--- a/youtube_dl/extractor/pladform.py
+++ b/youtube_dl/extractor/pladform.py
@@ -49,7 +49,7 @@ class PladformIE(InfoExtractor):
    @staticmethod
    def _extract_url(webpage):
        mobj = re.search(
-            r'<iframe[^>]+src="(?P<url>(?:https?:)?//out\.pladform\.ru/player\?.+?)"', webpage)
+            r'<iframe[^>]+src=(["\'])(?P<url>(?:https?:)?//out\.pladform\.ru/player\?.+?)\1', webpage)
        if mobj:
            return mobj.group('url')
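Both `PladformIE._extract_url` and the new `EaglePlatformIE._extract_url` capture the opening quote of the `src` attribute and require the same character to close it (`\1`). As a result, single-quoted iframes are detected as well as double-quoted ones. The sketch below compares the old and new Pladform patterns from this diff. The iframe markup and query string are invented.

```python
import re

OLD_RE = r'<iframe[^>]+src="(?P<url>(?:https?:)?//out\.pladform\.ru/player\?.+?)"'
NEW_RE = r'<iframe[^>]+src=(["\'])(?P<url>(?:https?:)?//out\.pladform\.ru/player\?.+?)\1'


def extract_url(pattern, webpage):
    """Return the named 'url' group of the first match, or None."""
    mobj = re.search(pattern, webpage)
    return mobj.group('url') if mobj else None


single = "<iframe width='640' src='//out.pladform.ru/player?pl=1&videoid=2'></iframe>"
double = '<iframe width="640" src="//out.pladform.ru/player?pl=1&videoid=2"></iframe>'
```

With the old pattern, the single-quoted iframe is missed. With the new one, both variants yield `//out.pladform.ru/player?pl=1&videoid=2`.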

--- a/youtube_dl/extractor/sixplay.py
+++ b/youtube_dl/extractor/sixplay.py
@@ -0,0 +1,60 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..utils import (
+    qualities,
+    int_or_none,
+)
+
+
+class SixPlayIE(InfoExtractor):
+    _VALID_URL = r'(?:6play:|https?://(?:www\.)?6play\.fr/.+?-c_)(?P<id>[0-9]+)'
+    _TEST = {
+        'url': 'http://www.6play.fr/jamel-et-ses-amis-au-marrakech-du-rire-p_1316/jamel-et-ses-amis-au-marrakech-du-rire-2015-c_11495320',
+        'md5': '42310bffe4ba3982db112b9cd3467328',
+        'info_dict': {
+            'id': '11495320',
+            'ext': 'mp4',
+            'title': 'Jamel et ses amis au Marrakech du rire 2015',
+            'description': 'md5:ba2149d5c321d5201b78070ee839d872',
+        },
+    }
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        clip_data = self._download_json(
+            'https://player.m6web.fr/v2/video/config/6play-auth/FR/%s.json' % video_id,
+            video_id)
+        video_data = clip_data['videoInfo']
+
+        preference = qualities(['lq', 'sd', 'hq', 'hd'])
+        formats = []
+        for source in clip_data['sources']:
+            source_type, source_url = source.get('type'), source.get('src')
+            if not source_url or source_type == 'hls/primetime':
+                continue
+            if source_type == 'application/vnd.apple.mpegURL':
+                formats.extend(self._extract_m3u8_formats(
+                    source_url, video_id, 'mp4', 'm3u8_native',
+                    m3u8_id='hls', fatal=False))
+                formats.extend(self._extract_f4m_formats(
+                    source_url.replace('.m3u8', '.f4m'),
+                    video_id, f4m_id='hds', fatal=False))
+            elif source_type == 'video/mp4':
+                quality = source.get('quality')
+                formats.append({
+                    'url': source_url,
+                    'format_id': quality,
+                    'preference': preference(quality),
+                })
+        self._sort_formats(formats)
+
+        return {
+            'id': video_id,
+            'title': video_data['title'].strip(),
+            'description': video_data.get('description'),
+            'duration': int_or_none(video_data.get('duration')),
+            'series': video_data.get('titlePgm'),
+            'formats': formats,
+        }
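`SixPlayIE` ranks progressive MP4 sources with `qualities(['lq', 'sd', 'hq', 'hd'])`. The mapping turns a quality label into its index in that list, so a higher index means better quality. The extractor also skips `hls/primetime` sources. The sketch below re-implements a `qualities` helper along those lines, assuming unknown labels rank as -1. It applies the helper to MP4 sources only and ignores the HLS/HDS branches. Sorting ascending by preference puts the best format last, mirroring youtube-dl's convention of listing the best format last.

```python
def qualities(quality_ids):
    """Return a function giving a quality id's rank (-1 if unknown)."""
    def q(qid):
        try:
            return quality_ids.index(qid)
        except ValueError:
            return -1
    return q


preference = qualities(['lq', 'sd', 'hq', 'hd'])


def mp4_formats(sources):
    """Keep video/mp4 sources and order them worst-to-best by preference."""
    formats = []
    for source in sources:
        source_type, source_url = source.get('type'), source.get('src')
        if not source_url or source_type != 'video/mp4':
            continue  # this sketch ignores HLS/HDS and hls/primetime sources
        quality = source.get('quality')
        formats.append({
            'url': source_url,
            'format_id': quality,
            'preference': preference(quality),
        })
    formats.sort(key=lambda f: f['preference'])
    return formats
```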
--- a/youtube_dl/extractor/skynewsarabia.py
+++ b/youtube_dl/extractor/skynewsarabia.py
@@ -67,7 +67,7 @@ class SkyNewsArabiaIE(SkyNewsArabiaBaseIE):


 class SkyNewsArabiaArticleIE(SkyNewsArabiaBaseIE):
-    IE_NAME = 'skynewsarabia:video'
+    IE_NAME = 'skynewsarabia:article'
    _VALID_URL = r'https?://(?:www\.)?skynewsarabia\.com/web/article/(?P<id>[0-9]+)'
    _TESTS = [{
        'url': 'http://www.skynewsarabia.com/web/article/794549/%D8%A7%D9%94%D8%AD%D8%AF%D8%A7%D8%AB-%D8%A7%D9%84%D8%B4%D8%B1%D9%82-%D8%A7%D9%84%D8%A7%D9%94%D9%88%D8%B3%D8%B7-%D8%AE%D8%B1%D9%8A%D8%B7%D8%A9-%D8%A7%D9%84%D8%A7%D9%94%D9%84%D8%B9%D8%A7%D8%A8-%D8%A7%D9%84%D8%B0%D9%83%D9%8A%D8%A9',
--- a/youtube_dl/extractor/skysports.py
+++ b/youtube_dl/extractor/skysports.py
@@ -0,0 +1,33 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+
+
+class SkySportsIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?skysports\.com/watch/video/(?P<id>[0-9]+)'
+    _TEST = {
+        'url': 'http://www.skysports.com/watch/video/10328419/bale-its-our-time-to-shine',
+        'md5': 'c44a1db29f27daf9a0003e010af82100',
+        'info_dict': {
+            'id': '10328419',
+            'ext': 'flv',
+            'title': 'Bale: Its our time to shine',
+            'description': 'md5:9fd1de3614d525f5addda32ac3c482c9',
+        },
+        'add_ie': ['Ooyala'],
+    }
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        webpage = self._download_webpage(url, video_id)
+
+        return {
+            '_type': 'url_transparent',
+            'id': video_id,
+            'url': 'ooyala:%s' % self._search_regex(
+                r'data-video-id="([^"]+)"', webpage, 'ooyala id'),
+            'title': self._og_search_title(webpage),
+            'description': self._og_search_description(webpage),
+            'ie_key': 'Ooyala',
+        }
--- a/youtube_dl/extractor/srmediathek.py
+++ b/youtube_dl/extractor/srmediathek.py
@@ -9,6 +9,7 @@ from ..utils import (


 class SRMediathekIE(ARDMediathekIE):
+    IE_NAME = 'sr:mediathek'
    IE_DESC = 'Saarländischer Rundfunk'
    _VALID_URL = r'https?://sr-mediathek\.sr-online\.de/index\.php\?.*?&id=(?P<id>[0-9]+)'

--- a/youtube_dl/extractor/urplay.py
+++ b/youtube_dl/extractor/urplay.py
@@ -0,0 +1,67 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+
+
+class URPlayIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?urplay\.se/program/(?P<id>[0-9]+)'
+    _TEST = {
+        'url': 'http://urplay.se/program/190031-tripp-trapp-trad-sovkudde',
+        'md5': '15ca67b63fd8fb320ac2bcd854bad7b6',
+        'info_dict': {
+            'id': '190031',
+            'ext': 'mp4',
+            'title': 'Tripp, Trapp, Träd : Sovkudde',
+            'description': 'md5:b86bffdae04a7e9379d1d7e5947df1d1',
+        }
+    }
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+
+        webpage = self._download_webpage(url, video_id)
+        urplayer_data = self._parse_json(self._search_regex(
+            r'urPlayer\.init\(({.+?})\);', webpage, 'urplayer data'), video_id)
+        host = self._download_json('http://streaming-loadbalancer.ur.se/loadbalancer.json', video_id)['redirect']
+
+        formats = []
+        for quality_attr, quality, preference in (('', 'sd', 0), ('_hd', 'hd', 1)):
+            file_rtmp = urplayer_data.get('file_rtmp' + quality_attr)
+            if file_rtmp:
+                formats.append({
+                    'url': 'rtmp://%s/urplay/mp4:%s' % (host, file_rtmp),
+                    'format_id': quality + '-rtmp',
+                    'ext': 'flv',
+                    'preference': preference,
+                })
+            file_http = urplayer_data.get('file_http' + quality_attr) or urplayer_data.get('file_http_sub' + quality_attr)
+            if file_http:
+                file_http_base_url = 'http://%s/%s' % (host, file_http)
+                formats.extend(self._extract_f4m_formats(
+                    file_http_base_url + 'manifest.f4m', video_id,
+                    preference, '%s-hds' % quality, fatal=False))
+                formats.extend(self._extract_m3u8_formats(
+                    file_http_base_url + 'playlist.m3u8', video_id, 'mp4',
+                    'm3u8_native', preference, '%s-hls' % quality, fatal=False))
+        self._sort_formats(formats)
+
+        subtitles = {}
+        for subtitle in urplayer_data.get('subtitles', []):
+            subtitle_url = subtitle.get('file')
+            kind = subtitle.get('kind')
+            if not subtitle_url or (kind and kind != 'captions'):
+                continue
+            subtitles.setdefault(subtitle.get('label', 'Svenska'), []).append({
+                'url': subtitle_url,
+            })
+
+        return {
+            'id': video_id,
+            'title': urplayer_data['title'],
+            'description': self._og_search_description(webpage),
+            'thumbnail': urplayer_data.get('image'),
+            'series': urplayer_data.get('series_title'),
+            'subtitles': subtitles,
+            'formats': formats,
+        }
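URPlay exposes one set of file fields per quality (no suffix for SD, `_hd` for HD), and the extractor turns each into an RTMP URL plus HDS and HLS manifest URLs on a load-balanced host. The sketch below reproduces only that string building and the subtitle filtering on plain dicts, without downloading or parsing the manifests. The function names are hypothetical. It assumes, as the `+ 'manifest.f4m'` concatenation implies, that `file_http` values end with a slash. The subtitle filter keeps entries that have a `file` URL and whose `kind` is absent or `captions`.

```python
def build_urplay_urls(urplayer_data, host):
    """Return (format_id, url) pairs for each quality present in urplayer_data.

    Hypothetical helper: manifests are only named here, not fetched.
    """
    urls = []
    for quality_attr, quality in (('', 'sd'), ('_hd', 'hd')):
        file_rtmp = urplayer_data.get('file_rtmp' + quality_attr)
        if file_rtmp:
            urls.append((quality + '-rtmp',
                         'rtmp://%s/urplay/mp4:%s' % (host, file_rtmp)))
        # Fall back to the subtitled variant when the plain HTTP path is missing
        file_http = (urplayer_data.get('file_http' + quality_attr)
                     or urplayer_data.get('file_http_sub' + quality_attr))
        if file_http:
            base = 'http://%s/%s' % (host, file_http)
            urls.append((quality + '-hds', base + 'manifest.f4m'))
            urls.append((quality + '-hls', base + 'playlist.m3u8'))
    return urls


def pick_subtitles(subtitle_entries, default_label='Svenska'):
    """Group caption URLs by label, skipping entries without a URL or of another kind."""
    subtitles = {}
    for subtitle in subtitle_entries:
        subtitle_url = subtitle.get('file')
        kind = subtitle.get('kind')
        if not subtitle_url or (kind and kind != 'captions'):
            continue
        subtitles.setdefault(subtitle.get('label', default_label), []).append(
            {'url': subtitle_url})
    return subtitles
```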
--- a/youtube_dl/extractor/vrt.py
+++ b/youtube_dl/extractor/vrt.py
@@ -25,7 +25,8 @@ class VRTIE(InfoExtractor):
                'timestamp': 1414271750.949,
                'upload_date': '20141025',
                'duration': 929,
-            }
+            },
+            'skip': 'HTTP Error 404: Not Found',
        },
        # sporza.be
        {
@@ -39,7 +40,8 @@ class VRTIE(InfoExtractor):
                'timestamp': 1413835980.560,
                'upload_date': '20141020',
                'duration': 3238,
-            }
+            },
+            'skip': 'HTTP Error 404: Not Found',
        },
        # cobra.be
        {
@@ -53,16 +55,39 @@ class VRTIE(InfoExtractor):
                'timestamp': 1413967500.494,
                'upload_date': '20141022',
                'duration': 661,
-            }
+            },
+            'skip': 'HTTP Error 404: Not Found',
        },
        {
            # YouTube video
            'url': 'http://deredactie.be/cm/vrtnieuws/videozone/nieuws/cultuurenmedia/1.2622957',
-            'only_matching': True,
+            'md5': 'b8b93da1df1cea6c8556255a796b7d61',
+            'info_dict': {
+                'id': 'Wji-BZ0oCwg',
+                'ext': 'mp4',
+                'title': 'ROGUE ONE: A STAR WARS STORY Official Teaser Trailer',
+                'description': 'md5:8e468944dce15567a786a67f74262583',
+                'uploader': 'Star Wars',
+                'uploader_id': 'starwars',
+                'upload_date': '20160407',
+            },
+            'add_ie': ['Youtube'],
        },
        {
            'url': 'http://cobra.canvas.be/cm/cobra/videozone/rubriek/film-videozone/1.2377055',
-            'only_matching': True,
+            'md5': '',
+            'info_dict': {
+                'id': '2377055',
+                'ext': 'mp4',
+                'title': 'Cafe Derby',
+                'description': 'Lenny Van Wesemael debuteert met de langspeelfilm Café Derby. Een waar gebeurd maar ook verzonnen verhaal.',
+                'upload_date': '20150626',
+                'timestamp': 1435305240.769,
+            },
+            'params': {
+                # m3u8 download
+                'skip_download': True,
+            }
        }
    ]

@@ -98,6 +123,32 @@ class VRTIE(InfoExtractor):
                formats.extend(self._extract_m3u8_formats(
                    src, video_id, 'mp4', entry_protocol='m3u8_native',
                    m3u8_id='hls', fatal=False))
+                formats.extend(self._extract_f4m_formats(
+                    src.replace('playlist.m3u8', 'manifest.f4m'),
+                    video_id, f4m_id='hds', fatal=False))
+                if 'data-video-geoblocking="true"' not in webpage:
+                    rtmp_formats = self._extract_smil_formats(
+                        src.replace('playlist.m3u8', 'jwplayer.smil'),
+                        video_id, fatal=False)
+                    formats.extend(rtmp_formats)
+                    for rtmp_format in rtmp_formats:
+                        rtmp_format_c = rtmp_format.copy()
+                        rtmp_format_c['url'] = '%s/%s' % (rtmp_format['url'], rtmp_format['play_path'])
+                        del rtmp_format_c['play_path']
+                        del rtmp_format_c['ext']
+                        http_format = rtmp_format_c.copy()
+                        http_format.update({
+                            'url': rtmp_format_c['url'].replace('rtmp://', 'http://').replace('vod.', 'download.').replace('/_definst_/', '/').replace('mp4:', ''),
+                            'format_id': rtmp_format['format_id'].replace('rtmp', 'http'),
+                            'protocol': 'http',
+                        })
+                        rtsp_format = rtmp_format_c.copy()
+                        rtsp_format.update({
+                            'url': rtsp_format['url'].replace('rtmp://', 'rtsp://'),
+                            'format_id': rtmp_format['format_id'].replace('rtmp', 'rtsp'),
+                            'protocol': 'rtsp',
+                        })
+                        formats.extend([http_format, rtsp_format])
            else:
                formats.extend(self._extract_f4m_formats(
                    '%s/manifest.f4m' % src, video_id, f4m_id='hds', fatal=False))
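For non-geoblocked videos, the VRT hunk above derives two extra formats from every RTMP format. It joins `url` and `play_path` into one URL, then rewrites the scheme and a few host and path fragments to obtain HTTP and RTSP variants. The sketch below isolates that rewriting as a pure function on a format dict. The function name and the sample host `vod.example.be` are hypothetical. Unlike the diff, it uses `pop('ext', None)` so that a format without `ext` does not raise.

```python
def derive_vrt_variants(rtmp_format):
    """Build HTTP and RTSP format dicts from one RTMP format dict.

    Hypothetical helper applying the same string rewrites as the VRT hunk.
    """
    base = dict(rtmp_format)
    # Merge play_path into the URL so the variants are single self-contained URLs
    base['url'] = '%s/%s' % (rtmp_format['url'], rtmp_format['play_path'])
    del base['play_path']
    base.pop('ext', None)

    http_format = dict(base)
    http_format.update({
        'url': (base['url'].replace('rtmp://', 'http://')
                .replace('vod.', 'download.')
                .replace('/_definst_/', '/')
                .replace('mp4:', '')),
        'format_id': rtmp_format['format_id'].replace('rtmp', 'http'),
        'protocol': 'http',
    })

    rtsp_format = dict(base)
    rtsp_format.update({
        'url': base['url'].replace('rtmp://', 'rtsp://'),
        'format_id': rtmp_format['format_id'].replace('rtmp', 'rtsp'),
        'protocol': 'rtsp',
    })
    return http_format, rtsp_format
```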
--- a/youtube_dl/version.py
+++ b/youtube_dl/version.py
@@ -1,3 +1,3 @@
 from __future__ import unicode_literals

-__version__ = '2016.06.26'
+__version__ = '2016.06.30'
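youtube-dl version strings are dates in `YYYY.MM.DD` form, so checking whether an installed copy is older than the latest release, which the issue template asks reporters to do, amounts to comparing dates. The sketch below compares versions as tuples of integers rather than strings. It assumes that a same-day re-release may append a fourth numeric component, and tuple comparison then orders such a release after the plain date. The function names are hypothetical.

```python
def parse_version(version):
    """Split a date-style version such as '2016.06.30' into a tuple of ints."""
    return tuple(int(part) for part in version.split('.'))


def is_outdated(installed, latest):
    """True when the installed version sorts before the latest release."""
    return parse_version(installed) < parse_version(latest)
```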
Author	SHA1	Message	Date
Sergey M․	66a42309fa	release 2016.06.30	2016-06-30 23:56:55 +07:00
Sergey M․	fd94e2671a	[meta] Add support for pladform embeds	2016-06-30 23:20:44 +07:00
Sergey M․	8ff6697861	[pladform] Improve embed detection	2016-06-30 23:19:29 +07:00
Sergey M․	eafa643715	[meta] Make duration and description optional for iframe URLs	2016-06-30 23:06:13 +07:00
Sergey M․	049da7cb6c	[meta] Extend _VALID_URL	2016-06-30 23:04:18 +07:00
Remita Amine	7dbeee7e22	[generic] make twitter:player extraction non fatal	2016-06-30 14:11:55 +01:00
Remita Amine	93ad6c6bfa	[sixplay] Add new extractor(closes #2183)	2016-06-30 13:50:49 +01:00
Remita Amine	329179073b	[generic] add generic support for twitter:player embeds	2016-06-30 12:01:30 +01:00
Remita Amine	4d86d2008e	[urplay] fix typo and check with flake8	2016-06-30 11:30:42 +01:00
Remita Amine	ab47b6e881	[theatlantic] Add new extractor(closes #6611)	2016-06-30 04:08:56 +01:00
Remita Amine	df43389ade	[skysports] Add new extractor(closes #7066)	2016-06-30 02:54:21 +01:00
Remita Amine	397b305cfe	[meta] Add new extractor(closes #8789)	2016-06-30 00:21:03 +01:00
Remita Amine	e496fa50cd	[urplay] Add new extractor(closes #9332)	2016-06-29 20:19:31 +01:00
Sergey M․	06a96da15b	[eagleplatform] Improve embed detection and extract in separate routine (Closes #9926)	2016-06-29 23:01:34 +07:00
Remita Amine	70157c2c43	[aenetworks] add support for movie pages	2016-06-29 16:55:17 +01:00
Remita Amine	c58ed8563d	[aenetworks] extract history topic playlist title	2016-06-29 16:18:16 +01:00
Remita Amine	4c7821227c	[aenetworks:historytopic] fix topic video url	2016-06-29 16:03:32 +01:00
Remita Amine	42362fdb5e	[aenetworks] add support for show and season for A&E Network sites and History topics(closes #9816)	2016-06-29 15:49:17 +01:00
Sergey M․	97124e572d	[arte:playlist] Fix test	2016-06-28 22:39:53 +07:00
Remita Amine	32616c14cc	[vrt] extract all formats	2016-06-28 14:02:03 +01:00
Sergey M․	8174d0fe95	release 2016.06.27	2016-06-27 23:09:39 +07:00
Sergey M․	8704778d95	[pbs] Check manually constructed http links (Closes #9921)	2016-06-27 23:06:42 +07:00
Sergey M․	c287f2bc60	[extractor/generic] Use _extract_url for kaltura embeds (Closes #9922)	2016-06-27 22:45:26 +07:00
Sergey M․	9ea5c04c0d	[kaltura] Add _extract_url with fixed regex	2016-06-27 22:44:17 +07:00
Sergey M․	fd7a7498a4	[test_all_urls] PEP 8 and change wording	2016-06-27 22:11:45 +07:00
Matthieu Muffato	e3a6747d8f	New test-case: extractor names are supposed to be unique. @dstftw explained in https://github.com/rg3/youtube-dl/pull/9918#issuecomment-228625878 that extractor names are supposed to be unique. @dstftw has fixed the two offending extractors, and here I add a test to ensure this does not happen in the future.	2016-06-27 22:09:29 +07:00
Sergey M․	f41ffc00d1	[skynewsarabia:article] Clarify IE_NAME	2016-06-27 05:08:09 +07:00
Sergey M․	81fda15369	[sr:mediathek] Clarify IE_NAME	2016-06-27 05:07:12 +07:00
Sergey M․	427cd050a3	[extractor/generic] Improve kaltura embed detection (Closes #9911)	2016-06-27 04:11:53 +07:00
Sergey M․	b0c200f1ec	[msn] Add test URL with non-alphanumeric characters	2016-06-26 22:03:36 +07:00
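The commit by Matthieu Muffato adds a test that extractor names are unique, after two extractors were found to share a name. Below is a minimal sketch of such a check over a list of classes, grouping class names by `IE_NAME`. Comparing names case-insensitively is an assumption of this sketch, and the function name is hypothetical. It is not presented as the project's actual test.

```python
from collections import defaultdict


def duplicated_ie_names(extractor_classes):
    """Map each lower-cased IE_NAME used by several classes to those class names."""
    by_name = defaultdict(list)
    for cls in extractor_classes:
        by_name[cls.IE_NAME.lower()].append(cls.__name__)
    # Keep only names claimed by more than one extractor
    return {name: classes for name, classes in by_name.items() if len(classes) > 1}
```

An empty result means every extractor has a distinct name, so a test can simply assert that the returned dict is empty.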