Mirror of https://gitlab.com/ytdl-org/youtube-dl.git (synced 2026-01-24 00:00:10 -05:00)

Compare commits: 2016.08.06 ... 2016.08.12 (76 commits)
Commits in this range (SHA1 only):

b0081562d2, fff37cfd4f, a3be69b7f0, 0fd1b1624c, 367976d49f, 0aef0771f8,
0c070681c5, 30b25d382d, e5f878c205, e2e84aed7e, b1927f4e8a, 3b9323d96e,
7f832413d6, 7f2ed47595, c3fa77bdef, 57ce8a6d08, 69d8eeeec5, 81c13222c6,
b1ce2ba197, 5c8411e968, cc9c8ce5df, 20ef4123b9, 4e62d26aa2, b657816684,
9778b3e7ee, 25dd58ca6a, 5e42f8a0ad, 1ad6b891b2, 7aa589a5e1, 065bc35489,
3a380766d1, affaea0688, 77426a087b, 8991844ea2, 082395d0a0, e8ed7354e6,
1e7f602e2a, 522f6c066d, 321b5e082a, 3711fa1eb2, 395c74615c, 3dc240e8c6,
a41a6c5094, d71207121d, b1c6f21c74, 412abb8760, f17d5f6d14, 6bb801cfaf,
de02d1f4e9, e1f93a0a76, d21a661bb4, b2bd968f4b, 4a01befb34, 845dfcdc40,
d92cb46305, a8795327ca, d34995a9e3, 958849275f, 998f094452, aaa42cf0cf,
9fb64c04cd, f9622868e7, 37768f9242, a1aadd09a4, b47a75017b, e37b54b140,
c1decda58c, d3f8e038fe, ad152e2d95, b0af12154e, d16b3c6677, c57244cdb1,
a7e5f27412, 089a40955c, d73ebac100, e563c0d73b
.github/ISSUE_TEMPLATE.md (vendored, 6 lines changed)

@@ -6,8 +6,8 @@
 
 ---
 
-### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2016.08.06*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
-- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2016.08.06**
+### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2016.08.12*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
+- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2016.08.12**
 
 ### Before submitting an *issue* make sure you have:
 - [ ] At least skimmed through [README](https://github.com/rg3/youtube-dl/blob/master/README.md) and **most notably** [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
@@ -35,7 +35,7 @@ $ youtube-dl -v <your command line>
 [debug] User config: []
 [debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
 [debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
-[debug] youtube-dl version 2016.08.06
+[debug] youtube-dl version 2016.08.12
 [debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
 [debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
 [debug] Proxy map: {}
AUTHORS (2 lines changed)

@@ -179,3 +179,5 @@ Jakub Adam Wieczorek
 Aleksandar Topuzović
 Nehal Patel
 Rob van Bekkum
+Petr Zvoníček
+Pratyush Singh
CONTRIBUTING.md

@@ -46,7 +46,7 @@ Make sure that someone has not already opened the issue you're trying to open. S
 
 ### Why are existing options not enough?
 
-Before requesting a new feature, please have a quick peek at [the list of supported options](https://github.com/rg3/youtube-dl/blob/master/README.md#synopsis). Many feature requests are for features that actually exist already! Please, absolutely do show off your work in the issue report and detail how the existing similar options do *not* solve your problem.
+Before requesting a new feature, please have a quick peek at [the list of supported options](https://github.com/rg3/youtube-dl/blob/master/README.md#options). Many feature requests are for features that actually exist already! Please, absolutely do show off your work in the issue report and detail how the existing similar options do *not* solve your problem.
 
 ### Is there enough context in your bug report?
 
ChangeLog (62 lines changed)

@@ -1,3 +1,65 @@
+version 2016.08.12
+
+Core
+* Subtitles are now written as is. Newline conversions are disabled. (#10268)
++ Recognize more formats in unified_timestamp
+
+Extractors
+- [goldenmoustache] Remove extractor (#10298)
+* [drtuber] Improve title extraction
+* [drtuber] Make dislike count optional (#10297)
+* [chirbit] Fix extraction (#10296)
+* [francetvinfo] Relax URL regular expression
+* [rtlnl] Relax URL regular expression (#10282)
+* [formula1] Relax URL regular expression (#10283)
+* [wat] Improve extraction (#10281)
+* [ctsnews] Fix extraction
+
+
+version 2016.08.10
+
+Core
+* Make --metadata-from-title non fatal when title does not match the pattern
+* Introduce options for randomized sleep before each download
+  --min-sleep-interval and --max-sleep-interval (#9930)
+* Respect default in _search_json_ld
+
+Extractors
++ [uol] Add extractor for uol.com.br (#4263)
+* [rbmaradio] Fix extraction and extract all formats (#10242)
++ [sonyliv] Add extractor for sonyliv.com (#10258)
+* [aparat] Fix extraction
+* [cwtv] Extract HTTP formats
++ [rozhlas] Add extractor for prehravac.rozhlas.cz (#10253)
+* [kuwo:singer] Fix extraction
+
+
+version 2016.08.07
+
+Core
++ Add support for TV Parental Guidelines ratings in parse_age_limit
++ Add decode_png (#9706)
++ Add support for partOfTVSeries in JSON-LD
+* Lower master M3U8 manifest preference for better format sorting
+
+Extractors
++ [discoverygo] Add extractor (#10245)
+* [flipagram] Make JSON-LD extraction non fatal
+* [generic] Make JSON-LD extraction non fatal
++ [bbc] Add support for morph embeds (#10239)
+* [tnaflixnetworkbase] Improve title extraction
+* [tnaflix] Fix metadata extraction (#10249)
+* [fox] Fix theplatform release URL query
+* [openload] Fix extraction (#9706)
+* [bbc] Skip duplicate manifest URLs
+* [bbc] Improve format code
++ [bbc] Add support for DASH and F4M
+* [bbc] Improve format sorting and listing
+* [bbc] Improve playlist extraction
++ [pokemon] Add extractor (#10093)
++ [condenast] Add fallback scenario for video info extraction
+
+
 version 2016.08.06
 
 Core
README.md (12 lines changed)

@@ -330,7 +330,15 @@ which means you can modify it, redistribute it or use it however you like.
                                      bidirectional text support. Requires bidiv
                                      or fribidi executable in PATH
     --sleep-interval SECONDS         Number of seconds to sleep before each
-                                     download.
+                                     download when used alone or a lower bound
+                                     of a range for randomized sleep before each
+                                     download (minimum possible number of
+                                     seconds to sleep) when used along with
+                                     --max-sleep-interval.
+    --max-sleep-interval SECONDS     Upper bound of a range for randomized sleep
+                                     before each download (maximum possible
+                                     number of seconds to sleep). Must only be
+                                     used along with --min-sleep-interval.
 
 ## Video Format Options:
     -f, --format FORMAT              Video format code, see the "FORMAT
@@ -1196,7 +1204,7 @@ Make sure that someone has not already opened the issue you're trying to open. S
 
 ### Why are existing options not enough?
 
-Before requesting a new feature, please have a quick peek at [the list of supported options](https://github.com/rg3/youtube-dl/blob/master/README.md#synopsis). Many feature requests are for features that actually exist already! Please, absolutely do show off your work in the issue report and detail how the existing similar options do *not* solve your problem.
+Before requesting a new feature, please have a quick peek at [the list of supported options](https://github.com/rg3/youtube-dl/blob/master/README.md#options). Many feature requests are for features that actually exist already! Please, absolutely do show off your work in the issue report and detail how the existing similar options do *not* solve your problem.
 
 ### Is there enough context in your bug report?
 
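The same pair of bounds is exposed through the embedding API as the sleep_interval and max_sleep_interval parameters (see the YoutubeDL.py hunk below). A minimal usage sketch, using the canonical test video from the issue template above:

from youtube_dl import YoutubeDL

ydl = YoutubeDL({
    'sleep_interval': 5,        # lower bound in seconds, same knob as --min-sleep-interval
    'max_sleep_interval': 20,   # upper bound, same knob as --max-sleep-interval
})
ydl.download(['https://www.youtube.com/watch?v=BaW_jenozKc'])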
devscripts/make_readme.py

@@ -54,17 +54,21 @@ def filter_options(readme):
 
         if in_options:
             if line.lstrip().startswith('-'):
-                option, description = re.split(r'\s{2,}', line.lstrip())
-                split_option = option.split(' ')
+                split = re.split(r'\s{2,}', line.lstrip())
+                # Description string may start with `-` as well. If there is
+                # only one piece then it's a description bit not an option.
+                if len(split) > 1:
+                    option, description = split
+                    split_option = option.split(' ')
 
-                if not split_option[-1].startswith('-'):  # metavar
-                    option = ' '.join(split_option[:-1] + ['*%s*' % split_option[-1]])
+                    if not split_option[-1].startswith('-'):  # metavar
+                        option = ' '.join(split_option[:-1] + ['*%s*' % split_option[-1]])
 
-                # Pandoc's definition_lists. See http://pandoc.org/README.html
-                # for more information.
-                ret += '\n%s\n: %s\n' % (option, description)
-            else:
-                ret += line.lstrip() + '\n'
+                    # Pandoc's definition_lists. See http://pandoc.org/README.html
+                    # for more information.
+                    ret += '\n%s\n: %s\n' % (option, description)
+                    continue
+            ret += line.lstrip() + '\n'
         else:
             ret += line + '\n'
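A standalone illustration (with made-up input lines) of the case this guards against: a wrapped description line can itself start with '-', in which case splitting on runs of whitespace yields a single piece rather than an option/description pair, and the old unconditional unpacking raised ValueError.

import re

option_line = '    --sleep-interval SECONDS         Number of seconds to sleep'
wrapped_desc = '                                     -1 means no limit'

print(re.split(r'\s{2,}', option_line.lstrip()))
# ['--sleep-interval SECONDS', 'Number of seconds to sleep']
print(re.split(r'\s{2,}', wrapped_desc.lstrip()))
# ['-1 means no limit']  -> len == 1, so it is treated as plain description text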
docs/supportedsites.md

@@ -182,6 +182,7 @@
 - **DigitallySpeaking**
 - **Digiteka**
 - **Discovery**
+- **DiscoveryGo**
 - **Dotsub**
 - **DouyuTV**: 斗鱼
 - **DPlay**
@@ -264,7 +265,6 @@
 - **GloboArticle**
 - **GodTube**
 - **GodTV**
-- **GoldenMoustache**
 - **Golem**
 - **GoogleDrive**
 - **Goshgay**
@@ -518,6 +518,7 @@
 - **plus.google**: Google Plus
 - **pluzz.francetv.fr**
 - **podomatic**
+- **Pokemon**
 - **PolskieRadio**
 - **PornHd**
 - **PornHub**: PornHub and Thumbzilla
@@ -562,6 +563,7 @@
 - **RoosterTeeth**
 - **RottenTomatoes**
 - **Roxwel**
+- **Rozhlas**
 - **RTBF**
 - **rte**: Raidió Teilifís Éireann TV
 - **rte:radio**: Raidió Teilifís Éireann radio
@@ -619,6 +621,7 @@
 - **smotri:user**: Smotri.com user videos
 - **Snotr**
 - **Sohu**
+- **SonyLIV**
 - **soundcloud**
 - **soundcloud:playlist**
 - **soundcloud:search**: Soundcloud search
@@ -745,6 +748,7 @@
 - **udemy:course**
 - **UDNEmbed**: 聯合影音
 - **Unistra**
+- **uol.com.br**
 - **Urort**: NRK P3 Urørt
 - **URPlay**
 - **USAToday**
test/test_utils.py

@@ -42,6 +42,7 @@ from youtube_dl.utils import (
     ohdave_rsa_encrypt,
     OnDemandPagedList,
     orderedSet,
+    parse_age_limit,
     parse_duration,
     parse_filesize,
     parse_count,
@@ -432,6 +433,20 @@ class TestUtil(unittest.TestCase):
             url_basename('http://media.w3.org/2010/05/sintel/trailer.mp4'),
             'trailer.mp4')
 
+    def test_parse_age_limit(self):
+        self.assertEqual(parse_age_limit(None), None)
+        self.assertEqual(parse_age_limit(False), None)
+        self.assertEqual(parse_age_limit('invalid'), None)
+        self.assertEqual(parse_age_limit(0), 0)
+        self.assertEqual(parse_age_limit(18), 18)
+        self.assertEqual(parse_age_limit(21), 21)
+        self.assertEqual(parse_age_limit(22), None)
+        self.assertEqual(parse_age_limit('18'), 18)
+        self.assertEqual(parse_age_limit('18+'), 18)
+        self.assertEqual(parse_age_limit('PG-13'), 13)
+        self.assertEqual(parse_age_limit('TV-14'), 14)
+        self.assertEqual(parse_age_limit('TV-MA'), 17)
+
     def test_parse_duration(self):
         self.assertEqual(parse_duration(None), None)
         self.assertEqual(parse_duration(False), None)
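A rough sketch of a parse_age_limit consistent with these assertions (this is not the actual youtube_dl.utils code, and the rating tables beyond the values pinned down by the tests are assumptions): ints pass through only in 0..21, strings like '18' or '18+' parse numerically, and MPAA or TV Parental Guidelines ratings go through lookup tables.

import re

US_RATINGS = {'G': 0, 'PG': 10, 'PG-13': 13, 'R': 16, 'NC-17': 18}   # assumed values except PG-13
TV_RATINGS = {'TV-Y': 0, 'TV-Y7': 7, 'TV-G': 0, 'TV-PG': 0, 'TV-14': 14, 'TV-MA': 17}  # assumed except TV-14/TV-MA

def parse_age_limit(s):
    # bool is a subclass of int, so reject it explicitly (False -> None)
    if isinstance(s, int) and not isinstance(s, bool):
        return s if 0 <= s <= 21 else None
    if not isinstance(s, str):
        return None
    m = re.match(r'^(\d{1,2})\+?$', s)
    if m:
        return int(m.group(1))
    return US_RATINGS.get(s, TV_RATINGS.get(s))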
youtube_dl/YoutubeDL.py

@@ -249,7 +249,16 @@ class YoutubeDL(object):
     source_address:    (Experimental) Client-side IP address to bind to.
     call_home:         Boolean, true iff we are allowed to contact the
                        youtube-dl servers for debugging.
-    sleep_interval:    Number of seconds to sleep before each download.
+    sleep_interval:    Number of seconds to sleep before each download when
+                       used alone or a lower bound of a range for randomized
+                       sleep before each download (minimum possible number
+                       of seconds to sleep) when used along with
+                       max_sleep_interval.
+    max_sleep_interval:Upper bound of a range for randomized sleep before each
+                       download (maximum possible number of seconds to sleep).
+                       Must only be used along with sleep_interval.
+                       Actual sleep time will be a random float from range
+                       [sleep_interval; max_sleep_interval].
     listformats:       Print an overview of available video formats and exit.
     list_thumbnails:   Print a table of all thumbnails and exit.
     match_filter:      A function that gets called with the info_dict of
@@ -1594,7 +1603,9 @@ class YoutubeDL(object):
                     self.to_screen('[info] Video subtitle %s.%s is already_present' % (sub_lang, sub_format))
                 else:
                     self.to_screen('[info] Writing video subtitles to: ' + sub_filename)
-                    with io.open(encodeFilename(sub_filename), 'w', encoding='utf-8') as subfile:
+                    # Use newline='' to prevent conversion of newline characters
+                    # See https://github.com/rg3/youtube-dl/issues/10268
+                    with io.open(encodeFilename(sub_filename), 'w', encoding='utf-8', newline='') as subfile:
                         subfile.write(sub_data)
             except (OSError, IOError):
                 self.report_error('Cannot write subtitles file ' + sub_filename)
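A small demonstration of what newline='' changes (file names are made up): without it, io.open in text mode translates every '\n' written to os.linesep, which corrupts subtitle data that already carries '\r\n' pairs.

import io

data = 'line1\r\nline2\r\n'
with io.open('subs_default.srt', 'w', encoding='utf-8') as f:
    f.write(data)  # default newline translation: on Windows each '\n' becomes '\r\n', yielding '\r\r\n'
with io.open('subs_raw.srt', 'w', encoding='utf-8', newline='') as f:
    f.write(data)  # newline='' disables translation; the text is written exactly as given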
youtube_dl/__init__.py

@@ -145,6 +145,16 @@ def _real_main(argv=None):
         if numeric_limit is None:
             parser.error('invalid max_filesize specified')
         opts.max_filesize = numeric_limit
+    if opts.sleep_interval is not None:
+        if opts.sleep_interval < 0:
+            parser.error('sleep interval must be positive or 0')
+    if opts.max_sleep_interval is not None:
+        if opts.max_sleep_interval < 0:
+            parser.error('max sleep interval must be positive or 0')
+        if opts.max_sleep_interval < opts.sleep_interval:
+            parser.error('max sleep interval must be greater than or equal to min sleep interval')
+    else:
+        opts.max_sleep_interval = opts.sleep_interval
 
     def parse_retries(retries):
         if retries in ('inf', 'infinite'):
@@ -370,6 +380,7 @@ def _real_main(argv=None):
         'source_address': opts.source_address,
         'call_home': opts.call_home,
         'sleep_interval': opts.sleep_interval,
+        'max_sleep_interval': opts.max_sleep_interval,
         'external_downloader': opts.external_downloader,
         'list_thumbnails': opts.list_thumbnails,
         'playlist_items': opts.playlist_items,
youtube_dl/downloader/common.py

@@ -4,6 +4,7 @@ import os
 import re
 import sys
 import time
+import random
 
 from ..compat import compat_os_name
 from ..utils import (
@@ -342,8 +343,11 @@ class FileDownloader(object):
             })
             return True
 
-        sleep_interval = self.params.get('sleep_interval')
-        if sleep_interval:
+        min_sleep_interval = self.params.get('sleep_interval')
+        if min_sleep_interval:
+            max_sleep_interval = self.params.get('max_sleep_interval', min_sleep_interval)
+            print(min_sleep_interval, max_sleep_interval)
+            sleep_interval = random.uniform(min_sleep_interval, max_sleep_interval)
             self.to_screen('[download] Sleeping %s seconds...' % sleep_interval)
             time.sleep(sleep_interval)
youtube_dl/extractor/aol.py

@@ -123,6 +123,10 @@ class AolFeaturesIE(InfoExtractor):
             'title': 'What To Watch - February 17, 2016',
         },
         'add_ie': ['FiveMin'],
+        'params': {
+            # encrypted m3u8 download
+            'skip_download': True,
+        },
     }]
 
     def _real_extract(self, url):
youtube_dl/extractor/aparat.py

@@ -1,8 +1,6 @@
 # coding: utf-8
 from __future__ import unicode_literals
 
-import re
-
 from .common import InfoExtractor
 from ..utils import (
     ExtractorError,
@@ -15,7 +13,7 @@ class AparatIE(InfoExtractor):
 
     _TEST = {
         'url': 'http://www.aparat.com/v/wP8On',
-        'md5': '6714e0af7e0d875c5a39c4dc4ab46ad1',
+        'md5': '131aca2e14fe7c4dcb3c4877ba300c89',
         'info_dict': {
             'id': 'wP8On',
             'ext': 'mp4',
@@ -31,13 +29,13 @@ class AparatIE(InfoExtractor):
         # Note: There is an easier-to-parse configuration at
         # http://www.aparat.com/video/video/config/videohash/%video_id
         # but the URL in there does not work
-        embed_url = ('http://www.aparat.com/video/video/embed/videohash/' +
-                     video_id + '/vt/frame')
+        embed_url = 'http://www.aparat.com/video/video/embed/vt/frame/showvideo/yes/videohash/' + video_id
         webpage = self._download_webpage(embed_url, video_id)
 
-        video_urls = [video_url.replace('\\/', '/') for video_url in re.findall(
-            r'(?:fileList\[[0-9]+\]\s*=|"file"\s*:)\s*"([^"]+)"', webpage)]
-        for i, video_url in enumerate(video_urls):
+        file_list = self._parse_json(self._search_regex(
+            r'fileList\s*=\s*JSON\.parse\(\'([^\']+)\'\)', webpage, 'file list'), video_id)
+        for i, item in enumerate(file_list[0]):
+            video_url = item['file']
             req = HEADRequest(video_url)
             res = self._request_webpage(
                 req, video_id, note='Testing video URL %d' % i, errnote=False)
youtube_dl/extractor/bbc.py

@@ -5,11 +5,13 @@ import re
 
 from .common import InfoExtractor
 from ..utils import (
+    dict_get,
     ExtractorError,
     float_or_none,
     int_or_none,
     parse_duration,
     parse_iso8601,
+    try_get,
     unescapeHTML,
 )
 from ..compat import (
@@ -229,51 +231,6 @@
         asx = self._download_xml(connection.get('href'), programme_id, 'Downloading ASX playlist')
         return [ref.get('href') for ref in asx.findall('./Entry/ref')]
 
-    def _extract_connection(self, connection, programme_id):
-        formats = []
-        kind = connection.get('kind')
-        protocol = connection.get('protocol')
-        supplier = connection.get('supplier')
-        if protocol == 'http':
-            href = connection.get('href')
-            transfer_format = connection.get('transferFormat')
-            # ASX playlist
-            if supplier == 'asx':
-                for i, ref in enumerate(self._extract_asx_playlist(connection, programme_id)):
-                    formats.append({
-                        'url': ref,
-                        'format_id': 'ref%s_%s' % (i, supplier),
-                    })
-            # Skip DASH until supported
-            elif transfer_format == 'dash':
-                pass
-            elif transfer_format == 'hls':
-                formats.extend(self._extract_m3u8_formats(
-                    href, programme_id, ext='mp4', entry_protocol='m3u8_native',
-                    m3u8_id=supplier, fatal=False))
-            # Direct link
-            else:
-                formats.append({
-                    'url': href,
-                    'format_id': supplier or kind or protocol,
-                })
-        elif protocol == 'rtmp':
-            application = connection.get('application', 'ondemand')
-            auth_string = connection.get('authString')
-            identifier = connection.get('identifier')
-            server = connection.get('server')
-            formats.append({
-                'url': '%s://%s/%s?%s' % (protocol, server, application, auth_string),
-                'play_path': identifier,
-                'app': '%s?%s' % (application, auth_string),
-                'page_url': 'http://www.bbc.co.uk',
-                'player_url': 'http://www.bbc.co.uk/emp/releases/iplayer/revisions/617463_618125_4/617463_618125_4_emp.swf',
-                'rtmp_live': False,
-                'ext': 'flv',
-                'format_id': supplier,
-            })
-        return formats
-
     def _extract_items(self, playlist):
         return playlist.findall('./{%s}item' % self._EMP_PLAYLIST_NS)
@@ -294,46 +251,6 @@
     def _extract_connections(self, media):
         return self._findall_ns(media, './{%s}connection')
 
-    def _extract_video(self, media, programme_id):
-        formats = []
-        vbr = int_or_none(media.get('bitrate'))
-        vcodec = media.get('encoding')
-        service = media.get('service')
-        width = int_or_none(media.get('width'))
-        height = int_or_none(media.get('height'))
-        file_size = int_or_none(media.get('media_file_size'))
-        for connection in self._extract_connections(media):
-            conn_formats = self._extract_connection(connection, programme_id)
-            for format in conn_formats:
-                format.update({
-                    'width': width,
-                    'height': height,
-                    'vbr': vbr,
-                    'vcodec': vcodec,
-                    'filesize': file_size,
-                })
-                if service:
-                    format['format_id'] = '%s_%s' % (service, format['format_id'])
-            formats.extend(conn_formats)
-        return formats
-
-    def _extract_audio(self, media, programme_id):
-        formats = []
-        abr = int_or_none(media.get('bitrate'))
-        acodec = media.get('encoding')
-        service = media.get('service')
-        for connection in self._extract_connections(media):
-            conn_formats = self._extract_connection(connection, programme_id)
-            for format in conn_formats:
-                format.update({
-                    'format_id': '%s_%s' % (service, format['format_id']),
-                    'abr': abr,
-                    'acodec': acodec,
-                    'vcodec': 'none',
-                })
-            formats.extend(conn_formats)
-        return formats
-
     def _get_subtitles(self, media, programme_id):
         subtitles = {}
         for connection in self._extract_connections(media):
@@ -379,13 +296,87 @@
     def _process_media_selector(self, media_selection, programme_id):
         formats = []
         subtitles = None
+        urls = []
 
         for media in self._extract_medias(media_selection):
             kind = media.get('kind')
-            if kind == 'audio':
-                formats.extend(self._extract_audio(media, programme_id))
-            elif kind == 'video':
-                formats.extend(self._extract_video(media, programme_id))
+            if kind in ('video', 'audio'):
+                bitrate = int_or_none(media.get('bitrate'))
+                encoding = media.get('encoding')
+                service = media.get('service')
+                width = int_or_none(media.get('width'))
+                height = int_or_none(media.get('height'))
+                file_size = int_or_none(media.get('media_file_size'))
+                for connection in self._extract_connections(media):
+                    href = connection.get('href')
+                    if href in urls:
+                        continue
+                    if href:
+                        urls.append(href)
+                    conn_kind = connection.get('kind')
+                    protocol = connection.get('protocol')
+                    supplier = connection.get('supplier')
+                    transfer_format = connection.get('transferFormat')
+                    format_id = supplier or conn_kind or protocol
+                    if service:
+                        format_id = '%s_%s' % (service, format_id)
+                    # ASX playlist
+                    if supplier == 'asx':
+                        for i, ref in enumerate(self._extract_asx_playlist(connection, programme_id)):
+                            formats.append({
+                                'url': ref,
+                                'format_id': 'ref%s_%s' % (i, format_id),
+                            })
+                    elif transfer_format == 'dash':
+                        formats.extend(self._extract_mpd_formats(
+                            href, programme_id, mpd_id=format_id, fatal=False))
+                    elif transfer_format == 'hls':
+                        formats.extend(self._extract_m3u8_formats(
+                            href, programme_id, ext='mp4', entry_protocol='m3u8_native',
+                            m3u8_id=format_id, fatal=False))
+                    elif transfer_format == 'hds':
+                        formats.extend(self._extract_f4m_formats(
+                            href, programme_id, f4m_id=format_id, fatal=False))
+                    else:
+                        if not service and not supplier and bitrate:
+                            format_id += '-%d' % bitrate
+                        fmt = {
+                            'format_id': format_id,
+                            'filesize': file_size,
+                        }
+                        if kind == 'video':
+                            fmt.update({
+                                'width': width,
+                                'height': height,
+                                'vbr': bitrate,
+                                'vcodec': encoding,
+                            })
+                        else:
+                            fmt.update({
+                                'abr': bitrate,
+                                'acodec': encoding,
+                                'vcodec': 'none',
+                            })
+                        if protocol == 'http':
+                            # Direct link
+                            fmt.update({
+                                'url': href,
+                            })
+                        elif protocol == 'rtmp':
+                            application = connection.get('application', 'ondemand')
+                            auth_string = connection.get('authString')
+                            identifier = connection.get('identifier')
+                            server = connection.get('server')
+                            fmt.update({
+                                'url': '%s://%s/%s?%s' % (protocol, server, application, auth_string),
+                                'play_path': identifier,
+                                'app': '%s?%s' % (application, auth_string),
+                                'page_url': 'http://www.bbc.co.uk',
+                                'player_url': 'http://www.bbc.co.uk/emp/releases/iplayer/revisions/617463_618125_4/617463_618125_4_emp.swf',
+                                'rtmp_live': False,
+                                'ext': 'flv',
+                            })
+                        formats.append(fmt)
             elif kind == 'captions':
                 subtitles = self.extract_subtitles(media, programme_id)
         return formats, subtitles
@@ -589,7 +580,7 @@ class BBCIE(BBCCoUkIE):
         'info_dict': {
             'id': '150615_telabyad_kentin_cogu',
             'ext': 'mp4',
-            'title': "Tel Abyad'da IŞİD bayrağı indirildi YPG bayrağı çekildi",
+            'title': "YPG: Tel Abyad'ın tamamı kontrolümüzde",
             'description': 'md5:33a4805a855c9baf7115fcbde57e7025',
             'timestamp': 1434397334,
             'upload_date': '20150615',
@@ -654,6 +645,23 @@ class BBCIE(BBCCoUkIE):
             # rtmp download
             'skip_download': True,
         }
+    }, {
+        # single video embedded with Morph
+        'url': 'http://www.bbc.co.uk/sport/live/olympics/36895975',
+        'info_dict': {
+            'id': 'p041vhd0',
+            'ext': 'mp4',
+            'title': "Nigeria v Japan - Men's First Round",
+            'description': 'Live coverage of the first round from Group B at the Amazonia Arena.',
+            'duration': 7980,
+            'uploader': 'BBC Sport',
+            'uploader_id': 'bbc_sport',
+        },
+        'params': {
+            # m3u8 download
+            'skip_download': True,
+        },
+        'skip': 'Georestricted to UK',
     }, {
         # single video with playlist.sxml URL in playlist param
         'url': 'http://www.bbc.com/sport/0/football/33653409',
@@ -751,7 +759,7 @@ class BBCIE(BBCCoUkIE):
 
         webpage = self._download_webpage(url, playlist_id)
 
-        json_ld_info = self._search_json_ld(webpage, playlist_id, default=None)
+        json_ld_info = self._search_json_ld(webpage, playlist_id, default={})
         timestamp = json_ld_info.get('timestamp')
 
         playlist_title = json_ld_info.get('title')
@@ -820,13 +828,19 @@ class BBCIE(BBCCoUkIE):
             # http://www.bbc.com/turkce/multimedya/2015/10/151010_vid_ankara_patlama_ani)
             playlist = data_playable.get('otherSettings', {}).get('playlist', {})
             if playlist:
-                for key in ('progressiveDownload', 'streaming'):
+                entry = None
+                for key in ('streaming', 'progressiveDownload'):
                     playlist_url = playlist.get('%sUrl' % key)
                     if not playlist_url:
                         continue
                     try:
-                        entries.append(self._extract_from_playlist_sxml(
-                            playlist_url, playlist_id, timestamp))
+                        info = self._extract_from_playlist_sxml(
+                            playlist_url, playlist_id, timestamp)
+                        if not entry:
+                            entry = info
+                        else:
+                            entry['title'] = info['title']
+                            entry['formats'].extend(info['formats'])
                     except Exception as e:
                         # Some playlist URL may fail with 500, at the same time
                         # the other one may work fine (e.g.
@@ -834,6 +848,9 @@ class BBCIE(BBCCoUkIE):
                         if isinstance(e.cause, compat_HTTPError) and e.cause.code == 500:
                             continue
                         raise
+                if entry:
+                    self._sort_formats(entry['formats'])
+                    entries.append(entry)
 
         if entries:
             return self.playlist_result(entries, playlist_id, playlist_title, playlist_description)
@@ -866,6 +883,50 @@ class BBCIE(BBCCoUkIE):
                 'subtitles': subtitles,
             }
 
+        # Morph based embed (e.g. http://www.bbc.co.uk/sport/live/olympics/36895975)
+        # There are several setPayload calls may be present but the video
+        # seems to be always related to the first one
+        morph_payload = self._parse_json(
+            self._search_regex(
+                r'Morph\.setPayload\([^,]+,\s*({.+?})\);',
+                webpage, 'morph payload', default='{}'),
+            playlist_id, fatal=False)
+        if morph_payload:
+            components = try_get(morph_payload, lambda x: x['body']['components'], list) or []
+            for component in components:
+                if not isinstance(component, dict):
+                    continue
+                lead_media = try_get(component, lambda x: x['props']['leadMedia'], dict)
+                if not lead_media:
+                    continue
+                identifiers = lead_media.get('identifiers')
+                if not identifiers or not isinstance(identifiers, dict):
+                    continue
+                programme_id = identifiers.get('vpid') or identifiers.get('playablePid')
+                if not programme_id:
+                    continue
+                title = lead_media.get('title') or self._og_search_title(webpage)
+                formats, subtitles = self._download_media_selector(programme_id)
+                self._sort_formats(formats)
+                description = lead_media.get('summary')
+                uploader = lead_media.get('masterBrand')
+                uploader_id = lead_media.get('mid')
+                duration = None
+                duration_d = lead_media.get('duration')
+                if isinstance(duration_d, dict):
+                    duration = parse_duration(dict_get(
+                        duration_d, ('rawDuration', 'formattedDuration', 'spokenDuration')))
+                return {
+                    'id': programme_id,
+                    'title': title,
+                    'description': description,
+                    'duration': duration,
+                    'uploader': uploader,
+                    'uploader_id': uploader_id,
+                    'formats': formats,
+                    'subtitles': subtitles,
+                }
+
         def extract_all(pattern):
             return list(filter(None, map(
                 lambda s: self._parse_json(s, playlist_id, fatal=False),
@@ -883,7 +944,7 @@ class BBCIE(BBCCoUkIE):
                 r'setPlaylist\("(%s)"\)' % EMBED_URL, webpage))
         if entries:
             return self.playlist_result(
-                [self.url_result(entry, 'BBCCoUk') for entry in entries],
+                [self.url_result(entry_, 'BBCCoUk') for entry_ in entries],
                 playlist_id, playlist_title, playlist_description)
 
         # Multiple video article (e.g. http://www.bbc.com/news/world-europe-32668511)
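The Morph code above leans on try_get from youtube_dl.utils to probe deeply nested payloads without raising; a rough sketch of its behavior (not the exact library source), with a made-up payload:

def try_get(src, getter, expected_type=None):
    # Any of the usual lookup errors means the path is absent.
    try:
        v = getter(src)
    except (AttributeError, KeyError, TypeError, IndexError):
        return None
    if expected_type is None or isinstance(v, expected_type):
        return v
    return None

payload = {'body': {'components': [{'props': {'leadMedia': {'title': 'x'}}}]}}
components = try_get(payload, lambda x: x['body']['components'], list) or []
print(components)          # [{'props': {'leadMedia': {'title': 'x'}}}]
print(try_get(payload, lambda x: x['missing']['path'], dict))  # None, no exception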
youtube_dl/extractor/bilibili.py

@@ -25,13 +25,13 @@ class BiliBiliIE(InfoExtractor):
 
     _TESTS = [{
         'url': 'http://www.bilibili.tv/video/av1074402/',
-        'md5': '5f7d29e1a2872f3df0cf76b1f87d3788',
+        'md5': '9fa226fe2b8a9a4d5a69b4c6a183417e',
         'info_dict': {
             'id': '1554319',
-            'ext': 'flv',
+            'ext': 'mp4',
             'title': '【金坷垃】金泡沫',
             'description': 'md5:ce18c2a2d2193f0df2917d270f2e5923',
-            'duration': 308.067,
+            'duration': 308.315,
             'timestamp': 1398012660,
             'upload_date': '20140420',
             'thumbnail': 're:^https?://.+\.jpg',
@@ -41,73 +41,33 @@
     }, {
         'url': 'http://www.bilibili.com/video/av1041170/',
         'info_dict': {
-            'id': '1041170',
+            'id': '1507019',
             'ext': 'mp4',
             'title': '【BD1080P】刀语【诸神&异域】',
             'description': '这是个神奇的故事~每个人不留弹幕不给走哦~切利哦!~',
             'timestamp': 1396530060,
             'upload_date': '20140403',
             'uploader': '枫叶逝去',
             'uploader_id': '520116',
         },
-        'playlist_count': 9,
     }, {
         'url': 'http://www.bilibili.com/video/av4808130/',
         'info_dict': {
-            'id': '4808130',
+            'id': '7802182',
             'ext': 'mp4',
             'title': '【长篇】哆啦A梦443【钉铛】',
             'description': '(2016.05.27)来组合客人的脸吧&amp;寻母六千里锭 抱歉,又轮到周日上班现在才到家 封面www.pixiv.net/member_illust.php?mode=medium&amp;illust_id=56912929',
             'timestamp': 1464564180,
             'upload_date': '20160529',
             'uploader': '喜欢拉面',
             'uploader_id': '151066',
         },
-        'playlist': [{
-            'md5': '55cdadedf3254caaa0d5d27cf20a8f9c',
-            'info_dict': {
-                'id': '4808130_part1',
-                'ext': 'flv',
-                'title': '【长篇】哆啦A梦443【钉铛】',
-                'description': '(2016.05.27)来组合客人的脸吧&amp;寻母六千里锭 抱歉,又轮到周日上班现在才到家 封面www.pixiv.net/member_illust.php?mode=medium&amp;illust_id=56912929',
-                'timestamp': 1464564180,
-                'upload_date': '20160529',
-                'uploader': '喜欢拉面',
-                'uploader_id': '151066',
-            },
-        }, {
-            'md5': '926f9f67d0c482091872fbd8eca7ea3d',
-            'info_dict': {
-                'id': '4808130_part2',
-                'ext': 'flv',
-                'title': '【长篇】哆啦A梦443【钉铛】',
-                'description': '(2016.05.27)来组合客人的脸吧&amp;寻母六千里锭 抱歉,又轮到周日上班现在才到家 封面www.pixiv.net/member_illust.php?mode=medium&amp;illust_id=56912929',
-                'timestamp': 1464564180,
-                'upload_date': '20160529',
-                'uploader': '喜欢拉面',
-                'uploader_id': '151066',
-            },
-        }, {
-            'md5': '4b7b225b968402d7c32348c646f1fd83',
-            'info_dict': {
-                'id': '4808130_part3',
-                'ext': 'flv',
-                'title': '【长篇】哆啦A梦443【钉铛】',
-                'description': '(2016.05.27)来组合客人的脸吧&amp;寻母六千里锭 抱歉,又轮到周日上班现在才到家 封面www.pixiv.net/member_illust.php?mode=medium&amp;illust_id=56912929',
-                'timestamp': 1464564180,
-                'upload_date': '20160529',
-                'uploader': '喜欢拉面',
-                'uploader_id': '151066',
-            },
-        }, {
-            'md5': '7b795e214166501e9141139eea236e91',
-            'info_dict': {
-                'id': '4808130_part4',
-                'ext': 'flv',
-                'title': '【长篇】哆啦A梦443【钉铛】',
-                'description': '(2016.05.27)来组合客人的脸吧&amp;寻母六千里锭 抱歉,又轮到周日上班现在才到家 封面www.pixiv.net/member_illust.php?mode=medium&amp;illust_id=56912929',
-                'timestamp': 1464564180,
-                'upload_date': '20160529',
-                'uploader': '喜欢拉面',
-                'uploader_id': '151066',
-            },
-        }],
     }, {
         # Missing upload time
         'url': 'http://www.bilibili.com/video/av1867637/',
         'info_dict': {
             'id': '2880301',
-            'ext': 'flv',
+            'ext': 'mp4',
             'title': '【HDTV】【喜剧】岳父岳母真难当 (2014)【法国票房冠军】',
             'description': '一个信奉天主教的法国旧式传统资产阶级家庭中有四个女儿。三个女儿却分别找了阿拉伯、犹太、中国丈夫,老夫老妻唯独期盼剩下未嫁的小女儿能找一个信奉天主教的法国白人,结果没想到小女儿找了一位非裔黑人……【这次应该不会跳帧了】',
             'uploader': '黑夜为猫',
youtube_dl/extractor/biqle.py

@@ -24,7 +24,8 @@ class BIQLEIE(InfoExtractor):
             'ext': 'mp4',
             'title': 'Ребенок в шоке от автоматической мойки',
             'uploader': 'Dmitry Kotov',
-        }
+        },
+        'skip': ' This video was marked as adult. Embedding adult videos on external sites is prohibited.',
     }]
 
     def _real_extract(self, url):
youtube_dl/extractor/chaturbate.py

@@ -17,7 +17,8 @@ class ChaturbateIE(InfoExtractor):
         },
         'params': {
             'skip_download': True,
-        }
+        },
+        'skip': 'Room is offline',
     }, {
         'url': 'https://en.chaturbate.com/siswet19/',
         'only_matching': True,
youtube_dl/extractor/chirbit.py

@@ -1,30 +1,33 @@
 # coding: utf-8
 from __future__ import unicode_literals
 
+import base64
+
 from .common import InfoExtractor
-from ..utils import (
-    parse_duration,
-    int_or_none,
-)
+from ..utils import parse_duration
 
 
 class ChirbitIE(InfoExtractor):
     IE_NAME = 'chirbit'
     _VALID_URL = r'https?://(?:www\.)?chirb\.it/(?:(?:wp|pl)/|fb_chirbit_player\.swf\?key=)?(?P<id>[\da-zA-Z]+)'
     _TESTS = [{
-        'url': 'http://chirb.it/PrIPv5',
-        'md5': '9847b0dad6ac3e074568bf2cfb197de8',
+        'url': 'http://chirb.it/be2abG',
         'info_dict': {
-            'id': 'PrIPv5',
+            'id': 'be2abG',
             'ext': 'mp3',
-            'title': 'Фасадстрой',
-            'duration': 52,
-            'view_count': int,
-            'comment_count': int,
+            'title': 'md5:f542ea253f5255240be4da375c6a5d7e',
+            'description': 'md5:f24a4e22a71763e32da5fed59e47c770',
+            'duration': 306,
         },
+        'params': {
+            'skip_download': True,
+        }
     }, {
         'url': 'https://chirb.it/fb_chirbit_player.swf?key=PrIPv5',
         'only_matching': True,
    }, {
        'url': 'https://chirb.it/wp/MN58c2',
        'only_matching': True,
    }]

    def _real_extract(self, url):
@@ -33,27 +36,30 @@ class ChirbitIE(InfoExtractor):
         webpage = self._download_webpage(
             'http://chirb.it/%s' % audio_id, audio_id)
 
-        audio_url = self._search_regex(
-            r'"setFile"\s*,\s*"([^"]+)"', webpage, 'audio url')
+        data_fd = self._search_regex(
+            r'data-fd=(["\'])(?P<url>(?:(?!\1).)+)\1',
+            webpage, 'data fd', group='url')
+
+        # Reverse engineered from https://chirb.it/js/chirbit.player.js (look
+        # for soundURL)
+        audio_url = base64.b64decode(
+            data_fd[::-1].encode('ascii')).decode('utf-8')
 
         title = self._search_regex(
-            r'itemprop="name">([^<]+)', webpage, 'title')
-        duration = parse_duration(self._html_search_meta(
-            'duration', webpage, 'duration', fatal=False))
-        view_count = int_or_none(self._search_regex(
-            r'itemprop="playCount"\s*>(\d+)', webpage,
-            'listen count', fatal=False))
-        comment_count = int_or_none(self._search_regex(
-            r'>(\d+) Comments?:', webpage,
-            'comment count', fatal=False))
+            r'class=["\']chirbit-title["\'][^>]*>([^<]+)', webpage, 'title')
+        description = self._search_regex(
+            r'<h3>Description</h3>\s*<pre[^>]*>([^<]+)</pre>',
+            webpage, 'description', default=None)
+        duration = parse_duration(self._search_regex(
+            r'class=["\']c-length["\'][^>]*>([^<]+)',
+            webpage, 'duration', fatal=False))
 
         return {
             'id': audio_id,
             'url': audio_url,
             'title': title,
+            'description': description,
             'duration': duration,
-            'view_count': view_count,
-            'comment_count': comment_count,
         }
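The obfuscation the new chirbit code undoes is just the page's data-fd attribute reversed and then base64-decoded. The decode step can be exercised in isolation (the URL below is made up):

import base64

real_url = 'http://audio.chirbit.com/example.mp3'
# Simulate what the site embeds: base64-encode, then reverse the string
data_fd = base64.b64encode(real_url.encode('utf-8')).decode('ascii')[::-1]
# The extractor's recovery: reverse back, then base64-decode
assert base64.b64decode(data_fd[::-1].encode('ascii')).decode('utf-8') == real_url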
youtube_dl/extractor/common.py

@@ -816,11 +816,14 @@ class InfoExtractor(object):
         json_ld = self._search_regex(
             r'(?s)<script[^>]+type=(["\'])application/ld\+json\1[^>]*>(?P<json_ld>.+?)</script>',
             html, 'JSON-LD', group='json_ld', **kwargs)
+        default = kwargs.get('default', NO_DEFAULT)
         if not json_ld:
-            return {}
-        return self._json_ld(
-            json_ld, video_id, fatal=kwargs.get('fatal', True),
-            expected_type=expected_type)
+            return default if default is not NO_DEFAULT else {}
+        # JSON-LD may be malformed and thus `fatal` should be respected.
+        # At the same time `default` may be passed that assumes `fatal=False`
+        # for _search_regex. Let's simulate the same behavior here as well.
+        fatal = kwargs.get('fatal', True) if default == NO_DEFAULT else False
+        return self._json_ld(json_ld, video_id, fatal=fatal, expected_type=expected_type)
 
     def _json_ld(self, json_ld, video_id, fatal=True, expected_type=None):
         if isinstance(json_ld, compat_str):
@@ -846,7 +849,7 @@ class InfoExtractor(object):
             part_of_season = e.get('partOfSeason')
             if isinstance(part_of_season, dict) and part_of_season.get('@type') == 'TVSeason':
                 info['season_number'] = int_or_none(part_of_season.get('seasonNumber'))
-            part_of_series = e.get('partOfSeries')
+            part_of_series = e.get('partOfSeries') or e.get('partOfTVSeries')
             if isinstance(part_of_series, dict) and part_of_series.get('@type') == 'TVSeries':
                 info['series'] = unescapeHTML(part_of_series.get('name'))
         elif item_type == 'Article':
@@ -1140,7 +1143,7 @@ class InfoExtractor(object):
                 'url': m3u8_url,
                 'ext': ext,
                 'protocol': 'm3u8',
-                'preference': preference - 1 if preference else -1,
+                'preference': preference - 100 if preference else -100,
                 'resolution': 'multiple',
                 'format_note': 'Quality selection URL',
             }
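A self-contained sketch of the NO_DEFAULT sentinel pattern used above: a unique object lets a keyword argument distinguish "caller passed default=None" from "caller passed nothing at all", which is exactly what the _search_json_ld change needs.

NO_DEFAULT = object()

def search(found, **kwargs):
    default = kwargs.get('default', NO_DEFAULT)
    if not found:
        # Only fall back to {} when no default was supplied at all
        return default if default is not NO_DEFAULT else {}
    return {'title': 'parsed'}

print(search(False))                # {}   -> no default given
print(search(False, default=None))  # None -> explicit default respected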
youtube_dl/extractor/condenast.py

@@ -113,11 +113,19 @@ class CondeNastIE(InfoExtractor):
             'target': params['id'],
         })
         video_id = query['videoId']
+        video_info = None
         info_page = self._download_webpage(
             'http://player.cnevids.com/player/video.js',
-            video_id, 'Downloading video info', query=query)
-        video_info = self._parse_json(self._search_regex(
-            r'loadCallback\(({.+})\)', info_page, 'video info'), video_id)['video']
+            video_id, 'Downloading video info', query=query, fatal=False)
+        if info_page:
+            video_info = self._parse_json(self._search_regex(
+                r'loadCallback\(({.+})\)', info_page, 'video info'), video_id)['video']
+        else:
+            info_page = self._download_webpage(
+                'http://player.cnevids.com/player/loader.js',
+                video_id, 'Downloading loader info', query=query)
+            video_info = self._parse_json(self._search_regex(
+                r'var\s+video\s*=\s*({.+?});', info_page, 'video info'), video_id)
         title = video_info['title']
 
         formats = []
@@ -135,7 +143,8 @@ class CondeNastIE(InfoExtractor):
             })
         self._sort_formats(formats)
 
-        info = self._search_json_ld(webpage, video_id) if url_type != 'embed' else {}
+        info = self._search_json_ld(
+            webpage, video_id, fatal=False) if url_type != 'embed' else {}
         info.update({
             'id': video_id,
             'formats': formats,
youtube_dl/extractor/ctsnews.py

@@ -1,13 +1,12 @@
-# -*- coding: utf-8 -*-
+# coding: utf-8
 from __future__ import unicode_literals
 
 from .common import InfoExtractor
-from ..utils import parse_iso8601, ExtractorError
+from ..utils import unified_timestamp
 
 
 class CtsNewsIE(InfoExtractor):
     IE_DESC = '華視新聞'
-    # https connection failed (Connection reset)
     _VALID_URL = r'https?://news\.cts\.com\.tw/[a-z]+/[a-z]+/\d+/(?P<id>\d+)\.html'
     _TESTS = [{
         'url': 'http://news.cts.com.tw/cts/international/201501/201501291578109.html',
@@ -16,7 +15,7 @@ class CtsNewsIE(InfoExtractor):
             'id': '201501291578109',
             'ext': 'mp4',
             'title': '以色列.真主黨交火 3人死亡',
-            'description': 'md5:95e9b295c898b7ff294f09d450178d7d',
+            'description': '以色列和黎巴嫩真主黨,爆發五年最嚴重衝突,雙方砲轟交火,兩名以軍死亡,還有一名西班牙籍的聯合國維和人...',
             'timestamp': 1422528540,
             'upload_date': '20150129',
         }
@@ -28,7 +27,7 @@ class CtsNewsIE(InfoExtractor):
             'id': '201309031304098',
             'ext': 'mp4',
             'title': '韓國31歲童顏男 貌如十多歲小孩',
-            'description': 'md5:f183feeba3752b683827aab71adad584',
+            'description': '越有年紀的人,越希望看起來年輕一點,而南韓卻有一位31歲的男子,看起來像是11、12歲的小孩,身...',
             'thumbnail': 're:^https?://.*\.jpg$',
             'timestamp': 1378205880,
             'upload_date': '20130903',
@@ -36,8 +35,7 @@ class CtsNewsIE(InfoExtractor):
     }, {
         # With Youtube embedded video
         'url': 'http://news.cts.com.tw/cts/money/201501/201501291578003.html',
-        'md5': '1d842c771dc94c8c3bca5af2cc1db9c5',
-        'add_ie': ['Youtube'],
+        'md5': 'e4726b2ccd70ba2c319865e28f0a91d1',
         'info_dict': {
             'id': 'OVbfO7d0_hQ',
             'ext': 'mp4',
@@ -47,42 +45,37 @@ class CtsNewsIE(InfoExtractor):
             'upload_date': '20150128',
             'uploader_id': 'TBSCTS',
             'uploader': '中華電視公司',
-        }
+        },
+        'add_ie': ['Youtube'],
     }]
 
     def _real_extract(self, url):
         news_id = self._match_id(url)
         page = self._download_webpage(url, news_id)
 
-        if self._search_regex(r'(CTSPlayer2)', page, 'CTSPlayer2 identifier', default=None):
-            feed_url = self._html_search_regex(
-                r'(http://news\.cts\.com\.tw/action/mp4feed\.php\?news_id=\d+)',
-                page, 'feed url')
-            video_url = self._download_webpage(
-                feed_url, news_id, note='Fetching feed')
+        news_id = self._hidden_inputs(page).get('get_id')
+
+        if news_id:
+            mp4_feed = self._download_json(
+                'http://news.cts.com.tw/action/test_mp4feed.php',
+                news_id, note='Fetching feed', query={'news_id': news_id})
+            video_url = mp4_feed['source_url']
         else:
             self.to_screen('Not CTSPlayer video, trying Youtube...')
             youtube_url = self._search_regex(
-                r'src="(//www\.youtube\.com/embed/[^"]+)"', page, 'youtube url',
-                default=None)
-            if not youtube_url:
-                raise ExtractorError('The news includes no videos!', expected=True)
+                r'src="(//www\.youtube\.com/embed/[^"]+)"', page, 'youtube url')
 
-            return {
-                '_type': 'url',
-                'url': youtube_url,
-                'ie_key': 'Youtube',
-            }
+            return self.url_result(youtube_url, ie='Youtube')
 
         description = self._html_search_meta('description', page)
-        title = self._html_search_meta('title', page)
+        title = self._html_search_meta('title', page, fatal=True)
         thumbnail = self._html_search_meta('image', page)
 
         datetime_str = self._html_search_regex(
-            r'(\d{4}/\d{2}/\d{2} \d{2}:\d{2})', page, 'date and time')
-        # Transform into ISO 8601 format with timezone info
-        datetime_str = datetime_str.replace('/', '-') + ':00+0800'
-        timestamp = parse_iso8601(datetime_str, delimiter=' ')
+            r'(\d{4}/\d{2}/\d{2} \d{2}:\d{2})', page, 'date and time', fatal=False)
+        timestamp = None
+        if datetime_str:
+            timestamp = unified_timestamp(datetime_str) - 8 * 3600
 
         return {
             'id': news_id,
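Why the new code subtracts 8 * 3600: the page timestamps are Taiwan local time (UTC+8), while unified_timestamp treats the parsed value as UTC, so the extractor shifts the result back by eight hours. Assuming unified_timestamp accepts the YYYY/MM/DD HH:MM form (the input string here is back-derived from the first test's expected value):

from youtube_dl.utils import unified_timestamp

ts = unified_timestamp('2015/01/29 18:49') - 8 * 3600
print(ts)  # 1422528540, the timestamp asserted in the test above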
youtube_dl/extractor/cwtv.py

@@ -28,7 +28,8 @@ class CWTVIE(InfoExtractor):
         'params': {
             # m3u8 download
             'skip_download': True,
-        }
+        },
+        'skip': 'redirect to http://cwtv.com/shows/arrow/',
     }, {
         'url': 'http://www.cwseed.com/shows/whose-line-is-it-anyway/jeff-davis-4/?play=24282b12-ead2-42f2-95ad-26770c2c6088',
         'info_dict': {
@@ -44,10 +45,6 @@ class CWTVIE(InfoExtractor):
             'upload_date': '20151006',
             'timestamp': 1444107300,
         },
-        'params': {
-            # m3u8 download
-            'skip_download': True,
-        }
     }, {
         'url': 'http://cwtv.com/thecw/chroniclesofcisco/?play=8adebe35-f447-465f-ab52-e863506ff6d6',
         'only_matching': True,
@@ -61,11 +58,30 @@ class CWTVIE(InfoExtractor):
 
     def _real_extract(self, url):
         video_id = self._match_id(url)
-        video_data = self._download_json(
-            'http://metaframe.digitalsmiths.tv/v2/CWtv/assets/%s/partner/132?format=json' % video_id, video_id)
-
-        formats = self._extract_m3u8_formats(
-            video_data['videos']['variantplaylist']['uri'], video_id, 'mp4')
+        video_data = None
+        formats = []
+        for partner in (154, 213):
+            vdata = self._download_json(
+                'http://metaframe.digitalsmiths.tv/v2/CWtv/assets/%s/partner/%d?format=json' % (video_id, partner), video_id, fatal=False)
+            if not vdata:
+                continue
+            video_data = vdata
+            for quality, quality_data in vdata.get('videos', {}).items():
+                quality_url = quality_data.get('uri')
+                if not quality_url:
+                    continue
+                if quality == 'variantplaylist':
+                    formats.extend(self._extract_m3u8_formats(
+                        quality_url, video_id, 'mp4', m3u8_id='hls', fatal=False))
+                else:
+                    tbr = int_or_none(quality_data.get('bitrate'))
+                    format_id = 'http' + ('-%d' % tbr if tbr else '')
+                    if self._is_valid_url(quality_url, video_id, format_id):
+                        formats.append({
+                            'format_id': format_id,
+                            'url': quality_url,
+                            'tbr': tbr,
+                        })
         self._sort_formats(formats)
 
         thumbnails = [{
||||
98
youtube_dl/extractor/discoverygo.py
Normal file
98
youtube_dl/extractor/discoverygo.py
Normal file
@@ -0,0 +1,98 @@
|
||||
from __future__ import unicode_literals
|
||||
|
||||
from .common import InfoExtractor
|
||||
from ..compat import compat_str
|
||||
from ..utils import (
|
||||
extract_attributes,
|
||||
int_or_none,
|
||||
parse_age_limit,
|
||||
unescapeHTML,
|
||||
)
|
||||
|
||||
|
||||
class DiscoveryGoIE(InfoExtractor):
|
||||
_VALID_URL = r'https?://(?:www\.)?discoverygo\.com/(?:[^/]+/)*(?P<id>[^/?#&]+)'
|
||||
_TEST = {
|
||||
'url': 'https://www.discoverygo.com/love-at-first-kiss/kiss-first-ask-questions-later/',
|
||||
'info_dict': {
|
||||
'id': '57a33c536b66d1cd0345eeb1',
|
||||
'ext': 'mp4',
|
||||
'title': 'Kiss First, Ask Questions Later!',
|
||||
'description': 'md5:fe923ba34050eae468bffae10831cb22',
|
||||
'duration': 2579,
|
||||
'series': 'Love at First Kiss',
|
||||
'season_number': 1,
|
||||
'episode_number': 1,
|
||||
'age_limit': 14,
|
||||
},
|
||||
}
|
||||
|
||||
def _real_extract(self, url):
|
||||
display_id = self._match_id(url)
|
||||
|
||||
webpage = self._download_webpage(url, display_id)
|
||||
|
||||
container = extract_attributes(
|
||||
self._search_regex(
|
||||
r'(<div[^>]+class=["\']video-player-container[^>]+>)',
|
||||
webpage, 'video container'))
|
||||
|
||||
video = self._parse_json(
|
||||
unescapeHTML(container.get('data-video') or container.get('data-json')),
|
||||
display_id)
|
||||
|
||||
title = video['name']
|
||||
|
||||
stream = video['stream']
|
||||
STREAM_URL_SUFFIX = 'streamUrl'
|
||||
formats = []
|
||||
for stream_kind in ('', 'hds'):
|
||||
suffix = STREAM_URL_SUFFIX.capitalize() if stream_kind else STREAM_URL_SUFFIX
|
||||
stream_url = stream.get('%s%s' % (stream_kind, suffix))
|
||||
if not stream_url:
|
||||
continue
|
||||
if stream_kind == '':
|
||||
formats.extend(self._extract_m3u8_formats(
|
||||
stream_url, display_id, 'mp4', entry_protocol='m3u8_native',
|
||||
m3u8_id='hls', fatal=False))
|
||||
elif stream_kind == 'hds':
|
||||
formats.extend(self._extract_f4m_formats(
|
||||
stream_url, display_id, f4m_id=stream_kind, fatal=False))
|
||||
self._sort_formats(formats)
|
||||
|
||||
video_id = video.get('id') or display_id
|
||||
description = video.get('description', {}).get('detailed')
|
||||
duration = int_or_none(video.get('duration'))
|
||||
|
||||
series = video.get('show', {}).get('name')
|
||||
season_number = int_or_none(video.get('season', {}).get('number'))
|
||||
episode_number = int_or_none(video.get('episodeNumber'))
|
||||
|
||||
tags = video.get('tags')
|
||||
age_limit = parse_age_limit(video.get('parental', {}).get('rating'))
|
||||
|
||||
subtitles = {}
|
||||
captions = stream.get('captions')
|
||||
if isinstance(captions, list):
|
||||
for caption in captions:
|
||||
subtitle_url = caption.get('fileUrl')
|
||||
if (not subtitle_url or not isinstance(subtitle_url, compat_str) or
|
||||
not subtitle_url.startswith('http')):
|
||||
continue
|
||||
lang = caption.get('fileLang', 'en')
|
||||
subtitles.setdefault(lang, []).append({'url': subtitle_url})
|
||||
|
||||
return {
|
||||
'id': video_id,
|
||||
'display_id': display_id,
|
||||
'title': title,
|
||||
'description': description,
|
||||
'duration': duration,
|
||||
'series': series,
|
||||
'season_number': season_number,
|
||||
'episode_number': episode_number,
|
||||
'tags': tags,
|
||||
'age_limit': age_limit,
|
||||
'formats': formats,
|
||||
'subtitles': subtitles,
|
||||
}
|
||||
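One subtlety in the stream loop above is how it composes the two JSON keys it probes. Python's str.capitalize() lowercases everything after the first character, so the second key is 'hdsStreamurl', not 'hdsStreamUrl':

STREAM_URL_SUFFIX = 'streamUrl'
for stream_kind in ('', 'hds'):
    suffix = STREAM_URL_SUFFIX.capitalize() if stream_kind else STREAM_URL_SUFFIX
    print('%s%s' % (stream_kind, suffix))
# streamUrl
# hdsStreamurl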
youtube_dl/extractor/drtuber.py

@@ -3,7 +3,10 @@ from __future__ import unicode_literals
 import re
 
 from .common import InfoExtractor
-from ..utils import str_to_int
+from ..utils import (
+    NO_DEFAULT,
+    str_to_int,
+)
 
 
 class DrTuberIE(InfoExtractor):
@@ -17,7 +20,6 @@ class DrTuberIE(InfoExtractor):
             'ext': 'mp4',
             'title': 'hot perky blonde naked golf',
             'like_count': int,
-            'dislike_count': int,
             'comment_count': int,
             'categories': ['Babe', 'Blonde', 'Erotic', 'Outdoor', 'Softcore', 'Solo'],
             'thumbnail': 're:https?://.*\.jpg$',
@@ -36,25 +38,29 @@ class DrTuberIE(InfoExtractor):
             r'<source src="([^"]+)"', webpage, 'video URL')
 
         title = self._html_search_regex(
-            [r'<p[^>]+class="title_substrate">([^<]+)</p>', r'<title>([^<]+) - \d+'],
+            (r'class="title_watch"[^>]*><p>([^<]+)<',
+             r'<p[^>]+class="title_substrate">([^<]+)</p>',
+             r'<title>([^<]+) - \d+'),
             webpage, 'title')
 
         thumbnail = self._html_search_regex(
             r'poster="([^"]+)"',
             webpage, 'thumbnail', fatal=False)
 
-        def extract_count(id_, name):
+        def extract_count(id_, name, default=NO_DEFAULT):
             return str_to_int(self._html_search_regex(
                 r'<span[^>]+(?:class|id)="%s"[^>]*>([\d,\.]+)</span>' % id_,
-                webpage, '%s count' % name, fatal=False))
+                webpage, '%s count' % name, default=default, fatal=False))
 
         like_count = extract_count('rate_likes', 'like')
-        dislike_count = extract_count('rate_dislikes', 'dislike')
+        dislike_count = extract_count('rate_dislikes', 'dislike', default=None)
         comment_count = extract_count('comments_count', 'comment')
 
         cats_str = self._search_regex(
-            r'<div[^>]+class="categories_list">(.+?)</div>', webpage, 'categories', fatal=False)
-        categories = [] if not cats_str else re.findall(r'<a title="([^"]+)"', cats_str)
+            r'<div[^>]+class="categories_list">(.+?)</div>',
+            webpage, 'categories', fatal=False)
+        categories = [] if not cats_str else re.findall(
+            r'<a title="([^"]+)"', cats_str)
 
         return {
             'id': video_id,
youtube_dl/extractor/extractors.py

@@ -221,6 +221,7 @@ from .dvtv import DVTVIE
 from .dumpert import DumpertIE
 from .defense import DefenseGouvFrIE
 from .discovery import DiscoveryIE
+from .discoverygo import DiscoveryGoIE
 from .dispeak import DigitallySpeakingIE
 from .dropbox import DropboxIE
 from .dw import (
@@ -310,7 +311,6 @@ from .globo import (
 )
 from .godtube import GodTubeIE
 from .godtv import GodTVIE
-from .goldenmoustache import GoldenMoustacheIE
 from .golem import GolemIE
 from .googledrive import GoogleDriveIE
 from .googleplus import GooglePlusIE
@@ -636,6 +636,7 @@ from .pluralsight import (
     PluralsightCourseIE,
 )
 from .podomatic import PodomaticIE
+from .pokemon import PokemonIE
 from .polskieradio import PolskieRadioIE
 from .porn91 import Porn91IE
 from .pornhd import PornHdIE
@@ -694,6 +695,7 @@ from .rockstargames import RockstarGamesIE
 from .roosterteeth import RoosterTeethIE
 from .rottentomatoes import RottenTomatoesIE
 from .roxwel import RoxwelIE
+from .rozhlas import RozhlasIE
 from .rtbf import RTBFIE
 from .rte import RteIE, RteRadioIE
 from .rtlnl import RtlNlIE
@@ -753,6 +755,7 @@ from .smotri import (
 )
 from .snotr import SnotrIE
 from .sohu import SohuIE
+from .sonyliv import SonyLIVIE
 from .soundcloud import (
     SoundcloudIE,
     SoundcloudSetIE,
@@ -925,6 +928,7 @@ from .udemy import (
 from .udn import UDNEmbedIE
 from .digiteka import DigitekaIE
 from .unistra import UnistraIE
+from .uol import UOLIE
 from .urort import UrortIE
 from .urplay import URPlayIE
 from .usatoday import USATodayIE
youtube_dl/extractor/flipagram.py

@@ -48,7 +48,7 @@ class FlipagramIE(InfoExtractor):
         flipagram = video_data['flipagram']
         video = flipagram['video']
 
-        json_ld = self._search_json_ld(webpage, video_id, default=False)
+        json_ld = self._search_json_ld(webpage, video_id, default={})
         title = json_ld.get('title') or flipagram['captionText']
         description = json_ld.get('description') or flipagram.get('captionText')
 
youtube_dl/extractor/formula1.py

@@ -5,8 +5,8 @@ from .common import InfoExtractor
 
 
 class Formula1IE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?formula1\.com/content/fom-website/en/video/\d{4}/\d{1,2}/(?P<id>.+?)\.html'
-    _TEST = {
+    _VALID_URL = r'https?://(?:www\.)?formula1\.com/(?:content/fom-website/)?en/video/\d{4}/\d{1,2}/(?P<id>.+?)\.html'
+    _TESTS = [{
         'url': 'http://www.formula1.com/content/fom-website/en/video/2016/5/Race_highlights_-_Spain_2016.html',
         'md5': '8c79e54be72078b26b89e0e111c0502b',
         'info_dict': {
@@ -15,7 +15,10 @@ class Formula1IE(InfoExtractor):
             'title': 'Race highlights - Spain 2016',
         },
         'add_ie': ['Ooyala'],
-    }
+    }, {
+        'url': 'http://www.formula1.com/en/video/2016/5/Race_highlights_-_Spain_2016.html',
+        'only_matching': True,
+    }]
 
     def _real_extract(self, url):
         display_id = self._match_id(url)
@@ -2,7 +2,10 @@
 from __future__ import unicode_literals

 from .common import InfoExtractor
-from ..utils import smuggle_url
+from ..utils import (
+    smuggle_url,
+    update_url_query,
+)


 class FOXIE(InfoExtractor):
@@ -29,11 +32,12 @@ class FOXIE(InfoExtractor):

         release_url = self._parse_json(self._search_regex(
             r'"fox_pdk_player"\s*:\s*({[^}]+?})', webpage, 'fox_pdk_player'),
-            video_id)['release_url'] + '&switch=http'
+            video_id)['release_url']

         return {
             '_type': 'url_transparent',
             'ie_key': 'ThePlatform',
-            'url': smuggle_url(release_url, {'force_smil_url': True}),
+            'url': smuggle_url(update_url_query(
+                release_url, {'switch': 'http'}), {'force_smil_url': True}),
             'id': video_id,
         }
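Replacing the `'&switch=http'` string concatenation with update_url_query avoids building a malformed query when the release URL has no query string yet, or already carries one. A minimal sketch with a hypothetical ThePlatform release URL:

    from youtube_dl.utils import update_url_query

    release_url = 'http://link.theplatform.com/s/fox/p8yKiiHfqsWc?mbr=true'  # hypothetical
    print(update_url_query(release_url, {'switch': 'http'}))
    # -> http://link.theplatform.com/s/fox/p8yKiiHfqsWc?mbr=true&switch=http
    # (parameter order may vary, since the query is parsed and re-encoded)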
@@ -131,7 +131,7 @@ class PluzzIE(FranceTVBaseInfoExtractor):

 class FranceTvInfoIE(FranceTVBaseInfoExtractor):
     IE_NAME = 'francetvinfo.fr'
-    _VALID_URL = r'https?://(?:www|mobile|france3-regions)\.francetvinfo\.fr/.*/(?P<title>.+)\.html'
+    _VALID_URL = r'https?://(?:www|mobile|france3-regions)\.francetvinfo\.fr/(?:[^/]+/)*(?P<title>[^/?#&.]+)'

     _TESTS = [{
         'url': 'http://www.francetvinfo.fr/replay-jt/france-3/soir-3/jt-grand-soir-3-lundi-26-aout-2013_393427.html',
@@ -206,6 +206,9 @@ class FranceTvInfoIE(FranceTVBaseInfoExtractor):
             'uploader_id': 'x2q2ez',
         },
         'add_ie': ['Dailymotion'],
+    }, {
+        'url': 'http://france3-regions.francetvinfo.fr/limousin/emissions/jt-1213-limousin',
+        'only_matching': True,
     }]

     def _real_extract(self, url):
@@ -2241,8 +2241,8 @@ class GenericIE(InfoExtractor):

         # Looking for http://schema.org/VideoObject
         json_ld = self._search_json_ld(
-            webpage, video_id, default=None, expected_type='VideoObject')
-        if json_ld and json_ld.get('url'):
+            webpage, video_id, default={}, expected_type='VideoObject')
+        if json_ld.get('url'):
             info_dict.update({
                 'title': video_title or info_dict['title'],
                 'description': video_description,
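The default={} change is what allows dropping the `json_ld and` truthiness guard: .get() on an empty dict is always safe. In miniature:

    json_ld = {}                # what _search_json_ld now returns on a miss
    if json_ld.get('url'):      # no separate `json_ld and ...` check needed
        print(json_ld['url'])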
@@ -1,48 +0,0 @@
-from __future__ import unicode_literals
-
-from .common import InfoExtractor
-
-
-class GoldenMoustacheIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?goldenmoustache\.com/(?P<display_id>[\w-]+)-(?P<id>\d+)'
-    _TESTS = [{
-        'url': 'http://www.goldenmoustache.com/suricate-le-poker-3700/',
-        'md5': '0f904432fa07da5054d6c8beb5efb51a',
-        'info_dict': {
-            'id': '3700',
-            'ext': 'mp4',
-            'title': 'Suricate - Le Poker',
-            'description': 'md5:3d1f242f44f8c8cb0a106f1fd08e5dc9',
-            'thumbnail': 're:^https?://.*\.jpg$',
-        }
-    }, {
-        'url': 'http://www.goldenmoustache.com/le-lab-tout-effacer-mc-fly-et-carlito-55249/',
-        'md5': '27f0c50fb4dd5f01dc9082fc67cd5700',
-        'info_dict': {
-            'id': '55249',
-            'ext': 'mp4',
-            'title': 'Le LAB - Tout Effacer (Mc Fly et Carlito)',
-            'description': 'md5:9b7fbf11023fb2250bd4b185e3de3b2a',
-            'thumbnail': 're:^https?://.*\.(?:png|jpg)$',
-        }
-    }]
-
-    def _real_extract(self, url):
-        video_id = self._match_id(url)
-        webpage = self._download_webpage(url, video_id)
-
-        video_url = self._html_search_regex(
-            r'data-src-type="mp4" data-src="([^"]+)"', webpage, 'video URL')
-        title = self._html_search_regex(
-            r'<title>(.*?)(?: - Golden Moustache)?</title>', webpage, 'title')
-        thumbnail = self._og_search_thumbnail(webpage)
-        description = self._og_search_description(webpage)
-
-        return {
-            'id': video_id,
-            'url': video_url,
-            'ext': 'mp4',
-            'title': title,
-            'description': description,
-            'thumbnail': thumbnail,
-        }
@@ -4,6 +4,7 @@ from __future__ import unicode_literals
 import re

 from .common import InfoExtractor
+from ..compat import compat_urlparse
 from ..utils import (
     get_element_by_id,
     clean_html,
@@ -242,8 +243,9 @@ class KuwoSingerIE(InfoExtractor):
             query={'artistId': artist_id, 'pn': page_num, 'rn': self.PAGE_SIZE})

         return [
-            self.url_result(song_url, 'Kuwo') for song_url in re.findall(
-                r'<div[^>]+class="name"><a[^>]+href="(http://www\.kuwo\.cn/yinyue/\d+)',
+            self.url_result(compat_urlparse.urljoin(url, song_url), 'Kuwo')
+            for song_url in re.findall(
+                r'<div[^>]+class="name"><a[^>]+href="(/yinyue/\d+)',
                 webpage)
         ]

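Since the tightened regex now captures site-relative hrefs ('/yinyue/...'), compat_urlparse.urljoin is what turns them back into absolute links. The behaviour in isolation, with a hypothetical singer-page URL:

    try:
        from urllib.parse import urljoin  # wrapped by youtube-dl as compat_urlparse.urljoin
    except ImportError:
        from urlparse import urljoin

    page_url = 'http://www.kuwo.cn/mingxing/someone/'  # hypothetical
    print(urljoin(page_url, '/yinyue/6446136'))
    # -> http://www.kuwo.cn/yinyue/6446136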
@@ -1,15 +1,14 @@
 # coding: utf-8
-from __future__ import unicode_literals
+from __future__ import unicode_literals, division

-import re
+import math

 from .common import InfoExtractor
 from ..compat import compat_chr
 from ..utils import (
+    decode_png,
     determine_ext,
-    encode_base_n,
     ExtractorError,
-    mimetype2ext,
 )


@@ -41,60 +40,6 @@ class OpenloadIE(InfoExtractor):
         'only_matching': True,
     }]

-    @staticmethod
-    def openload_level2_debase(m):
-        radix, num = int(m.group(1)) + 27, int(m.group(2))
-        return '"' + encode_base_n(num, radix) + '"'
-
-    @classmethod
-    def openload_level2(cls, txt):
-        # The function name is ǃ \u01c3
-        # Using escaped unicode literals does not work in Python 3.2
-        return re.sub(r'ǃ\((\d+),(\d+)\)', cls.openload_level2_debase, txt, re.UNICODE).replace('"+"', '')
-
-    # Openload uses a variant of aadecode
-    # openload_decode and related functions are originally written by
-    # vitas@matfyz.cz and released with public domain
-    # See https://github.com/rg3/youtube-dl/issues/8489
-    @classmethod
-    def openload_decode(cls, txt):
-        symbol_table = [
-            ('_', '(゚Д゚) [゚Θ゚]'),
-            ('a', '(゚Д゚) [゚ω゚ノ]'),
-            ('b', '(゚Д゚) [゚Θ゚ノ]'),
-            ('c', '(゚Д゚) [\'c\']'),
-            ('d', '(゚Д゚) [゚ー゚ノ]'),
-            ('e', '(゚Д゚) [゚Д゚ノ]'),
-            ('f', '(゚Д゚) [1]'),
-
-            ('o', '(゚Д゚) [\'o\']'),
-            ('u', '(o゚ー゚o)'),
-            ('c', '(゚Д゚) [\'c\']'),
-
-            ('7', '((゚ー゚) + (o^_^o))'),
-            ('6', '((o^_^o) +(o^_^o) +(c^_^o))'),
-            ('5', '((゚ー゚) + (゚Θ゚))'),
-            ('4', '(-~3)'),
-            ('3', '(-~-~1)'),
-            ('2', '(-~1)'),
-            ('1', '(-~0)'),
-            ('0', '((c^_^o)-(c^_^o))'),
-        ]
-        delim = '(゚Д゚)[゚ε゚]+'
-        ret = ''
-        for aachar in txt.split(delim):
-            for val, pat in symbol_table:
-                aachar = aachar.replace(pat, val)
-            aachar = aachar.replace('+ ', '')
-            m = re.match(r'^\d+', aachar)
-            if m:
-                ret += compat_chr(int(m.group(0), 8))
-            else:
-                m = re.match(r'^u([\da-f]+)', aachar)
-                if m:
-                    ret += compat_chr(int(m.group(1), 16))
-        return cls.openload_level2(ret)
-
     def _real_extract(self, url):
         video_id = self._match_id(url)
         webpage = self._download_webpage(url, video_id)
@@ -102,29 +47,77 @@ class OpenloadIE(InfoExtractor):
         if 'File not found' in webpage:
             raise ExtractorError('File not found', expected=True)

-        code = self._search_regex(
-            r'</video>\s*</div>\s*<script[^>]+>[^>]+</script>\s*<script[^>]+>([^<]+)</script>',
-            webpage, 'JS code')
+        # The following extraction logic is proposed by @Belderak and @gdkchan
+        # and declared to be used freely in youtube-dl
+        # See https://github.com/rg3/youtube-dl/issues/9706

-        decoded = self.openload_decode(code)
+        numbers_js = self._download_webpage(
+            'https://openload.co/assets/js/obfuscator/n.js', video_id,
+            note='Downloading signature numbers')
+        signums = self._search_regex(
+            r'window\.signatureNumbers\s*=\s*[\'"](?P<data>[a-z]+)[\'"]',
+            numbers_js, 'signature numbers', group='data')

-        video_url = self._search_regex(
-            r'return\s+"(https?://[^"]+)"', decoded, 'video URL')
+        linkimg_uri = self._search_regex(
+            r'<img[^>]+id="linkimg"[^>]+src="([^"]+)"', webpage, 'link image')
+        linkimg = self._request_webpage(
+            linkimg_uri, video_id, note=False).read()
+
+        width, height, pixels = decode_png(linkimg)
+
+        output = ''
+        for y in range(height):
+            for x in range(width):
+                r, g, b = pixels[y][3 * x:3 * x + 3]
+                if r == 0 and g == 0 and b == 0:
+                    break
+                else:
+                    output += compat_chr(r)
+                    output += compat_chr(g)
+                    output += compat_chr(b)
+
+        img_str_length = len(output) // 200
+        img_str = [[0 for x in range(img_str_length)] for y in range(10)]
+
+        sig_str_length = len(signums) // 260
+        sig_str = [[0 for x in range(sig_str_length)] for y in range(10)]
+
+        for i in range(10):
+            for j in range(img_str_length):
+                begin = i * img_str_length * 20 + j * 20
+                img_str[i][j] = output[begin:begin + 20]
+            for j in range(sig_str_length):
+                begin = i * sig_str_length * 26 + j * 26
+                sig_str[i][j] = signums[begin:begin + 26]
+
+        parts = []
+        # TODO: find better names for str_, chr_ and sum_
+        str_ = ''
+        for i in [2, 3, 5, 7]:
+            str_ = ''
+            sum_ = float(99)
+            for j in range(len(sig_str[i])):
+                for chr_idx in range(len(img_str[i][j])):
+                    if sum_ > float(122):
+                        sum_ = float(98)
+                    chr_ = compat_chr(int(math.floor(sum_)))
+                    if sig_str[i][j][chr_idx] == chr_ and j >= len(str_):
+                        sum_ += float(2.5)
+                        str_ += img_str[i][j][chr_idx]
+            parts.append(str_.replace(',', ''))
+
+        video_url = 'https://openload.co/stream/%s~%s~%s~%s' % (parts[3], parts[1], parts[2], parts[0])

         title = self._og_search_title(webpage, default=None) or self._search_regex(
             r'<span[^>]+class=["\']title["\'][^>]*>([^<]+)', webpage,
             'title', default=None) or self._html_search_meta(
             'description', webpage, 'title', fatal=True)

-        ext = mimetype2ext(self._search_regex(
-            r'window\.vt\s*=\s*(["\'])(?P<mimetype>.+?)\1', decoded,
-            'mimetype', default=None, group='mimetype')) or determine_ext(
-            video_url, 'mp4')
-
         return {
             'id': video_id,
             'title': title,
-            'ext': ext,
             'thumbnail': self._og_search_thumbnail(webpage, default=None),
             'url': video_url,
+            # Seems all videos have extensions in their titles
+            'ext': determine_ext(title),
         }
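The new Openload flow hides the stream URL in an image: decode_png returns raw RGB rows, every non-black pixel contributes three character codes, and the resulting 200-character rows are then matched cell by cell against the 26-character signature strings. A minimal sketch of just the pixel-to-text step (not the full matching):

    # Each pixel's R, G, B bytes are character codes; a black pixel ends the row.
    def pixels_to_text(pixels):
        out = ''
        for row in pixels:
            for x in range(len(row) // 3):
                r, g, b = row[3 * x:3 * x + 3]
                if r == g == b == 0:
                    break
                out += chr(r) + chr(g) + chr(b)
        return out

    print(pixels_to_text([[104, 116, 116, 112, 58, 47, 0, 0, 0]]))  # -> 'http:/'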
youtube_dl/extractor/pokemon.py (new file, 58 lines)
@@ -0,0 +1,58 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+import re
+
+from .common import InfoExtractor
+from ..utils import (
+    extract_attributes,
+    int_or_none,
+)
+
+
+class PokemonIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?pokemon\.com/[a-z]{2}(?:.*?play=(?P<id>[a-z0-9]{32})|/[^/]+/\d+_\d+-(?P<display_id>[^/?#]+))'
+    _TESTS = [{
+        'url': 'http://www.pokemon.com/us/pokemon-episodes/19_01-from-a-to-z/?play=true',
+        'md5': '9fb209ae3a569aac25de0f5afc4ee08f',
+        'info_dict': {
+            'id': 'd0436c00c3ce4071ac6cee8130ac54a1',
+            'ext': 'mp4',
+            'title': 'From A to Z!',
+            'description': 'Bonnie makes a new friend, Ash runs into an old friend, and a terrifying premonition begins to unfold!',
+            'timestamp': 1460478136,
+            'upload_date': '20160412',
+        },
+        'add_ie': ['LimelightMedia']
+    }, {
+        'url': 'http://www.pokemon.com/uk/pokemon-episodes/?play=2e8b5c761f1d4a9286165d7748c1ece2',
+        'only_matching': True,
+    }, {
+        'url': 'http://www.pokemon.com/fr/episodes-pokemon/18_09-un-hiver-inattendu/',
+        'only_matching': True,
+    }, {
+        'url': 'http://www.pokemon.com/de/pokemon-folgen/01_20-bye-bye-smettbo/',
+        'only_matching': True,
+    }]
+
+    def _real_extract(self, url):
+        video_id, display_id = re.match(self._VALID_URL, url).groups()
+        webpage = self._download_webpage(url, video_id or display_id)
+        video_data = extract_attributes(self._search_regex(
+            r'(<[^>]+data-video-id="%s"[^>]*>)' % (video_id if video_id else '[a-z0-9]{32}'),
+            webpage, 'video data element'))
+        video_id = video_data['data-video-id']
+        title = video_data['data-video-title']
+        return {
+            '_type': 'url_transparent',
+            'id': video_id,
+            'url': 'limelight:media:%s' % video_id,
+            'title': title,
+            'description': video_data.get('data-video-summary'),
+            'thumbnail': video_data.get('data-video-poster'),
+            'series': 'Pokémon',
+            'season_number': int_or_none(video_data.get('data-video-season')),
+            'episode': title,
+            'episode_number': int_or_none(video_data.get('data-video-episode')),
+            'ie_key': 'LimelightMedia',
+        }
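The _VALID_URL above accepts two URL shapes: a 32-hex play= id, or a display slug whose real id is then read from the page's data-video-id attribute. Checking both branches of the pattern:

    import re

    _VALID_URL = r'https?://(?:www\.)?pokemon\.com/[a-z]{2}(?:.*?play=(?P<id>[a-z0-9]{32})|/[^/]+/\d+_\d+-(?P<display_id>[^/?#]+))'
    m = re.match(_VALID_URL, 'http://www.pokemon.com/uk/pokemon-episodes/?play=2e8b5c761f1d4a9286165d7748c1ece2')
    print(m.group('id', 'display_id'))  # ('2e8b5c761f1d4a9286165d7748c1ece2', None)
    m = re.match(_VALID_URL, 'http://www.pokemon.com/fr/episodes-pokemon/18_09-un-hiver-inattendu/')
    print(m.group('id', 'display_id'))  # (None, '18_09-un-hiver-inattendu')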
@@ -1,55 +1,71 @@
 # encoding: utf-8
 from __future__ import unicode_literals

-import json
 import re

 from .common import InfoExtractor
+from ..compat import compat_str
 from ..utils import (
-    ExtractorError,
+    clean_html,
+    int_or_none,
+    unified_timestamp,
+    update_url_query,
 )


 class RBMARadioIE(InfoExtractor):
-    _VALID_URL = r'https?://(?:www\.)?rbmaradio\.com/shows/(?P<videoID>[^/]+)$'
+    _VALID_URL = r'https?://(?:www\.)?rbmaradio\.com/shows/(?P<show_id>[^/]+)/episodes/(?P<id>[^/?#&]+)'
     _TEST = {
-        'url': 'http://www.rbmaradio.com/shows/ford-lopatin-live-at-primavera-sound-2011',
+        'url': 'https://www.rbmaradio.com/shows/main-stage/episodes/ford-lopatin-live-at-primavera-sound-2011',
         'md5': '6bc6f9bcb18994b4c983bc3bf4384d95',
         'info_dict': {
             'id': 'ford-lopatin-live-at-primavera-sound-2011',
             'ext': 'mp3',
-            'uploader_id': 'ford-lopatin',
-            'location': 'Spain',
-            'description': 'Joel Ford and Daniel ’Oneohtrix Point Never’ Lopatin fly their midified pop extravaganza to Spain. Live at Primavera Sound 2011.',
-            'uploader': 'Ford & Lopatin',
-            'title': 'Live at Primavera Sound 2011',
+            'title': 'Main Stage - Ford & Lopatin',
+            'description': 'md5:4f340fb48426423530af5a9d87bd7b91',
+            'thumbnail': 're:^https?://.*\.jpg',
+            'duration': 2452,
+            'timestamp': 1307103164,
+            'upload_date': '20110603',
         },
     }

     def _real_extract(self, url):
-        m = re.match(self._VALID_URL, url)
-        video_id = m.group('videoID')
+        mobj = re.match(self._VALID_URL, url)
+        show_id = mobj.group('show_id')
+        episode_id = mobj.group('id')

-        webpage = self._download_webpage(url, video_id)
+        webpage = self._download_webpage(url, episode_id)

-        json_data = self._search_regex(r'window\.gon.*?gon\.show=(.+?);$',
-            webpage, 'json data', flags=re.MULTILINE)
+        episode = self._parse_json(
+            self._search_regex(
+                r'__INITIAL_STATE__\s*=\s*({.+?})\s*</script>',
+                webpage, 'json data'),
+            episode_id)['episodes'][show_id][episode_id]

-        try:
-            data = json.loads(json_data)
-        except ValueError as e:
-            raise ExtractorError('Invalid JSON: ' + str(e))
+        title = episode['title']

-        video_url = data['akamai_url'] + '&cbr=256'
+        show_title = episode.get('showTitle')
+        if show_title:
+            title = '%s - %s' % (show_title, title)
+
+        formats = [{
+            'url': update_url_query(episode['audioURL'], query={'cbr': abr}),
+            'format_id': compat_str(abr),
+            'abr': abr,
+            'vcodec': 'none',
+        } for abr in (96, 128, 256)]
+
+        description = clean_html(episode.get('longTeaser'))
+        thumbnail = self._proto_relative_url(episode.get('imageURL', {}).get('landscape'))
+        duration = int_or_none(episode.get('duration'))
+        timestamp = unified_timestamp(episode.get('publishedAt'))

         return {
-            'id': video_id,
-            'url': video_url,
-            'title': data['title'],
-            'description': data.get('teaser_text'),
-            'location': data.get('country_of_origin'),
-            'uploader': data.get('host', {}).get('name'),
-            'uploader_id': data.get('host', {}).get('slug'),
-            'thumbnail': data.get('image', {}).get('large_url_2x'),
-            'duration': data.get('duration'),
+            'id': episode_id,
+            'title': title,
+            'description': description,
+            'thumbnail': thumbnail,
+            'duration': duration,
+            'timestamp': timestamp,
+            'formats': formats,
         }
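The list comprehension above fabricates one audio-only format per constant bitrate rather than probing the server; with a hypothetical audioURL it expands to:

    from youtube_dl.utils import update_url_query

    audio_url = 'https://example.com/stream.mp3'  # hypothetical episode['audioURL']
    formats = [{
        'url': update_url_query(audio_url, {'cbr': abr}),
        'format_id': str(abr),
        'abr': abr,
        'vcodec': 'none',
    } for abr in (96, 128, 256)]
    # format_ids '96', '128', '256'; URLs gain ?cbr=96, ?cbr=128, ?cbr=256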
youtube_dl/extractor/rozhlas.py (new file, 50 lines)
@@ -0,0 +1,50 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..utils import (
+    int_or_none,
+    remove_start,
+)
+
+
+class RozhlasIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?prehravac\.rozhlas\.cz/audio/(?P<id>[0-9]+)'
+    _TESTS = [{
+        'url': 'http://prehravac.rozhlas.cz/audio/3421320',
+        'md5': '504c902dbc9e9a1fd50326eccf02a7e2',
+        'info_dict': {
+            'id': '3421320',
+            'ext': 'mp3',
+            'title': 'Echo Pavla Klusáka (30.06.2015 21:00)',
+            'description': 'Osmdesátiny Terryho Rileyho jsou skvělou příležitostí proletět se elektronickými i akustickými díly zakladatatele minimalismu, který je aktivní už přes padesát let'
+        }
+    }, {
+        'url': 'http://prehravac.rozhlas.cz/audio/3421320/embed',
+        'skip_download': True,
+    }]
+
+    def _real_extract(self, url):
+        audio_id = self._match_id(url)
+
+        webpage = self._download_webpage(
+            'http://prehravac.rozhlas.cz/audio/%s' % audio_id, audio_id)
+
+        title = self._html_search_regex(
+            r'<h3>(.+?)</h3>\s*<p[^>]*>.*?</p>\s*<div[^>]+id=["\']player-track',
+            webpage, 'title', default=None) or remove_start(
+            self._og_search_title(webpage), 'Radio Wave - ')
+        description = self._html_search_regex(
+            r'<p[^>]+title=(["\'])(?P<url>(?:(?!\1).)+)\1[^>]*>.*?</p>\s*<div[^>]+id=["\']player-track',
+            webpage, 'description', fatal=False, group='url')
+        duration = int_or_none(self._search_regex(
+            r'data-duration=["\'](\d+)', webpage, 'duration', default=None))
+
+        return {
+            'id': audio_id,
+            'url': 'http://media.rozhlas.cz/_audio/%s.mp3' % audio_id,
+            'title': title,
+            'description': description,
+            'duration': duration,
+            'vcodec': 'none',
+        }
@@ -14,7 +14,7 @@ class RtlNlIE(InfoExtractor):
     _VALID_URL = r'''(?x)
         https?://(?:www\.)?
         (?:
-            rtlxl\.nl/\#!/[^/]+/|
+            rtlxl\.nl/[^\#]*\#!/[^/]+/|
             rtl\.nl/system/videoplayer/(?:[^/]+/)+(?:video_)?embed\.html\b.+?\buuid=
         )
         (?P<id>[0-9a-f-]+)'''
@@ -67,6 +67,9 @@ class RtlNlIE(InfoExtractor):
     }, {
         'url': 'http://www.rtl.nl/system/videoplayer/derden/embed.html#!/uuid=bb0353b0-d6a4-1dad-90e9-18fe75b8d1f0',
         'only_matching': True,
+    }, {
+        'url': 'http://rtlxl.nl/?_ga=1.204735956.572365465.1466978370#!/rtl-nieuws-132237/3c487912-023b-49ac-903e-2c5d79f8410f',
+        'only_matching': True,
     }]

     def _real_extract(self, url):
@@ -14,10 +14,10 @@ from ..utils import ExtractorError
 class SohuIE(InfoExtractor):
     _VALID_URL = r'https?://(?P<mytv>my\.)?tv\.sohu\.com/.+?/(?(mytv)|n)(?P<id>\d+)\.shtml.*?'

+    # Sohu videos give different MD5 sums on Travis CI and my machine
     _TESTS = [{
         'note': 'This video is available only in Mainland China',
         'url': 'http://tv.sohu.com/20130724/n382479172.shtml#super',
-        'md5': '29175c8cadd8b5cc4055001e85d6b372',
         'info_dict': {
             'id': '382479172',
             'ext': 'mp4',
@@ -26,7 +26,6 @@ class SohuIE(InfoExtractor):
         },
         'skip': 'Only available in China',
     }, {
         'url': 'http://tv.sohu.com/20150305/n409385080.shtml',
-        'md5': '699060e75cf58858dd47fb9c03c42cfb',
         'info_dict': {
             'id': '409385080',
             'ext': 'mp4',
@@ -34,7 +33,6 @@ class SohuIE(InfoExtractor):
         }
     }, {
         'url': 'http://my.tv.sohu.com/us/232799889/78693464.shtml',
-        'md5': '9bf34be48f2f4dadcb226c74127e203c',
         'info_dict': {
             'id': '78693464',
             'ext': 'mp4',
@@ -48,7 +46,6 @@ class SohuIE(InfoExtractor):
             'title': '【神探苍实战秘籍】第13期 战争之影 赫卡里姆',
         },
         'playlist': [{
-            'md5': 'bdbfb8f39924725e6589c146bc1883ad',
             'info_dict': {
                 'id': '78910339_part1',
                 'ext': 'mp4',
@@ -56,7 +53,6 @@ class SohuIE(InfoExtractor):
                 'title': '【神探苍实战秘籍】第13期 战争之影 赫卡里姆',
             }
         }, {
-            'md5': '3e1f46aaeb95354fd10e7fca9fc1804e',
             'info_dict': {
                 'id': '78910339_part2',
                 'ext': 'mp4',
@@ -64,7 +60,6 @@ class SohuIE(InfoExtractor):
                 'title': '【神探苍实战秘籍】第13期 战争之影 赫卡里姆',
             }
         }, {
-            'md5': '8407e634175fdac706766481b9443450',
             'info_dict': {
                 'id': '78910339_part3',
                 'ext': 'mp4',
youtube_dl/extractor/sonyliv.py (new file, 34 lines)
@@ -0,0 +1,34 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+
+
+class SonyLIVIE(InfoExtractor):
+    _VALID_URL = r'https?://(?:www\.)?sonyliv\.com/details/[^/]+/(?P<id>\d+)'
+    _TESTS = [{
+        'url': "http://www.sonyliv.com/details/episodes/5024612095001/Ep.-1---Achaari-Cheese-Toast---Bachelor's-Delight",
+        'info_dict': {
+            'title': "Ep. 1 - Achaari Cheese Toast - Bachelor's Delight",
+            'id': '5024612095001',
+            'ext': 'mp4',
+            'upload_date': '20160707',
+            'description': 'md5:7f28509a148d5be9d0782b4d5106410d',
+            'uploader_id': '4338955589001',
+            'timestamp': 1467870968,
+        },
+        'params': {
+            'skip_download': True,
+        },
+        'add_ie': ['BrightcoveNew'],
+    }, {
+        'url': 'http://www.sonyliv.com/details/full%20movie/4951168986001/Sei-Raat-(Bangla)',
+        'only_matching': True,
+    }]
+
+    BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/4338955589001/default_default/index.html?videoId=%s'
+
+    def _real_extract(self, url):
+        brightcove_id = self._match_id(url)
+        return self.url_result(
+            self.BRIGHTCOVE_URL_TEMPLATE % brightcove_id, 'BrightcoveNew', brightcove_id)
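The extractor is a pure redirect: the numeric path segment is a Brightcove video id dropped into a fixed player URL. For the first test above, the hand-off URL becomes:

    BRIGHTCOVE_URL_TEMPLATE = 'http://players.brightcove.net/4338955589001/default_default/index.html?videoId=%s'
    print(BRIGHTCOVE_URL_TEMPLATE % '5024612095001')
    # -> http://players.brightcove.net/4338955589001/default_default/index.html?videoId=5024612095001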
@@ -118,8 +118,12 @@ class TNAFlixNetworkBaseIE(InfoExtractor):
             xpath_text(cfg_xml, './startThumb', 'thumbnail'), 'http:')
         thumbnails = self._extract_thumbnails(cfg_xml)

-        title = self._html_search_regex(
-            self._TITLE_REGEX, webpage, 'title') if self._TITLE_REGEX else self._og_search_title(webpage)
+        title = None
+        if self._TITLE_REGEX:
+            title = self._html_search_regex(
+                self._TITLE_REGEX, webpage, 'title', default=None)
+        if not title:
+            title = self._og_search_title(webpage)

         age_limit = self._rta_search(webpage) or 18

@@ -189,9 +193,9 @@ class TNAFlixNetworkEmbedIE(TNAFlixNetworkBaseIE):
 class TNAFlixIE(TNAFlixNetworkBaseIE):
     _VALID_URL = r'https?://(?:www\.)?tnaflix\.com/[^/]+/(?P<display_id>[^/]+)/video(?P<id>\d+)'

-    _TITLE_REGEX = r'<title>(.+?) - TNAFlix Porn Videos</title>'
-    _DESCRIPTION_REGEX = r'<meta[^>]+name="description"[^>]+content="([^"]+)"'
-    _UPLOADER_REGEX = r'<i>\s*Verified Member\s*</i>\s*<h1>(.+?)</h1>'
+    _TITLE_REGEX = r'<title>(.+?) - (?:TNAFlix Porn Videos|TNAFlix\.com)</title>'
+    _DESCRIPTION_REGEX = r'(?s)>Description:</[^>]+>(.+?)<'
+    _UPLOADER_REGEX = r'<i>\s*Verified Member\s*</i>\s*<h\d+>(.+?)<'
+    _CATEGORIES_REGEX = r'(?s)<span[^>]*>Categories:</span>(.+?)</div>'

     _TESTS = [{
youtube_dl/extractor/uol.py (new file, 128 lines)
@@ -0,0 +1,128 @@
+# coding: utf-8
+from __future__ import unicode_literals
+
+from .common import InfoExtractor
+from ..utils import (
+    clean_html,
+    int_or_none,
+    parse_duration,
+    update_url_query,
+    str_or_none,
+)
+
+
+class UOLIE(InfoExtractor):
+    IE_NAME = 'uol.com.br'
+    _VALID_URL = r'https?://(?:.+?\.)?uol\.com\.br/.*?(?:(?:mediaId|v)=|view/(?:[a-z0-9]+/)?|video(?:=|/(?:\d{4}/\d{2}/\d{2}/)?))(?P<id>\d+|[\w-]+-[A-Z0-9]+)'
+    _TESTS = [{
+        'url': 'http://player.mais.uol.com.br/player_video_v3.swf?mediaId=15951931',
+        'md5': '25291da27dc45e0afb5718a8603d3816',
+        'info_dict': {
+            'id': '15951931',
+            'ext': 'mp4',
+            'title': 'Miss simpatia é encontrada morta',
+            'description': 'md5:3f8c11a0c0556d66daf7e5b45ef823b2',
+        }
+    }, {
+        'url': 'http://tvuol.uol.com.br/video/incendio-destroi-uma-das-maiores-casas-noturnas-de-londres-04024E9A3268D4C95326',
+        'md5': 'e41a2fb7b7398a3a46b6af37b15c00c9',
+        'info_dict': {
+            'id': '15954259',
+            'ext': 'mp4',
+            'title': 'Incêndio destrói uma das maiores casas noturnas de Londres',
+            'description': 'Em Londres, um incêndio destruiu uma das maiores boates da cidade. Não há informações sobre vítimas.',
+        }
+    }, {
+        'url': 'http://mais.uol.com.br/static/uolplayer/index.html?mediaId=15951931',
+        'only_matching': True,
+    }, {
+        'url': 'http://mais.uol.com.br/view/15954259',
+        'only_matching': True,
+    }, {
+        'url': 'http://noticias.band.uol.com.br/brasilurgente/video/2016/08/05/15951931/miss-simpatia-e-encontrada-morta.html',
+        'only_matching': True,
+    }, {
+        'url': 'http://videos.band.uol.com.br/programa.asp?e=noticias&pr=brasil-urgente&v=15951931&t=Policia-desmonte-base-do-PCC-na-Cracolandia',
+        'only_matching': True,
+    }, {
+        'url': 'http://mais.uol.com.br/view/cphaa0gl2x8r/incendio-destroi-uma-das-maiores-casas-noturnas-de-londres-04024E9A3268D4C95326',
+        'only_matching': True,
+    }, {
+        'url': 'http://noticias.uol.com.br//videos/assistir.htm?video=rafaela-silva-inspira-criancas-no-judo-04024D983968D4C95326',
+        'only_matching': True,
+    }, {
+        'url': 'http://mais.uol.com.br/view/e0qbgxid79uv/15275470',
+        'only_matching': True,
+    }]
+
+    _FORMATS = {
+        '2': {
+            'width': 640,
+            'height': 360,
+        },
+        '5': {
+            'width': 1080,
+            'height': 720,
+        },
+        '6': {
+            'width': 426,
+            'height': 240,
+        },
+        '7': {
+            'width': 1920,
+            'height': 1080,
+        },
+        '8': {
+            'width': 192,
+            'height': 144,
+        },
+        '9': {
+            'width': 568,
+            'height': 320,
+        },
+    }
+
+    def _real_extract(self, url):
+        video_id = self._match_id(url)
+        if not video_id.isdigit():
+            embed_page = self._download_webpage('https://jsuol.com.br/c/tv/uol/embed/?params=[embed,%s]' % video_id, video_id)
+            video_id = self._search_regex(r'mediaId=(\d+)', embed_page, 'media id')
+        video_data = self._download_json(
+            'http://mais.uol.com.br/apiuol/v3/player/getMedia/%s.json' % video_id,
+            video_id)['item']
+        title = video_data['title']
+
+        query = {
+            'ver': video_data.get('numRevision', 2),
+            'r': 'http://mais.uol.com.br',
+        }
+        formats = []
+        for f in video_data.get('formats', []):
+            f_url = f.get('url') or f.get('secureUrl')
+            if not f_url:
+                continue
+            format_id = str_or_none(f.get('id'))
+            fmt = {
+                'format_id': format_id,
+                'url': update_url_query(f_url, query),
+            }
+            fmt.update(self._FORMATS.get(format_id, {}))
+            formats.append(fmt)
+        self._sort_formats(formats)
+
+        tags = []
+        for tag in video_data.get('tags', []):
+            tag_description = tag.get('description')
+            if not tag_description:
+                continue
+            tags.append(tag_description)
+
+        return {
+            'id': video_id,
+            'title': title,
+            'description': clean_html(video_data.get('desMedia')),
+            'thumbnail': video_data.get('thumbnail'),
+            'duration': int_or_none(video_data.get('durationSeconds')) or parse_duration(video_data.get('duration')),
+            'tags': tags,
+            'formats': formats,
+        }
@@ -9,6 +9,7 @@ from ..utils import (
     ExtractorError,
     unified_strdate,
     HEADRequest,
+    int_or_none,
 )


@@ -30,48 +31,58 @@ class WatIE(InfoExtractor):
     },
     {
         'url': 'http://www.wat.tv/video/gregory-lemarchal-voix-ange-6z1v7_6ygkj_.html',
-        'md5': 'fbc84e4378165278e743956d9c1bf16b',
+        'md5': '34bdfa5ca9fd3c7eb88601b635b0424c',
         'info_dict': {
             'id': '11713075',
             'ext': 'mp4',
             'title': 'Grégory Lemarchal, une voix d\'ange depuis 10 ans (1/3)',
             'description': 'md5:b7a849cf16a2b733d9cd10c52906dee3',
             'upload_date': '20140816',
             'duration': 2910,
         },
-        'skip': "Ce contenu n'est pas disponible pour l'instant.",
+        'expected_warnings': ["Ce contenu n'est pas disponible pour l'instant."],
     },
 ]

+    _FORMATS = (
+        (200, 416, 234),
+        (400, 480, 270),
+        (600, 640, 360),
+        (1200, 640, 360),
+        (1800, 960, 540),
+        (2500, 1280, 720),
+    )
+
     def _real_extract(self, url):
         video_id = self._match_id(url)
         video_id = video_id if video_id.isdigit() and len(video_id) > 6 else compat_str(int(video_id, 36))

         # 'contentv4' is used in the website, but it also returns the related
         # videos, we don't need them
-        video_info = self._download_json(
-            'http://www.wat.tv/interface/contentv3/' + video_id, video_id)['media']
+        video_data = self._download_json(
+            'http://www.wat.tv/interface/contentv4s/' + video_id, video_id)
+        video_info = video_data['media']

         error_desc = video_info.get('error_desc')
         if error_desc:
-            raise ExtractorError(
-                '%s returned error: %s' % (self.IE_NAME, error_desc), expected=True)
+            self.report_warning(
+                '%s returned error: %s' % (self.IE_NAME, error_desc))

         chapters = video_info['chapters']
-        first_chapter = chapters[0]
+        if chapters:
+            first_chapter = chapters[0]

-        def video_id_for_chapter(chapter):
-            return chapter['tc_start'].split('-')[0]
+            def video_id_for_chapter(chapter):
+                return chapter['tc_start'].split('-')[0]

-        if video_id_for_chapter(first_chapter) != video_id:
-            self.to_screen('Multipart video detected')
-            entries = [self.url_result('wat:%s' % video_id_for_chapter(chapter)) for chapter in chapters]
-            return self.playlist_result(entries, video_id, video_info['title'])
-        # Otherwise we can continue and extract just one part, we have to use
-        # the video id for getting the video url
+            if video_id_for_chapter(first_chapter) != video_id:
+                self.to_screen('Multipart video detected')
+                entries = [self.url_result('wat:%s' % video_id_for_chapter(chapter)) for chapter in chapters]
+                return self.playlist_result(entries, video_id, video_info['title'])
+            # Otherwise we can continue and extract just one part, we have to use
+            # the video id for getting the video url
+        else:
+            first_chapter = video_info

-        date_diffusion = first_chapter.get('date_diffusion')
-        upload_date = unified_strdate(date_diffusion) if date_diffusion else None
+        title = first_chapter['title']

         def extract_url(path_template, url_type):
             req_url = 'http://www.wat.tv/get/%s' % (path_template % video_id)
@@ -83,36 +94,61 @@ class WatIE(InfoExtractor):
                 expected=True)
             return red_url

-        m3u8_url = extract_url('ipad/%s.m3u8', 'm3u8')
-        http_url = extract_url('android5/%s.mp4', 'http')
-
         formats = []
-        m3u8_formats = self._extract_m3u8_formats(
-            m3u8_url, video_id, 'mp4', 'm3u8_native', m3u8_id='hls')
-        formats.extend(m3u8_formats)
-        formats.extend(self._extract_f4m_formats(
-            m3u8_url.replace('ios.', 'web.').replace('.m3u8', '.f4m'),
-            video_id, f4m_id='hds', fatal=False))
-        for m3u8_format in m3u8_formats:
-            vbr, abr = m3u8_format.get('vbr'), m3u8_format.get('abr')
-            if not vbr or not abr:
-                continue
-            f = m3u8_format.copy()
-            f.update({
-                'url': re.sub(r'%s-\d+00-\d+' % video_id, '%s-%d00-%d' % (video_id, round(vbr / 100), round(abr)), http_url),
-                'format_id': f['format_id'].replace('hls', 'http'),
-                'protocol': 'http',
-            })
-            formats.append(f)
-        self._sort_formats(formats)
+        try:
+            http_url = extract_url('android5/%s.mp4', 'http')
+            m3u8_url = extract_url('ipad/%s.m3u8', 'm3u8')
+            m3u8_formats = self._extract_m3u8_formats(
+                m3u8_url, video_id, 'mp4', 'm3u8_native', m3u8_id='hls')
+            formats.extend(m3u8_formats)
+            formats.extend(self._extract_f4m_formats(
+                m3u8_url.replace('ios.', 'web.').replace('.m3u8', '.f4m'),
+                video_id, f4m_id='hds', fatal=False))
+            for m3u8_format in m3u8_formats:
+                vbr, abr = m3u8_format.get('vbr'), m3u8_format.get('abr')
+                if not vbr or not abr:
+                    continue
+                format_id = m3u8_format['format_id'].replace('hls', 'http')
+                fmt_url = re.sub(r'%s-\d+00-\d+' % video_id, '%s-%d00-%d' % (video_id, round(vbr / 100), round(abr)), http_url)
+                if self._is_valid_url(fmt_url, video_id, format_id):
+                    f = m3u8_format.copy()
+                    f.update({
+                        'url': fmt_url,
+                        'format_id': format_id,
+                        'protocol': 'http',
+                    })
+                    formats.append(f)
+            self._sort_formats(formats)
+        except ExtractorError:
+            abr = 64
+            for vbr, width, height in self._FORMATS:
+                tbr = vbr + abr
+                format_id = 'http-%s' % tbr
+                fmt_url = 'http://dnl.adv.tf1.fr/2/USP-0x0/%s/%s/%s/ssm/%s-%s-64k.mp4' % (video_id[-4:-2], video_id[-2:], video_id, video_id, vbr)
+                if self._is_valid_url(fmt_url, video_id, format_id):
+                    formats.append({
+                        'format_id': format_id,
+                        'url': fmt_url,
+                        'vbr': vbr,
+                        'abr': abr,
+                        'width': width,
+                        'height': height,
+                    })

+        date_diffusion = first_chapter.get('date_diffusion') or video_data.get('configv4', {}).get('estatS4')
+        upload_date = unified_strdate(date_diffusion) if date_diffusion else None
+        duration = None
+        files = video_info['files']
+        if files:
+            duration = int_or_none(files[0].get('duration'))

         return {
             'id': video_id,
-            'title': first_chapter['title'],
-            'thumbnail': first_chapter['preview'],
-            'description': first_chapter['description'],
-            'view_count': video_info['views'],
+            'title': title,
+            'thumbnail': first_chapter.get('preview'),
+            'description': first_chapter.get('description'),
+            'view_count': int_or_none(video_info.get('views')),
             'upload_date': upload_date,
-            'duration': video_info['files'][0]['duration'],
+            'duration': duration,
             'formats': formats,
         }
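When the HLS/HDS route raises, the new fallback derives progressive MP4 URLs purely from the video id and the _FORMATS table (tbr = vbr plus a fixed 64 kbit/s audio). A minimal sketch using the id from the test above:

    video_id = '11713075'
    abr = 64
    for vbr, width, height in ((200, 416, 234), (2500, 1280, 720)):
        tbr = vbr + abr
        format_id = 'http-%s' % tbr  # e.g. 'http-264', 'http-2564'
        fmt_url = 'http://dnl.adv.tf1.fr/2/USP-0x0/%s/%s/%s/ssm/%s-%s-64k.mp4' % (
            video_id[-4:-2], video_id[-2:], video_id, video_id, vbr)
        print(format_id, '%dx%d' % (width, height), fmt_url)
    # http-264 416x234 http://dnl.adv.tf1.fr/2/USP-0x0/30/75/11713075/ssm/11713075-200-64k.mp4
    # http-2564 1280x720 http://dnl.adv.tf1.fr/2/USP-0x0/30/75/11713075/ssm/11713075-2500-64k.mp4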
@@ -499,9 +499,20 @@ def parseOpts(overrideArguments=None):
         dest='bidi_workaround', action='store_true',
         help='Work around terminals that lack bidirectional text support. Requires bidiv or fribidi executable in PATH')
     workarounds.add_option(
-        '--sleep-interval', metavar='SECONDS',
+        '--sleep-interval', '--min-sleep-interval', metavar='SECONDS',
         dest='sleep_interval', type=float,
-        help='Number of seconds to sleep before each download.')
+        help=(
+            'Number of seconds to sleep before each download when used alone '
+            'or a lower bound of a range for randomized sleep before each download '
+            '(minimum possible number of seconds to sleep) when used along with '
+            '--max-sleep-interval.'))
+    workarounds.add_option(
+        '--max-sleep-interval', metavar='SECONDS',
+        dest='max_sleep_interval', type=float,
+        help=(
+            'Upper bound of a range for randomized sleep before each download '
+            '(maximum possible number of seconds to sleep). Must only be used '
+            'along with --min-sleep-interval.'))

     verbosity = optparse.OptionGroup(parser, 'Verbosity / Simulation Options')
     verbosity.add_option(
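In practice the new pair is used like this (URL stands for any supported URL):

    $ youtube-dl --sleep-interval 5 URL
    $ youtube-dl --min-sleep-interval 5 --max-sleep-interval 15 URL

The first form keeps the old fixed-pause behaviour; the second, per the help text above, sleeps a randomized number of seconds between the two bounds before each download.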
@@ -3,11 +3,6 @@ from __future__ import unicode_literals
 import re

 from .common import PostProcessor
-from ..utils import PostProcessingError
-
-
-class MetadataFromTitlePPError(PostProcessingError):
-    pass


 class MetadataFromTitlePP(PostProcessor):
@@ -38,7 +33,8 @@ class MetadataFromTitlePP(PostProcessor):
         title = info['title']
         match = re.match(self._titleregex, title)
         if match is None:
-            raise MetadataFromTitlePPError('Could not interpret title of video as "%s"' % self._titleformat)
+            self._downloader.to_screen('[fromtitle] Could not interpret title of video as "%s"' % self._titleformat)
+            return [], info
         for attribute, value in match.groupdict().items():
             value = match.group(attribute)
             info[attribute] = value
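With MetadataFromTitlePPError gone, an unmatched title is now a soft skip rather than a fatal error: the postprocessor logs a message and returns the info dict untouched. A sketch of the matching step, assuming a hypothetical named-group regex of the kind --metadata-from-title compiles to:

    import re

    titleregex = r'(?P<artist>.+) - (?P<title>.+)'  # hypothetical, e.g. from '%(artist)s - %(title)s'
    info = {'title': 'a title with no separator'}
    match = re.match(titleregex, info['title'])
    if match is None:
        print('[fromtitle] Could not interpret title')  # previously raised, now only warns
    else:
        info.update(match.groupdict())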
@@ -47,6 +47,7 @@ from .compat import (
     compat_socket_create_connection,
     compat_str,
     compat_struct_pack,
+    compat_struct_unpack,
     compat_urllib_error,
     compat_urllib_parse,
     compat_urllib_parse_urlencode,
@@ -121,6 +122,7 @@ DATE_FORMATS = (
     '%Y %m %d',
     '%Y-%m-%d',
     '%Y/%m/%d',
+    '%Y/%m/%d %H:%M',
     '%Y/%m/%d %H:%M:%S',
     '%Y-%m-%d %H:%M:%S',
     '%Y-%m-%d %H:%M:%S.%f',
@@ -1983,11 +1985,27 @@ US_RATINGS = {
 }


+TV_PARENTAL_GUIDELINES = {
+    'TV-Y': 0,
+    'TV-Y7': 7,
+    'TV-G': 0,
+    'TV-PG': 0,
+    'TV-14': 14,
+    'TV-MA': 17,
+}
+
+
 def parse_age_limit(s):
-    if s is None:
+    if type(s) == int:
+        return s if 0 <= s <= 21 else None
+    if not isinstance(s, compat_basestring):
         return None
     m = re.match(r'^(?P<age>\d{1,2})\+?$', s)
-    return int(m.group('age')) if m else US_RATINGS.get(s)
+    if m:
+        return int(m.group('age'))
+    if s in US_RATINGS:
+        return US_RATINGS[s]
+    return TV_PARENTAL_GUIDELINES.get(s)


 def strip_jsonp(code):
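The rework makes parse_age_limit accept plain ints and two rating tables instead of numeric strings only. Expected behaviour, as a few assertions:

    from youtube_dl.utils import parse_age_limit

    assert parse_age_limit(16) == 16       # ints in [0, 21] pass through
    assert parse_age_limit(99) is None
    assert parse_age_limit('18+') == 18    # regex branch
    assert parse_age_limit('PG-13') == 13  # US_RATINGS lookup
    assert parse_age_limit('TV-14') == 14  # new TV_PARENTAL_GUIDELINES lookup
    assert parse_age_limit(None) is None   # neither int nor string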
@@ -2969,3 +2987,110 @@ def parse_m3u8_attributes(attrib):

 def urshift(val, n):
     return val >> n if val >= 0 else (val + 0x100000000) >> n
+
+
+# Based on png2str() written by @gdkchan and improved by @yokrysty
+# Originally posted at https://github.com/rg3/youtube-dl/issues/9706
+def decode_png(png_data):
+    # Reference: https://www.w3.org/TR/PNG/
+    header = png_data[8:]
+
+    if png_data[:8] != b'\x89PNG\x0d\x0a\x1a\x0a' or header[4:8] != b'IHDR':
+        raise IOError('Not a valid PNG file.')
+
+    int_map = {1: '>B', 2: '>H', 4: '>I'}
+    unpack_integer = lambda x: compat_struct_unpack(int_map[len(x)], x)[0]
+
+    chunks = []
+
+    while header:
+        length = unpack_integer(header[:4])
+        header = header[4:]
+
+        chunk_type = header[:4]
+        header = header[4:]
+
+        chunk_data = header[:length]
+        header = header[length:]
+
+        header = header[4:]  # Skip CRC
+
+        chunks.append({
+            'type': chunk_type,
+            'length': length,
+            'data': chunk_data
+        })
+
+    ihdr = chunks[0]['data']
+
+    width = unpack_integer(ihdr[:4])
+    height = unpack_integer(ihdr[4:8])
+
+    idat = b''
+
+    for chunk in chunks:
+        if chunk['type'] == b'IDAT':
+            idat += chunk['data']
+
+    if not idat:
+        raise IOError('Unable to read PNG data.')
+
+    decompressed_data = bytearray(zlib.decompress(idat))
+
+    stride = width * 3
+    pixels = []
+
+    def _get_pixel(idx):
+        x = idx % stride
+        y = idx // stride
+        return pixels[y][x]
+
+    for y in range(height):
+        basePos = y * (1 + stride)
+        filter_type = decompressed_data[basePos]
+
+        current_row = []
+
+        pixels.append(current_row)
+
+        for x in range(stride):
+            color = decompressed_data[1 + basePos + x]
+            basex = y * stride + x
+            left = 0
+            up = 0
+
+            if x > 2:
+                left = _get_pixel(basex - 3)
+            if y > 0:
+                up = _get_pixel(basex - stride)
+
+            if filter_type == 1:  # Sub
+                color = (color + left) & 0xff
+            elif filter_type == 2:  # Up
+                color = (color + up) & 0xff
+            elif filter_type == 3:  # Average
+                color = (color + ((left + up) >> 1)) & 0xff
+            elif filter_type == 4:  # Paeth
+                a = left
+                b = up
+                c = 0
+
+                if x > 2 and y > 0:
+                    c = _get_pixel(basex - stride - 3)
+
+                p = a + b - c
+
+                pa = abs(p - a)
+                pb = abs(p - b)
+                pc = abs(p - c)
+
+                if pa <= pb and pa <= pc:
+                    color = (color + a) & 0xff
+                elif pb <= pc:
+                    color = (color + b) & 0xff
+                else:
+                    color = (color + c) & 0xff
+
+            current_row.append(color)
+
+    return width, height, pixels
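decode_png implements the four standard PNG filter types by hand; the only non-obvious one is Paeth, which reconstructs each byte from whichever neighbour (left, up, upper-left) best predicts it. A worked instance of the predictor used above:

    def paeth_predictor(a, b, c):  # left, up, upper-left bytes
        p = a + b - c              # initial estimate
        pa, pb, pc = abs(p - a), abs(p - b), abs(p - c)
        if pa <= pb and pa <= pc:
            return a
        if pb <= pc:
            return b
        return c

    # p = 100 + 120 - 110 = 110, so pa = 10, pb = 10, pc = 0:
    # neither a nor b wins, the upper-left byte is the predictor.
    print(paeth_predictor(100, 120, 110))  # -> 110

The filtered byte stored in the file is then added to this predictor modulo 256, exactly as in the `(color + c) & 0xff` branches above.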
@@ -1,3 +1,3 @@
 from __future__ import unicode_literals

-__version__ = '2016.08.06'
+__version__ = '2016.08.12'