mirror of https://gitlab.com/ytdl-org/youtube-dl.git synced 2026-01-25 00:00:04 -05:00

Compare commits

187 Commits

Author SHA1 Message Date
Philipp Hagemeister
b0ba11cc64 release 2016.04.13 2016-04-13 08:02:03 +02:00
Yen Chi Hsuan
75af5d59ae [netease] Skip all tests: completely georestricted 2016-04-13 04:52:07 +08:00
Sergey M․
b969d12490 Credit @Phaeilo for presstv (#7113) 2016-04-13 01:52:50 +06:00
Sergey M․
466a614537 [youtube:playlist] Recognize popular uploads playlist as mix (Closes #9170) 2016-04-12 21:38:31 +06:00
Sergey M․
ffa2cecf72 [ard] Change subtitles extension to ttml (Closes #9169)
ttml is now served instead of srt
2016-04-12 21:20:31 +06:00
Yen Chi Hsuan
a837416025 [jadorecettepub] Remove extractor: website gone 2016-04-12 18:30:53 +08:00
Yen Chi Hsuan
c9d448876f [izlesene] Fix extraction
description may be absent
2016-04-12 18:29:28 +08:00
Yen Chi Hsuan
8865b8abfd [howstuffworks] Skip a broken test case 2016-04-12 17:30:14 +08:00
Yen Chi Hsuan
c77a0c01cb [groupon] Fix extraction 2016-04-12 17:26:09 +08:00
Yen Chi Hsuan
12355ac473 [goshgay] Fix extraction
isFamilyFriendly no longer exists in the webpage and I can't find
another indicator.
2016-04-12 17:23:00 +08:00
Sergey M․
49f523ca50 [mixcloud] Capture error message (#9156) 2016-04-11 20:45:58 +06:00
remitamine
4a903b93a9 Revert "[openclassroom] Add new extractor(closes #9147)"
This reverts commit 13267a2be3.
2016-04-11 14:44:35 +01:00
remitamine
13267a2be3 [openclassroom] Add new extractor(closes #9147) 2016-04-11 14:24:08 +01:00
Yen Chi Hsuan
134c207e3f [arte.tv:embed] Extended support (#2620) 2016-04-11 19:32:27 +08:00
Yen Chi Hsuan
0f56bd2178 Merge branch 'Phaeilo-presstv' 2016-04-11 16:17:05 +08:00
Yen Chi Hsuan
dfbc7f7f3f [presstv] Improve and simplify 2016-04-11 16:14:07 +08:00
Yen Chi Hsuan
7d58ea7c5b Merge branch 'presstv' of https://github.com/Phaeilo/youtube-dl into Phaeilo-presstv 2016-04-11 15:48:10 +08:00
Sergey M․
452908b257 [telebruxelles] Fix extraction (Closes #9142) 2016-04-11 00:06:05 +06:00
Sergey M․
5899e988d5 [glide] Improve extraction and extract upload info 2016-04-10 23:56:23 +06:00
Sergey M․
4a121d29bb [glide] Fix extraction (Closes #9141) 2016-04-10 23:45:17 +06:00
Sergey M․
7ebc36900d [jwplatform:base] Improve subtitles extraction 2016-04-10 22:55:07 +06:00
Sergey M․
d7eb052fa2 [screencastomatic] Add duration to test 2016-04-10 22:48:04 +06:00
Sergey M․
a6d6722c8f [jwplatform:base] Extract duration 2016-04-10 22:47:38 +06:00
Sergey M․
66fa495868 [screencastomatic] Fix extraction (Closes #9136) 2016-04-10 22:37:14 +06:00
Sergey M․
443285aabe [ebaumsworlds] Update _VALID_URL (Closes #9135) 2016-04-10 22:15:11 +06:00
Philip Huppert
de728757ad [presstv] Refactored extractor. 2016-04-10 16:36:44 +02:00
Sergey M․
f44c276842 [extractor/extractors] Remove non-existant imports 2016-04-10 19:21:58 +06:00
Sergey M․
a1fa60a934 [cliprs] Add extractor (Closes #9099) 2016-04-10 18:43:40 +06:00
Sergey M․
49caf3307f [extractor/common] Remove irrelevant comment 2016-04-10 17:10:27 +06:00
Jaime Marquínez Ferrándiz
6a801f4470 [test/InfoExtractors] add test for _download_json 2016-04-09 23:18:41 +02:00
Sergey M․
61dd350a04 [1tv] Fix extraction (Closes #9103) 2016-04-10 03:02:35 +06:00
Jaime Marquínez Ferrándiz
eb9c3edd5e [test/utils] Add test for date_from_str 2016-04-09 22:40:05 +02:00
Philip Huppert
95153a960d [presstv] updated extractor and tests to work with current PressTV website 2016-04-09 16:14:05 +02:00
Yen Chi Hsuan
6c4c7539f2 [test/helper] Check got values to be strings for md5: fields
Seen in PBSIE tests
2016-04-09 22:04:48 +08:00
Yen Chi Hsuan
c991106706 [videodetective] Adapt to InternetVideoArchiveIE 2016-04-09 21:47:35 +08:00
Yen Chi Hsuan
dae2a058de [rottentomatoes] Adapt to InternetVideoArchiveIE 2016-04-09 21:47:12 +08:00
Yen Chi Hsuan
c05025fdd7 [internetvideoarchive] Fix extraction and support json URLs 2016-04-09 21:46:51 +08:00
Philip Huppert
bfe96d7bea [presstv] Added extractor PressTV.
Fixes #7060
2016-04-09 14:55:54 +02:00
Yen Chi Hsuan
ab481b48e5 [funnyordie] Relax M3U8 URL matching
Also, m3u8_url extraction should be fatal as all formats depends
directly or indirectly on it.

This change fixes test_Generic_26 and TestFunnyOrDieSubtitles
2016-04-09 20:17:35 +08:00
Sergey M․
92c7f3157a [aol] Add coding cookie 2016-04-09 17:32:23 +06:00
Yen Chi Hsuan
cacd996662 [utils] Don't touch URLs if not necessary
Fix test_Generic_15 (Google redirect)
2016-04-09 19:27:54 +08:00
remitamine
bffb245a48 [aol] add support for videos with vidible IDs(closes #9124) 2016-04-09 10:51:23 +01:00
Yen Chi Hsuan
680efb6723 Merge pull request #8497 from jaimeMF/lazy-load
Add experimenta lazy loading of info extractors
2016-04-09 14:08:13 +08:00
Jaime Marquínez Ferrándiz
5a9858bfa9 setup.py: add command for building the lazy_extractors module 2016-04-08 21:50:54 +02:00
Jaime Marquínez Ferrándiz
8a5dc1c1e1 lazy extractors: Initialize the real info extractor
According to the docs '__init__' is only called automatically if '__new__' returns an instance of the original class.
2016-04-08 21:50:54 +02:00
Jaime Marquínez Ferrándiz
e0986e31cf lazy extractors: Output if it's enabled in the verbose log 2016-04-08 21:50:54 +02:00
Jaime Marquínez Ferrándiz
6b97ca96fc lazy extractors: Style fixes
* Sort extractors alphabetically
* Add newlines when needed (youtube_dl/extractors/lazy_extractors.py pass the flake8 test now)
2016-04-08 21:50:54 +02:00
Jaime Marquínez Ferrándiz
c1ce6acdd7 lazy extractors: Fix building with python2.6 2016-04-08 21:50:07 +02:00
Jaime Marquínez Ferrándiz
0d778b1db9 lazy extractors: specify the encoding
When building with python3 the unicode characters are not escaped, python2 needs to know the encoding.
2016-04-08 21:50:07 +02:00
Jaime Marquínez Ferrándiz
779822d945 Add experimental support for lazy loading the info extractors
'make lazy-extractors' creates the youtube_dl/extractor/lazy_extractors.py (imported by youtube_dl/extractor/__init__.py), which contains simplified classes that only have the 'suitable' class method and that load the appropiate class with the '__new__' method when a instance is created.
2016-04-08 21:50:07 +02:00
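The mechanism described in the lazy-extractor commits above can be sketched roughly as follows. This is a minimal illustration, not the generated `lazy_extractors.py` itself: the `suitable` body, the `_VALID_URL` value, and the use of `collections.OrderedDict` as a stand-in for a real extractor class are all assumptions made so the example runs on its own.

```python
import importlib
import re


class LazyLoadExtractor(object):
    """Lightweight stub: knows how to match URLs, imports the real class late."""
    _module = None      # dotted path of the module holding the real class
    _VALID_URL = None   # regex used by suitable() without importing anything

    @classmethod
    def suitable(cls, url):
        # Cheap check that needs only the regex stored on the stub.
        return re.match(cls._VALID_URL, url) is not None

    def __new__(cls, *args, **kwargs):
        # The real module is imported only when an instance is requested.
        module = importlib.import_module(cls._module)
        real_cls = getattr(module, cls.__name__)
        instance = real_cls.__new__(real_cls)
        # __new__ returns an object that is not an instance of the stub,
        # so Python will not call __init__ for us; do it explicitly.
        instance.__init__(*args, **kwargs)
        return instance


# Hypothetical stub: the stdlib OrderedDict plays the role of the real extractor.
class OrderedDict(LazyLoadExtractor):
    _module = 'collections'
    _VALID_URL = r'https?://example\.com/'
```

Calling `OrderedDict.suitable(url)` touches only the stub, while `OrderedDict([('a', 1)])` imports `collections` and returns a fully initialized instance of the real class.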
Jaime Marquínez Ferrándiz
1b3d5e05a8 Move the extreactors import to youtube_dl/extractor/extractors.py 2016-04-08 21:47:51 +02:00
Jaime Marquínez Ferrándiz
e52d7f85f2 Delay initialization of InfoExtractors until they are needed 2016-04-08 21:43:24 +02:00
Sergey M․
568d2f78d6 [tnaflix] Fix metadata extraction 2016-04-09 00:27:24 +06:00
Sergey M․
2f2fcf1a33 [tnaflix] Fix extraction (Closes #9074) 2016-04-08 23:34:59 +06:00
Sergey M․
bacec0397f [extractor/common] Relax _hidden_inputs 2016-04-08 23:33:45 +06:00
Sergey M․
3c6c7e7d7e [gdcvault] Fix extraction (Closes #9107, closes #9114) 2016-04-08 23:16:02 +06:00
Sergey M․
fb38aa8b53 [extractor/common] Support arbitrary format strings for template based identifiers in mpd manifests (Closes #9119, closes #9120) 2016-04-08 22:48:08 +06:00
Sergey M․
18da24634c [democracynow] Improve extraction 2016-04-08 22:27:27 +06:00
Sergey M․
a134426d61 [democracynow] Fix tests 2016-04-08 22:21:14 +06:00
Sergey M․
a64c0c9b06 [democracynow] Make description optional (Closes #9115) 2016-04-08 22:15:36 +06:00
Sergey M․
56019444cb [novamov] Improve _VALID_URL template (Closes #9116) 2016-04-08 21:26:42 +06:00
remitamine
a1ff3cd5f9 [acast] fix channel extraction(closes #9117) 2016-04-08 15:15:34 +01:00
remitamine
9a32e80477 [acast] fix extraction(#9117) 2016-04-08 14:51:00 +01:00
Sergey M․
536a55dabd [YoutubeDL] Sanitize single thumbnail URL 2016-04-08 00:17:47 +06:00
Sergey M․
ed6fb8b804 [vrt] Add support for direct hls playlists and YouTube (Closes #9108) 2016-04-07 23:22:43 +06:00
Sergey M․
3afef2e3fc [beeg] Improve extraction 2016-04-07 22:40:35 +06:00
Sergey M․
e90d175436 [yandexmusic] Extract music album metafields (Closes #7354) 2016-04-07 02:56:13 +06:00
Sergey M․
7a93ab5f3f [extractor/common] Introduce music album metafields 2016-04-07 02:53:53 +06:00
Philipp Hagemeister
c41cf65d4a release 2016.04.06 2016-04-06 15:13:08 +02:00
Jaime Marquínez Ferrándiz
ec4a4c6fcc Makefile: remove ISSUE_TEMPLATE.md from the 'all' target (fixes #9088)
It isn't included in the tar file, causing build failures.
Since it's only used for GitHub, I think we don't need to store it in the tar file.
2016-04-06 14:16:05 +02:00
Jaime Marquínez Ferrándiz
be0c7009fb Makefile: use full path for the ISSUE_TEMPLATE.md file 2016-04-06 14:09:31 +02:00
Yen Chi Hsuan
92d5477d84 [compat] Handle tuples properly in urlencode()
Fixes #9055
2016-04-06 18:29:54 +08:00
Yen Chi Hsuan
8790249c68 [iqiyi] Improve error detection for VIP-only videos
Closes #9071
2016-04-06 16:12:16 +08:00
Philipp Hagemeister
416930d450 release 2016.04.05 2016-04-05 18:36:24 +02:00
Sergey M․
65150b41bb [deezer] Fix extraction (Closes #9086) 2016-04-05 22:27:33 +06:00
Sergey M․
e42f413716 [rte] Improve thumbnail extraction (Closes #9085) 2016-04-05 22:23:20 +06:00
Sergey M․
40a056d85d [extractor/__init__] Remove novamov extractor and sort novamov based extractors alphabetically 2016-04-05 21:54:09 +06:00
Sergey M․
e7d77efb9d [auroravid] Add extractor (Closes #9070) 2016-04-05 21:52:07 +06:00
Sergey M․
995cf05c96 [novamov] Make title fatal 2016-04-05 21:40:43 +06:00
Jaime Marquínez Ferrándiz
5bf28d7864 [utils] dfxp2srt: add additional namespace
Used by the ZDF subtitles (#9081).
2016-04-04 20:46:35 +02:00
Jaime Marquínez Ferrándiz
8c7d6e8e22 [zdf] Extract subtitles (closes #9081) 2016-04-04 20:44:06 +02:00
Sergey M․
6d4fc66bfc [youtube] Add support for zwearz (Closes #9062) 2016-04-04 02:26:20 +06:00
remitamine
23576edbfc [brightcove:legacy] skip None value for uploader_id 2016-04-02 21:31:21 +01:00
remitamine
4d4cd35f48 [brightcove:legacy] extract uploader_id as a string 2016-04-02 20:55:44 +01:00
remitamine
3aac9b2fb1 [nowness] update tests 2016-04-02 18:57:15 +01:00
remitamine
e47d19e991 [brightcove:new] extract subtitles and strip video title 2016-04-02 18:57:15 +01:00
remitamine
41f5492fbc [brightcove:legacy] improve format extraction and extract uploader_id, duration and timestamp 2016-04-02 18:57:15 +01:00
Jaime Marquínez Ferrándiz
2defa7d75a [instagram:user] Fix extraction (fixes #9059)
The URL for the next page was incorrect and we always got the same page, therefore it got trapped in an infinite loop.
2016-04-02 18:03:56 +02:00
Sergey M․
bbc26c8a01 [bbc] Set vcodec to none for audio formats 2016-04-02 19:00:38 +06:00
Sergey M․
b507cc925b [extractor/common] Carry long line 2016-04-02 18:49:58 +06:00
Sergey M․
db8ee7ec05 [extractor/common] Fix numeric identifiers conversion in DASH URL templates 2016-04-02 18:48:05 +06:00
remitamine
08136dc138 [brightcove] fix format sorting 2016-04-02 10:57:57 +01:00
remitamine
fe7ef95e91 [cbsinteractive] Add support for ZDNet videos 2016-04-01 23:53:32 +01:00
remitamine
5f705baf5e [cnet] extract more formats 2016-04-01 20:42:15 +01:00
remitamine
0750b2491f [ffmpeg] try to convert tt subtitles usng dfxp2srt 2016-04-01 19:47:49 +01:00
remitamine
df634be2ed [common] prefer using mime type over ext for smil subtitle extraction
the subtitle ext for http://www.cnet.com/videos/download-amazon-prime-movies-and-tv/
is adb_xml while using the mime type it get tt(application/smptett+xml)
2016-04-01 19:47:49 +01:00
Jaime Marquínez Ferrándiz
6d628fafca [camwithher] Remove extra blank line 2016-04-01 20:45:21 +02:00
Jaime Marquínez Ferrándiz
0f28777f58 [cbsnews] Remove unused import 2016-04-01 20:43:14 +02:00
Jaime Marquínez Ferrándiz
329c1eae54 [aenetworks] Make pep8 happy 2016-04-01 20:42:19 +02:00
Sergey M․
9aaaf8e8e8 [camwithher] Improve extraction (Closes #8989) 2016-04-01 23:47:27 +06:00
theGeekPirate
04819db58e [camwithher] Add extractor
Corrected unnecessary test

Sane variable naming

RTMP all .flv & url_id for _download_webpage()

Corrected all outstanding issues, next up is a squash!
2016-04-01 23:44:25 +06:00
remitamine
79ba9140dc [theplatform] extract timestamp and uploader 2016-04-01 18:07:17 +01:00
Sergey M․
75d572e9fb [screencast] Improve title regexes (Closes #9025) 2016-04-01 23:01:55 +06:00
Martin Trigaux
791d6aaecc screencast.com: fallback on page title
When determining the title of the page, use the <title> tag of the page
2016-04-01 23:00:52 +06:00
Sergey M․
81de73e5b4 [screencast] Add test 2016-04-01 23:00:45 +06:00
Martin Trigaux
83cedc1cf2 screencast.com: support missing www
The "www." part of the URL is not mandatory
2016-04-01 22:58:16 +06:00
Sergey M․
244cd04237 [pluralsight] Remove unnecessary login/password encode 2016-04-01 22:46:46 +06:00
Sergey M․
fbdaced256 [lynda] Remove unnecessary login/password encode 2016-04-01 22:45:20 +06:00
Sergey M․
a3373823e1 [udemy] Remove unnecessary login/password encode
This is now covered by compat_urllib_parse_urlencode
2016-04-01 22:42:09 +06:00
Sergey M․
03caa463e7 [udemy:course] Skip non-video lectures 2016-04-01 22:38:56 +06:00
remitamine
3f64379eda [movieclips] fix extraction 2016-04-01 16:22:06 +01:00
remitamine
3e0c3d14d9 [cbs] add base extractor 2016-04-01 10:12:29 +01:00
remitamine
d8873d4def [aenetworks] improve format extraction 2016-04-01 09:58:02 +01:00
remitamine
db1c969da5 [theplatform] sign https urls 2016-04-01 09:58:02 +01:00
Philipp Hagemeister
1e02bc7ba2 release 2016.04.01 2016-04-01 09:07:40 +02:00
remitamine
63c55e9f22 [cbs] improve extraction(closes #6321) 2016-04-01 07:33:37 +01:00
remitamine
f9b1529af8 [generic] remove sbnation test(handled by VoxMediaIE) 2016-03-31 23:50:45 +01:00
remitamine
961fc024d2 [voxmedia] improve sbnation support 2016-03-31 23:33:36 +01:00
Sergey M․
b53a06e3b9 [udemy:course] Use new URL format 2016-04-01 02:24:22 +06:00
remitamine
4ecc1fc638 [howstuffworks] improve extraction 2016-03-31 21:11:58 +01:00
Yen Chi Hsuan
5b012dfce8 [tudou] Improve error handling (closes #8988) 2016-04-01 01:42:16 +08:00
remitamine
8369942773 [voxmedia] Add new extractor(closes #3182) 2016-03-31 18:36:41 +01:00
Sergey M․
86f3b66cec [udemy] Remove unused import 2016-03-31 23:00:11 +06:00
Sergey M․
6bb4600717 [udemy:course] Simplify course curriculum downloading 2016-03-31 22:59:19 +06:00
Sergey M․
41d06b0424 [extractor/common] Improve _request_webpage
* Do not ignore data, headers and query for Requests
* Default values for headers and query switched to dicts since these are used by urllib itself
2016-03-31 22:58:38 +06:00
Sergey M․
15d260ebaa [utils] Use update_Request in http_request 2016-03-31 22:55:49 +06:00
Sergey M․
ed0291d153 [utils] Add update_Request 2016-03-31 22:55:01 +06:00
Sergey M․
81da8cbc45 [udemy] Switch to api 2.0 (Closes #9035) 2016-03-31 22:05:25 +06:00
Sergey M․
5299bc3f91 [beeg] Switch to api v6 (Closes #9036) 2016-03-31 20:42:41 +06:00
remitamine
c9c39c22c5 [nationalgeographic] add support for channel.nationalgeographic.com urls 2016-03-31 13:47:38 +01:00
remitamine
d84b48e3f1 [nationalgeographic] improve extraction 2016-03-31 13:44:55 +01:00
remitamine
dd17041c82 [tenplay] remove extractor(fixes #6927) 2016-03-31 12:02:04 +01:00
remitamine
fea7295b14 [brightcove] relax embed_in_page regex 2016-03-31 10:48:22 +01:00
remitamine
9cf01f7f30 [nbc] add new extractor for csnne.com(#5432) 2016-03-31 00:26:42 +01:00
remitamine
ce548296fe [cnbc] fix test 2016-03-31 00:25:11 +01:00
remitamine
c02ec7d430 [cnbc] Add new extractor(closes #8012) 2016-03-30 23:18:31 +01:00
remitamine
6b820a2376 [myspace] improve extraction 2016-03-30 21:18:07 +01:00
Yen Chi Hsuan
e621a344e6 [kwuo] Port to new API and enable --cn-verification-proxy 2016-03-31 02:27:52 +08:00
Yen Chi Hsuan
3ae6f8fec1 [kwuo] Remove _sort_formats() from KuwoBaseIE._get_formats()
Following the idea proposed in 19dbaeece3
2016-03-31 02:11:21 +08:00
Yen Chi Hsuan
597d52fadb [kuwo:song] Correct song ID extraction (fixes #9033)
Bug introduced in daef04a4e7.
2016-03-31 02:00:50 +08:00
Sergey M․
afca767d19 [tumblr] Improve _VALID_URL (Closes #9027) 2016-03-30 22:26:43 +06:00
remitamine
6e359a1534 [comcarcoff] don not depend on crackle extractor(closes #8995)
previously extraction has been delegated to crackle to extract more info
and subtitles #6106 but some of the episodes can't be extracted using
crackle #8995.
2016-03-30 12:27:00 +01:00
Sergey M․
607619bc90 Add manually generated ISSUE_TEMPLATE.md
In order not to wait for the next release
2016-03-29 22:04:29 +06:00
Sergey M․
0b7bfc9422 Improve ISSUE_TEMPLATE_tmpl.md 2016-03-29 22:02:42 +06:00
Sergey M․
7168a6c874 [devscripts/make_issue_template] Fix __version__ again 2016-03-29 03:05:15 +06:00
Sergey M․
034947dd1e Rename ISSUE_TEMPLATE.tmpl in order not to be picked up by github 2016-03-29 02:48:04 +06:00
Sergey M․
3c0de33ad7 Remove ISSUE_TEMPLATE.md 2016-03-29 02:43:48 +06:00
Sergey M․
89924f8230 [devscripts/make_issue_template] Fix NameError under python3 2016-03-29 02:41:27 +06:00
Sergey M․
a39c68f7e5 Exclude make_issue_template.py from flake8 2016-03-29 02:19:24 +06:00
Sergey M․
4a5a67ca25 [devscripts/release.sh] Make ISSUE_TEMPLATE.md and commit it 2016-03-29 02:18:52 +06:00
Sergey M․
8751da85a7 [Makefile] Fix ISSUE_TEMPLATE.md target 2016-03-29 02:17:57 +06:00
Sergey M․
3bf1df51fd [devscripts/make_issue_template] Rework to use ISSUE_TEMPLATE.tmpl (Closes #8785) 2016-03-29 02:16:38 +06:00
Sergey M․
3842a3e652 Add ISSUE_TEMPLATE.tmpl as template for ISSUE_TEMPLATE.md 2016-03-29 02:15:26 +06:00
Sander van den Oever
7710bdf4e8 Add initial ISSUE_TEMPLATE
Add auto-updating of youtube-dl version in ISSUE_TEMPLATE

Move parts of template text and adopt makefile to new format

Moved the 'kind-of-issue' section and rephrased a bit

Rephrased and moved Example URL section upwards

Moved ISSUE_TEMPLATE inside .github folder.

Update makefile to match new folderstructure
2016-03-28 22:43:13 +06:00
Sergey M
8d9dd3c34b [README.md] Add format_id to the list of string meta fields available for use in format selection 2016-03-28 03:08:34 +05:00
Sergey M․
33f3040a3e [YoutubeDL] Fix sanitizing subtitles' url 2016-03-28 03:13:39 +06:00
Sergey M․
03442072c0 [pornhub] Fix typo (Closes #9008) 2016-03-28 01:21:44 +06:00
Sergey M․
c8b13fec02 [foxnews] Restore upload time fields in test 2016-03-28 01:14:12 +06:00
Sergey M․
87d105ac6c [amp] Fix upload timestamp extraction (Closes #9007) 2016-03-28 01:13:47 +06:00
Sergey M․
3454139576 [pornhub:uservideos] Add support for multipage videos (Closes #9006) 2016-03-28 00:50:46 +06:00
Sergey M․
3a23bae9cc [pornhub:playlistbase] Do not include videos not from playlist 2016-03-28 00:32:57 +06:00
Sergey M․
8f9a477e7f [pornhub:playlistbase] Use orderedSet 2016-03-28 00:21:08 +06:00
Sergey M․
a1cf3e38a3 [bbc] Extend vpid regex (Closes #9003) 2016-03-27 23:22:51 +06:00
Philipp Hagemeister
a122e7080b release 2016.03.27 2016-03-27 16:56:33 +02:00
Sergey M․
b22ca76204 [extractor/common] Filter out unsupported encrypted media for f4m formats (Closes #8573) 2016-03-27 07:42:38 +06:00
Sergey M․
f7df343b4a [downloader/f4m] Extract routine for removing unsupported encrypted media 2016-03-27 07:41:19 +06:00
Sergey M․
19dbaeece3 Remove _sort_formats from _extract_*_formats methods
Now _sort_formats should be called explicitly.
_sort_formats has been added to all the necessary places in code.

Closes #8051
2016-03-27 07:03:08 +06:00
Yen Chi Hsuan
395fd4b08a [twitter] Handle another form of embedded Vine
Fixes #8996
2016-03-27 04:36:02 +08:00
Sergey M․
8018028d0f [pluralsight] Extract chapter metadata (Closes #8993) 2016-03-27 02:10:52 +06:00
Sergey M․
00322ad4fd [lynda] Extract chapter metadata (#8993) 2016-03-27 02:00:36 +06:00
Sergey M․
4cf3489c6e [vevo] Update videoservice API URL (Closes #8900) 2016-03-27 01:11:11 +06:00
Sergey M․
b24ab3e341 [udemy] Improve paid course detection 2016-03-27 00:09:12 +06:00
Sergey M․
af4116f4f0 [udemy] Improve format_id 2016-03-27 00:02:52 +06:00
Sergey M․
f973e5d54e [udemy] Drop outputs' formats
Always results in 403
2016-03-26 23:55:07 +06:00
Sergey M․
62f55aa68a [udemy] Add outputs metadata to view_html formats 2016-03-26 23:54:12 +06:00
Sergey M․
02d7634d24 [udemy] Fix outputs' formats format_id 2016-03-26 23:43:25 +06:00
Sergey M․
48dce58ca9 [udemy] Use custom sorting 2016-03-26 23:42:46 +06:00
Sergey M․
efcba804f6 [udemy] Extract formats from view_html (Closes #8979) 2016-03-26 23:42:34 +06:00
Sergey M․
6dee688e6d [youtube:playlistsbase] Restrict playlist regex (Closes #8986) 2016-03-26 20:42:18 +06:00
Sergey M․
eedb7ba536 [YoutubeDL] Sort imports 2016-03-26 19:40:33 +06:00
Sergey M․
dcf77cf1a7 [YoutubeDL] Sanitize final URLs (Closes #8991) 2016-03-26 19:37:41 +06:00
Sergey M․
17bcc626bf [utils] Extract sanitize_url routine 2016-03-26 19:33:57 +06:00
Sergey M․
b5a5bbf376 [mailru] Extend _VALID_URL (Closes #8990) 2016-03-26 19:15:32 +06:00
Yen Chi Hsuan
e68d3a010f [twitter] Fix extraction (closes #8966)
HLS and DASH formats are no longer appeared in test cases. I keep them
for fear of triggering new errors.
2016-03-26 18:34:51 +08:00
Yen Chi Hsuan
d10fe8358c [generic] Add a test case for brightcove embed
Closes #8862
2016-03-26 18:30:43 +08:00
Yen Chi Hsuan
d6c340cae5 [brightcove] Extract more formats (#8862) 2016-03-26 18:21:07 +08:00
Yen Chi Hsuan
5964b598ff [brightcove] Support alternative BrightcoveExperience layout
The full URL lays in the `data` attribute of <object> (#8862)
2016-03-26 17:47:32 +08:00
124 changed files with 3221 additions and 1739 deletions

58
.github/ISSUE_TEMPLATE.md vendored Normal file

@@ -0,0 +1,58 @@
## Please follow the guide below
- You will be asked some questions and requested to provide some information, please read them **carefully** and answer honestly
- Put an `x` into all the boxes [ ] relevant to your *issue* (like that [x])
- Use *Preview* tab to see how your issue will actually look like
---
### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2016.04.13*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2016.04.13**
### Before submitting an *issue* make sure you have:
- [ ] At least skimmed through [README](https://github.com/rg3/youtube-dl/blob/master/README.md) and **most notably** [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
- [ ] [Searched](https://github.com/rg3/youtube-dl/search?type=Issues) the bugtracker for similar issues including closed ones
### What is the purpose of your *issue*?
- [ ] Bug report (encountered problems with youtube-dl)
- [ ] Site support request (request for adding support for a new site)
- [ ] Feature request (request for a new functionality)
- [ ] Question
- [ ] Other
---
### The following sections concretize particular purposed issues, you can erase any section (the contents between triple ---) not applicable to your *issue*
---
### If the purpose of this *issue* is a *bug report*, *site support request* or you are not completely sure provide the full verbose output as follows:
Add `-v` flag to **your command line** you run youtube-dl with, copy the **whole** output and insert it here. It should look similar to one below (replace it with **your** log inserted between triple ```):
```
$ youtube-dl -v <your command line>
[debug] System config: []
[debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] youtube-dl version 2016.04.13
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {}
...
<end of log>
```
---
### If the purpose of this *issue* is a *site support request* please provide all kinds of example URLs support for which should be included (replace following example URLs by **yours**):
- Single video: https://www.youtube.com/watch?v=BaW_jenozKc
- Single video: https://youtu.be/BaW_jenozKc
- Playlist: https://www.youtube.com/playlist?list=PL4lCao7KL_QFVb7Iudeipvc2BCavECqzc
---
### Description of your *issue*, suggested solution and other information
Explanation of your *issue* in arbitrary form goes here. Please make sure the [description is worded well enough to be understood](https://github.com/rg3/youtube-dl#is-the-description-of-the-issue-itself-sufficient). Provide as much context and examples as possible.
If work on your *issue* required an account credentials please provide them or explain how one can obtain them.

58
.github/ISSUE_TEMPLATE_tmpl.md vendored Normal file

@@ -0,0 +1,58 @@
## Please follow the guide below
- You will be asked some questions and requested to provide some information, please read them **carefully** and answer honestly
- Put an `x` into all the boxes [ ] relevant to your *issue* (like that [x])
- Use *Preview* tab to see how your issue will actually look like
---
### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *%(version)s*. If it's not read [this FAQ entry](https://github.com/rg3/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **%(version)s**
### Before submitting an *issue* make sure you have:
- [ ] At least skimmed through [README](https://github.com/rg3/youtube-dl/blob/master/README.md) and **most notably** [FAQ](https://github.com/rg3/youtube-dl#faq) and [BUGS](https://github.com/rg3/youtube-dl#bugs) sections
- [ ] [Searched](https://github.com/rg3/youtube-dl/search?type=Issues) the bugtracker for similar issues including closed ones
### What is the purpose of your *issue*?
- [ ] Bug report (encountered problems with youtube-dl)
- [ ] Site support request (request for adding support for a new site)
- [ ] Feature request (request for a new functionality)
- [ ] Question
- [ ] Other
---
### The following sections concretize particular purposed issues, you can erase any section (the contents between triple ---) not applicable to your *issue*
---
### If the purpose of this *issue* is a *bug report*, *site support request* or you are not completely sure provide the full verbose output as follows:
Add `-v` flag to **your command line** you run youtube-dl with, copy the **whole** output and insert it here. It should look similar to one below (replace it with **your** log inserted between triple ```):
```
$ youtube-dl -v <your command line>
[debug] System config: []
[debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] youtube-dl version %(version)s
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {}
...
<end of log>
```
---
### If the purpose of this *issue* is a *site support request* please provide all kinds of example URLs support for which should be included (replace following example URLs by **yours**):
- Single video: https://www.youtube.com/watch?v=BaW_jenozKc
- Single video: https://youtu.be/BaW_jenozKc
- Playlist: https://www.youtube.com/playlist?list=PL4lCao7KL_QFVb7Iudeipvc2BCavECqzc
---
### Description of your *issue*, suggested solution and other information
Explanation of your *issue* in arbitrary form goes here. Please make sure the [description is worded well enough to be understood](https://github.com/rg3/youtube-dl#is-the-description-of-the-issue-itself-sufficient). Provide as much context and examples as possible.
If work on your *issue* required an account credentials please provide them or explain how one can obtain them.
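The `%(version)s` placeholders in this template suggest plain Python `%`-style substitution. A minimal sketch of that step, assuming the version string is already known; the function name `render_issue_template` is hypothetical, and this is not the contents of `devscripts/make_issue_template.py`:

```python
def render_issue_template(template_text, version):
    """Fill every %(version)s placeholder in the template text.

    Note: with %-formatting, any literal percent sign in the template
    would have to be written as %% to survive the substitution.
    """
    return template_text % {'version': version}


line = "- [ ] I've **verified** and **I assure** that I'm running youtube-dl **%(version)s**"
rendered = render_issue_template(line, '2016.04.13')
```

Applied to the whole template file, the same call would yield the text shown in `.github/ISSUE_TEMPLATE.md`, with `2016.04.13` in place of each placeholder.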

1
.gitignore vendored

@@ -13,6 +13,7 @@ README.txt
youtube-dl.1
youtube-dl.bash-completion
youtube-dl.fish
+youtube_dl/extractor/lazy_extractors.py
youtube-dl
youtube-dl.exe
youtube-dl.tar.gz

View File

@@ -167,3 +167,4 @@ Kacper Michajłow
José Joaquín Atria
Viťas Strádal
Kagami Hiiragi
+Philip Huppert

View File

@@ -140,14 +140,14 @@ After you have ensured this site is distributing it's content legally, you can f
# TODO more properties (see youtube_dl/extractor/common.py)
}
```
-5. Add an import in [`youtube_dl/extractor/__init__.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/__init__.py).
+5. Add an import in [`youtube_dl/extractor/extractors.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/extractors.py).
6. Run `python test/test_download.py TestDownload.test_YourExtractor`. This *should fail* at first, but you can continually re-run it until you're done. If you decide to add more than one test, then rename ``_TEST`` to ``_TESTS`` and make it into a list of dictionaries. The tests will then be named `TestDownload.test_YourExtractor`, `TestDownload.test_YourExtractor_1`, `TestDownload.test_YourExtractor_2`, etc.
7. Have a look at [`youtube_dl/extractor/common.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py) for possible helper methods and a [detailed description of what your extractor should and may return](https://github.com/rg3/youtube-dl/blob/58525c94d547be1c8167d16c298bdd75506db328/youtube_dl/extractor/common.py#L68-L226). Add tests and code for as many as you want.
8. Keep in mind that the only mandatory fields in info dict for successful extraction process are `id`, `title` and either `url` or `formats`, i.e. these are the critical data the extraction does not make any sense without. This means that [any field](https://github.com/rg3/youtube-dl/blob/58525c94d547be1c8167d16c298bdd75506db328/youtube_dl/extractor/common.py#L138-L226) apart from aforementioned mandatory ones should be treated **as optional** and extraction should be **tolerate** to situations when sources for these fields can potentially be unavailable (even if they always available at the moment) and **future-proof** in order not to break the extraction of general purpose mandatory fields. For example, if you have some intermediate dict `meta` that is a source of metadata and it has a key `summary` that you want to extract and put into resulting info dict as `description`, you should be ready that this key may be missing from the `meta` dict, i.e. you should extract it as `meta.get('summary')` and not `meta['summary']`. Similarly, you should pass `fatal=False` when extracting data from a webpage with `_search_regex/_html_search_regex`.
9. Check the code with [flake8](https://pypi.python.org/pypi/flake8).
10. When the tests pass, [add](http://git-scm.com/docs/git-add) the new files and [commit](http://git-scm.com/docs/git-commit) them and [push](http://git-scm.com/docs/git-push) the result, like this:
-$ git add youtube_dl/extractor/__init__.py
+$ git add youtube_dl/extractor/extractors.py
$ git add youtube_dl/extractor/yourextractor.py
$ git commit -m '[yourextractor] Add new extractor'
$ git push origin yourextractor
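Step 8 above distinguishes mandatory fields from optional ones. A small sketch of that distinction, assuming an intermediate `meta` dict like the one the step describes; the function name `build_info` and the `title`/`url` keys of `meta` are hypothetical:

```python
def build_info(video_id, meta):
    """Build an info dict, treating everything but id/title/url as optional."""
    return {
        'id': video_id,
        # Mandatory fields: a missing key should fail loudly (KeyError).
        'title': meta['title'],
        'url': meta['url'],
        # Optional field: .get() yields None when 'summary' is absent,
        # so extraction of the mandatory fields is not broken.
        'description': meta.get('summary'),
    }


info = build_info('42', {'title': 'Example', 'url': 'http://example.com/v.mp4'})
```

Here `info['description']` is `None` because `meta` has no `summary` key, while a missing `title` would still raise an error.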

View File

@@ -1,7 +1,7 @@
all: youtube-dl README.md CONTRIBUTING.md README.txt youtube-dl.1 youtube-dl.bash-completion youtube-dl.zsh youtube-dl.fish supportedsites
clean:
-rm -rf youtube-dl.1.temp.md youtube-dl.1 youtube-dl.bash-completion README.txt MANIFEST build/ dist/ .coverage cover/ youtube-dl.tar.gz youtube-dl.zsh youtube-dl.fish *.dump *.part *.info.json *.mp4 *.flv *.mp3 *.avi CONTRIBUTING.md.tmp youtube-dl youtube-dl.exe
+rm -rf youtube-dl.1.temp.md youtube-dl.1 youtube-dl.bash-completion README.txt MANIFEST build/ dist/ .coverage cover/ youtube-dl.tar.gz youtube-dl.zsh youtube-dl.fish youtube_dl/extractor/lazy_extractors.py *.dump *.part *.info.json *.mp4 *.flv *.mp3 *.avi CONTRIBUTING.md.tmp ISSUE_TEMPLATE.md.tmp youtube-dl youtube-dl.exe
find . -name "*.pyc" -delete
find . -name "*.class" -delete
@@ -59,6 +59,9 @@ README.md: youtube_dl/*.py youtube_dl/*/*.py
CONTRIBUTING.md: README.md
$(PYTHON) devscripts/make_contributing.py README.md CONTRIBUTING.md
+.github/ISSUE_TEMPLATE.md: devscripts/make_issue_template.py .github/ISSUE_TEMPLATE_tmpl.md youtube_dl/version.py
+$(PYTHON) devscripts/make_issue_template.py .github/ISSUE_TEMPLATE_tmpl.md .github/ISSUE_TEMPLATE.md
supportedsites:
$(PYTHON) devscripts/make_supportedsites.py docs/supportedsites.md
@@ -85,6 +88,12 @@ youtube-dl.fish: youtube_dl/*.py youtube_dl/*/*.py devscripts/fish-completion.in
fish-completion: youtube-dl.fish
lazy-extractors: youtube_dl/extractor/lazy_extractors.py
_EXTRACTOR_FILES != find youtube_dl/extractor -iname '*.py' -and -not -iname 'lazy_extractors.py'
youtube_dl/extractor/lazy_extractors.py: devscripts/make_lazy_extractors.py devscripts/lazy_load_template.py $(_EXTRACTOR_FILES)
$(PYTHON) devscripts/make_lazy_extractors.py $@
youtube-dl.tar.gz: youtube-dl README.md README.txt youtube-dl.1 youtube-dl.bash-completion youtube-dl.zsh youtube-dl.fish
@tar -czf youtube-dl.tar.gz --transform "s|^|youtube-dl/|" --owner 0 --group 0 \
--exclude '*.DS_Store' \

View File

@@ -600,6 +600,7 @@ Also filtering work for comparisons `=` (equals), `!=` (not equals), `^=` (begin
- `vcodec`: Name of the video codec in use
- `container`: Name of the container format
- `protocol`: The protocol that will be used for the actual download, lower-case. `http`, `https`, `rtsp`, `rtmp`, `rtmpe`, `m3u8`, or `m3u8_native`
- `format_id`: A short description of the format
Note that none of the aforementioned meta fields are guaranteed to be present, since this depends solely on the metadata obtained by the particular extractor, i.e. the metadata offered by the video hoster.
@@ -888,14 +889,14 @@ After you have ensured this site is distributing it's content legally, you can f
# TODO more properties (see youtube_dl/extractor/common.py)
}
```
5. Add an import in [`youtube_dl/extractor/__init__.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/__init__.py).
5. Add an import in [`youtube_dl/extractor/extractors.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/extractors.py).
6. Run `python test/test_download.py TestDownload.test_YourExtractor`. This *should fail* at first, but you can continually re-run it until you're done. If you decide to add more than one test, then rename ``_TEST`` to ``_TESTS`` and make it into a list of dictionaries. The tests will then be named `TestDownload.test_YourExtractor`, `TestDownload.test_YourExtractor_1`, `TestDownload.test_YourExtractor_2`, etc.
7. Have a look at [`youtube_dl/extractor/common.py`](https://github.com/rg3/youtube-dl/blob/master/youtube_dl/extractor/common.py) for possible helper methods and a [detailed description of what your extractor should and may return](https://github.com/rg3/youtube-dl/blob/58525c94d547be1c8167d16c298bdd75506db328/youtube_dl/extractor/common.py#L68-L226). Add tests and code for as many as you want.
8. Keep in mind that the only mandatory fields in the info dict for a successful extraction process are `id`, `title` and either `url` or `formats`, i.e. these are the critical data without which the extraction does not make any sense. This means that [any field](https://github.com/rg3/youtube-dl/blob/58525c94d547be1c8167d16c298bdd75506db328/youtube_dl/extractor/common.py#L138-L226) apart from the aforementioned mandatory ones should be treated **as optional**, and extraction should be **tolerant** of situations where sources for these fields are unavailable (even if they are always available at the moment) and **future-proof**, so that it does not break the extraction of the general purpose mandatory fields. For example, if you have some intermediate dict `meta` that is a source of metadata and it has a key `summary` that you want to extract and put into the resulting info dict as `description`, you should be prepared for this key to be missing from the `meta` dict, i.e. you should extract it as `meta.get('summary')` and not `meta['summary']`. Similarly, you should pass `fatal=False` when extracting data from a webpage with `_search_regex/_html_search_regex`.
9. Check the code with [flake8](https://pypi.python.org/pypi/flake8).
10. When the tests pass, [add](http://git-scm.com/docs/git-add) the new files and [commit](http://git-scm.com/docs/git-commit) them and [push](http://git-scm.com/docs/git-push) the result, like this:
$ git add youtube_dl/extractor/__init__.py
$ git add youtube_dl/extractor/extractors.py
$ git add youtube_dl/extractor/yourextractor.py
$ git commit -m '[yourextractor] Add new extractor'
$ git push origin yourextractor
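
The advice in step 8 about mandatory and optional fields can be sketched roughly as follows. This is only an illustration: `search_regex` is a hypothetical stand-in that imitates the `fatal=False` behaviour described above (it is not the real `InfoExtractor._search_regex`), and `build_info`, the `uploader` field and the `data-uploader` attribute are assumptions made up for the example.

```python
import re


def search_regex(pattern, string, name, fatal=True):
    # Hypothetical stand-in for InfoExtractor._search_regex (illustration only):
    # return the first capture group, or None when fatal=False and nothing matches.
    mobj = re.search(pattern, string)
    if mobj:
        return mobj.group(1)
    if fatal:
        raise ValueError('Unable to extract %s' % name)
    return None


def build_info(meta, webpage):
    # Hypothetical helper that assembles an info dict from already fetched data.
    return {
        # Mandatory fields: direct indexing, so a missing key fails loudly.
        'id': meta['id'],
        'title': meta['title'],
        'url': meta['url'],
        # Optional fields: tolerate sources that are missing.
        'description': meta.get('summary'),
        'uploader': search_regex(
            r'data-uploader="([^"]+)"', webpage, 'uploader', fatal=False),
    }


info = build_info(
    {'id': '42', 'title': 'Example', 'url': 'http://example.com/v.mp4'},
    '<div>no uploader here</div>')
```

Here neither `summary` nor the uploader markup is present, so both optional fields simply come back as `None` while the mandatory fields are still filled in.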

View File

@@ -0,0 +1,19 @@
# encoding: utf-8
from __future__ import unicode_literals
import re
class LazyLoadExtractor(object):
_module = None
@classmethod
def ie_key(cls):
return cls.__name__[:-2]
def __new__(cls, *args, **kwargs):
mod = __import__(cls._module, fromlist=(cls.__name__,))
real_cls = getattr(mod, cls.__name__)
instance = real_cls.__new__(real_cls)
instance.__init__(*args, **kwargs)
return instance

View File

@@ -0,0 +1,29 @@
#!/usr/bin/env python
from __future__ import unicode_literals
import io
import optparse
def main():
parser = optparse.OptionParser(usage='%prog INFILE OUTFILE')
options, args = parser.parse_args()
if len(args) != 2:
parser.error('Expected an input and an output filename')
infile, outfile = args
with io.open(infile, encoding='utf-8') as inf:
issue_template_tmpl = inf.read()
# Get the version from youtube_dl/version.py without importing the package
exec(compile(open('youtube_dl/version.py').read(),
'youtube_dl/version.py', 'exec'))
out = issue_template_tmpl % {'version': locals()['__version__']}
with io.open(outfile, 'w', encoding='utf-8') as outf:
outf.write(out)
if __name__ == '__main__':
main()

View File

@@ -0,0 +1,63 @@
from __future__ import unicode_literals, print_function
from inspect import getsource
import os
from os.path import dirname as dirn
import sys
print('WARNING: Lazy loading extractors is an experimental feature that may not always work', file=sys.stderr)
sys.path.insert(0, dirn(dirn((os.path.abspath(__file__)))))
lazy_extractors_filename = sys.argv[1]
if os.path.exists(lazy_extractors_filename):
os.remove(lazy_extractors_filename)
from youtube_dl.extractor import _ALL_CLASSES
from youtube_dl.extractor.common import InfoExtractor
with open('devscripts/lazy_load_template.py', 'rt') as f:
module_template = f.read()
module_contents = [module_template + '\n' + getsource(InfoExtractor.suitable)]
ie_template = '''
class {name}(LazyLoadExtractor):
_VALID_URL = {valid_url!r}
_module = '{module}'
'''
make_valid_template = '''
@classmethod
def _make_valid_url(cls):
return {valid_url!r}
'''
def build_lazy_ie(ie, name):
valid_url = getattr(ie, '_VALID_URL', None)
s = ie_template.format(
name=name,
valid_url=valid_url,
module=ie.__module__)
if ie.suitable.__func__ is not InfoExtractor.suitable.__func__:
s += '\n' + getsource(ie.suitable)
if hasattr(ie, '_make_valid_url'):
# search extractors
s += make_valid_template.format(valid_url=ie._make_valid_url())
return s
names = []
for ie in list(sorted(_ALL_CLASSES[:-1], key=lambda cls: cls.ie_key())) + _ALL_CLASSES[-1:]:
name = ie.ie_key() + 'IE'
src = build_lazy_ie(ie, name)
module_contents.append(src)
names.append(name)
module_contents.append(
'_ALL_CLASSES = [{0}]'.format(', '.join(names)))
module_src = '\n'.join(module_contents) + '\n'
with open(lazy_extractors_filename, 'wt') as f:
f.write(module_src)

View File

@@ -45,9 +45,9 @@ fi
/bin/echo -e "\n### Changing version in version.py..."
sed -i "s/__version__ = '.*'/__version__ = '$version'/" youtube_dl/version.py
/bin/echo -e "\n### Committing documentation and youtube_dl/version.py..."
make README.md CONTRIBUTING.md supportedsites
git add README.md CONTRIBUTING.md docs/supportedsites.md youtube_dl/version.py
/bin/echo -e "\n### Committing documentation, templates and youtube_dl/version.py..."
make README.md CONTRIBUTING.md .github/ISSUE_TEMPLATE.md supportedsites
git add README.md CONTRIBUTING.md .github/ISSUE_TEMPLATE.md docs/supportedsites.md youtube_dl/version.py
git commit -m "release $version"
/bin/echo -e "\n### Now tagging, signing and pushing..."

View File

@@ -57,6 +57,7 @@
- **AudioBoom**
- **audiomack**
- **audiomack:album**
- **auroravid**: AuroraVid
- **Azubu**
- **AzubuLive**
- **BaiduVideo**: 百度视频
@@ -92,12 +93,14 @@
- **BYUtv**
- **Camdemy**
- **CamdemyFolder**
- **CamWithHer**
- **canalc2.tv**
- **Canalplus**: canalplus.fr, piwiplus.fr and d8.tv
- **Canvas**
- **CBC**
- **CBCPlayer**
- **CBS**
- **CBSInteractive**
- **CBSNews**: CBS News
- **CBSNewsLiveVideo**: CBS News Live Videos
- **CBSSports**
@@ -112,13 +115,14 @@
- **Cinemassacre**
- **Clipfish**
- **cliphunter**
- **ClipRs**
- **Clipsyndicate**
- **cloudtime**: CloudTime
- **Cloudy**
- **Clubic**
- **Clyp**
- **cmt.com**
- **CNET**
- **CNBC**
- **CNN**
- **CNNArticle**
- **CNNBlogs**
@@ -134,6 +138,7 @@
- **CrooksAndLiars**
- **Crunchyroll**
- **crunchyroll:playlist**
- **CSNNE**
- **CSpan**: C-SPAN
- **CtsNews**: 華視新聞
- **culturebox.francetvinfo.fr**
@@ -282,7 +287,6 @@
- **ivi:compilation**: ivi.ru compilations
- **ivideon**: Ivideon TV
- **Izlesene**
- **JadoreCettePub**
- **JeuxVideo**
- **Jove**
- **jpopsuki.tv**
@@ -376,7 +380,8 @@
- **myvideo** (Currently broken)
- **MyVidster**
- **n-tv.de**
- **NationalGeographic**
- **natgeo**
- **natgeo:channel**
- **Naver**
- **NBA**
- **NBC**
@@ -416,7 +421,6 @@
- **Normalboots**
- **NosVideo**
- **Nova**: TN.cz, Prásk.tv, Nova.cz, Novaplus.cz, FANDA.tv, Krásná.cz and Doma.cz
- **novamov**: NovaMov
- **nowness**
- **nowness:playlist**
- **nowness:series**
@@ -480,6 +484,7 @@
- **Pornotube**
- **PornoVoisines**
- **PornoXO**
- **PressTV**
- **PrimeShareTV**
- **PromptFile**
- **prosiebensat1**: ProSiebenSat.1 Digital
@@ -618,7 +623,6 @@
- **Telegraaf**
- **TeleMB**
- **TeleTask**
- **TenPlay**
- **TF1**
- **TheIntercept**
- **TheOnion**
@@ -740,6 +744,7 @@
- **vlive**
- **Vodlocker**
- **VoiceRepublic**
- **VoxMedia**
- **Vporn**
- **vpro**: npo.nl and ntr.nl
- **VRT**

View File

@@ -2,5 +2,5 @@
universal = True
[flake8]
exclude = youtube_dl/extractor/__init__.py,devscripts/buildserver.py,setup.py,build,.git
exclude = youtube_dl/extractor/__init__.py,devscripts/buildserver.py,devscripts/lazy_load_template.py,devscripts/make_issue_template.py,setup.py,build,.git
ignore = E402,E501,E731

View File

@@ -8,11 +8,12 @@ import warnings
import sys
try:
from setuptools import setup
from setuptools import setup, Command
setuptools_available = True
except ImportError:
from distutils.core import setup
from distutils.core import setup, Command
setuptools_available = False
from distutils.spawn import spawn
try:
# This will create an exe that needs Microsoft Visual C++ 2008
@@ -70,6 +71,22 @@ else:
else:
params['scripts'] = ['bin/youtube-dl']
class build_lazy_extractors(Command):
description = "Build the extractor lazy loading module"
user_options = []
def initialize_options(self):
pass
def finalize_options(self):
pass
def run(self):
spawn(
[sys.executable, 'devscripts/make_lazy_extractors.py', 'youtube_dl/extractor/lazy_extractors.py'],
dry_run=self.dry_run,
)
# Get the version from youtube_dl/version.py without importing the package
exec(compile(open('youtube_dl/version.py').read(),
'youtube_dl/version.py', 'exec'))
@@ -107,5 +124,6 @@ setup(
"Programming Language :: Python :: 3.4",
],
cmdclass={'build_lazy_extractors': build_lazy_extractors},
**params
)

View File

@@ -143,6 +143,9 @@ def expect_value(self, got, expected, field):
expect_value(self, item_got, item_expected, field)
else:
if isinstance(expected, compat_str) and expected.startswith('md5:'):
self.assertTrue(
isinstance(got, compat_str),
'Expected field %s to be a unicode object, but got value %r of type %r' % (field, got, type(got)))
got = 'md5:' + md5(got)
elif isinstance(expected, compat_str) and expected.startswith('mincount:'):
self.assertTrue(

View File

@@ -11,6 +11,7 @@ sys.path.insert(0, os.path.dirname(os.path.dirname(os.path.abspath(__file__))))
from test.helper import FakeYDL
from youtube_dl.extractor.common import InfoExtractor
from youtube_dl.extractor import YoutubeIE, get_info_extractor
from youtube_dl.utils import encode_data_uri, strip_jsonp, ExtractorError
class TestIE(InfoExtractor):
@@ -66,5 +67,14 @@ class TestInfoExtractor(unittest.TestCase):
self.assertEqual(ie._html_search_meta('e', html), '5')
self.assertEqual(ie._html_search_meta('f', html), '6')
def test_download_json(self):
uri = encode_data_uri(b'{"foo": "blah"}', 'application/json')
self.assertEqual(self.ie._download_json(uri, None), {'foo': 'blah'})
uri = encode_data_uri(b'callback({"foo": "blah"})', 'application/javascript')
self.assertEqual(self.ie._download_json(uri, None, transform_source=strip_jsonp), {'foo': 'blah'})
uri = encode_data_uri(b'{"foo": invalid}', 'application/json')
self.assertRaises(ExtractorError, self.ie._download_json, uri, None)
self.assertEqual(self.ie._download_json(uri, None, fatal=False), None)
if __name__ == '__main__':
unittest.main()

View File

@@ -76,6 +76,10 @@ class TestCompat(unittest.TestCase):
self.assertEqual(compat_urllib_parse_urlencode({'abc': b'def'}), 'abc=def')
self.assertEqual(compat_urllib_parse_urlencode({b'abc': 'def'}), 'abc=def')
self.assertEqual(compat_urllib_parse_urlencode({b'abc': b'def'}), 'abc=def')
self.assertEqual(compat_urllib_parse_urlencode([('abc', 'def')]), 'abc=def')
self.assertEqual(compat_urllib_parse_urlencode([('abc', b'def')]), 'abc=def')
self.assertEqual(compat_urllib_parse_urlencode([(b'abc', 'def')]), 'abc=def')
self.assertEqual(compat_urllib_parse_urlencode([(b'abc', b'def')]), 'abc=def')
def test_compat_shlex_split(self):
self.assertEqual(compat_shlex_split('-option "one two"'), ['-option', 'one two'])

View File

@@ -20,6 +20,7 @@ from youtube_dl.utils import (
args_to_str,
encode_base_n,
clean_html,
date_from_str,
DateRange,
detect_exe_version,
determine_ext,
@@ -234,6 +235,13 @@ class TestUtil(unittest.TestCase):
self.assertEqual(unescapeHTML('&eacute;'), 'é')
self.assertEqual(unescapeHTML('&#2013266066;'), '&#2013266066;')
def test_date_from_str(self):
self.assertEqual(date_from_str('yesterday'), date_from_str('now-1day'))
self.assertEqual(date_from_str('now+7day'), date_from_str('now+1week'))
self.assertEqual(date_from_str('now+14day'), date_from_str('now+2week'))
self.assertEqual(date_from_str('now+365day'), date_from_str('now+1year'))
self.assertEqual(date_from_str('now+30day'), date_from_str('now+1month'))
def test_daterange(self):
_20century = DateRange("19000101", "20000101")
self.assertFalse("17890714" in _20century)

View File

@@ -39,6 +39,8 @@ from .compat import (
compat_urllib_request_DataHandler,
)
from .utils import (
age_restricted,
args_to_str,
ContentTooShortError,
date_from_str,
DateRange,
@@ -58,13 +60,16 @@ from .utils import (
PagedList,
parse_filesize,
PerRequestProxyHandler,
PostProcessingError,
platform_name,
PostProcessingError,
preferredencoding,
prepend_extension,
render_table,
replace_extension,
SameFileError,
sanitize_filename,
sanitize_path,
sanitize_url,
sanitized_Request,
std_headers,
subtitles_filename,
@@ -75,13 +80,9 @@ from .utils import (
write_string,
YoutubeDLCookieProcessor,
YoutubeDLHandler,
prepend_extension,
replace_extension,
args_to_str,
age_restricted,
)
from .cache import Cache
from .extractor import get_info_extractor, gen_extractors
from .extractor import get_info_extractor, gen_extractor_classes, _LAZY_LOADER
from .downloader import get_suitable_downloader
from .downloader.rtmp import rtmpdump_version
from .postprocessor import (
@@ -377,8 +378,9 @@ class YoutubeDL(object):
def add_info_extractor(self, ie):
"""Add an InfoExtractor object to the end of the list."""
self._ies.append(ie)
self._ies_instances[ie.ie_key()] = ie
ie.set_downloader(self)
if not isinstance(ie, type):
self._ies_instances[ie.ie_key()] = ie
ie.set_downloader(self)
def get_info_extractor(self, ie_key):
"""
@@ -396,7 +398,7 @@ class YoutubeDL(object):
"""
Add the InfoExtractors returned by gen_extractors to the end of the list
"""
for ie in gen_extractors():
for ie in gen_extractor_classes():
self.add_info_extractor(ie)
def add_post_processor(self, pp):
@@ -660,6 +662,7 @@ class YoutubeDL(object):
if not ie.suitable(url):
continue
ie = self.get_info_extractor(ie.ie_key())
if not ie.working():
self.report_warning('The program functionality for this site has been marked as broken, '
'and will probably not work.')
@@ -1229,6 +1232,7 @@ class YoutubeDL(object):
t.get('preference'), t.get('width'), t.get('height'),
t.get('id'), t.get('url')))
for i, t in enumerate(thumbnails):
t['url'] = sanitize_url(t['url'])
if t.get('width') and t.get('height'):
t['resolution'] = '%dx%d' % (t['width'], t['height'])
if t.get('id') is None:
@@ -1238,7 +1242,10 @@ class YoutubeDL(object):
self.list_thumbnails(info_dict)
return
if thumbnails and 'thumbnail' not in info_dict:
thumbnail = info_dict.get('thumbnail')
if thumbnail:
info_dict['thumbnail'] = sanitize_url(thumbnail)
elif thumbnails:
info_dict['thumbnail'] = thumbnails[-1]['url']
if 'display_id' not in info_dict and 'id' in info_dict:
@@ -1263,6 +1270,8 @@ class YoutubeDL(object):
if subtitles:
for _, subtitle in subtitles.items():
for subtitle_format in subtitle:
if subtitle_format.get('url'):
subtitle_format['url'] = sanitize_url(subtitle_format['url'])
if 'ext' not in subtitle_format:
subtitle_format['ext'] = determine_ext(subtitle_format['url']).lower()
@@ -1292,6 +1301,8 @@ class YoutubeDL(object):
if 'url' not in format:
raise ExtractorError('Missing "url" key in result (index %d)' % i)
format['url'] = sanitize_url(format['url'])
if format.get('format_id') is None:
format['format_id'] = compat_str(i)
else:
@@ -1948,6 +1959,8 @@ class YoutubeDL(object):
write_string(encoding_str, encoding=None)
self._write_string('[debug] youtube-dl version ' + __version__ + '\n')
if _LAZY_LOADER:
self._write_string('[debug] Lazy loading extractors enabled' + '\n')
try:
sp = subprocess.Popen(
['git', 'rev-parse', '--short', 'HEAD'],

View File

@@ -181,7 +181,8 @@ except ImportError: # Python 2
if isinstance(e, dict):
e = encode_dict(e)
elif isinstance(e, (list, tuple,)):
e = encode_list(e)
list_e = encode_list(e)
e = tuple(list_e) if isinstance(e, tuple) else list_e
elif isinstance(e, compat_str):
e = e.encode(encoding)
return e

View File

@@ -223,6 +223,12 @@ def write_metadata_tag(stream, metadata):
write_unsigned_int(stream, FLV_TAG_HEADER_LEN + len(metadata))
def remove_encrypted_media(media):
return list(filter(lambda e: 'drmAdditionalHeaderId' not in e.attrib and
'drmAdditionalHeaderSetId' not in e.attrib,
media))
def _add_ns(prop):
return '{http://ns.adobe.com/f4m/1.0}%s' % prop
@@ -244,9 +250,7 @@ class F4mFD(FragmentFD):
# without drmAdditionalHeaderId or drmAdditionalHeaderSetId attribute
if 'id' not in e.attrib:
self.report_error('Missing ID in f4m DRM')
media = list(filter(lambda e: 'drmAdditionalHeaderId' not in e.attrib and
'drmAdditionalHeaderSetId' not in e.attrib,
media))
media = remove_encrypted_media(media)
if not media:
self.report_error('Unsupported DRM')
return media

File diff suppressed because it is too large

View File

@@ -44,6 +44,7 @@ class Abc7NewsIE(InfoExtractor):
'contentURL', webpage, 'm3u8 url', fatal=True)
formats = self._extract_m3u8_formats(m3u8, display_id, 'mp4')
self._sort_formats(formats)
title = self._og_search_title(webpage).strip()
description = self._og_search_description(webpage).strip()

View File

@@ -2,10 +2,14 @@
from __future__ import unicode_literals
import re
import functools
from .common import InfoExtractor
from ..compat import compat_str
from ..utils import int_or_none
from ..utils import (
int_or_none,
OnDemandPagedList,
)
class ACastIE(InfoExtractor):
@@ -26,13 +30,8 @@ class ACastIE(InfoExtractor):
def _real_extract(self, url):
channel, display_id = re.match(self._VALID_URL, url).groups()
embed_page = self._download_webpage(
re.sub('(?:www\.)?acast\.com', 'embedcdn.acast.com', url), display_id)
cast_data = self._parse_json(self._search_regex(
r'window\[\'acast/queries\'\]\s*=\s*([^;]+);', embed_page, 'acast data'),
display_id)['GetAcast/%s/%s' % (channel, display_id)]
cast_data = self._download_json(
'https://embed.acast.com/api/acasts/%s/%s' % (channel, display_id), display_id)
return {
'id': compat_str(cast_data['id']),
'display_id': display_id,
@@ -58,15 +57,26 @@ class ACastChannelIE(InfoExtractor):
'playlist_mincount': 20,
}
_API_BASE_URL = 'https://www.acast.com/api/'
_PAGE_SIZE = 10
@classmethod
def suitable(cls, url):
return False if ACastIE.suitable(url) else super(ACastChannelIE, cls).suitable(url)
def _real_extract(self, url):
display_id = self._match_id(url)
channel_data = self._download_json(self._API_BASE_URL + 'channels/%s' % display_id, display_id)
casts = self._download_json(self._API_BASE_URL + 'channels/%s/acasts' % display_id, display_id)
entries = [self.url_result('https://www.acast.com/%s/%s' % (display_id, cast['url']), 'ACast') for cast in casts]
def _fetch_page(self, channel_slug, page):
casts = self._download_json(
self._API_BASE_URL + 'channels/%s/acasts?page=%s' % (channel_slug, page),
channel_slug, note='Download page %d of channel data' % page)
for cast in casts:
yield self.url_result(
'https://www.acast.com/%s/%s' % (channel_slug, cast['url']),
'ACast', cast['id'])
return self.playlist_result(entries, compat_str(channel_data['id']), channel_data['name'], channel_data.get('description'))
def _real_extract(self, url):
channel_slug = self._match_id(url)
channel_data = self._download_json(
self._API_BASE_URL + 'channels/%s' % channel_slug, channel_slug)
entries = OnDemandPagedList(functools.partial(
self._fetch_page, channel_slug), self._PAGE_SIZE)
return self.playlist_result(entries, compat_str(
channel_data['id']), channel_data['name'], channel_data.get('description'))

View File

@@ -1,13 +1,19 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import smuggle_url
from ..utils import (
smuggle_url,
update_url_query,
unescapeHTML,
)
class AENetworksIE(InfoExtractor):
IE_NAME = 'aenetworks'
IE_DESC = 'A+E Networks: A&E, Lifetime, History.com, FYI Network'
_VALID_URL = r'https?://(?:www\.)?(?:(?:history|aetv|mylifetime)\.com|fyi\.tv)/(?:[^/]+/)+(?P<id>[^/]+?)(?:$|[?#])'
_VALID_URL = r'https?://(?:www\.)?(?:(?:history|aetv|mylifetime)\.com|fyi\.tv)/(?P<type>[^/]+)/(?:[^/]+/)+(?P<id>[^/]+?)(?:$|[?#])'
_TESTS = [{
'url': 'http://www.history.com/topics/valentines-day/history-of-valentines-day/videos/bet-you-didnt-know-valentines-day?m=528e394da93ae&s=undefined&f=1&free=false',
@@ -16,6 +22,9 @@ class AENetworksIE(InfoExtractor):
'ext': 'mp4',
'title': "Bet You Didn't Know: Valentine's Day",
'description': 'md5:7b57ea4829b391995b405fa60bd7b5f7',
'timestamp': 1375819729,
'upload_date': '20130806',
'uploader': 'AENE-NEW',
},
'params': {
# m3u8 download
@@ -25,15 +34,15 @@ class AENetworksIE(InfoExtractor):
'expected_warnings': ['JSON-LD'],
}, {
'url': 'http://www.history.com/shows/mountain-men/season-1/episode-1',
'md5': '8ff93eb073449f151d6b90c0ae1ef0c7',
'info_dict': {
'id': 'eg47EERs_JsZ',
'ext': 'mp4',
'title': 'Winter Is Coming',
'description': 'md5:641f424b7a19d8e24f26dea22cf59d74',
},
'params': {
# m3u8 download
'skip_download': True,
'timestamp': 1338306241,
'upload_date': '20120529',
'uploader': 'AENE-NEW',
},
'add_ie': ['ThePlatform'],
}, {
@@ -48,7 +57,7 @@ class AENetworksIE(InfoExtractor):
}]
def _real_extract(self, url):
video_id = self._match_id(url)
page_type, video_id = re.match(self._VALID_URL, url).groups()
webpage = self._download_webpage(url, video_id)
@@ -56,11 +65,23 @@ class AENetworksIE(InfoExtractor):
r'data-href="[^"]*/%s"[^>]+data-release-url="([^"]+)"' % video_id,
r"media_url\s*=\s*'([^']+)'"
]
video_url = self._search_regex(video_url_re, webpage, 'video url')
video_url = unescapeHTML(self._search_regex(video_url_re, webpage, 'video url'))
query = {'mbr': 'true'}
if page_type == 'shows':
query['assetTypes'] = 'medium_video_s3'
if 'switch=hds' in video_url:
query['switch'] = 'hls'
info = self._search_json_ld(webpage, video_id, fatal=False)
info.update({
'_type': 'url_transparent',
'url': smuggle_url(video_url, {'sig': {'key': 'crazyjava', 'secret': 's3cr3t'}}),
'url': smuggle_url(
update_url_query(video_url, query),
{
'sig': {
'key': 'crazyjava',
'secret': 's3cr3t'},
'force_smil_url': True
}),
})
return info

View File

@@ -69,12 +69,14 @@ class AMPIE(InfoExtractor):
self._sort_formats(formats)
timestamp = parse_iso8601(item.get('pubDate'), ' ') or parse_iso8601(item.get('dc-date'))
return {
'id': video_id,
'title': get_media_node('title'),
'description': get_media_node('description'),
'thumbnails': thumbnails,
'timestamp': parse_iso8601(item.get('pubDate'), ' '),
'timestamp': timestamp,
'duration': int_or_none(media_content[0].get('@attributes', {}).get('duration')),
'subtitles': subtitles,
'formats': formats,

View File

@@ -1,11 +1,18 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
ExtractorError,
int_or_none,
)
class AolIE(InfoExtractor):
IE_NAME = 'on.aol.com'
_VALID_URL = r'(?:aol-video:|https?://on\.aol\.com/video/.*-)(?P<id>[0-9]+)(?:$|\?)'
_VALID_URL = r'(?:aol-video:|https?://on\.aol\.com/video/.*-)(?P<id>[^/?-]+)'
_TESTS = [{
'url': 'http://on.aol.com/video/u-s--official-warns-of-largest-ever-irs-phone-scam-518167793?icid=OnHomepageC2Wide_MustSee_Img',
@@ -14,13 +21,79 @@ class AolIE(InfoExtractor):
'id': '518167793',
'ext': 'mp4',
'title': 'U.S. Official Warns Of \'Largest Ever\' IRS Phone Scam',
'description': 'A major phone scam has cost thousands of taxpayers more than $1 million, with less than a month until income tax returns are due to the IRS.',
'timestamp': 1395405060,
'upload_date': '20140321',
'uploader': 'Newsy Studio',
},
'add_ie': ['FiveMin'],
'params': {
# m3u8 download
'skip_download': True,
}
}, {
'url': 'http://on.aol.com/video/netflix-is-raising-rates-5707d6b8e4b090497b04f706?context=PC:homepage:PL1944:1460189336183',
'info_dict': {
'id': '5707d6b8e4b090497b04f706',
'ext': 'mp4',
'title': 'Netflix is Raising Rates',
'description': 'Netflix is rewarding millions of its long-standing members with an increase in cost. Veuers Carly Figueroa has more.',
'upload_date': '20160408',
'timestamp': 1460123280,
'uploader': 'Veuer',
},
'params': {
# m3u8 download
'skip_download': True,
}
}]
def _real_extract(self, url):
video_id = self._match_id(url)
return self.url_result('5min:%s' % video_id)
response = self._download_json(
'https://feedapi.b2c.on.aol.com/v1.0/app/videos/aolon/%s/details' % video_id,
video_id)['response']
if response['statusText'] != 'Ok':
raise ExtractorError('%s said: %s' % (self.IE_NAME, response['statusText']), expected=True)
video_data = response['data']
formats = []
m3u8_url = video_data.get('videoMasterPlaylist')
if m3u8_url:
formats.extend(self._extract_m3u8_formats(
m3u8_url, video_id, 'mp4', m3u8_id='hls', fatal=False))
for rendition in video_data.get('renditions', []):
video_url = rendition.get('url')
if not video_url:
continue
ext = rendition.get('format')
if ext == 'm3u8':
formats.extend(self._extract_m3u8_formats(
video_url, video_id, 'mp4', m3u8_id='hls', fatal=False))
else:
f = {
'url': video_url,
'format_id': rendition.get('quality'),
}
mobj = re.search(r'(\d+)x(\d+)', video_url)
if mobj:
f.update({
'width': int(mobj.group(1)),
'height': int(mobj.group(2)),
})
formats.append(f)
self._sort_formats(formats, ('width', 'height', 'tbr', 'format_id'))
return {
'id': video_id,
'title': video_data['title'],
'duration': int_or_none(video_data.get('duration')),
'timestamp': int_or_none(video_data.get('publishDate')),
'view_count': int_or_none(video_data.get('views')),
'description': video_data.get('description'),
'uploader': video_data.get('videoOwner'),
'formats': formats,
}
class AolFeaturesIE(InfoExtractor):

View File

@@ -83,7 +83,7 @@ class ARDMediathekIE(InfoExtractor):
subtitle_url = media_info.get('_subtitleUrl')
if subtitle_url:
subtitles['de'] = [{
'ext': 'srt',
'ext': 'ttml',
'url': subtitle_url,
}]

View File

@@ -337,7 +337,7 @@ class ArteTVEmbedIE(ArteTVPlus7IE):
IE_NAME = 'arte.tv:embed'
_VALID_URL = r'''(?x)
http://www\.arte\.tv
/playerv2/embed\.php\?json_url=
/(?:playerv2/embed|arte_vp/index)\.php\?json_url=
(?P<json_url>
http://arte\.tv/papi/tvguide/videos/stream/player/
(?P<lang>[^/]+)/(?P<id>[^/]+)[^&]*

View File

@@ -120,6 +120,7 @@ class AzubuLiveIE(InfoExtractor):
bc_info = self._download_json(req, user)
m3u8_url = next(source['src'] for source in bc_info['sources'] if source['container'] == 'M2TS')
formats = self._extract_m3u8_formats(m3u8_url, user, ext='mp4')
self._sort_formats(formats)
return {
'id': info['id'],

View File

@@ -328,6 +328,7 @@ class BBCCoUkIE(InfoExtractor):
'format_id': '%s_%s' % (service, format['format_id']),
'abr': abr,
'acodec': acodec,
'vcodec': 'none',
})
formats.extend(conn_formats)
return formats
@@ -688,6 +689,10 @@ class BBCIE(BBCCoUkIE):
# custom redirection to www.bbc.com
'url': 'http://www.bbc.co.uk/news/science-environment-33661876',
'only_matching': True,
}, {
# single video article embedded with data-media-vpid
'url': 'http://www.bbc.co.uk/sport/rowing/35908187',
'only_matching': True,
}]
@classmethod
@@ -817,7 +822,7 @@ class BBCIE(BBCCoUkIE):
# single video story (e.g. http://www.bbc.com/travel/story/20150625-sri-lankas-spicy-secret)
programme_id = self._search_regex(
[r'data-video-player-vpid="(%s)"' % self._ID_REGEX,
[r'data-(?:video-player|media)-vpid="(%s)"' % self._ID_REGEX,
r'<param[^>]+name="externalIdentifier"[^>]+value="(%s)"' % self._ID_REGEX,
r'videoId\s*:\s*["\'](%s)["\']' % self._ID_REGEX],
webpage, 'vpid', default=None)

View File

@@ -33,8 +33,33 @@ class BeegIE(InfoExtractor):
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
cpl_url = self._search_regex(
r'<script[^>]+src=(["\'])(?P<url>(?:https?:)?//static\.beeg\.com/cpl/\d+\.js.*?)\1',
webpage, 'cpl', default=None, group='url')
beeg_version, beeg_salt = [None] * 2
if cpl_url:
cpl = self._download_webpage(
self._proto_relative_url(cpl_url), video_id,
'Downloading cpl JS', fatal=False)
if cpl:
beeg_version = self._search_regex(
r'beeg_version\s*=\s*(\d+)', cpl,
'beeg version', default=None) or self._search_regex(
r'/(\d+)\.js', cpl_url, 'beeg version', default=None)
beeg_salt = self._search_regex(
r'beeg_salt\s*=\s*(["\'])(?P<beeg_salt>.+?)\1', cpl, 'beeg beeg_salt',
default=None, group='beeg_salt')
beeg_version = beeg_version or '1750'
beeg_salt = beeg_salt or 'MIDtGaw96f0N1kMMAM1DE46EC9pmFr'
video = self._download_json(
'https://api.beeg.com/api/v5/video/%s' % video_id, video_id)
'http://api.beeg.com/api/v6/%s/video/%s' % (beeg_version, video_id),
video_id)
def split(o, e):
def cut(s, x):
@@ -50,8 +75,8 @@ class BeegIE(InfoExtractor):
return n
def decrypt_key(key):
# Reverse engineered from http://static.beeg.com/cpl/1105.js
a = '5ShMcIQlssOd7zChAIOlmeTZDaUxULbJRnywYaiB'
# Reverse engineered from http://static.beeg.com/cpl/1738.js
a = beeg_salt
e = compat_urllib_parse_unquote(key)
o = ''.join([
compat_chr(compat_ord(e[n]) - compat_ord(a[n % len(a)]) % 21)
@@ -101,5 +126,5 @@ class BeegIE(InfoExtractor):
'duration': duration,
'tags': tags,
'formats': formats,
'age_limit': 18,
'age_limit': self._rta_search(webpage),
}

View File

@@ -94,6 +94,7 @@ class BetIE(InfoExtractor):
xpath_with_ns('./media:thumbnail', NS_MAP)).get('url')
formats = self._extract_smil_formats(smil_url, display_id)
self._sort_formats(formats)
return {
'id': video_id,

View File

@@ -15,6 +15,9 @@ class BravoTVIE(InfoExtractor):
'ext': 'mp4',
'title': 'Last Chance Kitchen Returns',
'description': 'S13: Last Chance Kitchen Returns for Top Chef Season 13',
'timestamp': 1448926740,
'upload_date': '20151130',
'uploader': 'NBCU-BRAV',
}
}

View File

@@ -46,6 +46,9 @@ class BrightcoveLegacyIE(InfoExtractor):
'title': 'Xavier Sala i Martín: “Un banc que no presta és un banc zombi que no serveix per a res”',
'uploader': '8TV',
'description': 'md5:a950cc4285c43e44d763d036710cd9cd',
'timestamp': 1368213670,
'upload_date': '20130510',
'uploader_id': '1589608506001',
}
},
{
@@ -57,6 +60,9 @@ class BrightcoveLegacyIE(InfoExtractor):
'title': 'JVMLS 2012: Arrays 2.0 - Opportunities and Challenges',
'description': 'John Rose speaks at the JVM Language Summit, August 1, 2012.',
'uploader': 'Oracle',
'timestamp': 1344975024,
'upload_date': '20120814',
'uploader_id': '1460825906',
},
},
{
@@ -68,6 +74,9 @@ class BrightcoveLegacyIE(InfoExtractor):
'title': 'This Bracelet Acts as a Personal Thermostat',
'description': 'md5:547b78c64f4112766ccf4e151c20b6a0',
'uploader': 'Mashable',
'timestamp': 1382041798,
'upload_date': '20131017',
'uploader_id': '1130468786001',
},
},
{
@@ -85,14 +94,17 @@ class BrightcoveLegacyIE(InfoExtractor):
{
# test flv videos served by akamaihd.net
# From http://www.redbull.com/en/bike/stories/1331655643987/replay-uci-dh-world-cup-2014-from-fort-william
'url': 'http://c.brightcove.com/services/viewer/htmlFederated?%40videoPlayer=ref%3ABC2996102916001&linkBaseURL=http%3A%2F%2Fwww.redbull.com%2Fen%2Fbike%2Fvideos%2F1331655630249%2Freplay-uci-fort-william-2014-dh&playerKey=AQ%7E%7E%2CAAAApYJ7UqE%7E%2Cxqr_zXk0I-zzNndy8NlHogrCb5QdyZRf&playerID=1398061561001#__youtubedl_smuggle=%7B%22Referer%22%3A+%22http%3A%2F%2Fwww.redbull.com%2Fen%2Fbike%2Fstories%2F1331655643987%2Freplay-uci-dh-world-cup-2014-from-fort-william%22%7D',
'url': 'http://c.brightcove.com/services/viewer/htmlFederated?%40videoPlayer=ref%3Aevent-stream-356&linkBaseURL=http%3A%2F%2Fwww.redbull.com%2Fen%2Fbike%2Fvideos%2F1331655630249%2Freplay-uci-fort-william-2014-dh&playerKey=AQ%7E%7E%2CAAAApYJ7UqE%7E%2Cxqr_zXk0I-zzNndy8NlHogrCb5QdyZRf&playerID=1398061561001#__youtubedl_smuggle=%7B%22Referer%22%3A+%22http%3A%2F%2Fwww.redbull.com%2Fen%2Fbike%2Fstories%2F1331655643987%2Freplay-uci-dh-world-cup-2014-from-fort-william%22%7D',
# The md5 checksum changes on each download
'info_dict': {
'id': '2996102916001',
'id': '3750436379001',
'ext': 'flv',
'title': 'UCI MTB World Cup 2014: Fort William, UK - Downhill Finals',
'uploader': 'Red Bull TV',
'uploader': 'RBTV Old (do not use)',
'description': 'UCI MTB World Cup 2014: Fort William, UK - Downhill Finals',
'timestamp': 1409122195,
'upload_date': '20140827',
'uploader_id': '710858724001',
},
},
{
@@ -106,6 +118,12 @@ class BrightcoveLegacyIE(InfoExtractor):
'playlist_mincount': 7,
},
]
FLV_VCODECS = {
1: 'SORENSON',
2: 'ON2',
3: 'H264',
4: 'VP8',
}
@classmethod
def _build_brighcove_url(cls, object_str):
@@ -136,13 +154,16 @@ class BrightcoveLegacyIE(InfoExtractor):
else:
flashvars = {}
data_url = object_doc.attrib.get('data', '')
data_url_params = compat_parse_qs(compat_urllib_parse_urlparse(data_url).query)
def find_param(name):
if name in flashvars:
return flashvars[name]
node = find_xpath_attr(object_doc, './param', 'name', name)
if node is not None:
return node.attrib['value']
return None
return data_url_params.get(name)
params = {}
@@ -286,15 +307,19 @@ class BrightcoveLegacyIE(InfoExtractor):
playlist_title=playlist_info['mediaCollectionDTO']['displayName'])
def _extract_video_info(self, video_info):
publisher_id = video_info.get('publisherId')
info = {
'id': compat_str(video_info['id']),
'title': video_info['displayName'].strip(),
'description': video_info.get('shortDescription'),
'thumbnail': video_info.get('videoStillURL') or video_info.get('thumbnailURL'),
'uploader': video_info.get('publisherName'),
'uploader_id': compat_str(publisher_id) if publisher_id else None,
'duration': float_or_none(video_info.get('length'), 1000),
'timestamp': int_or_none(video_info.get('creationDate'), 1000),
}
renditions = video_info.get('renditions')
renditions = video_info.get('renditions', []) + video_info.get('IOSRenditions', [])
if renditions:
formats = []
for rend in renditions:
@@ -315,19 +340,42 @@ class BrightcoveLegacyIE(InfoExtractor):
ext = 'flv'
if ext is None:
ext = determine_ext(url)
size = rend.get('size')
formats.append({
tbr = int_or_none(rend.get('encodingRate'), 1000)
a_format = {
'format_id': 'http%s' % ('-%s' % tbr if tbr else ''),
'url': url,
'ext': ext,
'height': rend.get('frameHeight'),
'width': rend.get('frameWidth'),
'filesize': size if size != 0 else None,
})
'filesize': int_or_none(rend.get('size')) or None,
'tbr': tbr,
}
if rend.get('audioOnly'):
a_format.update({
'vcodec': 'none',
})
else:
a_format.update({
'height': int_or_none(rend.get('frameHeight')),
'width': int_or_none(rend.get('frameWidth')),
'vcodec': rend.get('videoCodec'),
})
# m3u8 manifests with remote == false are media playlists
# Not calling _extract_m3u8_formats here to save network traffic
if ext == 'm3u8':
a_format.update({
'format_id': 'hls%s' % ('-%s' % tbr if tbr else ''),
'ext': 'mp4',
'protocol': 'm3u8',
})
formats.append(a_format)
self._sort_formats(formats)
info['formats'] = formats
elif video_info.get('FLVFullLengthURL') is not None:
info.update({
'url': video_info['FLVFullLengthURL'],
'vcodec': self.FLV_VCODECS.get(video_info.get('FLVFullCodec')),
'filesize': int_or_none(video_info.get('FLVFullSize')),
})
if self._downloader.params.get('include_ads', False):
@@ -383,6 +431,7 @@ class BrightcoveNewIE(InfoExtractor):
'formats': 'mincount:41',
},
'params': {
# m3u8 download
'skip_download': True,
}
}, {
@@ -426,7 +475,7 @@ class BrightcoveNewIE(InfoExtractor):
</video>.*?
<script[^>]+
src=["\'](?:https?:)?//players\.brightcove\.net/
(\d+)/([\da-f-]+)_([^/]+)/index(?:\.min)?\.js
(\d+)/([^/]+)_([^/]+)/index(?:\.min)?\.js
''', webpage):
entries.append(
'http://players.brightcove.net/%s/%s_%s/index.html?videoId=%s'
@@ -467,7 +516,7 @@ class BrightcoveNewIE(InfoExtractor):
raise ExtractorError(json_data[0]['message'], expected=True)
raise
title = json_data['name']
title = json_data['name'].strip()
formats = []
for source in json_data.get('sources', []):
@@ -520,7 +569,7 @@ class BrightcoveNewIE(InfoExtractor):
f.update({
'url': src or streaming_src,
'format_id': build_format_id('http' if src else 'http-streaming'),
'preference': 2 if src else 1,
'source_preference': 0 if src else -1,
})
else:
f.update({
@@ -531,20 +580,22 @@ class BrightcoveNewIE(InfoExtractor):
formats.append(f)
self._sort_formats(formats)
description = json_data.get('description')
thumbnail = json_data.get('thumbnail')
timestamp = parse_iso8601(json_data.get('published_at'))
duration = float_or_none(json_data.get('duration'), 1000)
tags = json_data.get('tags', [])
subtitles = {}
for text_track in json_data.get('text_tracks', []):
if text_track.get('src'):
subtitles.setdefault(text_track.get('srclang'), []).append({
'url': text_track['src'],
})
return {
'id': video_id,
'title': title,
'description': description,
'thumbnail': thumbnail,
'duration': duration,
'timestamp': timestamp,
'description': json_data.get('description'),
'thumbnail': json_data.get('thumbnail') or json_data.get('poster'),
'duration': float_or_none(json_data.get('duration'), 1000),
'timestamp': parse_iso8601(json_data.get('published_at')),
'uploader_id': account_id,
'formats': formats,
'tags': tags,
'subtitles': subtitles,
'tags': json_data.get('tags', []),
}
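The subtitle handling added to `BrightcoveNewIE` groups `text_tracks` entries by language and skips tracks without a source URL. A minimal standalone sketch of that grouping follows; `collect_subtitles` is a hypothetical helper name used only for illustration, and the sample track dictionaries are made up rather than taken from a real API response.

```python
def collect_subtitles(text_tracks):
    """Group subtitle URLs by language code.

    Hypothetical helper mirroring the loop in the diff: tracks with no
    'src' are skipped, and each language maps to a list of {'url': ...}.
    """
    subtitles = {}
    for track in text_tracks:
        src = track.get('src')
        if not src:
            continue
        # setdefault creates the per-language list on first use.
        subtitles.setdefault(track.get('srclang'), []).append({'url': src})
    return subtitles


if __name__ == '__main__':
    sample = [
        {'srclang': 'en', 'src': 'http://example.com/en.vtt'},
        {'srclang': 'fr', 'src': None},
    ]
    print(collect_subtitles(sample))
```

Note that a track with a URL but no `srclang` ends up under the key `None`, which is also what the loop in the diff would do.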


@@ -0,0 +1,87 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
int_or_none,
parse_duration,
unified_strdate,
)
class CamWithHerIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?camwithher\.tv/view_video\.php\?.*\bviewkey=(?P<id>\w+)'
_TESTS = [{
'url': 'http://camwithher.tv/view_video.php?viewkey=6e9a24e2c0e842e1f177&page=&viewtype=&category=',
'info_dict': {
'id': '5644',
'ext': 'flv',
'title': 'Periscope Tease',
'description': 'In the clouds teasing on periscope to my favorite song',
'duration': 240,
'view_count': int,
'comment_count': int,
'uploader': 'MileenaK',
'upload_date': '20160322',
},
'params': {
'skip_download': True,
}
}, {
'url': 'http://camwithher.tv/view_video.php?viewkey=6dfd8b7c97531a459937',
'only_matching': True,
}, {
'url': 'http://camwithher.tv/view_video.php?page=&viewkey=6e9a24e2c0e842e1f177&viewtype=&category=',
'only_matching': True,
}, {
'url': 'http://camwithher.tv/view_video.php?viewkey=b6c3b5bea9515d1a1fc4&page=&viewtype=&category=mv',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
flv_id = self._html_search_regex(
r'<a[^>]+href=["\']/download/\?v=(\d+)', webpage, 'video id')
# Video URL construction algorithm is reverse-engineered from cwhplayer.swf
rtmp_url = 'rtmp://camwithher.tv/clipshare/%s' % (
('mp4:%s.mp4' % flv_id) if int(flv_id) > 2010 else flv_id)
title = self._html_search_regex(
r'<div[^>]+style="float:left"[^>]*>\s*<h2>(.+?)</h2>', webpage, 'title')
description = self._html_search_regex(
r'>Description:</span>(.+?)</div>', webpage, 'description', default=None)
runtime = self._search_regex(
r'Runtime\s*:\s*(.+?) \|', webpage, 'duration', default=None)
if runtime:
runtime = re.sub(r'[\s-]', '', runtime)
duration = parse_duration(runtime)
view_count = int_or_none(self._search_regex(
r'Views\s*:\s*(\d+)', webpage, 'view count', default=None))
comment_count = int_or_none(self._search_regex(
r'Comments\s*:\s*(\d+)', webpage, 'comment count', default=None))
uploader = self._search_regex(
r'Added by\s*:\s*<a[^>]+>([^<]+)</a>', webpage, 'uploader', default=None)
upload_date = unified_strdate(self._search_regex(
r'Added on\s*:\s*([\d-]+)', webpage, 'upload date', default=None))
return {
'id': flv_id,
'url': rtmp_url,
'ext': 'flv',
'no_resume': True,
'title': title,
'description': description,
'duration': duration,
'view_count': view_count,
'comment_count': comment_count,
'uploader': uploader,
'upload_date': upload_date,
}
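The RTMP URL rule used in `_real_extract` above can be isolated as a small function. This is a minimal sketch, assuming only the threshold of 2010 and the `mp4:` prefix shown in the diff; `build_rtmp_url` is a hypothetical name, not part of the extractor.

```python
def build_rtmp_url(flv_id):
    """Build the RTMP URL for a numeric clip id passed as a string.

    Hypothetical helper mirroring the rule in the diff: ids above 2010
    are addressed as 'mp4:<id>.mp4', older ids by the bare id.
    """
    play_path = 'mp4:%s.mp4' % flv_id if int(flv_id) > 2010 else flv_id
    return 'rtmp://camwithher.tv/clipshare/%s' % play_path


if __name__ == '__main__':
    print(build_rtmp_url('5644'))
    print(build_rtmp_url('1500'))
```

The id is kept as a string throughout and converted to an integer only for the comparison, as in the extractor.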


@@ -1,24 +1,41 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from .theplatform import ThePlatformIE
from ..utils import (
sanitized_Request,
smuggle_url,
xpath_text,
xpath_element,
int_or_none,
ExtractorError,
find_xpath_attr,
)
class CBSIE(InfoExtractor):
class CBSBaseIE(ThePlatformIE):
def _parse_smil_subtitles(self, smil, namespace=None, subtitles_lang='en'):
closed_caption_e = find_xpath_attr(smil, self._xpath_ns('.//param', namespace), 'name', 'ClosedCaptionURL')
return {
'en': [{
'ext': 'ttml',
'url': closed_caption_e.attrib['value'],
}]
} if closed_caption_e is not None and closed_caption_e.attrib.get('value') else []
class CBSIE(CBSBaseIE):
_VALID_URL = r'https?://(?:www\.)?(?:cbs\.com/shows/[^/]+/(?:video|artist)|colbertlateshow\.com/(?:video|podcasts))/[^/]+/(?P<id>[^/]+)'
_TESTS = [{
'url': 'http://www.cbs.com/shows/garth-brooks/video/_u7W953k6la293J7EPTd9oHkSPs6Xn6_/connect-chat-feat-garth-brooks/',
'info_dict': {
'id': '4JUVEwq3wUT7',
'id': '_u7W953k6la293J7EPTd9oHkSPs6Xn6_',
'display_id': 'connect-chat-feat-garth-brooks',
'ext': 'flv',
'ext': 'mp4',
'title': 'Connect Chat feat. Garth Brooks',
'description': 'Connect with country music singer Garth Brooks, as he chats with fans on Wednesday November 27, 2013. Be sure to tune in to Garth Brooks: Live from Las Vegas, Friday November 29, at 9/8c on CBS!',
'duration': 1495,
'timestamp': 1385585425,
'upload_date': '20131127',
'uploader': 'CBSI-NEW',
},
'params': {
# rtmp download
@@ -47,22 +64,46 @@ class CBSIE(InfoExtractor):
'url': 'http://www.colbertlateshow.com/podcasts/dYSwjqPs_X1tvbV_P2FcPWRa_qT6akTC/in-the-bad-room-with-stephen/',
'only_matching': True,
}]
TP_RELEASE_URL_TEMPLATE = 'http://link.theplatform.com/s/dJ5BDC/%s?manifest=m3u&mbr=true'
def _real_extract(self, url):
display_id = self._match_id(url)
request = sanitized_Request(url)
# Android UA is served with higher quality (720p) streams (see
# https://github.com/rg3/youtube-dl/issues/7490)
request.add_header('User-Agent', 'Mozilla/5.0 (Linux; Android 4.4; Nexus 5)')
webpage = self._download_webpage(request, display_id)
real_id = self._search_regex(
[r"video\.settings\.pid\s*=\s*'([^']+)';", r"cbsplayer\.pid\s*=\s*'([^']+)';"],
webpage, 'real video ID')
return {
'_type': 'url_transparent',
'ie_key': 'ThePlatform',
'url': smuggle_url(
'http://link.theplatform.com/s/dJ5BDC/%s?mbr=true&manifest=m3u' % real_id,
{'force_smil_url': True}),
webpage = self._download_webpage(url, display_id)
content_id = self._search_regex(
[r"video\.settings\.content_id\s*=\s*'([^']+)';", r"cbsplayer\.contentId\s*=\s*'([^']+)';"],
webpage, 'content id')
items_data = self._download_xml(
'http://can.cbs.com/thunder/player/videoPlayerService.php',
content_id, query={'partner': 'cbs', 'contentId': content_id})
video_data = xpath_element(items_data, './/item')
title = xpath_text(video_data, 'videoTitle', 'title', True)
subtitles = {}
formats = []
for item in items_data.findall('.//item'):
pid = xpath_text(item, 'pid')
if not pid:
continue
try:
tp_formats, tp_subtitles = self._extract_theplatform_smil(
self.TP_RELEASE_URL_TEMPLATE % pid, content_id, 'Downloading %s SMIL data' % pid)
except ExtractorError:
continue
formats.extend(tp_formats)
subtitles = self._merge_subtitles(subtitles, tp_subtitles)
self._sort_formats(formats)
info = self.get_metadata('dJ5BDC/media/guid/2198311517/%s' % content_id, content_id)
info.update({
'id': content_id,
'display_id': display_id,
}
'title': title,
'series': xpath_text(video_data, 'seriesTitle'),
'season_number': int_or_none(xpath_text(video_data, 'seasonNumber')),
'episode_number': int_or_none(xpath_text(video_data, 'episodeNumber')),
'duration': int_or_none(xpath_text(video_data, 'videoLength'), 1000),
'thumbnail': xpath_text(video_data, 'previewImageURL'),
'formats': formats,
'subtitles': subtitles,
})
return info


@@ -1,12 +1,14 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .theplatform import ThePlatformIE
from ..utils import int_or_none
class CNETIE(ThePlatformIE):
_VALID_URL = r'https?://(?:www\.)?cnet\.com/videos/(?P<id>[^/]+)/'
class CBSInteractiveIE(ThePlatformIE):
_VALID_URL = r'https?://(?:www\.)?(?P<site>cnet|zdnet)\.com/(?:videos|video/share)/(?P<id>[^/?]+)'
_TESTS = [{
'url': 'http://www.cnet.com/videos/hands-on-with-microsofts-windows-8-1-update/',
'info_dict': {
@@ -17,6 +19,8 @@ class CNETIE(ThePlatformIE):
'uploader_id': '6085384d-619e-11e3-b231-14feb5ca9861',
'uploader': 'Sarah Mitroff',
'duration': 70,
'timestamp': 1396479627,
'upload_date': '20140402',
},
}, {
'url': 'http://www.cnet.com/videos/whiny-pothole-tweets-at-local-government-when-hit-by-cars-tomorrow-daily-187/',
@@ -28,15 +32,38 @@ class CNETIE(ThePlatformIE):
'uploader_id': 'b163284d-6b73-44fc-b3e6-3da66c392d40',
'uploader': 'Ashley Esqueda',
'duration': 1482,
'timestamp': 1433289889,
'upload_date': '20150603',
},
}, {
'url': 'http://www.zdnet.com/video/share/video-keeping-android-smartphones-and-tablets-secure/',
'info_dict': {
'id': 'bc1af9f0-a2b5-4e54-880d-0d95525781c0',
'ext': 'mp4',
'title': 'Video: Keeping Android smartphones and tablets secure',
'description': 'Here\'s the best way to keep Android devices secure, and what you do when they\'ve come to the end of their lives.',
'uploader_id': 'f2d97ea2-8175-11e2-9d12-0018fe8a00b0',
'uploader': 'Adrian Kingsley-Hughes',
'timestamp': 1448961720,
'upload_date': '20151201',
},
'params': {
# m3u8 download
'skip_download': True,
}
}]
TP_RELEASE_URL_TEMPLATE = 'http://link.theplatform.com/s/kYEXFC/%s?mbr=true'
MPX_ACCOUNTS = {
'cnet': 2288573011,
'zdnet': 2387448114,
}
def _real_extract(self, url):
display_id = self._match_id(url)
site, display_id = re.match(self._VALID_URL, url).groups()
webpage = self._download_webpage(url, display_id)
data_json = self._html_search_regex(
r"data-cnet-video(?:-uvp)?-options='([^']+)'",
r"data-(?:cnet|zdnet)-video(?:-uvp)?-options='([^']+)'",
webpage, 'data json')
data = self._parse_json(data_json, display_id)
vdata = data.get('video') or data['videos'][0]
@@ -51,16 +78,15 @@ class CNETIE(ThePlatformIE):
uploader = None
uploader_id = None
metadata = self.get_metadata('kYEXFC/%s' % list(vdata['files'].values())[0], video_id)
description = vdata.get('description') or metadata.get('description')
duration = int_or_none(vdata.get('duration')) or metadata.get('duration')
formats = []
subtitles = {}
media_guid_path = 'media/guid/%d/%s' % (self.MPX_ACCOUNTS[site], vdata['mpxRefId'])
formats, subtitles = [], {}
if site == 'cnet':
formats, subtitles = self._extract_theplatform_smil(
self.TP_RELEASE_URL_TEMPLATE % media_guid_path, video_id)
for (fkey, vid) in vdata['files'].items():
if fkey == 'hls_phone' and 'hls_tablet' in vdata['files']:
continue
release_url = 'http://link.theplatform.com/s/kYEXFC/%s?mbr=true' % vid
release_url = self.TP_RELEASE_URL_TEMPLATE % vid
if fkey == 'hds':
release_url += '&manifest=f4m'
tp_formats, tp_subtitles = self._extract_theplatform_smil(release_url, video_id, 'Downloading %s SMIL data' % fkey)
@@ -68,15 +94,15 @@ class CNETIE(ThePlatformIE):
subtitles = self._merge_subtitles(subtitles, tp_subtitles)
self._sort_formats(formats)
return {
info = self.get_metadata('kYEXFC/%s' % media_guid_path, video_id)
info.update({
'id': video_id,
'display_id': display_id,
'title': title,
'description': description,
'thumbnail': metadata.get('thumbnail'),
'duration': duration,
'duration': int_or_none(vdata.get('duration')),
'uploader': uploader,
'uploader_id': uploader_id,
'subtitles': subtitles,
'formats': formats,
}
})
return info


@@ -2,14 +2,13 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from .theplatform import ThePlatformIE
from .cbs import CBSBaseIE
from ..utils import (
parse_duration,
find_xpath_attr,
)
class CBSNewsIE(ThePlatformIE):
class CBSNewsIE(CBSBaseIE):
IE_DESC = 'CBS News'
_VALID_URL = r'https?://(?:www\.)?cbsnews\.com/(?:news|videos)/(?P<id>[\da-z_-]+)'
@@ -49,15 +48,6 @@ class CBSNewsIE(ThePlatformIE):
},
]
def _parse_smil_subtitles(self, smil, namespace=None, subtitles_lang='en'):
closed_caption_e = find_xpath_attr(smil, self._xpath_ns('.//param', namespace), 'name', 'ClosedCaptionURL')
return {
'en': [{
'ext': 'ttml',
'url': closed_caption_e.attrib['value'],
}]
} if closed_caption_e is not None and closed_caption_e.attrib.get('value') else []
def _real_extract(self, url):
video_id = self._match_id(url)
@@ -122,6 +112,7 @@ class CBSNewsLiveVideoIE(InfoExtractor):
for entry in f4m_formats:
# URLs without the extra param induce a 404 error
entry.update({'extra_param_to_segment_url': hdcore_sign})
self._sort_formats(f4m_formats)
return {
'id': video_id,


@@ -48,6 +48,7 @@ class ChaturbateIE(InfoExtractor):
raise ExtractorError('Unable to find stream URL')
formats = self._extract_m3u8_formats(m3u8_url, video_id, ext='mp4')
self._sort_formats(formats)
return {
'id': video_id,


@@ -0,0 +1,90 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import (
ExtractorError,
float_or_none,
int_or_none,
parse_iso8601,
)
class ClipRsIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?clip\.rs/(?P<id>[^/]+)/\d+'
_TEST = {
'url': 'http://www.clip.rs/premijera-frajle-predstavljaju-novi-spot-za-pesmu-moli-me-moli/3732',
'md5': 'c412d57815ba07b56f9edc7b5d6a14e5',
'info_dict': {
'id': '1488842.1399140381',
'ext': 'mp4',
'title': 'PREMIJERA Frajle predstavljaju novi spot za pesmu Moli me, moli',
'description': 'md5:56ce2c3b4ab31c5a2e0b17cb9a453026',
'duration': 229,
'timestamp': 1459850243,
'upload_date': '20160405',
}
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
video_id = self._search_regex(
r'id=(["\'])mvp:(?P<id>.+?)\1', webpage, 'mvp id', group='id')
response = self._download_json(
'http://qi.ckm.onetapi.pl/', video_id,
query={
'body[id]': video_id,
'body[jsonrpc]': '2.0',
'body[method]': 'get_asset_detail',
'body[params][ID_Publikacji]': video_id,
'body[params][Service]': 'www.onet.pl',
'content-type': 'application/jsonp',
'x-onet-app': 'player.front.onetapi.pl',
})
error = response.get('error')
if error:
raise ExtractorError(
'%s said: %s' % (self.IE_NAME, error['message']), expected=True)
video = response['result'].get('0')
formats = []
for _, formats_dict in video['formats'].items():
if not isinstance(formats_dict, dict):
continue
for format_id, format_list in formats_dict.items():
if not isinstance(format_list, list):
continue
for f in format_list:
if not f.get('url'):
continue
formats.append({
'url': f['url'],
'format_id': format_id,
'height': int_or_none(f.get('vertical_resolution')),
'width': int_or_none(f.get('horizontal_resolution')),
'abr': float_or_none(f.get('audio_bitrate')),
'vbr': float_or_none(f.get('video_bitrate')),
})
self._sort_formats(formats)
meta = video.get('meta', {})
title = self._og_search_title(webpage, default=None) or meta['title']
description = self._og_search_description(webpage, default=None) or meta.get('description')
duration = meta.get('length') or meta.get('lenght')
timestamp = parse_iso8601(meta.get('addDate'), ' ')
return {
'id': video_id,
'title': title,
'description': description,
'duration': duration,
'timestamp': timestamp,
'formats': formats,
}
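The nested loop in `ClipRsIE._real_extract` walks a two-level mapping and ignores anything that is not shaped as expected. The sketch below reproduces that defensive flattening in isolation; `flatten_formats` is a hypothetical name, the sample input is invented for illustration, and the numeric conversions done by `int_or_none` and `float_or_none` in the extractor are left out.

```python
def flatten_formats(formats_by_group):
    """Flatten {group: {format_id: [entry, ...]}} into a list of formats.

    Hypothetical helper: values of unexpected types and entries without
    a 'url' are skipped, as in the extractor loop shown in the diff.
    """
    formats = []
    for formats_dict in formats_by_group.values():
        if not isinstance(formats_dict, dict):
            continue
        for format_id, format_list in formats_dict.items():
            if not isinstance(format_list, list):
                continue
            for entry in format_list:
                if not entry.get('url'):
                    continue
                formats.append({
                    'url': entry['url'],
                    'format_id': format_id,
                    'height': entry.get('vertical_resolution'),
                    'width': entry.get('horizontal_resolution'),
                })
    return formats


if __name__ == '__main__':
    sample = {
        'wideo': {'mp4': [{'url': 'http://example.com/v.mp4',
                           'vertical_resolution': 720,
                           'horizontal_resolution': 1280}]},
        'broken': None,
    }
    print(flatten_formats(sample))
```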


@@ -0,0 +1,36 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import smuggle_url
class CNBCIE(InfoExtractor):
_VALID_URL = r'https?://video\.cnbc\.com/gallery/\?video=(?P<id>[0-9]+)'
_TEST = {
'url': 'http://video.cnbc.com/gallery/?video=3000503714',
'info_dict': {
'id': '3000503714',
'ext': 'mp4',
'title': 'Fighting zombies is big business',
'description': 'md5:0c100d8e1a7947bd2feec9a5550e519e',
'timestamp': 1459332000,
'upload_date': '20160330',
'uploader': 'NBCU-CNBC',
},
'params': {
# m3u8 download
'skip_download': True,
},
}
def _real_extract(self, url):
video_id = self._match_id(url)
return {
'_type': 'url_transparent',
'ie_key': 'ThePlatform',
'url': smuggle_url(
'http://link.theplatform.com/s/gZWlPC/media/guid/2408950221/%s?mbr=true&manifest=m3u' % video_id,
{'force_smil_url': True}),
'id': video_id,
}


@@ -41,7 +41,13 @@ class ComCarCoffIE(InfoExtractor):
display_id = full_data['activeVideo']['video']
video_data = full_data.get('videos', {}).get(display_id) or full_data['singleshots'][display_id]
video_id = compat_str(video_data['mediaId'])
title = video_data['title']
formats = self._extract_m3u8_formats(
video_data['mediaUrl'], video_id, 'mp4')
self._sort_formats(formats)
thumbnails = [{
'url': video_data['images']['thumb'],
}, {
@@ -54,15 +60,14 @@ class ComCarCoffIE(InfoExtractor):
video_data.get('duration'))
return {
'_type': 'url_transparent',
'url': 'crackle:%s' % video_id,
'id': video_id,
'display_id': display_id,
'title': video_data['title'],
'title': title,
'description': video_data.get('description'),
'timestamp': timestamp,
'duration': duration,
'thumbnails': thumbnails,
'formats': formats,
'season_number': int_or_none(video_data.get('season')),
'episode_number': int_or_none(video_data.get('episode')),
'webpage_url': 'http://comediansincarsgettingcoffee.com/%s' % (video_data.get('urlSlug', video_data.get('slug'))),


@@ -22,8 +22,10 @@ from ..compat import (
compat_str,
compat_urllib_error,
compat_urllib_parse_urlencode,
compat_urllib_request,
compat_urlparse,
)
from ..downloader.f4m import remove_encrypted_media
from ..utils import (
NO_DEFAULT,
age_restricted,
@@ -48,6 +50,7 @@ from ..utils import (
determine_protocol,
parse_duration,
mimetype2ext,
update_Request,
update_url_query,
)
@@ -229,6 +232,24 @@ class InfoExtractor(object):
episode_number: Number of the video episode within a season, as an integer.
episode_id: Id of the video episode, as a unicode string.
The following fields should only be used when the media is a track or a part of
a music album:
track: Title of the track.
track_number: Number of the track within an album or a disc, as an integer.
track_id: Id of the track (useful in case of custom indexing, e.g. 6.iii),
as a unicode string.
artist: Artist(s) of the track.
genre: Genre(s) of the track.
album: Title of the album the track belongs to.
album_type: Type of the album (e.g. "Demo", "Full-length", "Split", "Compilation", etc).
album_artist: List of all artists that appeared on the album (e.g.
"Ash Borer / Fell Voices" or "Various Artists", useful for splits
and compilations).
disc_number: Number of the disc or other physical medium the track belongs to,
as an integer.
release_year: Year (YYYY) when the album was released.
Unless mentioned otherwise, the fields should be Unicode strings.
Unless mentioned otherwise, None is equivalent to absence of information.
@@ -346,7 +367,7 @@ class InfoExtractor(object):
def IE_NAME(self):
return compat_str(type(self).__name__[:-2])
def _request_webpage(self, url_or_request, video_id, note=None, errnote=None, fatal=True, data=None, headers=None, query=None):
def _request_webpage(self, url_or_request, video_id, note=None, errnote=None, fatal=True, data=None, headers={}, query={}):
""" Returns the response handle """
if note is None:
self.report_download_webpage(video_id)
@@ -355,12 +376,14 @@ class InfoExtractor(object):
self.to_screen('%s' % (note,))
else:
self.to_screen('%s: %s' % (video_id, note))
# data, headers and query params will be ignored for `Request` objects
if isinstance(url_or_request, compat_str):
if isinstance(url_or_request, compat_urllib_request.Request):
url_or_request = update_Request(
url_or_request, data=data, headers=headers, query=query)
else:
if query:
url_or_request = update_url_query(url_or_request, query)
if data or headers:
url_or_request = sanitized_Request(url_or_request, data, headers or {})
url_or_request = sanitized_Request(url_or_request, data, headers)
try:
return self._downloader.urlopen(url_or_request)
except (compat_urllib_error.URLError, compat_http_client.HTTPException, socket.error) as err:
@@ -376,7 +399,7 @@ class InfoExtractor(object):
self._downloader.report_warning(errmsg)
return False
def _download_webpage_handle(self, url_or_request, video_id, note=None, errnote=None, fatal=True, encoding=None, data=None, headers=None, query=None):
def _download_webpage_handle(self, url_or_request, video_id, note=None, errnote=None, fatal=True, encoding=None, data=None, headers={}, query={}):
""" Returns a tuple (page content as string, URL handle) """
# Strip hashes from the URL (#1038)
if isinstance(url_or_request, (compat_str, str)):
@@ -469,7 +492,7 @@ class InfoExtractor(object):
return content
def _download_webpage(self, url_or_request, video_id, note=None, errnote=None, fatal=True, tries=1, timeout=5, encoding=None, data=None, headers=None, query=None):
def _download_webpage(self, url_or_request, video_id, note=None, errnote=None, fatal=True, tries=1, timeout=5, encoding=None, data=None, headers={}, query={}):
""" Returns the data of the page as a string """
success = False
try_count = 0
@@ -490,7 +513,7 @@ class InfoExtractor(object):
def _download_xml(self, url_or_request, video_id,
note='Downloading XML', errnote='Unable to download XML',
transform_source=None, fatal=True, encoding=None, data=None, headers=None, query=None):
transform_source=None, fatal=True, encoding=None, data=None, headers={}, query={}):
"""Return the xml as an xml.etree.ElementTree.Element"""
xml_string = self._download_webpage(
url_or_request, video_id, note, errnote, fatal=fatal, encoding=encoding, data=data, headers=headers, query=query)
@@ -504,7 +527,7 @@ class InfoExtractor(object):
note='Downloading JSON metadata',
errnote='Unable to download JSON metadata',
transform_source=None,
fatal=True, encoding=None, data=None, headers=None, query=None):
fatal=True, encoding=None, data=None, headers={}, query={}):
json_string = self._download_webpage(
url_or_request, video_id, note, errnote, fatal=fatal,
encoding=encoding, data=data, headers=headers, query=query)
@@ -819,7 +842,7 @@ class InfoExtractor(object):
for input in re.findall(r'(?i)<input([^>]+)>', html):
if not re.search(r'type=(["\'])(?:hidden|submit)\1', input):
continue
name = re.search(r'name=(["\'])(?P<value>.+?)\1', input)
name = re.search(r'(?:name|id)=(["\'])(?P<value>.+?)\1', input)
if not name:
continue
value = re.search(r'value=(["\'])(?P<value>.*?)\1', input)
@@ -989,6 +1012,11 @@ class InfoExtractor(object):
if not media_nodes:
manifest_version = '2.0'
media_nodes = manifest.findall('{http://ns.adobe.com/f4m/2.0}media')
# Remove unsupported DRM protected media from final formats
# rendition (see https://github.com/rg3/youtube-dl/issues/8573).
media_nodes = remove_encrypted_media(media_nodes)
if not media_nodes:
return formats
base_url = xpath_text(
manifest, ['{http://ns.adobe.com/f4m/1.0}baseURL', '{http://ns.adobe.com/f4m/2.0}baseURL'],
'base URL', default=None)
@@ -1021,8 +1049,6 @@ class InfoExtractor(object):
'height': int_or_none(media_el.attrib.get('height')),
'preference': preference,
})
self._sort_formats(formats)
return formats
def _extract_m3u8_formats(self, m3u8_url, video_id, ext=None,
@@ -1143,7 +1169,6 @@ class InfoExtractor(object):
last_media = None
formats.append(f)
last_info = {}
self._sort_formats(formats)
return formats
@staticmethod
@@ -1317,8 +1342,6 @@ class InfoExtractor(object):
})
continue
self._sort_formats(formats)
return formats
def _parse_smil_subtitles(self, smil, namespace=None, subtitles_lang='en'):
@@ -1329,7 +1352,7 @@ class InfoExtractor(object):
if not src or src in urls:
continue
urls.append(src)
ext = textstream.get('ext') or determine_ext(src) or mimetype2ext(textstream.get('type'))
ext = textstream.get('ext') or mimetype2ext(textstream.get('type')) or determine_ext(src)
lang = textstream.get('systemLanguage') or textstream.get('systemLanguageName') or textstream.get('lang') or subtitles_lang
subtitles.setdefault(lang, []).append({
'url': src,
@@ -1509,9 +1532,16 @@ class InfoExtractor(object):
representation_ms_info['total_number'] = int(math.ceil(float(period_duration) / segment_duration))
media_template = representation_ms_info['media_template']
media_template = media_template.replace('$RepresentationID$', representation_id)
media_template = re.sub(r'\$(Number|Bandwidth)(?:%(0\d+)d)?\$', r'%(\1)\2d', media_template)
media_template = re.sub(r'\$(Number|Bandwidth)\$', r'%(\1)d', media_template)
media_template = re.sub(r'\$(Number|Bandwidth)%([^$]+)\$', r'%(\1)\2', media_template)
media_template = media_template.replace('$$', '$')
representation_ms_info['segment_urls'] = [media_template % {'Number': segment_number, 'Bandwidth': representation_attrib.get('bandwidth')} for segment_number in range(representation_ms_info['start_number'], representation_ms_info['total_number'] + representation_ms_info['start_number'])]
representation_ms_info['segment_urls'] = [
media_template % {
'Number': segment_number,
'Bandwidth': representation_attrib.get('bandwidth')}
for segment_number in range(
representation_ms_info['start_number'],
representation_ms_info['total_number'] + representation_ms_info['start_number'])]
if 'segment_urls' in representation_ms_info:
f.update({
'segment_urls': representation_ms_info['segment_urls'],
@@ -1536,7 +1566,6 @@ class InfoExtractor(object):
existing_format.update(f)
else:
self.report_warning('Unknown MIME type %s in DASH manifest' % mime_type)
self._sort_formats(formats)
return formats
def _live_title(self, name):
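The DASH hunk above rewrites `$Number$`-style template identifiers into Python `%`-format placeholders before expanding the segment URLs. The sketch below shows that conversion on its own. It assumes the same regular expressions as the diff; `expand_media_template` and its parameters are hypothetical names, the bandwidth is passed as an integer so that `%d` formatting works, and a literal `%` elsewhere in the template is not handled.

```python
import re


def expand_media_template(media_template, representation_id, bandwidth,
                          start_number, total_number):
    """Expand a DASH SegmentTemplate 'media' string into segment URLs.

    Hypothetical helper following the substitution order in the diff.
    """
    t = media_template.replace('$RepresentationID$', representation_id)
    # Plain identifiers: $Number$ -> %(Number)d
    t = re.sub(r'\$(Number|Bandwidth)\$', r'%(\1)d', t)
    # Identifiers with a format tag: $Number%05d$ -> %(Number)05d
    t = re.sub(r'\$(Number|Bandwidth)%([^$]+)\$', r'%(\1)\2', t)
    # '$$' is the escape for a literal dollar sign; str.replace returns
    # a new string, so the result has to be assigned.
    t = t.replace('$$', '$')
    return [
        t % {'Number': number, 'Bandwidth': bandwidth}
        for number in range(start_number, start_number + total_number)]


if __name__ == '__main__':
    for segment_url in expand_media_template(
            'video_$RepresentationID$_$Number%05d$.m4s', 'v1', 500000, 1, 3):
        print(segment_url)
```

Splitting the substitution into two passes keeps the plain and the width-formatted identifiers separate, so any format tag that `%` formatting understands passes through unchanged.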


@@ -57,6 +57,7 @@ class CWTVIE(InfoExtractor):
formats = self._extract_m3u8_formats(
video_data['videos']['variantplaylist']['uri'], video_id, 'mp4')
self._sort_formats(formats)
thumbnails = [{
'url': image['uri'],


@@ -41,7 +41,9 @@ class DeezerPlaylistIE(InfoExtractor):
'Deezer said: %s' % geoblocking_msg, expected=True)
data_json = self._search_regex(
r'naboo\.display\(\'[^\']+\',\s*(.*?)\);\n', webpage, 'data JSON')
(r'__DZR_APP_STATE__\s*=\s*({.+?})\s*</script>',
r'naboo\.display\(\'[^\']+\',\s*(.*?)\);\n'),
webpage, 'data JSON')
data = json.loads(data_json)
playlist_title = data.get('DATA', {}).get('TITLE')


@@ -17,37 +17,53 @@ class DemocracynowIE(InfoExtractor):
IE_NAME = 'democracynow'
_TESTS = [{
'url': 'http://www.democracynow.org/shows/2015/7/3',
'md5': 'fbb8fe3d7a56a5e12431ce2f9b2fab0d',
'md5': '3757c182d3d84da68f5c8f506c18c196',
'info_dict': {
'id': '2015-0703-001',
'ext': 'mp4',
'title': 'July 03, 2015 - Democracy Now!',
'description': 'A daily independent global news hour with Amy Goodman & Juan González "What to the Slave is 4th of July?": James Earl Jones Reads Frederick Douglass\u2019 Historic Speech : "This Flag Comes Down Today": Bree Newsome Scales SC Capitol Flagpole, Takes Down Confederate Flag : "We Shall Overcome": Remembering Folk Icon, Activist Pete Seeger in His Own Words & Songs',
'title': 'Daily Show',
},
}, {
'url': 'http://www.democracynow.org/2015/7/3/this_flag_comes_down_today_bree',
'md5': 'fbb8fe3d7a56a5e12431ce2f9b2fab0d',
'info_dict': {
'id': '2015-0703-001',
'ext': 'mp4',
'title': '"This Flag Comes Down Today": Bree Newsome Scales SC Capitol Flagpole, Takes Down Confederate Flag',
'description': 'md5:4d2bc4f0d29f5553c2210a4bc7761a21',
},
'params': {
'skip_download': True,
},
}]
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
description = self._og_search_description(webpage)
json_data = self._parse_json(self._search_regex(
r'<script[^>]+type="text/json"[^>]*>\s*({[^>]+})', webpage, 'json'),
display_id)
video_id = None
title = json_data['title']
formats = []
default_lang = 'en'
video_id = None
for key in ('file', 'audio', 'video', 'high_res_video'):
media_url = json_data.get(key, '')
if not media_url:
continue
media_url = re.sub(r'\?.*', '', compat_urlparse.urljoin(url, media_url))
video_id = video_id or remove_start(os.path.splitext(url_basename(media_url))[0], 'dn')
formats.append({
'url': media_url,
'vcodec': 'none' if key == 'audio' else None,
})
self._sort_formats(formats)
default_lang = 'en'
subtitles = {}
def add_subtitle_item(lang, info_dict):
@@ -67,22 +83,13 @@ class DemocracynowIE(InfoExtractor):
'url': compat_urlparse.urljoin(url, subtitle_item['url']),
})
for key in ('file', 'audio', 'video'):
media_url = json_data.get(key, '')
if not media_url:
continue
media_url = re.sub(r'\?.*', '', compat_urlparse.urljoin(url, media_url))
video_id = video_id or remove_start(os.path.splitext(url_basename(media_url))[0], 'dn')
formats.append({
'url': media_url,
})
self._sort_formats(formats)
description = self._og_search_description(webpage, default=None)
return {
'id': video_id or display_id,
'title': json_data['title'],
'title': title,
'description': description,
'thumbnail': json_data.get('image'),
'subtitles': subtitles,
'formats': formats,
}

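Note on the Democracy Now hunks above: the video id is derived from the first media URL found, by resolving it against the page URL, dropping the query string, taking the file name without its extension, and removing a leading `dn`. The sketch below reproduces those steps with the Python 3 standard library only. `derive_video_id` and the sample media path are hypothetical, and the extractor itself uses the `compat_urlparse`, `url_basename` and `remove_start` helpers instead.

```python
import os.path
import re
from urllib.parse import urljoin, urlparse


def derive_video_id(page_url, media_url):
    """Hypothetical helper mirroring the id derivation in the hunk."""
    # Resolve relative media URLs and strip any query string.
    absolute = re.sub(r'\?.*', '', urljoin(page_url, media_url))
    # Last path component, e.g. 'dn2015-0703-001.mp4'.
    basename = urlparse(absolute).path.strip('/').split('/')[-1]
    stem = os.path.splitext(basename)[0]
    # Drop the 'dn' prefix if present.
    return stem[2:] if stem.startswith('dn') else stem


# The media path below is an assumed example, not taken from the site.
video_id = derive_video_id(
    'http://www.democracynow.org/shows/2015/7/3',
    '/media/dn2015-0703-001.mp4?start=10')
print(video_id)
```

Because the extractor uses `video_id = video_id or ...`, only the first non-empty media key in the loop determines the id; later keys just add formats.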

@@ -38,6 +38,7 @@ class DFBIE(InfoExtractor):
token_el = f4m_info.find('token')
manifest_url = token_el.attrib['url'] + '?' + 'hdnea=' + token_el.attrib['auth'] + '&hdcore=3.2.0'
formats = self._extract_f4m_formats(manifest_url, display_id)
self._sort_formats(formats)
return {
'id': video_id,


@@ -63,18 +63,23 @@ class DiscoveryIE(InfoExtractor):
video_title = info.get('playlist_title') or info.get('video_title')
entries = [{
'id': compat_str(video_info['id']),
'formats': self._extract_m3u8_formats(
entries = []
for idx, video_info in enumerate(info['playlist']):
formats = self._extract_m3u8_formats(
video_info['src'], display_id, 'mp4', 'm3u8_native', m3u8_id='hls',
note='Download m3u8 information for video %d' % (idx + 1)),
'title': video_info['title'],
'description': video_info.get('description'),
'duration': parse_duration(video_info.get('video_length')),
'webpage_url': video_info.get('href') or video_info.get('url'),
'thumbnail': video_info.get('thumbnailURL'),
'alt_title': video_info.get('secondary_title'),
'timestamp': parse_iso8601(video_info.get('publishedDate')),
} for idx, video_info in enumerate(info['playlist'])]
note='Download m3u8 information for video %d' % (idx + 1))
self._sort_formats(formats)
entries.append({
'id': compat_str(video_info['id']),
'formats': formats,
'title': video_info['title'],
'description': video_info.get('description'),
'duration': parse_duration(video_info.get('video_length')),
'webpage_url': video_info.get('href') or video_info.get('url'),
'thumbnail': video_info.get('thumbnailURL'),
'alt_title': video_info.get('secondary_title'),
'timestamp': parse_iso8601(video_info.get('publishedDate')),
})
return self.playlist_result(entries, display_id, video_title)


@@ -118,6 +118,8 @@ class DPlayIE(InfoExtractor):
if info.get(protocol):
extract_formats(protocol, info[protocol])
self._sort_formats(formats)
return {
'id': video_id,
'display_id': display_id,


@@ -39,13 +39,13 @@ class DWIE(InfoExtractor):
hidden_inputs = self._hidden_inputs(webpage)
title = hidden_inputs['media_title']
formats = []
if hidden_inputs.get('player_type') == 'video' and hidden_inputs.get('stream_file') == '1':
formats = self._extract_smil_formats(
'http://www.dw.com/smil/v-%s' % media_id, media_id,
transform_source=lambda s: s.replace(
'rtmp://tv-od.dw.de/flash/',
'http://tv-download.dw.de/dwtv_video/flv/'))
self._sort_formats(formats)
else:
formats = [{'url': hidden_inputs['file_name']}]


@@ -4,10 +4,10 @@ from .common import InfoExtractor
class EbaumsWorldIE(InfoExtractor):
_VALID_URL = r'https?://www\.ebaumsworld\.com/video/watch/(?P<id>\d+)'
_VALID_URL = r'https?://(?:www\.)?ebaumsworld\.com/videos/[^/]+/(?P<id>\d+)'
_TEST = {
'url': 'http://www.ebaumsworld.com/video/watch/83367677/',
'url': 'http://www.ebaumsworld.com/videos/a-giant-python-opens-the-door/83367677/',
'info_dict': {
'id': '83367677',
'ext': 'mp4',

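Note on the eBaum's World hunk above: the new `_VALID_URL` makes `www.` optional and expects a slug segment under `/videos/` before the numeric id. A quick check of the pattern with plain `re`, using the test URL from the hunk; the no-`www` URL is an assumed variant for illustration.

```python
import re

# Pattern copied from the hunk.
VALID_URL = r'https?://(?:www\.)?ebaumsworld\.com/videos/[^/]+/(?P<id>\d+)'

new_style = 'http://www.ebaumsworld.com/videos/a-giant-python-opens-the-door/83367677/'
no_www = 'http://ebaumsworld.com/videos/a-giant-python-opens-the-door/83367677/'
old_style = 'http://www.ebaumsworld.com/video/watch/83367677/'

for candidate in (new_style, no_www, old_style):
    match = re.match(VALID_URL, candidate)
    # Prints the captured id, or None when the URL is not recognised.
    print(match.group('id') if match else None)
```

The old `/video/watch/<id>` form no longer matches the new pattern, which is consistent with the test URL being updated in the same hunk.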

@@ -0,0 +1,992 @@
# flake8: noqa
from __future__ import unicode_literals
from .abc import ABCIE
from .abc7news import Abc7NewsIE
from .academicearth import AcademicEarthCourseIE
from .acast import (
ACastIE,
ACastChannelIE,
)
from .addanime import AddAnimeIE
from .adobetv import (
AdobeTVIE,
AdobeTVShowIE,
AdobeTVChannelIE,
AdobeTVVideoIE,
)
from .adultswim import AdultSwimIE
from .aenetworks import AENetworksIE
from .aftonbladet import AftonbladetIE
from .airmozilla import AirMozillaIE
from .aljazeera import AlJazeeraIE
from .alphaporno import AlphaPornoIE
from .animeondemand import AnimeOnDemandIE
from .anitube import AnitubeIE
from .anysex import AnySexIE
from .aol import (
AolIE,
AolFeaturesIE,
)
from .allocine import AllocineIE
from .aparat import AparatIE
from .appleconnect import AppleConnectIE
from .appletrailers import (
AppleTrailersIE,
AppleTrailersSectionIE,
)
from .archiveorg import ArchiveOrgIE
from .ard import (
ARDIE,
ARDMediathekIE,
SportschauIE,
)
from .arte import (
ArteTvIE,
ArteTVPlus7IE,
ArteTVCreativeIE,
ArteTVConcertIE,
ArteTVFutureIE,
ArteTVCinemaIE,
ArteTVDDCIE,
ArteTVMagazineIE,
ArteTVEmbedIE,
)
from .atresplayer import AtresPlayerIE
from .atttechchannel import ATTTechChannelIE
from .audimedia import AudiMediaIE
from .audioboom import AudioBoomIE
from .audiomack import AudiomackIE, AudiomackAlbumIE
from .azubu import AzubuIE, AzubuLiveIE
from .baidu import BaiduVideoIE
from .bambuser import BambuserIE, BambuserChannelIE
from .bandcamp import BandcampIE, BandcampAlbumIE
from .bbc import (
BBCCoUkIE,
BBCCoUkArticleIE,
BBCIE,
)
from .beeg import BeegIE
from .behindkink import BehindKinkIE
from .beatportpro import BeatportProIE
from .bet import BetIE
from .bigflix import BigflixIE
from .bild import BildIE
from .bilibili import BiliBiliIE
from .biobiochiletv import BioBioChileTVIE
from .bleacherreport import (
BleacherReportIE,
BleacherReportCMSIE,
)
from .blinkx import BlinkxIE
from .bloomberg import BloombergIE
from .bokecc import BokeCCIE
from .bpb import BpbIE
from .br import BRIE
from .bravotv import BravoTVIE
from .breakcom import BreakIE
from .brightcove import (
BrightcoveLegacyIE,
BrightcoveNewIE,
)
from .buzzfeed import BuzzFeedIE
from .byutv import BYUtvIE
from .c56 import C56IE
from .camdemy import (
CamdemyIE,
CamdemyFolderIE
)
from .camwithher import CamWithHerIE
from .canalplus import CanalplusIE
from .canalc2 import Canalc2IE
from .canvas import CanvasIE
from .cbc import (
CBCIE,
CBCPlayerIE,
)
from .cbs import CBSIE
from .cbsinteractive import CBSInteractiveIE
from .cbsnews import (
CBSNewsIE,
CBSNewsLiveVideoIE,
)
from .cbssports import CBSSportsIE
from .ccc import CCCIE
from .cda import CDAIE
from .ceskatelevize import CeskaTelevizeIE
from .channel9 import Channel9IE
from .chaturbate import ChaturbateIE
from .chilloutzone import ChilloutzoneIE
from .chirbit import (
ChirbitIE,
ChirbitProfileIE,
)
from .cinchcast import CinchcastIE
from .cinemassacre import CinemassacreIE
from .cliprs import ClipRsIE
from .clipfish import ClipfishIE
from .cliphunter import CliphunterIE
from .clipsyndicate import ClipsyndicateIE
from .cloudy import CloudyIE
from .clubic import ClubicIE
from .clyp import ClypIE
from .cmt import CMTIE
from .cnbc import CNBCIE
from .cnn import (
CNNIE,
CNNBlogsIE,
CNNArticleIE,
)
from .collegehumor import CollegeHumorIE
from .collegerama import CollegeRamaIE
from .comedycentral import ComedyCentralIE, ComedyCentralShowsIE
from .comcarcoff import ComCarCoffIE
from .commonmistakes import CommonMistakesIE, UnicodeBOMIE
from .commonprotocols import RtmpIE
from .condenast import CondeNastIE
from .cracked import CrackedIE
from .crackle import CrackleIE
from .criterion import CriterionIE
from .crooksandliars import CrooksAndLiarsIE
from .crunchyroll import (
CrunchyrollIE,
CrunchyrollShowPlaylistIE
)
from .cspan import CSpanIE
from .ctsnews import CtsNewsIE
from .cultureunplugged import CultureUnpluggedIE
from .cwtv import CWTVIE
from .dailymotion import (
DailymotionIE,
DailymotionPlaylistIE,
DailymotionUserIE,
DailymotionCloudIE,
)
from .daum import (
DaumIE,
DaumClipIE,
DaumPlaylistIE,
DaumUserIE,
)
from .dbtv import DBTVIE
from .dcn import (
DCNIE,
DCNVideoIE,
DCNLiveIE,
DCNSeasonIE,
)
from .dctp import DctpTvIE
from .deezer import DeezerPlaylistIE
from .democracynow import DemocracynowIE
from .dfb import DFBIE
from .dhm import DHMIE
from .dotsub import DotsubIE
from .douyutv import DouyuTVIE
from .dplay import DPlayIE
from .dramafever import (
DramaFeverIE,
DramaFeverSeriesIE,
)
from .dreisat import DreiSatIE
from .drbonanza import DRBonanzaIE
from .drtuber import DrTuberIE
from .drtv import DRTVIE
from .dvtv import DVTVIE
from .dump import DumpIE
from .dumpert import DumpertIE
from .defense import DefenseGouvFrIE
from .discovery import DiscoveryIE
from .dropbox import DropboxIE
from .dw import (
DWIE,
DWArticleIE,
)
from .eagleplatform import EaglePlatformIE
from .ebaumsworld import EbaumsWorldIE
from .echomsk import EchoMskIE
from .ehow import EHowIE
from .eighttracks import EightTracksIE
from .einthusan import EinthusanIE
from .eitb import EitbIE
from .ellentv import (
EllenTVIE,
EllenTVClipsIE,
)
from .elpais import ElPaisIE
from .embedly import EmbedlyIE
from .engadget import EngadgetIE
from .eporner import EpornerIE
from .eroprofile import EroProfileIE
from .escapist import EscapistIE
from .espn import ESPNIE
from .esri import EsriVideoIE
from .europa import EuropaIE
from .everyonesmixtape import EveryonesMixtapeIE
from .exfm import ExfmIE
from .expotv import ExpoTVIE
from .extremetube import ExtremeTubeIE
from .facebook import FacebookIE
from .faz import FazIE
from .fc2 import FC2IE
from .fczenit import FczenitIE
from .firstpost import FirstpostIE
from .firsttv import FirstTVIE
from .fivemin import FiveMinIE
from .fivetv import FiveTVIE
from .fktv import FKTVIE
from .flickr import FlickrIE
from .folketinget import FolketingetIE
from .footyroom import FootyRoomIE
from .fourtube import FourTubeIE
from .fox import FOXIE
from .foxgay import FoxgayIE
from .foxnews import FoxNewsIE
from .foxsports import FoxSportsIE
from .franceculture import (
FranceCultureIE,
FranceCultureEmissionIE,
)
from .franceinter import FranceInterIE
from .francetv import (
PluzzIE,
FranceTvInfoIE,
FranceTVIE,
GenerationQuoiIE,
CultureboxIE,
)
from .freesound import FreesoundIE
from .freespeech import FreespeechIE
from .freevideo import FreeVideoIE
from .funimation import FunimationIE
from .funnyordie import FunnyOrDieIE
from .gameinformer import GameInformerIE
from .gamekings import GamekingsIE
from .gameone import (
GameOneIE,
GameOnePlaylistIE,
)
from .gamersyde import GamersydeIE
from .gamespot import GameSpotIE
from .gamestar import GameStarIE
from .gametrailers import GametrailersIE
from .gazeta import GazetaIE
from .gdcvault import GDCVaultIE
from .generic import GenericIE
from .gfycat import GfycatIE
from .giantbomb import GiantBombIE
from .giga import GigaIE
from .glide import GlideIE
from .globo import (
GloboIE,
GloboArticleIE,
)
from .godtube import GodTubeIE
from .goldenmoustache import GoldenMoustacheIE
from .golem import GolemIE
from .googledrive import GoogleDriveIE
from .googleplus import GooglePlusIE
from .googlesearch import GoogleSearchIE
from .goshgay import GoshgayIE
from .gputechconf import GPUTechConfIE
from .groupon import GrouponIE
from .hark import HarkIE
from .hbo import HBOIE
from .hearthisat import HearThisAtIE
from .heise import HeiseIE
from .hellporno import HellPornoIE
from .helsinki import HelsinkiIE
from .hentaistigma import HentaiStigmaIE
from .historicfilms import HistoricFilmsIE
from .hitbox import HitboxIE, HitboxLiveIE
from .hornbunny import HornBunnyIE
from .hotnewhiphop import HotNewHipHopIE
from .hotstar import HotStarIE
from .howcast import HowcastIE
from .howstuffworks import HowStuffWorksIE
from .huffpost import HuffPostIE
from .hypem import HypemIE
from .iconosquare import IconosquareIE
from .ign import (
IGNIE,
OneUPIE,
PCMagIE,
)
from .imdb import (
ImdbIE,
ImdbListIE
)
from .imgur import (
ImgurIE,
ImgurAlbumIE,
)
from .ina import InaIE
from .indavideo import (
IndavideoIE,
IndavideoEmbedIE,
)
from .infoq import InfoQIE
from .instagram import InstagramIE, InstagramUserIE
from .internetvideoarchive import InternetVideoArchiveIE
from .iprima import IPrimaIE
from .iqiyi import IqiyiIE
from .ir90tv import Ir90TvIE
from .ivi import (
IviIE,
IviCompilationIE
)
from .ivideon import IvideonIE
from .izlesene import IzleseneIE
from .jeuxvideo import JeuxVideoIE
from .jove import JoveIE
from .jwplatform import JWPlatformIE
from .jpopsukitv import JpopsukiIE
from .kaltura import KalturaIE
from .kanalplay import KanalPlayIE
from .kankan import KankanIE
from .karaoketv import KaraoketvIE
from .karrierevideos import KarriereVideosIE
from .keezmovies import KeezMoviesIE
from .khanacademy import KhanAcademyIE
from .kickstarter import KickStarterIE
from .keek import KeekIE
from .konserthusetplay import KonserthusetPlayIE
from .kontrtube import KontrTubeIE
from .krasview import KrasViewIE
from .ku6 import Ku6IE
from .kusi import KUSIIE
from .kuwo import (
KuwoIE,
KuwoAlbumIE,
KuwoChartIE,
KuwoSingerIE,
KuwoCategoryIE,
KuwoMvIE,
)
from .la7 import LA7IE
from .laola1tv import Laola1TvIE
from .lecture2go import Lecture2GoIE
from .lemonde import LemondeIE
from .leeco import (
LeIE,
LePlaylistIE,
LetvCloudIE,
)
from .libsyn import LibsynIE
from .lifenews import (
LifeNewsIE,
LifeEmbedIE,
)
from .limelight import (
LimelightMediaIE,
LimelightChannelIE,
LimelightChannelListIE,
)
from .liveleak import LiveLeakIE
from .livestream import (
LivestreamIE,
LivestreamOriginalIE,
LivestreamShortenerIE,
)
from .lnkgo import LnkGoIE
from .lovehomeporn import LoveHomePornIE
from .lrt import LRTIE
from .lynda import (
LyndaIE,
LyndaCourseIE
)
from .m6 import M6IE
from .macgamestore import MacGameStoreIE
from .mailru import MailRuIE
from .makerschannel import MakersChannelIE
from .makertv import MakerTVIE
from .malemotion import MalemotionIE
from .matchtv import MatchTVIE
from .mdr import MDRIE
from .metacafe import MetacafeIE
from .metacritic import MetacriticIE
from .mgoon import MgoonIE
from .minhateca import MinhatecaIE
from .ministrygrid import MinistryGridIE
from .minoto import MinotoIE
from .miomio import MioMioIE
from .mit import TechTVMITIE, MITIE, OCWMITIE
from .mitele import MiTeleIE
from .mixcloud import MixcloudIE
from .mlb import MLBIE
from .mnet import MnetIE
from .mpora import MporaIE
from .moevideo import MoeVideoIE
from .mofosex import MofosexIE
from .mojvideo import MojvideoIE
from .moniker import MonikerIE
from .mooshare import MooshareIE
from .morningstar import MorningstarIE
from .motherless import MotherlessIE
from .motorsport import MotorsportIE
from .movieclips import MovieClipsIE
from .moviezine import MoviezineIE
from .mtv import (
MTVIE,
MTVServicesEmbeddedIE,
MTVIggyIE,
MTVDEIE,
)
from .muenchentv import MuenchenTVIE
from .musicplayon import MusicPlayOnIE
from .muzu import MuzuTVIE
from .mwave import MwaveIE
from .myspace import MySpaceIE, MySpaceAlbumIE
from .myspass import MySpassIE
from .myvi import MyviIE
from .myvideo import MyVideoIE
from .myvidster import MyVidsterIE
from .nationalgeographic import (
NationalGeographicIE,
NationalGeographicChannelIE,
)
from .naver import NaverIE
from .nba import NBAIE
from .nbc import (
CSNNEIE,
NBCIE,
NBCNewsIE,
NBCSportsIE,
NBCSportsVPlayerIE,
MSNBCIE,
)
from .ndr import (
NDRIE,
NJoyIE,
NDREmbedBaseIE,
NDREmbedIE,
NJoyEmbedIE,
)
from .ndtv import NDTVIE
from .netzkino import NetzkinoIE
from .nerdcubed import NerdCubedFeedIE
from .nerdist import NerdistIE
from .neteasemusic import (
NetEaseMusicIE,
NetEaseMusicAlbumIE,
NetEaseMusicSingerIE,
NetEaseMusicListIE,
NetEaseMusicMvIE,
NetEaseMusicProgramIE,
NetEaseMusicDjRadioIE,
)
from .newgrounds import NewgroundsIE
from .newstube import NewstubeIE
from .nextmedia import (
NextMediaIE,
NextMediaActionNewsIE,
AppleDailyIE,
)
from .nextmovie import NextMovieIE
from .nfb import NFBIE
from .nfl import NFLIE
from .nhl import (
NHLIE,
NHLNewsIE,
NHLVideocenterIE,
)
from .nick import NickIE
from .niconico import NiconicoIE, NiconicoPlaylistIE
from .ninegag import NineGagIE
from .noco import NocoIE
from .normalboots import NormalbootsIE
from .nosvideo import NosVideoIE
from .nova import NovaIE
from .novamov import (
AuroraVidIE,
CloudTimeIE,
NowVideoIE,
VideoWeedIE,
WholeCloudIE,
)
from .nowness import (
NownessIE,
NownessPlaylistIE,
NownessSeriesIE,
)
from .nowtv import (
NowTVIE,
NowTVListIE,
)
from .noz import NozIE
from .npo import (
NPOIE,
NPOLiveIE,
NPORadioIE,
NPORadioFragmentIE,
SchoolTVIE,
VPROIE,
WNLIE
)
from .npr import NprIE
from .nrk import (
NRKIE,
NRKPlaylistIE,
NRKSkoleIE,
NRKTVIE,
)
from .ntvde import NTVDeIE
from .ntvru import NTVRuIE
from .nytimes import (
NYTimesIE,
NYTimesArticleIE,
)
from .nuvid import NuvidIE
from .odnoklassniki import OdnoklassnikiIE
from .oktoberfesttv import OktoberfestTVIE
from .onionstudios import OnionStudiosIE
from .ooyala import (
OoyalaIE,
OoyalaExternalIE,
)
from .openload import OpenloadIE
from .ora import OraTVIE
from .orf import (
ORFTVthekIE,
ORFOE1IE,
ORFFM4IE,
ORFIPTVIE,
)
from .pandoratv import PandoraTVIE
from .parliamentliveuk import ParliamentLiveUKIE
from .patreon import PatreonIE
from .pbs import PBSIE
from .periscope import PeriscopeIE
from .philharmoniedeparis import PhilharmonieDeParisIE
from .phoenix import PhoenixIE
from .photobucket import PhotobucketIE
from .pinkbike import PinkbikeIE
from .planetaplay import PlanetaPlayIE
from .pladform import PladformIE
from .played import PlayedIE
from .playfm import PlayFMIE
from .plays import PlaysTVIE
from .playtvak import PlaytvakIE
from .playvid import PlayvidIE
from .playwire import PlaywireIE
from .pluralsight import (
PluralsightIE,
PluralsightCourseIE,
)
from .podomatic import PodomaticIE
from .porn91 import Porn91IE
from .pornhd import PornHdIE
from .pornhub import (
PornHubIE,
PornHubPlaylistIE,
PornHubUserVideosIE,
)
from .pornotube import PornotubeIE
from .pornovoisines import PornoVoisinesIE
from .pornoxo import PornoXOIE
from .presstv import PressTVIE
from .primesharetv import PrimeShareTVIE
from .promptfile import PromptFileIE
from .prosiebensat1 import ProSiebenSat1IE
from .puls4 import Puls4IE
from .pyvideo import PyvideoIE
from .qqmusic import (
QQMusicIE,
QQMusicSingerIE,
QQMusicAlbumIE,
QQMusicToplistIE,
QQMusicPlaylistIE,
)
from .quickvid import QuickVidIE
from .r7 import R7IE
from .radiode import RadioDeIE
from .radiojavan import RadioJavanIE
from .radiobremen import RadioBremenIE
from .radiofrance import RadioFranceIE
from .rai import (
RaiTVIE,
RaiIE,
)
from .rbmaradio import RBMARadioIE
from .rds import RDSIE
from .redtube import RedTubeIE
from .regiotv import RegioTVIE
from .restudy import RestudyIE
from .reverbnation import ReverbNationIE
from .revision3 import Revision3IE
from .rice import RICEIE
from .ringtv import RingTVIE
from .ro220 import Ro220IE
from .rottentomatoes import RottenTomatoesIE
from .roxwel import RoxwelIE
from .rtbf import RTBFIE
from .rte import RteIE, RteRadioIE
from .rtlnl import RtlNlIE
from .rtl2 import RTL2IE
from .rtp import RTPIE
from .rts import RTSIE
from .rtve import RTVEALaCartaIE, RTVELiveIE, RTVEInfantilIE
from .rtvnh import RTVNHIE
from .ruhd import RUHDIE
from .ruleporn import RulePornIE
from .rutube import (
RutubeIE,
RutubeChannelIE,
RutubeEmbedIE,
RutubeMovieIE,
RutubePersonIE,
)
from .rutv import RUTVIE
from .ruutu import RuutuIE
from .sandia import SandiaIE
from .safari import (
SafariIE,
SafariApiIE,
SafariCourseIE,
)
from .sapo import SapoIE
from .savefrom import SaveFromIE
from .sbs import SBSIE
from .scivee import SciVeeIE
from .screencast import ScreencastIE
from .screencastomatic import ScreencastOMaticIE
from .screenjunkies import ScreenJunkiesIE
from .screenwavemedia import ScreenwaveMediaIE, TeamFourIE
from .senateisvp import SenateISVPIE
from .servingsys import ServingSysIE
from .sexu import SexuIE
from .sexykarma import SexyKarmaIE
from .shahid import ShahidIE
from .shared import SharedIE
from .sharesix import ShareSixIE
from .sina import SinaIE
from .skynewsarabia import (
SkyNewsArabiaIE,
SkyNewsArabiaArticleIE,
)
from .slideshare import SlideshareIE
from .slutload import SlutloadIE
from .smotri import (
SmotriIE,
SmotriCommunityIE,
SmotriUserIE,
SmotriBroadcastIE,
)
from .snagfilms import (
SnagFilmsIE,
SnagFilmsEmbedIE,
)
from .snotr import SnotrIE
from .sohu import SohuIE
from .soundcloud import (
SoundcloudIE,
SoundcloudSetIE,
SoundcloudUserIE,
SoundcloudPlaylistIE,
SoundcloudSearchIE
)
from .soundgasm import (
SoundgasmIE,
SoundgasmProfileIE
)
from .southpark import (
SouthParkIE,
SouthParkDeIE,
SouthParkDkIE,
SouthParkEsIE,
SouthParkNlIE
)
from .spankbang import SpankBangIE
from .spankwire import SpankwireIE
from .spiegel import SpiegelIE, SpiegelArticleIE
from .spiegeltv import SpiegeltvIE
from .spike import SpikeIE
from .stitcher import StitcherIE
from .sport5 import Sport5IE
from .sportbox import (
SportBoxIE,
SportBoxEmbedIE,
)
from .sportdeutschland import SportDeutschlandIE
from .srgssr import (
SRGSSRIE,
SRGSSRPlayIE,
)
from .srmediathek import SRMediathekIE
from .ssa import SSAIE
from .stanfordoc import StanfordOpenClassroomIE
from .steam import SteamIE
from .streamcloud import StreamcloudIE
from .streamcz import StreamCZIE
from .streetvoice import StreetVoiceIE
from .sunporno import SunPornoIE
from .svt import (
SVTIE,
SVTPlayIE,
)
from .swrmediathek import SWRMediathekIE
from .syfy import SyfyIE
from .sztvhu import SztvHuIE
from .tagesschau import TagesschauIE
from .tapely import TapelyIE
from .tass import TassIE
from .teachertube import (
TeacherTubeIE,
TeacherTubeUserIE,
)
from .teachingchannel import TeachingChannelIE
from .teamcoco import TeamcocoIE
from .techtalks import TechTalksIE
from .ted import TEDIE
from .tele13 import Tele13IE
from .telebruxelles import TeleBruxellesIE
from .telecinco import TelecincoIE
from .telegraaf import TelegraafIE
from .telemb import TeleMBIE
from .teletask import TeleTaskIE
from .testurl import TestURLIE
from .tf1 import TF1IE
from .theintercept import TheInterceptIE
from .theonion import TheOnionIE
from .theplatform import (
ThePlatformIE,
ThePlatformFeedIE,
)
from .thescene import TheSceneIE
from .thesixtyone import TheSixtyOneIE
from .thestar import TheStarIE
from .thisamericanlife import ThisAmericanLifeIE
from .thisav import ThisAVIE
from .tinypic import TinyPicIE
from .tlc import TlcDeIE
from .tmz import (
TMZIE,
TMZArticleIE,
)
from .tnaflix import (
TNAFlixNetworkEmbedIE,
TNAFlixIE,
EMPFlixIE,
MovieFapIE,
)
from .toggle import ToggleIE
from .thvideo import (
THVideoIE,
THVideoPlaylistIE
)
from .toutv import TouTvIE
from .toypics import ToypicsUserIE, ToypicsIE
from .traileraddict import TrailerAddictIE
from .trilulilu import TriluliluIE
from .trollvids import TrollvidsIE
from .trutube import TruTubeIE
from .tube8 import Tube8IE
from .tubitv import TubiTvIE
from .tudou import (
TudouIE,
TudouPlaylistIE,
TudouAlbumIE,
)
from .tumblr import TumblrIE
from .tunein import (
TuneInClipIE,
TuneInStationIE,
TuneInProgramIE,
TuneInTopicIE,
TuneInShortenerIE,
)
from .turbo import TurboIE
from .tutv import TutvIE
from .tv2 import (
TV2IE,
TV2ArticleIE,
)
from .tv3 import TV3IE
from .tv4 import TV4IE
from .tvc import (
TVCIE,
TVCArticleIE,
)
from .tvigle import TvigleIE
from .tvland import TVLandIE
from .tvp import TvpIE, TvpSeriesIE
from .tvplay import TVPlayIE
from .tweakers import TweakersIE
from .twentyfourvideo import TwentyFourVideoIE
from .twentymin import TwentyMinutenIE
from .twentytwotracks import (
TwentyTwoTracksIE,
TwentyTwoTracksGenreIE
)
from .twitch import (
TwitchVideoIE,
TwitchChapterIE,
TwitchVodIE,
TwitchProfileIE,
TwitchPastBroadcastsIE,
TwitchBookmarksIE,
TwitchStreamIE,
)
from .twitter import (
TwitterCardIE,
TwitterIE,
TwitterAmplifyIE,
)
from .ubu import UbuIE
from .udemy import (
UdemyIE,
UdemyCourseIE
)
from .udn import UDNEmbedIE
from .digiteka import DigitekaIE
from .unistra import UnistraIE
from .urort import UrortIE
from .usatoday import USATodayIE
from .ustream import UstreamIE, UstreamChannelIE
from .ustudio import UstudioIE
from .varzesh3 import Varzesh3IE
from .vbox7 import Vbox7IE
from .veehd import VeeHDIE
from .veoh import VeohIE
from .vessel import VesselIE
from .vesti import VestiIE
from .vevo import VevoIE
from .vgtv import (
BTArticleIE,
BTVestlendingenIE,
VGTVIE,
)
from .vh1 import VH1IE
from .vice import (
ViceIE,
ViceShowIE,
)
from .viddler import ViddlerIE
from .videodetective import VideoDetectiveIE
from .videofyme import VideofyMeIE
from .videomega import VideoMegaIE
from .videomore import (
VideomoreIE,
VideomoreVideoIE,
VideomoreSeasonIE,
)
from .videopremium import VideoPremiumIE
from .videott import VideoTtIE
from .vidme import (
VidmeIE,
VidmeUserIE,
VidmeUserLikesIE,
)
from .vidzi import VidziIE
from .vier import VierIE, VierVideosIE
from .viewster import ViewsterIE
from .viidea import ViideaIE
from .vimeo import (
VimeoIE,
VimeoAlbumIE,
VimeoChannelIE,
VimeoGroupsIE,
VimeoLikesIE,
VimeoOndemandIE,
VimeoReviewIE,
VimeoUserIE,
VimeoWatchLaterIE,
)
from .vimple import VimpleIE
from .vine import (
VineIE,
VineUserIE,
)
from .viki import (
VikiIE,
VikiChannelIE,
)
from .vk import (
VKIE,
VKUserVideosIE,
)
from .vlive import VLiveIE
from .vodlocker import VodlockerIE
from .voicerepublic import VoiceRepublicIE
from .voxmedia import VoxMediaIE
from .vporn import VpornIE
from .vrt import VRTIE
from .vube import VubeIE
from .vuclip import VuClipIE
from .vulture import VultureIE
from .walla import WallaIE
from .washingtonpost import WashingtonPostIE
from .wat import WatIE
from .wayofthemaster import WayOfTheMasterIE
from .wdr import (
WDRIE,
WDRMobileIE,
WDRMausIE,
)
from .webofstories import (
WebOfStoriesIE,
WebOfStoriesPlaylistIE,
)
from .weibo import WeiboIE
from .weiqitv import WeiqiTVIE
from .wimp import WimpIE
from .wistia import WistiaIE
from .worldstarhiphop import WorldStarHipHopIE
from .wrzuta import WrzutaIE
from .wsj import WSJIE
from .xbef import XBefIE
from .xboxclips import XboxClipsIE
from .xfileshare import XFileShareIE
from .xhamster import (
XHamsterIE,
XHamsterEmbedIE,
)
from .xminus import XMinusIE
from .xnxx import XNXXIE
from .xstream import XstreamIE
from .xtube import XTubeUserIE, XTubeIE
from .xuite import XuiteIE
from .xvideos import XVideosIE
from .xxxymovies import XXXYMoviesIE
from .yahoo import (
YahooIE,
YahooSearchIE,
)
from .yam import YamIE
from .yandexmusic import (
YandexMusicTrackIE,
YandexMusicAlbumIE,
YandexMusicPlaylistIE,
)
from .yesjapan import YesJapanIE
from .yinyuetai import YinYueTaiIE
from .ynet import YnetIE
from .youjizz import YouJizzIE
from .youku import YoukuIE
from .youporn import YouPornIE
from .yourupload import YourUploadIE
from .youtube import (
YoutubeIE,
YoutubeChannelIE,
YoutubeFavouritesIE,
YoutubeHistoryIE,
YoutubeLiveIE,
YoutubePlaylistIE,
YoutubePlaylistsIE,
YoutubeRecommendedIE,
YoutubeSearchDateIE,
YoutubeSearchIE,
YoutubeSearchURLIE,
YoutubeShowIE,
YoutubeSubscriptionsIE,
YoutubeTruncatedIDIE,
YoutubeTruncatedURLIE,
YoutubeUserIE,
YoutubeWatchLaterIE,
)
from .zapiks import ZapiksIE
from .zdf import ZDFIE, ZDFChannelIE
from .zingmp3 import (
ZingMp3SongIE,
ZingMp3AlbumIE,
)
from .zippcast import ZippCastIE


@@ -2,78 +2,133 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import int_or_none
from ..compat import compat_xpath
from ..utils import (
int_or_none,
qualities,
unified_strdate,
xpath_attr,
xpath_element,
xpath_text,
xpath_with_ns,
)
class FirstTVIE(InfoExtractor):
IE_NAME = '1tv'
IE_DESC = 'Первый канал'
_VALID_URL = r'https?://(?:www\.)?1tv\.ru/(?:[^/]+/)+(?P<id>.+)'
_VALID_URL = r'https?://(?:www\.)?1tv\.ru/(?:[^/]+/)+p?(?P<id>\d+)'
_TESTS = [{
'url': 'http://www.1tv.ru/videoarchive/73390',
'md5': '777f525feeec4806130f4f764bc18a4f',
'info_dict': {
'id': '73390',
'ext': 'mp4',
'title': 'Олимпийские канатные дороги',
'description': 'md5:d41d8cd98f00b204e9800998ecf8427e',
'thumbnail': 're:^https?://.*\.(?:jpg|JPG)$',
'duration': 149,
'like_count': int,
'dislike_count': int,
},
'skip': 'Only works from Russia',
}, {
# single format via video_materials.json API
'url': 'http://www.1tv.ru/prj/inprivate/vypusk/35930',
'md5': 'a1b6b60d530ebcf8daacf4565762bbaf',
'md5': '82a2777648acae812d58b3f5bd42882b',
'info_dict': {
'id': '35930',
'ext': 'mp4',
'title': 'Наедине со всеми. Людмила Сенчина',
'description': 'md5:89553aed1d641416001fe8d450f06cb9',
'title': 'Гость Людмила Сенчина. Наедине со всеми. Выпуск от 12.02.2015',
'description': 'md5:357933adeede13b202c7c21f91b871b2',
'thumbnail': 're:^https?://.*\.(?:jpg|JPG)$',
'upload_date': '20150212',
'duration': 2694,
},
'skip': 'Only works from Russia',
}, {
# multiple formats via video_materials.json API
'url': 'http://www.1tv.ru/video_archive/projects/dobroeutro/p113641',
'info_dict': {
'id': '113641',
'ext': 'mp4',
'title': 'Весенняя аллергия. Доброе утро. Фрагмент выпуска от 07.04.2016',
'description': 'md5:8dcebb3dded0ff20fade39087fd1fee2',
'thumbnail': 're:^https?://.*\.(?:jpg|JPG)$',
'upload_date': '20160407',
'duration': 179,
'formats': 'mincount:3',
},
'params': {
'skip_download': True,
},
}, {
# single format only available via ONE_ONLINE_VIDEOS.archive_single_xml API
'url': 'http://www.1tv.ru/video_archive/series/f7552/p47038',
'md5': '519d306c5b5669761fd8906c39dbee23',
'info_dict': {
'id': '47038',
'ext': 'mp4',
'title': '"Побег". Второй сезон. 3 серия',
'description': 'md5:3abf8f6b9bce88201c33e9a3d794a00b',
'thumbnail': 're:^https?://.*\.(?:jpg|JPG)$',
'upload_date': '20120516',
'duration': 3080,
},
}, {
'url': 'http://www.1tv.ru/videoarchive/9967',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id, 'Downloading page')
# Videos with multiple formats only available via this API
video = self._download_json(
'http://www.1tv.ru/video_materials.json?legacy_id=%s' % video_id,
video_id, fatal=False)
video_url = self._html_search_regex(
r'''(?s)(?:jwplayer\('flashvideoportal_1'\)\.setup\({|var\s+playlistObj\s*=).*?'file'\s*:\s*'([^']+)'.*?}\);''',
webpage, 'video URL')
description, thumbnail, upload_date, duration = [None] * 4
title = self._html_search_regex(
[r'<div class="tv_translation">\s*<h1><a href="[^"]+">([^<]*)</a>',
r"'title'\s*:\s*'([^']+)'"], webpage, 'title')
description = self._html_search_regex(
r'<div class="descr">\s*<div>&nbsp;</div>\s*<p>([^<]*)</p></div>',
webpage, 'description', default=None) or self._html_search_meta(
if video:
item = video[0]
title = item['title']
quality = qualities(('ld', 'sd', 'hd', ))
formats = [{
'url': f['src'],
'format_id': f.get('name'),
'quality': quality(f.get('name')),
} for f in item['mbr'] if f.get('src')]
thumbnail = item.get('poster')
else:
# Some videos are not available via video_materials.json
video = self._download_xml(
'http://www.1tv.ru/owa/win/ONE_ONLINE_VIDEOS.archive_single_xml?pid=%s' % video_id,
video_id)
NS_MAP = {
'media': 'http://search.yahoo.com/mrss/',
}
item = xpath_element(video, './channel/item', fatal=True)
title = xpath_text(item, './title', fatal=True)
formats = [{
'url': content.attrib['url'],
} for content in item.findall(
compat_xpath(xpath_with_ns('./media:content', NS_MAP))) if content.attrib.get('url')]
thumbnail = xpath_attr(
item, xpath_with_ns('./media:thumbnail', NS_MAP), 'url')
self._sort_formats(formats)
webpage = self._download_webpage(url, video_id, 'Downloading page', fatal=False)
if webpage:
title = self._html_search_regex(
(r'<div class="tv_translation">\s*<h1><a href="[^"]+">([^<]*)</a>',
r"'title'\s*:\s*'([^']+)'"),
webpage, 'title', default=None) or title
description = self._html_search_regex(
r'<div class="descr">\s*<div>&nbsp;</div>\s*<p>([^<]*)</p></div>',
webpage, 'description', default=None) or self._html_search_meta(
'description', webpage, 'description')
thumbnail = self._og_search_thumbnail(webpage)
duration = self._og_search_property(
'video:duration', webpage,
'video duration', fatal=False)
like_count = self._html_search_regex(
r'title="Понравилось".*?/></label> \[(\d+)\]',
webpage, 'like count', default=None)
dislike_count = self._html_search_regex(
r'title="Не понравилось".*?/></label> \[(\d+)\]',
webpage, 'dislike count', default=None)
thumbnail = thumbnail or self._og_search_thumbnail(webpage)
duration = int_or_none(self._html_search_meta(
'video:duration', webpage, 'video duration', fatal=False))
upload_date = unified_strdate(self._html_search_meta(
'ya:ovs:upload_date', webpage, 'upload date', fatal=False))
return {
'id': video_id,
'url': video_url,
'thumbnail': thumbnail,
'title': title,
'description': description,
'upload_date': upload_date,
'duration': int_or_none(duration),
'like_count': int_or_none(like_count),
'dislike_count': int_or_none(dislike_count),
'formats': formats
}
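
Note on the hunk above: the new code path ranks the `ld`/`sd`/`hd` variants with `qualities(...)` before calling `_sort_formats`. The following is a minimal standalone sketch of that ranking idea. It assumes the helper maps a name to its position in the preference tuple and maps unknown names to -1. `make_quality_ranker` and the sample URLs are hypothetical and are not part of youtube-dl.

```python
def make_quality_ranker(quality_ids):
    """Return a function mapping a quality name to its rank (higher is better)."""
    def rank(quality_id):
        try:
            return quality_ids.index(quality_id)
        except ValueError:
            # Unknown or missing names sort below every known quality.
            return -1
    return rank


quality = make_quality_ranker(('ld', 'sd', 'hd'))

# Hypothetical entries shaped like the 'mbr' items in the diff above.
mbr = [
    {'name': 'hd', 'src': 'http://example.com/hd.mp4'},
    {'name': 'ld', 'src': 'http://example.com/ld.mp4'},
    {'name': 'sd', 'src': None},  # dropped below: no usable URL
]

formats = [{
    'url': f['src'],
    'format_id': f.get('name'),
    'quality': quality(f.get('name')),
} for f in mbr if f.get('src')]

# Worst first, best last.
formats.sort(key=lambda f: f['quality'])
```

Ranking by tuple position keeps the preference order in one place, so adding a new variant only means extending the tuple.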


@@ -16,6 +16,9 @@ class FOXIE(InfoExtractor):
'title': 'Official Trailer: Gotham',
'description': 'Tracing the rise of the great DC Comics Super-Villains and vigilantes, Gotham reveals an entirely new chapter that has never been told.',
'duration': 129,
'timestamp': 1400020798,
'upload_date': '20140513',
'uploader': 'NEWA-FNG-FOXCOM',
},
'add_ie': ['ThePlatform'],
}


@@ -18,8 +18,8 @@ class FoxNewsIE(AMPIE):
'title': 'Frozen in Time',
'description': '16-year-old girl is size of toddler',
'duration': 265,
# 'timestamp': 1304411491,
# 'upload_date': '20110503',
'timestamp': 1304411491,
'upload_date': '20110503',
'thumbnail': 're:^https?://.*\.jpg$',
},
},
@@ -32,8 +32,8 @@ class FoxNewsIE(AMPIE):
'title': "Rep. Luis Gutierrez on if Obama's immigration plan is legal",
'description': "Congressman discusses president's plan",
'duration': 292,
# 'timestamp': 1417662047,
# 'upload_date': '20141204',
'timestamp': 1417662047,
'upload_date': '20141204',
'thumbnail': 're:^https?://.*\.jpg$',
},
'params': {


@@ -46,8 +46,8 @@ class FunnyOrDieIE(InfoExtractor):
links.sort(key=lambda link: 1 if link[1] == 'mp4' else 0)
m3u8_url = self._search_regex(
r'<source[^>]+src=(["\'])(?P<url>.+?/master\.m3u8)\1',
webpage, 'm3u8 url', default=None, group='url')
r'<source[^>]+src=(["\'])(?P<url>.+?/master\.m3u8[^"\']*)\1',
webpage, 'm3u8 url', group='url')
formats = []


@@ -159,9 +159,10 @@ class GDCVaultIE(InfoExtractor):
'title': title,
}
PLAYER_REGEX = r'<iframe src="(?P<xml_root>.+?)/player.*?\.html.*?".*?</iframe>'
xml_root = self._html_search_regex(
r'<iframe src="(?P<xml_root>.*?)player.html.*?".*?</iframe>',
start_page, 'xml root', default=None)
PLAYER_REGEX, start_page, 'xml root', default=None)
if xml_root is None:
# Probably need to authenticate
login_res = self._login(webpage_url, display_id)
@@ -171,18 +172,19 @@ class GDCVaultIE(InfoExtractor):
start_page = login_res
# Grab the url from the authenticated page
xml_root = self._html_search_regex(
r'<iframe src="(.*?)player.html.*?".*?</iframe>',
start_page, 'xml root')
PLAYER_REGEX, start_page, 'xml root')
xml_name = self._html_search_regex(
r'<iframe src=".*?\?xml=(.+?\.xml).*?".*?</iframe>',
start_page, 'xml filename', default=None)
if xml_name is None:
# Fallback to the older format
xml_name = self._html_search_regex(r'<iframe src=".*?\?xmlURL=xml/(?P<xml_file>.+?\.xml).*?".*?</iframe>', start_page, 'xml filename')
xml_name = self._html_search_regex(
r'<iframe src=".*?\?xmlURL=xml/(?P<xml_file>.+?\.xml).*?".*?</iframe>',
start_page, 'xml filename')
xml_description_url = xml_root + 'xml/' + xml_name
xml_description = self._download_xml(xml_description_url, display_id)
xml_description = self._download_xml(
'%s/xml/%s' % (xml_root, xml_name), display_id)
video_title = xml_description.find('./metadata/title').text
video_formats = self._parse_mp4(xml_description)
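
Note on the hunk above: the shared `PLAYER_REGEX` now captures the iframe root without a trailing slash, which is why the XML URL is built as `'%s/xml/%s'`. Below is a small sketch of how the two patterns from the diff combine, using plain `re` instead of the extractor helpers. The name `XML_NAME_REGEX` and the sample iframe markup are assumptions for illustration, and the real GDC Vault pages may look different.

```python
import re

# Both patterns are copied from the diff; XML_NAME_REGEX is a hypothetical name.
PLAYER_REGEX = r'<iframe src="(?P<xml_root>.+?)/player.*?\.html.*?".*?</iframe>'
XML_NAME_REGEX = r'<iframe src=".*?\?xml=(.+?\.xml).*?".*?</iframe>'

# Hypothetical page fragment.
start_page = (
    '<iframe src="http://example.com/vault/player.html?xml=talk.xml">'
    '</iframe>'
)

xml_root = re.search(PLAYER_REGEX, start_page).group('xml_root')
xml_name = re.search(XML_NAME_REGEX, start_page).group(1)

# The root no longer ends in '/', so the separator is written explicitly.
xml_description_url = '%s/xml/%s' % (xml_root, xml_name)
```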


@@ -406,19 +406,6 @@ class GenericIE(InfoExtractor):
'skip_download': True,
},
},
# multiple ooyala embeds on SBN network websites
{
'url': 'http://www.sbnation.com/college-football-recruiting/2015/2/3/7970291/national-signing-day-rationalizations-itll-be-ok-itll-be-ok',
'info_dict': {
'id': 'national-signing-day-rationalizations-itll-be-ok-itll-be-ok',
'title': '25 lies you will tell yourself on National Signing Day - SBNation.com',
},
'playlist_mincount': 3,
'params': {
'skip_download': True,
},
'add_ie': ['Ooyala'],
},
# embed.ly video
{
'url': 'http://www.tested.com/science/weird/460206-tested-grinding-coffee-2000-frames-second/',
@@ -1124,7 +1111,35 @@ class GenericIE(InfoExtractor):
# m3u8 downloads
'skip_download': True,
}
}
},
# Brightcove embed, with no valid 'renditions' but valid 'IOSRenditions'
# This video can't be played in browsers if Flash disabled and UA set to iPhone, which is actually a false alarm
{
'url': 'https://dl.dropboxusercontent.com/u/29092637/interview.html',
'info_dict': {
'id': '4785848093001',
'ext': 'mp4',
'title': 'The Cardinal Pell Interview',
'description': 'Sky News Contributor Andrew Bolt interviews George Pell in Rome, following the Cardinal\'s evidence before the Royal Commission into Child Abuse. ',
'uploader': 'GlobeCast Australia - GlobeStream',
},
'params': {
# m3u8 downloads
'skip_download': True,
},
},
# Another form of arte.tv embed
{
'url': 'http://www.tv-replay.fr/redirection/09-04-16/arte-reportage-arte-11508975.html',
'md5': '850bfe45417ddf221288c88a0cffe2e2',
'info_dict': {
'id': '030273-562_PLUS7-F',
'ext': 'mp4',
'title': 'ARTE Reportage - Nulle part, en France',
'description': 'md5:e3a0e8868ed7303ed509b9e3af2b870d',
'upload_date': '20160409',
},
},
]
def report_following_redirect(self, new_url):
@@ -1294,6 +1309,7 @@ class GenericIE(InfoExtractor):
'vcodec': 'none' if m.group('type') == 'audio' else None
}]
info_dict['direct'] = True
self._sort_formats(formats)
info_dict['formats'] = formats
return info_dict
@@ -1320,6 +1336,7 @@ class GenericIE(InfoExtractor):
# Is it an M3U playlist?
if first_bytes.startswith(b'#EXTM3U'):
info_dict['formats'] = self._extract_m3u8_formats(url, video_id, 'mp4')
self._sort_formats(info_dict['formats'])
return info_dict
# Maybe it's a direct link to a video?
@@ -1344,15 +1361,19 @@ class GenericIE(InfoExtractor):
if doc.tag == 'rss':
return self._extract_rss(url, video_id, doc)
elif re.match(r'^(?:{[^}]+})?smil$', doc.tag):
return self._parse_smil(doc, url, video_id)
smil = self._parse_smil(doc, url, video_id)
self._sort_formats(smil['formats'])
return smil
elif doc.tag == '{http://xspf.org/ns/0/}playlist':
return self.playlist_result(self._parse_xspf(doc, video_id), video_id)
elif re.match(r'(?i)^(?:{[^}]+})?MPD$', doc.tag):
info_dict['formats'] = self._parse_mpd_formats(
doc, video_id, mpd_base_url=url.rpartition('/')[0])
self._sort_formats(info_dict['formats'])
return info_dict
elif re.match(r'^{http://ns\.adobe\.com/f4m/[12]\.0}manifest$', doc.tag):
info_dict['formats'] = self._parse_f4m_formats(doc, url, video_id)
self._sort_formats(info_dict['formats'])
return info_dict
except compat_xml_parse_error:
pass
@@ -1693,7 +1714,7 @@ class GenericIE(InfoExtractor):
# Look for embedded arte.tv player
mobj = re.search(
r'<script [^>]*?src="(?P<url>http://www\.arte\.tv/playerv2/embed[^"]+)"',
r'<(?:script|iframe) [^>]*?src="(?P<url>http://www\.arte\.tv/(?:playerv2/embed|arte_vp/index)[^"]+)"',
webpage)
if mobj is not None:
return self.url_result(mobj.group('url'), 'ArteTVEmbed')
@@ -2037,6 +2058,9 @@ class GenericIE(InfoExtractor):
else:
entry_info_dict['url'] = video_url
if entry_info_dict.get('formats'):
self._sort_formats(entry_info_dict['formats'])
entries.append(entry_info_dict)
if len(entries) == 1:


@@ -2,6 +2,7 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import unified_strdate
class GlideIE(InfoExtractor):
@@ -15,26 +16,38 @@ class GlideIE(InfoExtractor):
'ext': 'mp4',
'title': 'Damon Timm\'s Glide message',
'thumbnail': 're:^https?://.*?\.cloudfront\.net/.*\.jpg$',
'uploader': 'Damon Timm',
'upload_date': '20140919',
}
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
title = self._html_search_regex(
r'<title>(.*?)</title>', webpage, 'title')
video_url = self.http_scheme() + self._search_regex(
r'<source src="(.*?)" type="video/mp4">', webpage, 'video URL')
thumbnail_url = self._search_regex(
r'<img id="video-thumbnail" src="(.*?)"',
webpage, 'thumbnail url', fatal=False)
thumbnail = (
thumbnail_url if thumbnail_url is None
else self.http_scheme() + thumbnail_url)
r'<title>(.+?)</title>', webpage, 'title')
video_url = self._proto_relative_url(self._search_regex(
r'<source[^>]+src=(["\'])(?P<url>.+?)\1',
webpage, 'video URL', default=None,
group='url')) or self._og_search_video_url(webpage)
thumbnail = self._proto_relative_url(self._search_regex(
r'<img[^>]+id=["\']video-thumbnail["\'][^>]+src=(["\'])(?P<url>.+?)\1',
webpage, 'thumbnail url', default=None,
group='url')) or self._og_search_thumbnail(webpage)
uploader = self._search_regex(
r'<div[^>]+class=["\']info-name["\'][^>]*>([^<]+)',
webpage, 'uploader', fatal=False)
upload_date = unified_strdate(self._search_regex(
r'<div[^>]+class="info-date"[^>]*>([^<]+)',
webpage, 'upload date', fatal=False))
return {
'id': video_id,
'title': title,
'url': video_url,
'thumbnail': thumbnail,
'uploader': uploader,
'upload_date': upload_date,
}


@@ -14,13 +14,13 @@ class GoshgayIE(InfoExtractor):
_VALID_URL = r'https?://www\.goshgay\.com/video(?P<id>\d+?)($|/)'
_TEST = {
'url': 'http://www.goshgay.com/video299069/diesel_sfw_xxx_video',
'md5': '027fcc54459dff0feb0bc06a7aeda680',
'md5': '4b6db9a0a333142eb9f15913142b0ed1',
'info_dict': {
'id': '299069',
'ext': 'flv',
'title': 'DIESEL SFW XXX Video',
'thumbnail': 're:^http://.*\.jpg$',
'duration': 79,
'duration': 80,
'age_limit': 18,
}
}
@@ -47,5 +47,5 @@ class GoshgayIE(InfoExtractor):
'title': title,
'thumbnail': thumbnail,
'duration': duration,
'age_limit': self._family_friendly_search(webpage),
'age_limit': 18,
}


@@ -16,14 +16,14 @@ class GrouponIE(InfoExtractor):
'playlist': [{
'info_dict': {
'id': 'tubGNycTo_9Uxg82uESj4i61EYX8nyuf',
'ext': 'mp4',
'ext': 'flv',
'title': 'Bikram Yoga Huntington Beach | Orange County',
'description': 'md5:d41d8cd98f00b204e9800998ecf8427e',
'duration': 44.961,
},
}],
'params': {
'skip_download': 'HLS',
'skip_download': 'HDS',
}
}
@@ -32,7 +32,7 @@ class GrouponIE(InfoExtractor):
webpage = self._download_webpage(url, playlist_id)
payload = self._parse_json(self._search_regex(
r'var\s+payload\s*=\s*(.*?);\n', webpage, 'payload'), playlist_id)
r'(?:var\s+|window\.)payload\s*=\s*(.*?);\n', webpage, 'payload'), playlist_id)
videos = payload['carousel'].get('dealVideos', [])
entries = []
for v in videos:


@@ -6,6 +6,7 @@ from ..utils import (
int_or_none,
js_to_json,
unescapeHTML,
determine_ext,
)
@@ -23,6 +24,7 @@ class HowStuffWorksIE(InfoExtractor):
'thumbnail': 're:^https?://.*\.jpg$',
'duration': 161,
},
'skip': 'Video broken',
},
{
'url': 'http://adventure.howstuffworks.com/7199-survival-zone-food-and-water-in-the-savanna-video.htm',
@@ -39,7 +41,7 @@ class HowStuffWorksIE(InfoExtractor):
'url': 'http://entertainment.howstuffworks.com/arts/2706-sword-swallowing-1-by-dan-meyer-video.htm',
'info_dict': {
'id': '440011',
'ext': 'flv',
'ext': 'mp4',
'title': 'Sword Swallowing #1 by Dan Meyer',
'description': 'Video footage (1 of 3) used by permission of the owner Dan Meyer through Sword Swallowers Association International <www.swordswallow.org>',
'display_id': 'sword-swallowing-1-by-dan-meyer',
@@ -63,13 +65,19 @@ class HowStuffWorksIE(InfoExtractor):
video_id = clip_info['content_id']
formats = []
m3u8_url = clip_info.get('m3u8')
if m3u8_url:
formats += self._extract_m3u8_formats(m3u8_url, video_id, 'mp4')
if m3u8_url and determine_ext(m3u8_url) == 'm3u8':
formats.extend(self._extract_m3u8_formats(m3u8_url, video_id, 'mp4', format_id='hls', fatal=True))
flv_url = clip_info.get('flv_url')
if flv_url:
formats.append({
'url': flv_url,
'format_id': 'flv',
})
for video in clip_info.get('mp4', []):
formats.append({
'url': video['src'],
'format_id': video['bitrate'],
'vbr': int(video['bitrate'].rstrip('k')),
'format_id': 'mp4-%s' % video['bitrate'],
'vbr': int_or_none(video['bitrate'].rstrip('k')),
})
if not formats:
@@ -102,6 +110,6 @@ class HowStuffWorksIE(InfoExtractor):
'title': unescapeHTML(clip_info['clip_title']),
'description': unescapeHTML(clip_info.get('caption')),
'thumbnail': clip_info.get('video_still_url'),
'duration': clip_info.get('duration'),
'duration': int_or_none(clip_info.get('duration')),
'formats': formats,
}
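
Note on the hunks above: replacing `int(...)` with `int_or_none(...)` keeps one malformed bitrate or duration from aborting the whole extraction. A minimal sketch of the idea follows. The `int_or_none` here is a simplified stand-in written for this example and is not the implementation from `youtube_dl.utils`. `mp4_formats` and the sample data are hypothetical.

```python
def int_or_none(value):
    """Simplified stand-in: return int(value), or None when conversion fails."""
    try:
        return int(value)
    except (TypeError, ValueError):
        return None


def mp4_formats(videos):
    """Build format dicts the way the new loop in the diff does."""
    return [{
        'url': video['src'],
        'format_id': 'mp4-%s' % video['bitrate'],
        # '500k' -> 500; anything unparsable becomes None instead of raising.
        'vbr': int_or_none(video['bitrate'].rstrip('k')),
    } for video in videos]


sample = [
    {'src': 'http://example.com/a.mp4', 'bitrate': '500k'},
    {'src': 'http://example.com/b.mp4', 'bitrate': 'unknown'},
]
formats = mp4_formats(sample)
```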


@@ -152,7 +152,7 @@ class InstagramUserIE(InfoExtractor):
if not page['items']:
break
max_id = page['items'][-1]['id']
max_id = page['items'][-1]['id'].split('_')[0]
media_url = (
'http://instagram.com/%s/media?max_id=%s' % (
uploader_id, max_id))


@@ -1,93 +1,91 @@
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import (
compat_parse_qs,
compat_urlparse,
compat_urllib_parse_urlencode,
)
from ..utils import (
xpath_with_ns,
determine_ext,
int_or_none,
xpath_text,
)
class InternetVideoArchiveIE(InfoExtractor):
_VALID_URL = r'https?://video\.internetvideoarchive\.net/flash/players/.*?\?.*?publishedid.*?'
_VALID_URL = r'https?://video\.internetvideoarchive\.net/(?:player|flash/players)/.*?\?.*?publishedid.*?'
_TEST = {
'url': 'http://video.internetvideoarchive.net/flash/players/flashconfiguration.aspx?customerid=69249&publishedid=452693&playerid=247',
'url': 'http://video.internetvideoarchive.net/player/6/configuration.ashx?customerid=69249&publishedid=194487&reporttag=vdbetatitle&playerid=641&autolist=0&domain=www.videodetective.com&maxrate=high&minrate=low&socialplayer=false',
'info_dict': {
'id': '452693',
'id': '194487',
'ext': 'mp4',
'title': 'SKYFALL',
'description': 'In SKYFALL, Bond\'s loyalty to M is tested as her past comes back to haunt her. As MI6 comes under attack, 007 must track down and destroy the threat, no matter how personal the cost.',
'duration': 152,
'title': 'KICK-ASS 2',
'description': 'md5:c189d5b7280400630a1d3dd17eaa8d8a',
},
'params': {
# m3u8 download
'skip_download': True,
},
}
@staticmethod
def _build_url(query):
return 'http://video.internetvideoarchive.net/flash/players/flashconfiguration.aspx?' + query
def _build_json_url(query):
return 'http://video.internetvideoarchive.net/player/6/configuration.ashx?' + query
@staticmethod
def _clean_query(query):
NEEDED_ARGS = ['publishedid', 'customerid']
query_dic = compat_urlparse.parse_qs(query)
cleaned_dic = dict((k, v[0]) for (k, v) in query_dic.items() if k in NEEDED_ARGS)
# Other player ids return m3u8 urls
cleaned_dic['playerid'] = '247'
cleaned_dic['videokbrate'] = '100000'
return compat_urllib_parse_urlencode(cleaned_dic)
def _build_xml_url(query):
return 'http://video.internetvideoarchive.net/flash/players/flashconfiguration.aspx?' + query
def _real_extract(self, url):
query = compat_urlparse.urlparse(url).query
query_dic = compat_urlparse.parse_qs(query)
query_dic = compat_parse_qs(query)
video_id = query_dic['publishedid'][0]
url = self._build_url(query)
flashconfiguration = self._download_xml(url, video_id,
'Downloading flash configuration')
file_url = flashconfiguration.find('file').text
file_url = file_url.replace('/playlist.aspx', '/mrssplaylist.aspx')
# Replace some of the parameters in the query to get the best quality
# and http links (no m3u8 manifests)
file_url = re.sub(r'(?<=\?)(.+)$',
lambda m: self._clean_query(m.group()),
file_url)
info = self._download_xml(file_url, video_id,
'Downloading video info')
item = info.find('channel/item')
if '/player/' in url:
configuration = self._download_json(url, video_id)
def _bp(p):
return xpath_with_ns(
p,
{
'media': 'http://search.yahoo.com/mrss/',
'jwplayer': 'http://developer.longtailvideo.com/trac/wiki/FlashFormats',
}
)
formats = []
for content in item.findall(_bp('media:group/media:content')):
attr = content.attrib
f_url = attr['url']
width = int(attr['width'])
bitrate = int(attr['bitrate'])
format_id = '%d-%dk' % (width, bitrate)
formats.append({
'format_id': format_id,
'url': f_url,
'width': width,
'tbr': bitrate,
})
# There are multiple videos in the playlist while only the first one
# matches the video played in browsers
video_info = configuration['playlist'][0]
self._sort_formats(formats)
formats = []
for source in video_info['sources']:
file_url = source['file']
if determine_ext(file_url) == 'm3u8':
formats.extend(self._extract_m3u8_formats(
file_url, video_id, ext='mp4', m3u8_id='hls'))
else:
a_format = {
'url': file_url,
}
if source.get('label') and source['label'][-4:] == ' kbs':
tbr = int_or_none(source['label'][:-4])
a_format.update({
'tbr': tbr,
'format_id': 'http-%d' % tbr,
})
formats.append(a_format)
self._sort_formats(formats)
title = video_info['title']
description = video_info.get('description')
thumbnail = video_info.get('image')
else:
configuration = self._download_xml(url, video_id)
formats = [{
'url': xpath_text(configuration, './file', 'file URL', fatal=True),
}]
thumbnail = xpath_text(configuration, './image', 'thumbnail')
title = 'InternetVideoArchive video %s' % video_id
description = None
return {
'id': video_id,
'title': item.find('title').text,
'title': title,
'formats': formats,
'thumbnail': item.find(_bp('media:thumbnail')).attrib['url'],
'description': item.find('description').text,
'duration': int(attr['duration']),
'thumbnail': thumbnail,
'description': description,
}


@@ -368,7 +368,10 @@ class IqiyiIE(InfoExtractor):
auth_req, video_id,
note='Downloading video authentication JSON',
errnote='Unable to download video authentication JSON')
if auth_result['code'] == 'Q00506': # requires a VIP account
if auth_result['code'] == 'Q00505': # No preview available (不允许试看鉴权失败)
raise ExtractorError('This video requires a VIP account', expected=True)
if auth_result['code'] == 'Q00506': # End of preview time (试看结束鉴权失败)
if do_report_warning:
self.report_warning('Needs a VIP account for full video')
return False

View File

@@ -29,7 +29,7 @@ class IzleseneIE(InfoExtractor):
'ext': 'mp4',
'title': 'Sevinçten Çıldırtan Doğum Günü Hediyesi',
'description': 'md5:253753e2655dde93f59f74b572454f6d',
'thumbnail': 're:^http://.*\.jpg',
'thumbnail': 're:^https?://.*\.jpg',
'uploader_id': 'pelikzzle',
'timestamp': int,
'upload_date': '20140702',
@@ -44,8 +44,7 @@ class IzleseneIE(InfoExtractor):
'id': '17997',
'ext': 'mp4',
'title': 'Tarkan Dortmund 2006 Konseri',
'description': 'Tarkan Dortmund 2006 Konseri',
'thumbnail': 're:^http://.*\.jpg',
'thumbnail': 're:^https://.*\.jpg',
'uploader_id': 'parlayankiz',
'timestamp': int,
'upload_date': '20061112',
@@ -62,7 +61,7 @@ class IzleseneIE(InfoExtractor):
webpage = self._download_webpage(url, video_id)
title = self._og_search_title(webpage)
description = self._og_search_description(webpage)
description = self._og_search_description(webpage, default=None)
thumbnail = self._proto_relative_url(
self._og_search_thumbnail(webpage), scheme='http:')


@@ -1,47 +0,0 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from .youtube import YoutubeIE
class JadoreCettePubIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?jadorecettepub\.com/[0-9]{4}/[0-9]{2}/(?P<id>.*?)\.html'
_TEST = {
'url': 'http://www.jadorecettepub.com/2010/12/star-wars-massacre-par-les-japonais.html',
'md5': '401286a06067c70b44076044b66515de',
'info_dict': {
'id': 'jLMja3tr7a4',
'ext': 'mp4',
'title': 'La pire utilisation de Star Wars',
'description': "Jadorecettepub.com vous a gratifié de plusieurs pubs géniales utilisant Star Wars et Dark Vador plus particulièrement... Mais l'heure est venue de vous proposer une version totalement massacrée, venue du Japon. Quand les Japonais détruisent l'image de Star Wars pour vendre du thon en boite, ça promet...",
},
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
display_id = mobj.group('id')
webpage = self._download_webpage(url, display_id)
title = self._html_search_regex(
r'<span style="font-size: x-large;"><b>(.*?)</b></span>',
webpage, 'title')
description = self._html_search_regex(
r'(?s)<div id="fb-root">(.*?)<script>', webpage, 'description',
fatal=False)
real_url = self._search_regex(
r'\[/postlink\](.*)endofvid', webpage, 'video URL')
video_id = YoutubeIE.extract_id(real_url)
return {
'_type': 'url_transparent',
'url': real_url,
'id': video_id,
'title': title,
'description': description,
}


@@ -4,16 +4,15 @@ from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import int_or_none
from ..utils import (
float_or_none,
int_or_none,
)
class JWPlatformBaseIE(InfoExtractor):
def _parse_jwplayer_data(self, jwplayer_data, video_id, require_title=True):
video_data = jwplayer_data['playlist'][0]
subtitles = {}
for track in video_data['tracks']:
if track['kind'] == 'captions':
subtitles[track['label']] = [{'url': self._proto_relative_url(track['file'])}]
formats = []
for source in video_data['sources']:
@@ -35,12 +34,22 @@ class JWPlatformBaseIE(InfoExtractor):
})
self._sort_formats(formats)
subtitles = {}
tracks = video_data.get('tracks')
if tracks and isinstance(tracks, list):
for track in tracks:
if track.get('file') and track.get('kind') == 'captions':
subtitles.setdefault(track.get('label') or 'en', []).append({
'url': self._proto_relative_url(track['file'])
})
return {
'id': video_id,
'title': video_data['title'] if require_title else video_data.get('title'),
'description': video_data.get('description'),
'thumbnail': self._proto_relative_url(video_data.get('image')),
'timestamp': int_or_none(video_data.get('pubdate')),
'duration': float_or_none(jwplayer_data.get('duration')),
'subtitles': subtitles,
'formats': formats,
}
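
Note on the hunk above: the subtitle loop moved below format extraction and now tolerates a missing `tracks` key, tracks without a `file`, and several tracks that share a label. A standalone sketch of that grouping logic follows, with `_proto_relative_url` left out for brevity. `collect_subtitles` and the sample data are hypothetical.

```python
def collect_subtitles(video_data):
    """Group caption tracks by label, falling back to 'en' when unlabeled."""
    subtitles = {}
    tracks = video_data.get('tracks')
    if tracks and isinstance(tracks, list):
        for track in tracks:
            if track.get('file') and track.get('kind') == 'captions':
                # setdefault keeps every track for a label rather than
                # letting a later one overwrite an earlier one.
                subtitles.setdefault(track.get('label') or 'en', []).append({
                    'url': track['file'],
                })
    return subtitles


sample = {
    'tracks': [
        {'kind': 'captions', 'label': 'English', 'file': 'http://example.com/en.vtt'},
        {'kind': 'captions', 'label': 'English', 'file': 'http://example.com/en.srt'},
        {'kind': 'captions', 'file': 'http://example.com/unlabeled.vtt'},
        {'kind': 'thumbnails', 'file': 'http://example.com/strip.vtt'},
        {'kind': 'captions', 'label': 'Deutsch'},  # no file: skipped
    ],
}
subtitles = collect_subtitles(sample)
```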


@@ -26,10 +26,23 @@ class KuwoBaseIE(InfoExtractor):
def _get_formats(self, song_id, tolerate_ip_deny=False):
formats = []
for file_format in self._FORMATS:
headers = {}
cn_verification_proxy = self._downloader.params.get('cn_verification_proxy')
if cn_verification_proxy:
headers['Ytdl-request-proxy'] = cn_verification_proxy
query = {
'format': file_format['ext'],
'br': file_format.get('br', ''),
'rid': 'MUSIC_%s' % song_id,
'type': 'convert_url',
'response': 'url'
}
song_url = self._download_webpage(
'http://antiserver.kuwo.cn/anti.s?format=%s&br=%s&rid=MUSIC_%s&type=convert_url&response=url' %
(file_format['ext'], file_format.get('br', ''), song_id),
'http://antiserver.kuwo.cn/anti.s',
song_id, note='Download %s url info' % file_format['format'],
query=query, headers=headers,
)
if song_url == 'IPDeny' and not tolerate_ip_deny:
@@ -44,18 +57,13 @@ class KuwoBaseIE(InfoExtractor):
'abr': file_format.get('abr'),
})
# XXX _sort_formats fails if there are not formats, while it's not the
# desired behavior if 'IPDeny' is ignored
# This check can be removed if https://github.com/rg3/youtube-dl/pull/8051 is merged
if not tolerate_ip_deny:
self._sort_formats(formats)
return formats
class KuwoIE(KuwoBaseIE):
IE_NAME = 'kuwo:song'
IE_DESC = '酷我音乐'
_VALID_URL = r'https?://www\.kuwo\.cn/yinyue/(?P<id>\d+?)'
_VALID_URL = r'https?://www\.kuwo\.cn/yinyue/(?P<id>\d+)'
_TESTS = [{
'url': 'http://www.kuwo.cn/yinyue/635632/',
'info_dict': {
@@ -103,6 +111,7 @@ class KuwoIE(KuwoBaseIE):
lrc_content = None
formats = self._get_formats(song_id)
self._sort_formats(formats)
album_id = self._html_search_regex(
r'<p[^>]+class="album"[^<]+<a[^>]+href="http://www\.kuwo\.cn/album/(\d+)/"',
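
Note on the first Kuwo hunk above: the request parameters move out of a `%`-formatted URL string and into a `query` dict passed to `_download_webpage`. The sketch below shows the same idea with the standard library only, assuming Python 3. `build_anti_url` and the sample format entry are hypothetical, and the extractor itself relies on the `query=` argument rather than on this helper.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_anti_url(song_id, file_format):
    """Build the request URL from a dict instead of string interpolation."""
    query = {
        'format': file_format['ext'],
        'br': file_format.get('br', ''),
        'rid': 'MUSIC_%s' % song_id,
        'type': 'convert_url',
        'response': 'url',
    }
    # urlencode escapes each value, which manual '%s' formatting does not.
    return 'http://antiserver.kuwo.cn/anti.s?' + urlencode(query)


url = build_anti_url('635632', {'ext': 'mp3', 'br': '320kmp3'})

# Round-trip the query string to inspect what would be sent.
parsed = parse_qs(urlsplit(url).query, keep_blank_values=True)
```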


@@ -130,6 +130,7 @@ class Laola1TvIE(InfoExtractor):
formats = self._extract_f4m_formats(
'%s?hdnea=%s&hdcore=3.2.0' % (token_attrib['url'], token_auth),
video_id, f4m_id='hds')
self._sort_formats(formats)
categories_str = _v('meta_sports')
categories = categories_str.split(',') if categories_str else []


@@ -37,6 +37,7 @@ class LRTIE(InfoExtractor):
r'file\s*:\s*(["\'])(?P<url>.+?)\1\s*\+\s*location\.hash\.substring\(1\)',
webpage, 'm3u8 url', group='url')
formats = self._extract_m3u8_formats(m3u8_url, video_id, 'mp4')
self._sort_formats(formats)
thumbnail = self._og_search_thumbnail(webpage)
description = self._og_search_description(webpage)


@@ -28,8 +28,8 @@ class LyndaBaseIE(InfoExtractor):
return
login_form = {
'username': username.encode('utf-8'),
'password': password.encode('utf-8'),
'username': username,
'password': password,
'remember': 'false',
'stayPut': 'false'
}
@@ -219,7 +219,7 @@ class LyndaCourseIE(LyndaBaseIE):
'Course %s does not exist' % course_id, expected=True)
unaccessible_videos = 0
videos = []
entries = []
# Might want to extract videos right here from video['Formats'] as it seems 'Formats' is not provided
# by single video API anymore
@@ -229,20 +229,22 @@ class LyndaCourseIE(LyndaBaseIE):
if video.get('HasAccess') is False:
unaccessible_videos += 1
continue
if video.get('ID'):
videos.append(video['ID'])
video_id = video.get('ID')
if video_id:
entries.append({
'_type': 'url_transparent',
'url': 'http://www.lynda.com/%s/%s-4.html' % (course_path, video_id),
'ie_key': LyndaIE.ie_key(),
'chapter': chapter.get('Title'),
'chapter_number': int_or_none(chapter.get('ChapterIndex')),
'chapter_id': compat_str(chapter.get('ID')),
})
if unaccessible_videos > 0:
self._downloader.report_warning(
'%s videos are only available for members (or paid members) and will not be downloaded. '
% unaccessible_videos + self._ACCOUNT_CREDENTIALS_HINT)
entries = [
self.url_result(
'http://www.lynda.com/%s/%s-4.html' % (course_path, video_id),
'Lynda')
for video_id in videos]
course_title = course.get('Title')
return self.playlist_result(entries, course_id, course_title)


@@ -13,7 +13,7 @@ from ..utils import (
class MailRuIE(InfoExtractor):
IE_NAME = 'mailru'
IE_DESC = 'Видео@Mail.Ru'
_VALID_URL = r'https?://(?:www\.)?my\.mail\.ru/(?:video/.*#video=/?(?P<idv1>(?:[^/]+/){3}\d+)|(?:(?P<idv2prefix>(?:[^/]+/){2})video/(?P<idv2suffix>[^/]+/\d+))\.html)'
_VALID_URL = r'https?://(?:(?:www|m)\.)?my\.mail\.ru/(?:video/.*#video=/?(?P<idv1>(?:[^/]+/){3}\d+)|(?:(?P<idv2prefix>(?:[^/]+/){2})video/(?P<idv2suffix>[^/]+/\d+))\.html)'
_TESTS = [
{
@@ -61,6 +61,10 @@ class MailRuIE(InfoExtractor):
'duration': 6001,
},
'skip': 'Not accessible from Travis CI server',
},
{
'url': 'http://m.my.mail.ru/mail/3sktvtr/video/_myvideo/138.html',
'only_matching': True,
}
]


@@ -47,6 +47,7 @@ class MatchTVIE(InfoExtractor):
video_url = self._download_json(request, video_id)['data']['videoUrl']
f4m_url = xpath_text(self._download_xml(video_url, video_id), './to')
formats = self._extract_f4m_formats(f4m_url, video_id)
self._sort_formats(formats)
return {
'id': video_id,
'title': self._live_title('Матч ТВ - Прямой эфир'),


@@ -67,6 +67,7 @@ class MiTeleIE(InfoExtractor):
formats.extend(self._extract_f4m_formats(
file_ + '&hdcore=3.2.0&plugin=aasp-3.2.0.77.18',
display_id, f4m_id=loc))
self._sort_formats(formats)
title = self._search_regex(
r'class="Destacado-text"[^>]*>\s*<strong>([^<]+)</strong>', webpage, 'title')


@@ -7,6 +7,7 @@ from ..compat import compat_urllib_parse_unquote
from ..utils import (
ExtractorError,
HEADRequest,
NO_DEFAULT,
parse_count,
str_to_int,
)
@@ -63,8 +64,17 @@ class MixcloudIE(InfoExtractor):
webpage = self._download_webpage(url, track_id)
message = self._html_search_regex(
r'(?s)<div[^>]+class="global-message cloudcast-disabled-notice-light"[^>]*>(.+?)<(?:a|/div)',
webpage, 'error message', default=None)
preview_url = self._search_regex(
r'\s(?:data-preview-url|m-preview)="([^"]+)"', webpage, 'preview url')
r'\s(?:data-preview-url|m-preview)="([^"]+)"',
webpage, 'preview url', default=None if message else NO_DEFAULT)
if message:
raise ExtractorError('%s said: %s' % (self.IE_NAME, message), expected=True)
song_url = re.sub(r'audiocdn(\d+)', r'stream\1', preview_url)
song_url = song_url.replace('/previews/', '/c/originals/')
if not self._check_url(song_url, track_id, 'mp3'):
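
Note on the hunk above: the preview URL lookup becomes optional only when the page already shows an error message (`default=None if message else NO_DEFAULT`), so the site's own message is reported instead of a generic regex failure. Below is a rough standalone sketch of that sentinel pattern. `search_regex`, `ExtractionError` and the sample pages are stand-ins written for this example and are not the youtube-dl implementations. The two regular expressions are copied from the diff.

```python
import re

NO_DEFAULT = object()  # sentinel meaning "no default supplied, fail loudly"


class ExtractionError(Exception):
    """Hypothetical error type for this sketch."""


def search_regex(pattern, string, name, default=NO_DEFAULT):
    match = re.search(pattern, string)
    if match:
        return match.group(1)
    if default is not NO_DEFAULT:
        return default
    raise ExtractionError('Unable to extract %s' % name)


def extract_preview_url(webpage):
    message = search_regex(
        r'(?s)<div[^>]+class="global-message cloudcast-disabled-notice-light"[^>]*>(.+?)<(?:a|/div)',
        webpage, 'error message', default=None)
    # A missing preview URL is fatal only when there is no message to report.
    preview_url = search_regex(
        r'\s(?:data-preview-url|m-preview)="([^"]+)"',
        webpage, 'preview url', default=None if message else NO_DEFAULT)
    if message:
        raise ExtractionError('site said: %s' % message)
    return preview_url


ok_page = '<span m-preview="http://example.com/previews/a.mp3"></span>'
blocked_page = (
    '<div class="global-message cloudcast-disabled-notice-light">'
    'Not available</div>'
)
preview = extract_preview_url(ok_page)
```

Using a dedicated sentinel object, not `None`, lets callers pass `default=None` as a real value while still distinguishing it from "no default given".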

View File

@@ -2,39 +2,48 @@
from __future__ import unicode_literals
from .common import InfoExtractor
from ..utils import sanitized_Request
from ..utils import (
smuggle_url,
float_or_none,
parse_iso8601,
update_url_query,
)
class MovieClipsIE(InfoExtractor):
_VALID_URL = r'https?://(?:www.)?movieclips\.com/videos/(?P<id>[^/?#]+)'
_VALID_URL = r'https?://(?:www.)?movieclips\.com/videos/.+-(?P<id>\d+)(?:\?|$)'
_TEST = {
'url': 'http://www.movieclips.com/videos/warcraft-trailer-1-561180739597?autoPlay=true&playlistId=5',
'url': 'http://www.movieclips.com/videos/warcraft-trailer-1-561180739597',
'md5': '42b5a0352d4933a7bd54f2104f481244',
'info_dict': {
'id': 'pKIGmG83AqD9',
'display_id': 'warcraft-trailer-1-561180739597',
'ext': 'mp4',
'title': 'Warcraft Trailer 1',
'description': 'Watch Trailer 1 from Warcraft (2016). Legendarys WARCRAFT is a 3D epic adventure of world-colliding conflict based.',
'thumbnail': 're:^https?://.*\.jpg$',
'timestamp': 1446843055,
'upload_date': '20151106',
'uploader': 'Movieclips',
},
'add_ie': ['ThePlatform'],
}
def _real_extract(self, url):
display_id = self._match_id(url)
req = sanitized_Request(url)
# it doesn't work if it thinks the browser it's too old
req.add_header('User-Agent', 'Mozilla/5.0 (X11; Linux x86_64; rv:10.0) Gecko/20150101 Firefox/43.0 (Chrome)')
webpage = self._download_webpage(req, display_id)
theplatform_link = self._html_search_regex(r'src="(http://player.theplatform.com/p/.*?)"', webpage, 'theplatform link')
title = self._html_search_regex(r'<title[^>]*>([^>]+)-\s*\d+\s*|\s*Movieclips.com</title>', webpage, 'title')
description = self._html_search_meta('description', webpage)
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
video = next(v for v in self._parse_json(self._search_regex(
r'var\s+__REACT_ENGINE__\s*=\s*({.+});',
webpage, 'react engine'), video_id)['playlist']['videos'] if v['id'] == video_id)
return {
'_type': 'url_transparent',
'url': theplatform_link,
'title': title,
'display_id': display_id,
'description': description,
'ie_key': 'ThePlatform',
'url': smuggle_url(update_url_query(
video['contentUrl'], {'mbr': 'true'}), {'force_smil_url': True}),
'title': self._og_search_title(webpage),
'description': self._html_search_meta('description', webpage),
'duration': float_or_none(video.get('duration')),
'timestamp': parse_iso8601(video.get('dateCreated')),
'thumbnail': video.get('defaultImage'),
'uploader': video.get('provider'),
}


@@ -2,13 +2,13 @@
from __future__ import unicode_literals
import re
import json
from .common import InfoExtractor
from ..compat import (
compat_str,
from ..utils import (
ExtractorError,
int_or_none,
parse_iso8601,
)
from ..utils import ExtractorError
class MySpaceIE(InfoExtractor):
@@ -24,6 +24,8 @@ class MySpaceIE(InfoExtractor):
'description': 'This country quartet was all smiles while playing a sold out show at the Pacific Amphitheatre in Orange County, California.',
'uploader': 'Five Minutes to the Stage',
'uploader_id': 'fiveminutestothestage',
'timestamp': 1414108751,
'upload_date': '20141023',
},
'params': {
# rtmp download
@@ -64,7 +66,7 @@ class MySpaceIE(InfoExtractor):
'ext': 'mp4',
'title': 'Starset - First Light',
'description': 'md5:2d5db6c9d11d527683bcda818d332414',
'uploader': 'Jacob Soren',
'uploader': 'Yumi K',
'uploader_id': 'SorenPromotions',
'upload_date': '20140725',
}
@@ -78,6 +80,19 @@ class MySpaceIE(InfoExtractor):
player_url = self._search_regex(
r'playerSwf":"([^"?]*)', webpage, 'player URL')
def rtmp_format_from_stream_url(stream_url, width=None, height=None):
rtmp_url, play_path = stream_url.split(';', 1)
return {
'format_id': 'rtmp',
'url': rtmp_url,
'play_path': play_path,
'player_url': player_url,
'protocol': 'rtmp',
'ext': 'flv',
'width': width,
'height': height,
}
if mobj.group('mediatype').startswith('music/song'):
# songs don't store any useful info in the 'context' variable
song_data = self._search_regex(
@@ -93,8 +108,8 @@ class MySpaceIE(InfoExtractor):
return self._search_regex(
r'''data-%s=([\'"])(?P<data>.*?)\1''' % name,
song_data, name, default='', group='data')
streamUrl = search_data('stream-url')
if not streamUrl:
stream_url = search_data('stream-url')
if not stream_url:
vevo_id = search_data('vevo-id')
youtube_id = search_data('youtube-id')
if vevo_id:
@@ -106,36 +121,47 @@ class MySpaceIE(InfoExtractor):
else:
raise ExtractorError(
'Found song but don\'t know how to download it')
info = {
return {
'id': video_id,
'title': self._og_search_title(webpage),
'uploader': search_data('artist-name'),
'uploader_id': search_data('artist-username'),
'thumbnail': self._og_search_thumbnail(webpage),
'duration': int_or_none(search_data('duration')),
'formats': [rtmp_format_from_stream_url(stream_url)]
}
else:
context = json.loads(self._search_regex(
r'context = ({.*?});', webpage, 'context'))
video = context['video']
streamUrl = video['streamUrl']
info = {
'id': compat_str(video['mediaId']),
video = self._parse_json(self._search_regex(
r'context = ({.*?});', webpage, 'context'),
video_id)['video']
formats = []
hls_stream_url = video.get('hlsStreamUrl')
if hls_stream_url:
formats.append({
'format_id': 'hls',
'url': hls_stream_url,
'protocol': 'm3u8_native',
'ext': 'mp4',
})
stream_url = video.get('streamUrl')
if stream_url:
formats.append(rtmp_format_from_stream_url(
stream_url,
int_or_none(video.get('width')),
int_or_none(video.get('height'))))
self._sort_formats(formats)
return {
'id': video_id,
'title': video['title'],
'description': video['description'],
'thumbnail': video['imageUrl'],
'uploader': video['artistName'],
'uploader_id': video['artistUsername'],
'description': video.get('description'),
'thumbnail': video.get('imageUrl'),
'uploader': video.get('artistName'),
'uploader_id': video.get('artistUsername'),
'duration': int_or_none(video.get('duration')),
'timestamp': parse_iso8601(video.get('dateAdded')),
'formats': formats,
}
rtmp_url, play_path = streamUrl.split(';', 1)
info.update({
'url': rtmp_url,
'play_path': play_path,
'player_url': player_url,
'ext': 'flv',
})
return info
class MySpaceAlbumIE(InfoExtractor):
IE_NAME = 'MySpace:album'
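The `rtmp_format_from_stream_url` helper added in the hunk above splits the `streamUrl` value on the first `;` into an RTMP URL and a play path. A minimal standalone sketch of that split follows. It assumes `player_url` is passed in explicitly rather than captured from the enclosing scope, and the stream URL is made up for illustration.

```python
def rtmp_format_from_stream_url(stream_url, player_url=None, width=None, height=None):
    # Everything before the first ';' is the RTMP server URL,
    # everything after it is the play path.
    rtmp_url, play_path = stream_url.split(';', 1)
    return {
        'format_id': 'rtmp',
        'url': rtmp_url,
        'play_path': play_path,
        'player_url': player_url,
        'protocol': 'rtmp',
        'ext': 'flv',
        'width': width,
        'height': height,
    }


# Hypothetical stream URL, not taken from a real page.
fmt = rtmp_format_from_stream_url(
    'rtmp://media.example.invalid/app;mp4:videos/clip', width=640, height=360)
print(fmt['url'], fmt['play_path'])
```

A value without a `;` raises `ValueError` at the unpacking step, which is why the extractor only calls the helper when a stream URL is actually present.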

@@ -4,30 +4,40 @@ from .common import InfoExtractor
from ..utils import (
smuggle_url,
url_basename,
update_url_query,
)
class NationalGeographicIE(InfoExtractor):
IE_NAME = 'natgeo'
_VALID_URL = r'https?://video\.nationalgeographic\.com/.*?'
_TESTS = [
{
'url': 'http://video.nationalgeographic.com/video/news/150210-news-crab-mating-vin?source=featuredvideo',
'md5': '730855d559abbad6b42c2be1fa584917',
'info_dict': {
'id': '4DmDACA6Qtk_',
'ext': 'flv',
'id': '0000014b-70a1-dd8c-af7f-f7b559330001',
'ext': 'mp4',
'title': 'Mating Crabs Busted by Sharks',
'description': 'md5:16f25aeffdeba55aaa8ec37e093ad8b3',
'timestamp': 1423523799,
'upload_date': '20150209',
'uploader': 'NAGS',
},
'add_ie': ['ThePlatform'],
},
{
'url': 'http://video.nationalgeographic.com/wild/when-sharks-attack/the-real-jaws',
'md5': '6a3105eb448c070503b3105fb9b320b5',
'info_dict': {
'id': '_JeBD_D7PlS5',
'ext': 'flv',
'id': 'ngc-I0IauNSWznb_UV008GxSbwY35BZvgi2e',
'ext': 'mp4',
'title': 'The Real Jaws',
'description': 'md5:8d3e09d9d53a85cd397b4b21b2c77be6',
'timestamp': 1433772632,
'upload_date': '20150608',
'uploader': 'NAGS',
},
'add_ie': ['ThePlatform'],
},
@@ -37,18 +47,67 @@ class NationalGeographicIE(InfoExtractor):
name = url_basename(url)
webpage = self._download_webpage(url, name)
feed_url = self._search_regex(
r'data-feed-url="([^"]+)"', webpage, 'feed url')
guid = self._search_regex(
r'id="(?:videoPlayer|player-container)"[^>]+data-guid="([^"]+)"',
webpage, 'guid')
feed = self._download_xml('%s?byGuid=%s' % (feed_url, guid), name)
content = feed.find('.//{http://search.yahoo.com/mrss/}content')
theplatform_id = url_basename(content.attrib.get('url'))
return {
'_type': 'url_transparent',
'ie_key': 'ThePlatform',
'url': smuggle_url(
'http://link.theplatform.com/s/ngs/media/guid/2423130747/%s?mbr=true' % guid,
{'force_smil_url': True}),
'id': guid,
}
return self.url_result(smuggle_url(
'http://link.theplatform.com/s/ngs/%s?formats=MPEG4&manifest=f4m' % theplatform_id,
# For some reason, the normal links don't work and we must force
# the use of f4m
{'force_smil_url': True}))
class NationalGeographicChannelIE(InfoExtractor):
IE_NAME = 'natgeo:channel'
_VALID_URL = r'https?://channel\.nationalgeographic\.com/(?:wild/)?[^/]+/videos/(?P<id>[^/?]+)'
_TESTS = [
{
'url': 'http://channel.nationalgeographic.com/the-story-of-god-with-morgan-freeman/videos/uncovering-a-universal-knowledge/',
'md5': '518c9aa655686cf81493af5cc21e2a04',
'info_dict': {
'id': 'nB5vIAfmyllm',
'ext': 'mp4',
'title': 'Uncovering a Universal Knowledge',
'description': 'md5:1a89148475bf931b3661fcd6ddb2ae3a',
'timestamp': 1458680907,
'upload_date': '20160322',
'uploader': 'NEWA-FNG-NGTV',
},
'add_ie': ['ThePlatform'],
},
{
'url': 'http://channel.nationalgeographic.com/wild/destination-wild/videos/the-stunning-red-bird-of-paradise/',
'md5': 'c4912f656b4cbe58f3e000c489360989',
'info_dict': {
'id': '3TmMv9OvGwIR',
'ext': 'mp4',
'title': 'The Stunning Red Bird of Paradise',
'description': 'md5:7bc8cd1da29686be4d17ad1230f0140c',
'timestamp': 1459362152,
'upload_date': '20160330',
'uploader': 'NEWA-FNG-NGTV',
},
'add_ie': ['ThePlatform'],
},
]
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
release_url = self._search_regex(
r'video_auth_playlist_url\s*=\s*"([^"]+)"',
webpage, 'release url')
return {
'_type': 'url_transparent',
'ie_key': 'ThePlatform',
'url': smuggle_url(
update_url_query(release_url, {'mbr': 'true', 'switch': 'http'}),
{'force_smil_url': True}),
'display_id': display_id,
}
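The new `natgeo:channel` extractor appends `mbr=true` and `switch=http` to the release URL before handing it to ThePlatform. The sketch below shows roughly what such a query merge does, using only the standard library. `add_query_params` is a hypothetical stand-in, not youtube-dl's `update_url_query`, and the URL is invented.

```python
from urllib.parse import parse_qs, urlencode, urlparse, urlunparse


def add_query_params(url, params):
    # Parse the existing query string, merge in the new parameters,
    # and rebuild the URL with the combined query.
    parsed = urlparse(url)
    query = parse_qs(parsed.query)
    query.update(params)
    return urlunparse(parsed._replace(query=urlencode(query, doseq=True)))


# Hypothetical release URL used only for illustration.
release_url = add_query_params(
    'http://link.example.invalid/s/account/media/abc?format=SMIL',
    {'mbr': 'true', 'switch': 'http'})
print(release_url)
```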

@@ -27,6 +27,9 @@ class NBCIE(InfoExtractor):
'ext': 'mp4',
'title': 'Jimmy Fallon Surprises Fans at Ben & Jerry\'s',
'description': 'Jimmy gives out free scoops of his new "Tonight Dough" ice cream flavor by surprising customers at the Ben & Jerry\'s scoop shop.',
'timestamp': 1424246400,
'upload_date': '20150218',
'uploader': 'NBCU-COM',
},
'params': {
# m3u8 download
@@ -50,6 +53,9 @@ class NBCIE(InfoExtractor):
'ext': 'mp4',
'title': 'Star Wars Teaser',
'description': 'md5:0b40f9cbde5b671a7ff62fceccc4f442',
'timestamp': 1417852800,
'upload_date': '20141206',
'uploader': 'NBCU-COM',
},
'params': {
# m3u8 download
@@ -78,6 +84,7 @@ class NBCIE(InfoExtractor):
theplatform_url = 'http:' + theplatform_url
return {
'_type': 'url_transparent',
'ie_key': 'ThePlatform',
'url': smuggle_url(theplatform_url, {'source_url': url}),
'id': video_id,
}
@@ -93,6 +100,9 @@ class NBCSportsVPlayerIE(InfoExtractor):
'ext': 'flv',
'description': 'md5:df390f70a9ba7c95ff1daace988f0d8d',
'title': 'Tyler Kalinoski hits buzzer-beater to lift Davidson',
'timestamp': 1426270238,
'upload_date': '20150313',
'uploader': 'NBCU-SPORTS',
}
}, {
'url': 'http://vplayer.nbcsports.com/p/BxmELC/nbc_embedshare/select/_hqLjQ95yx8Z',
@@ -134,6 +144,33 @@ class NBCSportsIE(InfoExtractor):
NBCSportsVPlayerIE._extract_url(webpage), 'NBCSportsVPlayer')
class CSNNEIE(InfoExtractor):
_VALID_URL = r'https?://www\.csnne\.com/video/(?P<id>[0-9a-z-]+)'
_TEST = {
'url': 'http://www.csnne.com/video/snc-evening-update-wright-named-red-sox-no-5-starter',
'info_dict': {
'id': 'yvBLLUgQ8WU0',
'ext': 'mp4',
'title': 'SNC evening update: Wright named Red Sox\' No. 5 starter.',
'description': 'md5:1753cfee40d9352b19b4c9b3e589b9e3',
'timestamp': 1459369979,
'upload_date': '20160330',
'uploader': 'NBCU-SPORTS',
}
}
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
return {
'_type': 'url_transparent',
'ie_key': 'ThePlatform',
'url': self._html_search_meta('twitter:player:stream', webpage),
'display_id': display_id,
}
class NBCNewsIE(ThePlatformIE):
_VALID_URL = r'''(?x)https?://(?:www\.)?nbcnews\.com/
(?:video/.+?/(?P<id>\d+)|
@@ -307,6 +344,7 @@ class MSNBCIE(InfoExtractor):
'thumbnail': 're:^https?://.*\.jpg$',
'timestamp': 1406937606,
'upload_date': '20140802',
'uploader': 'NBCU-NEWS',
'categories': ['MSNBC/Topics/Franchise/Best of last night', 'MSNBC/Topics/General/Congress'],
},
}

@@ -89,6 +89,7 @@ class NetEaseMusicIE(NetEaseMusicBaseIE):
'timestamp': 1431878400,
'description': 'md5:a10a54589c2860300d02e1de821eb2ef',
},
'skip': 'Blocked outside Mainland China',
}, {
'note': 'No lyrics translation.',
'url': 'http://music.163.com/#/song?id=29822014',
@@ -101,6 +102,7 @@ class NetEaseMusicIE(NetEaseMusicBaseIE):
'timestamp': 1419523200,
'description': 'md5:a4d8d89f44656af206b7b2555c0bce6c',
},
'skip': 'Blocked outside Mainland China',
}, {
'note': 'No lyrics.',
'url': 'http://music.163.com/song?id=17241424',
@@ -112,6 +114,7 @@ class NetEaseMusicIE(NetEaseMusicBaseIE):
'upload_date': '20080211',
'timestamp': 1202745600,
},
'skip': 'Blocked outside Mainland China',
}, {
'note': 'Has translated name.',
'url': 'http://music.163.com/#/song?id=22735043',
@@ -124,7 +127,8 @@ class NetEaseMusicIE(NetEaseMusicBaseIE):
'upload_date': '20100127',
'timestamp': 1264608000,
'alt_title': '说出愿望吧(Genie)',
}
},
'skip': 'Blocked outside Mainland China',
}]
def _process_lyrics(self, lyrics_info):
@@ -192,6 +196,7 @@ class NetEaseMusicAlbumIE(NetEaseMusicBaseIE):
'title': 'B\'day',
},
'playlist_count': 23,
'skip': 'Blocked outside Mainland China',
}
def _real_extract(self, url):
@@ -223,6 +228,7 @@ class NetEaseMusicSingerIE(NetEaseMusicBaseIE):
'title': '张惠妹 - aMEI;阿密特',
},
'playlist_count': 50,
'skip': 'Blocked outside Mainland China',
}, {
'note': 'Singer has translated name.',
'url': 'http://music.163.com/#/artist?id=124098',
@@ -231,6 +237,7 @@ class NetEaseMusicSingerIE(NetEaseMusicBaseIE):
'title': '李昇基 - 이승기',
},
'playlist_count': 50,
'skip': 'Blocked outside Mainland China',
}]
def _real_extract(self, url):
@@ -266,6 +273,7 @@ class NetEaseMusicListIE(NetEaseMusicBaseIE):
'description': 'md5:12fd0819cab2965b9583ace0f8b7b022'
},
'playlist_count': 99,
'skip': 'Blocked outside Mainland China',
}, {
'note': 'Toplist/Charts sample',
'url': 'http://music.163.com/#/discover/toplist?id=3733003',
@@ -275,6 +283,7 @@ class NetEaseMusicListIE(NetEaseMusicBaseIE):
'description': 'md5:73ec782a612711cadc7872d9c1e134fc',
},
'playlist_count': 50,
'skip': 'Blocked outside Mainland China',
}]
def _real_extract(self, url):
@@ -314,6 +323,7 @@ class NetEaseMusicMvIE(NetEaseMusicBaseIE):
'creator': '白雅言',
'upload_date': '20150520',
},
'skip': 'Blocked outside Mainland China',
}
def _real_extract(self, url):
@@ -357,6 +367,7 @@ class NetEaseMusicProgramIE(NetEaseMusicBaseIE):
'upload_date': '20150613',
'duration': 900,
},
'skip': 'Blocked outside Mainland China',
}, {
'note': 'This program has accompanying songs.',
'url': 'http://music.163.com/#/program?id=10141022',
@@ -366,6 +377,7 @@ class NetEaseMusicProgramIE(NetEaseMusicBaseIE):
'description': 'md5:8d594db46cc3e6509107ede70a4aaa3b',
},
'playlist_count': 4,
'skip': 'Blocked outside Mainland China',
}, {
'note': 'This program has accompanying songs.',
'url': 'http://music.163.com/#/program?id=10141022',
@@ -379,7 +391,8 @@ class NetEaseMusicProgramIE(NetEaseMusicBaseIE):
},
'params': {
'noplaylist': True
}
},
'skip': 'Blocked outside Mainland China',
}]
def _real_extract(self, url):
@@ -438,6 +451,7 @@ class NetEaseMusicDjRadioIE(NetEaseMusicBaseIE):
'description': 'md5:766220985cbd16fdd552f64c578a6b15'
},
'playlist_mincount': 40,
'skip': 'Blocked outside Mainland China',
}
_PAGE_SIZE = 1000

@@ -16,7 +16,14 @@ class NovaMovIE(InfoExtractor):
IE_NAME = 'novamov'
IE_DESC = 'NovaMov'
_VALID_URL_TEMPLATE = r'http://(?:(?:www\.)?%(host)s/(?:file|video|mobile/#/videos)/|(?:(?:embed|www)\.)%(host)s/embed\.php\?(?:.*?&)?v=)(?P<id>[a-z\d]{13})'
_VALID_URL_TEMPLATE = r'''(?x)
http://
(?:
(?:www\.)?%(host)s/(?:file|video|mobile/\#/videos)/|
(?:(?:embed|www)\.)%(host)s/embed(?:\.php|/)?\?(?:.*?&)?\bv=
)
(?P<id>[a-z\d]{13})
'''
_VALID_URL = _VALID_URL_TEMPLATE % {'host': 'novamov\.com'}
_HOST = 'www.novamov.com'
@@ -27,17 +34,7 @@ class NovaMovIE(InfoExtractor):
_DESCRIPTION_REGEX = r'(?s)<div class="v_tab blockborder rounded5" id="v_tab1">\s*<h3>[^<]+</h3><p>([^<]+)</p>'
_URL_TEMPLATE = 'http://%s/video/%s'
_TEST = {
'url': 'http://www.novamov.com/video/4rurhn9x446jj',
'md5': '7205f346a52bbeba427603ba10d4b935',
'info_dict': {
'id': '4rurhn9x446jj',
'ext': 'flv',
'title': 'search engine optimization',
'description': 'search engine optimization is used to rank the web page in the google search engine'
},
'skip': '"Invalid token" errors abound (in web interface as well as youtube-dl, there is nothing we can do about it.)'
}
_TEST = None
def _check_existence(self, webpage, video_id):
if re.search(self._FILE_DELETED_REGEX, webpage) is not None:
@@ -81,7 +78,7 @@ class NovaMovIE(InfoExtractor):
filekey = extract_filekey()
title = self._html_search_regex(self._TITLE_REGEX, webpage, 'title', fatal=False)
title = self._html_search_regex(self._TITLE_REGEX, webpage, 'title')
description = self._html_search_regex(self._DESCRIPTION_REGEX, webpage, 'description', default='', fatal=False)
api_response = self._download_webpage(
@@ -187,3 +184,29 @@ class CloudTimeIE(NovaMovIE):
_TITLE_REGEX = r'<div[^>]+class=["\']video_det["\'][^>]*>\s*<strong>([^<]+)</strong>'
_TEST = None
class AuroraVidIE(NovaMovIE):
IE_NAME = 'auroravid'
IE_DESC = 'AuroraVid'
_VALID_URL = NovaMovIE._VALID_URL_TEMPLATE % {'host': 'auroravid\.to'}
_HOST = 'www.auroravid.to'
_FILE_DELETED_REGEX = r'This file no longer exists on our servers!<'
_TESTS = [{
'url': 'http://www.auroravid.to/video/4rurhn9x446jj',
'md5': '7205f346a52bbeba427603ba10d4b935',
'info_dict': {
'id': '4rurhn9x446jj',
'ext': 'flv',
'title': 'search engine optimization',
'description': 'search engine optimization is used to rank the web page in the google search engine'
},
'skip': '"Invalid token" errors abound (in web interface as well as youtube-dl, there is nothing we can do about it.)'
}, {
'url': 'http://www.auroravid.to/embed/?v=4rurhn9x446jj',
'only_matching': True,
}]
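The rewritten `_VALID_URL_TEMPLATE` uses verbose mode (`(?x)`), which is why the literal `#` in `mobile/#/videos` now has to be escaped. The check below copies the pattern from the hunk above and substitutes the AuroraVid host, to show that both the `/video/` and the new `/embed/?v=` forms match. The indentation inside the pattern is added for readability and is ignored in verbose mode.

```python
import re

VALID_URL_TEMPLATE = r'''(?x)
    http://
    (?:
        (?:www\.)?%(host)s/(?:file|video|mobile/\#/videos)/|
        (?:(?:embed|www)\.)%(host)s/embed(?:\.php|/)?\?(?:.*?&)?\bv=
    )
    (?P<id>[a-z\d]{13})
'''

# Substitute the host the same way the subclasses do.
valid_url = VALID_URL_TEMPLATE % {'host': r'auroravid\.to'}

for url in (
    'http://www.auroravid.to/video/4rurhn9x446jj',
    'http://www.auroravid.to/embed/?v=4rurhn9x446jj',
):
    print(re.match(valid_url, url).group('id'))
```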

@@ -63,8 +63,11 @@ class NownessIE(NownessBaseIE):
'title': 'Candor: The Art of Gesticulation',
'description': 'Candor: The Art of Gesticulation',
'thumbnail': 're:^https?://.*\.jpg',
'uploader': 'Nowness',
'timestamp': 1446745676,
'upload_date': '20151105',
'uploader_id': '2385340575001',
},
'add_ie': ['BrightcoveNew'],
}, {
'url': 'https://cn.nowness.com/story/kasper-bjorke-ft-jaakko-eino-kalevi-tnr',
'md5': 'e79cf125e387216f86b2e0a5b5c63aa3',
@@ -74,8 +77,11 @@ class NownessIE(NownessBaseIE):
'title': 'Kasper Bjørke ft. Jaakko Eino Kalevi: TNR',
'description': 'Kasper Bjørke ft. Jaakko Eino Kalevi: TNR',
'thumbnail': 're:^https?://.*\.jpg',
'uploader': 'Nowness',
'timestamp': 1407315371,
'upload_date': '20140806',
'uploader_id': '2385340575001',
},
'add_ie': ['BrightcoveNew'],
}, {
# vimeo
'url': 'https://www.nowness.com/series/nowness-picks/jean-luc-godard-supercut',
@@ -90,6 +96,7 @@ class NownessIE(NownessBaseIE):
'uploader': 'Cinema Sem Lei',
'uploader_id': 'cinemasemlei',
},
'add_ie': ['Vimeo'],
}]
def _real_extract(self, url):

@@ -63,6 +63,7 @@ class NRKIE(InfoExtractor):
if determine_ext(media_url) == 'f4m':
formats = self._extract_f4m_formats(
media_url + '?hdcore=3.5.0&plugin=aasp-3.5.0.151.81', video_id, f4m_id='hds')
self._sort_formats(formats)
else:
formats = [{
'url': media_url,

@@ -64,8 +64,8 @@ class PluralsightIE(PluralsightBaseIE):
login_form = self._hidden_inputs(login_page)
login_form.update({
'Username': username.encode('utf-8'),
'Password': password.encode('utf-8'),
'Username': username,
'Password': password,
})
post_url = self._search_regex(
@@ -279,13 +279,18 @@ class PluralsightCourseIE(PluralsightBaseIE):
course_id, 'Downloading course data JSON')
entries = []
for module in course_data:
for num, module in enumerate(course_data, 1):
for clip in module.get('clips', []):
player_parameters = clip.get('playerParameters')
if not player_parameters:
continue
entries.append(self.url_result(
'%s/training/player?%s' % (self._API_BASE, player_parameters),
'Pluralsight'))
entries.append({
'_type': 'url_transparent',
'url': '%s/training/player?%s' % (self._API_BASE, player_parameters),
'ie_key': PluralsightIE.ie_key(),
'chapter': module.get('title'),
'chapter_number': num,
'chapter_id': module.get('moduleRef'),
})
return self.playlist_result(entries, course_id, title, description)

@@ -1,10 +1,12 @@
from __future__ import unicode_literals
import itertools
import os
import re
from .common import InfoExtractor
from ..compat import (
compat_HTTPError,
compat_urllib_parse_unquote,
compat_urllib_parse_unquote_plus,
compat_urllib_parse_urlparse,
@@ -12,6 +14,7 @@ from ..compat import (
from ..utils import (
ExtractorError,
int_or_none,
orderedSet,
sanitized_Request,
str_to_int,
)
@@ -75,7 +78,7 @@ class PornHubIE(InfoExtractor):
flashvars = self._parse_json(
self._search_regex(
r'var\s+flashv1ars_\d+\s*=\s*({.+?});', webpage, 'flashvars', default='{}'),
r'var\s+flashvars_\d+\s*=\s*({.+?});', webpage, 'flashvars', default='{}'),
video_id)
if flashvars:
video_title = flashvars.get('video_title')
@@ -149,9 +152,12 @@ class PornHubIE(InfoExtractor):
class PornHubPlaylistBaseIE(InfoExtractor):
def _extract_entries(self, webpage):
return [
self.url_result('http://www.pornhub.com/%s' % video_url, PornHubIE.ie_key())
for video_url in set(re.findall(
r'href="/?(view_video\.php\?.*\bviewkey=[\da-z]+[^"]*)"', webpage))
self.url_result(
'http://www.pornhub.com/%s' % video_url,
PornHubIE.ie_key(), video_title=title)
for video_url, title in orderedSet(re.findall(
r'href="/?(view_video\.php\?.*\bviewkey=[\da-z]+[^"]*)"[^>]*\s+title="([^"]+)"',
webpage))
]
def _real_extract(self, url):
@@ -185,16 +191,31 @@ class PornHubPlaylistIE(PornHubPlaylistBaseIE):
class PornHubUserVideosIE(PornHubPlaylistBaseIE):
_VALID_URL = r'https?://(?:www\.)?pornhub\.com/users/(?P<id>[^/]+)/videos'
_TESTS = [{
'url': 'http://www.pornhub.com/users/rushandlia/videos',
'url': 'http://www.pornhub.com/users/zoe_ph/videos/public',
'info_dict': {
'id': 'rushandlia',
'id': 'zoe_ph',
},
'playlist_mincount': 13,
'playlist_mincount': 171,
}, {
'url': 'http://www.pornhub.com/users/rushandlia/videos',
'only_matching': True,
}]
def _real_extract(self, url):
user_id = self._match_id(url)
webpage = self._download_webpage(url, user_id)
entries = []
for page_num in itertools.count(1):
try:
webpage = self._download_webpage(
url, user_id, 'Downloading page %d' % page_num,
query={'page': page_num})
except ExtractorError as e:
if isinstance(e.cause, compat_HTTPError) and e.cause.code == 404:
break
page_entries = self._extract_entries(webpage)
if not page_entries:
break
entries.extend(page_entries)
return self.playlist_result(self._extract_entries(webpage), user_id)
return self.playlist_result(entries, user_id)
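The paging loop added above requests pages 1, 2, 3, ... and stops on a 404 or on the first page that yields no entries. Here is a self-contained sketch of the same control flow. It assumes a hypothetical `fetch_page` callable that returns `None` where the real extractor would see an HTTP 404, and a hypothetical `extract_entries` callable.

```python
import itertools


def collect_entries(fetch_page, extract_entries):
    entries = []
    for page_num in itertools.count(1):
        webpage = fetch_page(page_num)
        if webpage is None:
            # Stand-in for the HTTP 404 case: no such page, so stop.
            break
        page_entries = extract_entries(webpage)
        if not page_entries:
            # An empty page also ends the listing.
            break
        entries.extend(page_entries)
    return entries


# Fake two-page listing used only to exercise the loop.
pages = {1: 'a b c', 2: 'd e'}
result = collect_entries(pages.get, str.split)
print(result)
```

Treating an empty page as a stop condition keeps the loop from running indefinitely on sites that answer out-of-range page numbers with an empty listing instead of a 404.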

@@ -0,0 +1,74 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import remove_start
class PressTVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?presstv\.ir/[^/]+/(?P<y>\d+)/(?P<m>\d+)/(?P<d>\d+)/(?P<id>\d+)/(?P<display_id>[^/]+)?'
_TEST = {
'url': 'http://www.presstv.ir/Detail/2016/04/09/459911/Australian-sewerage-treatment-facility-/',
'md5': '5d7e3195a447cb13e9267e931d8dd5a5',
'info_dict': {
'id': '459911',
'display_id': 'Australian-sewerage-treatment-facility-',
'ext': 'mp4',
'title': 'Organic mattresses used to clean waste water',
'upload_date': '20160409',
'thumbnail': 're:^https?://.*\.jpg',
'description': 'md5:20002e654bbafb6908395a5c0cfcd125'
}
}
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
video_id = mobj.group('id')
display_id = mobj.group('display_id') or video_id
webpage = self._download_webpage(url, display_id)
# extract video URL from webpage
video_url = self._hidden_inputs(webpage)['inpPlayback']
# build list of available formats
# specified in http://www.presstv.ir/Scripts/playback.js
base_url = 'http://192.99.219.222:82/presstv'
_formats = [
(180, '_low200.mp4'),
(360, '_low400.mp4'),
(720, '_low800.mp4'),
(1080, '.mp4')
]
formats = [{
'url': base_url + video_url[:-4] + extension,
'format_id': '%dp' % height,
'height': height,
} for height, extension in _formats]
# extract video metadata
title = remove_start(
self._html_search_meta('title', webpage, fatal=True), 'PressTV-')
thumbnail = self._og_search_thumbnail(webpage)
description = self._og_search_description(webpage)
upload_date = '%04d%02d%02d' % (
int(mobj.group('y')),
int(mobj.group('m')),
int(mobj.group('d')),
)
return {
'id': video_id,
'display_id': display_id,
'title': title,
'formats': formats,
'thumbnail': thumbnail,
'upload_date': upload_date,
'description': description
}
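The PressTV extractor derives four format URLs from a single playback path by swapping the `.mp4` suffix for a quality-specific one. The derivation can be sketched on its own as follows. `build_formats` is an illustrative name, and the base URL and path are placeholders rather than real PressTV values.

```python
def build_formats(base_url, video_path):
    # (height, suffix) pairs as listed in the extractor above.
    variants = [
        (180, '_low200.mp4'),
        (360, '_low400.mp4'),
        (720, '_low800.mp4'),
        (1080, '.mp4'),
    ]
    stem = video_path[:-4]  # drop the trailing '.mp4'
    return [{
        'url': base_url + stem + suffix,
        'format_id': '%dp' % height,
        'height': height,
    } for height, suffix in variants]


# Placeholder values, assumed for illustration only.
formats = build_formats('http://media.example.invalid/presstv', '/2016/04/09/clip.mp4')
for f in formats:
    print(f['format_id'], f['url'])
```

The slice `[:-4]` assumes the playback path always ends in a four-character extension. A path with a different extension length would produce malformed URLs.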

@@ -31,6 +31,7 @@ class RestudyIE(InfoExtractor):
formats = self._extract_smil_formats(
'https://www.restudy.dk/awsmedia/SmilDirectory/video_%s.xml' % video_id,
video_id)
self._sort_formats(formats)
return {
'id': video_id,

@@ -1,11 +1,11 @@
from __future__ import unicode_literals
from .videodetective import VideoDetectiveIE
from .common import InfoExtractor
from ..compat import compat_urlparse
from .internetvideoarchive import InternetVideoArchiveIE
# It just uses the same method as videodetective.com,
# the internetvideoarchive.com is extracted from the og:video property
class RottenTomatoesIE(VideoDetectiveIE):
class RottenTomatoesIE(InfoExtractor):
_VALID_URL = r'https?://www\.rottentomatoes\.com/m/[^/]+/trailers/(?P<id>\d+)'
_TEST = {
@@ -13,7 +13,19 @@ class RottenTomatoesIE(VideoDetectiveIE):
'info_dict': {
'id': '613340',
'ext': 'mp4',
'title': 'TOY STORY 3',
'description': 'From the creators of the beloved TOY STORY films, comes a story that will reunite the gang in a whole new way.',
'title': 'Toy Story 3',
},
}
def _real_extract(self, url):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
og_video = self._og_search_video_url(webpage)
query = compat_urlparse.urlparse(og_video).query
return {
'_type': 'url_transparent',
'url': InternetVideoArchiveIE._build_xml_url(query),
'ie_key': InternetVideoArchiveIE.ie_key(),
'title': self._og_search_title(webpage),
}

@@ -39,9 +39,14 @@ class RteIE(InfoExtractor):
duration = float_or_none(self._html_search_meta(
'duration', webpage, 'duration', fatal=False), 1000)
thumbnail_id = self._search_regex(
r'<meta name="thumbnail" content="uri:irus:(.*?)" />', webpage, 'thumbnail')
thumbnail = 'http://img.rasset.ie/' + thumbnail_id + '.jpg'
thumbnail = None
thumbnail_meta = self._html_search_meta('thumbnail', webpage)
if thumbnail_meta:
thumbnail_id = self._search_regex(
r'uri:irus:(.+)', thumbnail_meta,
'thumbnail id', fatal=False)
if thumbnail_id:
thumbnail = 'http://img.rasset.ie/%s.jpg' % thumbnail_id
feeds_url = self._html_search_meta('feeds-prefix', webpage, 'feeds url') + video_id
json_string = self._download_json(feeds_url, video_id)
@@ -49,6 +54,7 @@ class RteIE(InfoExtractor):
# f4m_url = server + relative_url
f4m_url = json_string['shows'][0]['media:group'][0]['rte:server'] + json_string['shows'][0]['media:group'][0]['url']
f4m_formats = self._extract_f4m_formats(f4m_url, video_id)
self._sort_formats(f4m_formats)
return {
'id': video_id,

@@ -209,6 +209,7 @@ class RTVELiveIE(InfoExtractor):
png = self._download_webpage(png_url, video_id, 'Downloading url information')
m3u8_url = _decrypt_url(png)
formats = self._extract_m3u8_formats(m3u8_url, video_id, ext='mp4')
self._sort_formats(formats)
return {
'id': video_id,

@@ -38,6 +38,7 @@ class RTVNHIE(InfoExtractor):
item['file'], video_id, ext='mp4', entry_protocol='m3u8_native'))
elif item.get('type') == '':
formats.append({'url': item['file']})
self._sort_formats(formats)
return {
'id': video_id,

@@ -24,6 +24,9 @@ class SBSIE(InfoExtractor):
'description': 'md5:f250a9856fca50d22dec0b5b8015f8a5',
'thumbnail': 're:http://.*\.jpg',
'duration': 308,
'timestamp': 1408613220,
'upload_date': '20140821',
'uploader': 'SBSC',
},
}, {
'url': 'http://www.sbs.com.au/ondemand/video/320403011771/Dingo-Conservation-The-Feed',
@@ -57,6 +60,7 @@ class SBSIE(InfoExtractor):
return {
'_type': 'url_transparent',
'ie_key': 'ThePlatform',
'id': video_id,
'url': smuggle_url(theplatform_url, {'force_smil_url': True}),
'url': smuggle_url(self._proto_relative_url(theplatform_url), {'force_smil_url': True}),
}

@@ -12,7 +12,7 @@ from ..utils import (
class ScreencastIE(InfoExtractor):
_VALID_URL = r'https?://www\.screencast\.com/t/(?P<id>[a-zA-Z0-9]+)'
_VALID_URL = r'https?://(?:www\.)?screencast\.com/t/(?P<id>[a-zA-Z0-9]+)'
_TESTS = [{
'url': 'http://www.screencast.com/t/3ZEjQXlT',
'md5': '917df1c13798a3e96211dd1561fded83',
@@ -53,8 +53,10 @@ class ScreencastIE(InfoExtractor):
'description': 'md5:7b9f393bc92af02326a5c5889639eab0',
'thumbnail': 're:^https?://.*\.(?:gif|jpg)$',
}
},
]
}, {
'url': 'http://screencast.com/t/aAB3iowa',
'only_matching': True,
}]
def _real_extract(self, url):
video_id = self._match_id(url)
@@ -94,8 +96,9 @@ class ScreencastIE(InfoExtractor):
title = self._og_search_title(webpage, default=None)
if title is None:
title = self._html_search_regex(
[r'<b>Title:</b> ([^<]*)</div>',
r'class="tabSeperator">></span><span class="tabText">(.*?)<'],
[r'<b>Title:</b> ([^<]+)</div>',
r'class="tabSeperator">></span><span class="tabText">(.+?)<',
r'<title>([^<]+)</title>'],
webpage, 'title')
thumbnail = self._og_search_thumbnail(webpage)
description = self._og_search_description(webpage, default=None)

@@ -1,15 +1,11 @@
# coding: utf-8
from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import compat_urlparse
from ..utils import (
ExtractorError,
js_to_json,
)
from .jwplatform import JWPlatformBaseIE
from ..utils import js_to_json
class ScreencastOMaticIE(InfoExtractor):
class ScreencastOMaticIE(JWPlatformBaseIE):
_VALID_URL = r'https?://screencast-o-matic\.com/watch/(?P<id>[0-9a-zA-Z]+)'
_TEST = {
'url': 'http://screencast-o-matic.com/watch/c2lD3BeOPl',
@@ -20,6 +16,7 @@ class ScreencastOMaticIE(InfoExtractor):
'title': 'Welcome to 3-4 Philosophy @ DECV!',
'thumbnail': 're:^https?://.*\.jpg$',
'description': 'as the title says! also: some general info re 1) VCE philosophy and 2) distance learning.',
'duration': 369.163,
}
}
@@ -27,23 +24,14 @@ class ScreencastOMaticIE(InfoExtractor):
video_id = self._match_id(url)
webpage = self._download_webpage(url, video_id)
setup_js = self._search_regex(
r"(?s)jwplayer\('mp4Player'\).setup\((\{.*?\})\);",
webpage, 'setup code')
data = self._parse_json(setup_js, video_id, transform_source=js_to_json)
try:
video_data = next(
m for m in data['modes'] if m.get('type') == 'html5')
except StopIteration:
raise ExtractorError('Could not find any video entries!')
video_url = compat_urlparse.urljoin(url, video_data['config']['file'])
thumbnail = data.get('image')
jwplayer_data = self._parse_json(
self._search_regex(
r"(?s)jwplayer\('mp4Player'\).setup\((\{.*?\})\);", webpage, 'setup code'),
video_id, transform_source=js_to_json)
return {
'id': video_id,
info_dict = self._parse_jwplayer_data(jwplayer_data, video_id, require_title=False)
info_dict.update({
'title': self._og_search_title(webpage),
'description': self._og_search_description(webpage),
'url': video_url,
'ext': 'mp4',
'thumbnail': thumbnail,
}
})
return info_dict

@@ -77,6 +77,7 @@ class ShahidIE(InfoExtractor):
raise ExtractorError('This video is DRM protected.', expected=True)
formats = self._extract_m3u8_formats(player['url'], video_id, 'mp4')
self._sort_formats(formats)
video = self._download_json(
'%s/%s/%s?%s' % (

View File

@@ -99,6 +99,7 @@ class SportBoxEmbedIE(InfoExtractor):
webpage, 'hls file')
formats = self._extract_m3u8_formats(hls, video_id, 'mp4')
self._sort_formats(formats)
title = self._search_regex(
r'sportboxPlayer\.node_title\s*=\s*"([^"]+)"', webpage, 'title')

@@ -1,11 +1,13 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
class TeleBruxellesIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?telebruxelles\.be/(news|sport|dernier-jt)/?(?P<id>[^/#?]+)'
_VALID_URL = r'https?://(?:www\.)?(?:telebruxelles|bx1)\.be/(news|sport|dernier-jt)/?(?P<id>[^/#?]+)'
_TESTS = [{
'url': 'http://www.telebruxelles.be/news/auditions-devant-parlement-francken-galant-tres-attendus/',
'md5': '59439e568c9ee42fb77588b2096b214f',
@@ -39,18 +41,18 @@ class TeleBruxellesIE(InfoExtractor):
webpage = self._download_webpage(url, display_id)
article_id = self._html_search_regex(
r"<article id=\"post-(\d+)\"", webpage, 'article ID')
r"<article id=\"post-(\d+)\"", webpage, 'article ID', default=None)
title = self._html_search_regex(
r'<h1 class=\"entry-title\">(.*?)</h1>', webpage, 'title')
description = self._og_search_description(webpage)
description = self._og_search_description(webpage, default=None)
rtmp_url = self._html_search_regex(
r"file: \"(rtmp://\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}:\d{1,5}/vod/mp4:\" \+ \"\w+\" \+ \".mp4)\"",
r'file\s*:\s*"(rtmp://[^/]+/vod/mp4:"\s*\+\s*"[^"]+"\s*\+\s*".mp4)"',
webpage, 'RTMP url')
rtmp_url = rtmp_url.replace("\" + \"", "")
rtmp_url = re.sub(r'"\s*\+\s*"', '', rtmp_url)
return {
'id': article_id,
'id': article_id or display_id,
'display_id': display_id,
'title': title,
'description': description,
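The TeleBruxelles fix replaces a literal `str.replace` of `" + "` with `re.sub(r'"\s*\+\s*"', '', ...)`, so the JavaScript string concatenation in the page is joined regardless of spacing around the `+`. A small demonstration on invented input follows (the host and file name are placeholders).

```python
import re


def join_js_concat(fragment):
    # Remove '" + "' joints (with any amount of whitespace around the '+')
    # left over from a JavaScript string concatenation.
    return re.sub(r'"\s*\+\s*"', '', fragment)


# Invented captures: one with single spaces, one with irregular spacing.
spaced = join_js_concat('rtmp://host.invalid/vod/mp4:" + "clip" + ".mp4')
ragged = join_js_concat('rtmp://host.invalid/vod/mp4:"+"clip"  +  ".mp4')
print(spaced)
print(ragged)
```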

@@ -82,6 +82,7 @@ class TelecincoIE(InfoExtractor):
)
formats = self._extract_m3u8_formats(
token_info['tokenizedUrl'], episode, ext='mp4', entry_protocol='m3u8_native')
self._sort_formats(formats)
return {
'id': embed_data['videoId'],

Some files were not shown because too many files have changed in this diff