1
0
mirror of https://github.com/l1ving/youtube-dl synced 2020-11-18 19:53:54 -08:00

Compare commits

..

No commits in common. "master" and "2020.06.16.1" have entirely different histories.

43 changed files with 593 additions and 1157 deletions

View File

@ -18,7 +18,7 @@ title: ''
<!--
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2020.11.10. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2020.06.16.1. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
- Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser.
- Make sure that all URLs and arguments with special characters are properly quoted or escaped as explained in http://yt-dl.org/escape.
- Search the bugtracker for similar issues: http://yt-dl.org/search-issues. DO NOT post duplicates.
@ -26,7 +26,7 @@ Carefully read and work through this check list in order to prevent the most com
-->
- [ ] I'm reporting a broken site support
- [ ] I've verified that I'm running youtube-dl version **2020.11.10**
- [ ] I've verified that I'm running youtube-dl version **2020.06.16.1**
- [ ] I've checked that all provided URLs are alive and playable in a browser
- [ ] I've checked that all URLs and arguments with special characters are properly quoted or escaped
- [ ] I've searched the bugtracker for similar issues including closed ones
@ -41,7 +41,7 @@ Add the `-v` flag to your command line you run youtube-dl with (`youtube-dl -v <
[debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] youtube-dl version 2020.11.10
[debug] youtube-dl version 2020.06.16.1
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {}

View File

@ -19,7 +19,7 @@ labels: 'site-support-request'
<!--
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2020.11.10. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2020.06.16.1. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
- Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser.
- Make sure that site you are requesting is not dedicated to copyright infringement, see https://yt-dl.org/copyright-infringement. youtube-dl does not support such sites. In order for site support request to be accepted all provided example URLs should not violate any copyrights.
- Search the bugtracker for similar site support requests: http://yt-dl.org/search-issues. DO NOT post duplicates.
@ -27,7 +27,7 @@ Carefully read and work through this check list in order to prevent the most com
-->
- [ ] I'm reporting a new site support request
- [ ] I've verified that I'm running youtube-dl version **2020.11.10**
- [ ] I've verified that I'm running youtube-dl version **2020.06.16.1**
- [ ] I've checked that all provided URLs are alive and playable in a browser
- [ ] I've checked that none of provided URLs violate any copyrights
- [ ] I've searched the bugtracker for similar site support requests including closed ones

View File

@ -18,13 +18,13 @@ title: ''
<!--
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2020.11.10. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2020.06.16.1. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
- Search the bugtracker for similar site feature requests: http://yt-dl.org/search-issues. DO NOT post duplicates.
- Finally, put x into all relevant boxes (like this [x])
-->
- [ ] I'm reporting a site feature request
- [ ] I've verified that I'm running youtube-dl version **2020.11.10**
- [ ] I've verified that I'm running youtube-dl version **2020.06.16.1**
- [ ] I've searched the bugtracker for similar site feature requests including closed ones

View File

@ -18,7 +18,7 @@ title: ''
<!--
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2020.11.10. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2020.06.16.1. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
- Make sure that all provided video/audio/playlist URLs (if any) are alive and playable in a browser.
- Make sure that all URLs and arguments with special characters are properly quoted or escaped as explained in http://yt-dl.org/escape.
- Search the bugtracker for similar issues: http://yt-dl.org/search-issues. DO NOT post duplicates.
@ -27,7 +27,7 @@ Carefully read and work through this check list in order to prevent the most com
-->
- [ ] I'm reporting a broken site support issue
- [ ] I've verified that I'm running youtube-dl version **2020.11.10**
- [ ] I've verified that I'm running youtube-dl version **2020.06.16.1**
- [ ] I've checked that all provided URLs are alive and playable in a browser
- [ ] I've checked that all URLs and arguments with special characters are properly quoted or escaped
- [ ] I've searched the bugtracker for similar bug reports including closed ones
@ -43,7 +43,7 @@ Add the `-v` flag to your command line you run youtube-dl with (`youtube-dl -v <
[debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] youtube-dl version 2020.11.10
[debug] youtube-dl version 2020.06.16.1
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {}

View File

@ -19,13 +19,13 @@ labels: 'request'
<!--
Carefully read and work through this check list in order to prevent the most common mistakes and misuse of youtube-dl:
- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2020.11.10. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
- First of, make sure you are using the latest version of youtube-dl. Run `youtube-dl --version` and ensure your version is 2020.06.16.1. If it's not, see https://yt-dl.org/update on how to update. Issues with outdated version will be REJECTED.
- Search the bugtracker for similar feature requests: http://yt-dl.org/search-issues. DO NOT post duplicates.
- Finally, put x into all relevant boxes (like this [x])
-->
- [ ] I'm reporting a feature request
- [ ] I've verified that I'm running youtube-dl version **2020.11.10**
- [ ] I've verified that I'm running youtube-dl version **2020.06.16.1**
- [ ] I've searched the bugtracker for similar feature requests including closed ones

109
ChangeLog
View File

@ -1,112 +1,3 @@
version 2020.11.10
Extractors
* [youtube] Fix YoutubePlaylistsIE (#32)
* [xiami] Raise expressive error thrown by vendor
version 2020.11.01
Core
* [utils] Don't attempt to coerce JS strings to numbers in js_to_json (#26851)
* [downloader/http] Properly handle missing message in SSLError (#26646)
* [downloader/http] Fix access to not yet opened stream in retry
Extractors
* [youtube] Fix JS player URL extraction
* [ytsearch] Fix extraction (#26920)
* [afreecatv] Fix typo (#26970)
* [23video] Relax URL regular expression (#26870)
+ [ustream] Add support for video.ibm.com (#26894)
* [iqiyi] Fix typo (#26884)
+ [expressen] Add support for di.se (#26670)
* [iprima] Improve video id extraction (#26507, #26494)
version 2020.10.31
Core
+ [release] Fix release tarball not being properly generated
* [readme] Fix download link for Unix users
Extractors
* [youtube] Re-add age restriction test (#14)
version 2020.10.30
Extractors
* [youtube] Fix "Unable to extract JS player URL" (#12)
version 2020.09.20
Core
* [extractor/common] Relax interaction count extraction in _json_ld
+ [extractor/common] Extract author as uploader for VideoObject in _json_ld
* [downloader/hls] Fix incorrect end byte in Range HTTP header for
media segments with EXT-X-BYTERANGE (#14748, #24512)
* [extractor/common] Handle ssl.CertificateError in _request_webpage (#26601)
* [downloader/http] Improve timeout detection when reading block of data
(#10935)
* [downloader/http] Retry download when urlopen times out (#10935, #26603)
Extractors
* [redtube] Extend URL regular expression (#26506)
* [twitch] Refactor
* [twitch:stream] Switch to GraphQL and fix reruns (#26535)
+ [telequebec] Add support for brightcove videos (#25833)
* [pornhub] Extract metadata from JSON-LD (#26614)
* [pornhub] Fix view count extraction (#26621, #26614)
version 2020.09.14
Core
+ [postprocessor/embedthumbnail] Add support for non jpg/png thumbnails
(#25687, #25717)
Extractors
* [rtlnl] Extend URL regular expression (#26549, #25821)
* [youtube] Fix empty description extraction (#26575, #26006)
* [srgssr] Extend URL regular expression (#26555, #26556, #26578)
* [googledrive] Use redirect URLs for source format (#18877, #23919, #24689,
#26565)
* [svtplay] Fix id extraction (#26576)
* [redbulltv] Improve support for rebull.com TV localized URLs (#22063)
+ [redbulltv] Add support for new redbull.com TV URLs (#22037, #22063)
* [soundcloud:pagedplaylist] Reduce pagination limit (#26557)
version 2020.09.06
Core
+ [utils] Recognize wav mimetype (#26463)
Extractors
* [nrktv:episode] Improve video id extraction (#25594, #26369, #26409)
* [youtube] Fix age gate content detection (#26100, #26152, #26311, #26384)
* [youtube:user] Extend URL regular expression (#26443)
* [xhamster] Improve initials regular expression (#26526, #26353)
* [svtplay] Fix video id extraction (#26425, #26428, #26438)
* [twitch] Rework extractors (#12297, #20414, #20604, #21811, #21812, #22979,
#24263, #25010, #25553, #25606)
* Switch to GraphQL
+ Add support for collections
+ Add support for clips and collections playlists
* [biqle] Improve video ext extraction
* [xhamster] Fix extraction (#26157, #26254)
* [xhamster] Extend URL regular expression (#25789, #25804, #25927))
version 2020.07.28
Extractors
* [youtube] Fix sigfunc name extraction (#26134, #26135, #26136, #26137)
* [youtube] Improve description extraction (#25937, #25980)
* [wistia] Restrict embed regular expression (#25969)
* [youtube] Prevent excess HTTP 301 (#25786)
+ [youtube:playlists] Extend URL regular expression (#25810)
+ [bellmedia] Add support for cp24.com clip URLs (#25764)
* [brightcove] Improve embed detection (#25674)
version 2020.06.16.1
Extractors

10
LICENSE
View File

@ -1,13 +1,3 @@
The authors of this software recognize that most technologies can be
used for lawful and unlawful purposes. The authors request that this
software not be used to violate any local laws. The authors recognize
that this is public domain code, that they have no real power
to dictate how it is used other than a polite non-binding request,
and therefore that they cannot be held responsible for what anyone
chooses to do with it.
---
This is free and unencumbered software released into the public domain.
Anyone is free to copy, modify, publish, use, compile, sell, or

View File

@ -117,9 +117,8 @@ _EXTRACTOR_FILES = $(shell find youtube_dl/extractor -iname '*.py' -and -not -in
youtube_dl/extractor/lazy_extractors.py: devscripts/make_lazy_extractors.py devscripts/lazy_load_template.py $(_EXTRACTOR_FILES)
$(PYTHON) devscripts/make_lazy_extractors.py $@
VERSION = $(shell grep version youtube_dl/version.py |cut -f2 -d "'")
youtube-dl.tar.gz: README.md README.txt youtube-dl.1 youtube-dl.bash-completion youtube-dl.zsh youtube-dl.fish ChangeLog AUTHORS
@tar -czf youtube-dl-$(VERSION).tar.gz --transform "s|^|youtube-dl-$(VERSION)/|" --owner 0 --group 0 \
youtube-dl.tar.gz: youtube-dl README.md README.txt youtube-dl.1 youtube-dl.bash-completion youtube-dl.zsh youtube-dl.fish ChangeLog AUTHORS
@tar -czf youtube-dl.tar.gz --transform "s|^|youtube-dl/|" --owner 0 --group 0 \
--exclude '*.DS_Store' \
--exclude '*.kate-swp' \
--exclude '*.pyc' \
@ -133,3 +132,4 @@ youtube-dl.tar.gz: README.md README.txt youtube-dl.1 youtube-dl.bash-completion
ChangeLog AUTHORS LICENSE README.md README.txt \
Makefile MANIFEST.in youtube-dl.1 youtube-dl.bash-completion \
youtube-dl.zsh youtube-dl.fish setup.py setup.cfg \
youtube-dl

View File

@ -1,8 +1,7 @@
[![Build Status](https://travis-ci.com/l1ving/youtube-dl.svg?branch=master)](https://travis-ci.com/l1ving/youtube-dl)
[![Build Status](https://travis-ci.org/ytdl-org/youtube-dl.svg?branch=master)](https://travis-ci.org/ytdl-org/youtube-dl)
youtube-dl - download videos from youtube.com or other video platforms
- [CHANGES](#changes)
- [INSTALLATION](#installation)
- [DESCRIPTION](#description)
- [OPTIONS](#options)
@ -16,24 +15,16 @@ youtube-dl - download videos from youtube.com or other video platforms
- [BUGS](#bugs)
- [COPYRIGHT](#copyright)
# CHANGES
You can view the changes made to ytdl-org/youtube-dl [here](https://github.com/l1ving/youtube-dl/compare/416da574ec0df3388f652e44f7fe71b1e3a4701f...master)
You can view the archived tags here: [youtube-dl/releases](https://github.com/l1ving/youtube-dl/releases)
You can view the archived unmerged pull requests here: [youtube-dl/tree/archive/recovered-github-prs](https://github.com/l1ving/youtube-dl/tree/archive/recovered-github-prs)
# INSTALLATION
To install it right away for all UNIX users (Linux, macOS, etc.), type:
sudo curl -L https://github.com/l1ving/youtube-dl/releases/latest/download/youtube-dl -o /usr/local/bin/youtube-dl
sudo curl -L https://yt-dl.org/downloads/latest/youtube-dl -o /usr/local/bin/youtube-dl
sudo chmod a+rx /usr/local/bin/youtube-dl
If you do not have curl, you can alternatively use a recent wget:
sudo wget https://github.com/l1ving/youtube-dl/releases/latest/download/youtube-dl -O /usr/local/bin/youtube-dl
sudo wget https://yt-dl.org/downloads/latest/youtube-dl -O /usr/local/bin/youtube-dl
sudo chmod a+rx /usr/local/bin/youtube-dl
Windows users can [download an .exe file](https://yt-dl.org/latest/youtube-dl.exe) and place it in any location on their [PATH](https://en.wikipedia.org/wiki/PATH_%28variable%29) except for `%SYSTEMROOT%\System32` (e.g. **do not** put in `C:\Windows\System32`).
@ -554,7 +545,7 @@ The basic usage is not to set any template arguments when downloading a single f
- `extractor` (string): Name of the extractor
- `extractor_key` (string): Key name of the extractor
- `epoch` (numeric): Unix epoch when creating the file
- `autonumber` (numeric): Number that will be increased with each download, starting at `--autonumber-start`
- `autonumber` (numeric): Five-digit number that will be increased with each download, starting at zero
- `playlist` (string): Name or id of the playlist that contains the video
- `playlist_index` (numeric): Index of the video in the playlist padded with leading zeros according to the total length of the playlist
- `playlist_id` (string): Playlist identifier
@ -761,7 +752,7 @@ As a last resort, you can also uninstall the version installed by your package m
Afterwards, simply follow [our manual installation instructions](https://ytdl-org.github.io/youtube-dl/download.html):
```
sudo wget https://github.com/l1ving/youtube-dl/releases/latest/download/youtube-dl -O /usr/local/bin/youtube-dl
sudo wget https://yt-dl.org/downloads/latest/youtube-dl -O /usr/local/bin/youtube-dl
sudo chmod a+rx /usr/local/bin/youtube-dl
hash -r
```

View File

@ -717,8 +717,6 @@
- **RayWenderlichCourse**
- **RBMARadio**
- **RDS**: RDS.ca
- **RedBull**
- **RedBullEmbed**
- **RedBullTV**
- **RedBullTVRrnContent**
- **Reddit**
@ -952,13 +950,16 @@
- **TVPlayHome**
- **Tweakers**
- **TwitCasting**
- **twitch:chapter**
- **twitch:clips**
- **twitch:profile**
- **twitch:stream**
- **twitch:video**
- **twitch:videos:all**
- **twitch:videos:highlights**
- **twitch:videos:past-broadcasts**
- **twitch:videos:uploads**
- **twitch:vod**
- **TwitchCollection**
- **TwitchVideos**
- **TwitchVideosClips**
- **TwitchVideosCollections**
- **twitter**
- **twitter:amplify**
- **twitter:broadcast**

View File

@ -33,6 +33,7 @@ class TestAllURLsMatching(unittest.TestCase):
assertPlaylist = lambda url: self.assertMatch(url, ['youtube:playlist'])
assertPlaylist('ECUl4u3cNGP61MdtwGTqZA0MreSaDybji8')
assertPlaylist('UUBABnxM4Ar9ten8Mdjj1j0Q') # 585
assertPlaylist('PL63F0C78739B09958')
assertPlaylist('https://www.youtube.com/playlist?list=UUBABnxM4Ar9ten8Mdjj1j0Q')
assertPlaylist('https://www.youtube.com/course?list=ECUl4u3cNGP61MdtwGTqZA0MreSaDybji8')
assertPlaylist('https://www.youtube.com/playlist?list=PLwP_SiAcdui0KVebT0mU9Apz359a4ubsC')

View File

@ -803,8 +803,6 @@ class TestUtil(unittest.TestCase):
self.assertEqual(mimetype2ext('text/vtt'), 'vtt')
self.assertEqual(mimetype2ext('text/vtt;charset=utf-8'), 'vtt')
self.assertEqual(mimetype2ext('text/html; charset=utf-8'), 'html')
self.assertEqual(mimetype2ext('audio/x-wav'), 'wav')
self.assertEqual(mimetype2ext('audio/x-wav;codec=pcm'), 'wav')
def test_month_by_name(self):
self.assertEqual(month_by_name(None), None)
@ -994,12 +992,6 @@ class TestUtil(unittest.TestCase):
on = js_to_json('{42:4.2e1}')
self.assertEqual(json.loads(on), {'42': 42.0})
on = js_to_json('{ "0x40": "0x40" }')
self.assertEqual(json.loads(on), {'0x40': '0x40'})
on = js_to_json('{ "040": "040" }')
self.assertEqual(json.loads(on), {'040': '040'})
def test_js_to_json_malformed(self):
self.assertEqual(js_to_json('42a1'), '42"a1"')
self.assertEqual(js_to_json('42a-1'), '42"a"-1')

View File

@ -141,7 +141,7 @@ class HlsFD(FragmentFD):
count = 0
headers = info_dict.get('http_headers', {})
if byte_range:
headers['Range'] = 'bytes=%d-%d' % (byte_range['start'], byte_range['end'] - 1)
headers['Range'] = 'bytes=%d-%d' % (byte_range['start'], byte_range['end'])
while count <= fragment_retries:
try:
success, frag_content = self._download_fragment(

View File

@ -106,12 +106,7 @@ class HttpFD(FileDownloader):
set_range(request, range_start, range_end)
# Establish connection
try:
try:
ctx.data = self.ydl.urlopen(request)
except (compat_urllib_error.URLError, ) as err:
if isinstance(err.reason, socket.timeout):
raise RetryDownload(err)
raise err
ctx.data = self.ydl.urlopen(request)
# When trying to resume, Content-Range HTTP header of response has to be checked
# to match the value of requested Range HTTP header. This is due to a webservers
# that don't support resuming and serve a whole file with no Content-Range
@ -223,10 +218,9 @@ class HttpFD(FileDownloader):
def retry(e):
to_stdout = ctx.tmpfilename == '-'
if ctx.stream is not None:
if not to_stdout:
ctx.stream.close()
ctx.stream = None
if not to_stdout:
ctx.stream.close()
ctx.stream = None
ctx.resume_len = byte_counter if to_stdout else os.path.getsize(encodeFilename(ctx.tmpfilename))
raise RetryDownload(e)
@ -239,11 +233,9 @@ class HttpFD(FileDownloader):
except socket.timeout as e:
retry(e)
except socket.error as e:
# SSLError on python 2 (inherits socket.error) may have
# no errno set but this error message
if e.errno in (errno.ECONNRESET, errno.ETIMEDOUT) or getattr(e, 'message', None) == 'The read operation timed out':
retry(e)
raise
if e.errno not in (errno.ECONNRESET, errno.ETIMEDOUT):
raise
retry(e)
byte_counter += len(data_block)

View File

@ -275,7 +275,7 @@ class AfreecaTVIE(InfoExtractor):
video_element = video_xml.findall(compat_xpath('./track/video'))[-1]
if video_element is None or video_element.text is None:
raise ExtractorError(
'Video %s does not exist' % video_id, expected=True)
'Video %s video does not exist' % video_id, expected=True)
video_url = video_element.text.strip()

View File

@ -25,8 +25,8 @@ class BellMediaIE(InfoExtractor):
etalk|
marilyn
)\.ca|
(?:much|cp24)\.com
)/.*?(?:\b(?:vid(?:eoid)?|clipId)=|-vid|~|%7E|/(?:episode)?)(?P<id>[0-9]{6,})'''
much\.com
)/.*?(?:\bvid(?:eoid)?=|-vid|~|%7E|/(?:episode)?)(?P<id>[0-9]{6,})'''
_TESTS = [{
'url': 'https://www.bnnbloomberg.ca/video/david-cockfield-s-top-picks~1403070',
'md5': '36d3ef559cfe8af8efe15922cd3ce950',
@ -62,9 +62,6 @@ class BellMediaIE(InfoExtractor):
}, {
'url': 'http://www.etalk.ca/video?videoid=663455',
'only_matching': True,
}, {
'url': 'https://www.cp24.com/video?clipId=1982548',
'only_matching': True,
}]
_DOMAINS = {
'thecomedynetwork': 'comedy',

View File

@ -3,11 +3,10 @@ from __future__ import unicode_literals
from .common import InfoExtractor
from .vk import VKIE
from ..compat import (
compat_b64decode,
compat_urllib_parse_unquote,
from ..utils import (
HEADRequest,
int_or_none,
)
from ..utils import int_or_none
class BIQLEIE(InfoExtractor):
@ -48,16 +47,9 @@ class BIQLEIE(InfoExtractor):
if VKIE.suitable(embed_url):
return self.url_result(embed_url, VKIE.ie_key(), video_id)
embed_page = self._download_webpage(
embed_url, video_id, headers={'Referer': url})
video_ext = self._get_cookies(embed_url).get('video_ext')
if video_ext:
video_ext = compat_urllib_parse_unquote(video_ext.value)
if not video_ext:
video_ext = compat_b64decode(self._search_regex(
r'video_ext\s*:\s*[\'"]([A-Za-z0-9+/=]+)',
embed_page, 'video_ext')).decode()
video_id, sig, _, access_token = video_ext.split(':')
self._request_webpage(
HEADRequest(embed_url), video_id, headers={'Referer': url})
video_id, sig, _, access_token = self._get_cookies(embed_url)['video_ext'].value.split('%3A')
item = self._download_json(
'https://api.vk.com/method/video.get', video_id,
headers={'User-Agent': 'okhttp/3.4.1'}, query={

View File

@ -426,7 +426,7 @@ class BrightcoveNewIE(AdobePassIE):
# [2] looks like:
for video, script_tag, account_id, player_id, embed in re.findall(
r'''(?isx)
(<video(?:-js)?\s+[^>]*\bdata-video-id\s*=\s*['"]?[^>]+>)
(<video\s+[^>]*\bdata-video-id\s*=\s*['"]?[^>]+>)
(?:.*?
(<script[^>]+
src=["\'](?:https?:)?//players\.brightcove\.net/

View File

@ -10,7 +10,6 @@ import os
import random
import re
import socket
import ssl
import sys
import time
import math
@ -68,7 +67,6 @@ from ..utils import (
sanitized_Request,
sanitize_filename,
str_or_none,
str_to_int,
strip_or_none,
unescapeHTML,
unified_strdate,
@ -625,12 +623,9 @@ class InfoExtractor(object):
url_or_request = update_url_query(url_or_request, query)
if data is not None or headers:
url_or_request = sanitized_Request(url_or_request, data, headers)
exceptions = [compat_urllib_error.URLError, compat_http_client.HTTPException, socket.error]
if hasattr(ssl, 'CertificateError'):
exceptions.append(ssl.CertificateError)
try:
return self._downloader.urlopen(url_or_request)
except tuple(exceptions) as err:
except (compat_urllib_error.URLError, compat_http_client.HTTPException, socket.error) as err:
if isinstance(err, compat_urllib_error.HTTPError):
if self.__can_accept_status_code(err, expected_status):
# Retain reference to error to prevent file object from
@ -1249,10 +1244,7 @@ class InfoExtractor(object):
interaction_type = is_e.get('interactionType')
if not isinstance(interaction_type, compat_str):
continue
# For interaction count some sites provide string instead of
# an integer (as per spec) with non digit characters (e.g. ",")
# so extracting count with more relaxed str_to_int
interaction_count = str_to_int(is_e.get('userInteractionCount'))
interaction_count = int_or_none(is_e.get('userInteractionCount'))
if interaction_count is None:
continue
count_kind = INTERACTION_TYPE_MAP.get(interaction_type.split('/')[-1])
@ -1272,7 +1264,6 @@ class InfoExtractor(object):
'thumbnail': url_or_none(e.get('thumbnailUrl') or e.get('thumbnailURL')),
'duration': parse_duration(e.get('duration')),
'timestamp': unified_timestamp(e.get('uploadDate')),
'uploader': str_or_none(e.get('author')),
'filesize': float_or_none(e.get('contentSize')),
'tbr': int_or_none(e.get('bitrate')),
'width': int_or_none(e.get('width')),

View File

@ -15,7 +15,7 @@ from ..utils import (
class ExpressenIE(InfoExtractor):
_VALID_URL = r'''(?x)
https?://
(?:www\.)?(?:expressen|di)\.se/
(?:www\.)?expressen\.se/
(?:(?:tvspelare/video|videoplayer/embed)/)?
tv/(?:[^/]+/)*
(?P<id>[^/?#&]+)
@ -42,16 +42,13 @@ class ExpressenIE(InfoExtractor):
}, {
'url': 'https://www.expressen.se/videoplayer/embed/tv/ditv/ekonomistudion/experterna-har-ar-fragorna-som-avgor-valet/?embed=true&external=true&autoplay=true&startVolume=0&partnerId=di',
'only_matching': True,
}, {
'url': 'https://www.di.se/videoplayer/embed/tv/ditv/borsmorgon/implantica-rusar-70--under-borspremiaren-hor-styrelsemedlemmen/?embed=true&external=true&autoplay=true&startVolume=0&partnerId=di',
'only_matching': True,
}]
@staticmethod
def _extract_urls(webpage):
return [
mobj.group('url') for mobj in re.finditer(
r'<iframe[^>]+\bsrc=(["\'])(?P<url>(?:https?:)?//(?:www\.)?(?:expressen|di)\.se/(?:tvspelare/video|videoplayer/embed)/tv/.+?)\1',
r'<iframe[^>]+\bsrc=(["\'])(?P<url>(?:https?:)?//(?:www\.)?expressen\.se/(?:tvspelare/video|videoplayer/embed)/tv/.+?)\1',
webpage)]
def _real_extract(self, url):

View File

@ -918,9 +918,7 @@ from .rbmaradio import RBMARadioIE
from .rds import RDSIE
from .redbulltv import (
RedBullTVIE,
RedBullEmbedIE,
RedBullTVRrnContentIE,
RedBullIE,
)
from .reddit import (
RedditIE,
@ -1231,11 +1229,14 @@ from .twentymin import TwentyMinutenIE
from .twentythreevideo import TwentyThreeVideoIE
from .twitcasting import TwitCastingIE
from .twitch import (
TwitchVideoIE,
TwitchChapterIE,
TwitchVodIE,
TwitchCollectionIE,
TwitchVideosIE,
TwitchVideosClipsIE,
TwitchVideosCollectionsIE,
TwitchProfileIE,
TwitchAllVideosIE,
TwitchUploadsIE,
TwitchPastBroadcastsIE,
TwitchHighlightsIE,
TwitchStreamIE,
TwitchClipsIE,
)

View File

@ -220,27 +220,19 @@ class GoogleDriveIE(InfoExtractor):
'id': video_id,
'export': 'download',
})
def request_source_file(source_url, kind):
return self._request_webpage(
source_url, video_id, note='Requesting %s file' % kind,
errnote='Unable to request %s file' % kind, fatal=False)
urlh = request_source_file(source_url, 'source')
urlh = self._request_webpage(
source_url, video_id, note='Requesting source file',
errnote='Unable to request source file', fatal=False)
if urlh:
def add_source_format(urlh):
def add_source_format(src_url):
formats.append({
# Use redirect URLs as download URLs in order to calculate
# correct cookies in _calc_cookies.
# Using original URLs may result in redirect loop due to
# google.com's cookies mistakenly used for googleusercontent.com
# redirect URLs (see #23919).
'url': urlh.geturl(),
'url': src_url,
'ext': determine_ext(title, 'mp4').lower(),
'format_id': 'source',
'quality': 1,
})
if urlh.headers.get('Content-Disposition'):
add_source_format(urlh)
add_source_format(source_url)
else:
confirmation_webpage = self._webpage_read_content(
urlh, url, video_id, note='Downloading confirmation page',
@ -250,12 +242,9 @@ class GoogleDriveIE(InfoExtractor):
r'confirm=([^&"\']+)', confirmation_webpage,
'confirmation code', fatal=False)
if confirm:
confirmed_source_url = update_url_query(source_url, {
add_source_format(update_url_query(source_url, {
'confirm': confirm,
})
urlh = request_source_file(confirmed_source_url, 'confirmed source')
if urlh and urlh.headers.get('Content-Disposition'):
add_source_format(urlh)
}))
if not formats:
reason = self._search_regex(

View File

@ -86,8 +86,7 @@ class IPrimaIE(InfoExtractor):
(r'<iframe[^>]+\bsrc=["\'](?:https?:)?//(?:api\.play-backend\.iprima\.cz/prehravac/embedded|prima\.iprima\.cz/[^/]+/[^/]+)\?.*?\bid=(p\d+)',
r'data-product="([^"]+)">',
r'id=["\']player-(p\d+)"',
r'playerId\s*:\s*["\']player-(p\d+)',
r'\bvideos\s*=\s*["\'](p\d+)'),
r'playerId\s*:\s*["\']player-(p\d+)'),
webpage, 'real id')
playerpage = self._download_webpage(

View File

@ -150,7 +150,7 @@ class IqiyiSDKInterpreter(object):
elif function in other_functions:
other_functions[function]()
else:
raise ExtractorError('Unknown function %s' % function)
raise ExtractorError('Unknown funcion %s' % function)
return sdk.target

View File

@ -11,6 +11,7 @@ from ..compat import (
from ..utils import (
ExtractorError,
int_or_none,
JSON_LD_RE,
js_to_json,
NO_DEFAULT,
parse_age_limit,
@ -424,20 +425,13 @@ class NRKTVEpisodeIE(InfoExtractor):
webpage = self._download_webpage(url, display_id)
info = self._search_json_ld(webpage, display_id, default={})
nrk_id = info.get('@id') or self._html_search_meta(
'nrk:program-id', webpage, default=None) or self._search_regex(
r'data-program-id=["\'](%s)' % NRKTVIE._EPISODE_RE, webpage,
'nrk id')
assert re.match(NRKTVIE._EPISODE_RE, nrk_id)
nrk_id = self._parse_json(
self._search_regex(JSON_LD_RE, webpage, 'JSON-LD', group='json_ld'),
display_id)['@id']
info.update({
'_type': 'url_transparent',
'id': nrk_id,
'url': 'nrk:%s' % nrk_id,
'ie_key': NRKIE.ie_key(),
})
return info
assert re.match(NRKTVIE._EPISODE_RE, nrk_id)
return self.url_result(
'nrk:%s' % nrk_id, ie=NRKIE.ie_key(), video_id=nrk_id)
class NRKTVSerieBaseIE(InfoExtractor):

View File

@ -17,7 +17,6 @@ from ..utils import (
determine_ext,
ExtractorError,
int_or_none,
merge_dicts,
NO_DEFAULT,
orderedSet,
remove_quotes,
@ -60,14 +59,13 @@ class PornHubIE(PornHubBaseIE):
'''
_TESTS = [{
'url': 'http://www.pornhub.com/view_video.php?viewkey=648719015',
'md5': 'a6391306d050e4547f62b3f485dd9ba9',
'md5': '1e19b41231a02eba417839222ac9d58e',
'info_dict': {
'id': '648719015',
'ext': 'mp4',
'title': 'Seductive Indian beauty strips down and fingers her pink pussy',
'uploader': 'Babes',
'upload_date': '20130628',
'timestamp': 1372447216,
'duration': 361,
'view_count': int,
'like_count': int,
@ -84,8 +82,8 @@ class PornHubIE(PornHubBaseIE):
'id': '1331683002',
'ext': 'mp4',
'title': '重庆婷婷女王足交',
'uploader': 'Unknown',
'upload_date': '20150213',
'timestamp': 1423804862,
'duration': 1753,
'view_count': int,
'like_count': int,
@ -123,7 +121,6 @@ class PornHubIE(PornHubBaseIE):
'params': {
'skip_download': True,
},
'skip': 'This video has been disabled',
}, {
'url': 'http://www.pornhub.com/view_video.php?viewkey=ph557bbb6676d2d',
'only_matching': True,
@ -341,10 +338,10 @@ class PornHubIE(PornHubBaseIE):
video_uploader = self._html_search_regex(
r'(?s)From:&nbsp;.+?<(?:a\b[^>]+\bhref=["\']/(?:(?:user|channel)s|model|pornstar)/|span\b[^>]+\bclass=["\']username)[^>]+>(.+?)<',
webpage, 'uploader', default=None)
webpage, 'uploader', fatal=False)
view_count = self._extract_count(
r'<span class="count">([\d,\.]+)</span> [Vv]iews', webpage, 'view')
r'<span class="count">([\d,\.]+)</span> views', webpage, 'view')
like_count = self._extract_count(
r'<span class="votesUp">([\d,\.]+)</span>', webpage, 'like')
dislike_count = self._extract_count(
@ -359,11 +356,7 @@ class PornHubIE(PornHubBaseIE):
if div:
return re.findall(r'<a[^>]+\bhref=[^>]+>([^<]+)', div)
info = self._search_json_ld(webpage, video_id, default={})
# description provided in JSON-LD is irrelevant
info['description'] = None
return merge_dicts({
return {
'id': video_id,
'uploader': video_uploader,
'upload_date': upload_date,
@ -379,7 +372,7 @@ class PornHubIE(PornHubBaseIE):
'tags': extract_list('tags'),
'categories': extract_list('categories'),
'subtitles': subtitles,
}, info)
}
class PornHubPlaylistBaseIE(PornHubBaseIE):

View File

@ -1,8 +1,6 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..compat import compat_HTTPError
from ..utils import (
@ -12,7 +10,7 @@ from ..utils import (
class RedBullTVIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?redbull(?:\.tv|\.com(?:/[^/]+)?(?:/tv)?)(?:/events/[^/]+)?/(?:videos?|live|(?:film|episode)s)/(?P<id>AP-\w+)'
_VALID_URL = r'https?://(?:www\.)?redbull(?:\.tv|\.com(?:/[^/]+)?(?:/tv)?)(?:/events/[^/]+)?/(?:videos?|live)/(?P<id>AP-\w+)'
_TESTS = [{
# film
'url': 'https://www.redbull.tv/video/AP-1Q6XCDTAN1W11',
@ -31,8 +29,8 @@ class RedBullTVIE(InfoExtractor):
'id': 'AP-1PMHKJFCW1W11',
'ext': 'mp4',
'title': 'Grime - Hashtags S2E4',
'description': 'md5:5546aa612958c08a98faaad4abce484d',
'duration': 904,
'description': 'md5:b5f522b89b72e1e23216e5018810bb25',
'duration': 904.6,
},
'params': {
'skip_download': True,
@ -46,15 +44,11 @@ class RedBullTVIE(InfoExtractor):
}, {
'url': 'https://www.redbull.com/us-en/events/AP-1XV2K61Q51W11/live/AP-1XUJ86FDH1W11',
'only_matching': True,
}, {
'url': 'https://www.redbull.com/int-en/films/AP-1ZSMAW8FH2111',
'only_matching': True,
}, {
'url': 'https://www.redbull.com/int-en/episodes/AP-1TQWK7XE11W11',
'only_matching': True,
}]
def extract_info(self, video_id):
def _real_extract(self, url):
video_id = self._match_id(url)
session = self._download_json(
'https://api.redbull.tv/v3/session', video_id,
note='Downloading access token', query={
@ -111,119 +105,24 @@ class RedBullTVIE(InfoExtractor):
'subtitles': subtitles,
}
def _real_extract(self, url):
video_id = self._match_id(url)
return self.extract_info(video_id)
class RedBullEmbedIE(RedBullTVIE):
_VALID_URL = r'https?://(?:www\.)?redbull\.com/embed/(?P<id>rrn:content:[^:]+:[\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12}:[a-z]{2}-[A-Z]{2,3})'
_TESTS = [{
# HLS manifest accessible only using assetId
'url': 'https://www.redbull.com/embed/rrn:content:episode-videos:f3021f4f-3ed4-51ac-915a-11987126e405:en-INT',
'only_matching': True,
}]
_VIDEO_ESSENSE_TMPL = '''... on %s {
videoEssence {
attributes
}
}'''
def _real_extract(self, url):
rrn_id = self._match_id(url)
asset_id = self._download_json(
'https://edge-graphql.crepo-production.redbullaws.com/v1/graphql',
rrn_id, headers={'API-KEY': 'e90a1ff11335423998b100c929ecc866'},
query={
'query': '''{
resource(id: "%s", enforceGeoBlocking: false) {
%s
%s
}
}''' % (rrn_id, self._VIDEO_ESSENSE_TMPL % 'LiveVideo', self._VIDEO_ESSENSE_TMPL % 'VideoResource'),
})['data']['resource']['videoEssence']['attributes']['assetId']
return self.extract_info(asset_id)
class RedBullTVRrnContentIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?redbull\.com/(?P<region>[a-z]{2,3})-(?P<lang>[a-z]{2})/tv/(?:video|live|film)/(?P<id>rrn:content:[^:]+:[\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12})'
_VALID_URL = r'https?://(?:www\.)?redbull(?:\.tv|\.com(?:/[^/]+)?(?:/tv)?)/(?:video|live)/rrn:content:[^:]+:(?P<id>[\da-f]{8}-[\da-f]{4}-[\da-f]{4}-[\da-f]{4}-[\da-f]{12})'
_TESTS = [{
'url': 'https://www.redbull.com/int-en/tv/video/rrn:content:live-videos:e3e6feb4-e95f-50b7-962a-c70f8fd13c73/mens-dh-finals-fort-william',
'only_matching': True,
}, {
'url': 'https://www.redbull.com/int-en/tv/video/rrn:content:videos:a36a0f36-ff1b-5db8-a69d-ee11a14bf48b/tn-ts-style?playlist=rrn:content:event-profiles:83f05926-5de8-5389-b5e4-9bb312d715e8:extras',
'only_matching': True,
}, {
'url': 'https://www.redbull.com/int-en/tv/film/rrn:content:films:d1f4d00e-4c04-5d19-b510-a805ffa2ab83/follow-me',
'only_matching': True,
}]
def _real_extract(self, url):
region, lang, rrn_id = re.search(self._VALID_URL, url).groups()
rrn_id += ':%s-%s' % (lang, region.upper())
return self.url_result(
'https://www.redbull.com/embed/' + rrn_id,
RedBullEmbedIE.ie_key(), rrn_id)
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
class RedBullIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?redbull\.com/(?P<region>[a-z]{2,3})-(?P<lang>[a-z]{2})/(?P<type>(?:episode|film|(?:(?:recap|trailer)-)?video)s|live)/(?!AP-|rrn:content:)(?P<id>[^/?#&]+)'
_TESTS = [{
'url': 'https://www.redbull.com/int-en/episodes/grime-hashtags-s02-e04',
'md5': 'db8271a7200d40053a1809ed0dd574ff',
'info_dict': {
'id': 'AA-1MT8DQWA91W14',
'ext': 'mp4',
'title': 'Grime - Hashtags S2E4',
'description': 'md5:5546aa612958c08a98faaad4abce484d',
},
}, {
'url': 'https://www.redbull.com/int-en/films/kilimanjaro-mountain-of-greatness',
'only_matching': True,
}, {
'url': 'https://www.redbull.com/int-en/recap-videos/uci-mountain-bike-world-cup-2017-mens-xco-finals-from-vallnord',
'only_matching': True,
}, {
'url': 'https://www.redbull.com/int-en/trailer-videos/kings-of-content',
'only_matching': True,
}, {
'url': 'https://www.redbull.com/int-en/videos/tnts-style-red-bull-dance-your-style-s1-e12',
'only_matching': True,
}, {
'url': 'https://www.redbull.com/int-en/live/mens-dh-finals-fort-william',
'only_matching': True,
}, {
# only available on the int-en website so a fallback is need for the API
# https://www.redbull.com/v3/api/graphql/v1/v3/query/en-GB>en-INT?filter[uriSlug]=fia-wrc-saturday-recap-estonia&rb3Schema=v1:hero
'url': 'https://www.redbull.com/gb-en/live/fia-wrc-saturday-recap-estonia',
'only_matching': True,
}]
_INT_FALLBACK_LIST = ['de', 'en', 'es', 'fr']
_LAT_FALLBACK_MAP = ['ar', 'bo', 'car', 'cl', 'co', 'mx', 'pe']
def _real_extract(self, url):
region, lang, filter_type, display_id = re.search(self._VALID_URL, url).groups()
if filter_type == 'episodes':
filter_type = 'episode-videos'
elif filter_type == 'live':
filter_type = 'live-videos'
regions = [region.upper()]
if region != 'int':
if region in self._LAT_FALLBACK_MAP:
regions.append('LAT')
if lang in self._INT_FALLBACK_LIST:
regions.append('INT')
locale = '>'.join(['%s-%s' % (lang, reg) for reg in regions])
rrn_id = self._download_json(
'https://www.redbull.com/v3/api/graphql/v1/v3/query/' + locale,
display_id, query={
'filter[type]': filter_type,
'filter[uriSlug]': display_id,
'rb3Schema': 'v1:hero',
})['data']['id']
video_url = self._og_search_url(webpage)
return self.url_result(
'https://www.redbull.com/embed/' + rrn_id,
RedBullEmbedIE.ie_key(), rrn_id)
video_url, ie=RedBullTVIE.ie_key(),
video_id=RedBullTVIE._match_id(video_url))

View File

@ -15,7 +15,7 @@ from ..utils import (
class RedTubeIE(InfoExtractor):
_VALID_URL = r'https?://(?:(?:\w+\.)?redtube\.com/|embed\.redtube\.com/\?.*?\bid=)(?P<id>[0-9]+)'
_VALID_URL = r'https?://(?:(?:www\.)?redtube\.com/|embed\.redtube\.com/\?.*?\bid=)(?P<id>[0-9]+)'
_TESTS = [{
'url': 'http://www.redtube.com/66418',
'md5': 'fc08071233725f26b8f014dba9590005',
@ -31,9 +31,6 @@ class RedTubeIE(InfoExtractor):
}, {
'url': 'http://embed.redtube.com/?bgcolor=000000&id=1443286',
'only_matching': True,
}, {
'url': 'http://it.redtube.com/66418',
'only_matching': True,
}]
@staticmethod

View File

@ -14,27 +14,12 @@ class RtlNlIE(InfoExtractor):
_VALID_URL = r'''(?x)
https?://(?:(?:www|static)\.)?
(?:
rtlxl\.nl/(?:[^\#]*\#!|programma)/[^/]+/|
rtl\.nl/(?:(?:system/videoplayer/(?:[^/]+/)+(?:video_)?embed\.html|embed)\b.+?\buuid=|video/)|
embed\.rtl\.nl/\#uuid=
rtlxl\.nl/[^\#]*\#!/[^/]+/|
rtl\.nl/(?:(?:system/videoplayer/(?:[^/]+/)+(?:video_)?embed\.html|embed)\b.+?\buuid=|video/)
)
(?P<id>[0-9a-f-]+)'''
_TESTS = [{
# new URL schema
'url': 'https://www.rtlxl.nl/programma/rtl-nieuws/0bd1384d-d970-3086-98bb-5c104e10c26f',
'md5': '490428f1187b60d714f34e1f2e3af0b6',
'info_dict': {
'id': '0bd1384d-d970-3086-98bb-5c104e10c26f',
'ext': 'mp4',
'title': 'RTL Nieuws',
'description': 'md5:d41d8cd98f00b204e9800998ecf8427e',
'timestamp': 1593293400,
'upload_date': '20200627',
'duration': 661.08,
},
}, {
# old URL schema
'url': 'http://www.rtlxl.nl/#!/rtl-nieuws-132237/82b1aad1-4a14-3d7b-b554-b0aed1b2c416',
'md5': '473d1946c1fdd050b2c0161a4b13c373',
'info_dict': {
@ -46,7 +31,6 @@ class RtlNlIE(InfoExtractor):
'upload_date': '20160429',
'duration': 1167.96,
},
'skip': '404',
}, {
# best format available a3t
'url': 'http://www.rtl.nl/system/videoplayer/derden/rtlnieuws/video_embed.html#uuid=84ae5571-ac25-4225-ae0c-ef8d9efb2aed/autoplay=false',
@ -92,10 +76,6 @@ class RtlNlIE(InfoExtractor):
}, {
'url': 'https://static.rtl.nl/embed/?uuid=1a2970fc-5c0b-43ff-9fdc-927e39e6d1bc&autoplay=false&publicatiepunt=rtlnieuwsnl',
'only_matching': True,
}, {
# new embed URL schema
'url': 'https://embed.rtl.nl/#uuid=84ae5571-ac25-4225-ae0c-ef8d9efb2aed/autoplay=false',
'only_matching': True,
}]
def _real_extract(self, url):

View File

@ -558,10 +558,8 @@ class SoundcloudSetIE(SoundcloudPlaylistBaseIE):
class SoundcloudPagedPlaylistBaseIE(SoundcloudIE):
def _extract_playlist(self, base_url, playlist_id, playlist_title):
# Per the SoundCloud documentation, the maximum limit for a linked partioning query is 200.
# https://developers.soundcloud.com/blog/offset-pagination-deprecated
COMMON_QUERY = {
'limit': 200,
'limit': 80000,
'linked_partitioning': '1',
}

View File

@ -114,7 +114,7 @@ class SRGSSRPlayIE(InfoExtractor):
[^/]+/(?P<type>video|audio)/[^?]+|
popup(?P<type_2>video|audio)player
)
\?.*?\b(?:id=|urn=urn:[^:]+:video:)(?P<id>[0-9a-f\-]{36}|\d+)
\?id=(?P<id>[0-9a-f\-]{36}|\d+)
'''
_TESTS = [{
@ -175,12 +175,6 @@ class SRGSSRPlayIE(InfoExtractor):
}, {
'url': 'https://www.srf.ch/play/tv/popupvideoplayer?id=c4dba0ca-e75b-43b2-a34f-f708a4932e01',
'only_matching': True,
}, {
'url': 'https://www.srf.ch/play/tv/10vor10/video/snowden-beantragt-asyl-in-russland?urn=urn:srf:video:28e1a57d-5b76-4399-8ab3-9097f071e6c5',
'only_matching': True,
}, {
'url': 'https://www.rts.ch/play/tv/19h30/video/le-19h30?urn=urn:rts:video:6348260',
'only_matching': True,
}]
def _real_extract(self, url):

View File

@ -224,17 +224,9 @@ class SVTPlayIE(SVTPlayBaseIE):
self._adjust_title(info_dict)
return info_dict
svt_id = try_get(
data, lambda x: x['statistics']['dataLake']['content']['id'],
compat_str)
if not svt_id:
svt_id = self._search_regex(
(r'<video[^>]+data-video-id=["\']([\da-zA-Z-]+)',
r'["\']videoSvtId["\']\s*:\s*["\']([\da-zA-Z-]+)',
r'"content"\s*:\s*{.*?"id"\s*:\s*"([\da-zA-Z-]+)"',
r'["\']svtId["\']\s*:\s*["\']([\da-zA-Z-]+)'),
webpage, 'video id')
svt_id = self._search_regex(
r'<video[^>]+data-video-id=["\']([\da-zA-Z-]+)',
webpage, 'video id')
return self._extract_by_video_id(svt_id, webpage)

View File

@ -13,24 +13,14 @@ from ..utils import (
class TeleQuebecBaseIE(InfoExtractor):
@staticmethod
def _result(url, ie_key):
def _limelight_result(media_id):
return {
'_type': 'url_transparent',
'url': smuggle_url(url, {'geo_countries': ['CA']}),
'ie_key': ie_key,
'url': smuggle_url(
'limelight:media:' + media_id, {'geo_countries': ['CA']}),
'ie_key': 'LimelightMedia',
}
@staticmethod
def _limelight_result(media_id):
return TeleQuebecBaseIE._result(
'limelight:media:' + media_id, 'LimelightMedia')
@staticmethod
def _brightcove_result(brightcove_id):
return TeleQuebecBaseIE._result(
'http://players.brightcove.net/6150020952001/default_default/index.html?videoId=%s'
% brightcove_id, 'BrightcoveNew')
class TeleQuebecIE(TeleQuebecBaseIE):
_VALID_URL = r'''(?x)
@ -47,27 +37,11 @@ class TeleQuebecIE(TeleQuebecBaseIE):
'id': '577116881b4b439084e6b1cf4ef8b1b3',
'ext': 'mp4',
'title': 'Un petit choc et puis repart!',
'description': 'md5:067bc84bd6afecad85e69d1000730907',
'description': 'md5:b04a7e6b3f74e32d7b294cffe8658374',
},
'params': {
'skip_download': True,
},
}, {
'url': 'https://zonevideo.telequebec.tv/media/55267/le-soleil/passe-partout',
'info_dict': {
'id': '6167180337001',
'ext': 'mp4',
'title': 'Le soleil',
'description': 'md5:64289c922a8de2abbe99c354daffde02',
'uploader_id': '6150020952001',
'upload_date': '20200625',
'timestamp': 1593090307,
},
'params': {
'format': 'bestvideo',
'skip_download': True,
},
'add_ie': ['BrightcoveNew'],
}, {
# no description
'url': 'http://zonevideo.telequebec.tv/media/30261',
@ -84,14 +58,7 @@ class TeleQuebecIE(TeleQuebecBaseIE):
'https://mnmedias.api.telequebec.tv/api/v2/media/' + media_id,
media_id)['media']
source_id = media_data['streamInfo']['sourceId']
source = (try_get(
media_data, lambda x: x['streamInfo']['source'],
compat_str) or 'limelight').lower()
if source == 'brightcove':
info = self._brightcove_result(source_id)
else:
info = self._limelight_result(source_id)
info = self._limelight_result(media_data['streamInfo']['sourceId'])
info.update({
'title': media_data.get('title'),
'description': try_get(

View File

@ -8,8 +8,8 @@ from ..utils import int_or_none
class TwentyThreeVideoIE(InfoExtractor):
IE_NAME = '23video'
_VALID_URL = r'https?://(?P<domain>[^.]+\.(?:twentythree\.net|23video\.com|filmweb\.no))/v\.ihtml/player\.html\?(?P<query>.*?\bphoto(?:_|%5f)id=(?P<id>\d+).*)'
_TESTS = [{
_VALID_URL = r'https?://video\.(?P<domain>twentythree\.net|23video\.com|filmweb\.no)/v\.ihtml/player\.html\?(?P<query>.*?\bphoto(?:_|%5f)id=(?P<id>\d+).*)'
_TEST = {
'url': 'https://video.twentythree.net/v.ihtml/player.html?showDescriptions=0&source=site&photo%5fid=20448876&autoPlay=1',
'md5': '75fcf216303eb1dae9920d651f85ced4',
'info_dict': {
@ -21,14 +21,11 @@ class TwentyThreeVideoIE(InfoExtractor):
'uploader_id': '12258964',
'uploader': 'Rasmus Bysted',
}
}, {
'url': 'https://bonnier-publications-danmark.23video.com/v.ihtml/player.html?token=f0dc46476e06e13afd5a1f84a29e31e8&source=embed&photo%5fid=36137620',
'only_matching': True,
}]
}
def _real_extract(self, url):
domain, query, photo_id = re.match(self._VALID_URL, url).groups()
base_url = 'https://%s' % domain
base_url = 'https://video.%s' % domain
photo_data = self._download_json(
base_url + '/api/photo/list?' + query, photo_id, query={
'format': 'json',

View File

@ -1,29 +1,28 @@
# coding: utf-8
from __future__ import unicode_literals
import collections
import itertools
import json
import random
import re
import random
import json
from .common import InfoExtractor
from ..compat import (
compat_kwargs,
compat_parse_qs,
compat_str,
compat_urlparse,
compat_urllib_parse_urlencode,
compat_urllib_parse_urlparse,
)
from ..utils import (
clean_html,
ExtractorError,
float_or_none,
int_or_none,
orderedSet,
parse_duration,
parse_iso8601,
qualities,
str_or_none,
try_get,
unified_timestamp,
update_url_query,
@ -151,16 +150,120 @@ class TwitchBaseIE(InfoExtractor):
})
self._sort_formats(formats)
def _download_access_token(self, channel_name):
return self._call_api(
'api/channels/%s/access_token' % channel_name, channel_name,
'Downloading access token JSON')
def _extract_channel_id(self, token, channel_name):
return compat_str(self._parse_json(token, channel_name)['channel_id'])
class TwitchItemBaseIE(TwitchBaseIE):
def _download_info(self, item, item_id):
return self._extract_info(self._call_api(
'kraken/videos/%s%s' % (item, item_id), item_id,
'Downloading %s info JSON' % self._ITEM_TYPE))
def _extract_media(self, item_id):
info = self._download_info(self._ITEM_SHORTCUT, item_id)
response = self._call_api(
'api/videos/%s%s' % (self._ITEM_SHORTCUT, item_id), item_id,
'Downloading %s playlist JSON' % self._ITEM_TYPE)
entries = []
chunks = response['chunks']
qualities = list(chunks.keys())
for num, fragment in enumerate(zip(*chunks.values()), start=1):
formats = []
for fmt_num, fragment_fmt in enumerate(fragment):
format_id = qualities[fmt_num]
fmt = {
'url': fragment_fmt['url'],
'format_id': format_id,
'quality': 1 if format_id == 'live' else 0,
}
m = re.search(r'^(?P<height>\d+)[Pp]', format_id)
if m:
fmt['height'] = int(m.group('height'))
formats.append(fmt)
self._sort_formats(formats)
entry = dict(info)
entry['id'] = '%s_%d' % (entry['id'], num)
entry['title'] = '%s part %d' % (entry['title'], num)
entry['formats'] = formats
entries.append(entry)
return self.playlist_result(entries, info['id'], info['title'])
def _extract_info(self, info):
status = info.get('status')
if status == 'recording':
is_live = True
elif status == 'recorded':
is_live = False
else:
is_live = None
_QUALITIES = ('small', 'medium', 'large')
quality_key = qualities(_QUALITIES)
thumbnails = []
preview = info.get('preview')
if isinstance(preview, dict):
for thumbnail_id, thumbnail_url in preview.items():
thumbnail_url = url_or_none(thumbnail_url)
if not thumbnail_url:
continue
if thumbnail_id not in _QUALITIES:
continue
thumbnails.append({
'url': thumbnail_url,
'preference': quality_key(thumbnail_id),
})
return {
'id': info['_id'],
'title': info.get('title') or 'Untitled Broadcast',
'description': info.get('description'),
'duration': int_or_none(info.get('length')),
'thumbnails': thumbnails,
'uploader': info.get('channel', {}).get('display_name'),
'uploader_id': info.get('channel', {}).get('name'),
'timestamp': parse_iso8601(info.get('recorded_at')),
'view_count': int_or_none(info.get('views')),
'is_live': is_live,
}
def _real_extract(self, url):
return self._extract_media(self._match_id(url))
class TwitchVodIE(TwitchBaseIE):
class TwitchVideoIE(TwitchItemBaseIE):
IE_NAME = 'twitch:video'
_VALID_URL = r'%s/[^/]+/b/(?P<id>\d+)' % TwitchBaseIE._VALID_URL_BASE
_ITEM_TYPE = 'video'
_ITEM_SHORTCUT = 'a'
_TEST = {
'url': 'http://www.twitch.tv/riotgames/b/577357806',
'info_dict': {
'id': 'a577357806',
'title': 'Worlds Semifinals - Star Horn Royal Club vs. OMG',
},
'playlist_mincount': 12,
'skip': 'HTTP Error 404: Not Found',
}
class TwitchChapterIE(TwitchItemBaseIE):
IE_NAME = 'twitch:chapter'
_VALID_URL = r'%s/[^/]+/c/(?P<id>\d+)' % TwitchBaseIE._VALID_URL_BASE
_ITEM_TYPE = 'chapter'
_ITEM_SHORTCUT = 'c'
_TESTS = [{
'url': 'http://www.twitch.tv/acracingleague/c/5285812',
'info_dict': {
'id': 'c5285812',
'title': 'ACRL Off Season - Sports Cars @ Nordschleife',
},
'playlist_mincount': 3,
'skip': 'HTTP Error 404: Not Found',
}, {
'url': 'http://www.twitch.tv/tsm_theoddone/c/2349361',
'only_matching': True,
}]
class TwitchVodIE(TwitchItemBaseIE):
IE_NAME = 'twitch:vod'
_VALID_URL = r'''(?x)
https?://
@ -229,60 +332,17 @@ class TwitchVodIE(TwitchBaseIE):
'only_matching': True,
}]
def _download_info(self, item_id):
return self._extract_info(
self._call_api(
'kraken/videos/%s' % item_id, item_id,
'Downloading video info JSON'))
@staticmethod
def _extract_info(info):
status = info.get('status')
if status == 'recording':
is_live = True
elif status == 'recorded':
is_live = False
else:
is_live = None
_QUALITIES = ('small', 'medium', 'large')
quality_key = qualities(_QUALITIES)
thumbnails = []
preview = info.get('preview')
if isinstance(preview, dict):
for thumbnail_id, thumbnail_url in preview.items():
thumbnail_url = url_or_none(thumbnail_url)
if not thumbnail_url:
continue
if thumbnail_id not in _QUALITIES:
continue
thumbnails.append({
'url': thumbnail_url,
'preference': quality_key(thumbnail_id),
})
return {
'id': info['_id'],
'title': info.get('title') or 'Untitled Broadcast',
'description': info.get('description'),
'duration': int_or_none(info.get('length')),
'thumbnails': thumbnails,
'uploader': info.get('channel', {}).get('display_name'),
'uploader_id': info.get('channel', {}).get('name'),
'timestamp': parse_iso8601(info.get('recorded_at')),
'view_count': int_or_none(info.get('views')),
'is_live': is_live,
}
def _real_extract(self, url):
vod_id = self._match_id(url)
item_id = self._match_id(url)
info = self._download_info(vod_id)
info = self._download_info(self._ITEM_SHORTCUT, item_id)
access_token = self._call_api(
'api/vods/%s/access_token' % vod_id, vod_id,
'api/vods/%s/access_token' % item_id, item_id,
'Downloading %s access token' % self._ITEM_TYPE)
formats = self._extract_m3u8_formats(
'%s/vod/%s.m3u8?%s' % (
self._USHER_BASE, vod_id,
self._USHER_BASE, item_id,
compat_urllib_parse_urlencode({
'allow_source': 'true',
'allow_audio_only': 'true',
@ -292,7 +352,7 @@ class TwitchVodIE(TwitchBaseIE):
'nauth': access_token['token'],
'nauthsig': access_token['sig'],
})),
vod_id, 'mp4', entry_protocol='m3u8_native')
item_id, 'mp4', entry_protocol='m3u8_native')
self._prefer_source(formats)
info['formats'] = formats
@ -306,7 +366,7 @@ class TwitchVodIE(TwitchBaseIE):
info['subtitles'] = {
'rechat': [{
'url': update_url_query(
'https://api.twitch.tv/v5/videos/%s/comments' % vod_id, {
'https://api.twitch.tv/v5/videos/%s/comments' % item_id, {
'client_id': self._CLIENT_ID,
}),
'ext': 'json',
@ -316,415 +376,166 @@ class TwitchVodIE(TwitchBaseIE):
return info
def _make_video_result(node):
assert isinstance(node, dict)
video_id = node.get('id')
if not video_id:
return
return {
'_type': 'url_transparent',
'ie_key': TwitchVodIE.ie_key(),
'id': video_id,
'url': 'https://www.twitch.tv/videos/%s' % video_id,
'title': node.get('title'),
'thumbnail': node.get('previewThumbnailURL'),
'duration': float_or_none(node.get('lengthSeconds')),
'view_count': int_or_none(node.get('viewCount')),
}
class TwitchGraphQLBaseIE(TwitchBaseIE):
class TwitchPlaylistBaseIE(TwitchBaseIE):
_PLAYLIST_PATH = 'kraken/channels/%s/videos/?offset=%d&limit=%d'
_PAGE_LIMIT = 100
_OPERATION_HASHES = {
'CollectionSideBar': '27111f1b382effad0b6def325caef1909c733fe6a4fbabf54f8d491ef2cf2f14',
'FilterableVideoTower_Videos': 'a937f1d22e269e39a03b509f65a7490f9fc247d7f83d6ac1421523e3b68042cb',
'ClipsCards__User': 'b73ad2bfaecfd30a9e6c28fada15bd97032c83ec77a0440766a56fe0bd632777',
'ChannelCollectionsContent': '07e3691a1bad77a36aba590c351180439a40baefc1c275356f40fc7082419a84',
'StreamMetadata': '1c719a40e481453e5c48d9bb585d971b8b372f8ebb105b17076722264dfa5b3e',
'ComscoreStreamingQuery': 'e1edae8122517d013405f237ffcc124515dc6ded82480a88daef69c83b53ac01',
'VideoPreviewOverlay': '3006e77e51b128d838fa4e835723ca4dc9a05c5efd4466c1085215c6e437e65c',
}
def _download_gql(self, video_id, ops, note, fatal=True):
for op in ops:
op['extensions'] = {
'persistedQuery': {
'version': 1,
'sha256Hash': self._OPERATION_HASHES[op['operationName']],
}
}
return self._download_json(
'https://gql.twitch.tv/gql', video_id, note,
data=json.dumps(ops).encode(),
headers={
'Content-Type': 'text/plain;charset=UTF-8',
'Client-ID': self._CLIENT_ID,
}, fatal=fatal)
class TwitchCollectionIE(TwitchGraphQLBaseIE):
_VALID_URL = r'https?://(?:(?:www|go|m)\.)?twitch\.tv/collections/(?P<id>[^/]+)'
_TESTS = [{
'url': 'https://www.twitch.tv/collections/wlDCoH0zEBZZbQ',
'info_dict': {
'id': 'wlDCoH0zEBZZbQ',
'title': 'Overthrow Nook, capitalism for children',
},
'playlist_mincount': 13,
}]
_OPERATION_NAME = 'CollectionSideBar'
def _real_extract(self, url):
collection_id = self._match_id(url)
collection = self._download_gql(
collection_id, [{
'operationName': self._OPERATION_NAME,
'variables': {'collectionID': collection_id},
}],
'Downloading collection GraphQL')[0]['data']['collection']
title = collection.get('title')
def _extract_playlist(self, channel_id):
info = self._call_api(
'kraken/channels/%s' % channel_id,
channel_id, 'Downloading channel info JSON')
channel_name = info.get('display_name') or info.get('name')
entries = []
for edge in collection['items']['edges']:
if not isinstance(edge, dict):
continue
node = edge.get('node')
if not isinstance(node, dict):
continue
video = _make_video_result(node)
if video:
entries.append(video)
return self.playlist_result(
entries, playlist_id=collection_id, playlist_title=title)
class TwitchPlaylistBaseIE(TwitchGraphQLBaseIE):
def _entries(self, channel_name, *args):
cursor = None
variables_common = self._make_variables(channel_name, *args)
entries_key = '%ss' % self._ENTRY_KIND
for page_num in itertools.count(1):
variables = variables_common.copy()
variables['limit'] = self._PAGE_LIMIT
if cursor:
variables['cursor'] = cursor
page = self._download_gql(
channel_name, [{
'operationName': self._OPERATION_NAME,
'variables': variables,
}],
'Downloading %ss GraphQL page %s' % (self._NODE_KIND, page_num),
fatal=False)
if not page:
break
edges = try_get(
page, lambda x: x[0]['data']['user'][entries_key]['edges'], list)
if not edges:
break
for edge in edges:
if not isinstance(edge, dict):
continue
if edge.get('__typename') != self._EDGE_KIND:
continue
node = edge.get('node')
if not isinstance(node, dict):
continue
if node.get('__typename') != self._NODE_KIND:
continue
entry = self._extract_entry(node)
if entry:
cursor = edge.get('cursor')
yield entry
if not cursor or not isinstance(cursor, compat_str):
break
# Deprecated kraken v5 API
def _entries_kraken(self, channel_name, broadcast_type, sort):
access_token = self._download_access_token(channel_name)
channel_id = self._extract_channel_id(access_token['token'], channel_name)
offset = 0
limit = self._PAGE_LIMIT
broken_paging_detected = False
counter_override = None
for counter in itertools.count(1):
response = self._call_api(
'kraken/channels/%s/videos/' % channel_id,
self._PLAYLIST_PATH % (channel_id, offset, limit),
channel_id,
'Downloading video JSON page %s' % (counter_override or counter),
query={
'offset': offset,
'limit': self._PAGE_LIMIT,
'broadcast_type': broadcast_type,
'sort': sort,
})
videos = response.get('videos')
if not isinstance(videos, list):
'Downloading %s JSON page %s'
% (self._PLAYLIST_TYPE, counter_override or counter))
page_entries = self._extract_playlist_page(response)
if not page_entries:
break
for video in videos:
if not isinstance(video, dict):
continue
video_url = url_or_none(video.get('url'))
if not video_url:
continue
yield {
'_type': 'url_transparent',
'ie_key': TwitchVodIE.ie_key(),
'id': video.get('_id'),
'url': video_url,
'title': video.get('title'),
'description': video.get('description'),
'timestamp': unified_timestamp(video.get('published_at')),
'duration': float_or_none(video.get('length')),
'view_count': int_or_none(video.get('views')),
'language': video.get('language'),
}
offset += self._PAGE_LIMIT
total = int_or_none(response.get('_total'))
if total and offset >= total:
# Since the beginning of March 2016 twitch's paging mechanism
# is completely broken on the twitch side. It simply ignores
# a limit and returns the whole offset number of videos.
# Working around by just requesting all videos at once.
# Upd: pagination bug was fixed by twitch on 15.03.2016.
if not broken_paging_detected and total and len(page_entries) > limit:
self.report_warning(
'Twitch pagination is broken on twitch side, requesting all videos at once',
channel_id)
broken_paging_detected = True
offset = total
counter_override = '(all at once)'
continue
entries.extend(page_entries)
if broken_paging_detected or total and len(page_entries) >= total:
break
offset += limit
return self.playlist_result(
[self._make_url_result(entry) for entry in orderedSet(entries)],
channel_id, channel_name)
def _make_url_result(self, url):
try:
video_id = 'v%s' % TwitchVodIE._match_id(url)
return self.url_result(url, TwitchVodIE.ie_key(), video_id=video_id)
except AssertionError:
return self.url_result(url)
def _extract_playlist_page(self, response):
videos = response.get('videos')
return [video['url'] for video in videos] if videos else []
def _real_extract(self, url):
return self._extract_playlist(self._match_id(url))
class TwitchVideosIE(TwitchPlaylistBaseIE):
_VALID_URL = r'https?://(?:(?:www|go|m)\.)?twitch\.tv/(?P<id>[^/]+)/(?:videos|profile)'
class TwitchProfileIE(TwitchPlaylistBaseIE):
IE_NAME = 'twitch:profile'
_VALID_URL = r'%s/(?P<id>[^/]+)/profile/?(?:\#.*)?$' % TwitchBaseIE._VALID_URL_BASE
_PLAYLIST_TYPE = 'profile'
_TESTS = [{
# All Videos sorted by Date
'url': 'https://www.twitch.tv/spamfish/videos?filter=all',
'url': 'http://www.twitch.tv/vanillatv/profile',
'info_dict': {
'id': 'spamfish',
'title': 'spamfish - All Videos sorted by Date',
'id': 'vanillatv',
'title': 'VanillaTV',
},
'playlist_mincount': 924,
'playlist_mincount': 412,
}, {
# All Videos sorted by Popular
'url': 'https://www.twitch.tv/spamfish/videos?filter=all&sort=views',
'info_dict': {
'id': 'spamfish',
'title': 'spamfish - All Videos sorted by Popular',
},
'playlist_mincount': 931,
}, {
# Past Broadcasts sorted by Date
'url': 'https://www.twitch.tv/spamfish/videos?filter=archives',
'info_dict': {
'id': 'spamfish',
'title': 'spamfish - Past Broadcasts sorted by Date',
},
'playlist_mincount': 27,
}, {
# Highlights sorted by Date
'url': 'https://www.twitch.tv/spamfish/videos?filter=highlights',
'info_dict': {
'id': 'spamfish',
'title': 'spamfish - Highlights sorted by Date',
},
'playlist_mincount': 901,
}, {
# Uploads sorted by Date
'url': 'https://www.twitch.tv/esl_csgo/videos?filter=uploads&sort=time',
'info_dict': {
'id': 'esl_csgo',
'title': 'esl_csgo - Uploads sorted by Date',
},
'playlist_mincount': 5,
}, {
# Past Premieres sorted by Date
'url': 'https://www.twitch.tv/spamfish/videos?filter=past_premieres',
'info_dict': {
'id': 'spamfish',
'title': 'spamfish - Past Premieres sorted by Date',
},
'playlist_mincount': 1,
}, {
'url': 'https://www.twitch.tv/spamfish/videos/all',
'url': 'http://m.twitch.tv/vanillatv/profile',
'only_matching': True,
}]
class TwitchVideosBaseIE(TwitchPlaylistBaseIE):
_VALID_URL_VIDEOS_BASE = r'%s/(?P<id>[^/]+)/videos' % TwitchBaseIE._VALID_URL_BASE
_PLAYLIST_PATH = TwitchPlaylistBaseIE._PLAYLIST_PATH + '&broadcast_type='
class TwitchAllVideosIE(TwitchVideosBaseIE):
IE_NAME = 'twitch:videos:all'
_VALID_URL = r'%s/all' % TwitchVideosBaseIE._VALID_URL_VIDEOS_BASE
_PLAYLIST_PATH = TwitchVideosBaseIE._PLAYLIST_PATH + 'archive,upload,highlight'
_PLAYLIST_TYPE = 'all videos'
_TESTS = [{
'url': 'https://www.twitch.tv/spamfish/videos/all',
'info_dict': {
'id': 'spamfish',
'title': 'Spamfish',
},
'playlist_mincount': 869,
}, {
'url': 'https://m.twitch.tv/spamfish/videos/all',
'only_matching': True,
}, {
'url': 'https://www.twitch.tv/spamfish/videos',
'only_matching': True,
}]
Broadcast = collections.namedtuple('Broadcast', ['type', 'label'])
_DEFAULT_BROADCAST = Broadcast(None, 'All Videos')
_BROADCASTS = {
'archives': Broadcast('ARCHIVE', 'Past Broadcasts'),
'highlights': Broadcast('HIGHLIGHT', 'Highlights'),
'uploads': Broadcast('UPLOAD', 'Uploads'),
'past_premieres': Broadcast('PAST_PREMIERE', 'Past Premieres'),
'all': _DEFAULT_BROADCAST,
}
_DEFAULT_SORTED_BY = 'Date'
_SORTED_BY = {
'time': _DEFAULT_SORTED_BY,
'views': 'Popular',
}
_OPERATION_NAME = 'FilterableVideoTower_Videos'
_ENTRY_KIND = 'video'
_EDGE_KIND = 'VideoEdge'
_NODE_KIND = 'Video'
@classmethod
def suitable(cls, url):
return (False
if any(ie.suitable(url) for ie in (
TwitchVideosClipsIE,
TwitchVideosCollectionsIE))
else super(TwitchVideosIE, cls).suitable(url))
@staticmethod
def _make_variables(channel_name, broadcast_type, sort):
return {
'channelOwnerLogin': channel_name,
'broadcastType': broadcast_type,
'videoSort': sort.upper(),
}
@staticmethod
def _extract_entry(node):
return _make_video_result(node)
def _real_extract(self, url):
channel_name = self._match_id(url)
qs = compat_urlparse.parse_qs(compat_urlparse.urlparse(url).query)
filter = qs.get('filter', ['all'])[0]
sort = qs.get('sort', ['time'])[0]
broadcast = self._BROADCASTS.get(filter, self._DEFAULT_BROADCAST)
return self.playlist_result(
self._entries(channel_name, broadcast.type, sort),
playlist_id=channel_name,
playlist_title='%s - %s sorted by %s'
% (channel_name, broadcast.label,
self._SORTED_BY.get(sort, self._DEFAULT_SORTED_BY)))
class TwitchVideosClipsIE(TwitchPlaylistBaseIE):
_VALID_URL = r'https?://(?:(?:www|go|m)\.)?twitch\.tv/(?P<id>[^/]+)/(?:clips|videos/*?\?.*?\bfilter=clips)'
class TwitchUploadsIE(TwitchVideosBaseIE):
IE_NAME = 'twitch:videos:uploads'
_VALID_URL = r'%s/uploads' % TwitchVideosBaseIE._VALID_URL_VIDEOS_BASE
_PLAYLIST_PATH = TwitchVideosBaseIE._PLAYLIST_PATH + 'upload'
_PLAYLIST_TYPE = 'uploads'
_TESTS = [{
# Clips
'url': 'https://www.twitch.tv/vanillatv/clips?filter=clips&range=all',
'info_dict': {
'id': 'vanillatv',
'title': 'vanillatv - Clips Top All',
},
'playlist_mincount': 1,
}, {
'url': 'https://www.twitch.tv/dota2ruhub/videos?filter=clips&range=7d',
'only_matching': True,
}]
Clip = collections.namedtuple('Clip', ['filter', 'label'])
_DEFAULT_CLIP = Clip('LAST_WEEK', 'Top 7D')
_RANGE = {
'24hr': Clip('LAST_DAY', 'Top 24H'),
'7d': _DEFAULT_CLIP,
'30d': Clip('LAST_MONTH', 'Top 30D'),
'all': Clip('ALL_TIME', 'Top All'),
}
# NB: values other than 20 result in skipped videos
_PAGE_LIMIT = 20
_OPERATION_NAME = 'ClipsCards__User'
_ENTRY_KIND = 'clip'
_EDGE_KIND = 'ClipEdge'
_NODE_KIND = 'Clip'
@staticmethod
def _make_variables(channel_name, filter):
return {
'login': channel_name,
'criteria': {
'filter': filter,
},
}
@staticmethod
def _extract_entry(node):
assert isinstance(node, dict)
clip_url = url_or_none(node.get('url'))
if not clip_url:
return
return {
'_type': 'url_transparent',
'ie_key': TwitchClipsIE.ie_key(),
'id': node.get('id'),
'url': clip_url,
'title': node.get('title'),
'thumbnail': node.get('thumbnailURL'),
'duration': float_or_none(node.get('durationSeconds')),
'timestamp': unified_timestamp(node.get('createdAt')),
'view_count': int_or_none(node.get('viewCount')),
'language': node.get('language'),
}
def _real_extract(self, url):
channel_name = self._match_id(url)
qs = compat_urlparse.parse_qs(compat_urlparse.urlparse(url).query)
range = qs.get('range', ['7d'])[0]
clip = self._RANGE.get(range, self._DEFAULT_CLIP)
return self.playlist_result(
self._entries(channel_name, clip.filter),
playlist_id=channel_name,
playlist_title='%s - Clips %s' % (channel_name, clip.label))
class TwitchVideosCollectionsIE(TwitchPlaylistBaseIE):
_VALID_URL = r'https?://(?:(?:www|go|m)\.)?twitch\.tv/(?P<id>[^/]+)/videos/*?\?.*?\bfilter=collections'
_TESTS = [{
# Collections
'url': 'https://www.twitch.tv/spamfish/videos?filter=collections',
'url': 'https://www.twitch.tv/spamfish/videos/uploads',
'info_dict': {
'id': 'spamfish',
'title': 'spamfish - Collections',
'title': 'Spamfish',
},
'playlist_mincount': 3,
'playlist_mincount': 0,
}, {
'url': 'https://m.twitch.tv/spamfish/videos/uploads',
'only_matching': True,
}]
_OPERATION_NAME = 'ChannelCollectionsContent'
_ENTRY_KIND = 'collection'
_EDGE_KIND = 'CollectionsItemEdge'
_NODE_KIND = 'Collection'
@staticmethod
def _make_variables(channel_name):
return {
'ownerLogin': channel_name,
}
class TwitchPastBroadcastsIE(TwitchVideosBaseIE):
IE_NAME = 'twitch:videos:past-broadcasts'
_VALID_URL = r'%s/past-broadcasts' % TwitchVideosBaseIE._VALID_URL_VIDEOS_BASE
_PLAYLIST_PATH = TwitchVideosBaseIE._PLAYLIST_PATH + 'archive'
_PLAYLIST_TYPE = 'past broadcasts'
@staticmethod
def _extract_entry(node):
assert isinstance(node, dict)
collection_id = node.get('id')
if not collection_id:
return
return {
'_type': 'url_transparent',
'ie_key': TwitchCollectionIE.ie_key(),
'id': collection_id,
'url': 'https://www.twitch.tv/collections/%s' % collection_id,
'title': node.get('title'),
'thumbnail': node.get('thumbnailURL'),
'duration': float_or_none(node.get('lengthSeconds')),
'timestamp': unified_timestamp(node.get('updatedAt')),
'view_count': int_or_none(node.get('viewCount')),
}
def _real_extract(self, url):
channel_name = self._match_id(url)
return self.playlist_result(
self._entries(channel_name), playlist_id=channel_name,
playlist_title='%s - Collections' % channel_name)
_TESTS = [{
'url': 'https://www.twitch.tv/spamfish/videos/past-broadcasts',
'info_dict': {
'id': 'spamfish',
'title': 'Spamfish',
},
'playlist_mincount': 0,
}, {
'url': 'https://m.twitch.tv/spamfish/videos/past-broadcasts',
'only_matching': True,
}]
class TwitchStreamIE(TwitchGraphQLBaseIE):
class TwitchHighlightsIE(TwitchVideosBaseIE):
IE_NAME = 'twitch:videos:highlights'
_VALID_URL = r'%s/highlights' % TwitchVideosBaseIE._VALID_URL_VIDEOS_BASE
_PLAYLIST_PATH = TwitchVideosBaseIE._PLAYLIST_PATH + 'highlight'
_PLAYLIST_TYPE = 'highlights'
_TESTS = [{
'url': 'https://www.twitch.tv/spamfish/videos/highlights',
'info_dict': {
'id': 'spamfish',
'title': 'Spamfish',
},
'playlist_mincount': 805,
}, {
'url': 'https://m.twitch.tv/spamfish/videos/highlights',
'only_matching': True,
}]
class TwitchStreamIE(TwitchBaseIE):
IE_NAME = 'twitch:stream'
_VALID_URL = r'''(?x)
https?://
@ -772,52 +583,43 @@ class TwitchStreamIE(TwitchGraphQLBaseIE):
def suitable(cls, url):
return (False
if any(ie.suitable(url) for ie in (
TwitchVideoIE,
TwitchChapterIE,
TwitchVodIE,
TwitchCollectionIE,
TwitchVideosIE,
TwitchVideosClipsIE,
TwitchVideosCollectionsIE,
TwitchProfileIE,
TwitchAllVideosIE,
TwitchUploadsIE,
TwitchPastBroadcastsIE,
TwitchHighlightsIE,
TwitchClipsIE))
else super(TwitchStreamIE, cls).suitable(url))
def _real_extract(self, url):
channel_name = self._match_id(url).lower()
channel_name = self._match_id(url)
gql = self._download_gql(
channel_name, [{
'operationName': 'StreamMetadata',
'variables': {'channelLogin': channel_name},
}, {
'operationName': 'ComscoreStreamingQuery',
'variables': {
'channel': channel_name,
'clipSlug': '',
'isClip': False,
'isLive': True,
'isVodOrCollection': False,
'vodID': '',
},
}, {
'operationName': 'VideoPreviewOverlay',
'variables': {'login': channel_name},
}],
'Downloading stream GraphQL')
access_token = self._call_api(
'api/channels/%s/access_token' % channel_name, channel_name,
'Downloading access token JSON')
user = gql[0]['data']['user']
token = access_token['token']
channel_id = compat_str(self._parse_json(
token, channel_name)['channel_id'])
if not user:
raise ExtractorError(
'%s does not exist' % channel_name, expected=True)
stream = user['stream']
stream = self._call_api(
'kraken/streams/%s?stream_type=all' % channel_id,
channel_id, 'Downloading stream JSON').get('stream')
if not stream:
raise ExtractorError('%s is offline' % channel_name, expected=True)
raise ExtractorError('%s is offline' % channel_id, expected=True)
access_token = self._download_access_token(channel_name)
token = access_token['token']
# Channel name may be typed if different case than the original channel name
# (e.g. http://www.twitch.tv/TWITCHPLAYSPOKEMON) that will lead to constructing
# an invalid m3u8 URL. Working around by use of original channel name from stream
# JSON and fallback to lowercase if it's not available.
channel_name = try_get(
stream, lambda x: x['channel']['name'],
compat_str) or channel_name.lower()
stream_id = stream.get('id') or channel_name
query = {
'allow_source': 'true',
'allow_audio_only': 'true',
@ -830,39 +632,41 @@ class TwitchStreamIE(TwitchGraphQLBaseIE):
'token': token.encode('utf-8'),
}
formats = self._extract_m3u8_formats(
'%s/api/channel/hls/%s.m3u8' % (self._USHER_BASE, channel_name),
stream_id, 'mp4', query=query)
'%s/api/channel/hls/%s.m3u8?%s'
% (self._USHER_BASE, channel_name, compat_urllib_parse_urlencode(query)),
channel_id, 'mp4')
self._prefer_source(formats)
view_count = stream.get('viewers')
timestamp = unified_timestamp(stream.get('createdAt'))
timestamp = parse_iso8601(stream.get('created_at'))
sq_user = try_get(gql, lambda x: x[1]['data']['user'], dict) or {}
uploader = sq_user.get('displayName')
description = try_get(
sq_user, lambda x: x['broadcastSettings']['title'], compat_str)
channel = stream['channel']
title = self._live_title(channel.get('display_name') or channel.get('name'))
description = channel.get('status')
thumbnail = url_or_none(try_get(
gql, lambda x: x[2]['data']['user']['stream']['previewImageURL'],
compat_str))
title = uploader or channel_name
stream_type = stream.get('type')
if stream_type in ['rerun', 'live']:
title += ' (%s)' % stream_type
thumbnails = []
for thumbnail_key, thumbnail_url in stream['preview'].items():
m = re.search(r'(?P<width>\d+)x(?P<height>\d+)\.jpg$', thumbnail_key)
if not m:
continue
thumbnails.append({
'url': thumbnail_url,
'width': int(m.group('width')),
'height': int(m.group('height')),
})
return {
'id': stream_id,
'id': str_or_none(stream.get('_id')) or channel_id,
'display_id': channel_name,
'title': self._live_title(title),
'title': title,
'description': description,
'thumbnail': thumbnail,
'uploader': uploader,
'uploader_id': channel_name,
'thumbnails': thumbnails,
'uploader': channel.get('display_name'),
'uploader_id': channel.get('name'),
'timestamp': timestamp,
'view_count': view_count,
'formats': formats,
'is_live': stream_type == 'live',
'is_live': True,
}

View File

@ -19,7 +19,7 @@ from ..utils import (
class UstreamIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?(?:ustream\.tv|video\.ibm\.com)/(?P<type>recorded|embed|embed/recorded)/(?P<id>\d+)'
_VALID_URL = r'https?://(?:www\.)?ustream\.tv/(?P<type>recorded|embed|embed/recorded)/(?P<id>\d+)'
IE_NAME = 'ustream'
_TESTS = [{
'url': 'http://www.ustream.tv/recorded/20274954',
@ -67,15 +67,12 @@ class UstreamIE(InfoExtractor):
'params': {
'skip_download': True, # m3u8 download
},
}, {
'url': 'https://video.ibm.com/embed/recorded/128240221?&autoplay=true&controls=true&volume=100',
'only_matching': True,
}]
@staticmethod
def _extract_url(webpage):
mobj = re.search(
r'<iframe[^>]+?src=(["\'])(?P<url>http://(?:www\.)?(?:ustream\.tv|video\.ibm\.com)/embed/.+?)\1', webpage)
r'<iframe[^>]+?src=(["\'])(?P<url>http://www\.ustream\.tv/embed/.+?)\1', webpage)
if mobj is not None:
return mobj.group('url')

View File

@ -56,7 +56,7 @@ class WistiaIE(InfoExtractor):
urls.append(unescapeHTML(match.group('url')))
for match in re.finditer(
r'''(?sx)
<div[^>]+class=(["'])(?:(?!\1).)*?\bwistia_async_(?P<id>[a-z0-9]{10})\b(?:(?!\1).)*?\1
<div[^>]+class=(["']).*?\bwistia_async_(?P<id>[a-z0-9]{10})\b.*?\2
''', webpage):
urls.append('wistia:%s' % match.group('id'))
for match in re.finditer(r'(?:data-wistia-?id=["\']|Wistia\.embed\(["\']|id=["\']wistia_)(?P<id>[a-z0-9]{10})', webpage):

View File

@ -20,13 +20,13 @@ from ..utils import (
class XHamsterIE(InfoExtractor):
_DOMAINS = r'(?:xhamster\.(?:com|one|desi)|xhms\.pro|xhamster\d+\.com)'
_DOMAINS = r'(?:xhamster\.(?:com|one|desi)|xhms\.pro|xhamster[27]\.com)'
_VALID_URL = r'''(?x)
https?://
(?:.+?\.)?%s/
(?:
movies/(?P<id>[\dA-Za-z]+)/(?P<display_id>[^/]*)\.html|
videos/(?P<display_id_2>[^/]*)-(?P<id_2>[\dA-Za-z]+)
movies/(?P<id>\d+)/(?P<display_id>[^/]*)\.html|
videos/(?P<display_id_2>[^/]*)-(?P<id_2>\d+)
)
''' % _DOMAINS
_TESTS = [{
@ -99,21 +99,12 @@ class XHamsterIE(InfoExtractor):
}, {
'url': 'https://xhamster2.com/videos/femaleagent-shy-beauty-takes-the-bait-1509445',
'only_matching': True,
}, {
'url': 'https://xhamster11.com/videos/femaleagent-shy-beauty-takes-the-bait-1509445',
'only_matching': True,
}, {
'url': 'https://xhamster26.com/videos/femaleagent-shy-beauty-takes-the-bait-1509445',
'only_matching': True,
}, {
'url': 'http://xhamster.com/movies/1509445/femaleagent_shy_beauty_takes_the_bait.html',
'only_matching': True,
}, {
'url': 'http://xhamster.com/movies/2221348/britney_spears_sexy_booty.html?hd',
'only_matching': True,
}, {
'url': 'http://de.xhamster.com/videos/skinny-girl-fucks-herself-hard-in-the-forest-xhnBJZx',
'only_matching': True,
}]
def _real_extract(self, url):
@ -138,8 +129,7 @@ class XHamsterIE(InfoExtractor):
initials = self._parse_json(
self._search_regex(
(r'window\.initials\s*=\s*({.+?})\s*;\s*</script>',
r'window\.initials\s*=\s*({.+?})\s*;'), webpage, 'initials',
r'window\.initials\s*=\s*({.+?})\s*;\s*\n', webpage, 'initials',
default='{}'),
video_id, fatal=False)
if initials:

View File

@ -3,7 +3,7 @@ from __future__ import unicode_literals
from .common import InfoExtractor
from ..compat import compat_urllib_parse_unquote
from ..utils import int_or_none, ExtractorError
from ..utils import int_or_none
class XiamiBaseIE(InfoExtractor):
@ -46,8 +46,6 @@ class XiamiBaseIE(InfoExtractor):
item_id, headers={
'Referer': referer,
})
if 'message' in playlist and playlist['message']:
raise ExtractorError(playlist['message'], expected=True)
return [
self._extract_track(track, item_id)
for track in playlist['data']['trackList']]

View File

@ -288,12 +288,11 @@ class YoutubeEntryListBaseInfoExtractor(YoutubeBaseInfoExtractor):
# Extract entries from page with "Load more" button
def _entries(self, page, playlist_id):
more_widget_html = content_html = page
mobj_reg = r'(?:(?:data-uix-load-more-href="[^"]+?;continuation=)|(?:"continuation":"))(?P<more>[^"]+)"'
for page_num in itertools.count(1):
for entry in self._process_page(content_html):
yield entry
mobj = re.search(mobj_reg, more_widget_html)
mobj = re.search(r'data-uix-load-more-href="/?(?P<more>[^"]+)"', more_widget_html)
if not mobj:
break
@ -304,7 +303,7 @@ class YoutubeEntryListBaseInfoExtractor(YoutubeBaseInfoExtractor):
# Downloading page may result in intermittent 5xx HTTP error
# that is usually worked around with a retry
more = self._download_json(
'https://www.youtube.com/browse_ajax?ctoken=%s' % mobj.group('more'), playlist_id,
'https://youtube.com/%s' % mobj.group('more'), playlist_id,
'Downloading page #%s%s'
% (page_num, ' (retry #%d)' % count if count else ''),
transform_source=uppercase_escape,
@ -361,7 +360,7 @@ class YoutubePlaylistBaseInfoExtractor(YoutubeEntryListBaseInfoExtractor):
class YoutubePlaylistsBaseInfoExtractor(YoutubeEntryListBaseInfoExtractor):
def _process_page(self, content):
for playlist_id in orderedSet(re.findall(
r'"/?playlist\?list=([0-9A-Za-z-_]{10,})"',
r'<h3[^>]+class="[^"]*yt-lockup-title[^"]*"[^>]*><a[^>]+href="/?playlist\?list=([0-9A-Za-z-_]{10,})"',
content)):
yield self.url_result(
'https://www.youtube.com/playlist?list=%s' % playlist_id, 'YoutubePlaylist')
@ -580,18 +579,44 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
}
},
{
'url': 'https://www.youtube.com/watch?v=PA4gbtKWNAI',
'note': 'Test video with age protection',
'url': 'https://www.youtube.com/watch?v=UxxajLWwzqY',
'note': 'Test generic use_cipher_signature video (#897)',
'info_dict': {
'id': 'PA4gbtKWNAI',
'id': 'UxxajLWwzqY',
'ext': 'mp4',
'upload_date': '20201028',
'title': 'Big Buck Bunny (TEST) (Age-restricted)',
'description': 'An age restricted test video for youtube-dl',
'duration': 635,
'uploader': 'Ali Sherief',
'uploader_id': 'UCi1TsEkfcMaYSadGms3UnqA',
'uploader_url': r're:https?://(?:www\.)?youtube\.com/channel/UCi1TsEkfcMaYSadGms3UnqA',
'upload_date': '20120506',
'title': 'Icona Pop - I Love It (feat. Charli XCX) [OFFICIAL VIDEO]',
'alt_title': 'I Love It (feat. Charli XCX)',
'description': 'md5:19a2f98d9032b9311e686ed039564f63',
'tags': ['Icona Pop i love it', 'sweden', 'pop music', 'big beat records', 'big beat', 'charli',
'xcx', 'charli xcx', 'girls', 'hbo', 'i love it', "i don't care", 'icona', 'pop',
'iconic ep', 'iconic', 'love', 'it'],
'duration': 180,
'uploader': 'Icona Pop',
'uploader_id': 'IconaPop',
'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/IconaPop',
'creator': 'Icona Pop',
'track': 'I Love It (feat. Charli XCX)',
'artist': 'Icona Pop',
}
},
{
'url': 'https://www.youtube.com/watch?v=07FYdnEawAQ',
'note': 'Test VEVO video with age protection (#956)',
'info_dict': {
'id': '07FYdnEawAQ',
'ext': 'mp4',
'upload_date': '20130703',
'title': 'Justin Timberlake - Tunnel Vision (Official Music Video) (Explicit)',
'alt_title': 'Tunnel Vision',
'description': 'md5:07dab3356cde4199048e4c7cd93471e1',
'duration': 419,
'uploader': 'justintimberlakeVEVO',
'uploader_id': 'justintimberlakeVEVO',
'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/justintimberlakeVEVO',
'creator': 'Justin Timberlake',
'track': 'Tunnel Vision',
'artist': 'Justin Timberlake',
'age_limit': 18,
}
},
@ -652,6 +677,42 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
},
'skip': 'format 141 not served anymore',
},
# DASH manifest with encrypted signature
{
'url': 'https://www.youtube.com/watch?v=IB3lcPjvWLA',
'info_dict': {
'id': 'IB3lcPjvWLA',
'ext': 'm4a',
'title': 'Afrojack, Spree Wilson - The Spark (Official Music Video) ft. Spree Wilson',
'description': 'md5:8f5e2b82460520b619ccac1f509d43bf',
'duration': 244,
'uploader': 'AfrojackVEVO',
'uploader_id': 'AfrojackVEVO',
'upload_date': '20131011',
},
'params': {
'youtube_include_dash_manifest': True,
'format': '141/bestaudio[ext=m4a]',
},
},
# JS player signature function name containing $
{
'url': 'https://www.youtube.com/watch?v=nfWlot6h_JM',
'info_dict': {
'id': 'nfWlot6h_JM',
'ext': 'm4a',
'title': 'Taylor Swift - Shake It Off',
'description': 'md5:307195cd21ff7fa352270fe884570ef0',
'duration': 242,
'uploader': 'TaylorSwiftVEVO',
'uploader_id': 'TaylorSwiftVEVO',
'upload_date': '20140818',
},
'params': {
'youtube_include_dash_manifest': True,
'format': '141/bestaudio[ext=m4a]',
},
},
# Controversy video
{
'url': 'https://www.youtube.com/watch?v=T4XJQO3qol8',
@ -667,6 +728,59 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'description': 'SUBSCRIBE: http://www.youtube.com/saturninefilms\n\nEven Obama has taken a stand against freedom on this issue: http://www.huffingtonpost.com/2010/09/09/obama-gma-interview-quran_n_710282.html',
}
},
# Normal age-gate video (No vevo, embed allowed)
{
'url': 'https://youtube.com/watch?v=HtVdAasjOgU',
'info_dict': {
'id': 'HtVdAasjOgU',
'ext': 'mp4',
'title': 'The Witcher 3: Wild Hunt - The Sword Of Destiny Trailer',
'description': r're:(?s).{100,}About the Game\n.*?The Witcher 3: Wild Hunt.{100,}',
'duration': 142,
'uploader': 'The Witcher',
'uploader_id': 'WitcherGame',
'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/WitcherGame',
'upload_date': '20140605',
'age_limit': 18,
},
},
# Age-gate video with encrypted signature
{
'url': 'https://www.youtube.com/watch?v=6kLq3WMV1nU',
'info_dict': {
'id': '6kLq3WMV1nU',
'ext': 'mp4',
'title': 'Dedication To My Ex (Miss That) (Lyric Video)',
'description': 'md5:33765bb339e1b47e7e72b5490139bb41',
'duration': 246,
'uploader': 'LloydVEVO',
'uploader_id': 'LloydVEVO',
'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/LloydVEVO',
'upload_date': '20110629',
'age_limit': 18,
},
},
# video_info is None (https://github.com/ytdl-org/youtube-dl/issues/4421)
# YouTube Red ad is not captured for creator
{
'url': '__2ABJjxzNo',
'info_dict': {
'id': '__2ABJjxzNo',
'ext': 'mp4',
'duration': 266,
'upload_date': '20100430',
'uploader_id': 'deadmau5',
'uploader_url': r're:https?://(?:www\.)?youtube\.com/user/deadmau5',
'creator': 'Dada Life, deadmau5',
'description': 'md5:12c56784b8032162bb936a5f76d55360',
'uploader': 'deadmau5',
'title': 'Deadmau5 - Some Chords (HD)',
'alt_title': 'This Machine Kills Some Chords',
},
'expected_warnings': [
'DASH manifest missing',
]
},
# Olympics (https://github.com/ytdl-org/youtube-dl/issues/4431)
{
'url': 'lqQg6PlCWgI',
@ -1150,23 +1264,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
'params': {
'skip_download': True,
},
},
{
# empty description results in an empty string
'url': 'https://www.youtube.com/watch?v=x41yOUIvK2k',
'info_dict': {
'id': 'x41yOUIvK2k',
'ext': 'mp4',
'title': 'IMG 3456',
'description': '',
'upload_date': '20170613',
'uploader_id': 'ElevageOrVert',
'uploader': 'ElevageOrVert',
},
'params': {
'skip_download': True,
},
},
}
]
def __init__(self, *args, **kwargs):
@ -1286,7 +1384,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
funcname = self._search_regex(
(r'\b[cs]\s*&&\s*[adf]\.set\([^,]+\s*,\s*encodeURIComponent\s*\(\s*(?P<sig>[a-zA-Z0-9$]+)\(',
r'\b[a-zA-Z0-9]+\s*&&\s*[a-zA-Z0-9]+\.set\([^,]+\s*,\s*encodeURIComponent\s*\(\s*(?P<sig>[a-zA-Z0-9$]+)\(',
r'(?:\b|[^a-zA-Z0-9$])(?P<sig>[a-zA-Z0-9$]{2})\s*=\s*function\(\s*a\s*\)\s*{\s*a\s*=\s*a\.split\(\s*""\s*\)',
r'\b(?P<sig>[a-zA-Z0-9$]{2})\s*=\s*function\(\s*a\s*\)\s*{\s*a\s*=\s*a\.split\(\s*""\s*\)',
r'(?P<sig>[a-zA-Z0-9$]+)\s*=\s*function\(\s*a\s*\)\s*{\s*a\s*=\s*a\.split\(\s*""\s*\)',
# Obsolete patterns
r'(["\'])signature\1\s*,\s*(?P<sig>[a-zA-Z0-9$]+)\(',
@ -1727,8 +1825,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
# Get video info
video_info = {}
embed_webpage = None
if (self._og_search_property('restrictions:age', video_webpage, default=None) == '18+'
or re.search(r'player-age-gate-content">', video_webpage) is not None):
if re.search(r'player-age-gate-content">', video_webpage) is not None:
age_gate = True
# We simulate the access to the video from www.youtube.com/v/{video_id}
# this can be viewed without login into Youtube
@ -1833,9 +1930,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
''', replace_url, video_description)
video_description = clean_html(video_description)
else:
video_description = video_details.get('shortDescription')
if video_description is None:
video_description = self._html_search_meta('description', video_webpage)
video_description = self._html_search_meta('description', video_webpage) or video_details.get('shortDescription')
if not smuggled_data.get('force_singlefeed', False):
if not self._downloader.params.get('noplaylist'):
@ -1972,10 +2067,7 @@ class YoutubeIE(YoutubeBaseInfoExtractor):
if cipher:
if 's' in url_data or self._downloader.params.get('youtube_include_dash_manifest', True):
ASSETS_RE = (
r'<script[^>]+\bsrc=("[^"]+")[^>]+\bname=["\']player_ias/base',
r'"jsUrl"\s*:\s*("[^"]+")',
r'"assets":.+?"js":\s*("[^"]+")')
ASSETS_RE = r'"assets":.+?"js":\s*("[^"]+")'
jsplayer_url_json = self._search_regex(
ASSETS_RE,
embed_webpage if age_gate else video_webpage,
@ -2684,7 +2776,7 @@ class YoutubePlaylistIE(YoutubePlaylistBaseInfoExtractor):
ids = []
last_id = playlist_id[-11:]
for n in itertools.count(1):
url = 'https://www.youtube.com/watch?v=%s&list=%s' % (last_id, playlist_id)
url = 'https://youtube.com/watch?v=%s&list=%s' % (last_id, playlist_id)
webpage = self._download_webpage(
url, playlist_id, 'Downloading page {0} of Youtube mix'.format(n))
new_ids = orderedSet(re.findall(
@ -2916,7 +3008,7 @@ class YoutubeChannelIE(YoutubePlaylistBaseInfoExtractor):
class YoutubeUserIE(YoutubeChannelIE):
IE_DESC = 'YouTube.com user videos (URL or "ytuser" keyword)'
_VALID_URL = r'(?:(?:https?://(?:\w+\.)?youtube\.com/(?:(?P<user>user|c)/)?(?!(?:attribution_link|watch|results|shared)(?:$|[^a-z_A-Z0-9%-])))|ytuser:)(?!feed/)(?P<id>[A-Za-z0-9_%-]+)'
_VALID_URL = r'(?:(?:https?://(?:\w+\.)?youtube\.com/(?:(?P<user>user|c)/)?(?!(?:attribution_link|watch|results|shared)(?:$|[^a-z_A-Z0-9-])))|ytuser:)(?!feed/)(?P<id>[A-Za-z0-9_-]+)'
_TEMPLATE_URL = 'https://www.youtube.com/%s/%s/videos'
IE_NAME = 'youtube:user'
@ -2946,9 +3038,6 @@ class YoutubeUserIE(YoutubeChannelIE):
}, {
'url': 'https://www.youtube.com/c/gametrailers',
'only_matching': True,
}, {
'url': 'https://www.youtube.com/c/Pawe%C5%82Zadro%C5%BCniak',
'only_matching': True,
}, {
'url': 'https://www.youtube.com/gametrailers',
'only_matching': True,
@ -3027,7 +3116,7 @@ class YoutubeLiveIE(YoutubeBaseInfoExtractor):
class YoutubePlaylistsIE(YoutubePlaylistsBaseInfoExtractor):
IE_DESC = 'YouTube.com user/channel playlists'
_VALID_URL = r'https?://(?:\w+\.)?youtube\.com/(?:user|channel|c)/(?P<id>[^/]+)/playlists'
_VALID_URL = r'https?://(?:\w+\.)?youtube\.com/(?:user|channel)/(?P<id>[^/]+)/playlists'
IE_NAME = 'youtube:playlists'
_TESTS = [{
@ -3053,9 +3142,6 @@ class YoutubePlaylistsIE(YoutubePlaylistsBaseInfoExtractor):
'title': 'Chem Player',
},
'skip': 'Blocked',
}, {
'url': 'https://www.youtube.com/c/ChristophLaimer/playlists',
'only_matching': True,
}]
@ -3070,94 +3156,54 @@ class YoutubeSearchIE(SearchInfoExtractor, YoutubeSearchBaseInfoExtractor):
_MAX_RESULTS = float('inf')
IE_NAME = 'youtube:search'
_SEARCH_KEY = 'ytsearch'
_SEARCH_PARAMS = None
_EXTRA_QUERY_ARGS = {}
_TESTS = []
def _entries(self, query, n):
data = {
'context': {
'client': {
'clientName': 'WEB',
'clientVersion': '2.20201021.03.00',
}
},
'query': query,
}
if self._SEARCH_PARAMS:
data['params'] = self._SEARCH_PARAMS
total = 0
for page_num in itertools.count(1):
search = self._download_json(
'https://www.youtube.com/youtubei/v1/search?key=AIzaSyAO_FJ2SlqU8Q4STEHLGCilw_Y9_11qcW8',
video_id='query "%s"' % query,
note='Downloading page %s' % page_num,
errnote='Unable to download API page', fatal=False,
data=json.dumps(data).encode('utf8'),
headers={'content-type': 'application/json'})
if not search:
break
slr_contents = try_get(
search,
(lambda x: x['contents']['twoColumnSearchResultsRenderer']['primaryContents']['sectionListRenderer']['contents'],
lambda x: x['onResponseReceivedCommands'][0]['appendContinuationItemsAction']['continuationItems']),
list)
if not slr_contents:
break
isr_contents = try_get(
slr_contents,
lambda x: x[0]['itemSectionRenderer']['contents'],
list)
if not isr_contents:
break
for content in isr_contents:
if not isinstance(content, dict):
continue
video = content.get('videoRenderer')
if not isinstance(video, dict):
continue
video_id = video.get('videoId')
if not video_id:
continue
title = try_get(video, lambda x: x['title']['runs'][0]['text'], compat_str)
description = try_get(video, lambda x: x['descriptionSnippet']['runs'][0]['text'], compat_str)
duration = parse_duration(try_get(video, lambda x: x['lengthText']['simpleText'], compat_str))
view_count_text = try_get(video, lambda x: x['viewCountText']['simpleText'], compat_str) or ''
view_count = int_or_none(self._search_regex(
r'^(\d+)', re.sub(r'\s', '', view_count_text),
'view count', default=None))
uploader = try_get(video, lambda x: x['ownerText']['runs'][0]['text'], compat_str)
total += 1
yield {
'_type': 'url_transparent',
'ie_key': YoutubeIE.ie_key(),
'id': video_id,
'url': video_id,
'title': title,
'description': description,
'duration': duration,
'view_count': view_count,
'uploader': uploader,
}
if total == n:
return
token = try_get(
slr_contents,
lambda x: x[1]['continuationItemRenderer']['continuationEndpoint']['continuationCommand']['token'],
compat_str)
if not token:
break
data['continuation'] = token
def _get_n_results(self, query, n):
"""Get a specified number of results for a query"""
return self.playlist_result(self._entries(query, n), query)
videos = []
limit = n
url_query = {
'search_query': query.encode('utf-8'),
}
url_query.update(self._EXTRA_QUERY_ARGS)
result_url = 'https://www.youtube.com/results?' + compat_urllib_parse_urlencode(url_query)
for pagenum in itertools.count(1):
data = self._download_json(
result_url, video_id='query "%s"' % query,
note='Downloading page %s' % pagenum,
errnote='Unable to download API page',
query={'spf': 'navigate'})
html_content = data[1]['body']['content']
if 'class="search-message' in html_content:
raise ExtractorError(
'[youtube] No video results', expected=True)
new_videos = list(self._process_page(html_content))
videos += new_videos
if not new_videos or len(videos) > limit:
break
next_link = self._html_search_regex(
r'href="(/results\?[^"]*\bsp=[^"]+)"[^>]*>\s*<span[^>]+class="[^"]*\byt-uix-button-content\b[^"]*"[^>]*>Next',
html_content, 'next link', default=None)
if next_link is None:
break
result_url = compat_urlparse.urljoin('https://www.youtube.com/', next_link)
if len(videos) > n:
videos = videos[:n]
return self.playlist_result(videos, query)
class YoutubeSearchDateIE(YoutubeSearchIE):
IE_NAME = YoutubeSearchIE.IE_NAME + ':date'
_SEARCH_KEY = 'ytsearchdate'
IE_DESC = 'YouTube.com searches, newest videos first'
_SEARCH_PARAMS = 'CAI%3D'
_EXTRA_QUERY_ARGS = {'search_sort': 'video_date_uploaded'}
class YoutubeSearchURLIE(YoutubeSearchBaseInfoExtractor):
@ -3240,7 +3286,7 @@ class YoutubeFeedsInfoExtractor(YoutubeBaseInfoExtractor):
break
more = self._download_json(
'https://www.youtube.com/%s' % mobj.group('more'), self._PLAYLIST_TITLE,
'https://youtube.com/%s' % mobj.group('more'), self._PLAYLIST_TITLE,
'Downloading page #%s' % page_num,
transform_source=uppercase_escape,
headers=self._YOUTUBE_CLIENT_HEADERS)

View File

@ -13,7 +13,6 @@ from ..utils import (
encodeFilename,
PostProcessingError,
prepend_extension,
replace_extension,
shell_quote
)
@ -42,38 +41,6 @@ class EmbedThumbnailPP(FFmpegPostProcessor):
'Skipping embedding the thumbnail because the file is missing.')
return [], info
def is_webp(path):
with open(encodeFilename(path), 'rb') as f:
b = f.read(12)
return b[0:4] == b'RIFF' and b[8:] == b'WEBP'
# Correct extension for WebP file with wrong extension (see #25687, #25717)
_, thumbnail_ext = os.path.splitext(thumbnail_filename)
if thumbnail_ext:
thumbnail_ext = thumbnail_ext[1:].lower()
if thumbnail_ext != 'webp' and is_webp(thumbnail_filename):
self._downloader.to_screen(
'[ffmpeg] Correcting extension to webp and escaping path for thumbnail "%s"' % thumbnail_filename)
thumbnail_webp_filename = replace_extension(thumbnail_filename, 'webp')
os.rename(encodeFilename(thumbnail_filename), encodeFilename(thumbnail_webp_filename))
thumbnail_filename = thumbnail_webp_filename
thumbnail_ext = 'webp'
# Convert unsupported thumbnail formats to JPEG (see #25687, #25717)
if thumbnail_ext not in ['jpg', 'png']:
# NB: % is supposed to be escaped with %% but this does not work
# for input files so working around with standard substitution
escaped_thumbnail_filename = thumbnail_filename.replace('%', '#')
os.rename(encodeFilename(thumbnail_filename), encodeFilename(escaped_thumbnail_filename))
escaped_thumbnail_jpg_filename = replace_extension(escaped_thumbnail_filename, 'jpg')
self._downloader.to_screen('[ffmpeg] Converting thumbnail "%s" to JPEG' % escaped_thumbnail_filename)
self.run_ffmpeg(escaped_thumbnail_filename, escaped_thumbnail_jpg_filename, ['-bsf:v', 'mjpeg2jpeg'])
os.remove(encodeFilename(escaped_thumbnail_filename))
thumbnail_jpg_filename = replace_extension(thumbnail_filename, 'jpg')
# Rename back to unescaped for further processing
os.rename(encodeFilename(escaped_thumbnail_jpg_filename), encodeFilename(thumbnail_jpg_filename))
thumbnail_filename = thumbnail_jpg_filename
if info['ext'] == 'mp3':
options = [
'-c', 'copy', '-map', '0', '-map', '1',

View File

@ -4088,12 +4088,12 @@ def js_to_json(code):
'\\\n': '',
'\\x': '\\u00',
}.get(m.group(0), m.group(0)), v[1:-1])
else:
for regex, base in INTEGER_TABLE:
im = re.match(regex, v)
if im:
i = int(im.group(1), base)
return '"%d":' % i if v.endswith(':') else '%d' % i
for regex, base in INTEGER_TABLE:
im = re.match(regex, v)
if im:
i = int(im.group(1), base)
return '"%d":' % i if v.endswith(':') else '%d' % i
return '"%s"' % v
@ -4198,7 +4198,6 @@ def mimetype2ext(mt):
'vnd.ms-sstr+xml': 'ism',
'quicktime': 'mov',
'mp2t': 'ts',
'x-wav': 'wav',
}.get(res, res)

View File

@ -1,3 +1,3 @@
from __future__ import unicode_literals
__version__ = '2020.11.10'
__version__ = '2020.06.16.1'