mirror of https://github.com/l1ving/youtube-dl synced 2020-11-18 19:53:54 -08:00

Merge branch 'mark-facebook-live-videos' into fix-facebook-date

# Conflicts:
#	youtube_dl/extractor/facebook.py
Avi Peretz 2019-04-03 14:51:44 +03:00
commit ad8df8889d
18 changed files with 406 additions and 38 deletions

View File

@ -6,8 +6,8 @@
--- ---
### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2019.03.09*. If it's not, read [this FAQ entry](https://github.com/ytdl-org/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected. ### Make sure you are using the *latest* version: run `youtube-dl --version` and ensure your version is *2019.03.18*. If it's not, read [this FAQ entry](https://github.com/ytdl-org/youtube-dl/blob/master/README.md#how-do-i-update-youtube-dl) and update. Issues with outdated version will be rejected.
- [ ] I've **verified** and **I assure** that I'm running youtube-dl **2019.03.09** - [ ] I've **verified** and **I assure** that I'm running youtube-dl **2019.03.18**
### Before submitting an *issue* make sure you have: ### Before submitting an *issue* make sure you have:
- [ ] At least skimmed through the [README](https://github.com/ytdl-org/youtube-dl/blob/master/README.md), **most notably** the [FAQ](https://github.com/ytdl-org/youtube-dl#faq) and [BUGS](https://github.com/ytdl-org/youtube-dl#bugs) sections - [ ] At least skimmed through the [README](https://github.com/ytdl-org/youtube-dl/blob/master/README.md), **most notably** the [FAQ](https://github.com/ytdl-org/youtube-dl#faq) and [BUGS](https://github.com/ytdl-org/youtube-dl#bugs) sections
@ -36,7 +36,7 @@ Add the `-v` flag to **your command line** you run youtube-dl with (`youtube-dl
[debug] User config: [] [debug] User config: []
[debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj'] [debug] Command-line args: [u'-v', u'http://www.youtube.com/watch?v=BaW_jenozKcj']
[debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251 [debug] Encodings: locale cp1251, fs mbcs, out cp866, pref cp1251
[debug] youtube-dl version 2019.03.09 [debug] youtube-dl version 2019.03.18
[debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2 [debug] Python version 2.7.11 - Windows-2003Server-5.2.3790-SP2
[debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4 [debug] exe versions: ffmpeg N-75573-g1d0487f, ffprobe N-75573-g1d0487f, rtmpdump 2.4
[debug] Proxy map: {} [debug] Proxy map: {}

View File

@ -1,3 +1,37 @@
version 2019.03.18
Core
* [extractor/common] Improve HTML5 entries extraction
+ [utils] Introduce parse_bitrate
* [update] Hide update URLs behind redirect
* [extractor/common] Fix url meta field for unfragmented DASH formats (#20346)
Extractors
+ [yandexvideo] Add extractor
* [openload] Improve embed detection
+ [corus] Add support for bigbrothercanada.ca (#20357)
+ [orf:radio] Extract series (#20012)
+ [cbc:watch] Add support for gem.cbc.ca (#20251, #20359)
- [anysex] Remove extractor (#19279)
+ [ciscolive] Add support for new URL schema (#20320, #20351)
+ [youtube] Add support for invidiou.sh (#20309)
- [anitube] Remove extractor (#20334)
- [ruleporn] Remove extractor (#15344, #20324)
* [npr] Fix extraction (#10793, #13440)
* [biqle] Fix extraction (#11471, #15313)
* [viddler] Modernize
* [moevideo] Fix extraction
* [primesharetv] Remove extractor
* [hypem] Modernize and extract more metadata (#15320)
* [veoh] Fix extraction
* [escapist] Modernize
- [videomega] Remove extractor (#10108)
+ [beeg] Add support for beeg.porn (#20306)
* [vimeo:review] Improve config url extraction and extract original format
(#20305)
* [fox] Detect geo restriction and authentication errors (#20208)
version 2019.03.09 version 2019.03.09
Core Core

View File

@ -44,9 +44,7 @@
- **AmericasTestKitchen** - **AmericasTestKitchen**
- **anderetijden**: npo.nl, ntr.nl, omroepwnl.nl, zapp.nl and npo3.nl - **anderetijden**: npo.nl, ntr.nl, omroepwnl.nl, zapp.nl and npo3.nl
- **AnimeOnDemand** - **AnimeOnDemand**
- **anitube.se**
- **Anvato** - **Anvato**
- **AnySex**
- **APA** - **APA**
- **Aparat** - **Aparat**
- **AppleConnect** - **AppleConnect**
@ -698,7 +696,6 @@
- **PornoXO** - **PornoXO**
- **PornTube** - **PornTube**
- **PressTV** - **PressTV**
- **PrimeShareTV**
- **PromptFile** - **PromptFile**
- **prosiebensat1**: ProSiebenSat.1 Digital - **prosiebensat1**: ProSiebenSat.1 Digital
- **puhutv** - **puhutv**
@ -718,7 +715,7 @@
- **radio.de** - **radio.de**
- **radiobremen** - **radiobremen**
- **radiocanada** - **radiocanada**
- **RadioCanadaAudioVideo** - **radiocanada:audiovideo**
- **radiofrance** - **radiofrance**
- **RadioJavan** - **RadioJavan**
- **Rai** - **Rai**
@ -765,7 +762,6 @@
- **RTVS** - **RTVS**
- **Rudo** - **Rudo**
- **RUHD** - **RUHD**
- **RulePorn**
- **rutube**: Rutube videos - **rutube**: Rutube videos
- **rutube:channel**: Rutube channels - **rutube:channel**: Rutube channels
- **rutube:embed**: Rutube embedded videos - **rutube:embed**: Rutube embedded videos
@ -1010,7 +1006,6 @@
- **video.mit.edu** - **video.mit.edu**
- **VideoDetective** - **VideoDetective**
- **videofy.me** - **videofy.me**
- **VideoMega**
- **videomore** - **videomore**
- **videomore:season** - **videomore:season**
- **videomore:video** - **videomore:video**
@ -1127,6 +1122,7 @@
- **yandexmusic:album**: Яндекс.Музыка - Альбом - **yandexmusic:album**: Яндекс.Музыка - Альбом
- **yandexmusic:playlist**: Яндекс.Музыка - Плейлист - **yandexmusic:playlist**: Яндекс.Музыка - Плейлист
- **yandexmusic:track**: Яндекс.Музыка - Трек - **yandexmusic:track**: Яндекс.Музыка - Трек
- **YandexVideo**
- **YapFiles** - **YapFiles**
- **YesJapan** - **YesJapan**
- **yinyuetai:video**: 音悦Tai - **yinyuetai:video**: 音悦Tai

View File

@ -33,11 +33,13 @@ from youtube_dl.utils import (
ExtractorError, ExtractorError,
find_xpath_attr, find_xpath_attr,
fix_xml_ampersands, fix_xml_ampersands,
float_or_none,
get_element_by_class, get_element_by_class,
get_element_by_attribute, get_element_by_attribute,
get_elements_by_class, get_elements_by_class,
get_elements_by_attribute, get_elements_by_attribute,
InAdvancePagedList, InAdvancePagedList,
int_or_none,
intlist_to_bytes, intlist_to_bytes,
is_html, is_html,
js_to_json, js_to_json,
@ -468,6 +470,21 @@ class TestUtil(unittest.TestCase):
shell_quote(args), shell_quote(args),
"""ffmpeg -i 'ñ€ß'"'"'.mp4'""" if compat_os_name != 'nt' else '''ffmpeg -i "ñ€ß'.mp4"''') """ffmpeg -i 'ñ€ß'"'"'.mp4'""" if compat_os_name != 'nt' else '''ffmpeg -i "ñ€ß'.mp4"''')
def test_float_or_none(self):
self.assertEqual(float_or_none('42.42'), 42.42)
self.assertEqual(float_or_none('42'), 42.0)
self.assertEqual(float_or_none(''), None)
self.assertEqual(float_or_none(None), None)
self.assertEqual(float_or_none([]), None)
self.assertEqual(float_or_none(set()), None)
def test_int_or_none(self):
self.assertEqual(int_or_none('42'), 42)
self.assertEqual(int_or_none(''), None)
self.assertEqual(int_or_none(None), None)
self.assertEqual(int_or_none([]), None)
self.assertEqual(int_or_none(set()), None)
def test_str_to_int(self): def test_str_to_int(self):
self.assertEqual(str_to_int('123,456'), 123456) self.assertEqual(str_to_int('123,456'), 123456)
self.assertEqual(str_to_int('123.456'), 123456) self.assertEqual(str_to_int('123.456'), 123456)

View File

@ -166,6 +166,8 @@ def _real_main(argv=None):
if opts.max_sleep_interval is not None: if opts.max_sleep_interval is not None:
if opts.max_sleep_interval < 0: if opts.max_sleep_interval < 0:
parser.error('max sleep interval must be positive or 0') parser.error('max sleep interval must be positive or 0')
if opts.sleep_interval is None:
parser.error('min sleep interval must be specified, use --min-sleep-interval')
if opts.max_sleep_interval < opts.sleep_interval: if opts.max_sleep_interval < opts.sleep_interval:
parser.error('max sleep interval must be greater than or equal to min sleep interval') parser.error('max sleep interval must be greater than or equal to min sleep interval')
else: else:
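The added check above can be sketched in isolation. One plausible motivation (an assumption, not stated in the commit) is that on Python 3 comparing a number with `None` raises `TypeError`, so the `max < min` comparison would crash when only the maximum is given. The function name `sleep_interval_error` is hypothetical; the error strings are copied from the hunk.

```python
def sleep_interval_error(sleep_interval, max_sleep_interval):
    """Return the error message the option parser would report, or None.

    Hypothetical stand-in for the max-interval branch of _real_main;
    it returns the message instead of calling parser.error().
    """
    if max_sleep_interval is not None:
        if max_sleep_interval < 0:
            return 'max sleep interval must be positive or 0'
        # New check: without it, the comparison below would be None vs. number.
        if sleep_interval is None:
            return 'min sleep interval must be specified, use --min-sleep-interval'
        if max_sleep_interval < sleep_interval:
            return 'max sleep interval must be greater than or equal to min sleep interval'
    return None


print(sleep_interval_error(None, 10))
print(sleep_interval_error(5, 10))
```

Returning a message keeps the sketch testable; the real code reports the error through the parser and exits.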

View File

@ -79,7 +79,7 @@ class CWTVIE(InfoExtractor):
season = str_or_none(video_data.get('season')) season = str_or_none(video_data.get('season'))
episode = str_or_none(video_data.get('episode')) episode = str_or_none(video_data.get('episode'))
if episode and season: if episode and season:
episode = episode.lstrip(season) episode = episode[len(season):]
return { return {
'_type': 'url_transparent', '_type': 'url_transparent',
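The switch from `lstrip` to slicing matters because `str.lstrip` treats its argument as a set of characters to strip, not as a prefix. A minimal sketch of the difference follows; the helper name `strip_season_prefix` and the sample values are hypothetical, and like the new line in the hunk it assumes the episode string begins with the season string.

```python
def strip_season_prefix(episode, season):
    """Drop exactly len(season) leading characters, as the new line does."""
    return episode[len(season):]


# Hypothetical sample: season 1, episode 11, encoded as '111'.
season, episode = '1', '111'

old_result = episode.lstrip(season)                # strips every leading '1'
new_result = strip_season_prefix(episode, season)  # strips only the first character

print(repr(old_result), repr(new_result))
```

With these values the old expression leaves an empty string, while the slice keeps `'11'`.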

View File

@ -632,7 +632,10 @@ from .massengeschmacktv import MassengeschmackTVIE
from .matchtv import MatchTVIE from .matchtv import MatchTVIE
from .mdr import MDRIE from .mdr import MDRIE
from .mediaset import MediasetIE from .mediaset import MediasetIE
from .mediasite import MediasiteIE from .mediasite import (
MediasiteIE,
MediasiteCatalogIE,
)
from .medici import MediciIE from .medici import MediciIE
from .megaphone import MegaphoneIE from .megaphone import MegaphoneIE
from .meipai import MeipaiIE from .meipai import MeipaiIE
@ -1114,6 +1117,7 @@ from .teachertube import (
) )
from .teachingchannel import TeachingChannelIE from .teachingchannel import TeachingChannelIE
from .teamcoco import TeamcocoIE from .teamcoco import TeamcocoIE
from .teamtreehouse import TeamTreeHouseIE
from .techtalks import TechTalksIE from .techtalks import TechTalksIE
from .ted import TEDIE from .ted import TEDIE
from .tele5 import Tele5IE from .tele5 import Tele5IE

View File

@ -218,6 +218,7 @@ class FacebookIE(InfoExtractor):
'ext': 'mp4', 'ext': 'mp4',
'title': '#ESLOne VoD - Birmingham Finals Day#1 Fnatic vs. @Evil Geniuses', 'title': '#ESLOne VoD - Birmingham Finals Day#1 Fnatic vs. @Evil Geniuses',
'uploader': 'ESL One Dota 2', 'uploader': 'ESL One Dota 2',
'is_live': False
}, },
'params': { 'params': {
'skip_download': True, 'skip_download': True,
@ -378,6 +379,8 @@ class FacebookIE(InfoExtractor):
if not video_data: if not video_data:
raise ExtractorError('Cannot parse data') raise ExtractorError('Cannot parse data')
is_live = video_data[0].get('is_broadcast', False) and video_data[0].get('is_live_stream', False)
formats = [] formats = []
for f in video_data: for f in video_data:
format_id = f['stream_type'] format_id = f['stream_type']
@ -462,6 +465,7 @@ class FacebookIE(InfoExtractor):
'thumbnail': thumbnail, 'thumbnail': thumbnail,
'view_count': view_count, 'view_count': view_count,
'uploader_id': uploader_id 'uploader_id': uploader_id,
'is_live': is_live
} }
return webpage, info_dict return webpage, info_dict
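One detail of the `is_live` expression above is that `and` returns one of its operands rather than a strict boolean. The sketch below uses hypothetical `video_data` payloads (the real Facebook field values are not shown in this diff) to illustrate when the result is `None` instead of `False`, and how wrapping in `bool()` would normalize it.

```python
def is_live_from(video_data):
    """Same expression as in the hunk, applied to the first entry."""
    first = video_data[0]
    return first.get('is_broadcast', False) and first.get('is_live_stream', False)


# Hypothetical payloads, for illustration only.
live = [{'is_broadcast': True, 'is_live_stream': True}]
vod = [{}]
null_flag = [{'is_broadcast': None}]

print(is_live_from(live))             # both flags truthy
print(is_live_from(vod))              # defaults apply
print(is_live_from(null_flag))        # key present but None: the default is not used
print(bool(is_live_from(null_flag)))  # normalized to a strict boolean
```

Whether the site ever sends a null flag is an assumption; if it does, a test expecting `'is_live': False` would see `None` unless the value is normalized.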

View File

@ -1,36 +1,83 @@
# coding: utf-8 # coding: utf-8
from __future__ import unicode_literals from __future__ import unicode_literals
import re
from .common import InfoExtractor from .common import InfoExtractor
from ..utils import (
determine_ext,
int_or_none,
strip_or_none,
xpath_attr,
xpath_text,
)
class InaIE(InfoExtractor): class InaIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?ina\.fr/video/(?P<id>I?[A-Z0-9]+)' _VALID_URL = r'https?://(?:www\.)?ina\.fr/(?:video|audio)/(?P<id>[A-Z0-9_]+)'
_TEST = { _TESTS = [{
'url': 'http://www.ina.fr/video/I12055569/francois-hollande-je-crois-que-c-est-clair-video.html', 'url': 'http://www.ina.fr/video/I12055569/francois-hollande-je-crois-que-c-est-clair-video.html',
'md5': 'a667021bf2b41f8dc6049479d9bb38a3', 'md5': 'a667021bf2b41f8dc6049479d9bb38a3',
'info_dict': { 'info_dict': {
'id': 'I12055569', 'id': 'I12055569',
'ext': 'mp4', 'ext': 'mp4',
'title': 'François Hollande "Je crois que c\'est clair"', 'title': 'François Hollande "Je crois que c\'est clair"',
'description': 'md5:3f09eb072a06cb286b8f7e4f77109663',
} }
} }, {
'url': 'https://www.ina.fr/video/S806544_001/don-d-organes-des-avancees-mais-d-importants-besoins-video.html',
'only_matching': True,
}, {
'url': 'https://www.ina.fr/audio/P16173408',
'only_matching': True,
}, {
'url': 'https://www.ina.fr/video/P16173408-video.html',
'only_matching': True,
}]
def _real_extract(self, url): def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url) video_id = self._match_id(url)
info_doc = self._download_xml(
'http://player.ina.fr/notices/%s.mrss' % video_id, video_id)
item = info_doc.find('channel/item')
title = xpath_text(item, 'title', fatal=True)
media_ns_xpath = lambda x: self._xpath_ns(x, 'http://search.yahoo.com/mrss/')
content = item.find(media_ns_xpath('content'))
video_id = mobj.group('id') get_furl = lambda x: xpath_attr(content, media_ns_xpath(x), 'url')
mrss_url = 'http://player.ina.fr/notices/%s.mrss' % video_id formats = []
info_doc = self._download_xml(mrss_url, video_id) for q, w, h in (('bq', 400, 300), ('mq', 512, 384), ('hq', 768, 576)):
q_url = get_furl(q)
if not q_url:
continue
formats.append({
'format_id': q,
'url': q_url,
'width': w,
'height': h,
})
if not formats:
furl = get_furl('player') or content.attrib['url']
ext = determine_ext(furl)
formats = [{
'url': furl,
'vcodec': 'none' if ext == 'mp3' else None,
'ext': ext,
}]
self.report_extraction(video_id) thumbnails = []
for thumbnail in content.findall(media_ns_xpath('thumbnail')):
video_url = info_doc.find('.//{http://search.yahoo.com/mrss/}player').attrib['url'] thumbnail_url = thumbnail.get('url')
if not thumbnail_url:
continue
thumbnails.append({
'url': thumbnail_url,
'height': int_or_none(thumbnail.get('height')),
'width': int_or_none(thumbnail.get('width')),
})
return { return {
'id': video_id, 'id': video_id,
'url': video_url, 'formats': formats,
'title': info_doc.find('.//title').text, 'title': title,
'description': strip_or_none(xpath_text(item, 'description')),
'thumbnails': thumbnails,
} }
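The format selection in the rewritten extractor can be tried outside youtube-dl with the standard library. This is a sketch under assumptions: the sample MRSS document and the helper names (`media`, `child_url`, `build_formats`) are hypothetical, and the real feed layout is only inferred from the diff above.

```python
import xml.etree.ElementTree as ET

MEDIA_NS = 'http://search.yahoo.com/mrss/'


def media(tag):
    """Qualify a tag name with the Media RSS namespace."""
    return '{%s}%s' % (MEDIA_NS, tag)


def child_url(content, tag):
    """Return the url attribute of a namespaced child element, or None."""
    node = content.find(media(tag))
    return node.get('url') if node is not None else None


def build_formats(mrss_text):
    item = ET.fromstring(mrss_text).find('channel/item')
    content = item.find(media('content'))
    formats = []
    # Same quality table as in the diff: (format id, width, height).
    for q, w, h in (('bq', 400, 300), ('mq', 512, 384), ('hq', 768, 576)):
        q_url = child_url(content, q)
        if not q_url:
            continue
        formats.append({'format_id': q, 'url': q_url, 'width': w, 'height': h})
    if not formats:
        # Fallback: the player URL, else the content element's own url.
        formats = [{'url': child_url(content, 'player') or content.attrib['url']}]
    return formats


# Hypothetical sample document, not taken from player.ina.fr.
SAMPLE = (
    '<rss xmlns:media="http://search.yahoo.com/mrss/"><channel><item>'
    '<title>Example</title>'
    '<media:content url="http://example.com/fallback.mp3">'
    '<media:hq url="http://example.com/hq.mp4"/>'
    '</media:content>'
    '</item></channel></rss>'
)

print(build_formats(SAMPLE))
```

The sketch omits the `ext` and `vcodec` handling of the fallback branch to stay short.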

View File

@ -13,6 +13,8 @@ from ..utils import (
ExtractorError, ExtractorError,
float_or_none, float_or_none,
mimetype2ext, mimetype2ext,
str_or_none,
try_get,
unescapeHTML, unescapeHTML,
unsmuggle_url, unsmuggle_url,
url_or_none, url_or_none,
@ -20,8 +22,11 @@ from ..utils import (
) )
_ID_RE = r'[0-9a-f]{32,34}'
class MediasiteIE(InfoExtractor): class MediasiteIE(InfoExtractor):
_VALID_URL = r'(?xi)https?://[^/]+/Mediasite/(?:Play|Showcase/(?:default|livebroadcast)/Presentation)/(?P<id>[0-9a-f]{32,34})(?P<query>\?[^#]+|)' _VALID_URL = r'(?xi)https?://[^/]+/Mediasite/(?:Play|Showcase/(?:default|livebroadcast)/Presentation)/(?P<id>%s)(?P<query>\?[^#]+|)' % _ID_RE
_TESTS = [ _TESTS = [
{ {
'url': 'https://hitsmediaweb.h-its.org/mediasite/Play/2db6c271681e4f199af3c60d1f82869b1d', 'url': 'https://hitsmediaweb.h-its.org/mediasite/Play/2db6c271681e4f199af3c60d1f82869b1d',
@ -109,7 +114,7 @@ class MediasiteIE(InfoExtractor):
return [ return [
unescapeHTML(mobj.group('url')) unescapeHTML(mobj.group('url'))
for mobj in re.finditer( for mobj in re.finditer(
r'(?xi)<iframe\b[^>]+\bsrc=(["\'])(?P<url>(?:(?:https?:)?//[^/]+)?/Mediasite/Play/[0-9a-f]{32,34}(?:\?.*?)?)\1', r'(?xi)<iframe\b[^>]+\bsrc=(["\'])(?P<url>(?:(?:https?:)?//[^/]+)?/Mediasite/Play/%s(?:\?.*?)?)\1' % _ID_RE,
webpage)] webpage)]
def _real_extract(self, url): def _real_extract(self, url):
@ -221,3 +226,110 @@ class MediasiteIE(InfoExtractor):
'formats': formats, 'formats': formats,
'thumbnails': thumbnails, 'thumbnails': thumbnails,
} }
class MediasiteCatalogIE(InfoExtractor):
_VALID_URL = r'''(?xi)
(?P<url>https?://[^/]+/Mediasite)
/Catalog/Full/
(?P<catalog_id>{0})
(?:
/(?P<current_folder_id>{0})
/(?P<root_dynamic_folder_id>{0})
)?
'''.format(_ID_RE)
_TESTS = [{
'url': 'http://events7.mediasite.com/Mediasite/Catalog/Full/631f9e48530d454381549f955d08c75e21',
'info_dict': {
'id': '631f9e48530d454381549f955d08c75e21',
'title': 'WCET Summit: Adaptive Learning in Higher Ed: Improving Outcomes Dynamically',
},
'playlist_count': 6,
'expected_warnings': ['is not a supported codec'],
}, {
# with CurrentFolderId and RootDynamicFolderId
'url': 'https://medaudio.medicine.iu.edu/Mediasite/Catalog/Full/9518c4a6c5cf4993b21cbd53e828a92521/97a9db45f7ab47428c77cd2ed74bb98f14/9518c4a6c5cf4993b21cbd53e828a92521',
'info_dict': {
'id': '9518c4a6c5cf4993b21cbd53e828a92521',
'title': 'IUSM Family and Friends Sessions',
},
'playlist_count': 2,
}, {
'url': 'http://uipsyc.mediasite.com/mediasite/Catalog/Full/d5d79287c75243c58c50fef50174ec1b21',
'only_matching': True,
}, {
# no AntiForgeryToken
'url': 'https://live.libraries.psu.edu/Mediasite/Catalog/Full/8376d4b24dd1457ea3bfe4cf9163feda21',
'only_matching': True,
}, {
'url': 'https://medaudio.medicine.iu.edu/Mediasite/Catalog/Full/9518c4a6c5cf4993b21cbd53e828a92521/97a9db45f7ab47428c77cd2ed74bb98f14/9518c4a6c5cf4993b21cbd53e828a92521',
'only_matching': True,
}]
def _real_extract(self, url):
mobj = re.match(self._VALID_URL, url)
mediasite_url = mobj.group('url')
catalog_id = mobj.group('catalog_id')
current_folder_id = mobj.group('current_folder_id') or catalog_id
root_dynamic_folder_id = mobj.group('root_dynamic_folder_id')
webpage = self._download_webpage(url, catalog_id)
# AntiForgeryToken is optional (e.g. [1])
# 1. https://live.libraries.psu.edu/Mediasite/Catalog/Full/8376d4b24dd1457ea3bfe4cf9163feda21
anti_forgery_token = self._search_regex(
r'AntiForgeryToken\s*:\s*(["\'])(?P<value>(?:(?!\1).)+)\1',
webpage, 'anti forgery token', default=None, group='value')
if anti_forgery_token:
anti_forgery_header = self._search_regex(
r'AntiForgeryHeaderName\s*:\s*(["\'])(?P<value>(?:(?!\1).)+)\1',
webpage, 'anti forgery header name',
default='X-SOFO-AntiForgeryHeader', group='value')
data = {
'IsViewPage': True,
'IsNewFolder': True,
'AuthTicket': None,
'CatalogId': catalog_id,
'CurrentFolderId': current_folder_id,
'RootDynamicFolderId': root_dynamic_folder_id,
'ItemsPerPage': 1000,
'PageIndex': 0,
'PermissionMask': 'Execute',
'CatalogSearchType': 'SearchInFolder',
'SortBy': 'Date',
'SortDirection': 'Descending',
'StartDate': None,
'EndDate': None,
'StatusFilterList': None,
'PreviewKey': None,
'Tags': [],
}
headers = {
'Content-Type': 'application/json; charset=UTF-8',
'Referer': url,
'X-Requested-With': 'XMLHttpRequest',
}
if anti_forgery_token:
headers[anti_forgery_header] = anti_forgery_token
catalog = self._download_json(
'%s/Catalog/Data/GetPresentationsForFolder' % mediasite_url,
catalog_id, data=json.dumps(data).encode(), headers=headers)
entries = []
for video in catalog['PresentationDetailsList']:
if not isinstance(video, dict):
continue
video_id = str_or_none(video.get('Id'))
if not video_id:
continue
entries.append(self.url_result(
'%s/Play/%s' % (mediasite_url, video_id),
ie=MediasiteIE.ie_key(), video_id=video_id))
title = try_get(
catalog, lambda x: x['CurrentFolder']['Name'], compat_str)
return self.playlist_result(entries, catalog_id, title,)
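The optional anti-forgery handling above can be illustrated with plain `re`. The two patterns and the default header name are copied from the diff; the helper names (`search_value`, `build_headers`) and the sample page snippets are hypothetical.

```python
import re

TOKEN_RE = r'AntiForgeryToken\s*:\s*(["\'])(?P<value>(?:(?!\1).)+)\1'
HEADER_NAME_RE = r'AntiForgeryHeaderName\s*:\s*(["\'])(?P<value>(?:(?!\1).)+)\1'


def search_value(pattern, text, default=None):
    """Return the 'value' group of the first match, or the default."""
    m = re.search(pattern, text)
    return m.group('value') if m else default


def build_headers(webpage, url):
    headers = {
        'Content-Type': 'application/json; charset=UTF-8',
        'Referer': url,
        'X-Requested-With': 'XMLHttpRequest',
    }
    token = search_value(TOKEN_RE, webpage)
    if token:
        # The header name is only looked up when a token exists.
        name = search_value(HEADER_NAME_RE, webpage, 'X-SOFO-AntiForgeryHeader')
        headers[name] = token
    return headers


# Hypothetical page fragments.
with_token = "AntiForgeryToken: 'abc123'"
without_token = 'var settings = {};'

print(build_headers(with_token, 'http://example.com/Mediasite/Catalog/Full/x'))
print(build_headers(without_token, 'http://example.com/Mediasite/Catalog/Full/x'))
```

The backreference `\1` makes the pattern accept either quote style while requiring the closing quote to match the opening one.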

View File

@ -181,10 +181,7 @@ class NPOIE(NPOBaseIE):
def _real_extract(self, url): def _real_extract(self, url):
video_id = self._match_id(url) video_id = self._match_id(url)
try: return self._get_info(url, video_id) or self._get_old_info(video_id)
return self._get_info(url, video_id)
except ExtractorError:
return self._get_old_info(video_id)
def _get_info(self, url, video_id): def _get_info(self, url, video_id):
token = self._download_json( token = self._download_json(
@ -206,6 +203,7 @@ class NPOIE(NPOBaseIE):
player_token = player['token'] player_token = player['token']
drm = False
format_urls = set() format_urls = set()
formats = [] formats = []
for profile in ('hls', 'dash-widevine', 'dash-playready', 'smooth'): for profile in ('hls', 'dash-widevine', 'dash-playready', 'smooth'):
@ -227,7 +225,8 @@ class NPOIE(NPOBaseIE):
if not stream_url or stream_url in format_urls: if not stream_url or stream_url in format_urls:
continue continue
format_urls.add(stream_url) format_urls.add(stream_url)
if stream.get('protection') is not None: if stream.get('protection') is not None or stream.get('keySystemOptions') is not None:
drm = True
continue continue
stream_type = stream.get('type') stream_type = stream.get('type')
stream_ext = determine_ext(stream_url) stream_ext = determine_ext(stream_url)
@ -246,6 +245,11 @@ class NPOIE(NPOBaseIE):
'url': stream_url, 'url': stream_url,
}) })
if not formats:
if drm:
raise ExtractorError('This video is DRM protected.', expected=True)
return
self._sort_formats(formats) self._sort_formats(formats)
info = { info = {

View File

@ -14,6 +14,7 @@ from ..compat import (
) )
from .openload import PhantomJSwrapper from .openload import PhantomJSwrapper
from ..utils import ( from ..utils import (
determine_ext,
ExtractorError, ExtractorError,
int_or_none, int_or_none,
orderedSet, orderedSet,
@ -275,6 +276,10 @@ class PornHubIE(PornHubBaseIE):
r'/(\d{6}/\d{2})/', video_url, 'upload data', default=None) r'/(\d{6}/\d{2})/', video_url, 'upload data', default=None)
if upload_date: if upload_date:
upload_date = upload_date.replace('/', '') upload_date = upload_date.replace('/', '')
if determine_ext(video_url) == 'mpd':
formats.extend(self._extract_mpd_formats(
video_url, video_id, mpd_id='dash', fatal=False))
continue
tbr = None tbr = None
mobj = re.search(r'(?P<height>\d+)[pP]?_(?P<tbr>\d+)[kK]', video_url) mobj = re.search(r'(?P<height>\d+)[pP]?_(?P<tbr>\d+)[kK]', video_url)
if mobj: if mobj:

View File

@ -185,7 +185,7 @@ class SVTPlayIE(SVTPlayBaseIE):
def _extract_by_video_id(self, video_id, webpage=None): def _extract_by_video_id(self, video_id, webpage=None):
data = self._download_json( data = self._download_json(
'https://api.svt.se/videoplayer-api/video/%s' % video_id, 'https://api.svt.se/video/%s' % video_id,
video_id, headers=self.geo_verification_headers()) video_id, headers=self.geo_verification_headers())
info_dict = self._extract_video(data, video_id) info_dict = self._extract_video(data, video_id)
if not info_dict.get('title'): if not info_dict.get('title'):

View File

@ -0,0 +1,140 @@
# coding: utf-8
from __future__ import unicode_literals
import re
from .common import InfoExtractor
from ..utils import (
clean_html,
determine_ext,
ExtractorError,
float_or_none,
get_element_by_class,
get_element_by_id,
parse_duration,
remove_end,
urlencode_postdata,
urljoin,
)
class TeamTreeHouseIE(InfoExtractor):
_VALID_URL = r'https?://(?:www\.)?teamtreehouse\.com/library/(?P<id>[^/]+)'
_TESTS = [{
# Course
'url': 'https://teamtreehouse.com/library/introduction-to-user-authentication-in-php',
'info_dict': {
'id': 'introduction-to-user-authentication-in-php',
'title': 'Introduction to User Authentication in PHP',
'description': 'md5:405d7b4287a159b27ddf30ca72b5b053',
},
'playlist_mincount': 24,
}, {
# WorkShop
'url': 'https://teamtreehouse.com/library/deploying-a-react-app',
'info_dict': {
'id': 'deploying-a-react-app',
'title': 'Deploying a React App',
'description': 'md5:10a82e3ddff18c14ac13581c9b8e5921',
},
'playlist_mincount': 4,
}, {
# Video
'url': 'https://teamtreehouse.com/library/application-overview-2',
'info_dict': {
'id': 'application-overview-2',
'ext': 'mp4',
'title': 'Application Overview',
'description': 'md5:4b0a234385c27140a4378de5f1e15127',
},
'expected_warnings': ['This is just a preview'],
}]
_NETRC_MACHINE = 'teamtreehouse'
def _real_initialize(self):
email, password = self._get_login_info()
if email is None:
return
signin_page = self._download_webpage(
'https://teamtreehouse.com/signin',
None, 'Downloading signin page')
data = self._form_hidden_inputs('new_user_session', signin_page)
data.update({
'user_session[email]': email,
'user_session[password]': password,
})
error_message = get_element_by_class('error-message', self._download_webpage(
'https://teamtreehouse.com/person_session',
None, 'Logging in', data=urlencode_postdata(data)))
if error_message:
raise ExtractorError(clean_html(error_message), expected=True)
def _real_extract(self, url):
display_id = self._match_id(url)
webpage = self._download_webpage(url, display_id)
title = self._html_search_meta(['og:title', 'twitter:title'], webpage)
description = self._html_search_meta(
['description', 'og:description', 'twitter:description'], webpage)
entries = self._parse_html5_media_entries(url, webpage, display_id)
if entries:
info = entries[0]
for subtitles in info.get('subtitles', {}).values():
for subtitle in subtitles:
subtitle['ext'] = determine_ext(subtitle['url'], 'srt')
is_preview = 'data-preview="true"' in webpage
if is_preview:
self.report_warning(
'This is just a preview. You need to be signed in with a Basic account to download the entire video.', display_id)
duration = 30
else:
duration = float_or_none(self._search_regex(
r'data-duration="(\d+)"', webpage, 'duration'), 1000)
if not duration:
duration = parse_duration(get_element_by_id(
'video-duration', webpage))
info.update({
'id': display_id,
'title': title,
'description': description,
'duration': duration,
})
return info
else:
def extract_urls(html, extract_info=None):
for path in re.findall(r'<a[^>]+href="([^"]+)"', html):
page_url = urljoin(url, path)
entry = {
'_type': 'url_transparent',
'id': self._match_id(page_url),
'url': page_url,
'ie_key': self.ie_key(),
}
if extract_info:
entry.update(extract_info)
entries.append(entry)
workshop_videos = self._search_regex(
r'(?s)<ul[^>]+id="workshop-videos"[^>]*>(.+?)</ul>',
webpage, 'workshop videos', default=None)
if workshop_videos:
extract_urls(workshop_videos)
else:
stages_path = self._search_regex(
r'(?s)<div[^>]+id="syllabus-stages"[^>]+data-url="([^"]+)"',
webpage, 'stages path')
if stages_path:
stages_page = self._download_webpage(
urljoin(url, stages_path), display_id, 'Downloading stages page')
for chapter_number, (chapter, steps_list) in enumerate(re.findall(r'(?s)<h2[^>]*>\s*(.+?)\s*</h2>.+?<ul[^>]*>(.+?)</ul>', stages_page), 1):
extract_urls(steps_list, {
'chapter': chapter,
'chapter_number': chapter_number,
})
title = remove_end(title, ' Course')
return self.playlist_result(
entries, display_id, title, description)

View File

@ -19,7 +19,7 @@ from ..utils import (
class WeiboIE(InfoExtractor): class WeiboIE(InfoExtractor):
_VALID_URL = r'https?://weibo\.com/[0-9]+/(?P<id>[a-zA-Z0-9]+)' _VALID_URL = r'https?://(?:www\.)?weibo\.com/[0-9]+/(?P<id>[a-zA-Z0-9]+)'
_TEST = { _TEST = {
'url': 'https://weibo.com/6275294458/Fp6RGfbff?type=comment', 'url': 'https://weibo.com/6275294458/Fp6RGfbff?type=comment',
'info_dict': { 'info_dict': {
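The effect of the optional `(?:www\.)?` group can be checked directly with `re`. The pattern is copied from the new side of the hunk; `match_id` is a hypothetical helper, and the `www.` and `m.` URLs are made-up variants of the test URL above.

```python
import re

VALID_URL = r'https?://(?:www\.)?weibo\.com/[0-9]+/(?P<id>[a-zA-Z0-9]+)'


def match_id(url):
    """Return the captured id, or None when the URL does not match."""
    m = re.match(VALID_URL, url)
    return m.group('id') if m else None


print(match_id('https://weibo.com/6275294458/Fp6RGfbff?type=comment'))
print(match_id('https://www.weibo.com/6275294458/Fp6RGfbff'))
print(match_id('https://m.weibo.com/6275294458/Fp6RGfbff'))
```

Only the bare and `www.` hosts match; other subdomains are still rejected by this pattern.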

View File

@ -20,7 +20,7 @@ from ..utils import (
class XHamsterIE(InfoExtractor): class XHamsterIE(InfoExtractor):
_VALID_URL = r'''(?x) _VALID_URL = r'''(?x)
https?:// https?://
(?:.+?\.)?xhamster\.com/ (?:.+?\.)?xhamster\.(?:com|one)/
(?: (?:
movies/(?P<id>\d+)/(?P<display_id>[^/]*)\.html| movies/(?P<id>\d+)/(?P<display_id>[^/]*)\.html|
videos/(?P<display_id_2>[^/]*)-(?P<id_2>\d+) videos/(?P<display_id_2>[^/]*)-(?P<id_2>\d+)
@ -91,6 +91,9 @@ class XHamsterIE(InfoExtractor):
# new URL schema # new URL schema
'url': 'https://pt.xhamster.com/videos/euro-pedal-pumping-7937821', 'url': 'https://pt.xhamster.com/videos/euro-pedal-pumping-7937821',
'only_matching': True, 'only_matching': True,
}, {
'url': 'https://xhamster.one/videos/femaleagent-shy-beauty-takes-the-bait-1509445',
'only_matching': True,
}] }]
def _real_extract(self, url): def _real_extract(self, url):

View File

@ -1922,7 +1922,7 @@ def int_or_none(v, scale=1, default=None, get_attr=None, invscale=1):
return default return default
try: try:
return int(v) * invscale // scale return int(v) * invscale // scale
except ValueError: except (ValueError, TypeError):
return default return default
@ -1943,7 +1943,7 @@ def float_or_none(v, scale=1, invscale=1, default=None):
return default return default
try: try:
return float(v) * invscale / scale return float(v) * invscale / scale
except ValueError: except (ValueError, TypeError):
return default return default
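Catching `TypeError` as well as `ValueError` is what makes the new tests with `[]` and `set()` pass, since `int([])` and `float(set())` raise `TypeError`. The functions below are simplified stand-ins written for illustration (the `_sketch` names are hypothetical, and the `get_attr` parameter of the real `int_or_none` is omitted).

```python
def int_or_none_sketch(v, scale=1, default=None, invscale=1):
    """Simplified stand-in for youtube_dl.utils.int_or_none."""
    if v is None:
        return default
    try:
        return int(v) * invscale // scale
    except (ValueError, TypeError):  # TypeError covers lists, sets, dicts, ...
        return default


def float_or_none_sketch(v, scale=1, invscale=1, default=None):
    """Simplified stand-in for youtube_dl.utils.float_or_none."""
    if v is None:
        return default
    try:
        return float(v) * invscale / scale
    except (ValueError, TypeError):
        return default


print(int_or_none_sketch('42'), int_or_none_sketch([]), int_or_none_sketch(''))
print(float_or_none_sketch('42'), float_or_none_sketch(set()))
```

The `scale` argument divides the parsed value, which is how a call such as `float_or_none(value, 1000)` turns milliseconds into seconds.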

View File

@ -1,3 +1,3 @@
from __future__ import unicode_literals from __future__ import unicode_literals
__version__ = '2019.03.09' __version__ = '2019.03.18'