Merge pull request #38 from aaronpk/library-refactor

Refactors into a library that can be used separately from the API
pull/39/head v1.1.0
Aaron Parecki 7 years ago
committed by GitHub
parent
commit
11977e6746
39 changed files with 2225 additions and 1204 deletions
  1. .gitignore (+2 -1)
  2. LICENSE.txt (+18 -4)
  3. README.md (+91 -5)
  4. composer.json (+22 -19)
  5. composer.lock (+1207 -100)
  6. controllers/Certbot.php (+2 -2)
  7. controllers/Main.php (+1 -1)
  8. controllers/Parse.php (+47 -312)
  9. controllers/Rels.php (+8 -46)
  10. controllers/Token.php (+1 -1)
  11. lib/Formats/GitHub.php (+0 -122)
  12. lib/HTTP.php (+0 -56)
  13. lib/HTTPCurl.php (+0 -127)
  14. lib/HTTPStream.php (+0 -138)
  15. lib/HTTPTest.php (+0 -92)
  16. lib/XRay.php (+42 -0)
  17. lib/XRay/Fetcher.php (+169 -0)
  18. lib/XRay/Formats/Format.php (+36 -0)
  19. lib/XRay/Formats/GitHub.php (+166 -0)
  20. lib/XRay/Formats/HTML.php (+132 -0)
  21. lib/XRay/Formats/HTMLPurifier_AttrDef_HTML_Microformats2.php (+1 -1)
  22. lib/XRay/Formats/Instagram.php (+22 -15)
  23. lib/XRay/Formats/Mf2.php (+26 -48)
  24. lib/XRay/Formats/Twitter.php (+50 -31)
  25. lib/XRay/Formats/XKCD.php (+12 -25)
  26. lib/XRay/Parser.php (+41 -0)
  27. lib/XRay/Rels.php (+63 -0)
  28. lib/helpers.php (+2 -1)
  29. public/images/xkcd.png (BIN)
  30. tests/AuthorTest.php (+1 -1)
  31. tests/FeedTest.php (+1 -1)
  32. tests/FetchTest.php (+1 -1)
  33. tests/GitHubTest.php (+1 -1)
  34. tests/HelpersTest.php (+8 -2)
  35. tests/InstagramTest.php (+5 -5)
  36. tests/ParseTest.php (+27 -26)
  37. tests/SanitizeTest.php (+1 -1)
  38. tests/TokenTest.php (+1 -1)
  39. tests/TwitterTest.php (+18 -18)

.gitignore (+2 -1)

@ -1,4 +1,5 @@
.DS_Store
config.php
vendor/
XRay-*.json
php_errors.log
XRay-*.json

LICENSE.txt (+18 -4)

@ -1,7 +1,21 @@
Copyright 2016 by Aaron Parecki
MIT License
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
Copyright (c) 2017 Aaron Parecki
http://www.apache.org/licenses/LICENSE-2.0
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

README.md (+91 -5)

@ -9,13 +9,66 @@ XRay parses structured content from a URL.
The contents of the URL are checked in the following order:
* A silo URL from one of the following websites:
** Instagram
** Twitter
** (more coming soon)
* h-entry, h-event, h-card
* Instagram
* Twitter
* GitHub
* XKCD
* (more coming soon)
* Microformats
* h-card
* h-entry
* h-event
* h-review
* h-recipe
* h-product
## Library
XRay can be used as a library in your PHP project. The easiest way to install it and its dependencies is via composer.
```
composer require p3k/xray
```
Basic usage:
```php
$xray = new p3k\XRay();
$parsed = $xray->parse('https://aaronparecki.com/2017/04/28/9/');
```
If you already have an HTML or JSON document you want to parse, you can pass it as a string in the second parameter.
```php
$xray = new p3k\XRay();
$html = '<html>....</html>';
$parsed = $xray->parse('https://aaronparecki.com/2017/04/28/9/', $html);
```
In both cases, you can add an additional parameter to configure various options of how XRay will behave. Below is a list of the options.
## Parse API
* `timeout` - The timeout in seconds to wait for any HTTP requests
* `max_redirects` - The maximum number of redirects to follow
* `include_original` - Will also return the full document fetched
* `target` - Specify a target URL, and XRay will first check if that URL is on the page, and only if it is, will continue to parse the page. This is useful when you're using XRay to verify an incoming webmention.
Additionally, the following parameters are supported when making requests that use the Twitter or GitHub API. See the authentication section below for details.
```php
$xray = new p3k\XRay();
$parsed = $xray->parse('https://aaronparecki.com/2017/04/28/9/', [
'timeout' => 30
]);
$parsed = $xray->parse('https://aaronparecki.com/2017/04/28/9/', $html, [
'target' => 'http://example.com/'
]);
```
## API
XRay can also be used as an API to provide its parsing capabilities over an HTTP service.
To parse a page and return structured data for the contents of the page, simply pass a url to the parse route.
@ -33,6 +86,26 @@ In both cases, the response will be a JSON object containing a key of "type". If
You can also make a POST request with the same parameter names.
If you already have an HTML or JSON document you want to parse, you can include that in the parameter `body`. This POST request would look like the below:
```
POST /parse
Content-type: application/x-www-form-urlencoded
url=https://aaronparecki.com/2016/01/16/11/
&body=<html>....</html>
```
or for Twitter/GitHub where you might have JSON,
```
POST /parse
Content-type: application/x-www-form-urlencoded
url=https://github.com/aaronpk/XRay
&body={"repo":......}
```
### Authentication
If the URL you are fetching requires authentication, include the access token in the parameter "token", and it will be included in an "Authorization" header when fetching the URL. (It is recommended to use a POST request in this case, to avoid the access token potentially being logged as part of the query string.) This is useful for [Private Webmention](https://indieweb.org/Private-Webmention) verification.
@ -57,6 +130,13 @@ You should only send Twitter credentials when the URL you are trying to parse is
* twitter_access_token_secret - Your Twitter secret access token
### GitHub Authentication
XRay uses the GitHub API to fetch GitHub URLs, which provides higher rate limits when used with authentication. You can pass a GitHub access token along with the request and XRay will use it when making requests to the API.
* github_access_token - A GitHub access token
### Error Response
```json
@ -119,8 +199,14 @@ The primary object on the page is returned in the `data` property. This will ind
If a property supports multiple values, it will always be returned as an array. The following properties support multiple values:
* in-reply-to
* like-of
* repost-of
* bookmark-of
* syndication
* photo (of entry, not of a card)
* video
* audio
* category
The content will be an object that always contains a "text" property and may contain an "html" property if the source document published HTML content. The "text" property must always be HTML escaped before displaying it as HTML, as it may include unescaped characters such as `<` and `>`.
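
The escaping rule for the "text" property can be illustrated with a short sketch. The `render_content` helper below is hypothetical and not part of XRay, and it assumes that any "html" value has already been sanitized before it reaches the caller.

```php
<?php
// Hypothetical helper, not part of XRay: prepares a "content" object for HTML output.
function render_content(array $content): string {
    if (isset($content['html'])) {
        // Assumption: the "html" value was sanitized upstream and is safe to output.
        return $content['html'];
    }
    // The "text" value may contain raw "<", ">" and "&", so escape it.
    return htmlspecialchars($content['text'], ENT_QUOTES, 'UTF-8');
}

echo render_content(['text' => 'a < b & c']), "\n";
```

Preferring "html" when present is only one possible design; a consumer that does not trust the HTML could always escape "text" instead.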

composer.json (+22 -19)

@ -1,36 +1,39 @@
{
"name": "p3k/xray",
"type": "library",
"license": "MIT",
"homepage": "https://github.com/aaronpk/XRay",
"description": "X-Ray returns structured data from any URL",
"require": {
"league/plates": "3.*",
"league/route": "1.*",
"mf2/mf2": "~0.3",
"ezyang/htmlpurifier": "4.*",
"indieweb/link-rel-parser": "0.1.*",
"dg/twitter-php": "^3.6",
"dg/twitter-php": "3.6.*",
"p3k/timezone": "*",
"cebe/markdown": "~1.1.1"
"p3k/http": "0.1.*",
"cebe/markdown": "1.1.*"
},
"autoload": {
"psr-4": {
"p3k\\XRay\\": "lib/XRay"
},
"files": [
"lib/helpers.php",
"controllers/Main.php",
"controllers/Parse.php",
"controllers/Token.php",
"controllers/Rels.php",
"controllers/Certbot.php",
"lib/HTTPCurl.php",
"lib/HTTPStream.php",
"lib/HTTP.php",
"lib/Formats/Mf2.php",
"lib/Formats/Instagram.php",
"lib/Formats/GitHub.php",
"lib/Formats/Twitter.php",
"lib/Formats/XKCD.php",
"lib/Formats/HTMLPurifier_AttrDef_HTML_Microformats2.php"
"lib/XRay.php"
]
},
"require-dev": {
"league/plates": "3.*",
"league/route": "1.*",
"phpunit/phpunit": "4.8.*"
},
"autoload-dev": {
"files": [
"lib/HTTPTest.php"
"controllers/Main.php",
"controllers/Parse.php",
"controllers/Token.php",
"controllers/Rels.php",
"controllers/Certbot.php"
]
}
}

composer.lock (+1207 -100)
File diff suppressed because it is too large


controllers/Certbot.php (+2 -2)

@ -13,7 +13,7 @@ class Certbot {
$state = mt_rand(10000,99999);
$_SESSION['state'] = $state;
$response->setContent(view('certbot', [
$response->setContent(p3k\XRay\view('certbot', [
'title' => 'X-Ray',
'state' => $state
]));
@ -109,7 +109,7 @@ class Certbot {
'challenge' => $challenge
]), 0, 600);
$response->setContent(view('certbot', [
$response->setContent(p3k\XRay\view('certbot', [
'title' => 'X-Ray',
'challenge' => $challenge,
'token' => $token,

controllers/Main.php (+1 -1)

@ -5,7 +5,7 @@ use Symfony\Component\HttpFoundation\Response;
class Main {
public function index(Request $request, Response $response) {
$response->setContent(view('index', [
$response->setContent(p3k\XRay\view('index', [
'title' => 'X-Ray'
]));
return $response;

controllers/Parse.php (+47 -312)

@ -2,7 +2,7 @@
use Symfony\Component\HttpFoundation\Request;
use Symfony\Component\HttpFoundation\Response;
use XRay\Formats;
use p3k\XRay\Formats;
class Parse {
@ -12,11 +12,11 @@ class Parse {
private $_pretty = false;
public static function useragent() {
return 'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36 XRay/1.0.0 ('.\Config::$base.')';
return 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36 XRay/1.0.0 ('.\Config::$base.')';
}
public function __construct() {
$this->http = new p3k\HTTP();
$this->http = new p3k\HTTP(self::useragent());
if(Config::$cache && class_exists('Memcache')) {
$this->mc = new Memcache();
$this->mc->addServer('127.0.0.1');
@ -41,19 +41,20 @@ class Parse {
return $response;
}
private static function toHtmlEntities($input) {
return mb_convert_encoding($input, 'HTML-ENTITIES', mb_detect_encoding($input));
}
public function parse(Request $request, Response $response) {
$opts = [];
if($request->get('timeout')) {
// We might make 2 HTTP requests, so each request gets half the desired timeout
$this->http->timeout = $request->get('timeout') / 2;
$opts['timeout'] = $request->get('timeout') / 2;
}
if($request->get('max_redirects')) {
$this->http->max_redirects = (int)$request->get('max_redirects');
if($request->get('max_redirects') !== null) {
$opts['max_redirects'] = (int)$request->get('max_redirects');
}
if($request->get('target')) {
$opts['target'] = $request->get('target');
}
if($request->get('pretty')) {
@ -61,12 +62,12 @@ class Parse {
}
$url = $request->get('url');
$html = $request->get('html');
$html = $request->get('html') ?: $request->get('body');
if(!$url && !$html) {
return $this->respond($response, 400, [
'error' => 'missing_url',
'error_description' => 'Provide a URL or HTML to fetch'
'error_description' => 'Provide a URL or HTML to fetch',
]);
}
@ -74,319 +75,53 @@ class Parse {
// If HTML is provided in the request, parse that, and use the URL provided as the base URL for mf2 resolving
$result['body'] = $html;
$result['url'] = $url;
$result['code'] = null;
} else {
// Attempt some basic URL validation
$scheme = parse_url($url, PHP_URL_SCHEME);
if(!in_array($scheme, ['http','https'])) {
return $this->respond($response, 400, [
'error' => 'invalid_url',
'error_description' => 'Only http and https URLs are supported'
]);
}
$host = parse_url($url, PHP_URL_HOST);
if(!$host) {
return $this->respond($response, 400, [
'error' => 'invalid_url',
'error_description' => 'The URL provided was not valid'
]);
$fetcher = new p3k\XRay\Fetcher($this->http);
$fields = [
'twitter_api_key','twitter_api_secret','twitter_access_token','twitter_access_token_secret',
'github_access_token',
'token'
];
foreach($fields as $f) {
if($v=$request->get($f))
$opts[$f] = $v;
}
$url = \normalize_url($url);
// Check if this is a Twitter URL and if they've provided API credentials, use the API
if(preg_match('/https?:\/\/(?:mobile\.twitter\.com|twitter\.com|twtr\.io)\/(?:[a-z0-9_\/!#]+statuse?s?\/([0-9]+)|([a-zA-Z0-9_]+))/i', $url, $match)) {
return $this->parseTwitterURL($request, $response, $url, $match);
}
if($host == 'github.com') {
return $this->parseGitHubURL($request, $response, $url);
}
$result = $fetcher->fetch($url, $opts);
// Now fetch the URL and check for any curl errors
// Don't cache the response if a token is used to fetch it
if($this->mc && !$request->get('token')) {
$cacheKey = 'xray-'.md5($url);
if($cached=$this->mc->get($cacheKey)) {
$result = json_decode($cached, true);
self::debug('using HTML from cache', 'X-Cache-Debug');
} else {
$result = $this->http->get($url, [self::useragent()]);
$cacheData = json_encode($result);
// App Engine limits the size of cached items, so don't cache ones larger than that
if(strlen($cacheData) < 1000000)
$this->mc->set($cacheKey, $cacheData, MEMCACHE_COMPRESSED, $this->_cacheTime);
}
} else {
$headers = [self::useragent()];
if($request->get('token')) {
$headers[] = 'Authorization: Bearer ' . $request->get('token');
}
$result = $this->http->get($url, $headers);
}
if($result['error']) {
return $this->respond($response, 200, [
'error' => $result['error'],
'error_description' => $result['error_description'],
'url' => $result['url'],
'code' => $result['code']
]);
}
if(trim($result['body']) == '') {
if($result['code'] == 410) {
// 410 Gone responses are valid and should not return an error
return $this->respond($response, 200, [
'data' => [
'type' => 'unknown'
],
'url' => $result['url'],
'code' => $result['code']
]);
}
return $this->respond($response, 200, [
'error' => 'no_content',
'error_description' => 'We did not get a response body when fetching the URL',
'url' => $result['url'],
'code' => $result['code']
]);
}
// Check for HTTP 401/403
if($result['code'] == 401) {
return $this->respond($response, 200, [
'error' => 'unauthorized',
'error_description' => 'The URL returned "HTTP 401 Unauthorized"',
'url' => $result['url'],
'code' => 401
]);
}
if($result['code'] == 403) {
return $this->respond($response, 200, [
'error' => 'forbidden',
'error_description' => 'The URL returned "HTTP 403 Forbidden"',
'url' => $result['url'],
'code' => 403
]);
if(!empty($result['error'])) {
$error_code = isset($result['error_code']) ? $result['error_code'] : 200;
unset($result['error_code']);
return $this->respond($response, $error_code, $result);
}
}
// Check for known services
$host = parse_url($result['url'], PHP_URL_HOST);
if(in_array($host, ['www.instagram.com','instagram.com'])) {
list($data, $parsed) = Formats\Instagram::parse($result['body'], $result['url'], $this->http);
if($request->get('include_original'))
$data['original'] = $parsed;
$data['url'] = $result['url'];
$data['code'] = $result['code'];
return $this->respond($response, 200, $data);
}
if($host == 'xkcd.com' && parse_url($url, PHP_URL_PATH) != '/') {
$data = Formats\XKCD::parse($result['body'], $url);
$data['url'] = $result['url'];
$data['code'] = $result['code'];
return $this->respond($response, 200, $data);
}
// attempt to parse the page as HTML
$doc = new DOMDocument();
@$doc->loadHTML(self::toHtmlEntities($result['body']));
if(!$doc) {
return $this->respond($response, 200, [
'error' => 'invalid_content',
'error_description' => 'The document could not be parsed as HTML'
]);
}
$xpath = new DOMXPath($doc);
$parser = new p3k\XRay\Parser($this->http);
$parsed = $parser->parse($result['body'], $result['url'], $opts);
// Check for meta http equiv and replace the status code if present
foreach($xpath->query('//meta[translate(@http-equiv,\'STATUS\',\'status\')=\'status\']') as $el) {
$equivStatus = ''.$el->getAttribute('content');
if($equivStatus && is_string($equivStatus)) {
if(preg_match('/^(\d+)/', $equivStatus, $match)) {
$result['code'] = (int)$match[1];
}
}
}
// Allow the parser to override the HTTP response code, e.g. a meta-equiv tag
if(isset($parsed['code']))
$result['code'] = $parsed['code'];
// If a target parameter was provided, make sure a link to it exists on the page
if($target=$request->get('target')) {
$found = [];
if($target) {
self::xPathFindNodeWithAttribute($xpath, 'a', 'href', function($u) use($target, &$found){
if($u == $target) {
$found[$u] = null;
}
});
self::xPathFindNodeWithAttribute($xpath, 'img', 'src', function($u) use($target, &$found){
if($u == $target) {
$found[$u] = null;
}
});
self::xPathFindNodeWithAttribute($xpath, 'video', 'src', function($u) use($target, &$found){
if($u == $target) {
$found[$u] = null;
}
});
self::xPathFindNodeWithAttribute($xpath, 'audio', 'src', function($u) use($target, &$found){
if($u == $target) {
$found[$u] = null;
}
});
}
if(!$found) {
return $this->respond($response, 200, [
'error' => 'no_link_found',
'error_description' => 'The source document does not have a link to the target URL',
'url' => $result['url'],
'code' => $result['code'],
]);
}
}
// If the URL has a fragment ID, find the DOM starting at that node and parse it instead
$html = $result['body'];
$fragment = parse_url($url, PHP_URL_FRAGMENT);
if($fragment) {
$fragElement = self::xPathGetElementById($xpath, $fragment);
if($fragElement) {
$html = $doc->saveHTML($fragElement);
$foundFragment = true;
} else {
$foundFragment = false;
}
}
// Now start pulling in the data from the page. Start by looking for microformats2
$mf2 = mf2\Parse($html, $result['url']);
if($mf2 && count($mf2['items']) > 0) {
$data = Formats\Mf2::parse($mf2, $result['url'], $this->http);
if($data) {
if($fragment) {
$data['info'] = [
'found_fragment' => $foundFragment
];
}
if($request->get('include_original'))
$data['original'] = $html;
$data['url'] = $result['url']; // this will be the effective URL after following redirects
$data['code'] = $result['code'];
return $this->respond($response, 200, $data);
}
}
// TODO: look for other content like OEmbed or other known services later
return $this->respond($response, 200, [
'data' => [
'type' => 'unknown',
],
'url' => $result['url'],
'code' => $result['code']
]);
}
private static function xPathFindNodeWithAttribute($xpath, $node, $attr, $callback) {
foreach($xpath->query('//'.$node.'[@'.$attr.']') as $el) {
$v = $el->getAttribute($attr);
$callback($v);
}
}
private static function xPathGetElementById($xpath, $id) {
$element = null;
foreach($xpath->query("//*[@id='$id']") as $el) {
$element = $el;
}
return $element;
}
private function parseTwitterURL(&$request, &$response, $url, $match) {
$fields = ['twitter_api_key','twitter_api_secret','twitter_access_token','twitter_access_token_secret'];
$creds = [];
foreach($fields as $f) {
if($v=$request->get($f))
$creds[$f] = $v;
}
$data = false;
if(count($creds) == 4) {
list($data, $parsed) = Formats\Twitter::parse($url, $match[1], $creds);
} elseif(count($creds) > 0) {
// If only some Twitter credentials were present, return an error
return $this->respond($response, 400, [
'error' => 'missing_parameters',
'error_description' => 'All 4 Twitter credentials must be included in the request'
]);
} else {
// Accept Tweet JSON and parse that if provided
$json = $request->get('json');
if($json) {
list($data, $parsed) = Formats\Twitter::parse($url, $match[1], null, $json);
}
// Skip parsing from the Twitter API if they didn't include credentials
}
if($data) {
if($request->get('include_original'))
$data['original'] = $parsed;
$data['url'] = $url;
$data['code'] = 200;
return $this->respond($response, 200, $data);
} else {
return $this->respond($response, 200, [
'data' => [
'type' => 'unknown'
],
'url' => $url,
'code' => 0
]);
}
}
private function parseGitHubURL(&$request, &$response, $url) {
$fields = ['github_access_token'];
$creds = [];
foreach($fields as $f) {
if($v=$request->get($f))
$creds[$f] = $v;
}
$data = false;
$json = $request->get('json');
if($json) {
// Accept GitHub JSON and parse that if provided
list($data, $json, $code) = Formats\GitHub::parse($this->http, $url, null, $json);
if(!empty($parsed['error'])) {
$error_code = isset($parsed['error_code']) ? $parsed['error_code'] : 200;
unset($parsed['error_code']);
return $this->respond($response, $error_code, $parsed);
} else {
// Otherwise fetch the post unauthenticated or with the provided access token
list($data, $json, $code) = Formats\GitHub::parse($this->http, $url, $creds);
}
$data = [
'data' => $parsed['data'],
'url' => $result['url'],
'code' => $result['code']
];
if(isset($parsed['info']))
$data['info'] = $parsed['info'];
if($request->get('include_original') && isset($parsed['original']))
$data['original'] = $parsed['original'];
if($data) {
if($request->get('include_original'))
$data['original'] = $json;
$data['url'] = $url;
$data['code'] = $code;
return $this->respond($response, 200, $data);
} else {
return $this->respond($response, 200, [
'data' => [
'type' => 'unknown'
],
'url' => $url,
'code' => $code
]);
}
}
}
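
Reviewer sketch: the option handling in `parse()` above appears to move from setting properties on `$this->http` to building an `$opts` array, and the `max_redirects` check appears to change from a truthiness test to a comparison against `null`. The standalone function below illustrates that behaviour under the assumption that request parameters arrive as a plain array; `build_opts` is a hypothetical name, not code from this commit.

```php
<?php
// Hypothetical stand-in for the controller logic: builds $opts from plain
// request parameters (an array here instead of a Symfony Request object).
function build_opts(array $params): array {
    $opts = [];
    if (!empty($params['timeout'])) {
        // Two HTTP requests may be made, so each gets half of the requested timeout
        $opts['timeout'] = $params['timeout'] / 2;
    }
    if (isset($params['max_redirects'])) {
        // isset() instead of a truthiness check lets max_redirects=0 through
        $opts['max_redirects'] = (int)$params['max_redirects'];
    }
    if (!empty($params['target'])) {
        $opts['target'] = $params['target'];
    }
    return $opts;
}

print_r(build_opts(['timeout' => '10', 'max_redirects' => '0']));
```

With a truthiness check, `max_redirects=0` would be silently ignored, which is presumably why the comparison against `null` was introduced.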

controllers/Rels.php (+8 -46)

@ -24,13 +24,15 @@ class Rels {
}
public function fetch(Request $request, Response $response) {
$opts = [];
if($request->get('timeout')) {
// We might make 2 HTTP requests, so each request gets half the desired timeout
$this->http->timeout = $request->get('timeout') / 2;
$opts['timeout'] = $request->get('timeout') / 2;
}
if($request->get('max_redirects')) {
$this->http->max_redirects = (int)$request->get('max_redirects');
$opts['max_redirects'] = (int)$request->get('max_redirects');
}
if($request->get('pretty')) {
@ -46,51 +48,11 @@ class Rels {
]);
}
// Attempt some basic URL validation
$scheme = parse_url($url, PHP_URL_SCHEME);
if(!in_array($scheme, ['http','https'])) {
return $this->respond($response, 400, [
'error' => 'invalid_url',
'error_description' => 'Only http and https URLs are supported'
]);
}
$host = parse_url($url, PHP_URL_HOST);
if(!$host) {
return $this->respond($response, 400, [
'error' => 'invalid_url',
'error_description' => 'The URL provided was not valid'
]);
}
$url = \normalize_url($url);
$result = $this->http->get($url);
$html = $result['body'];
$mf2 = mf2\Parse($html, $result['url']);
$rels = p3k\HTTP::link_rels($result['headers']);
if(isset($mf2['rels'])) {
$rels = array_merge($rels, $mf2['rels']);
}
// Resolve all relative URLs
foreach($rels as $rel=>$values) {
foreach($values as $i=>$value) {
$value = \mf2\resolveUrl($result['url'], $value);
$rels[$rel][$i] = $value;
}
}
if(count($rels) == 0)
$rels = new StdClass;
$xray = new p3k\XRay();
$xray->http = $this->http;
$res = $xray->rels($url, $opts);
return $this->respond($response, 200, [
'url' => $result['url'],
'code' => $result['code'],
'rels' => $rels
]);
return $this->respond($response, !empty($res['error']) ? 400 : 200, $res);
}
}

controllers/Token.php (+1 -1)

@ -55,7 +55,7 @@ class Token {
if(is_string($head['headers']['Link']))
$head['headers']['Link'] = [$head['headers']['Link']];
$rels = p3k\HTTP::link_rels($head['headers']);
$rels = $head['rels'];
$endpoint = false;
if(array_key_exists('token_endpoint', $rels)) {

lib/Formats/GitHub.php (+0 -122)

@ -1,122 +0,0 @@
<?php
namespace XRay\Formats;
use DateTime, DateTimeZone;
use Parse, Config;
use cebe\markdown\GithubMarkdown;
class GitHub {
public static function parse($http, $url, $creds, $json=null) {
if(!$json) {
// Transform the GitHub URL to an API request
if(preg_match('~https://github.com/([^/]+)/([^/]+)/pull/(\d+)$~', $url, $match)) {
$type = 'pull';
$org = $match[1];
$repo = $match[2];
$pull = $match[3];
$apiurl = 'https://api.github.com/repos/'.$org.'/'.$repo.'/pulls/'.$pull;
} elseif(preg_match('~https://github.com/([^/]+)/([^/]+)/issues/(\d+)$~', $url, $match)) {
$type = 'issue';
$org = $match[1];
$repo = $match[2];
$issue = $match[3];
$apiurl = 'https://api.github.com/repos/'.$org.'/'.$repo.'/issues/'.$issue;
} elseif(preg_match('~https://github.com/([^/]+)/([^/]+)$~', $url, $match)) {
$type = 'repo';
$org = $match[1];
$repo = $match[2];
$apiurl = 'https://api.github.com/repos/'.$org.'/'.$repo;
} elseif(preg_match('~https://github.com/([^/]+)/([^/]+)/issues/(\d+)#issuecomment-(\d+)~', $url, $match)) {
$type = 'comment';
$org = $match[1];
$repo = $match[2];
$issue = $match[3];
$comment = $match[4];
$apiurl = 'https://api.github.com/repos/'.$org.'/'.$repo.'/issues/comments/'.$comment;
} else {
return [null, null, 0];
}
$response = $http->get($apiurl, ['User-Agent: XRay ('.Config::$base.')']);
if($response['code'] != 200) {
return [null, $response['body'], $response['code']];
}
$data = json_decode($response['body'], true);
} else {
$data = json_decode($json, true);
}
if(!$data) {
return [null, null, 0];
}
// Start building the h-entry
$entry = array(
'type' => ($type == 'repo' ? 'repo' : 'entry'),
'url' => $url,
'author' => [
'type' => 'card',
'name' => null,
'photo' => null,
'url' => null
]
);
if($type == 'repo')
$authorkey = 'owner';
else
$authorkey = 'user';
$entry['author']['name'] = $data[$authorkey]['login'];
$entry['author']['photo'] = $data[$authorkey]['avatar_url'];
$entry['author']['url'] = $data[$authorkey]['html_url'];
if($type == 'pull') {
$entry['name'] = '#' . $pull . ' ' . $data['title'];
} elseif($type == 'issue') {
$entry['name'] = '#' . $issue . ' ' . $data['title'];
} elseif($type == 'repo') {
$entry['name'] = $data['name'];
}
if($type == 'repo') {
if(!empty($data['description']))
$entry['summary'] = $data['description'];
}
if($type != 'repo' && !empty($data['body'])) {
$parser = new GithubMarkdown();
$entry['content'] = [
'text' => $data['body'],
'html' => $parser->parse($data['body'])
];
}
if($type == 'comment') {
$entry['in-reply-to'] = ['https://github.com/'.$org.'/'.$repo.'/issues/'.$issue];
}
if(!empty($data['labels'])) {
$entry['category'] = array_map(function($l){
return $l['name'];
}, $data['labels']);
}
$entry['published'] = $data['created_at'];
$r = [
'data' => $entry
];
return [$r, $json, $response['code']];
}
}
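
Reviewer sketch: the URL-to-API mapping at the top of the removed `parse()` method can be isolated as below. The regular expressions are copied from the removed file, but `github_api_url` is a hypothetical helper name and the issue and comment numbers in the usage lines are made up for illustration.

```php
<?php
// Hypothetical helper, not from the commit: maps a github.com URL to the
// api.github.com URL using the same patterns, in the same order, as the
// removed parse() method. Returns null for URLs that match none of them.
function github_api_url(string $url): ?string {
    $base = 'https://api.github.com/repos/';
    if (preg_match('~https://github.com/([^/]+)/([^/]+)/pull/(\d+)$~', $url, $m)) {
        return $base.$m[1].'/'.$m[2].'/pulls/'.$m[3];
    }
    if (preg_match('~https://github.com/([^/]+)/([^/]+)/issues/(\d+)$~', $url, $m)) {
        return $base.$m[1].'/'.$m[2].'/issues/'.$m[3];
    }
    if (preg_match('~https://github.com/([^/]+)/([^/]+)$~', $url, $m)) {
        return $base.$m[1].'/'.$m[2];
    }
    if (preg_match('~https://github.com/([^/]+)/([^/]+)/issues/(\d+)#issuecomment-(\d+)~', $url, $m)) {
        // Comments are addressed by comment ID only; the issue number is not used
        return $base.$m[1].'/'.$m[2].'/issues/comments/'.$m[4];
    }
    return null;
}

echo github_api_url('https://github.com/aaronpk/XRay/pull/38'), "\n";
echo github_api_url('https://github.com/aaronpk/XRay/issues/25#issuecomment-300'), "\n";
```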

lib/HTTP.php (+0 -56)

@ -1,56 +0,0 @@
<?php
namespace p3k;
class HTTP {
public $timeout = 4;
public $max_redirects = 8;
public function get($url, $headers=[]) {
$class = $this->_class($url);
$http = new $class($url);
$http->timeout = $this->timeout;
$http->max_redirects = $this->max_redirects;
return $http->get($url, $headers);
}
public function post($url, $body, $headers=[]) {
$class = $this->_class($url);
$http = new $class($url);
$http->timeout = $this->timeout;
$http->max_redirects = $this->max_redirects;
return $http->post($url, $body, $headers);
}
public function head($url) {
$class = $this->_class($url);
$http = new $class($url);
$http->timeout = $this->timeout;
$http->max_redirects = $this->max_redirects;
return $http->head($url);
}
private function _class($url) {
if(!should_follow_redirects($url)) {
return 'p3k\HTTPStream';
} else {
return 'p3k\HTTPCurl';
}
}
public static function link_rels($header_array) {
$headers = '';
foreach($header_array as $k=>$header) {
if(is_string($header)) {
$headers .= $k . ': ' . $header . "\r\n";
} else {
foreach($header as $h) {
$headers .= $k . ': ' . $h . "\r\n";
}
}
}
$rels = \IndieWeb\http_rels($headers);
return $rels;
}
}
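
Reviewer sketch: the header-flattening step inside the removed `link_rels()` method can be shown on its own. The call to `\IndieWeb\http_rels()` is left out so the example runs without dependencies, and `flatten_headers` is a hypothetical name.

```php
<?php
// Sketch of the flattening step only: turns a parsed header array back into
// a raw header string, one "Name: value" line per value.
function flatten_headers(array $header_array): string {
    $headers = '';
    foreach ($header_array as $name => $value) {
        // A header may hold a single string or a list of strings
        foreach ((array)$value as $v) {
            $headers .= $name.': '.$v."\r\n";
        }
    }
    return $headers;
}

echo flatten_headers([
    'Link' => ['<https://example.com/a>; rel="x"', '<https://example.com/b>; rel="y"'],
    'Content-Type' => 'text/html',
]);
```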

lib/HTTPCurl.php (+0 -127)

@ -1,127 +0,0 @@
<?php
namespace p3k;
class HTTPCurl {
public $timeout = 4;
public $max_redirects = 8;
public function get($url, $headers=[]) {
$ch = curl_init($url);
$this->_set_curlopts($ch, $url);
if($headers)
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
$response = curl_exec($ch);
$header_size = curl_getinfo($ch, CURLINFO_HEADER_SIZE);
return array(
'code' => curl_getinfo($ch, CURLINFO_HTTP_CODE),
'headers' => self::parse_headers(trim(substr($response, 0, $header_size))),
'body' => substr($response, $header_size),
'error' => self::error_string_from_code(curl_errno($ch)),
'error_description' => curl_error($ch),
'error_code' => curl_errno($ch),
'url' => curl_getinfo($ch, CURLINFO_EFFECTIVE_URL),
);
}
public function post($url, $body, $headers=[]) {
$ch = curl_init($url);
$this->_set_curlopts($ch, $url);
curl_setopt($ch, CURLOPT_POST, true);
curl_setopt($ch, CURLOPT_POSTFIELDS, $body);
if($headers)
curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
$response = curl_exec($ch);
$header_size = curl_getinfo($ch, CURLINFO_HEADER_SIZE);
return array(
'code' => curl_getinfo($ch, CURLINFO_HTTP_CODE),
'headers' => self::parse_headers(trim(substr($response, 0, $header_size))),
'body' => substr($response, $header_size),
'error' => self::error_string_from_code(curl_errno($ch)),
'error_description' => curl_error($ch),
'error_code' => curl_errno($ch),
'url' => curl_getinfo($ch, CURLINFO_EFFECTIVE_URL),
);
}
public function head($url) {
$ch = curl_init($url);
$this->_set_curlopts($ch, $url);
curl_setopt($ch, CURLOPT_NOBODY, true);
$response = curl_exec($ch);
return array(
'code' => curl_getinfo($ch, CURLINFO_HTTP_CODE),
'headers' => self::parse_headers(trim($response)),
'error' => self::error_string_from_code(curl_errno($ch)),
'error_description' => curl_error($ch),
'error_code' => curl_errno($ch),
'url' => curl_getinfo($ch, CURLINFO_EFFECTIVE_URL),
);
}
private function _set_curlopts($ch, $url) {
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
curl_setopt($ch, CURLOPT_HEADER, true);
// Special-case appspot.com URLs to not follow redirects.
// https://cloud.google.com/appengine/docs/php/urlfetch/
if(should_follow_redirects($url)) {
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
curl_setopt($ch, CURLOPT_MAXREDIRS, $this->max_redirects);
} else {
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false);
}
curl_setopt($ch, CURLOPT_TIMEOUT_MS, round($this->timeout * 1000));
curl_setopt($ch, CURLOPT_CONNECTTIMEOUT_MS, 2000);
}
public static function error_string_from_code($code) {
switch($code) {
case 0:
return '';
case CURLE_COULDNT_RESOLVE_HOST:
return 'dns_error';
case CURLE_COULDNT_CONNECT:
return 'connect_error';
case CURLE_OPERATION_TIMEDOUT:
return 'timeout';
case CURLE_SSL_CONNECT_ERROR:
return 'ssl_error';
case CURLE_SSL_CERTPROBLEM:
return 'ssl_cert_error';
case CURLE_SSL_CIPHER:
return 'ssl_unsupported_cipher';
case CURLE_SSL_CACERT:
return 'ssl_cert_error';
case CURLE_TOO_MANY_REDIRECTS:
return 'too_many_redirects';
default:
return 'unknown';
}
}
public static function parse_headers($headers) {
$retVal = array();
$fields = explode("\r\n", preg_replace('/\x0D\x0A[\x09\x20]+/', ' ', $headers));
foreach($fields as $field) {
if(preg_match('/([^:]+): (.+)/m', $field, $match)) {
$match[1] = preg_replace_callback('/(?<=^|[\x09\x20\x2D])./', function($m) {
return strtoupper($m[0]);
}, strtolower(trim($match[1])));
// If there's already a value set for the header name being returned, turn it into an array and add the new value
$match[1] = preg_replace_callback('/(?<=^|[\x09\x20\x2D])./', function($m) {
return strtoupper($m[0]);
}, strtolower(trim($match[1])));
if(isset($retVal[$match[1]])) {
if(!is_array($retVal[$match[1]]))
$retVal[$match[1]] = array($retVal[$match[1]]);
$retVal[$match[1]][] = $match[2];
} else {
$retVal[$match[1]] = trim($match[2]);
}
}
}
return $retVal;
}
}
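Review note: the header handling in `parse_headers` above can be hard to follow inline, so here is a minimal standalone sketch of the same idea: unfold continuation lines, normalize header names, and collect repeated headers into a list. The function names `normalize_header_name` and `parse_header_block` are hypothetical and are not part of this library. The sketch capitalizes only after hyphens, whereas the original regex also capitalizes after spaces and tabs.

```php
<?php
// Hypothetical helpers illustrating the parsing approach; not the library's API.

function normalize_header_name(string $name): string {
  // "content-TYPE" becomes "Content-Type": lowercase everything, then
  // uppercase the first character of each hyphen-separated part.
  return implode('-', array_map('ucfirst', explode('-', strtolower(trim($name)))));
}

function parse_header_block(string $raw): array {
  $headers = [];
  // Unfold continuation lines (CRLF followed by spaces or tabs) first.
  $raw = preg_replace('/\r\n[\t ]+/', ' ', $raw);
  foreach (explode("\r\n", $raw) as $line) {
    if (!preg_match('/^([^:]+):\s*(.*)$/', $line, $m)) {
      continue; // status line or blank line, nothing to record
    }
    $name = normalize_header_name($m[1]);
    $value = trim($m[2]);
    if (isset($headers[$name])) {
      // Repeated header: promote the existing value to a list and append.
      $headers[$name] = array_merge((array)$headers[$name], [$value]);
    } else {
      $headers[$name] = $value;
    }
  }
  return $headers;
}
```

A header that appears once stays a plain string, and a repeated header becomes an array, which matches the shape the original method returns.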

+ 0
- 138
lib/HTTPStream.php

@ -1,138 +0,0 @@
<?php
namespace p3k;
class HTTPStream {
public $timeout = 4;
public $max_redirects = 8;
public static function exception_error_handler($severity, $message, $file, $line) {
if (!(error_reporting() & $severity)) {
// This error code is not included in error_reporting
return;
}
throw new \ErrorException($message, 0, $severity, $file, $line);
}
public function get($url, $headers=[]) {
set_error_handler("p3k\HTTPStream::exception_error_handler");
$context = $this->_stream_context('GET', $url, false, $headers);
return $this->_fetch($url, $context);
}
public function post($url, $body, $headers=[]) {
set_error_handler("p3k\HTTPStream::exception_error_handler");
$context = $this->_stream_context('POST', $url, $body, $headers);
return $this->_fetch($url, $context);
}
public function head($url) {
set_error_handler("p3k\HTTPStream::exception_error_handler");
$context = $this->_stream_context('HEAD', $url);
return $this->_fetch($url, $context);
}
private function _fetch($url, $context) {
$error = false;
try {
$body = file_get_contents($url, false, $context);
// This sets $http_response_header
// see http://php.net/manual/en/reserved.variables.httpresponseheader.php
} catch(\Exception $e) {
$body = false;
$http_response_header = [];
$description = str_replace('file_get_contents(): ', '', $e->getMessage());
$code = 'unknown';
if(preg_match('/getaddrinfo failed/', $description)) {
$code = 'dns_error';
$description = str_replace('php_network_getaddresses: ', '', $description);
}
if(preg_match('/timed out|request failed/', $description)) {
$code = 'timeout';
}
if(preg_match('/certificate/', $description)) {
$code = 'ssl_error';
}
$error = [
'description' => $description,
'code' => $code
];
}
return array(
'code' => self::parse_response_code($http_response_header),
'headers' => self::parse_headers($http_response_header),
'body' => $body,
'error' => $error ? $error['code'] : false,
'error_description' => $error ? $error['description'] : false,
'url' => $url,
);
}
private function _stream_context($method, $url, $body=false, $headers=[]) {
$options = [
'method' => $method,
'timeout' => $this->timeout,
'ignore_errors' => true,
];
if($body) {
$options['content'] = $body;
}
if($headers) {
$options['header'] = implode("\r\n", $headers);
}
// Special-case appspot.com URLs to not follow redirects.
// https://cloud.google.com/appengine/docs/php/urlfetch/
if(should_follow_redirects($url)) {
$options['follow_location'] = 1;
$options['max_redirects'] = $this->max_redirects;
} else {
$options['follow_location'] = 0;
}
return stream_context_create(['http' => $options]);
}
public static function parse_response_code($headers) {
// When a response is a redirect, we want to find the last occurrence of the HTTP code
$code = false;
foreach($headers as $field) {
if(preg_match('/HTTP\/\d\.\d (\d+)/', $field, $match)) {
$code = $match[1];
}
}
return $code;
}
public static function parse_headers($headers) {
$retVal = array();
foreach($headers as $field) {
if(preg_match('/([^:]+): (.+)/m', $field, $match)) {
$match[1] = preg_replace_callback('/(?<=^|[\x09\x20\x2D])./', function($m) {
return strtoupper($m[0]);
}, strtolower(trim($match[1])));
// If there's already a value set for the header name being returned, turn it into an array and add the new value
if(isset($retVal[$match[1]])) {
if(!is_array($retVal[$match[1]]))
$retVal[$match[1]] = array($retVal[$match[1]]);
$retVal[$match[1]][] = $match[2];
} else {
$retVal[$match[1]] = trim($match[2]);
}
}
}
return $retVal;
}
}
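Review note: `parse_response_code` above relies on the fact that, after redirects, the header list contains one status line per hop, so the last one wins. A minimal sketch of that rule follows. The function name `last_status_code` is hypothetical, and unlike the original regex this version also accepts status lines without a minor version (for example `HTTP/2 200`), which is an assumption on my part rather than behavior from this PR.

```php
<?php
// Hypothetical helper: return the status code of the final response in a
// redirect chain, or false if no status line is present.
function last_status_code(array $header_lines) {
  $code = false;
  foreach ($header_lines as $line) {
    // Each hop contributes its own status line; keep overwriting so the
    // last one seen is the one returned.
    if (preg_match('~^HTTP/\d(?:\.\d)? (\d{3})~', $line, $m)) {
      $code = (int)$m[1];
    }
  }
  return $code;
}
```

Returning `false` for an empty list mirrors what the original does when the request failed before any headers arrived.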

+ 0
- 92
lib/HTTPTest.php

@ -1,92 +0,0 @@
<?php
namespace p3k;
class HTTPTest extends HTTPCurl {
private $_testDataPath;
private $_redirects_remaining;
public function __construct($testDataPath) {
$this->_testDataPath = $testDataPath;
}
public function get($url, $headers=[]) {
$this->_redirects_remaining = $this->max_redirects;
$parts = parse_url($url);
unset($parts['fragment']);
$url = \build_url($parts);
return $this->_read_file($url);
}
public function post($url, $body, $headers=[]) {
return $this->_read_file($url);
}
public function head($url) {
$response = $this->_read_file($url);
return array(
'code' => $response['code'],
'headers' => $response['headers'],
'error' => '',
'error_description' => '',
'url' => $response['url']
);
}
private function _read_file($url) {
$parts = parse_url($url);
if($parts['path']) {
$parts['path'] = '/'.str_replace('/','_',substr($parts['path'],1));
$url = \build_url($parts);
}
$filename = $this->_testDataPath.preg_replace('/https?:\/\//', '', $url);
if(!file_exists($filename)) {
$filename = $this->_testDataPath.'404.response.txt';
}
$response = file_get_contents($filename);
$split = explode("\r\n\r\n", $response);
if(count($split) < 2) {
throw new \Exception("Invalid file contents in test data, check that newlines are CRLF: $url");
}
$headers = array_shift($split);
// Rejoin with the same delimiter used to split, so blank lines inside the body are preserved
$body = implode("\r\n\r\n", $split);
if(preg_match('/HTTP\/1\.1 (\d+)/', $headers, $match)) {
$code = $match[1];
}
$headers = preg_replace('/HTTP\/1\.1 \d+ .+/', '', $headers);
$parsedHeaders = self::parse_headers($headers);
if(array_key_exists('Location', $parsedHeaders)) {
$effectiveUrl = \mf2\resolveUrl($url, $parsedHeaders['Location']);
if($this->_redirects_remaining > 0) {
$this->_redirects_remaining--;
return $this->_read_file($effectiveUrl);
} else {
return [
'code' => 0,
'headers' => $parsedHeaders,
'body' => $body,
'error' => 'too_many_redirects',
'error_description' => '',
'url' => $effectiveUrl
];
}
} else {
$effectiveUrl = $url;
}
return array(
'code' => $code,
'headers' => $parsedHeaders,
'body' => $body,
'error' => (isset($parsedHeaders['X-Test-Error']) ? $parsedHeaders['X-Test-Error'] : ''),
'error_description' => '',
'url' => $effectiveUrl
);
}
}

+ 42
- 0
lib/XRay.php

@ -0,0 +1,42 @@
<?php
namespace p3k;
class XRay {
public $http;
public function __construct() {
$this->http = new HTTP();
}
public function rels($url, $opts=[]) {
$rels = new XRay\Rels($this->http);
return $rels->parse($url, $opts);
}
public function parse($url, $opts_or_body=false, $opts_for_body=[]) {
if(!$opts_or_body || is_array($opts_or_body)) {
$fetch = new XRay\Fetcher($this->http);
$response = $fetch->fetch($url, $opts_or_body);
if(!empty($response['error']))
return $response;
$body = $response['body'];
$url = $response['url'];
$code = $response['code'];
$opts = is_array($opts_or_body) ? $opts_or_body : $opts_for_body;
} else {
$body = $opts_or_body;
$opts = $opts_for_body;
$code = null;
}
$parser = new XRay\Parser($this->http);
$result = $parser->parse($body, $url, $opts);
if(!isset($opts['include_original']) || !$opts['include_original'])
unset($result['original']);
$result['url'] = $url;
$result['code'] = isset($result['code']) ? $result['code'] : $code;
return $result;
}
}

+ 169
- 0
lib/XRay/Fetcher.php

@ -0,0 +1,169 @@
<?php
namespace p3k\XRay;
class Fetcher {
private $http;
public function __construct($http) {
$this->http = $http;
}
public function fetch($url, $opts=[]) {
if($opts == false) $opts = [];
if(isset($opts['timeout']))
$this->http->set_timeout($opts['timeout']);
if(isset($opts['max_redirects']))
$this->http->set_max_redirects($opts['max_redirects']);
// Attempt some basic URL validation
$scheme = parse_url($url, PHP_URL_SCHEME);
if(!in_array($scheme, ['http','https'])) {
return [
'error_code' => 400,
'error' => 'invalid_url',
'error_description' => 'Only http and https URLs are supported'
];
}
$host = parse_url($url, PHP_URL_HOST);
if(!$host) {
return [
'error_code' => 400,
'error' => 'invalid_url',
'error_description' => 'The URL provided was not valid'
];
}
$url = normalize_url($url);
$host = parse_url($url, PHP_URL_HOST);
// Check if this is a Twitter URL and use the API
if(Formats\Twitter::matches_host($url)) {
return $this->_fetch_tweet($url, $opts);
}
// Transform the HTML GitHub URL into a GitHub API request and fetch the API response
if(Formats\GitHub::matches_host($url)) {
return $this->_fetch_github($url, $opts);
}
// All other URLs are fetched normally
// Special-case appspot.com URLs to not follow redirects.
// https://cloud.google.com/appengine/docs/php/urlfetch/
if(!should_follow_redirects($url)) {
$this->http->set_max_redirects(0);
$this->http->set_transport(new \p3k\HTTP\Stream());
} else {
$this->http->set_transport(new \p3k\HTTP\Curl());
}
$headers = [];
if(isset($opts['token']))
$headers[] = 'Authorization: Bearer ' . $opts['token'];
$result = $this->http->get($url, $headers);
if($result['error']) {
return [
'error' => $result['error'],
'error_description' => $result['error_description'],
'url' => $result['url'],
'code' => $result['code'],
];
}
if(trim($result['body']) == '') {
if($result['code'] == 410) {
// 410 Gone responses are valid and should not return an error.
// Fetcher has no respond() method or $response object, so return the
// result in the same shape as a successful fetch.
return [
'url' => $result['url'],
'body' => $result['body'],
'code' => $result['code'],
];
}
return [
'error' => 'no_content',
'error_description' => 'We did not get a response body when fetching the URL',
'url' => $result['url'],
'code' => $result['code']
];
}
// Check for HTTP 401/403
if($result['code'] == 401) {
return [
'error' => 'unauthorized',
'error_description' => 'The URL returned "HTTP 401 Unauthorized"',
'url' => $result['url'],
'code' => $result['code']
];
}
if($result['code'] == 403) {
return [
'error' => 'forbidden',
'error_description' => 'The URL returned "HTTP 403 Forbidden"',
'url' => $result['url'],
'code' => $result['code']
];
}
// If the original URL had a fragment, include it in the final URL
if(($fragment=parse_url($url, PHP_URL_FRAGMENT)) && !parse_url($result['url'], PHP_URL_FRAGMENT)) {
$result['url'] .= '#'.$fragment;
}
return [
'url' => $result['url'],
'body' => $result['body'],
'code' => $result['code'],
];
}
private function _fetch_tweet($url, $opts) {
$fields = ['twitter_api_key','twitter_api_secret','twitter_access_token','twitter_access_token_secret'];
$creds = [];
foreach($fields as $f) {
if(isset($opts[$f]))
$creds[$f] = $opts[$f];
}
if(count($creds) < 4) {
return [
'error_code' => 400,
'error' => 'missing_parameters',
'error_description' => 'All 4 Twitter credentials must be included in the request'
];
}
$tweet = Formats\Twitter::fetch($url, $creds);
if(!$tweet) {
return [
'error' => 'twitter_error',
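// Twitter::fetch() returns false on failure, so no exception is available here
'error_description' => 'The tweet could not be fetched from the Twitter API'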
];
}
return [
'url' => $url,
'body' => $tweet,
'code' => 200,
];
}
private function _fetch_github($url, $opts) {
$fields = ['github_access_token'];
$creds = [];
foreach($fields as $f) {
if(isset($opts[$f]))
$creds[$f] = $opts[$f];
}
return Formats\GitHub::fetch($this->http, $url, $creds);
}
}
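Review note: the "basic URL validation" at the top of `Fetcher::fetch` can be read as a small pure function, which may also make it easier to unit test. The sketch below is only an illustration of that step. The name `validate_fetch_url` is hypothetical, and returning `null` for a valid URL is my own convention, not something this PR defines.

```php
<?php
// Hypothetical helper: returns an error array for unsupported URLs,
// or null when the URL looks fetchable.
function validate_fetch_url(string $url) {
  $scheme = parse_url($url, PHP_URL_SCHEME);
  // Strict comparison so that null/false (no scheme, unparseable URL) fail.
  if (!in_array($scheme, ['http', 'https'], true)) {
    return [
      'error_code' => 400,
      'error' => 'invalid_url',
      'error_description' => 'Only http and https URLs are supported',
    ];
  }
  if (!parse_url($url, PHP_URL_HOST)) {
    return [
      'error_code' => 400,
      'error' => 'invalid_url',
      'error_description' => 'The URL provided was not valid',
    ];
  }
  return null;
}
```

Both failure branches use the same `invalid_url` error key, as in the diff, so callers only need to inspect `error_description` when they want to tell the two cases apart.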

+ 36
- 0
lib/XRay/Formats/Format.php

@ -0,0 +1,36 @@
<?php
namespace p3k\XRay\Formats;
use DOMDocument, DOMXPath;
interface iFormat {
public static function matches_host($url);
public static function matches($url);
}
abstract class Format implements iFormat {
protected static function _unknown() {
return [
'data' => [
'type' => 'unknown'
]
];
}
protected static function _loadHTML($html) {
$doc = new DOMDocument();
@$doc->loadHTML($html);
if(!$doc) {
return [null, null];
}
$xpath = new DOMXPath($doc);
return [$doc, $xpath];
}
}

+ 166
- 0
lib/XRay/Formats/GitHub.php

@ -0,0 +1,166 @@
<?php
namespace p3k\XRay\Formats;
use DateTime, DateTimeZone;
use Config;
use cebe\markdown\GithubMarkdown;
class GitHub extends Format {
public static function matches_host($url) {
$host = parse_url($url, PHP_URL_HOST);
return $host == 'github.com';
}
public static function matches($url) {
return preg_match('~https://github.com/([^/]+)/([^/]+)/pull/(\d+)$~', $url, $match)
|| preg_match('~https://github.com/([^/]+)/([^/]+)/issues/(\d+)$~', $url, $match)
|| preg_match('~https://github.com/([^/]+)/([^/]+)$~', $url, $match)
|| preg_match('~https://github.com/([^/]+)/([^/]+)/issues/(\d+)#issuecomment-(\d+)~', $url, $match);
}
private static function extract_url_parts($url) {
$response = false;
if(preg_match('~https://github.com/([^/]+)/([^/]+)/pull/(\d+)$~', $url, $match)) {
$response = [];
$response['type'] = 'pull';
$response['org'] = $match[1];
$response['repo'] = $match[2];
$response['pull'] = $match[3];
$response['apiurl'] = 'https://api.github.com/repos/'.$response['org'].'/'.$response['repo'].'/pulls/'.$response['pull'];
} elseif(preg_match('~https://github.com/([^/]+)/([^/]+)/issues/(\d+)$~', $url, $match)) {
$response = [];
$response['type'] = 'issue';
$response['org'] = $match[1];
$response['repo'] = $match[2];
$response['issue'] = $match[3];
$response['apiurl'] = 'https://api.github.com/repos/'.$response['org'].'/'.$response['repo'].'/issues/'.$response['issue'];
} elseif(preg_match('~https://github.com/([^/]+)/([^/]+)$~', $url, $match)) {
$response = [];
$response['type'] = 'repo';
$response['org'] = $match[1];
$response['repo'] = $match[2];
$response['apiurl'] = 'https://api.github.com/repos/'.$response['org'].'/'.$response['repo'];
} elseif(preg_match('~https://github.com/([^/]+)/([^/]+)/issues/(\d+)#issuecomment-(\d+)~', $url, $match)) {
$response = [];
$response['type'] = 'comment';
$response['org'] = $match[1];
$response['repo'] = $match[2];
$response['issue'] = $match[3];
$response['comment'] = $match[4];
$response['apiurl'] = 'https://api.github.com/repos/'.$response['org'].'/'.$response['repo'].'/issues/comments/'.$response['comment'];
}
return $response;
}
public static function fetch($http, $url, $creds) {
$parts = self::extract_url_parts($url);
if(!$parts) {
return [
'error' => 'unsupported_url',
'error_description' => 'This GitHub URL is not supported',
'error_code' => 400,
];
}
$headers = [];
if(isset($creds['github_access_token'])) {
$headers[] = 'Authorization: Bearer ' . $creds['github_access_token'];
}
$response = $http->get($parts['apiurl'], $headers);
if($response['code'] != 200) {
return [
'error' => 'github_error',
'error_description' => $response['body'],
'code' => $response['code'],
];
}
return [
'url' => $url,
'body' => $response['body'],
'code' => $response['code'],
];
}
public static function parse($json, $url) {
$data = @json_decode($json, true);
if(!$data)
return self::_unknown();
$parts = self::extract_url_parts($url);
if(!$parts)
return self::_unknown();
// Start building the h-entry
$entry = array(
'type' => ($parts['type'] == 'repo' ? 'repo' : 'entry'),
'url' => $url,
'author' => [
'type' => 'card',
'name' => null,
'photo' => null,
'url' => null
]
);
if($parts['type'] == 'repo')
$authorkey = 'owner';
else
$authorkey = 'user';
$entry['author']['name'] = $data[$authorkey]['login'];
$entry['author']['photo'] = $data[$authorkey]['avatar_url'];
$entry['author']['url'] = $data[$authorkey]['html_url'];
if($parts['type'] == 'pull') {
$entry['name'] = '#' . $parts['pull'] . ' ' . $data['title'];
} elseif($parts['type'] == 'issue') {
$entry['name'] = '#' . $parts['issue'] . ' ' . $data['title'];
} elseif($parts['type'] == 'repo') {
$entry['name'] = $data['name'];
}
if($parts['type'] == 'repo') {
if(!empty($data['description']))
$entry['summary'] = $data['description'];
}
if($parts['type'] != 'repo' && !empty($data['body'])) {
$parser = new GithubMarkdown();
$entry['content'] = [
'text' => $data['body'],
'html' => $parser->parse($data['body'])
];
}
if($parts['type'] == 'comment') {
$entry['in-reply-to'] = ['https://github.com/'.$parts['org'].'/'.$parts['repo'].'/issues/'.$parts['issue']];
}
if(!empty($data['labels'])) {
$entry['category'] = array_map(function($l){
return $l['name'];
}, $data['labels']);
}
$entry['published'] = $data['created_at'];
return [
'data' => $entry,
'original' => $json
];
}
}
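Review note: `extract_url_parts` above maps four kinds of github.com URLs onto GitHub API endpoints. A condensed sketch of that mapping is below, mainly to show the resulting API URLs side by side. The function name `github_api_url` is hypothetical. Unlike the original, this sketch escapes the dot in `github.com` and anchors the patterns at the start of the string, which are small tightenings I am assuming are harmless.

```php
<?php
// Hypothetical helper: map a github.com HTML URL to its type and API URL,
// or return false when the URL is not one of the supported shapes.
function github_api_url(string $url) {
  $api = 'https://api.github.com/repos/';
  $repo = '~^https://github\.com/([^/]+)/([^/]+)';

  if (preg_match($repo . '/pull/(\d+)$~', $url, $m)) {
    return ['type' => 'pull', 'apiurl' => $api . $m[1] . '/' . $m[2] . '/pulls/' . $m[3]];
  }
  if (preg_match($repo . '/issues/(\d+)$~', $url, $m)) {
    return ['type' => 'issue', 'apiurl' => $api . $m[1] . '/' . $m[2] . '/issues/' . $m[3]];
  }
  if (preg_match($repo . '/issues/(\d+)#issuecomment-(\d+)$~', $url, $m)) {
    // Comments are addressed by comment ID only, not by issue number.
    return ['type' => 'comment', 'apiurl' => $api . $m[1] . '/' . $m[2] . '/issues/comments/' . $m[4]];
  }
  if (preg_match($repo . '$~', $url, $m)) {
    return ['type' => 'repo', 'apiurl' => $api . $m[1] . '/' . $m[2]];
  }
  return false;
}
```

Note that pull requests use the plural `pulls` path in the API even though the HTML URL uses `pull`.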

+ 132
- 0
lib/XRay/Formats/HTML.php

@ -0,0 +1,132 @@
<?php
namespace p3k\XRay\Formats;
use HTMLPurifier, HTMLPurifier_Config;
use DOMDocument, DOMXPath;
use p3k\XRay\Formats;
class HTML extends Format {
public static function matches_host($url) { return true; }
public static function matches($url) { return true; }
public static function parse($http, $html, $url, $opts=[]) {
$result = [
'data' => [
'type' => 'unknown',
],
'url' => $url,
];
// attempt to parse the page as HTML
$doc = new DOMDocument();
@$doc->loadHTML(self::toHtmlEntities($html));
if(!$doc) {
return [
'error' => 'invalid_content',
'error_description' => 'The document could not be parsed as HTML'
];
}
$xpath = new DOMXPath($doc);
// Check for meta http equiv and replace the status code if present
foreach($xpath->query('//meta[translate(@http-equiv,\'STATUS\',\'status\')=\'status\']') as $el) {
$equivStatus = ''.$el->getAttribute('content');
if($equivStatus && is_string($equivStatus)) {
if(preg_match('/^(\d+)/', $equivStatus, $match)) {
$result['code'] = (int)$match[1];
}
}
}
// If a target parameter was provided, make sure a link to it exists on the page
if(isset($opts['target'])) {
$target = $opts['target'];
$found = [];
if($target) {
self::xPathFindNodeWithAttribute($xpath, 'a', 'href', function($u) use($target, &$found){
if($u == $target) {
$found[$u] = null;
}
});
self::xPathFindNodeWithAttribute($xpath, 'img', 'src', function($u) use($target, &$found){
if($u == $target) {
$found[$u] = null;
}
});
self::xPathFindNodeWithAttribute($xpath, 'video', 'src', function($u) use($target, &$found){
if($u == $target) {
$found[$u] = null;
}
});
self::xPathFindNodeWithAttribute($xpath, 'audio', 'src', function($u) use($target, &$found){
if($u == $target) {
$found[$u] = null;
}
});
}
if(!$found) {
return [
'error' => 'no_link_found',
'error_description' => 'The source document does not have a link to the target URL',
'code' => isset($result['code']) ? $result['code'] : 200,
'url' => $url
];
}
}
// If the URL has a fragment ID, find the DOM starting at that node and parse it instead
$fragment = parse_url($url, PHP_URL_FRAGMENT);
if($fragment) {
$fragElement = self::xPathGetElementById($xpath, $fragment);
if($fragElement) {
$html = $doc->saveHTML($fragElement);
$foundFragment = true;
} else {
$foundFragment = false;
}
}
// Now start pulling in the data from the page. Start by looking for microformats2
$mf2 = \mf2\Parse($html, $url);
if($mf2 && count($mf2['items']) > 0) {
$data = Formats\Mf2::parse($mf2, $url, $http);
$result = array_merge($result, $data);
if($data) {
if($fragment) {
$result['info'] = [
'found_fragment' => $foundFragment
];
}
$result['original'] = $html;
$result['url'] = $url; // this will be the effective URL after following redirects
}
}
return $result;
}
private static function toHtmlEntities($input) {
return mb_convert_encoding($input, 'HTML-ENTITIES', mb_detect_encoding($input));
}
private static function xPathFindNodeWithAttribute($xpath, $node, $attr, $callback) {
foreach($xpath->query('//'.$node.'[@'.$attr.']') as $el) {
$v = $el->getAttribute($attr);
$callback($v);
}
}
private static function xPathGetElementById($xpath, $id) {
$element = null;
foreach($xpath->query("//*[@id='$id']") as $el) {
$element = $el;
}
return $element;
}
}
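Review note: the `<meta http-equiv="Status">` override in `HTML::parse` above is easy to miss, so here is a small standalone sketch of just that step. The function name `meta_status_code` is hypothetical. It uses `libxml_use_internal_errors()` instead of the `@` operator to silence parser warnings, which is my substitution and not what the PR does, and it requires PHP's DOM extension.

```php
<?php
// Hypothetical helper: return the status code declared via
// <meta http-equiv="Status" content="410 Gone">, or null if none is present.
function meta_status_code(string $html) {
  $previous = libxml_use_internal_errors(true); // collect warnings quietly
  $doc = new DOMDocument();
  $doc->loadHTML($html);
  libxml_clear_errors();
  libxml_use_internal_errors($previous);

  $xpath = new DOMXPath($doc);
  // translate() lowercases the letters of "STATUS" so the comparison is
  // case-insensitive in XPath 1.0.
  $query = "//meta[translate(@http-equiv,'STATUS','status')='status']";
  $code = null;
  foreach ($xpath->query($query) as $el) {
    if (preg_match('/^(\d+)/', $el->getAttribute('content'), $m)) {
      $code = (int)$m[1]; // later tags overwrite earlier ones, as in the original loop
    }
  }
  return $code;
}
```

This lets a statically hosted page signal, for example, that a post has been deleted even though the server itself responds with 200.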

lib/Formats/HTMLPurifier_AttrDef_HTML_Microformats2.php → lib/XRay/Formats/HTMLPurifier_AttrDef_HTML_Microformats2.php

@ -1,5 +1,5 @@
<?php
namespace XRay\Formats;
namespace p3k\XRay\Formats;
/**
* Allows Microformats2 classes but rejects any others

lib/Formats/Instagram.php → lib/XRay/Formats/Instagram.php

@ -1,18 +1,26 @@
<?php
namespace XRay\Formats;
namespace p3k\XRay\Formats;
use DOMDocument, DOMXPath;
use DateTime, DateTimeZone;
use Parse;
class Instagram {
class Instagram extends Format {
public static function parse($html, $url, $http) {
public static function matches_host($url) {
$host = parse_url($url, PHP_URL_HOST);
return in_array($host, ['www.instagram.com','instagram.com']);
}
public static function matches($url) {
return self::matches_host($url);
}
public static function parse($http, $html, $url) {
$photoData = self::_extractPhotoDataFromPhotoPage($html);
if(!$photoData)
return false;
return self::_unknown();
// Start building the h-entry
$entry = array(
@ -131,19 +139,18 @@ class Instagram {
$entry['published'] = $published->format('c');
$response = [
'data' => $entry
];
if(count($refs)) {
$response['refs'] = $refs;
$entry['refs'] = $refs;
}
return [$response, [
'photo' => $photoData,
'profiles' => $profiles,
'locations' => $locations
]];
return [
'data' => $entry,
'original' => json_encode([
'photo' => $photoData,
'profiles' => $profiles,
'locations' => $locations
])
];
}
private static function _buildHCardFromInstagramProfile($profile) {

lib/Formats/Mf2.php → lib/XRay/Formats/Mf2.php

@ -1,8 +1,7 @@
<?php
namespace XRay\Formats;
namespace p3k\XRay\Formats;
use HTMLPurifier, HTMLPurifier_Config;
use Parse;
class Mf2 {
@ -14,31 +13,31 @@ class Mf2 {
if(count($mf2['items']) == 1) {
$item = $mf2['items'][0];
if(in_array('h-entry', $item['type']) || in_array('h-cite', $item['type'])) {
Parse::debug("mf2:0: Recognized $url as an h-entry it is the only item on the page");
#Parse::debug("mf2:0: Recognized $url as an h-entry it is the only item on the page");
return self::parseAsHEntry($mf2, $item, $http);
}
if(in_array('h-event', $item['type'])) {
Parse::debug("mf2:0: Recognized $url as an h-event it is the only item on the page");
#Parse::debug("mf2:0: Recognized $url as an h-event it is the only item on the page");
return self::parseAsHEvent($mf2, $item, $http);
}
if(in_array('h-review', $item['type'])) {
Parse::debug("mf2:0: Recognized $url as an h-review it is the only item on the page");
#Parse::debug("mf2:0: Recognized $url as an h-review it is the only item on the page");
return self::parseAsHReview($mf2, $item, $http);
}
if(in_array('h-recipe', $item['type'])) {
Parse::debug("mf2:0: Recognized $url as an h-recipe it is the only item on the page");
#Parse::debug("mf2:0: Recognized $url as an h-recipe it is the only item on the page");
return self::parseAsHRecipe($mf2, $item, $http);
}
if(in_array('h-product', $item['type'])) {
Parse::debug("mf2:0: Recognized $url as an h-product it is the only item on the page");
#Parse::debug("mf2:0: Recognized $url as an h-product it is the only item on the page");
return self::parseAsHProduct($mf2, $item, $http);
}
if(in_array('h-feed', $item['type'])) {
Parse::debug("mf2:0: Recognized $url as an h-feed because it is the only item on the page");
#Parse::debug("mf2:0: Recognized $url as an h-feed because it is the only item on the page");
return self::parseAsHFeed($mf2, $http);
}
if(in_array('h-card', $item['type'])) {
Parse::debug("mf2:0: Recognized $url as an h-card it is the only item on the page");
#Parse::debug("mf2:0: Recognized $url as an h-card it is the only item on the page");
return self::parseAsHCard($item, $http, $url);
}
}
@ -48,9 +47,9 @@ class Mf2 {
foreach($mf2['items'] as $item) {
if(array_key_exists('url', $item['properties'])) {
$urls = $item['properties']['url'];
$urls = array_map('self::normalize_url', $urls);
$urls = array_map('\p3k\XRay\normalize_url', $urls);
if(in_array($url, $urls)) {
Parse::debug("mf2:1: Recognized $url as a permalink because an object on the page matched the URL of the request");
#Parse::debug("mf2:1: Recognized $url as a permalink because an object on the page matched the URL of the request");
if(in_array('h-card', $item['type'])) {
return self::parseAsHCard($item, $http, $url);
} elseif(in_array('h-entry', $item['type']) || in_array('h-cite', $item['type'])) {
@ -64,7 +63,7 @@ class Mf2 {
} elseif(in_array('h-product', $item['type'])) {
return self::parseAsHProduct($mf2, $item, $http);
} else {
Parse::debug('This object was not a recognized type.');
#Parse::debug('This object was not a recognized type.');
return false;
}
}
@ -77,7 +76,7 @@ class Mf2 {
foreach($mf2['items'] as $card) {
if(in_array('h-card', $card['type']) && array_key_exists('url', $card['properties'])) {
$urls = $card['properties']['url'];
$urls = array_map('self::normalize_url', $urls);
$urls = array_map('\p3k\XRay\normalize_url', $urls);
if(count(array_intersect($urls, $mf2['rels']['author'])) > 0) {
// There is an author h-card on this page
// Now look for the first h-* object other than an h-card and use that as the object
@ -106,7 +105,7 @@ class Mf2 {
if(count(array_filter($mf2['items'], function($item){
return in_array('h-entry', $item['type']);
})) > 1) {
Parse::debug("mf2:2: Recognized $url as an h-feed because there are more than one object on the page");
#Parse::debug("mf2:2: Recognized $url as an h-feed because there are more than one object on the page");
return self::parseAsHFeed($mf2, $http);
}
}
@ -114,7 +113,7 @@ class Mf2 {
// If the first item is an h-feed, parse as a feed
$first = $mf2['items'][0];
if(in_array('h-feed', $first['type'])) {
Parse::debug("mf2:3: Recognized $url as an h-feed because the first item is an h-feed");
#Parse::debug("mf2:3: Recognized $url as an h-feed because the first item is an h-feed");
return self::parseAsHFeed($mf2, $http);
}
@ -122,24 +121,24 @@ class Mf2 {
foreach($mf2['items'] as $item) {
// Otherwise check for a recognized h-entr* object
if(in_array('h-entry', $item['type']) || in_array('h-cite', $item['type'])) {
Parse::debug("mf2:6: $url is falling back to the first h-entry on the page");
#Parse::debug("mf2:6: $url is falling back to the first h-entry on the page");
return self::parseAsHEntry($mf2, $item, $http);
} elseif(in_array('h-event', $item['type'])) {
Parse::debug("mf2:6: $url is falling back to the first h-event on the page");
#Parse::debug("mf2:6: $url is falling back to the first h-event on the page");
return self::parseAsHEvent($mf2, $item, $http);
} elseif(in_array('h-review', $item['type'])) {
Parse::debug("mf2:6: $url is falling back to the first h-review on the page");
#Parse::debug("mf2:6: $url is falling back to the first h-review on the page");
return self::parseAsHReview($mf2, $item, $http);
} elseif(in_array('h-recipe', $item['type'])) {
Parse::debug("mf2:6: $url is falling back to the first h-recipe on the page");
#Parse::debug("mf2:6: $url is falling back to the first h-recipe on the page");
return self::parseAsHRecipe($mf2, $item, $http);
} elseif(in_array('h-product', $item['type'])) {
Parse::debug("mf2:6: $url is falling back to the first h-product on the page");
#Parse::debug("mf2:6: $url is falling back to the first h-product on the page");
return self::parseAsHProduct($mf2, $item, $http);
}
}
Parse::debug("mf2:E: No object at $url was recognized");
#Parse::debug("mf2:E: No object at $url was recognized");
return false;
}
@ -311,7 +310,7 @@ class Mf2 {
];
if(count($refs)) {
$response['refs'] = $refs;
$response['data']['refs'] = $refs;
}
return $response;
@ -345,7 +344,7 @@ class Mf2 {
];
if(count($refs)) {
$response['refs'] = $refs;
$response['data']['refs'] = $refs;
}
return $response;
@ -376,7 +375,7 @@ class Mf2 {
];
if(count($refs)) {
$response['refs'] = $refs;
$response['data']['refs'] = $refs;
}
return $response;
@ -403,7 +402,7 @@ class Mf2 {
];
if(count($refs)) {
$response['refs'] = $refs;
$response['data']['refs'] = $refs;
}
return $response;
@ -457,7 +456,7 @@ class Mf2 {
];
if(count($refs)) {
$response['refs'] = $refs;
$response['data']['refs'] = $refs;
}
return $response;
@ -496,7 +495,7 @@ class Mf2 {
$found = false;
foreach($item['properties']['url'] as $url) {
if(self::isURL($url)) {
$url = self::normalize_url($url);
$url = \p3k\XRay\normalize_url($url);
if($url == $authorURL) {
$data['url'] = $url;
$found = true;
@ -723,25 +722,4 @@ class Mf2 {
}
return \mf2\Parse($result['body'], $url);
}
private static function normalize_url($url) {
$parts = parse_url($url);
if(empty($parts['path']))
$parts['path'] = '/';
$parts['host'] = strtolower($parts['host']);
return self::build_url($parts);
}
private static function build_url($parsed_url) {
$scheme = isset($parsed_url['scheme']) ? $parsed_url['scheme'] . '://' : '';
$host = isset($parsed_url['host']) ? $parsed_url['host'] : '';
$port = isset($parsed_url['port']) ? ':' . $parsed_url['port'] : '';
$user = isset($parsed_url['user']) ? $parsed_url['user'] : '';
$pass = isset($parsed_url['pass']) ? ':' . $parsed_url['pass'] : '';
$pass = ($user || $pass) ? "$pass@" : '';
$path = isset($parsed_url['path']) ? $parsed_url['path'] : '';
$query = isset($parsed_url['query']) ? '?' . $parsed_url['query'] : '';
$fragment = isset($parsed_url['fragment']) ? '#' . $parsed_url['fragment'] : '';
return "$scheme$user$pass$host$port$path$query$fragment";
}
}

lib/Formats/Twitter.php → lib/XRay/Formats/Twitter.php

@ -1,34 +1,54 @@
<?php
namespace XRay\Formats;
namespace p3k\XRay\Formats;
use DateTime, DateTimeZone;
use Parse;
class Twitter {
class Twitter extends Format {
public static function parse($url, $tweet_id, $creds, $json=null) {
public static function matches_host($url) {
$host = parse_url($url, PHP_URL_HOST);
return in_array($host, ['mobile.twitter.com','twitter.com','www.twitter.com','twtr.io']);
}
public static function matches($url) {
if(preg_match('/https?:\/\/(?:mobile\.twitter\.com|twitter\.com|twtr\.io)\/(?:[a-z0-9_\/!#]+statuse?s?\/([0-9]+)|([a-zA-Z0-9_]+))/i', $url, $match))
return $match;
else
return false;
}
public static function fetch($url, $creds) {
if(!($match = self::matches($url))) {
return false;
}
$tweet_id = $match[1];
$host = parse_url($url, PHP_URL_HOST);
if($host == 'twtr.io') {
$tweet_id = self::b60to10($tweet_id);
}
if($json) {
if(is_string($json))
$tweet = json_decode($json);
else
$tweet = $json;
} else {
$twitter = new \Twitter($creds['twitter_api_key'], $creds['twitter_api_secret'], $creds['twitter_access_token'], $creds['twitter_access_token_secret']);
try {
$tweet = $twitter->request('statuses/show/'.$tweet_id, 'GET', ['tweet_mode'=>'extended']);
} catch(\TwitterException $e) {
return [false, false];
}
$twitter = new \Twitter($creds['twitter_api_key'], $creds['twitter_api_secret'], $creds['twitter_access_token'], $creds['twitter_access_token_secret']);
try {
$tweet = $twitter->request('statuses/show/'.$tweet_id, 'GET', ['tweet_mode'=>'extended']);
} catch(\TwitterException $e) {
return false;
}
if(!$tweet)
return [false, false];
return $tweet;
}
public static function parse($json, $url) {
if(is_string($json))
$tweet = json_decode($json);
else
$tweet = $json;
if(!$tweet) {
return self::_unknown();
}
$entry = array(
'type' => 'entry',
@ -56,9 +76,9 @@ class Twitter {
$repostOf = 'https://twitter.com/' . $reposted->user->screen_name . '/status/' . $reposted->id_str;
$entry['repost-of'] = $repostOf;
list($repostedEntry) = self::parse($repostOf, $reposted->id_str, null, $reposted);
if(isset($repostedEntry['refs'])) {
foreach($repostedEntry['refs'] as $k=>$v) {
$repostedEntry = self::parse($reposted, $repostOf);
if(isset($repostedEntry['data']['refs'])) {
foreach($repostedEntry['data']['refs'] as $k=>$v) {
$refs[$k] = $v;
}
}
@ -141,28 +161,27 @@ class Twitter {
// Quoted Status
if(property_exists($tweet, 'quoted_status')) {
$quoteOf = 'https://twitter.com/' . $tweet->quoted_status->user->screen_name . '/status/' . $tweet->quoted_status_id_str;
list($quoted) = self::parse($quoteOf, $tweet->quoted_status_id_str, null, $tweet->quoted_status);
if(isset($quoted['refs'])) {
foreach($quoted['refs'] as $k=>$v) {
$quotedEntry = self::parse($tweet->quoted_status, $quoteOf);
if(isset($quotedEntry['data']['refs'])) {
foreach($quotedEntry['data']['refs'] as $k=>$v) {
$refs[$k] = $v;
}
}
$refs[$quoteOf] = $quoted['data'];
$refs[$quoteOf] = $quotedEntry['data'];
}
if($author = self::_buildHCardFromTwitterProfile($tweet->user)) {
$entry['author'] = $author;
}
$response = [
'data' => $entry
];
if(count($refs)) {
$response['refs'] = $refs;
$entry['refs'] = $refs;
}
return [$response, $tweet];
return [
'data' => $entry,
'original' => $tweet,
];
}
private static function _buildHCardFromTwitterProfile($profile) {

lib/Formats/XKCD.php → lib/XRay/Formats/XKCD.php

@ -1,11 +1,19 @@
<?php
namespace XRay\Formats;
namespace p3k\XRay\Formats;
use DOMDocument, DOMXPath;
use DateTime, DateTimeZone;
use Parse, Config;
use Config;
class XKCD {
class XKCD extends Format {
public static function matches_host($url) {
$host = parse_url($url, PHP_URL_HOST);
return $host == 'xkcd.com';
}
public static function matches($url) {
return self::matches_host($url) && parse_url($url, PHP_URL_PATH) != '/';
}
public static function parse($html, $url) {
list($doc, $xpath) = self::_loadHTML($html);
@ -56,25 +64,4 @@ class XKCD {
return $response;
}
private static function _unknown() {
return [
'data' => [
'type' => 'unknown'
]
];
}
private static function _loadHTML($html) {
$doc = new DOMDocument();
@$doc->loadHTML($html);
if(!$doc) {
return [null, null];
}
$xpath = new DOMXPath($doc);
return [$doc, $xpath];
}
}
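
Reviewer note: the new `matches_host()` and `matches()` methods decide whether a URL is handled by the XKCD format. The sketch below repeats the same two `parse_url()` checks as free functions so they can be run in isolation. The `sketch_` function names are assumptions for illustration only.

```php
<?php
// Same two checks as in the diff, written as free functions for illustration.
function sketch_matches_host(string $url): bool {
  return parse_url($url, PHP_URL_HOST) == 'xkcd.com';
}

function sketch_matches(string $url): bool {
  return sketch_matches_host($url) && parse_url($url, PHP_URL_PATH) != '/';
}

var_dump(sketch_matches('https://xkcd.com/1782/'));     // bool(true)
var_dump(sketch_matches('https://xkcd.com/'));          // bool(false): home page
var_dump(sketch_matches('https://www.xkcd.com/1782/')); // bool(false): host differs
```

One thing that may be worth checking: a URL with no path at all, such as `https://xkcd.com`, has a null path component, so it passes the second check unless the URL has been normalized to end in `/` first.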

+ 41
- 0
lib/XRay/Parser.php

@ -0,0 +1,41 @@
<?php
namespace p3k\XRay;
use p3k\XRay\Formats;
class Parser {
private $http;
public function __construct($http) {
$this->http = $http;
}
public function parse($body, $url, $opts=[]) {
if(isset($opts['timeout']))
$this->http->set_timeout($opts['timeout']);
if(isset($opts['max_redirects']))
$this->http->set_max_redirects($opts['max_redirects']);
// Check if the URL matches a special parser
if(Formats\Instagram::matches($url)) {
return Formats\Instagram::parse($this->http, $body, $url);
}
if(Formats\GitHub::matches($url)) {
return Formats\GitHub::parse($body, $url);
}
if(Formats\Twitter::matches($url)) {
return Formats\Twitter::parse($body, $url);
}
if(Formats\XKCD::matches($url)) {
return Formats\XKCD::parse($body, $url);
}
// No special parsers matched, parse for Microformats now
return Formats\HTML::parse($this->http, $body, $url, $opts);
}
}
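
Reviewer note: `Parser::parse()` asks each special-case format whether it matches the URL and falls through to the HTML parser when none does. The sketch below shows that dispatch pattern with stand-in classes. It uses a loop over class names rather than the explicit `if` chain in the diff, which is a deliberate substitution and not how the library does it. `SketchFormat`, `SketchComic`, `SketchFallback`, `sketch_dispatch`, and the `.example` hosts are all hypothetical.

```php
<?php
// Hypothetical stand-ins; the real format classes live in p3k\XRay\Formats.
interface SketchFormat {
  public static function matches(string $url): bool;
  public static function parse(string $body, string $url): array;
}

final class SketchComic implements SketchFormat {
  public static function matches(string $url): bool {
    return parse_url($url, PHP_URL_HOST) === 'comic.example';
  }
  public static function parse(string $body, string $url): array {
    return ['data' => ['type' => 'entry', 'url' => $url, 'via' => 'comic']];
  }
}

final class SketchFallback {
  public static function parse(string $body, string $url): array {
    return ['data' => ['type' => 'unknown', 'url' => $url, 'via' => 'fallback']];
  }
}

// Try each special-case format in order; use the fallback if none matches.
function sketch_dispatch(string $body, string $url, array $formats): array {
  foreach ($formats as $format) {
    if ($format::matches($url)) {
      return $format::parse($body, $url);
    }
  }
  return SketchFallback::parse($body, $url);
}

$formats = [SketchComic::class];
echo sketch_dispatch('', 'https://comic.example/1', $formats)['data']['via'], "\n";
echo sketch_dispatch('', 'https://other.example/', $formats)['data']['via'], "\n";
```

The explicit chain in the diff has one advantage over a loop: `Instagram::parse()` and `HTML::parse()` take the HTTP client as an extra argument, which a uniform loop would need to account for.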

+ 63
- 0
lib/XRay/Rels.php

@ -0,0 +1,63 @@
<?php
namespace p3k\XRay;
class Rels {
private $http;
public function __construct($http) {
$this->http = $http;
}
public function parse($url, $opts=[]) {
if(isset($opts['timeout']))
$this->http->set_timeout($opts['timeout']);
if(isset($opts['max_redirects']))
$this->http->set_max_redirects($opts['max_redirects']);
$scheme = parse_url($url, PHP_URL_SCHEME);
if(!in_array($scheme, ['http','https'])) {
return [
'error' => 'invalid_url',
'error_description' => 'Only http and https URLs are supported'
];
}
$host = parse_url($url, PHP_URL_HOST);
if(!$host) {
return [
'error' => 'invalid_url',
'error_description' => 'The URL provided was not valid'
];
}
$url = normalize_url($url);
$result = $this->http->get($url);
$html = $result['body'];
$mf2 = \mf2\Parse($html, $result['url']);
$rels = $result['rels'];
if(isset($mf2['rels'])) {
$rels = array_merge($rels, $mf2['rels']);
}
// Resolve all relative URLs
foreach($rels as $rel=>$values) {
foreach($values as $i=>$value) {
$value = \mf2\resolveUrl($result['url'], $value);
$rels[$rel][$i] = $value;
}
}
if(count($rels) == 0)
$rels = new \StdClass;
return [
'url' => $result['url'],
'code' => $result['code'],
'rels' => $rels
];
}
}
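
Reviewer note: two details of `Rels::parse()` depend on standard PHP behavior that is easy to miss. First, `array_merge()` on string-keyed arrays replaces the earlier value when both arrays share a key. Second, an empty PHP array and an empty `stdClass` encode differently as JSON, which is presumably why the empty case is swapped for `new \StdClass`. The sketch below demonstrates both with made-up rel values; the variable names and URLs are assumptions, not data from the library.

```php
<?php
// Hypothetical rel maps: one reported by the HTTP client, one from the parsed document.
$clientRels = ['webmention' => ['/client-endpoint'], 'hub' => ['https://hub.example/']];
$documentRels = ['webmention' => ['/document-endpoint'], 'me' => ['https://me.example/']];

// array_merge with string keys: the later array's value replaces the earlier one.
$merged = array_merge($clientRels, $documentRels);
echo implode(',', $merged['webmention']), "\n";   // /document-endpoint

// array_merge_recursive keeps both lists for a shared rel name.
$combined = array_merge_recursive($clientRels, $documentRels);
echo implode(',', $combined['webmention']), "\n"; // /client-endpoint,/document-endpoint

// An empty PHP array encodes as a JSON list; stdClass encodes as a JSON object.
echo json_encode(['rels' => []]), "\n";           // {"rels":[]}
echo json_encode(['rels' => new stdClass]), "\n"; // {"rels":{}}
```

If the intent is for a rel name present in both sources to list every value, the first behavior may be worth a second look.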

+ 2
- 1
lib/helpers.php

@ -1,4 +1,5 @@
<?php
namespace p3k\XRay;
function view($template, $data=[]) {
global $templates;
@ -34,4 +35,4 @@ function should_follow_redirects($url) {
} else {
return true;
}
}
}

BIN
public/images/xkcd.png

Width: 352  |  Height: 352  |  Size: 37 KiB

+ 1
- 1
tests/AuthorTest.php

@ -8,7 +8,7 @@ class AuthorTest extends PHPUnit_Framework_TestCase {
public function setUp() {
$this->client = new Parse();
$this->client->http = new p3k\HTTPTest(dirname(__FILE__).'/data/');
$this->client->http = new p3k\HTTP\Test(dirname(__FILE__).'/data/');
$this->client->mc = null;
}

+ 1
- 1
tests/FeedTest.php

@ -8,7 +8,7 @@ class FeedTest extends PHPUnit_Framework_TestCase {
public function setUp() {
$this->client = new Parse();
$this->client->http = new p3k\HTTPTest(dirname(__FILE__).'/data/');
$this->client->http = new p3k\HTTP\Test(dirname(__FILE__).'/data/');
$this->client->mc = null;
}

+ 1
- 1
tests/FetchTest.php

@ -8,7 +8,7 @@ class FetchTest extends PHPUnit_Framework_TestCase {
public function setUp() {
$this->client = new Parse();
$this->client->http = new p3k\HTTPTest(dirname(__FILE__).'/data/');
$this->client->http = new p3k\HTTP\Test(dirname(__FILE__).'/data/');
$this->client->mc = null;
}

+ 1
- 1
tests/GitHubTest.php

@ -8,7 +8,7 @@ class GitHubTest extends PHPUnit_Framework_TestCase {
public function setUp() {
$this->client = new Parse();
$this->client->http = new p3k\HTTPTest(dirname(__FILE__).'/data/');
$this->client->http = new p3k\HTTP\Test(dirname(__FILE__).'/data/');
$this->client->mc = null;
}

+ 8
- 2
tests/HelpersTest.php

@ -3,14 +3,20 @@ class HelpersTest extends PHPUnit_Framework_TestCase {
public function testLowercaseHostname() {
$url = 'http://Example.com/';
$result = normalize_url($url);
$result = p3k\XRay\normalize_url($url);
$this->assertEquals('http://example.com/', $result);
}
public function testAddsSlashToBareDomain() {
$url = 'http://example.com';
$result = normalize_url($url);
$result = p3k\XRay\normalize_url($url);
$this->assertEquals('http://example.com/', $result);
}
public function testDoesNotModify() {
$url = 'https://example.com/';
$result = p3k\XRay\normalize_url($url);
$this->assertEquals('https://example.com/', $result);
}
}
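
Reviewer note: the three tests above pin down the observable behavior of `normalize_url()`: the hostname is lowercased, a bare domain gains a trailing slash, and an already normalized URL is returned unchanged. The helper's body is not part of this hunk, so the function below is a hypothetical re-implementation that satisfies those three cases and should not be read as the library's code. It also drops any user or password component, which the tests do not cover.

```php
<?php
// Hypothetical re-implementation consistent with the three tests; not the library's code.
function sketch_normalize_url(string $url): string {
  $parts = parse_url($url);
  if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
    return $url; // leave anything unparseable untouched
  }
  $out = $parts['scheme'] . '://' . strtolower($parts['host']);
  if (isset($parts['port'])) {
    $out .= ':' . $parts['port'];
  }
  $out .= $parts['path'] ?? '/'; // bare domain gets a trailing slash
  if (isset($parts['query'])) {
    $out .= '?' . $parts['query'];
  }
  if (isset($parts['fragment'])) {
    $out .= '#' . $parts['fragment'];
  }
  return $out;
}

echo sketch_normalize_url('http://Example.com/'), "\n";  // http://example.com/
echo sketch_normalize_url('http://example.com'), "\n";   // http://example.com/
echo sketch_normalize_url('https://example.com/'), "\n"; // https://example.com/
```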

+ 5
- 5
tests/InstagramTest.php

@ -8,7 +8,7 @@ class InstagramTest extends PHPUnit_Framework_TestCase {
public function setUp() {
$this->client = new Parse();
$this->client->http = new p3k\HTTPTest(dirname(__FILE__).'/data/');
$this->client->http = new p3k\HTTP\Test(dirname(__FILE__).'/data/');
$this->client->mc = null;
}
@ -71,8 +71,8 @@ class InstagramTest extends PHPUnit_Framework_TestCase {
$this->assertEquals(2, count($data['data']['category']));
$this->assertContains('http://tinyletter.com/kmikeym', $data['data']['category']);
$this->assertArrayHasKey('http://tinyletter.com/kmikeym', $data['refs']);
$this->assertEquals(['type'=>'card','name'=>'Mike Merrill','url'=>'http://tinyletter.com/kmikeym','photo'=>'https://instagram.fsjc1-3.fna.fbcdn.net/t51.2885-19/s320x320/12627953_686238411518831_1544976311_a.jpg'], $data['refs']['http://tinyletter.com/kmikeym']);
$this->assertArrayHasKey('http://tinyletter.com/kmikeym', $data['data']['refs']);
$this->assertEquals(['type'=>'card','name'=>'Mike Merrill','url'=>'http://tinyletter.com/kmikeym','photo'=>'https://instagram.fsjc1-3.fna.fbcdn.net/t51.2885-19/s320x320/12627953_686238411518831_1544976311_a.jpg'], $data['data']['refs']['http://tinyletter.com/kmikeym']);
}
public function testInstagramPhotoWithVenue() {
@ -86,8 +86,8 @@ class InstagramTest extends PHPUnit_Framework_TestCase {
$this->assertEquals(1, count($data['data']['location']));
$this->assertContains('https://www.instagram.com/explore/locations/109284789535230/', $data['data']['location']);
$this->assertArrayHasKey('https://www.instagram.com/explore/locations/109284789535230/', $data['refs']);
$venue = $data['refs']['https://www.instagram.com/explore/locations/109284789535230/'];
$this->assertArrayHasKey('https://www.instagram.com/explore/locations/109284789535230/', $data['data']['refs']);
$venue = $data['data']['refs']['https://www.instagram.com/explore/locations/109284789535230/'];
$this->assertEquals('XOXO Outpost', $venue['name']);
$this->assertEquals('45.5261002', $venue['latitude']);
$this->assertEquals('-122.6558081', $venue['longitude']);

+ 27
- 26
tests/ParseTest.php

@ -8,7 +8,7 @@ class ParseTest extends PHPUnit_Framework_TestCase {
public function setUp() {
$this->client = new Parse();
$this->client->http = new p3k\HTTPTest(dirname(__FILE__).'/data/');
$this->client->http = new p3k\HTTP\Test(dirname(__FILE__).'/data/');
$this->client->mc = null;
}
@ -205,9 +205,9 @@ class ParseTest extends PHPUnit_Framework_TestCase {
$data = json_decode($body, true);
$this->assertEquals('entry', $data['data']['type']);
$this->assertEquals('http://example.com/100', $data['data']['in-reply-to'][0]);
$this->assertArrayHasKey('http://example.com/100', $data['refs']);
$this->assertEquals('Example Post', $data['refs']['http://example.com/100']['name']);
$this->assertEquals('http://example.com/100', $data['refs']['http://example.com/100']['url']);
$this->assertArrayHasKey('http://example.com/100', $data['data']['refs']);
$this->assertEquals('Example Post', $data['data']['refs']['http://example.com/100']['name']);
$this->assertEquals('http://example.com/100', $data['data']['refs']['http://example.com/100']['url']);
}
public function testPersonTagIsURL() {
@ -230,10 +230,10 @@ class ParseTest extends PHPUnit_Framework_TestCase {
$data = json_decode($body, true);
$this->assertEquals('entry', $data['data']['type']);
$this->assertEquals('http://alice.example.com/', $data['data']['category'][0]);
$this->assertArrayHasKey('http://alice.example.com/', $data['refs']);
$this->assertEquals('card', $data['refs']['http://alice.example.com/']['type']);
$this->assertEquals('http://alice.example.com/', $data['refs']['http://alice.example.com/']['url']);
$this->assertEquals('Alice', $data['refs']['http://alice.example.com/']['name']);
$this->assertArrayHasKey('http://alice.example.com/', $data['data']['refs']);
$this->assertEquals('card', $data['data']['refs']['http://alice.example.com/']['type']);
$this->assertEquals('http://alice.example.com/', $data['data']['refs']['http://alice.example.com/']['url']);
$this->assertEquals('Alice', $data['data']['refs']['http://alice.example.com/']['name']);
}
public function testSyndicationIsURL() {
@ -372,10 +372,10 @@ class ParseTest extends PHPUnit_Framework_TestCase {
$this->assertEquals($url, $data['data']['url']);
$this->assertEquals('2016-02-09T18:30', $data['data']['start']);
$this->assertEquals('2016-02-09T19:30', $data['data']['end']);
$this->assertArrayHasKey('http://source.example.com/venue', $data['refs']);
$this->assertEquals('card', $data['refs']['http://source.example.com/venue']['type']);
$this->assertEquals('http://source.example.com/venue', $data['refs']['http://source.example.com/venue']['url']);
$this->assertEquals('Venue', $data['refs']['http://source.example.com/venue']['name']);
$this->assertArrayHasKey('http://source.example.com/venue', $data['data']['refs']);
$this->assertEquals('card', $data['data']['refs']['http://source.example.com/venue']['type']);
$this->assertEquals('http://source.example.com/venue', $data['data']['refs']['http://source.example.com/venue']['url']);
$this->assertEquals('Venue', $data['data']['refs']['http://source.example.com/venue']['name']);
}
public function testMf2ReviewOfProduct() {
@ -395,10 +395,10 @@ class ParseTest extends PHPUnit_Framework_TestCase {
$this->assertContains('red', $data['data']['category']);
$this->assertContains('blue', $data['data']['category']);
$this->assertContains('http://product.example.com/', $data['data']['item']);
$this->assertArrayHasKey('http://product.example.com/', $data['refs']);
$this->assertEquals('product', $data['refs']['http://product.example.com/']['type']);
$this->assertEquals('The Reviewed Product', $data['refs']['http://product.example.com/']['name']);
$this->assertEquals('http://product.example.com/', $data['refs']['http://product.example.com/']['url']);
$this->assertArrayHasKey('http://product.example.com/', $data['data']['refs']);
$this->assertEquals('product', $data['data']['refs']['http://product.example.com/']['type']);
$this->assertEquals('The Reviewed Product', $data['data']['refs']['http://product.example.com/']['name']);
$this->assertEquals('http://product.example.com/', $data['data']['refs']['http://product.example.com/']['url']);
}
public function testMf2ReviewOfHCard() {
@ -416,10 +416,10 @@ class ParseTest extends PHPUnit_Framework_TestCase {
$this->assertEquals('5', $data['data']['best']);
$this->assertEquals('This is the full text of the review', $data['data']['content']['text']);
$this->assertContains('http://business.example.com/', $data['data']['item']);
$this->assertArrayHasKey('http://business.example.com/', $data['refs']);
$this->assertEquals('card', $data['refs']['http://business.example.com/']['type']);
$this->assertEquals('The Reviewed Business', $data['refs']['http://business.example.com/']['name']);
$this->assertEquals('http://business.example.com/', $data['refs']['http://business.example.com/']['url']);
$this->assertArrayHasKey('http://business.example.com/', $data['data']['refs']);
$this->assertEquals('card', $data['data']['refs']['http://business.example.com/']['type']);
$this->assertEquals('The Reviewed Business', $data['data']['refs']['http://business.example.com/']['name']);
$this->assertEquals('http://business.example.com/', $data['data']['refs']['http://business.example.com/']['url']);
}
public function testMf1Review() {
@ -438,10 +438,10 @@ class ParseTest extends PHPUnit_Framework_TestCase {
$this->assertEquals('5', $data['data']['best']);
$this->assertEquals('This is the full text of the review', $data['data']['content']['text']);
// $this->assertContains('http://product.example.com/', $data['data']['item']);
// $this->assertArrayHasKey('http://product.example.com/', $data['refs']);
// $this->assertEquals('product', $data['refs']['http://product.example.com/']['type']);
// $this->assertEquals('The Reviewed Product', $data['refs']['http://product.example.com/']['name']);
// $this->assertEquals('http://product.example.com/', $data['refs']['http://product.example.com/']['url']);
// $this->assertArrayHasKey('http://product.example.com/', $data['data']['refs']);
// $this->assertEquals('product', $data['data']['refs']['http://product.example.com/']['type']);
// $this->assertEquals('The Reviewed Product', $data['data']['refs']['http://product.example.com/']['name']);
// $this->assertEquals('http://product.example.com/', $data['data']['refs']['http://product.example.com/']['url']);
}
@ -473,8 +473,8 @@ class ParseTest extends PHPUnit_Framework_TestCase {
$this->assertEquals('entry', $data['data']['type']);
$this->assertEquals('https://www.facebook.com/555707837940351#tantek', $data['data']['url']);
$this->assertContains('https://www.facebook.com/tantek.celik', $data['data']['invitee']);
$this->assertArrayHasKey('https://www.facebook.com/tantek.celik', $data['refs']);
$this->assertEquals('Tantek Çelik', $data['refs']['https://www.facebook.com/tantek.celik']['name']);
$this->assertArrayHasKey('https://www.facebook.com/tantek.celik', $data['data']['refs']);
$this->assertEquals('Tantek Çelik', $data['data']['refs']['https://www.facebook.com/tantek.celik']['name']);
}
public function testEntryAtFragmentID() {
@ -485,6 +485,7 @@ class ParseTest extends PHPUnit_Framework_TestCase {
$this->assertEquals(200, $response->getStatusCode());
$data = json_decode($body, true);
$this->assertEquals('entry', $data['data']['type']);
$this->assertEquals('Comment text', $data['data']['content']['text']);
$this->assertEquals('http://source.example.com/fragment-id#comment-1000', $data['data']['url']);
$this->assertTrue($data['info']['found_fragment']);
}

+ 1
- 1
tests/SanitizeTest.php

@ -8,7 +8,7 @@ class SanitizeTest extends PHPUnit_Framework_TestCase {
public function setUp() {
$this->client = new Parse();
$this->client->http = new p3k\HTTPTest(dirname(__FILE__).'/data/');
$this->client->http = new p3k\HTTP\Test(dirname(__FILE__).'/data/');
$this->client->mc = null;
}

+ 1
- 1
tests/TokenTest.php

@ -8,7 +8,7 @@ class TokenTest extends PHPUnit_Framework_TestCase {
public function setUp() {
$this->client = new Token();
$this->client->http = new p3k\HTTPTest(dirname(__FILE__).'/data/');
$this->client->http = new p3k\HTTP\Test(dirname(__FILE__).'/data/');
}
private function token($params) {

+ 18
- 18
tests/TwitterTest.php

@ -29,7 +29,7 @@ class TwitterTest extends PHPUnit_Framework_TestCase {
public function testBasicProfileInfo() {
list($url, $json) = $this->loadTweet('818912506496229376');
$data = $this->parse(['url' => $url, 'json' => $json]);
$data = $this->parse(['url' => $url, 'body' => $json]);
$this->assertEquals('entry', $data['data']['type']);
$this->assertEquals('aaronpk dev', $data['data']['author']['name']);
@ -43,7 +43,7 @@ class TwitterTest extends PHPUnit_Framework_TestCase {
public function testProfileWithNonExpandedURL() {
list($url, $json) = $this->loadTweet('791704641046052864');
$data = $this->parse(['url' => $url, 'json' => $json]);
$data = $this->parse(['url' => $url, 'body' => $json]);
$this->assertEquals('http://agiletortoise.com', $data['data']['author']['url']);
}
@ -51,9 +51,9 @@ class TwitterTest extends PHPUnit_Framework_TestCase {
public function testBasicTestStuff() {
list($url, $json) = $this->loadTweet('818913630569664512');
$data = $this->parse(['url' => $url, 'json' => $json]);
$data = $this->parse(['url' => $url, 'body' => $json]);
$this->assertEquals(200, $data['code']);
$this->assertEquals(null, $data['code']); // no code is expected if we pass in the body
$this->assertEquals('https://twitter.com/pkdev/status/818913630569664512', $data['url']);
$this->assertEquals('entry', $data['data']['type']);
$this->assertEquals('A tweet with a URL https://indieweb.org/ #and #some #hashtags', $data['data']['content']['text']);
@ -67,14 +67,14 @@ class TwitterTest extends PHPUnit_Framework_TestCase {
public function testPositiveTimezone() {
list($url, $json) = $this->loadTweet('719914707566649344');
$data = $this->parse(['url' => $url, 'json' => $json]);
$data = $this->parse(['url' => $url, 'body' => $json]);
$this->assertEquals("2016-04-12T16:46:56+01:00", $data['data']['published']);
}
public function testTweetWithEmoji() {
list($url, $json) = $this->loadTweet('818943244553699328');
$data = $this->parse(['url' => $url, 'json' => $json]);
$data = $this->parse(['url' => $url, 'body' => $json]);
$this->assertEquals('entry', $data['data']['type']);
$this->assertEquals('Here 🎉 have an emoji', $data['data']['content']['text']);
@ -83,7 +83,7 @@ class TwitterTest extends PHPUnit_Framework_TestCase {
public function testHTMLEscaping() {
list($url, $json) = $this->loadTweet('818928092383166465');
$data = $this->parse(['url' => $url, 'json' => $json]);
$data = $this->parse(['url' => $url, 'body' => $json]);
$this->assertEquals('entry', $data['data']['type']);
$this->assertEquals('Double escaping &amp; & amp', $data['data']['content']['text']);
@ -92,7 +92,7 @@ class TwitterTest extends PHPUnit_Framework_TestCase {
public function testTweetWithPhoto() {
list($url, $json) = $this->loadTweet('818912506496229376');
$data = $this->parse(['url' => $url, 'json' => $json]);
$data = $this->parse(['url' => $url, 'body' => $json]);
$this->assertEquals('entry', $data['data']['type']);
$this->assertEquals('Tweet with a photo and a location', $data['data']['content']['text']);
@ -102,7 +102,7 @@ class TwitterTest extends PHPUnit_Framework_TestCase {
public function testTweetWithTwoPhotos() {
list($url, $json) = $this->loadTweet('818935308813103104');
$data = $this->parse(['url' => $url, 'json' => $json]);
$data = $this->parse(['url' => $url, 'body' => $json]);
$this->assertEquals('entry', $data['data']['type']);
$this->assertEquals('Two photos', $data['data']['content']['text']);
@ -113,7 +113,7 @@ class TwitterTest extends PHPUnit_Framework_TestCase {
public function testTweetWithVideo() {
list($url, $json) = $this->loadTweet('818913178260160512');
$data = $this->parse(['url' => $url, 'json' => $json]);
$data = $this->parse(['url' => $url, 'body' => $json]);
$this->assertEquals('entry', $data['data']['type']);
$this->assertEquals('Tweet with a video', $data['data']['content']['text']);
@ -123,12 +123,12 @@ class TwitterTest extends PHPUnit_Framework_TestCase {
public function testTweetWithLocation() {
list($url, $json) = $this->loadTweet('818912506496229376');
$data = $this->parse(['url' => $url, 'json' => $json]);
$data = $this->parse(['url' => $url, 'body' => $json]);
$this->assertEquals('entry', $data['data']['type']);
$this->assertEquals('Tweet with a photo and a location', $data['data']['content']['text']);
$this->assertEquals('https://api.twitter.com/1.1/geo/id/ac88a4f17a51c7fc.json', $data['data']['location']);
$location = $data['refs']['https://api.twitter.com/1.1/geo/id/ac88a4f17a51c7fc.json'];
$location = $data['data']['refs']['https://api.twitter.com/1.1/geo/id/ac88a4f17a51c7fc.json'];
$this->assertEquals('adr', $location['type']);
$this->assertEquals('Portland', $location['locality']);
$this->assertEquals('United States', $location['country-name']);
@ -138,38 +138,38 @@ class TwitterTest extends PHPUnit_Framework_TestCase {
public function testRetweet() {
list($url, $json) = $this->loadTweet('818913351623245824');
$data = $this->parse(['url' => $url, 'json' => $json]);
$data = $this->parse(['url' => $url, 'body' => $json]);
$this->assertEquals('entry', $data['data']['type']);
$this->assertArrayNotHasKey('content', $data['data']);
$repostOf = 'https://twitter.com/aaronpk/status/817414679131660288';
$this->assertEquals($repostOf, $data['data']['repost-of']);
$tweet = $data['refs'][$repostOf];
$tweet = $data['data']['refs'][$repostOf];
$this->assertEquals('Yeah that\'s me http://xkcd.com/1782/', $tweet['content']['text']);
}
public function testRetweetWithPhoto() {
list($url, $json) = $this->loadTweet('820039442773798912');
$data = $this->parse(['url' => $url, 'json' => $json]);
$data = $this->parse(['url' => $url, 'body' => $json]);
$this->assertEquals('entry', $data['data']['type']);
$this->assertArrayNotHasKey('content', $data['data']);
$this->assertArrayNotHasKey('photo', $data['data']);
$repostOf = 'https://twitter.com/phlaimeaux/status/819943954724556800';
$this->assertEquals($repostOf, $data['data']['repost-of']);
$tweet = $data['refs'][$repostOf];
$tweet = $data['data']['refs'][$repostOf];
$this->assertEquals('this headline is such a rollercoaster', $tweet['content']['text']);
}
public function testQuotedTweet() {
list($url, $json) = $this->loadTweet('818913488609251331');
$data = $this->parse(['url' => $url, 'json' => $json]);
$data = $this->parse(['url' => $url, 'body' => $json]);
$this->assertEquals('entry', $data['data']['type']);
$this->assertEquals('Quoted tweet with a #hashtag https://twitter.com/aaronpk/status/817414679131660288', $data['data']['content']['text']);
$tweet = $data['refs']['https://twitter.com/aaronpk/status/817414679131660288'];
$tweet = $data['data']['refs']['https://twitter.com/aaronpk/status/817414679131660288'];
$this->assertEquals('Yeah that\'s me http://xkcd.com/1782/', $tweet['content']['text']);
}
