Merge pull request #38 from aaronpk/library-refactor

Refactors into a library that can be used separately from the API
8 years ago · 11977e6746
--- a/.gitignore
+++ b/.gitignore
@@ -1,4 +1,5 @@
 .DS_Store
 config.php
 vendor/
 XRay-*.json
 php_errors.log
 XRay-*.json
--- a/LICENSE.txt
+++ b/LICENSE.txt
@@ -1,7 +1,21 @@
-Copyright 2016 by Aaron Parecki
+MIT License

-Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
+Copyright (c) 2017 Aaron Parecki

-http://www.apache.org/licenses/LICENSE-2.0
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:

-Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
--- a/README.md
+++ b/README.md
@@ -9,13 +9,66 @@ XRay parses structured content from a URL.
 The contents of the URL are checked in the following order:

 * A silo URL from one of the following websites:
 ** Instagram
 ** Twitter
 ** (more coming soon)
 * h-entry, h-event, h-card
  * Instagram
  * Twitter
  * GitHub
  * XKCD
  * (more coming soon)
 * Microformats
  * h-card
  * h-entry
  * h-event
  * h-review
  * h-recipe
  * h-product

 ## Library

 XRay can be used as a library in your PHP project. The easiest way to install it and its dependencies is via Composer.

 ```
 composer require p3k/xray
 ```

 Basic usage:

 ```php
 $xray = new p3k\XRay();
 $parsed = $xray->parse('https://aaronparecki.com/2017/04/28/9/');
 ```

 If you already have an HTML or JSON document you want to parse, you can pass it as a string in the second parameter.

 ```php
 $xray = new p3k\XRay();
 $html = '<html>....</html>';
 $parsed = $xray->parse('https://aaronparecki.com/2017/04/28/9/', $html);
 ```

 In both cases, you can pass an additional options array to configure how XRay behaves. The available options are listed below.

 ## Parse API
 * `timeout` - The timeout in seconds to wait for any HTTP requests
 * `max_redirects` - The maximum number of redirects to follow
 * `include_original` - Will also return the full document fetched
 * `target` - Specify a target URL. XRay will first check whether the page links to that URL, and will only continue parsing the page if it does. This is useful when you're using XRay to verify an incoming webmention.

 Additionally, the Twitter and GitHub credentials described in the authentication section below are supported when making requests that use the Twitter or GitHub API.

 ```php
 $xray = new p3k\XRay();

 $parsed = $xray->parse('https://aaronparecki.com/2017/04/28/9/', [
  'timeout' => 30
 ]);

 $parsed = $xray->parse('https://aaronparecki.com/2017/04/28/9/', $html, [
  'target' => 'http://example.com/'
 ]);
 ```
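As a rough illustration of what the `target` option checks, the sketch below looks for the target URL in the `href` of `a` elements and the `src` of `img`, `video` and `audio` elements, mirroring the exact-match comparison in `controllers/Parse.php` before this refactor. The function name `page_links_to()` is hypothetical and not part of XRay's API. Like that code, it compares attribute values verbatim and does not resolve relative URLs.

```php
<?php
// Hypothetical helper, not part of XRay: does $html contain a link or
// embedded media element whose URL exactly equals $target?
function page_links_to(string $html, string $target): bool {
    if (trim($html) === '') {
        return false; // DOMDocument::loadHTML() rejects empty input
    }
    $doc = new DOMDocument();
    $prev = libxml_use_internal_errors(true); // real-world HTML is rarely well-formed
    $loaded = $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($prev);
    if (!$loaded) {
        return false;
    }
    $xpath = new DOMXPath($doc);
    // Same element/attribute pairs the old controller inspected
    $checks = ['a' => 'href', 'img' => 'src', 'video' => 'src', 'audio' => 'src'];
    foreach ($checks as $tag => $attr) {
        foreach ($xpath->query("//{$tag}[@{$attr}]") as $el) {
            if ($el->getAttribute($attr) === $target) {
                return true; // exact string match, no URL normalization
            }
        }
    }
    return false;
}
```

Because the match is exact, `http://example.com/` and `http://example.com` count as different URLs under this sketch.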

 ## API

 XRay can also be used as an API to provide its parsing capabilities over an HTTP service.

 To parse a page and return structured data for its contents, pass a URL to the parse route.

@@ -33,6 +86,26 @@ In both cases, the response will be a JSON object containing a key of "type". If

 You can also make a POST request with the same parameter names.

 If you already have an HTML or JSON document you want to parse, you can include that in the parameter `body`. This POST request would look like the below:

 ```
 POST /parse
 Content-type: application/x-www-form-urlencoded

 url=https://aaronparecki.com/2016/01/16/11/
 &body=<html>....</html>
 ```

 or, for Twitter or GitHub URLs where you might already have the JSON:

 ```
 POST /parse
 Content-type: application/x-www-form-urlencoded

 url=https://github.com/aaronpk/XRay
 &body={"repo":......}
 ```
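The raw requests above show the `body` value unencoded for readability. In an actual `application/x-www-form-urlencoded` request each value must be percent-encoded, otherwise an `&` or `=` inside the HTML would be read as a field separator. A minimal PHP sketch using `http_build_query()` follows; `build_parse_body()` and `post_to_xray()` are hypothetical helper names, and `$endpoint` stands for wherever your XRay instance is hosted.

```php
<?php
// Hypothetical helper: encode the fields for POST /parse.
// http_build_query() percent-encodes every value, so "&" or "=" inside
// the HTML cannot be mistaken for a field separator.
function build_parse_body(string $url, ?string $body = null, array $extra = []): string {
    $fields = ['url' => $url];
    if ($body !== null) {
        $fields['body'] = $body;
    }
    return http_build_query($fields + $extra); // e.g. $extra = ['token' => '...']
}

// Hypothetical client; $endpoint is the base URL of your XRay deployment.
function post_to_xray(string $endpoint, string $formBody): ?array {
    $context = stream_context_create(['http' => [
        'method'        => 'POST',
        'header'        => "Content-Type: application/x-www-form-urlencoded\r\n",
        'content'       => $formBody,
        'ignore_errors' => true, // still read the JSON error body on 4xx responses
    ]]);
    $raw = file_get_contents(rtrim($endpoint, '/') . '/parse', false, $context);
    if ($raw === false) {
        return null; // network-level failure
    }
    $decoded = json_decode($raw, true);
    return is_array($decoded) ? $decoded : null;
}
```

Sending the fields in the POST body rather than the query string also keeps a `token` value out of URL logs, as the authentication section recommends.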

 ### Authentication

 If the URL you are fetching requires authentication, include the access token in the parameter "token", and it will be included in an "Authorization" header when fetching the URL. (It is recommended to use a POST request in this case, to avoid the access token potentially being logged as part of the query string.) This is useful for [Private Webmention](https://indieweb.org/Private-Webmention) verification.
@@ -57,6 +130,13 @@ You should only send Twitter credentials when the URL you are trying to parse is
 * twitter_access_token_secret - Your Twitter secret access token


 ### GitHub Authentication

 XRay uses the GitHub API to fetch GitHub URLs, which provides higher rate limits when used with authentication. You can pass a GitHub access token along with the request and XRay will use it when making requests to the API.

 * github_access_token - A GitHub access token


 ### Error Response

 ```json
@@ -119,8 +199,14 @@ The primary object on the page is returned in the `data` property. This will ind
 If a property supports multiple values, it will always be returned as an array. The following properties support multiple values:

 * in-reply-to
 * like-of
 * repost-of
 * bookmark-of
 * syndication
 * photo (of entry, not of a card)
 * video
 * audio
 * category

 The content will be an object that always contains a "text" property and may contain an "html" property if the source document published HTML content. The "text" property must always be HTML-escaped before being displayed as HTML, since it may include unescaped characters such as `<` and `>`.
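As a minimal sketch of that rule, a consumer might prefer `html` when present and otherwise escape `text`. The helper name is hypothetical, and it assumes the `html` value is safe to output as-is; if you are not sure it has been sanitized for your context, escape or purify it as well.

```php
<?php
// Hypothetical helper: turn the "content" object from an XRay response into
// HTML for display. Assumes $content was decoded with json_decode(..., true).
function render_content(array $content): string {
    if (isset($content['html']) && $content['html'] !== '') {
        // Assumption: the "html" value is trusted/sanitized for your page.
        return $content['html'];
    }
    // "text" can contain raw < and >, so escape it before embedding in HTML,
    // then turn its newlines into <br /> so line breaks stay visible.
    return nl2br(htmlspecialchars($content['text'] ?? '', ENT_QUOTES, 'UTF-8'));
}
```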

--- a/composer.json
+++ b/composer.json
@@ -1,36 +1,39 @@
 {
  "name": "p3k/xray",
  "type": "library",
  "license": "MIT",
  "homepage": "https://github.com/aaronpk/XRay",
  "description": "X-Ray returns structured data from any URL",
  "require": {
    "league/plates": "3.*",
    "league/route": "1.*",
    "mf2/mf2": "~0.3",
    "ezyang/htmlpurifier": "4.*",
    "indieweb/link-rel-parser": "0.1.*",
    "dg/twitter-php": "^3.6",
    "dg/twitter-php": "3.6.*",
    "p3k/timezone": "*",
    "cebe/markdown": "~1.1.1"
    "p3k/http": "0.1.*",
    "cebe/markdown": "1.1.*"
  },
  "autoload": {
    "psr-4": {
      "p3k\\XRay\\": "lib/XRay"
    },
    "files": [
      "lib/helpers.php",
      "controllers/Main.php",
      "controllers/Parse.php",
      "controllers/Token.php",
      "controllers/Rels.php",
      "controllers/Certbot.php",
      "lib/HTTPCurl.php",
      "lib/HTTPStream.php",
      "lib/HTTP.php",
      "lib/Formats/Mf2.php",
      "lib/Formats/Instagram.php",
      "lib/Formats/GitHub.php",
      "lib/Formats/Twitter.php",
      "lib/Formats/XKCD.php",
      "lib/Formats/HTMLPurifier_AttrDef_HTML_Microformats2.php"
      "lib/XRay.php"
    ]
  },
  "require-dev": {
    "league/plates": "3.*",
    "league/route": "1.*",
    "phpunit/phpunit": "4.8.*"
  },
  "autoload-dev": {
    "files": [
      "lib/HTTPTest.php"
      "controllers/Main.php",
      "controllers/Parse.php",
      "controllers/Token.php",
      "controllers/Rels.php",
      "controllers/Certbot.php"
    ]
  }
 }
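The new `psr-4` entry maps the namespace prefix `p3k\XRay\` to `lib/XRay`. The sketch below shows, in simplified form, how a PSR-4 rule turns a class name into a file path; `psr4_path()` is a hypothetical helper, not Composer's actual class loader, and the `Formats\Mf2` path in the test is only illustrative. It also shows why `lib/XRay.php` presumably still appears under `files`: the class `p3k\XRay` itself does not start with the `p3k\XRay\` prefix, so the PSR-4 rule alone would not locate it.

```php
<?php
// Hypothetical helper (simplified; not Composer's ClassLoader):
// map a fully qualified class name to a file under a PSR-4 prefix => dir rule.
function psr4_path(string $class, string $prefix, string $baseDir): ?string {
    $prefix = trim($prefix, '\\') . '\\';       // normalize to end with one backslash
    if (strncmp($class, $prefix, strlen($prefix)) !== 0) {
        return null;                             // class is outside this prefix
    }
    $relative = substr($class, strlen($prefix)); // e.g. "Formats\Mf2"
    // Namespace separators become directory separators; ".php" is appended.
    return rtrim($baseDir, '/') . '/' . str_replace('\\', '/', $relative) . '.php';
}
```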
--- a/composer.lock
+++ b/composer.lock
--- a/controllers/Certbot.php
+++ b/controllers/Certbot.php
@@ -13,7 +13,7 @@ class Certbot {
    $state = mt_rand(10000,99999);
    $_SESSION['state'] = $state;

-   $response->setContent(view('certbot', [
+   $response->setContent(p3k\XRay\view('certbot', [
      'title' => 'X-Ray',
      'state' => $state
    ]));
@@ -109,7 +109,7 @@ class Certbot {
      'challenge' => $challenge
    ]), 0, 600);

-   $response->setContent(view('certbot', [
+   $response->setContent(p3k\XRay\view('certbot', [
      'title' => 'X-Ray',
      'challenge' => $challenge,
      'token' => $token,
--- a/controllers/Main.php
+++ b/controllers/Main.php
@@ -5,7 +5,7 @@ use Symfony\Component\HttpFoundation\Response;
 class Main {

  public function index(Request $request, Response $response) {
-   $response->setContent(view('index', [
+   $response->setContent(p3k\XRay\view('index', [
      'title' => 'X-Ray'
    ]));
    return $response;
--- a/controllers/Parse.php
+++ b/controllers/Parse.php
@@ -2,7 +2,7 @@
 use Symfony\Component\HttpFoundation\Request;
 use Symfony\Component\HttpFoundation\Response;

-use XRay\Formats;
+use p3k\XRay\Formats;

 class Parse {

@@ -12,11 +12,11 @@ class Parse {
  private $_pretty = false;

  public static function useragent() {
-   return 'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36 XRay/1.0.0 ('.\Config::$base.')';
+   return 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/57.0.2987.133 Safari/537.36 XRay/1.0.0 ('.\Config::$base.')';
  }

  public function __construct() {
-   $this->http = new p3k\HTTP();
+   $this->http = new p3k\HTTP(self::useragent());
    if(Config::$cache && class_exists('Memcache')) {
      $this->mc = new Memcache();
      $this->mc->addServer('127.0.0.1');
@@ -41,19 +41,20 @@ class Parse {
    return $response;
  }

- private static function toHtmlEntities($input) {
-   return mb_convert_encoding($input, 'HTML-ENTITIES', mb_detect_encoding($input));
- }
-
  public function parse(Request $request, Response $response) {
+   $opts = [];

    if($request->get('timeout')) {
      // We might make 2 HTTP requests, so each request gets half the desired timeout
-     $this->http->timeout = $request->get('timeout') / 2;
+     $opts['timeout'] = $request->get('timeout') / 2;
    }

-   if($request->get('max_redirects')) {
-     $this->http->max_redirects = (int)$request->get('max_redirects');
+   if($request->get('max_redirects') !== null) {
+     $opts['max_redirects'] = (int)$request->get('max_redirects');
    }

+   if($request->get('target')) {
+     $opts['target'] = $request->get('target');
+   }
+
    if($request->get('pretty')) {
@@ -61,12 +62,12 @@ class Parse {
    }

    $url = $request->get('url');
-   $html = $request->get('html');
+   $html = $request->get('html') ?: $request->get('body');

    if(!$url && !$html) {
      return $this->respond($response, 400, [
        'error' => 'missing_url',
-       'error_description' => 'Provide a URL or HTML to fetch'
+       'error_description' => 'Provide a URL or HTML to fetch',
      ]);
    }

@@ -74,319 +75,53 @@ class Parse {
      // If HTML is provided in the request, parse that, and use the URL provided as the base URL for mf2 resolving
      $result['body'] = $html;
      $result['url'] = $url;
      $result['code'] = null;
    } else {
      // Attempt some basic URL validation
      $scheme = parse_url($url, PHP_URL_SCHEME);
      if(!in_array($scheme, ['http','https'])) {
        return $this->respond($response, 400, [
          'error' => 'invalid_url',
          'error_description' => 'Only http and https URLs are supported'
        ]);
      }

      $host = parse_url($url, PHP_URL_HOST);
      if(!$host) {
        return $this->respond($response, 400, [
          'error' => 'invalid_url',
          'error_description' => 'The URL provided was not valid'
        ]);
      $fetcher = new p3k\XRay\Fetcher($this->http);

      $fields = [
        'twitter_api_key','twitter_api_secret','twitter_access_token','twitter_access_token_secret',
        'github_access_token',
        'token'
      ];
      foreach($fields as $f) {
        if($v=$request->get($f))
          $opts[$f] = $v;
      }

      $url = \normalize_url($url);

      // Check if this is a Twitter URL and if they've provided API credentials, use the API
      if(preg_match('/https?:\/\/(?:mobile\.twitter\.com|twitter\.com|twtr\.io)\/(?:[a-z0-9_\/!#]+statuse?s?\/([0-9]+)|([a-zA-Z0-9_]+))/i', $url, $match)) {
        return $this->parseTwitterURL($request, $response, $url, $match);
      }

      if($host == 'github.com') {
        return $this->parseGitHubURL($request, $response, $url);
      }
      $result = $fetcher->fetch($url, $opts);

      // Now fetch the URL and check for any curl errors
      // Don't cache the response if a token is used to fetch it
      if($this->mc && !$request->get('token')) {
        $cacheKey = 'xray-'.md5($url);
        if($cached=$this->mc->get($cacheKey)) {
          $result = json_decode($cached, true);
          self::debug('using HTML from cache', 'X-Cache-Debug');
        } else {
          $result = $this->http->get($url, [self::useragent()]);
          $cacheData = json_encode($result);
          // App Engine limits the size of cached items, so don't cache ones larger than that
          if(strlen($cacheData) < 1000000) 
            $this->mc->set($cacheKey, $cacheData, MEMCACHE_COMPRESSED, $this->_cacheTime);
        }
      } else {
        $headers = [self::useragent()];
        if($request->get('token')) {
          $headers[] = 'Authorization: Bearer ' . $request->get('token');
        }

        $result = $this->http->get($url, $headers);
      }

      if($result['error']) {
        return $this->respond($response, 200, [
          'error' => $result['error'],
          'error_description' => $result['error_description'],
          'url' => $result['url'],
          'code' => $result['code']
        ]);
      }

      if(trim($result['body']) == '') {
        if($result['code'] == 410) {
          // 410 Gone responses are valid and should not return an error
          return $this->respond($response, 200, [
            'data' => [
              'type' => 'unknown'
            ],
            'url' => $result['url'],
            'code' => $result['code']
          ]);
        }

        return $this->respond($response, 200, [
          'error' => 'no_content',
          'error_description' => 'We did not get a response body when fetching the URL',
          'url' => $result['url'],
          'code' => $result['code']
        ]);
      }

      // Check for HTTP 401/403
      if($result['code'] == 401) {
        return $this->respond($response, 200, [
          'error' => 'unauthorized',
          'error_description' => 'The URL returned "HTTP 401 Unauthorized"',
          'url' => $result['url'],
          'code' => 401
        ]);
      }
      if($result['code'] == 403) {
        return $this->respond($response, 200, [
          'error' => 'forbidden',
          'error_description' => 'The URL returned "HTTP 403 Forbidden"',
          'url' => $result['url'],
          'code' => 403
        ]);
      if(!empty($result['error'])) {
        $error_code = isset($result['error_code']) ? $result['error_code'] : 200;
        unset($result['error_code']);
        return $this->respond($response, $error_code, $result);
      }

    }

    // Check for known services
    $host = parse_url($result['url'], PHP_URL_HOST);

    if(in_array($host, ['www.instagram.com','instagram.com'])) {
      list($data, $parsed) = Formats\Instagram::parse($result['body'], $result['url'], $this->http);
      if($request->get('include_original'))
        $data['original'] = $parsed;
      $data['url'] = $result['url'];
      $data['code'] = $result['code'];
      return $this->respond($response, 200, $data);
    }

    if($host == 'xkcd.com' && parse_url($url, PHP_URL_PATH) != '/') {
      $data = Formats\XKCD::parse($result['body'], $url);
      $data['url'] = $result['url'];
      $data['code'] = $result['code'];
      return $this->respond($response, 200, $data);
    }

    // attempt to parse the page as HTML
    $doc = new DOMDocument();
    @$doc->loadHTML(self::toHtmlEntities($result['body']));

    if(!$doc) {
      return $this->respond($response, 200, [
        'error' => 'invalid_content',
        'error_description' => 'The document could not be parsed as HTML'
      ]);
    }

    $xpath = new DOMXPath($doc);
    $parser = new p3k\XRay\Parser($this->http);
    $parsed = $parser->parse($result['body'], $result['url'], $opts);

    // Check for meta http equiv and replace the status code if present
    foreach($xpath->query('//meta[translate(@http-equiv,\'STATUS\',\'status\')=\'status\']') as $el) {
      $equivStatus = ''.$el->getAttribute('content');
      if($equivStatus && is_string($equivStatus)) {
        if(preg_match('/^(\d+)/', $equivStatus, $match)) {
          $result['code'] = (int)$match[1];
        }
      }
    }
    // Allow the parser to override the HTTP response code, e.g. a meta-equiv tag
    if(isset($parsed['code']))
      $result['code'] = $parsed['code'];

    // If a target parameter was provided, make sure a link to it exists on the page
    if($target=$request->get('target')) {
      $found = [];
      if($target) {
        self::xPathFindNodeWithAttribute($xpath, 'a', 'href', function($u) use($target, &$found){
          if($u == $target) {
            $found[$u] = null;
          }
        });
        self::xPathFindNodeWithAttribute($xpath, 'img', 'src', function($u) use($target, &$found){
          if($u == $target) {
            $found[$u] = null;
          }
        });
        self::xPathFindNodeWithAttribute($xpath, 'video', 'src', function($u) use($target, &$found){
          if($u == $target) {
            $found[$u] = null;
          }
        });
        self::xPathFindNodeWithAttribute($xpath, 'audio', 'src', function($u) use($target, &$found){
          if($u == $target) {
            $found[$u] = null;
          }
        });
      }

      if(!$found) {
        return $this->respond($response, 200, [
          'error' => 'no_link_found',
          'error_description' => 'The source document does not have a link to the target URL',
          'url' => $result['url'],
          'code' => $result['code'],
        ]);
      }
    }

    // If the URL has a fragment ID, find the DOM starting at that node and parse it instead
    $html = $result['body'];

    $fragment = parse_url($url, PHP_URL_FRAGMENT);
    if($fragment) {
      $fragElement = self::xPathGetElementById($xpath, $fragment);
      if($fragElement) {
        $html = $doc->saveHTML($fragElement);
        $foundFragment = true;
      } else {
        $foundFragment = false;
      }
    }

    // Now start pulling in the data from the page. Start by looking for microformats2
    $mf2 = mf2\Parse($html, $result['url']);

    if($mf2 && count($mf2['items']) > 0) {
      $data = Formats\Mf2::parse($mf2, $result['url'], $this->http);
      if($data) {
        if($fragment) {
          $data['info'] = [
            'found_fragment' => $foundFragment
          ];
        }
        if($request->get('include_original'))
          $data['original'] = $html;
        $data['url'] = $result['url']; // this will be the effective URL after following redirects
        $data['code'] = $result['code'];
        return $this->respond($response, 200, $data);
      }
    }

    // TODO: look for other content like OEmbed or other known services later

    return $this->respond($response, 200, [
      'data' => [
        'type' => 'unknown',
      ],
      'url' => $result['url'],
      'code' => $result['code']
    ]);
  }

  private static function xPathFindNodeWithAttribute($xpath, $node, $attr, $callback) {
    foreach($xpath->query('//'.$node.'[@'.$attr.']') as $el) {
      $v = $el->getAttribute($attr);
      $callback($v);
    }
  }

  private static function xPathGetElementById($xpath, $id) {
    $element = null;
    foreach($xpath->query("//*[@id='$id']") as $el) {
      $element = $el;
    }
    return $element;
  }

  private function parseTwitterURL(&$request, &$response, $url, $match) {
    $fields = ['twitter_api_key','twitter_api_secret','twitter_access_token','twitter_access_token_secret'];
    $creds = [];
    foreach($fields as $f) {
      if($v=$request->get($f))
        $creds[$f] = $v;
    }
    $data = false;
    if(count($creds) == 4) {
      list($data, $parsed) = Formats\Twitter::parse($url, $match[1], $creds);
    } elseif(count($creds) > 0) {
      // If only some Twitter credentials were present, return an error  
      return $this->respond($response, 400, [
        'error' => 'missing_parameters',
        'error_description' => 'All 4 Twitter credentials must be included in the request'
      ]);
    } else {
      // Accept Tweet JSON and parse that if provided
      $json = $request->get('json');
      if($json) {
        list($data, $parsed) = Formats\Twitter::parse($url, $match[1], null, $json);
      }
      // Skip parsing from the Twitter API if they didn't include credentials
    }

    if($data) {
      if($request->get('include_original'))
        $data['original'] = $parsed;
      $data['url'] = $url;
      $data['code'] = 200;
      return $this->respond($response, 200, $data);
    } else {
      return $this->respond($response, 200, [
        'data' => [
          'type' => 'unknown'
        ],
        'url' => $url,
        'code' => 0
      ]);
    }
  }

  private function parseGitHubURL(&$request, &$response, $url) {
    $fields = ['github_access_token'];
    $creds = [];
    foreach($fields as $f) {
      if($v=$request->get($f))
        $creds[$f] = $v;
    }
    $data = false;
    $json = $request->get('json');
    if($json) {
      // Accept GitHub JSON and parse that if provided
      list($data, $json, $code) = Formats\GitHub::parse($this->http, $url, null, $json);
    if(!empty($parsed['error'])) {
      $error_code = isset($parsed['error_code']) ? $parsed['error_code'] : 200;
      unset($parsed['error_code']);
      return $this->respond($response, $error_code, $parsed);
    } else {
      // Otherwise fetch the post unauthenticated or with the provided access token
      list($data, $json, $code) = Formats\GitHub::parse($this->http, $url, $creds);
    }
      $data = [
        'data' => $parsed['data'],
        'url' => $result['url'],
        'code' => $result['code']
      ];
      if(isset($parsed['info']))
        $data['info'] = $parsed['info'];
      if($request->get('include_original') && isset($parsed['original']))
        $data['original'] = $parsed['original'];

    if($data) {
      if($request->get('include_original'))
        $data['original'] = $json;
      $data['url'] = $url;
      $data['code'] = $code;
      return $this->respond($response, 200, $data);
    } else {
      return $this->respond($response, 200, [
        'data' => [
          'type' => 'unknown'
        ],
        'url' => $url,
        'code' => $code
      ]);
    }
  }


 }
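The status-code override that the old controller performed inline (the `meta http-equiv` loop above) can be sketched as a standalone function; the new comment on `$parsed['code']` suggests the parser now takes care of this. The function name is hypothetical. The sketch checks the return value of `loadHTML()`, because the removed `if(!$doc)` test can never fail: `$doc` is always a `DOMDocument` object at that point.

```php
<?php
// Hypothetical helper: return the numeric status from a
// <meta http-equiv="Status" content="410 Gone"> tag, or null if none.
function meta_equiv_status(string $html): ?int {
    if (trim($html) === '') {
        return null; // loadHTML() rejects empty input
    }
    $doc = new DOMDocument();
    $prev = libxml_use_internal_errors(true); // tolerate malformed HTML
    $loaded = $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($prev);
    if (!$loaded) {
        return null; // check the return value, not the object itself
    }
    $xpath = new DOMXPath($doc);
    $code = null;
    // translate() lowercases S, T, A and U, so "Status" and "STATUS" both match
    $query = "//meta[translate(@http-equiv,'STATUS','status')='status']";
    foreach ($xpath->query($query) as $el) {
        if (preg_match('/^(\d+)/', trim($el->getAttribute('content')), $m)) {
            $code = (int)$m[1]; // as in the original loop, the last matching tag wins
        }
    }
    return $code;
}
```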
--- a/controllers/Rels.php
+++ b/controllers/Rels.php
@@ -24,13 +24,15 @@ class Rels {
  }

  public function fetch(Request $request, Response $response) {
+   $opts = [];
+
    if($request->get('timeout')) {
      // We might make 2 HTTP requests, so each request gets half the desired timeout
-     $this->http->timeout = $request->get('timeout') / 2;
+     $opts['timeout'] = $request->get('timeout') / 2;
    }

    if($request->get('max_redirects')) {
-     $this->http->max_redirects = (int)$request->get('max_redirects');
+     $opts['max_redirects'] = (int)$request->get('max_redirects');
    }

    if($request->get('pretty')) {
@@ -46,51 +48,11 @@ class Rels {
      ]);
    }

-   // Attempt some basic URL validation
-   $scheme = parse_url($url, PHP_URL_SCHEME);
-   if(!in_array($scheme, ['http','https'])) {
-     return $this->respond($response, 400, [
-       'error' => 'invalid_url',
-       'error_description' => 'Only http and https URLs are supported'
-     ]);
-   }
-
-   $host = parse_url($url, PHP_URL_HOST);
-   if(!$host) {
-     return $this->respond($response, 400, [
-       'error' => 'invalid_url',
-       'error_description' => 'The URL provided was not valid'
-     ]);
-   }
-
-   $url = \normalize_url($url);
-
-   $result = $this->http->get($url);
-
-   $html = $result['body'];
-   $mf2 = mf2\Parse($html, $result['url']);
-
-   $rels = p3k\HTTP::link_rels($result['headers']);
-   if(isset($mf2['rels'])) {
-     $rels = array_merge($rels, $mf2['rels']);
-   }
-
-   // Resolve all relative URLs
-   foreach($rels as $rel=>$values) {
-     foreach($values as $i=>$value) {
-       $value = \mf2\resolveUrl($result['url'], $value);
-       $rels[$rel][$i] = $value;
-     }
-   }
-
-   if(count($rels) == 0)
-     $rels = new StdClass;
+   $xray = new p3k\XRay();
+   $xray->http = $this->http;
+   $res = $xray->rels($url, $opts);

-   return $this->respond($response, 200, [
-     'url' => $result['url'],
-     'code' => $result['code'],
-     'rels' => $rels
-   ]);
+   return $this->respond($response, !empty($res['error']) ? 400 : 200, $res);
  }

 }
--- a/controllers/Token.php
+++ b/controllers/Token.php
@@ -55,7 +55,7 @@ class Token {
    if(is_string($head['headers']['Link']))
      $head['headers']['Link'] = [$head['headers']['Link']];

-   $rels = p3k\HTTP::link_rels($head['headers']);
+   $rels = $head['rels'];

    $endpoint = false;
    if(array_key_exists('token_endpoint', $rels)) {
--- a/lib/Formats/GitHub.php
+++ b/lib/Formats/GitHub.php
@@ -1,122 +0,0 @@
 <?php
 namespace XRay\Formats;

 use DateTime, DateTimeZone;
 use Parse, Config;
 use cebe\markdown\GithubMarkdown;

 class GitHub {

  public static function parse($http, $url, $creds, $json=null) {

    if(!$json) {
      // Transform the GitHub URL to an API request
      if(preg_match('~https://github.com/([^/]+)/([^/]+)/pull/(\d+)$~', $url, $match)) {
        $type = 'pull';
        $org = $match[1];
        $repo = $match[2];
        $pull = $match[3];
        $apiurl = 'https://api.github.com/repos/'.$org.'/'.$repo.'/pulls/'.$pull;

      } elseif(preg_match('~https://github.com/([^/]+)/([^/]+)/issues/(\d+)$~', $url, $match)) {
        $type = 'issue';
        $org = $match[1];
        $repo = $match[2];
        $issue = $match[3];
        $apiurl = 'https://api.github.com/repos/'.$org.'/'.$repo.'/issues/'.$issue;

      } elseif(preg_match('~https://github.com/([^/]+)/([^/]+)$~', $url, $match)) {
        $type = 'repo';
        $org = $match[1];
        $repo = $match[2];
        $apiurl = 'https://api.github.com/repos/'.$org.'/'.$repo;

      } elseif(preg_match('~https://github.com/([^/]+)/([^/]+)/issues/(\d+)#issuecomment-(\d+)~', $url, $match)) {
        $type = 'comment';
        $org = $match[1];
        $repo = $match[2];
        $issue = $match[3];
        $comment = $match[4];
        $apiurl = 'https://api.github.com/repos/'.$org.'/'.$repo.'/issues/comments/'.$comment;

      } else {
        return [null, null, 0];
      }

      $response = $http->get($apiurl, ['User-Agent: XRay ('.Config::$base.')']);
      if($response['code'] != 200) {
        return [null, $response['body'], $response['code']];
      }

      $data = json_decode($response['body'], true);
    } else {
      $data = json_decode($json, true);
    }

    if(!$data) {
      return [null, null, 0];
    }

    // Start building the h-entry
    $entry = array(
      'type' => ($type == 'repo' ? 'repo' : 'entry'),
      'url' => $url,
      'author' => [
        'type' => 'card',
        'name' => null,
        'photo' => null,
        'url' => null
      ]
    );

    if($type == 'repo')
      $authorkey = 'owner';
    else
      $authorkey = 'user';

    $entry['author']['name'] = $data[$authorkey]['login'];
    $entry['author']['photo'] = $data[$authorkey]['avatar_url'];
    $entry['author']['url'] = $data[$authorkey]['html_url'];

    if($type == 'pull') {
      $entry['name'] = '#' . $pull . ' ' . $data['title'];
    } elseif($type == 'issue') {
      $entry['name'] = '#' . $issue . ' ' . $data['title'];
    } elseif($type == 'repo') {
      $entry['name'] = $data['name'];
    }

    if($type == 'repo') {
      if(!empty($data['description']))
        $entry['summary'] = $data['description'];
    }

    if($type != 'repo' && !empty($data['body'])) {
      $parser = new GithubMarkdown();

      $entry['content'] = [
        'text' => $data['body'],
        'html' => $parser->parse($data['body'])
      ];
    }

    if($type == 'comment') {
      $entry['in-reply-to'] = ['https://github.com/'.$org.'/'.$repo.'/issues/'.$issue];
    }

    if(!empty($data['labels'])) {
      $entry['category'] = array_map(function($l){
        return $l['name'];
      }, $data['labels']);
    }

    $entry['published'] = $data['created_at'];

    $r = [
      'data' => $entry
    ];

    return [$r, $json, $response['code']];
  }

 }
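The URL-to-API mapping at the top of the removed `GitHub::parse()` can be expressed as a small pure function, which is easier to test in isolation. This is a sketch with a hypothetical name. Unlike the original it anchors each pattern at the start, escapes the dot in `github\.com`, and tests the `#issuecomment-` form before the plain issue form; the original order also works, because its issue pattern ends with `$` and so cannot match a URL that carries a fragment.

```php
<?php
// Hypothetical helper: map a github.com URL to [type, API URL], or null.
function github_api_url(string $url): ?array {
    $base = '~^https://github\.com/([^/]+)/([^/]+)'; // org and repo segments
    $api  = 'https://api.github.com/repos/';
    if (preg_match($base . '/pull/(\d+)$~', $url, $m)) {
        return ['pull', $api . "$m[1]/$m[2]/pulls/$m[3]"];
    }
    // Check the comment form before the plain issue form
    if (preg_match($base . '/issues/(\d+)#issuecomment-(\d+)$~', $url, $m)) {
        return ['comment', $api . "$m[1]/$m[2]/issues/comments/$m[4]"];
    }
    if (preg_match($base . '/issues/(\d+)$~', $url, $m)) {
        return ['issue', $api . "$m[1]/$m[2]/issues/$m[3]"];
    }
    if (preg_match($base . '$~', $url, $m)) {
        return ['repo', $api . "$m[1]/$m[2]"];
    }
    return null; // not a URL shape this sketch understands
}
```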
--- a/lib/HTTP.php
+++ b/lib/HTTP.php
@@ -1,56 +0,0 @@
 <?php
 namespace p3k;

 class HTTP {

  public $timeout = 4;
  public $max_redirects = 8;

  public function get($url, $headers=[]) {
    $class = $this->_class($url);
    $http = new $class($url);
    $http->timeout = $this->timeout;
    $http->max_redirects = $this->max_redirects;
    return $http->get($url, $headers);
  }

  public function post($url, $body, $headers=[]) {
    $class = $this->_class($url);
    $http = new $class($url);
    $http->timeout = $this->timeout;
    $http->max_redirects = $this->max_redirects;
    return $http->post($url, $body, $headers);
  }

  public function head($url) {
    $class = $this->_class($url);
    $http = new $class($url);
    $http->timeout = $this->timeout;
    $http->max_redirects = $this->max_redirects;
    return $http->head($url);
  }

  private function _class($url) {
    if(!should_follow_redirects($url)) {
      return 'p3k\HTTPStream';
    } else {
      return 'p3k\HTTPCurl';
    }
  }

  public static function link_rels($header_array) {
    $headers = '';
    foreach($header_array as $k=>$header) {
      if(is_string($header)) {
        $headers .= $k . ': ' . $header . "\r\n";
      } else {
        foreach($header as $h) {
          $headers .= $k . ': ' . $h . "\r\n";
        }
      }
    }
    $rels = \IndieWeb\http_rels($headers);
    return $rels;
  }

 }
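`HTTP::link_rels()` above flattens the header array into raw `Name: value` lines before delegating to `IndieWeb\http_rels()`, presumably from the `indieweb/link-rel-parser` package required in `composer.json`. The sketch below reproduces the flattening step and pairs it with a deliberately naive Link-header parser so the round trip can be seen without that dependency. Both function names are hypothetical, and the parser ignores RFC 8288 details such as commas inside quoted parameters or URLs, and does not resolve relative URLs.

```php
<?php
// Hypothetical helper mirroring the flattening loop in HTTP::link_rels():
// a header value may be a single string or a list of strings.
function flatten_headers(array $headers): string {
    $raw = '';
    foreach ($headers as $name => $value) {
        foreach ((array)$value as $v) {
            $raw .= $name . ': ' . $v . "\r\n";
        }
    }
    return $raw;
}

// Deliberately naive Link-header parser (hypothetical, illustration only).
// Returns ['rel' => [url, ...], ...].
function naive_link_rels(string $rawHeaders): array {
    $rels = [];
    foreach (explode("\r\n", $rawHeaders) as $line) {
        if (stripos($line, 'link:') !== 0) {
            continue; // not a Link header
        }
        foreach (explode(',', substr($line, 5)) as $link) {
            if (!preg_match('/<([^>]*)>/', $link, $u) ||
                !preg_match('/;\s*rel="?([^";]+)"?/i', $link, $r)) {
                continue; // no target URL or no rel parameter
            }
            // rel may hold several space-separated values, e.g. rel="me author"
            foreach (preg_split('/\s+/', trim($r[1])) as $rel) {
                $rels[strtolower($rel)][] = $u[1];
            }
        }
    }
    return $rels;
}
```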
--- a/lib/HTTPCurl.php
+++ b/lib/HTTPCurl.php
@@ -1,127 +0,0 @@
 <?php
 namespace p3k;

 class HTTPCurl {

  public $timeout = 4;
  public $max_redirects = 8;

  public function get($url, $headers=[]) {
    $ch = curl_init($url);
    $this->_set_curlopts($ch, $url);
    if($headers)
      curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
    $response = curl_exec($ch);
    $header_size = curl_getinfo($ch, CURLINFO_HEADER_SIZE);
    return array(
      'code' => curl_getinfo($ch, CURLINFO_HTTP_CODE),
      'headers' => self::parse_headers(trim(substr($response, 0, $header_size))),
      'body' => substr($response, $header_size),
      'error' => self::error_string_from_code(curl_errno($ch)),
      'error_description' => curl_error($ch),
      'error_code' => curl_errno($ch),
      'url' => curl_getinfo($ch, CURLINFO_EFFECTIVE_URL),
    );
  }

  public function post($url, $body, $headers=[]) {
    $ch = curl_init($url);
    $this->_set_curlopts($ch, $url);
    curl_setopt($ch, CURLOPT_POST, true);
    curl_setopt($ch, CURLOPT_POSTFIELDS, $body);
    if($headers)
      curl_setopt($ch, CURLOPT_HTTPHEADER, $headers);
    $response = curl_exec($ch);
    $header_size = curl_getinfo($ch, CURLINFO_HEADER_SIZE);
    return array(
      'code' => curl_getinfo($ch, CURLINFO_HTTP_CODE),
      'headers' => self::parse_headers(trim(substr($response, 0, $header_size))),
      'body' => substr($response, $header_size),
      'error' => self::error_string_from_code(curl_errno($ch)),
      'error_description' => curl_error($ch),
      'error_code' => curl_errno($ch),
      'url' => curl_getinfo($ch, CURLINFO_EFFECTIVE_URL),
    );
  }

  public function head($url) {
    $ch = curl_init($url);
    $this->_set_curlopts($ch, $url);
    curl_setopt($ch, CURLOPT_NOBODY, true);
    $response = curl_exec($ch);
    return array(
      'code' => curl_getinfo($ch, CURLINFO_HTTP_CODE),
      'headers' => self::parse_headers(trim($response)),
      'error' => self::error_string_from_code(curl_errno($ch)),
      'error_description' => curl_error($ch),
      'error_code' => curl_errno($ch),
      'url' => curl_getinfo($ch, CURLINFO_EFFECTIVE_URL),
    );
  }

  private function _set_curlopts($ch, $url) {
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    curl_setopt($ch, CURLOPT_HEADER, true);

    // Special-case appspot.com URLs to not follow redirects.
    // https://cloud.google.com/appengine/docs/php/urlfetch/
    if(should_follow_redirects($url)) {
      curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);
      curl_setopt($ch, CURLOPT_MAXREDIRS, $this->max_redirects);
    } else {
      curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false);
    }

    curl_setopt($ch, CURLOPT_TIMEOUT_MS, round($this->timeout * 1000));
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT_MS, 2000);
  }

  public static function error_string_from_code($code) {
    switch($code) {
      case 0:
        return '';
      case CURLE_COULDNT_RESOLVE_HOST:
        return 'dns_error';
      case CURLE_COULDNT_CONNECT:
        return 'connect_error';
      case CURLE_OPERATION_TIMEDOUT:
        return 'timeout';
      case CURLE_SSL_CONNECT_ERROR:
        return 'ssl_error';
      case CURLE_SSL_CERTPROBLEM:
        return 'ssl_cert_error';
      case CURLE_SSL_CIPHER:
        return 'ssl_unsupported_cipher';
      case CURLE_SSL_CACERT:
        return 'ssl_cert_error';
      case CURLE_TOO_MANY_REDIRECTS:
        return 'too_many_redirects';
      default:
        return 'unknown';
    }
  }

  public static function parse_headers($headers) {
    $retVal = array();
    $fields = explode("\r\n", preg_replace('/\x0D\x0A[\x09\x20]+/', ' ', $headers));
    foreach($fields as $field) {
      if(preg_match('/([^:]+): (.+)/m', $field, $match)) {
        $match[1] = preg_replace_callback('/(?<=^|[\x09\x20\x2D])./', function($m) {
          return strtoupper($m[0]);
        }, strtolower(trim($match[1])));
        // If there's already a value set for the header name being returned, turn it into an array and add the new value
        $match[1] = preg_replace_callback('/(?<=^|[\x09\x20\x2D])./', function($m) {
          return strtoupper($m[0]);
        }, strtolower(trim($match[1])));
        if(isset($retVal[$match[1]])) {
          if(!is_array($retVal[$match[1]]))
            $retVal[$match[1]] = array($retVal[$match[1]]);
          $retVal[$match[1]][] = $match[2];
        } else {
          $retVal[$match[1]] = trim($match[2]);
        }
      }
    }
    return $retVal;
  }
 }
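`parse_headers()` does three things. It joins folded lines, title-cases header names, and collects repeated headers into an array. The standalone sketch below shows that behaviour. `parse_headers_sketch()` is a hypothetical name, and it only approximates the original: it title-cases on `-` boundaries only, and it trims every value, including repeated ones.

```php
<?php
// Approximation of parse_headers(): unfold continuation lines, normalize
// header-name casing, and turn repeated headers into arrays of values.
function parse_headers_sketch(string $raw): array {
    $result = [];
    // A CRLF followed by space/tab continues the previous header line.
    $lines = explode("\r\n", preg_replace('/\r\n[\t ]+/', ' ', $raw));
    foreach ($lines as $line) {
        // Lines without "name:" (e.g. the "HTTP/1.1 200 OK" status line) are skipped.
        if (!preg_match('/^([^:]+):\s*(.*)$/', $line, $m)) {
            continue;
        }
        // "content-TYPE" -> "Content-Type"
        $name = implode('-', array_map('ucfirst', explode('-', strtolower(trim($m[1])))));
        $value = trim($m[2]);
        if (!isset($result[$name])) {
            $result[$name] = $value;
        } else {
            // Second occurrence: promote the scalar to an array, then append.
            $result[$name] = (array)$result[$name];
            $result[$name][] = $value;
        }
    }
    return $result;
}
```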
--- a/lib/HTTPStream.php
+++ b/lib/HTTPStream.php
@ -1,138 +0,0 @@
 <?php
 namespace p3k;

 class HTTPStream {

  public $timeout = 4;
  public $max_redirects = 8;

  public static function exception_error_handler($severity, $message, $file, $line) {
    if (!(error_reporting() & $severity)) {
      // This error code is not included in error_reporting
      return;
    }
    throw new \ErrorException($message, 0, $severity, $file, $line);
  }

  public function get($url, $headers=[]) {
    set_error_handler("p3k\HTTPStream::exception_error_handler");
    $context = $this->_stream_context('GET', $url, false, $headers);
    return $this->_fetch($url, $context);
  }

  public function post($url, $body, $headers=[]) {
    set_error_handler("p3k\HTTPStream::exception_error_handler");
    $context = $this->_stream_context('POST', $url, $body, $headers);
    return $this->_fetch($url, $context);
  }

  public function head($url) {
    set_error_handler("p3k\HTTPStream::exception_error_handler");
    $context = $this->_stream_context('HEAD', $url);
    return $this->_fetch($url, $context);
  }

  private function _fetch($url, $context) {
    $error = false;

    try {
      $body = file_get_contents($url, false, $context);
      // This sets $http_response_header
      // see http://php.net/manual/en/reserved.variables.httpresponseheader.php
    } catch(\Exception $e) {
      $body = false;
      $http_response_header = [];
      $description = str_replace('file_get_contents(): ', '', $e->getMessage());
      $code = 'unknown';

      if(preg_match('/getaddrinfo failed/', $description)) {
        $code = 'dns_error';
        $description = str_replace('php_network_getaddresses: ', '', $description);
      }

      if(preg_match('/timed out|request failed/', $description)) {
        $code = 'timeout';
      }

      if(preg_match('/certificate/', $description)) {
        $code = 'ssl_error';
      }

      $error = [
        'description' => $description,
        'code' => $code
      ];
    }

    return array(
      'code' => self::parse_response_code($http_response_header),
      'headers' => self::parse_headers($http_response_header),
      'body' => $body,
      'error' => $error ? $error['code'] : false,
      'error_description' => $error ? $error['description'] : false,
      'url' => $url,
    );
  }

  private function _stream_context($method, $url, $body=false, $headers=[]) {
    $options = [
      'method' => $method,
      'timeout' => $this->timeout,
      'ignore_errors' => true,
    ];

    if($body) {
      $options['content'] = $body;
    }

    if($headers) {
      $options['header'] = implode("\r\n", $headers);
    }

    // Special-case appspot.com URLs to not follow redirects.
    // https://cloud.google.com/appengine/docs/php/urlfetch/
    if(should_follow_redirects($url)) {
      $options['follow_location'] = 1;
      $options['max_redirects'] = $this->max_redirects;
    } else {
      $options['follow_location'] = 0;
    }

    return stream_context_create(['http' => $options]);
  }

  public static function parse_response_code($headers) {
    // When a response is a redirect, we want to find the last occurrence of the HTTP code
    $code = false;
    foreach($headers as $field) {
      if(preg_match('/HTTP\/\d\.\d (\d+)/', $field, $match)) {
        $code = $match[1];
      }
    }    
    return $code;
  }

  public static function parse_headers($headers) {
    $retVal = array();
    foreach($headers as $field) {
      if(preg_match('/([^:]+): (.+)/m', $field, $match)) {
        $match[1] = preg_replace_callback('/(?<=^|[\x09\x20\x2D])./', function($m) {
          return strtoupper($m[0]);
        }, strtolower(trim($match[1])));
        // If there's already a value set for the header name being returned, turn it into an array and add the new value
        $match[1] = preg_replace_callback('/(?<=^|[\x09\x20\x2D])./', function($m) {
          return strtoupper($m[0]);
        }, strtolower(trim($match[1])));
        if(isset($retVal[$match[1]])) {
          if(!is_array($retVal[$match[1]]))
            $retVal[$match[1]] = array($retVal[$match[1]]);
          $retVal[$match[1]][] = $match[2];
        } else {
          $retVal[$match[1]] = trim($match[2]);
        }
      }
    }
    return $retVal;
  }

 }
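`parse_response_code()` keeps the last status line it finds. When the stream wrapper follows redirects, `$http_response_header` gets one status line per hop, so the last one belongs to the final response. Below is a minimal sketch that assumes a plain array of header lines. `last_status_code()` is hypothetical. It differs from the original in two ways: it returns an integer instead of the matched string, and it also accepts `HTTP/2`-style status lines, which the original `HTTP\/\d\.\d` pattern skips.

```php
<?php
// Return the status code of the final response in a redirect chain,
// given header lines in the order PHP records them.
function last_status_code(array $headerLines): ?int {
    $code = null;
    foreach ($headerLines as $line) {
        // Matches "HTTP/1.1 301 ..." as well as "HTTP/2 200".
        if (preg_match('~^HTTP/\d(?:\.\d)? (\d{3})~', $line, $m)) {
            $code = (int)$m[1]; // later hops overwrite earlier ones
        }
    }
    return $code;
}
```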
--- a/lib/HTTPTest.php
+++ b/lib/HTTPTest.php
@ -1,92 +0,0 @@
 <?php
 namespace p3k;

 class HTTPTest extends HTTPCurl {

  private $_testDataPath;
  private $_redirects_remaining;

  public function __construct($testDataPath) {
    $this->_testDataPath = $testDataPath;
  }

  public function get($url, $headers=[]) {
    $this->_redirects_remaining = $this->max_redirects;
    $parts = parse_url($url);
    unset($parts['fragment']);
    $url = \build_url($parts);
    return $this->_read_file($url);
  }

  public function post($url, $body, $headers=[]) {
    return $this->_read_file($url);
  }

  public function head($url) {
    $response = $this->_read_file($url);
    return array(
      'code' => $response['code'],
      'headers' => $response['headers'],
      'error' => '',
      'error_description' => '',
      'url' => $response['url']
    );
  }

  private function _read_file($url) {
    $parts = parse_url($url);
    if($parts['path']) {
      $parts['path'] = '/'.str_replace('/','_',substr($parts['path'],1));
      $url = \build_url($parts);
    }

    $filename = $this->_testDataPath.preg_replace('/https?:\/\//', '', $url);
    if(!file_exists($filename)) {
      $filename = $this->_testDataPath.'404.response.txt';
    }
    $response = file_get_contents($filename);

    $split = explode("\r\n\r\n", $response);
    if(count($split) < 2) {
      throw new \Exception("Invalid file contents in test data, check that newlines are CRLF: $url");
    }
    $headers = array_shift($split);
    $body = implode("\r\n", $split);

    if(preg_match('/HTTP\/1\.1 (\d+)/', $headers, $match)) {
      $code = $match[1];
    }

    $headers = preg_replace('/HTTP\/1\.1 \d+ .+/', '', $headers);
    $parsedHeaders = self::parse_headers($headers);

    if(array_key_exists('Location', $parsedHeaders)) {
      $effectiveUrl = \mf2\resolveUrl($url, $parsedHeaders['Location']);
      if($this->_redirects_remaining > 0) {
        $this->_redirects_remaining--;
        return $this->_read_file($effectiveUrl);
      } else {
        return [
          'code' => 0,
          'headers' => $parsedHeaders,
          'body' => $body,
          'error' => 'too_many_redirects',
          'error_description' => '',
          'url' => $effectiveUrl
        ];
      }
    } else {
      $effectiveUrl = $url;
    }

    return array(
      'code' => $code,
      'headers' => $parsedHeaders,
      'body' => $body,
      'error' => (isset($parsedHeaders['X-Test-Error']) ? $parsedHeaders['X-Test-Error'] : ''),
      'error_description' => '',
      'url' => $effectiveUrl
    );
  }

 }
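`HTTPTest` maps each URL to a fixture file, then splits the fixture at the first blank line. The sketch below assumes the same layout: scheme stripped, `/` in the path replaced by `_`, CRLF line endings. Keeping the query string is an assumption about `build_url()`. `fixture_filename()` and `split_fixture()` are hypothetical helpers. One deliberate difference: the sketch splits only at the first `\r\n\r\n`, so blank lines inside the body survive. `_read_file()` instead rejoins the remaining pieces with a single `\r\n`.

```php
<?php
// Hypothetical: build the fixture path for a URL the way HTTPTest names files.
function fixture_filename(string $testDataPath, string $url): string {
    $parts = parse_url($url);
    $path = '';
    if (!empty($parts['path'])) {
        // "/foo/bar" -> "/foo_bar": keep the leading slash, flatten the rest.
        $path = '/' . str_replace('/', '_', substr($parts['path'], 1));
    }
    $query = isset($parts['query']) ? '?' . $parts['query'] : '';
    return $testDataPath . $parts['host'] . $path . $query;
}

// Hypothetical: split a raw fixture into [status code, header block, body].
function split_fixture(string $raw): array {
    $pos = strpos($raw, "\r\n\r\n");
    if ($pos === false) {
        throw new InvalidArgumentException('Fixture must use CRLF line endings');
    }
    $head = substr($raw, 0, $pos);
    $body = substr($raw, $pos + 4); // everything after the first blank line
    $code = preg_match('~^HTTP/1\.1 (\d+)~', $head, $m) ? (int)$m[1] : 0;
    return [$code, $head, $body];
}
```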
--- a/lib/XRay.php
+++ b/lib/XRay.php
@ -0,0 +1,42 @@
 <?php
 namespace p3k;

 class XRay {
  public $http;

  public function __construct() {
    $this->http = new HTTP();
  }

  public function rels($url, $opts=[]) {
    $rels = new XRay\Rels($this->http);
    return $rels->parse($url, $opts);
  }

  public function parse($url, $opts_or_body=false, $opts_for_body=[]) {
    if(!$opts_or_body || is_array($opts_or_body)) {
      $fetch = new XRay\Fetcher($this->http);
      $response = $fetch->fetch($url, $opts_or_body);
      if(!empty($response['error']))
        return $response;
      $body = $response['body'];
      $url = $response['url'];
      $code = $response['code'];
      $opts = is_array($opts_or_body) ? $opts_or_body : $opts_for_body;
    } else {
      $body = $opts_or_body;
      $opts = $opts_for_body;
      $code = null;
    }
    $parser = new XRay\Parser($this->http);

    $result = $parser->parse($body, $url, $opts);
    if(!isset($opts['include_original']) || !$opts['include_original'])
      unset($result['original']);
    $result['url'] = $url;
    $result['code'] = isset($result['code']) ? $result['code'] : $code;
    return $result;
  }

 }
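The second argument of `XRay::parse()` is overloaded. If it is falsy or an array, the URL is fetched. If it is a string, it is taken as an already-fetched body. The branching can be sketched as a pure function. `classify_parse_call()` is hypothetical and reproduces only the dispatch, not the fetching or parsing.

```php
<?php
// Mirror of the argument handling at the top of XRay::parse().
function classify_parse_call($optsOrBody = false, array $optsForBody = []): array {
    if (!$optsOrBody || is_array($optsOrBody)) {
        // Fetch mode: an array second argument is the options array.
        $opts = is_array($optsOrBody) ? $optsOrBody : $optsForBody;
        return ['mode' => 'fetch', 'body' => null, 'opts' => $opts];
    }
    // Body mode: the caller already has the document; options come third.
    return ['mode' => 'body', 'body' => $optsOrBody, 'opts' => $optsForBody];
}
```

An empty string is falsy in PHP. By this logic, `parse($url, '')` therefore falls through to the fetch branch instead of parsing an empty body.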

--- a/lib/XRay/Fetcher.php
+++ b/lib/XRay/Fetcher.php
@ -0,0 +1,169 @@
 <?php
 namespace p3k\XRay;

 class Fetcher {
  private $http;

  public function __construct($http) {
    $this->http = $http;
  }

  public function fetch($url, $opts=[]) {
    if($opts == false) $opts = [];

    if(isset($opts['timeout']))
      $this->http->set_timeout($opts['timeout']);
    if(isset($opts['max_redirects']))
      $this->http->set_max_redirects($opts['max_redirects']);

    // Attempt some basic URL validation
    $scheme = parse_url($url, PHP_URL_SCHEME);
    if(!in_array($scheme, ['http','https'])) {
      return [
        'error_code' => 400,
        'error' => 'invalid_url',
        'error_description' => 'Only http and https URLs are supported'
      ];
    }

    $host = parse_url($url, PHP_URL_HOST);
    if(!$host) {
      return [
        'error_code' => 400,
        'error' => 'invalid_url',
        'error_description' => 'The URL provided was not valid'
      ];
    }

    $url = normalize_url($url);
    $host = parse_url($url, PHP_URL_HOST);

    // Check if this is a Twitter URL and use the API
    if(Formats\Twitter::matches_host($url)) {
      return $this->_fetch_tweet($url, $opts);
    }

    // Transform the HTML GitHub URL into a GitHub API request and fetch the API response
    if(Formats\GitHub::matches_host($url)) {
      return $this->_fetch_github($url, $opts);
    }

    // All other URLs are fetched normally

    // Special-case appspot.com URLs to not follow redirects.
    // https://cloud.google.com/appengine/docs/php/urlfetch/
    if(!should_follow_redirects($url)) {
      $this->http->set_max_redirects(0);
      $this->http->set_transport(new \p3k\HTTP\Stream());
    } else {
      $this->http->set_transport(new \p3k\HTTP\Curl());
    }

    $headers = [];
    if(isset($opts['token']))
      $headers[] = 'Authorization: Bearer ' . $opts['token'];

    $result = $this->http->get($url, $headers);

    if($result['error']) {
      return [
        'error' => $result['error'],
        'error_description' => $result['error_description'],
        'url' => $result['url'],
        'code' => $result['code'],
      ];
    }

    if(trim($result['body']) == '') {
      if($result['code'] == 410) {
        // 410 Gone responses are valid and should not return an error
        return [
          'data' => [
            'type' => 'unknown'
          ],
          'url' => $result['url'],
          'code' => $result['code']
        ];
      }

      return [
        'error' => 'no_content',
        'error_description' => 'We did not get a response body when fetching the URL',
        'url' => $result['url'],
        'code' => $result['code']
      ];
    }

    // Check for HTTP 401/403
    if($result['code'] == 401) {
      return [
        'error' => 'unauthorized',
        'error_description' => 'The URL returned "HTTP 401 Unauthorized"',
        'url' => $result['url'],
        'code' => $result['code']
      ];
    }
    if($result['code'] == 403) {
      return [
        'error' => 'forbidden',
        'error_description' => 'The URL returned "HTTP 403 Forbidden"',
        'url' => $result['url'],
        'code' => $result['code']
      ];
    }

    // If the original URL had a fragment, include it in the final URL
    if(($fragment=parse_url($url, PHP_URL_FRAGMENT)) && !parse_url($result['url'], PHP_URL_FRAGMENT)) {
      $result['url'] .= '#'.$fragment;
    }

    return [
      'url' => $result['url'],
      'body' => $result['body'],
      'code' => $result['code'],
    ];
  }

  private function _fetch_tweet($url, $opts) {
    $fields = ['twitter_api_key','twitter_api_secret','twitter_access_token','twitter_access_token_secret'];
    $creds = [];
    foreach($fields as $f) {
      if(isset($opts[$f]))
        $creds[$f] = $opts[$f];
    }

    if(count($creds) < 4) {
      return [
        'error_code' => 400,
        'error' => 'missing_parameters',
        'error_description' => 'All 4 Twitter credentials must be included in the request'
      ];
    }

    $tweet = Formats\Twitter::fetch($url, $creds);
    if(!$tweet) {
      return [
        'error' => 'twitter_error',
        'error_description' => 'The tweet could not be fetched from the Twitter API'
      ];
    }

    return [
      'url' => $url,
      'body' => $tweet,
      'code' => 200,
    ];
  }

  private function _fetch_github($url, $opts) {
    $fields = ['github_access_token'];
    $creds = [];
    foreach($fields as $f) {
      if(isset($opts[$f]))
        $creds[$f] = $opts[$f];
    }

    return Formats\GitHub::fetch($this->http, $url, $creds);
  }

 }
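Two small pieces of `Fetcher::fetch()` are easy to isolate: the up-front URL validation, and re-attaching the requested fragment after redirects. This is a minimal sketch. `validate_fetch_url()` and `carry_fragment()` are hypothetical names. The error arrays copy the keys and messages used above.

```php
<?php
// Accept only http(s) URLs that have a host; return null when the URL passes.
function validate_fetch_url(string $url): ?array {
    $scheme = parse_url($url, PHP_URL_SCHEME);
    if (!in_array($scheme, ['http', 'https'], true)) {
        return ['error_code' => 400, 'error' => 'invalid_url',
                'error_description' => 'Only http and https URLs are supported'];
    }
    if (!parse_url($url, PHP_URL_HOST)) {
        return ['error_code' => 400, 'error' => 'invalid_url',
                'error_description' => 'The URL provided was not valid'];
    }
    return null;
}

// Redirects drop the fragment, so copy it from the requested URL when the
// effective URL does not already carry one.
function carry_fragment(string $requested, string $effective): string {
    $fragment = parse_url($requested, PHP_URL_FRAGMENT);
    if ($fragment && !parse_url($effective, PHP_URL_FRAGMENT)) {
        $effective .= '#' . $fragment;
    }
    return $effective;
}
```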
--- a/lib/XRay/Formats/Format.php
+++ b/lib/XRay/Formats/Format.php
@ -0,0 +1,36 @@
 <?php
 namespace p3k\XRay\Formats;

 use DOMDocument, DOMXPath;

 interface iFormat {

  public static function matches_host($url);
  public static function matches($url);

 }

 abstract class Format implements iFormat {

  protected static function _unknown() {
    return [
      'data' => [
        'type' => 'unknown'
      ]
    ];
  }

  protected static function _loadHTML($html) {
    $doc = new DOMDocument();
    // A DOMDocument object is always truthy; loadHTML() returns false on failure
    if(!@$doc->loadHTML($html)) {
      return [null, null];
    }

    $xpath = new DOMXPath($doc);

    return [$doc, $xpath];
  }

 }
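`iFormat` declares the static `matches_host()` method that `Fetcher` uses to route Twitter and GitHub URLs to their APIs. The hypothetical sketch below shows that routing through a static interface method. The interface and class names are illustrative only, not part of the library. The host checks copy those in `GitHub` and `Instagram`.

```php
<?php
// Illustrative stand-in for iFormat's static host-matching contract.
interface HostMatcher {
    public static function matches_host($url);
}

class GitHubHost implements HostMatcher {
    public static function matches_host($url) {
        return parse_url($url, PHP_URL_HOST) === 'github.com';
    }
}

class InstagramHost implements HostMatcher {
    public static function matches_host($url) {
        return in_array(parse_url($url, PHP_URL_HOST), ['www.instagram.com', 'instagram.com'], true);
    }
}

// Return the first class whose matches_host() accepts the URL, or null to
// fall back to an ordinary HTTP fetch.
function route_format(string $url, array $classes): ?string {
    foreach ($classes as $class) {
        if ($class::matches_host($url)) {
            return $class;
        }
    }
    return null;
}
```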
--- a/lib/XRay/Formats/GitHub.php
+++ b/lib/XRay/Formats/GitHub.php
@ -0,0 +1,166 @@
 <?php
 namespace p3k\XRay\Formats;

 use DateTime, DateTimeZone;
 use Config;
 use cebe\markdown\GithubMarkdown;

 class GitHub extends Format {

  public static function matches_host($url) {
    $host = parse_url($url, PHP_URL_HOST);
    return $host == 'github.com';
  }

  public static function matches($url) {
    return preg_match('~https://github.com/([^/]+)/([^/]+)/pull/(\d+)$~', $url, $match)
      || preg_match('~https://github.com/([^/]+)/([^/]+)/issues/(\d+)$~', $url, $match)
      || preg_match('~https://github.com/([^/]+)/([^/]+)$~', $url, $match)
      || preg_match('~https://github.com/([^/]+)/([^/]+)/issues/(\d+)#issuecomment-(\d+)~', $url, $match);
  }

  private static function extract_url_parts($url) {
    $response = false;

    if(preg_match('~https://github.com/([^/]+)/([^/]+)/pull/(\d+)$~', $url, $match)) {
      $response = [];
      $response['type'] = 'pull';
      $response['org'] = $match[1];
      $response['repo'] = $match[2];
      $response['pull'] = $match[3];
      $response['apiurl'] = 'https://api.github.com/repos/'.$response['org'].'/'.$response['repo'].'/pulls/'.$response['pull'];

    } elseif(preg_match('~https://github.com/([^/]+)/([^/]+)/issues/(\d+)$~', $url, $match)) {
      $response = [];
      $response['type'] = 'issue';
      $response['org'] = $match[1];
      $response['repo'] = $match[2];
      $response['issue'] = $match[3];
      $response['apiurl'] = 'https://api.github.com/repos/'.$response['org'].'/'.$response['repo'].'/issues/'.$response['issue'];

    } elseif(preg_match('~https://github.com/([^/]+)/([^/]+)$~', $url, $match)) {
      $response = [];
      $response['type'] = 'repo';
      $response['org'] = $match[1];
      $response['repo'] = $match[2];
      $response['apiurl'] = 'https://api.github.com/repos/'.$response['org'].'/'.$response['repo'];

    } elseif(preg_match('~https://github.com/([^/]+)/([^/]+)/issues/(\d+)#issuecomment-(\d+)~', $url, $match)) {
      $response = [];
      $response['type'] = 'comment';
      $response['org'] = $match[1];
      $response['repo'] = $match[2];
      $response['issue'] = $match[3];
      $response['comment'] = $match[4];
      $response['apiurl'] = 'https://api.github.com/repos/'.$response['org'].'/'.$response['repo'].'/issues/comments/'.$response['comment'];

    }

    return $response;
  }

  public static function fetch($http, $url, $creds) {
    $parts = self::extract_url_parts($url);

    if(!$parts) {
      return [
        'error' => 'unsupported_url',
        'error_description' => 'This GitHub URL is not supported',
        'error_code' => 400,
      ];
    }

    $headers = [];
    if(isset($creds['github_access_token'])) {
      $headers[] = 'Authorization: Bearer ' . $creds['github_access_token'];
    }

    $response = $http->get($parts['apiurl'], $headers);
    if($response['code'] != 200) {
      return [
        'error' => 'github_error',
        'error_description' => $response['body'],
        'code' => $response['code'],
      ];
    }

    return [
      'url' => $url,
      'body' => $response['body'],
      'code' => $response['code'],
    ];
  }

  public static function parse($json, $url) {
    $data = @json_decode($json, true);

    if(!$data)
      return self::_unknown();

    $parts = self::extract_url_parts($url);

    if(!$parts)
      return self::_unknown();

    // Start building the h-entry
    $entry = array(
      'type' => ($parts['type'] == 'repo' ? 'repo' : 'entry'),
      'url' => $url,
      'author' => [
        'type' => 'card',
        'name' => null,
        'photo' => null,
        'url' => null
      ]
    );

    if($parts['type'] == 'repo')
      $authorkey = 'owner';
    else
      $authorkey = 'user';

    $entry['author']['name'] = $data[$authorkey]['login'];
    $entry['author']['photo'] = $data[$authorkey]['avatar_url'];
    $entry['author']['url'] = $data[$authorkey]['html_url'];

    if($parts['type'] == 'pull') {
      $entry['name'] = '#' . $parts['pull'] . ' ' . $data['title'];
    } elseif($parts['type'] == 'issue') {
      $entry['name'] = '#' . $parts['issue'] . ' ' . $data['title'];
    } elseif($parts['type'] == 'repo') {
      $entry['name'] = $data['name'];
    }

    if($parts['type'] == 'repo') {
      if(!empty($data['description']))
        $entry['summary'] = $data['description'];
    }

    if($parts['type'] != 'repo' && !empty($data['body'])) {
      $parser = new GithubMarkdown();

      $entry['content'] = [
        'text' => $data['body'],
        'html' => $parser->parse($data['body'])
      ];
    }

    if($parts['type'] == 'comment') {
      $entry['in-reply-to'] = ['https://github.com/'.$parts['org'].'/'.$parts['repo'].'/issues/'.$parts['issue']];
    }

    if(!empty($data['labels'])) {
      $entry['category'] = array_map(function($l){
        return $l['name'];
      }, $data['labels']);
    }

    $entry['published'] = $data['created_at'];

    return [
      'data' => $entry,
      'original' => $json
    ];
  }

 }
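`extract_url_parts()` maps a github.com URL to its REST API URL, which `GitHub::fetch()` then requests. The sketch below keeps the original pattern order. The plain issue pattern ends in `$`, so an `#issuecomment-…` URL falls through to the comment pattern. `github_api_url()` is a hypothetical helper. Unlike the original, it anchors each pattern with `^` and escapes the dot in `github.com`.

```php
<?php
// Return the api.github.com URL for a supported github.com URL, else null.
function github_api_url(string $url): ?string {
    $base = 'https://api.github.com/repos/';
    if (preg_match('~^https://github\.com/([^/]+)/([^/]+)/pull/(\d+)$~', $url, $m)) {
        return "{$base}{$m[1]}/{$m[2]}/pulls/{$m[3]}";
    }
    if (preg_match('~^https://github\.com/([^/]+)/([^/]+)/issues/(\d+)$~', $url, $m)) {
        return "{$base}{$m[1]}/{$m[2]}/issues/{$m[3]}";
    }
    if (preg_match('~^https://github\.com/([^/]+)/([^/]+)$~', $url, $m)) {
        return "{$base}{$m[1]}/{$m[2]}"; // repository
    }
    if (preg_match('~^https://github\.com/([^/]+)/([^/]+)/issues/(\d+)#issuecomment-(\d+)~', $url, $m)) {
        // Comments are addressed by comment ID, not by issue number.
        return "{$base}{$m[1]}/{$m[2]}/issues/comments/{$m[4]}";
    }
    return null; // unsupported GitHub URL
}
```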
--- a/lib/XRay/Formats/HTML.php
+++ b/lib/XRay/Formats/HTML.php
@ -0,0 +1,132 @@
 <?php
 namespace p3k\XRay\Formats;

 use HTMLPurifier, HTMLPurifier_Config;
 use DOMDocument, DOMXPath;
 use p3k\XRay\Formats;

 class HTML extends Format {

  public static function matches_host($url) { return true; }
  public static function matches($url) { return true; }

  public static function parse($http, $html, $url, $opts=[]) {
    $result = [
      'data' => [
        'type' => 'unknown',
      ],
      'url' => $url,
    ];

    // attempt to parse the page as HTML
    $doc = new DOMDocument();
    // A DOMDocument object is always truthy; loadHTML() returns false on failure
    if(!@$doc->loadHTML(self::toHtmlEntities($html))) {
      return [
        'error' => 'invalid_content',
        'error_description' => 'The document could not be parsed as HTML'
      ];
    }

    $xpath = new DOMXPath($doc);

    // Check for meta http equiv and replace the status code if present
    foreach($xpath->query('//meta[translate(@http-equiv,\'STATUS\',\'status\')=\'status\']') as $el) {
      $equivStatus = ''.$el->getAttribute('content');
      if($equivStatus && is_string($equivStatus)) {
        if(preg_match('/^(\d+)/', $equivStatus, $match)) {
          $result['code'] = (int)$match[1];
        }
      }
    }

    // If a target parameter was provided, make sure a link to it exists on the page
    if(isset($opts['target'])) {
      $target = $opts['target'];

      $found = [];
      if($target) {
        self::xPathFindNodeWithAttribute($xpath, 'a', 'href', function($u) use($target, &$found){
          if($u == $target) {
            $found[$u] = null;
          }
        });
        self::xPathFindNodeWithAttribute($xpath, 'img', 'src', function($u) use($target, &$found){
          if($u == $target) {
            $found[$u] = null;
          }
        });
        self::xPathFindNodeWithAttribute($xpath, 'video', 'src', function($u) use($target, &$found){
          if($u == $target) {
            $found[$u] = null;
          }
        });
        self::xPathFindNodeWithAttribute($xpath, 'audio', 'src', function($u) use($target, &$found){
          if($u == $target) {
            $found[$u] = null;
          }
        });
      }

      if(!$found) {
        return [
          'error' => 'no_link_found',
          'error_description' => 'The source document does not have a link to the target URL',
          'code' => isset($result['code']) ? $result['code'] : 200,
          'url' => $url
        ];
      }
    }

    // If the URL has a fragment ID, find the DOM starting at that node and parse it instead
    $fragment = parse_url($url, PHP_URL_FRAGMENT);
    if($fragment) {
      $fragElement = self::xPathGetElementById($xpath, $fragment);
      if($fragElement) {
        $html = $doc->saveHTML($fragElement);
        $foundFragment = true;
      } else {
        $foundFragment = false;
      }
    }

    // Now start pulling in the data from the page. Start by looking for microformats2
    $mf2 = \mf2\Parse($html, $url);

    if($mf2 && count($mf2['items']) > 0) {
      $data = Formats\Mf2::parse($mf2, $url, $http);
      $result = array_merge($result, $data);
      if($data) {
        if($fragment) {
          $result['info'] = [
            'found_fragment' => $foundFragment
          ];
        }
        $result['original'] = $html;
        $result['url'] = $url; // this will be the effective URL after following redirects
      }
    }
    return $result;
  }

  private static function toHtmlEntities($input) {
    return mb_convert_encoding($input, 'HTML-ENTITIES', mb_detect_encoding($input));
  }

  private static function xPathFindNodeWithAttribute($xpath, $node, $attr, $callback) {
    foreach($xpath->query('//'.$node.'[@'.$attr.']') as $el) {
      $v = $el->getAttribute($attr);
      $callback($v);
    }
  }

  private static function xPathGetElementById($xpath, $id) {
    $element = null;
    foreach($xpath->query("//*[@id='$id']") as $el) {
      $element = $el;
    }
    return $element;
  }

 }
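The `target` check in `HTML::parse()` passes only if the page contains an `<a href>`, `<img src>`, `<video src>` or `<audio src>` equal to the target URL. Here is a minimal standalone sketch. `page_links_to()` is hypothetical, and it uses `libxml_use_internal_errors()` where the original uses the `@` operator. As in the original, the comparison is an exact string match, so `https://example.com` and `https://example.com/` count as different targets.

```php
<?php
// Return true if the HTML links to or embeds exactly $target.
function page_links_to(string $html, string $target): bool {
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true); // collect malformed-HTML warnings quietly
    $loaded = $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if (!$loaded) {
        return false;
    }
    $xpath = new DOMXPath($doc);
    $checks = ['a' => 'href', 'img' => 'src', 'video' => 'src', 'audio' => 'src'];
    foreach ($checks as $tag => $attr) {
        foreach ($xpath->query("//{$tag}[@{$attr}]") as $el) {
            if ($el->getAttribute($attr) === $target) {
                return true; // first exact match is enough
            }
        }
    }
    return false;
}
```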
--- a/lib/XRay/Formats/HTMLPurifier_AttrDef_HTML_Microformats2.php
+++ b/lib/XRay/Formats/HTMLPurifier_AttrDef_HTML_Microformats2.php
@ -1,5 +1,5 @@
 <?php
 namespace XRay\Formats;
 namespace p3k\XRay\Formats;

 /**
 * Allows Microformats2 classes but rejects any others
--- a/lib/XRay/Formats/Instagram.php
+++ b/lib/XRay/Formats/Instagram.php
@ -1,18 +1,26 @@
 <?php
 namespace XRay\Formats;
 namespace p3k\XRay\Formats;

 use DOMDocument, DOMXPath;
 use DateTime, DateTimeZone;
 use Parse;

 class Instagram {
 class Instagram extends Format {

  public static function parse($html, $url, $http) {
  public static function matches_host($url) {
    $host = parse_url($url, PHP_URL_HOST);
    return in_array($host, ['www.instagram.com','instagram.com']);
  }

  public static function matches($url) {
    return self::matches_host($url);
  }

  public static function parse($http, $html, $url) {

    $photoData = self::_extractPhotoDataFromPhotoPage($html);

    if(!$photoData)
      return false;
      return self::_unknown();

    // Start building the h-entry
    $entry = array(
@ -131,19 +139,18 @@ class Instagram {

    $entry['published'] = $published->format('c');

    $response = [
      'data' => $entry
    ];

    if(count($refs)) {
      $response['refs'] = $refs;
      $entry['refs'] = $refs;
    }

    return [$response, [
      'photo' => $photoData,
      'profiles' => $profiles,
      'locations' => $locations
    ]];
    return [
      'data' => $entry,
      'original' => json_encode([
        'photo' => $photoData,
        'profiles' => $profiles,
        'locations' => $locations
      ])
    ];
  }

  private static function _buildHCardFromInstagramProfile($profile) {
--- a/lib/XRay/Formats/Mf2.php
+++ b/lib/XRay/Formats/Mf2.php
@ -1,8 +1,7 @@
 <?php
 namespace XRay\Formats;
 namespace p3k\XRay\Formats;

 use HTMLPurifier, HTMLPurifier_Config;
 use Parse;

 class Mf2 {

@ -14,31 +13,31 @@ class Mf2 {
    if(count($mf2['items']) == 1) {
      $item = $mf2['items'][0];
      if(in_array('h-entry', $item['type']) || in_array('h-cite', $item['type'])) {
        Parse::debug("mf2:0: Recognized $url as an h-entry it is the only item on the page");
        #Parse::debug("mf2:0: Recognized $url as an h-entry it is the only item on the page");
        return self::parseAsHEntry($mf2, $item, $http);
      }
      if(in_array('h-event', $item['type'])) {
        Parse::debug("mf2:0: Recognized $url as an h-event it is the only item on the page");
        #Parse::debug("mf2:0: Recognized $url as an h-event it is the only item on the page");
        return self::parseAsHEvent($mf2, $item, $http);
      }
      if(in_array('h-review', $item['type'])) {
        Parse::debug("mf2:0: Recognized $url as an h-review it is the only item on the page");
        #Parse::debug("mf2:0: Recognized $url as an h-review it is the only item on the page");
        return self::parseAsHReview($mf2, $item, $http);
      }
      if(in_array('h-recipe', $item['type'])) {
        Parse::debug("mf2:0: Recognized $url as an h-recipe it is the only item on the page");
        #Parse::debug("mf2:0: Recognized $url as an h-recipe it is the only item on the page");
        return self::parseAsHRecipe($mf2, $item, $http);
      }
      if(in_array('h-product', $item['type'])) {
        Parse::debug("mf2:0: Recognized $url as an h-product it is the only item on the page");
        #Parse::debug("mf2:0: Recognized $url as an h-product it is the only item on the page");
        return self::parseAsHProduct($mf2, $item, $http);
      }
      if(in_array('h-feed', $item['type'])) {
        Parse::debug("mf2:0: Recognized $url as an h-feed because it is the only item on the page");
        #Parse::debug("mf2:0: Recognized $url as an h-feed because it is the only item on the page");
        return self::parseAsHFeed($mf2, $http);
      }
      if(in_array('h-card', $item['type'])) {
        Parse::debug("mf2:0: Recognized $url as an h-card it is the only item on the page");
        #Parse::debug("mf2:0: Recognized $url as an h-card it is the only item on the page");
        return self::parseAsHCard($item, $http, $url);
      }
    }
@ -48,9 +47,9 @@ class Mf2 {
    foreach($mf2['items'] as $item) {
      if(array_key_exists('url', $item['properties'])) {
        $urls = $item['properties']['url'];
        $urls = array_map('self::normalize_url', $urls);
        $urls = array_map('\p3k\XRay\normalize_url', $urls);
        if(in_array($url, $urls)) {
          Parse::debug("mf2:1: Recognized $url as a permalink because an object on the page matched the URL of the request");
          #Parse::debug("mf2:1: Recognized $url as a permalink because an object on the page matched the URL of the request");
          if(in_array('h-card', $item['type'])) {
            return self::parseAsHCard($item, $http, $url);
          } elseif(in_array('h-entry', $item['type']) || in_array('h-cite', $item['type'])) {
@ -64,7 +63,7 @@ class Mf2 {
          } elseif(in_array('h-product', $item['type'])) {
            return self::parseAsHProduct($mf2, $item, $http);
          } else {
            Parse::debug('This object was not a recognized type.');
            #Parse::debug('This object was not a recognized type.');
            return false;
          }
        }
@ -77,7 +76,7 @@ class Mf2 {
      foreach($mf2['items'] as $card) {
        if(in_array('h-card', $card['type']) && array_key_exists('url', $card['properties'])) {
          $urls = $card['properties']['url'];
          $urls = array_map('self::normalize_url', $urls);
          $urls = array_map('\p3k\XRay\normalize_url', $urls);
          if(count(array_intersect($urls, $mf2['rels']['author'])) > 0) {
            // There is an author h-card on this page
            // Now look for the first h-* object other than an h-card and use that as the object
@ -106,7 +105,7 @@ class Mf2 {
      if(count(array_filter($mf2['items'], function($item){
        return in_array('h-entry', $item['type']);
      })) > 1) {
        Parse::debug("mf2:2: Recognized $url as an h-feed because there are more than one object on the page");
        #Parse::debug("mf2:2: Recognized $url as an h-feed because there are more than one object on the page");
        return self::parseAsHFeed($mf2, $http);
      }
    }
@ -114,7 +113,7 @@ class Mf2 {
    // If the first item is an h-feed, parse as a feed
    $first = $mf2['items'][0];
    if(in_array('h-feed', $first['type'])) {
      Parse::debug("mf2:3: Recognized $url as an h-feed because the first item is an h-feed");
      #Parse::debug("mf2:3: Recognized $url as an h-feed because the first item is an h-feed");
      return self::parseAsHFeed($mf2, $http);
    }

@ -122,24 +121,24 @@ class Mf2 {
    foreach($mf2['items'] as $item) {
      // Otherwise check for a recognized h-entr* object
      if(in_array('h-entry', $item['type']) || in_array('h-cite', $item['type'])) {
        Parse::debug("mf2:6: $url is falling back to the first h-entry on the page");
        #Parse::debug("mf2:6: $url is falling back to the first h-entry on the page");
        return self::parseAsHEntry($mf2, $item, $http);
      } elseif(in_array('h-event', $item['type'])) {
        Parse::debug("mf2:6: $url is falling back to the first h-event on the page");
        #Parse::debug("mf2:6: $url is falling back to the first h-event on the page");
        return self::parseAsHEvent($mf2, $item, $http);
      } elseif(in_array('h-review', $item['type'])) {
        Parse::debug("mf2:6: $url is falling back to the first h-review on the page");
        #Parse::debug("mf2:6: $url is falling back to the first h-review on the page");
        return self::parseAsHReview($mf2, $item, $http);
      } elseif(in_array('h-recipe', $item['type'])) {
        Parse::debug("mf2:6: $url is falling back to the first h-recipe on the page");
        #Parse::debug("mf2:6: $url is falling back to the first h-recipe on the page");
        return self::parseAsHReview($mf2, $item, $http);
      } elseif(in_array('h-product', $item['type'])) {
        Parse::debug("mf2:6: $url is falling back to the first h-product on the page");
        #Parse::debug("mf2:6: $url is falling back to the first h-product on the page");
        return self::parseAsHProduct($mf2, $item, $http);
      }
    }

    Parse::debug("mf2:E: No object at $url was recognized");
    #Parse::debug("mf2:E: No object at $url was recognized");

    return false;
  }
@ -311,7 +310,7 @@ class Mf2 {
    ];

    if(count($refs)) {
      $response['refs'] = $refs;
      $response['data']['refs'] = $refs;
    }

    return $response;
@ -345,7 +344,7 @@ class Mf2 {
    ];

    if(count($refs)) {
      $response['refs'] = $refs;
      $response['data']['refs'] = $refs;
    }

    return $response;
@ -376,7 +375,7 @@ class Mf2 {
    ];

    if(count($refs)) {
      $response['refs'] = $refs;
      $response['data']['refs'] = $refs;
    }

    return $response;
@ -403,7 +402,7 @@ class Mf2 {
    ];

    if(count($refs)) {
      $response['refs'] = $refs;
      $response['data']['refs'] = $refs;
    }

    return $response;
@ -457,7 +456,7 @@ class Mf2 {
    ];

    if(count($refs)) {
      $response['refs'] = $refs;
      $response['data']['refs'] = $refs;
    }

    return $response;
@ -496,7 +495,7 @@ class Mf2 {
        $found = false;
        foreach($item['properties']['url'] as $url) {
          if(self::isURL($url)) {
            $url = self::normalize_url($url);
            $url = \p3k\XRay\normalize_url($url);
            if($url == $authorURL) {
              $data['url'] = $url;
              $found = true;
@ -723,25 +722,4 @@ class Mf2 {
    }
    return \mf2\Parse($result['body'], $url);
  }

  private static function normalize_url($url) {
    $parts = parse_url($url);
    if(empty($parts['path']))
      $parts['path'] = '/';
    $parts['host'] = strtolower($parts['host']);
    return self::build_url($parts);
  }

  private static function build_url($parsed_url) {
    $scheme   = isset($parsed_url['scheme']) ? $parsed_url['scheme'] . '://' : '';
    $host     = isset($parsed_url['host']) ? $parsed_url['host'] : '';
    $port     = isset($parsed_url['port']) ? ':' . $parsed_url['port'] : '';
    $user     = isset($parsed_url['user']) ? $parsed_url['user'] : '';
    $pass     = isset($parsed_url['pass']) ? ':' . $parsed_url['pass']  : '';
    $pass     = ($user || $pass) ? "$pass@" : '';
    $path     = isset($parsed_url['path']) ? $parsed_url['path'] : '';
    $query    = isset($parsed_url['query']) ? '?' . $parsed_url['query'] : '';
    $fragment = isset($parsed_url['fragment']) ? '#' . $parsed_url['fragment'] : '';
    return "$scheme$user$pass$host$port$path$query$fragment";
  }
 }
--- a/lib/XRay/Formats/Twitter.php
+++ b/lib/XRay/Formats/Twitter.php
@ -1,34 +1,54 @@
 <?php
 namespace XRay\Formats;
 namespace p3k\XRay\Formats;

 use DateTime, DateTimeZone;
 use Parse;

 class Twitter {
 class Twitter extends Format {

  public static function parse($url, $tweet_id, $creds, $json=null) {
  public static function matches_host($url) {
    $host = parse_url($url, PHP_URL_HOST);
    return in_array($host, ['mobile.twitter.com','twitter.com','www.twitter.com','twtr.io']);
  }

  public static function matches($url) {
    if(preg_match('/https?:\/\/(?:mobile\.twitter\.com|twitter\.com|twtr\.io)\/(?:[a-z0-9_\/!#]+statuse?s?\/([0-9]+)|([a-zA-Z0-9_]+))/i', $url, $match))
      return $match;
    else
      return false;
  }

  public static function fetch($url, $creds) {
    if(!($match = self::matches($url))) {
      return false;
    }

    $tweet_id = $match[1];

    $host = parse_url($url, PHP_URL_HOST);
    if($host == 'twtr.io') {
      $tweet_id = self::b60to10($tweet_id);
    }

    if($json) {
      if(is_string($json))
        $tweet = json_decode($json);
      else
        $tweet = $json;
    } else {
      $twitter = new \Twitter($creds['twitter_api_key'], $creds['twitter_api_secret'], $creds['twitter_access_token'], $creds['twitter_access_token_secret']);
      try { 
        $tweet = $twitter->request('statuses/show/'.$tweet_id, 'GET', ['tweet_mode'=>'extended']);
      } catch(\TwitterException $e) {
        return [false, false];
      }
    $twitter = new \Twitter($creds['twitter_api_key'], $creds['twitter_api_secret'], $creds['twitter_access_token'], $creds['twitter_access_token_secret']);
    try { 
      $tweet = $twitter->request('statuses/show/'.$tweet_id, 'GET', ['tweet_mode'=>'extended']);
    } catch(\TwitterException $e) {
      return false;
    }

    if(!$tweet)
      return [false, false];
    return $tweet;
  }

  public static function parse($json, $url) {

    if(is_string($json))
      $tweet = json_decode($json);
    else
      $tweet = $json;

    if(!$tweet) {
      return self::_unknown();
    }

    $entry = array(
      'type' => 'entry',
@ -56,9 +76,9 @@ class Twitter {
      $repostOf = 'https://twitter.com/' . $reposted->user->screen_name . '/status/' . $reposted->id_str;
      $entry['repost-of'] = $repostOf;

      list($repostedEntry) = self::parse($repostOf, $reposted->id_str, null, $reposted);
      if(isset($repostedEntry['refs'])) {
        foreach($repostedEntry['refs'] as $k=>$v) {
      $repostedEntry = self::parse($reposted, $repostOf);
      if(isset($repostedEntry['data']['refs'])) {
        foreach($repostedEntry['data']['refs'] as $k=>$v) {
          $refs[$k] = $v;
        }
      }
@ -141,28 +161,27 @@ class Twitter {
    // Quoted Status
    if(property_exists($tweet, 'quoted_status')) {
      $quoteOf = 'https://twitter.com/' . $tweet->quoted_status->user->screen_name . '/status/' . $tweet->quoted_status_id_str;
      list($quoted) = self::parse($quoteOf, $tweet->quoted_status_id_str, null, $tweet->quoted_status);
      if(isset($quoted['refs'])) {
        foreach($quoted['refs'] as $k=>$v) {
      $quotedEntry = self::parse($tweet->quoted_status, $quoteOf);
      if(isset($quotedEntry['data']['refs'])) {
        foreach($quotedEntry['data']['refs'] as $k=>$v) {
          $refs[$k] = $v;
        }
      }
      $refs[$quoteOf] = $quoted['data'];
      $refs[$quoteOf] = $quotedEntry['data'];
    }

    if($author = self::_buildHCardFromTwitterProfile($tweet->user)) {
      $entry['author'] = $author;
    }

    $response = [
      'data' => $entry
    ];

    if(count($refs)) {
      $response['refs'] = $refs;
      $entry['refs'] = $refs;
    }

    return [$response, $tweet];
    return [
      'data' => $entry,
      'original' => $tweet,
    ];
  }

  private static function _buildHCardFromTwitterProfile($profile) {
--- a/lib/XRay/Formats/XKCD.php
+++ b/lib/XRay/Formats/XKCD.php
@ -1,11 +1,19 @@
 <?php
 namespace XRay\Formats;
 namespace p3k\XRay\Formats;

 use DOMDocument, DOMXPath;
 use DateTime, DateTimeZone;
 use Parse, Config;
 use Config;

 class XKCD {
 class XKCD extends Format {

  public static function matches_host($url) {
    $host = parse_url($url, PHP_URL_HOST);
    return $host == 'xkcd.com';
  }

  public static function matches($url) {
    return self::matches_host($url) && parse_url($url, PHP_URL_PATH) != '/';
  }

  public static function parse($html, $url) {
    list($doc, $xpath) = self::_loadHTML($html);
@ -56,25 +64,4 @@ class XKCD {
    return $response;
  }

  private static function _unknown() {
    return [
      'data' => [
        'type' => 'unknown'
      ]
    ];
  }

  private static function _loadHTML($html) {
    $doc = new DOMDocument();
    @$doc->loadHTML($html);

    if(!$doc) {
      return [null, null];
    }

    $xpath = new DOMXPath($doc);

    return [$doc, $xpath];
  }

 }
--- a/lib/XRay/Parser.php
+++ b/lib/XRay/Parser.php
@ -0,0 +1,41 @@
 <?php
 namespace p3k\XRay;

 use p3k\XRay\Formats;

 class Parser {
  private $http;

  public function __construct($http) {
    $this->http = $http;
  }

  public function parse($body, $url, $opts=[]) {
    if(isset($opts['timeout']))
      $this->http->set_timeout($opts['timeout']);
    if(isset($opts['max_redirects']))
      $this->http->set_max_redirects($opts['max_redirects']);

    // Check if the URL matches a special parser

    if(Formats\Instagram::matches($url)) {
      return Formats\Instagram::parse($this->http, $body, $url);
    }

    if(Formats\GitHub::matches($url)) {
      return Formats\GitHub::parse($body, $url);
    }

    if(Formats\Twitter::matches($url)) {
      return Formats\Twitter::parse($body, $url);
    }

    if(Formats\XKCD::matches($url)) {
      return Formats\XKCD::parse($body, $url);
    }

    // No special parsers matched, parse for Microformats now
    return Formats\HTML::parse($this->http, $body, $url, $opts);
  }

 }
--- a/lib/XRay/Rels.php
+++ b/lib/XRay/Rels.php
@ -0,0 +1,63 @@
 <?php
 namespace p3k\XRay;

 class Rels {
  private $http;

  public function __construct($http) {
    $this->http = $http;
  }

  public function parse($url, $opts=[]) {
    if(isset($opts['timeout']))
      $this->http->set_timeout($opts['timeout']);
    if(isset($opts['max_redirects']))
      $this->http->set_max_redirects($opts['max_redirects']);

    $scheme = parse_url($url, PHP_URL_SCHEME);
    if(!in_array($scheme, ['http','https'])) {
      return [
        'error' => 'invalid_url',
        'error_description' => 'Only http and https URLs are supported'
      ];
    }

    $host = parse_url($url, PHP_URL_HOST);
    if(!$host) {
      return [
        'error' => 'invalid_url',
        'error_description' => 'The URL provided was not valid'
      ];
    }

    $url = normalize_url($url);

    $result = $this->http->get($url);

    $html = $result['body'];
    $mf2 = \mf2\Parse($html, $result['url']);

    $rels = $result['rels'];
    if(isset($mf2['rels'])) {
      $rels = array_merge($rels, $mf2['rels']);
    }

    // Resolve all relative URLs
    foreach($rels as $rel=>$values) {
      foreach($values as $i=>$value) {
        $value = \mf2\resolveUrl($result['url'], $value);
        $rels[$rel][$i] = $value;
      }
    }

    if(count($rels) == 0)
      $rels = new \StdClass;

    return [
      'url' => $result['url'],
      'code' => $result['code'],
      'rels' => $rels
    ];
  }

 }
--- a/lib/helpers.php
+++ b/lib/helpers.php
@ -1,4 +1,5 @@
 <?php
 namespace p3k\XRay;

 function view($template, $data=[]) {
  global $templates;
@ -34,4 +35,4 @@ function should_follow_redirects($url) {
  } else {
    return true;
  }
 }
 }
--- a/public/images/xkcd.png
+++ b/public/images/xkcd.png
--- a/tests/AuthorTest.php
+++ b/tests/AuthorTest.php
@ -8,7 +8,7 @@ class AuthorTest extends PHPUnit_Framework_TestCase {

  public function setUp() {
    $this->client = new Parse();
    $this->client->http = new p3k\HTTPTest(dirname(__FILE__).'/data/');
    $this->client->http = new p3k\HTTP\Test(dirname(__FILE__).'/data/');
    $this->client->mc = null;
  }

--- a/tests/FeedTest.php
+++ b/tests/FeedTest.php
@ -8,7 +8,7 @@ class FeedTest extends PHPUnit_Framework_TestCase {

  public function setUp() {
    $this->client = new Parse();
    $this->client->http = new p3k\HTTPTest(dirname(__FILE__).'/data/');
    $this->client->http = new p3k\HTTP\Test(dirname(__FILE__).'/data/');
    $this->client->mc = null;
  }

--- a/tests/FetchTest.php
+++ b/tests/FetchTest.php
@ -8,7 +8,7 @@ class FetchTest extends PHPUnit_Framework_TestCase {

  public function setUp() {
    $this->client = new Parse();
    $this->client->http = new p3k\HTTPTest(dirname(__FILE__).'/data/');
    $this->client->http = new p3k\HTTP\Test(dirname(__FILE__).'/data/');
    $this->client->mc = null;
  }

--- a/tests/GitHubTest.php
+++ b/tests/GitHubTest.php
@ -8,7 +8,7 @@ class GitHubTest extends PHPUnit_Framework_TestCase {

  public function setUp() {
    $this->client = new Parse();
    $this->client->http = new p3k\HTTPTest(dirname(__FILE__).'/data/');
    $this->client->http = new p3k\HTTP\Test(dirname(__FILE__).'/data/');
    $this->client->mc = null;
  }

--- a/tests/HelpersTest.php
+++ b/tests/HelpersTest.php
@ -3,14 +3,20 @@ class HelpersTest extends PHPUnit_Framework_TestCase {

  public function testLowercaseHostname() {
    $url = 'http://Example.com/';
    $result = normalize_url($url);
    $result = p3k\XRay\normalize_url($url);
    $this->assertEquals('http://example.com/', $result);
  }

  public function testAddsSlashToBareDomain() {
    $url = 'http://example.com';
    $result = normalize_url($url);
    $result = p3k\XRay\normalize_url($url);
    $this->assertEquals('http://example.com/', $result);
  }

  public function testDoesNotModify() {
    $url = 'https://example.com/';
    $result = p3k\XRay\normalize_url($url);
    $this->assertEquals('https://example.com/', $result);
  }

 }
--- a/tests/InstagramTest.php
+++ b/tests/InstagramTest.php
@ -8,7 +8,7 @@ class InstagramTest extends PHPUnit_Framework_TestCase {

  public function setUp() {
    $this->client = new Parse();
    $this->client->http = new p3k\HTTPTest(dirname(__FILE__).'/data/');
    $this->client->http = new p3k\HTTP\Test(dirname(__FILE__).'/data/');
    $this->client->mc = null;
  }

@ -71,8 +71,8 @@ class InstagramTest extends PHPUnit_Framework_TestCase {

    $this->assertEquals(2, count($data['data']['category']));
    $this->assertContains('http://tinyletter.com/kmikeym', $data['data']['category']);
    $this->assertArrayHasKey('http://tinyletter.com/kmikeym', $data['refs']);
    $this->assertEquals(['type'=>'card','name'=>'Mike Merrill','url'=>'http://tinyletter.com/kmikeym','photo'=>'https://instagram.fsjc1-3.fna.fbcdn.net/t51.2885-19/s320x320/12627953_686238411518831_1544976311_a.jpg'], $data['refs']['http://tinyletter.com/kmikeym']);
    $this->assertArrayHasKey('http://tinyletter.com/kmikeym', $data['data']['refs']);
    $this->assertEquals(['type'=>'card','name'=>'Mike Merrill','url'=>'http://tinyletter.com/kmikeym','photo'=>'https://instagram.fsjc1-3.fna.fbcdn.net/t51.2885-19/s320x320/12627953_686238411518831_1544976311_a.jpg'], $data['data']['refs']['http://tinyletter.com/kmikeym']);
  }

  public function testInstagramPhotoWithVenue() {
@ -86,8 +86,8 @@ class InstagramTest extends PHPUnit_Framework_TestCase {

    $this->assertEquals(1, count($data['data']['location']));
    $this->assertContains('https://www.instagram.com/explore/locations/109284789535230/', $data['data']['location']);
    $this->assertArrayHasKey('https://www.instagram.com/explore/locations/109284789535230/', $data['refs']);
    $venue = $data['refs']['https://www.instagram.com/explore/locations/109284789535230/'];
    $this->assertArrayHasKey('https://www.instagram.com/explore/locations/109284789535230/', $data['data']['refs']);
    $venue = $data['data']['refs']['https://www.instagram.com/explore/locations/109284789535230/'];
    $this->assertEquals('XOXO Outpost', $venue['name']);
    $this->assertEquals('45.5261002', $venue['latitude']);
    $this->assertEquals('-122.6558081', $venue['longitude']);
--- a/tests/ParseTest.php
+++ b/tests/ParseTest.php
@ -8,7 +8,7 @@ class ParseTest extends PHPUnit_Framework_TestCase {

  public function setUp() {
    $this->client = new Parse();
    $this->client->http = new p3k\HTTPTest(dirname(__FILE__).'/data/');
    $this->client->http = new p3k\HTTP\Test(dirname(__FILE__).'/data/');
    $this->client->mc = null;
  }

@ -205,9 +205,9 @@ class ParseTest extends PHPUnit_Framework_TestCase {
    $data = json_decode($body, true);
    $this->assertEquals('entry', $data['data']['type']);    
    $this->assertEquals('http://example.com/100', $data['data']['in-reply-to'][0]);
    $this->assertArrayHasKey('http://example.com/100', $data['refs']);
    $this->assertEquals('Example Post', $data['refs']['http://example.com/100']['name']);
    $this->assertEquals('http://example.com/100', $data['refs']['http://example.com/100']['url']);
    $this->assertArrayHasKey('http://example.com/100', $data['data']['refs']);
    $this->assertEquals('Example Post', $data['data']['refs']['http://example.com/100']['name']);
    $this->assertEquals('http://example.com/100', $data['data']['refs']['http://example.com/100']['url']);
  }

  public function testPersonTagIsURL() {
@ -230,10 +230,10 @@ class ParseTest extends PHPUnit_Framework_TestCase {
    $data = json_decode($body, true);
    $this->assertEquals('entry', $data['data']['type']);
    $this->assertEquals('http://alice.example.com/', $data['data']['category'][0]);
    $this->assertArrayHasKey('http://alice.example.com/', $data['refs']);
    $this->assertEquals('card', $data['refs']['http://alice.example.com/']['type']);
    $this->assertEquals('http://alice.example.com/', $data['refs']['http://alice.example.com/']['url']);
    $this->assertEquals('Alice', $data['refs']['http://alice.example.com/']['name']);
    $this->assertArrayHasKey('http://alice.example.com/', $data['data']['refs']);
    $this->assertEquals('card', $data['data']['refs']['http://alice.example.com/']['type']);
    $this->assertEquals('http://alice.example.com/', $data['data']['refs']['http://alice.example.com/']['url']);
    $this->assertEquals('Alice', $data['data']['refs']['http://alice.example.com/']['name']);
  }

  public function testSyndicationIsURL() {
@ -372,10 +372,10 @@ class ParseTest extends PHPUnit_Framework_TestCase {
    $this->assertEquals($url, $data['data']['url']);
    $this->assertEquals('2016-02-09T18:30', $data['data']['start']);
    $this->assertEquals('2016-02-09T19:30', $data['data']['end']);
    $this->assertArrayHasKey('http://source.example.com/venue', $data['refs']);
    $this->assertEquals('card', $data['refs']['http://source.example.com/venue']['type']);
    $this->assertEquals('http://source.example.com/venue', $data['refs']['http://source.example.com/venue']['url']);
    $this->assertEquals('Venue', $data['refs']['http://source.example.com/venue']['name']);
    $this->assertArrayHasKey('http://source.example.com/venue', $data['data']['refs']);
    $this->assertEquals('card', $data['data']['refs']['http://source.example.com/venue']['type']);
    $this->assertEquals('http://source.example.com/venue', $data['data']['refs']['http://source.example.com/venue']['url']);
    $this->assertEquals('Venue', $data['data']['refs']['http://source.example.com/venue']['name']);
  }

  public function testMf2ReviewOfProduct() {
@ -395,10 +395,10 @@ class ParseTest extends PHPUnit_Framework_TestCase {
    $this->assertContains('red', $data['data']['category']);
    $this->assertContains('blue', $data['data']['category']);
    $this->assertContains('http://product.example.com/', $data['data']['item']);
    $this->assertArrayHasKey('http://product.example.com/', $data['refs']);
    $this->assertEquals('product', $data['refs']['http://product.example.com/']['type']);
    $this->assertEquals('The Reviewed Product', $data['refs']['http://product.example.com/']['name']);
    $this->assertEquals('http://product.example.com/', $data['refs']['http://product.example.com/']['url']);
    $this->assertArrayHasKey('http://product.example.com/', $data['data']['refs']);
    $this->assertEquals('product', $data['data']['refs']['http://product.example.com/']['type']);
    $this->assertEquals('The Reviewed Product', $data['data']['refs']['http://product.example.com/']['name']);
    $this->assertEquals('http://product.example.com/', $data['data']['refs']['http://product.example.com/']['url']);
  }

  public function testMf2ReviewOfHCard() {
@ -416,10 +416,10 @@ class ParseTest extends PHPUnit_Framework_TestCase {
    $this->assertEquals('5', $data['data']['best']);
    $this->assertEquals('This is the full text of the review', $data['data']['content']['text']);
    $this->assertContains('http://business.example.com/', $data['data']['item']);
    $this->assertArrayHasKey('http://business.example.com/', $data['refs']);
    $this->assertEquals('card', $data['refs']['http://business.example.com/']['type']);
    $this->assertEquals('The Reviewed Business', $data['refs']['http://business.example.com/']['name']);
    $this->assertEquals('http://business.example.com/', $data['refs']['http://business.example.com/']['url']);
    $this->assertArrayHasKey('http://business.example.com/', $data['data']['refs']);
    $this->assertEquals('card', $data['data']['refs']['http://business.example.com/']['type']);
    $this->assertEquals('The Reviewed Business', $data['data']['refs']['http://business.example.com/']['name']);
    $this->assertEquals('http://business.example.com/', $data['data']['refs']['http://business.example.com/']['url']);
  }

  public function testMf1Review() {
@ -438,10 +438,10 @@ class ParseTest extends PHPUnit_Framework_TestCase {
    $this->assertEquals('5', $data['data']['best']);
    $this->assertEquals('This is the full text of the review', $data['data']['content']['text']);
    // $this->assertContains('http://product.example.com/', $data['data']['item']);
    // $this->assertArrayHasKey('http://product.example.com/', $data['refs']);
    // $this->assertEquals('product', $data['refs']['http://product.example.com/']['type']);
    // $this->assertEquals('The Reviewed Product', $data['refs']['http://product.example.com/']['name']);
    // $this->assertEquals('http://product.example.com/', $data['refs']['http://product.example.com/']['url']);
    // $this->assertArrayHasKey('http://product.example.com/', $data['data']['refs']);
    // $this->assertEquals('product', $data['data']['refs']['http://product.example.com/']['type']);
    // $this->assertEquals('The Reviewed Product', $data['data']['refs']['http://product.example.com/']['name']);
    // $this->assertEquals('http://product.example.com/', $data['data']['refs']['http://product.example.com/']['url']);

  }

@ -473,8 +473,8 @@ class ParseTest extends PHPUnit_Framework_TestCase {
    $this->assertEquals('entry', $data['data']['type']);
    $this->assertEquals('https://www.facebook.com/555707837940351#tantek', $data['data']['url']);
    $this->assertContains('https://www.facebook.com/tantek.celik', $data['data']['invitee']);
    $this->assertArrayHasKey('https://www.facebook.com/tantek.celik', $data['refs']);
    $this->assertEquals('Tantek Çelik', $data['refs']['https://www.facebook.com/tantek.celik']['name']);
    $this->assertArrayHasKey('https://www.facebook.com/tantek.celik', $data['data']['refs']);
    $this->assertEquals('Tantek Çelik', $data['data']['refs']['https://www.facebook.com/tantek.celik']['name']);
  }

  public function testEntryAtFragmentID() {
@ -485,6 +485,7 @@ class ParseTest extends PHPUnit_Framework_TestCase {
    $this->assertEquals(200, $response->getStatusCode());
    $data = json_decode($body, true);
    $this->assertEquals('entry', $data['data']['type']);
    $this->assertEquals('Comment text', $data['data']['content']['text']);
    $this->assertEquals('http://source.example.com/fragment-id#comment-1000', $data['data']['url']);
    $this->assertTrue($data['info']['found_fragment']);
  }
--- a/tests/SanitizeTest.php
+++ b/tests/SanitizeTest.php
@ -8,7 +8,7 @@ class SanitizeTest extends PHPUnit_Framework_TestCase {

  public function setUp() {
    $this->client = new Parse();
    $this->client->http = new p3k\HTTPTest(dirname(__FILE__).'/data/');
    $this->client->http = new p3k\HTTP\Test(dirname(__FILE__).'/data/');
    $this->client->mc = null;
  }

--- a/tests/TokenTest.php
+++ b/tests/TokenTest.php
@ -8,7 +8,7 @@ class TokenTest extends PHPUnit_Framework_TestCase {

  public function setUp() {
    $this->client = new Token();
    $this->client->http = new p3k\HTTPTest(dirname(__FILE__).'/data/');
    $this->client->http = new p3k\HTTP\Test(dirname(__FILE__).'/data/');
  }

  private function token($params) {
--- a/tests/TwitterTest.php
+++ b/tests/TwitterTest.php
@ -29,7 +29,7 @@ class TwitterTest extends PHPUnit_Framework_TestCase {
  public function testBasicProfileInfo() {
    list($url, $json) = $this->loadTweet('818912506496229376');

    $data = $this->parse(['url' => $url, 'json' => $json]);
    $data = $this->parse(['url' => $url, 'body' => $json]);

    $this->assertEquals('entry', $data['data']['type']);
    $this->assertEquals('aaronpk dev', $data['data']['author']['name']);
@ -43,7 +43,7 @@ class TwitterTest extends PHPUnit_Framework_TestCase {
  public function testProfileWithNonExpandedURL() {
    list($url, $json) = $this->loadTweet('791704641046052864');

    $data = $this->parse(['url' => $url, 'json' => $json]);
    $data = $this->parse(['url' => $url, 'body' => $json]);

    $this->assertEquals('http://agiletortoise.com', $data['data']['author']['url']);
  }
@ -51,9 +51,9 @@ class TwitterTest extends PHPUnit_Framework_TestCase {
  public function testBasicTestStuff() {
    list($url, $json) = $this->loadTweet('818913630569664512');

    $data = $this->parse(['url' => $url, 'json' => $json]);
    $data = $this->parse(['url' => $url, 'body' => $json]);

    $this->assertEquals(200, $data['code']);
    $this->assertEquals(null, $data['code']); // no code is expected if we pass in the body
    $this->assertEquals('https://twitter.com/pkdev/status/818913630569664512', $data['url']);
    $this->assertEquals('entry', $data['data']['type']);
    $this->assertEquals('A tweet with a URL https://indieweb.org/ #and #some #hashtags', $data['data']['content']['text']);
@ -67,14 +67,14 @@ class TwitterTest extends PHPUnit_Framework_TestCase {
  public function testPositiveTimezone() {
    list($url, $json) = $this->loadTweet('719914707566649344');

    $data = $this->parse(['url' => $url, 'json' => $json]);
    $data = $this->parse(['url' => $url, 'body' => $json]);
    $this->assertEquals("2016-04-12T16:46:56+01:00", $data['data']['published']);
  }

  public function testTweetWithEmoji() {
    list($url, $json) = $this->loadTweet('818943244553699328');

    $data = $this->parse(['url' => $url, 'json' => $json]);
    $data = $this->parse(['url' => $url, 'body' => $json]);

    $this->assertEquals('entry', $data['data']['type']);
    $this->assertEquals('Here 🎉 have an emoji', $data['data']['content']['text']);
@ -83,7 +83,7 @@ class TwitterTest extends PHPUnit_Framework_TestCase {
  public function testHTMLEscaping() {
    list($url, $json) = $this->loadTweet('818928092383166465');

    $data = $this->parse(['url' => $url, 'json' => $json]);
    $data = $this->parse(['url' => $url, 'body' => $json]);

    $this->assertEquals('entry', $data['data']['type']);
    $this->assertEquals('Double escaping &amp; & amp', $data['data']['content']['text']);
@ -92,7 +92,7 @@ class TwitterTest extends PHPUnit_Framework_TestCase {
  public function testTweetWithPhoto() {
    list($url, $json) = $this->loadTweet('818912506496229376');

    $data = $this->parse(['url' => $url, 'json' => $json]);
    $data = $this->parse(['url' => $url, 'body' => $json]);

    $this->assertEquals('entry', $data['data']['type']);
    $this->assertEquals('Tweet with a photo and a location', $data['data']['content']['text']);
@ -102,7 +102,7 @@ class TwitterTest extends PHPUnit_Framework_TestCase {
  public function testTweetWithTwoPhotos() {
    list($url, $json) = $this->loadTweet('818935308813103104');

    $data = $this->parse(['url' => $url, 'json' => $json]);
    $data = $this->parse(['url' => $url, 'body' => $json]);

    $this->assertEquals('entry', $data['data']['type']);
    $this->assertEquals('Two photos', $data['data']['content']['text']);
@ -113,7 +113,7 @@ class TwitterTest extends PHPUnit_Framework_TestCase {
  public function testTweetWithVideo() {
    list($url, $json) = $this->loadTweet('818913178260160512');

    $data = $this->parse(['url' => $url, 'json' => $json]);
    $data = $this->parse(['url' => $url, 'body' => $json]);

    $this->assertEquals('entry', $data['data']['type']);
    $this->assertEquals('Tweet with a video', $data['data']['content']['text']);
@ -123,12 +123,12 @@ class TwitterTest extends PHPUnit_Framework_TestCase {
  public function testTweetWithLocation() {
    list($url, $json) = $this->loadTweet('818912506496229376');

    $data = $this->parse(['url' => $url, 'json' => $json]);
    $data = $this->parse(['url' => $url, 'body' => $json]);

    $this->assertEquals('entry', $data['data']['type']);
    $this->assertEquals('Tweet with a photo and a location', $data['data']['content']['text']);
    $this->assertEquals('https://api.twitter.com/1.1/geo/id/ac88a4f17a51c7fc.json', $data['data']['location']);
    $location = $data['refs']['https://api.twitter.com/1.1/geo/id/ac88a4f17a51c7fc.json'];
    $location = $data['data']['refs']['https://api.twitter.com/1.1/geo/id/ac88a4f17a51c7fc.json'];
    $this->assertEquals('adr', $location['type']);
    $this->assertEquals('Portland', $location['locality']);
    $this->assertEquals('United States', $location['country-name']);
@ -138,38 +138,38 @@ class TwitterTest extends PHPUnit_Framework_TestCase {
  public function testRetweet() {
    list($url, $json) = $this->loadTweet('818913351623245824');

    $data = $this->parse(['url' => $url, 'json' => $json]);
    $data = $this->parse(['url' => $url, 'body' => $json]);

    $this->assertEquals('entry', $data['data']['type']);
    $this->assertArrayNotHasKey('content', $data['data']);
    $repostOf = 'https://twitter.com/aaronpk/status/817414679131660288';
    $this->assertEquals($repostOf, $data['data']['repost-of']);
    $tweet = $data['refs'][$repostOf];
    $tweet = $data['data']['refs'][$repostOf];
    $this->assertEquals('Yeah that\'s me http://xkcd.com/1782/', $tweet['content']['text']);
  }

  public function testRetweetWithPhoto() {
    list($url, $json) = $this->loadTweet('820039442773798912');

    $data = $this->parse(['url' => $url, 'json' => $json]);
    $data = $this->parse(['url' => $url, 'body' => $json]);

    $this->assertEquals('entry', $data['data']['type']);
    $this->assertArrayNotHasKey('content', $data['data']);
    $this->assertArrayNotHasKey('photo', $data['data']);
    $repostOf = 'https://twitter.com/phlaimeaux/status/819943954724556800';
    $this->assertEquals($repostOf, $data['data']['repost-of']);
    $tweet = $data['refs'][$repostOf];
    $tweet = $data['data']['refs'][$repostOf];
    $this->assertEquals('this headline is such a rollercoaster', $tweet['content']['text']);
  }

  public function testQuotedTweet() {
    list($url, $json) = $this->loadTweet('818913488609251331');

    $data = $this->parse(['url' => $url, 'json' => $json]);
    $data = $this->parse(['url' => $url, 'body' => $json]);

    $this->assertEquals('entry', $data['data']['type']);
    $this->assertEquals('Quoted tweet with a #hashtag https://twitter.com/aaronpk/status/817414679131660288', $data['data']['content']['text']);
    $tweet = $data['refs']['https://twitter.com/aaronpk/status/817414679131660288'];
    $tweet = $data['data']['refs']['https://twitter.com/aaronpk/status/817414679131660288'];
    $this->assertEquals('Yeah that\'s me http://xkcd.com/1782/', $tweet['content']['text']);
  }