Aaron Parecki
f8e9a87667
parse github issues and comments
closes #20
7 years ago
Aaron Parecki
5f63ed7944
updates for instagram scraping
7 years ago
Aaron Parecki
63ab3031a3
parse XKCD comics
skip image alt text for now
closes #34
7 years ago
Aaron Parecki
5f5392a7b8
deduplicate categories, and strip leading hashtags
7 years ago
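The category cleanup described in the commit above could be sketched roughly as follows. This is an illustrative Python sketch, not the project's actual code: the function name `normalize_categories` is hypothetical, and the case-sensitive comparison and first-seen ordering are assumptions.

```python
def normalize_categories(categories):
    """Strip leading '#' characters and drop duplicates, keeping first-seen order.

    Hypothetical helper: comparison is case-sensitive here, which is an assumption.
    """
    seen = set()
    result = []
    for category in categories:
        # Remove surrounding whitespace, then any leading hashtag characters.
        cleaned = category.strip().lstrip("#")
        # Skip values that are empty after stripping, and values already seen.
        if cleaned and cleaned not in seen:
            seen.add(cleaned)
            result.append(cleaned)
    return result
```

With this sketch, `#indieweb` and `indieweb` collapse into a single `indieweb` entry.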
Aaron Parecki
a1234f61e3
recognize h-card if it's the only object
closes #36
7 years ago
Aaron Parecki
4a4bc73f5e
don't include the RT'd photo or video in the main entry
they are part of the reposted object instead.
closes #27
8 years ago
Aaron Parecki
5e60e13b5a
add h-recipe
closes #24
8 years ago
Aaron Parecki
5d8fb4e13c
support h-review and h-product vocab
* closes #23
* major refactor of the methods for extracting properties to consolidate the logic
* hReview parsing is incomplete due to issues with the php-mf2 backcompat parsing. see https://github.com/indieweb/php-mf2/issues/107
8 years ago
Aaron Parecki
88a2c7f5bf
add test for syndication property
8 years ago
Aaron Parecki
7d781c3129
match `http-equiv=Status` in addition to `status`
8 years ago
Aaron Parecki
7ef9d2c936
check for http-equiv for deleted posts when target URL provided
8 years ago
Aaron Parecki
de060f110f
return url and code with no link when target URL provided
8 years ago
Aaron Parecki
227311faa9
check for meta http-equiv indicating a deleted post
closes #16
8 years ago
Aaron Parecki
bc74919ade
return status code and final URL in response
* closes #14
* updated readme with details of the response
* includes `url` and `code` in the response with the final URL after following redirects and the HTTP status code returned
8 years ago
Aaron Parecki
753407c904
set default config for test suite
8 years ago
Aaron Parecki
876d4696fb
catch non-expanded profile URLs
apparently some people's profile URLs don't get t.co'd
8 years ago
Aaron Parecki
041cc92a8b
add test file
:headdesk:
8 years ago
Aaron Parecki
755fe8c222
fix positive timezones and case-insensitive username check
8 years ago
Aaron Parecki
0beac036b9
add twitter support
closes #18
8 years ago
Aaron Parecki
db8dba9f23
include published date for Instagram photos
if the photo has a location, the timezone is set on the published date
8 years ago
Aaron Parecki
44b452a8d0
disable tests that make actual http calls
8 years ago
Aaron Parecki
773252559d
parse instagram photos and videos
8 years ago
Aaron Parecki
3bdafad98e
parse URLs with fragment IDs
If the input URL contains a fragment, finds the DOM tree at that ID and runs the subtree through the mf2 parser.
closes #15
8 years ago
Aaron Parecki
c59ab9a2d6
also check img/video/audio for target URL
8 years ago
Aaron Parecki
ac32522c25
should always return 200
8 years ago
Aaron Parecki
565d50b862
add token fetching and authentication for posts
8 years ago
Aaron Parecki
62697ee46b
strict type checking on properties
8 years ago
Aaron Parecki
1f6de10aba
add tests for validating URL fields
* fields that should be URLs will now be omitted if the value was not a URL, such as when the value is `javascript:alert()`
* makes Mf2 class slightly more self-contained by duplicating the URL helper functions into it
* fixes tests to not cache responses in memcache
8 years ago
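The URL-field validation described in the commit above might look something like this. It is a minimal Python sketch under stated assumptions, not the project's implementation: the names `is_valid_url` and `keep_url_fields`, the list of URL field names, and the choice to accept only `http` and `https` are all hypothetical.

```python
from urllib.parse import urlsplit


def is_valid_url(value):
    """Return True only for absolute http(s) URLs with a host (assumed policy)."""
    if not isinstance(value, str):
        return False
    try:
        parts = urlsplit(value.strip())
    except ValueError:
        # urlsplit raises ValueError on some malformed inputs.
        return False
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def keep_url_fields(entry, url_fields=("url", "photo", "in-reply-to")):
    """Omit fields that should hold URLs when their value is not a valid URL.

    The field names in url_fields are illustrative assumptions.
    """
    return {
        key: value
        for key, value in entry.items()
        if key not in url_fields or is_valid_url(value)
    }
```

Under these assumptions a value such as `javascript:alert()` fails the check, so the field is dropped rather than passed through.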
Aaron Parecki
1aa2f01d94
convert hostnames to lowercase
8 years ago
Aaron Parecki
bf6de4de06
only return HTTP 400 on client errors
errors fetching the URL should not result in a 400 response
8 years ago
Aaron Parecki
3edc01d1b7
add test for invitee
8 years ago
Aaron Parecki
6de9be2567
parse h-event
closes #9
8 years ago
Aaron Parecki
ee5e48e1ef
if there is exactly one item and it's an h-entry, use that
8 years ago
Aaron Parecki
2d52b982cb
fix test data
8 years ago
Aaron Parecki
8dc0caa4d0
use effective URL after following redirects when comparing URLs
8 years ago
Aaron Parecki
162d2f5ef8
add tests for feeds, catch case when a permalink has other h-entrys
8 years ago
Aaron Parecki
d1c6dc9268
add a test for rsvp
8 years ago
Aaron Parecki
4cb548fdfc
add test files illustrating a few different ways feeds (lists of posts) can appear
8 years ago
Aaron Parecki
075f78a6c1
parse h-entry even if it's not the first object
8 years ago
Aaron Parecki
a202aa9c9a
tests for sanitizing and escaping HTML
use fork of php-mf2 until https://github.com/indieweb/php-mf2/pull/83 is merged
8 years ago
Aaron Parecki
d7672df96c
allow ul/li/ol
8 years ago
Aaron Parecki
e3ff109b37
restrict matching mf2 classes to only lowercase names
see http://microformats.org/wiki/microformats2-parsing-issues#ignore_u-camelCase_properties for background
8 years ago
Aaron Parecki
66a9b1cc9e
sanitize HTML in the entry
allow only a basic set of tags, and remove any non-mf2 classes
closes #2
8 years ago
Aaron Parecki
241594dcf5
sanitize HTML
sanitize the HTML returned in the content property. allows a common set of HTML tags.
for #2
8 years ago
Aaron Parecki
ac6d86c0db
includes nested h-cite and other objects
if a property such as `in-reply-to` is an h-cite, the URL is still returned as the `in-reply-to` value, and the h-cite object is available in a different part of the response.
closes #6
8 years ago
Aaron Parecki
097e999768
return type=unknown instead of error=no_content
8 years ago
Aaron Parecki
ed88b4881b
use file_get_contents only for appengine URLs
8 years ago
Aaron Parecki
d853a52eb4
disable the timeout test for now
8 years ago
Aaron Parecki
2924f35e0d
fix tests for new HTTPStream
8 years ago
Aaron Parecki
69223cad1d
return matching author url
8 years ago