Aaron Parecki
5b2b4f3142
return original input URL for feed discovery when 302 is found
closes #86
6 years ago
Aaron Parecki
19126b5836
pass thru HTTP code and parse deleted pages
more consistently returns HTTP 410 now
6 years ago
Aaron Parecki
5b779feb8a
check for key first
6 years ago
Aaron Parecki
c5d417e87e
leave out placeholder alt text from instagram
6 years ago
Aaron Parecki
156fd62678
Include alt text from Instagram posts
This adds a new property, "meta", which includes alt text and has room to include additional properties later.
closes #85
6 years ago
sebsel
1e638e7fd0
fix a case where the removed h-card is the first item
The previous check removes an h-card, but this one still takes the first item of the original array. As a result, in the case of `h-card+h-entry`, the h-card gets removed, but then the `if` triggers and the `h-card` is selected.
7 years ago
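The bug described above can be illustrated with a minimal Python sketch. It assumes the parsed items are mf2-style dicts with a `type` list; `pick_single_item` is a hypothetical name, not XRay's actual (PHP) code. The point of the fix is to select from the filtered list rather than from the original array.

```python
from typing import List, Optional


def pick_single_item(items: List[dict]) -> Optional[dict]:
    """Return the one non-h-card item on a page, or None if there isn't exactly one."""
    # Remove h-card items first (e.g. an author card next to the post).
    remaining = [item for item in items if "h-card" not in item["type"]]
    if len(remaining) == 1:
        # Take the first item of the *filtered* list. Returning items[0] here
        # was the bug: for `h-card+h-entry`, items[0] is the removed h-card.
        return remaining[0]
    return None
```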
Aaron Parecki
bed6efba29
jsonfeed: use feed info as author as fallback
7 years ago
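A possible shape for the author fallback in JSON Feed parsing, written as a Python sketch. It assumes the JSON Feed 1.0 fields `author` (at item and feed level), `title`, `home_page_url` and `icon`; the function name and the exact fallback order are assumptions, not necessarily what XRay does.

```python
from typing import Optional


def item_author(feed: dict, item: dict) -> Optional[dict]:
    # 1. The item's own author takes priority.
    if item.get("author"):
        return item["author"]
    # 2. Otherwise use the feed-level author object.
    if feed.get("author"):
        return feed["author"]
    # 3. Assumption: as a last resort, build an author from the feed's own info.
    if feed.get("title") or feed.get("home_page_url"):
        return {
            "name": feed.get("title"),
            "url": feed.get("home_page_url"),
            "avatar": feed.get("icon"),
        }
    return None
```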
Aaron Parecki
87b2cf10d8
add follow-of posts
replaces #78
7 years ago
Aaron Parecki
8043ba575f
improve authorship discovery
closes #79
7 years ago
Aaron Parecki
18dc92966b
recognize pattern of h-entry + h-card
* a single h-entry and h-card, where the h-entry has no URL, will result in a permalink page with that h-entry
* multiple h-entrys followed by an h-card is a feed
7 years ago
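The two h-entry + h-card patterns listed above can be expressed as a small classifier. This Python sketch assumes mf2-style items with `type` and `properties`; `classify_page` and its return labels are hypothetical, chosen only to mirror the two rules.

```python
from typing import List, Optional, Tuple


def classify_page(items: List[dict]) -> Tuple[str, Optional[dict]]:
    types = [item["type"][0] for item in items]
    entries = [item for item in items if item["type"][0] == "h-entry"]
    # Rule 1: exactly one h-entry and one h-card, and the h-entry has no url.
    if sorted(types) == ["h-card", "h-entry"] and "url" not in entries[0].get("properties", {}):
        return "permalink", entries[0]
    # Rule 2: two or more h-entrys followed by a trailing h-card.
    if len(items) >= 3 and types[-1] == "h-card" and all(t == "h-entry" for t in types[:-1]):
        return "feed", None
    return "unknown", None
```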
Aaron Parecki
a0f80593e9
move xpath query to parent helper class
7 years ago
Aaron Parecki
9163341af2
normalize relative URLs in JSONFeed items
closes #77
7 years ago
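Normalizing relative URLs in JSON Feed items amounts to resolving each URL-valued field against the feed's own URL. A Python sketch using the standard `urljoin`; the set of fields covered (`url`, `external_url`, `image`, `banner_image` from JSON Feed 1.0) is an assumption, and URLs inside `content_html` are not handled here.

```python
from urllib.parse import urljoin

# Top-level JSON Feed item fields that hold a single URL.
URL_FIELDS = ("url", "external_url", "image", "banner_image")


def normalize_item_urls(item: dict, feed_url: str) -> dict:
    normalized = dict(item)  # leave the caller's dict untouched
    for key in URL_FIELDS:
        if isinstance(normalized.get(key), str):
            # Relative values resolve against the feed URL; absolute ones pass through.
            normalized[key] = urljoin(feed_url, normalized[key])
    return normalized
```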
Aaron Parecki
e79872fd37
change class for AS Emoji
7 years ago
Aaron Parecki
470639f486
recognize h-event "content" in addition to "description"
7 years ago
Aaron Parecki
5418072704
don't use twitter bio URL as author URL
7 years ago
Aaron Parecki
8b4a38cef7
catch error with rel-urls
7 years ago
Aaron Parecki
43db6098fc
handle the case where the server returns multiple content-type headers
7 years ago
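One way to cope with a server that returns several `Content-Type` headers is to reduce them to a single media type. In this Python sketch, the rule "the last value wins" is an assumption, not necessarily XRay's choice, and `effective_content_type` is a hypothetical name. Values may arrive as separate headers or already joined with commas, so both are split.

```python
from typing import List


def effective_content_type(values: List[str]) -> str:
    """Reduce one or more Content-Type header values to a bare media type."""
    # Split comma-joined values too, and drop empty pieces.
    candidates = [part.strip() for value in values for part in value.split(",") if part.strip()]
    if not candidates:
        return ""
    # Assumption: the last value wins; strip parameters such as charset.
    media_type = candidates[-1].split(";", 1)[0]
    return media_type.strip().lower()
```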
Aaron Parecki
707e750429
fix likes and reposts
7 years ago
Aaron Parecki
7252d5a3f4
also parse the object inside Create activities
7 years ago
Aaron Parecki
ca9c8c02ef
AS: parse likes and reposts
7 years ago
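A rough Python sketch of mapping ActivityStreams 2.0 activities to a flat entry: `Like` becomes `like-of`, `Announce` becomes `repost-of`, and a `Create` activity is unwrapped so its inner object is parsed (as in commit 7252d5a3f4). The output keys mirror XRay's property names, but the function itself is hypothetical and treats `url` as a plain string.

```python
def as2_to_entry(activity: dict) -> dict:
    # Unwrap Create activities so the wrapped Note/Article is parsed directly.
    if activity.get("type") == "Create" and isinstance(activity.get("object"), dict):
        activity = activity["object"]
    kind = activity.get("type")
    obj = activity.get("object")
    # The target of a Like/Announce may be a bare URL or an embedded object.
    target = obj.get("id") if isinstance(obj, dict) else obj
    entry = {"type": "entry"}
    if kind == "Like":
        entry["like-of"] = target
    elif kind == "Announce":
        entry["repost-of"] = target
    elif kind in ("Note", "Article"):
        if activity.get("content"):
            entry["content"] = activity["content"]
        if kind == "Article" and activity.get("name"):
            entry["name"] = activity["name"]
    if isinstance(activity.get("url"), str):
        entry["url"] = activity["url"]
    return entry
```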
Aaron Parecki
85d973916f
support articles and summary
7 years ago
Aaron Parecki
c9371788c5
fix for old php
7 years ago
Aaron Parecki
d3e36038b2
parse basic ActivityStreams objects
including from rel=alternate
7 years ago
Aaron Parecki
154b7e874a
check for a rel=alternate to existing parsed mf2 JSON and use that instead
7 years ago
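Checking for a rel=alternate link to pre-parsed mf2 JSON can be done with the standard-library HTML parser. This Python sketch assumes the link is marked `type="application/mf2+json"`; the class and function names are hypothetical. The returned URL is resolved against the page URL.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urljoin


class AlternateMf2Finder(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self.href: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # Keep only the first match; look at <link> and <a> elements.
        if self.href is not None or tag not in ("link", "a"):
            return
        a = dict(attrs)
        rels = (a.get("rel") or "").lower().split()
        if "alternate" in rels and (a.get("type") or "").lower() == "application/mf2+json" and a.get("href"):
            self.href = a["href"]


def find_mf2_alternate(html: str, page_url: str) -> Optional[str]:
    finder = AlternateMf2Finder()
    finder.feed(html)
    return urljoin(page_url, finder.href) if finder.href else None
```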
Aaron Parecki
7621bca4a6
adds new "source-format" property to indicate how XRay found the data
* mf2+html
* mf2+json
* feed+json
* xml
* instagram/facebook/github/xkcd
7 years ago
Aaron Parecki
b074d652e0
also accept application/xml as RSS feeds
7 years ago
Aaron Parecki
38d307de1c
implements post type discovery
returns a new property `post-type` next to `type`
closes #25
7 years ago
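Post type discovery is described in the W3C Post Type Discovery note. The following Python sketch follows that algorithm in simplified form (no URL validation). It assumes a flat, jf2-style post dict like XRay's output, where `content` may be a `{"text": ..., "html": ...}` object. XRay's own rules may differ or cover more types.

```python
import re

# Response/media properties, checked in the order given by the W3C note.
_PROPERTY_TYPES = (
    ("repost-of", "repost"),
    ("like-of", "like"),
    ("in-reply-to", "reply"),
    ("video", "video"),
    ("photo", "photo"),
)


def _collapse(text: str) -> str:
    # Trim and collapse runs of internal whitespace to single spaces.
    return re.sub(r"\s+", " ", text).strip()


def discover_post_type(post: dict) -> str:
    if post.get("type") == "event":
        return "event"
    if post.get("rsvp") in ("yes", "no", "maybe", "interested"):
        return "rsvp"
    for prop, post_type in _PROPERTY_TYPES:
        if post.get(prop):
            return post_type
    content = post.get("content") or post.get("summary") or ""
    if isinstance(content, dict):
        content = content.get("text", "")
    name = _collapse(post.get("name") or "")
    if not content or not name:
        return "note"
    # A name that only repeats the start of the content is not a real title.
    return "note" if _collapse(content).startswith(name) else "article"
```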
Aaron Parecki
70f1576926
support twitter animated gifs
7 years ago
Aaron Parecki
e1600cc5bc
real fix for quotation-of
7 years ago
Aaron Parecki
c4da480866
quotation-of should always be a single value
7 years ago
Aaron Parecki
112b75b623
parse quotation-of from HTML as well
closes #73
7 years ago
Aaron Parecki
01cce9b823
sends an Accept header when fetching posts
7 years ago
Aaron Parecki
e2780ba0a0
when interpreting JSON, don't require `value` for html values
7 years ago
Aaron Parecki
417cc1b3cc
parse redirect uri for h-app
parse from both link tags and the u-redirect-uri property
7 years ago
Aaron Parecki
fc74da5be9
add parser for instagram user feeds
7 years ago
Aaron Parecki
2d19db0308
include instagram bio in h-card
7 years ago
Aaron Parecki
921d5262ea
also parse instagram profile URLs
7 years ago
Aaron Parecki
6f39655c8a
parse instagram user info from HTML instead of secret JSON API
adds script to refresh the downloaded instagram data for the tests as well
7 years ago
Aaron Parecki
c70b29479a
updates for instagram parsing
instagram seems to have rolled out the `graphql` key everywhere now
7 years ago
Aaron Parecki
85c3a17934
whitespace cleanup
7 years ago
Aaron Parecki
4959ec15f2
remove duplicate url values
7 years ago
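Removing duplicate `url` values while keeping their original order can be sketched in Python with an insertion-ordered dict. This is an illustration, not XRay's code. Only exact string matches are treated as duplicates; `https://a/` and `https://a` would both survive.

```python
from typing import List


def dedupe_urls(urls: List[str]) -> List[str]:
    # dict keys preserve insertion order (Python 3.7+), so the first
    # occurrence of each URL is kept and later repeats are dropped.
    return list(dict.fromkeys(urls))
```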
Aaron Parecki
8026279cba
fix tests for new mf2 parser
main difference is the deprecated rel handling
7 years ago
Aaron Parecki
a50cd6284b
fix whitespace handling for br tags in html
7 years ago
Aaron Parecki
c27f228314
include in-reply-to URL for tweets
7 years ago
Aaron Parecki
c68c7661c8
inspect content to determine if a page is atom or rss
closes #62
7 years ago
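Telling Atom from RSS by inspecting the content, rather than trusting the Content-Type, can be done by looking at the XML root element. In this Python sketch, the function name and the "unknown" fallback are assumptions. Atom documents have a namespaced `<feed>` root, RSS 2.0 uses `<rss>`, and RSS 1.0 uses `<rdf:RDF>`.

```python
import xml.etree.ElementTree as ET

ATOM_NS = "{http://www.w3.org/2005/Atom}"
RDF_NS = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}"


def sniff_feed_format(body: str) -> str:
    """Classify an XML document by its root element."""
    try:
        root = ET.fromstring(body)
    except ET.ParseError:
        return "unknown"
    if root.tag == ATOM_NS + "feed":
        return "atom"
    if root.tag == "rss" or root.tag == RDF_NS + "RDF":
        return "rss"
    return "unknown"
```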
Aaron Parecki
cb1e32278d
convert newlines to <br> for html in tweets
7 years ago
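Converting newlines to `<br>` for tweet HTML is only safe if the text is escaped first. This Python sketch assumes the input may already contain entities such as `&amp;`, as tweet text from the API can. It unescapes and then re-escapes so nothing is double-encoded. The function name is hypothetical.

```python
import html


def tweet_text_to_html(text: str) -> str:
    # Normalize any existing entities, escape &, < and >, then mark line breaks.
    escaped = html.escape(html.unescape(text), quote=False)
    return escaped.replace("\n", "<br>\n")
```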
Aaron Parecki
bf4bc3a668
extract photos and videos from streaming tweets when truncated
7 years ago
Aaron Parecki
fb2fcec9c6
include HTML for tweets with links or user mentions
also expands parsing to handle twitter JSON from the streaming API, which is subtly different from the HTTP API.
closes #61
7 years ago
Aaron Parecki
b995a1d3ee
whitespace
7 years ago
Aaron Parecki
452accf6bf
include `quotation-of` property for quoted tweets
7 years ago
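Deriving `quotation-of` for a quoted tweet could look like the following Python sketch. It assumes the Twitter v1.1 tweet JSON fields `quoted_status`, `user.screen_name` and `id_str`, and builds a single permalink URL, in line with the later rule that `quotation-of` is always a single value. The function name is hypothetical.

```python
from typing import Optional


def quotation_of(tweet: dict) -> Optional[str]:
    quoted = tweet.get("quoted_status")
    if not quoted:
        return None
    # Build the permalink of the quoted tweet from its author and ID.
    screen_name = quoted["user"]["screen_name"]
    return f"https://twitter.com/{screen_name}/status/{quoted['id_str']}"
```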