A blog on financial markets and their regulation
Updated: In the comments, Maries pointed me to Mihai Parparita’s Reader is Dead tools (see also here and here). Although Google Reader has officially shut down, it is still accessible, Mihai’s tools still work, and I was able to create a multi-GB archive of everything that existed in my Google Reader. But the tools required to read this archive are still under development. So in the meantime, I still need my old code, and maybe more such code, to read all the XML and JSON files in these archives.
With Google Reader shutting down, I have been experimenting with many other readers including Feedly and The Old Reader. Since new feed readers are still being launched and existing ones are being improved, I may keep changing my choice over the next few weeks. Importing subscriptions from Google Reader into any feed reader is easy using Google Takeout. The problem is with the starred items. I finally sat down and wrote a Python script that reads the starred.json file available from Google Takeout and writes out an HTML file containing all the starred items.
Python’s json module makes reading and parsing the JSON file a breeze. By looking at some of the entries, I think I have figured out the most important elements of the structure. I am not sure that I have understood everything, so suggestions for improving the script are most welcome.
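For illustration, here is a minimal sketch of what such a script might look like. It is not the script described above. The field names it relies on (a top-level "items" list, and per item "title", "published", "alternate", "content" or "summary", and "origin") are assumptions based on how Takeout's starred.json is commonly laid out, and the function names are made up for this example.

```python
import html
import json
from datetime import datetime, timezone


def item_to_html(item):
    """Render one starred item as an HTML fragment."""
    # Titles are escaped on the assumption that they are plain text.
    title = item.get("title", "(untitled)")
    # Assumed: the post's URL is the first entry under "alternate".
    links = item.get("alternate") or [{}]
    href = links[0].get("href", "")
    # Assumed: full posts sit under "content", summaries under "summary".
    body = (item.get("content") or item.get("summary") or {}).get("content", "")
    source = item.get("origin", {}).get("title", "")
    # Assumed: "published" is a Unix timestamp in seconds.
    published = item.get("published")
    date = (
        datetime.fromtimestamp(published, tz=timezone.utc).strftime("%Y-%m-%d")
        if published is not None
        else ""
    )
    return (
        f'<h2><a href="{html.escape(href, quote=True)}">{html.escape(title)}</a></h2>\n'
        f"<p><em>{html.escape(source)} {date}</em></p>\n"
        f"<div>{body}</div>\n"
    )


def starred_to_html(data):
    """Build a complete HTML page from the parsed JSON data."""
    parts = ['<html><head><meta charset="utf-8"><title>Starred items</title></head><body>']
    parts.extend(item_to_html(item) for item in data.get("items", []))
    parts.append("</body></html>")
    return "\n".join(parts)


def convert(json_path="starred.json", html_path="starred.html"):
    """Read the Takeout file and write the HTML page next to it."""
    with open(json_path, encoding="utf-8") as f:
        data = json.load(f)
    with open(html_path, "w", encoding="utf-8") as f:
        f.write(starred_to_html(data))
```

Calling convert() in the directory holding starred.json would write starred.html there. The item body is inserted without escaping because it is already HTML; anything the sketch does not recognise is simply left out rather than raising an error.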
Where the original feed contains only a summary rather than the entire post, I would ideally like to follow the link, convert the web page to PDF, and add a link pointing to the converted PDF file. This would protect against link rot. I tried doing this with wkhtmltopdf, but I was not satisfied with the quality of the conversion. Any suggestions for doing this would be most welcome. Ideally, I would like to use Google Chrome’s ability to print a web page as PDF, but I have not found any command line options to automate this from within the Python script.
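The wkhtmltopdf step can be driven from Python along the following lines. This is only a sketch: it uses the tool's basic form (input URL followed by output file), assumes wkhtmltopdf is installed and on the PATH, and does nothing about the conversion quality mentioned above. The helper names and the hash-based file naming are hypothetical.

```python
import hashlib
import subprocess


def pdf_name(url):
    """Derive a stable file name from the URL (hypothetical naming scheme)."""
    return hashlib.sha1(url.encode("utf-8")).hexdigest()[:16] + ".pdf"


def wkhtmltopdf_command(url, out_path):
    """Build the basic invocation: wkhtmltopdf <input URL> <output file>."""
    return ["wkhtmltopdf", url, out_path]


def archive_page(url, timeout=120):
    """Convert one page to PDF; return the PDF path, or None on failure."""
    out_path = pdf_name(url)
    try:
        subprocess.run(wkhtmltopdf_command(url, out_path), check=True, timeout=timeout)
    except (OSError, subprocess.SubprocessError):
        # Covers a missing binary, a non-zero exit status, and a timeout.
        return None
    return out_path
```

When archive_page returns a path, the script could add a link to that file next to the item's original link; when it returns None, the item would keep only the original link.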