Using the unique skillsets of grassroots Bernie Sanders supporters to create websites or apps to help Bernie Sanders win. Projects are independent from the campaign.
Scraping data from RCP (RealClearPolitics)
RCP provides a wonderful web interface for looking at aggregated polling data.
For example, shown here is a lovely chart (via D3.js) which shows the RCP average polling data for each candidate on the Democratic side as a time series. You can view a custom time range or pick from presets like 1 year, 6 months, 14 days, etc.
Below the D3.js chart they have a listing of the polling data.
Does anyone know if RCP will provide this data for analysis? If not, is it possible to scrape it from their website easily?
I'd like to make some plots with this data but am only really familiar with Python. My web knowledge is a bit lacking. I was hoping to find something in the page source showing an XML or CSV file being loaded into the D3 chart, which would be accessible somehow, but I didn't see anything like that at first glance.
JSONP
http://www.realclearpolitics.com/epolls/json/3824_historical.js?1453388629140&callback=return_json
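Since you're working in Python: a JSONP response is just JSON wrapped in a function call, so you can peel off the wrapper and hand the rest to the json module. Here's a minimal sketch using only the standard library. The helper name parse_jsonp and the sample payload are made up for illustration; I'm not assuming anything about what the real RCP response looks like inside.

```python
import json
import re


def parse_jsonp(text):
    """Strip a JSONP wrapper such as callback({...}); and return the parsed JSON."""
    # Callback name, an opening paren, the payload, a closing paren, optional semicolon.
    match = re.match(r"^\s*[\w$.]+\s*\((.*)\)\s*;?\s*$", text, re.DOTALL)
    if match is None:
        raise ValueError("not a JSONP response")
    return json.loads(match.group(1))


# Made-up payload for illustration only; the real response has its own layout.
sample = 'return_json({"poll": {"title": "example", "values": [1, 2, 3]}});'
data = parse_jsonp(sample)
print(data["poll"]["values"])
```

To try it on the live URL you'd fetch the text first, for example with urllib.request.urlopen(url).read().decode("utf-8") (assuming the response is UTF-8), then pass that string to parse_jsonp and poke around the resulting dict to see how the polls are laid out.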
Thank you!
Yup. In the future, there are two good ways of pulling data out of a website.
Like you mentioned, if the data isn't an image, more likely than not the raw values are loaded somewhere. It will either be embedded in the source of the original page loaded, or you can use the chrome/safari/firefox/etc. tools to see all network requests, and you can go through them to figure out where the data comes from.
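If you'd rather do a first pass from Python than click through the page source by hand, something like the sketch below lists the script and data files a page references directly in its HTML. It uses only the standard library; the class name DataLinkFinder, the extension list, and the sample HTML are all my own assumptions for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# File types that often carry chart data (an assumption, adjust to taste).
DATA_EXTENSIONS = (".js", ".json", ".csv", ".xml")


class DataLinkFinder(HTMLParser):
    """Collect src/href URLs whose path ends in one of DATA_EXTENSIONS."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.found = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("src", "href") and value:
                full = urljoin(self.base_url, value)  # resolve relative URLs
                if urlparse(full).path.lower().endswith(DATA_EXTENSIONS):
                    self.found.append(full)


# Made-up page for illustration only.
sample_html = """
<html><head>
<script src="/static/chart.js"></script>
<link rel="stylesheet" href="/static/site.css">
</head><body>
<a href="data/polls.csv?v=2">download</a>
</body></html>
"""
finder = DataLinkFinder("http://example.com/page/")
finder.feed(sample_html)
print(finder.found)
```

This only sees URLs written into the static HTML. Anything a script requests at runtime won't show up here, which is why the network tab in the browser tools is still the more reliable place to look.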
If all else fails, you can inject jQuery into the webpage and scrape the data from the browser console. For example, if I want a list of animal names:
http://lib.colostate.edu/wildlife/atoz.php?letter=ALL
I look at the source and I see that all the links are directly embedded in the tables in this format (table.names > tbody > tr > td > a), where > represents a direct child.
I can run a jQuery selector for that path in the console and get the 2,996 animals on the page.
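For anyone more comfortable in Python, here's a rough sketch of the same direct-child walk (table.names > tbody > tr > td > a) using only the standard library's html.parser. The class name NameLinkCollector and the sample HTML are made up for illustration. One caveat: browsers add a tbody element to tables that don't have one in the source, while html.parser only sees what's literally in the markup, so the path may need adjusting for a real page.

```python
from html.parser import HTMLParser

# Elements that never get a closing tag, so they must not go on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class NameLinkCollector(HTMLParser):
    """Collect the text of <a> elements matching table.names > tbody > tr > td > a."""

    TAIL = ["tbody", "tr", "td", "a"]

    def __init__(self):
        super().__init__()
        self.stack = []       # (tag, class list) for every currently open element
        self.names = []
        self._buffer = None   # text pieces while inside a matching <a>

    def _matches(self):
        # The five innermost open elements must be exactly the selector path.
        if len(self.stack) < 5:
            return False
        table_tag, table_classes = self.stack[-5]
        tail = [tag for tag, _ in self.stack[-4:]]
        return (table_tag == "table" and "names" in table_classes
                and tail == self.TAIL)

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        self.stack.append((tag, classes))
        if self._buffer is None and self._matches():
            self._buffer = []

    def handle_endtag(self, tag):
        # Pop back to the nearest matching open tag, tolerating unclosed tags.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                del self.stack[i:]
                break
        if tag == "a" and self._buffer is not None:
            self.names.append("".join(self._buffer).strip())
            self._buffer = None

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)


# Made-up markup for illustration only.
sample_html = """
<table class="names"><tbody>
<tr><td><a href="/a">Aardvark</a></td></tr>
<tr><td><a href="/b">Badger</a><br></td></tr>
<tr><td><span><a href="/c">Nested, not a direct child</a></span></td></tr>
</tbody></table>
<table class="other"><tbody><tr><td><a href="/d">Ignored</a></td></tr></tbody></table>
"""
collector = NameLinkCollector()
collector.feed(sample_html)
print(collector.names)
```

The link wrapped in a span and the one in the other table are skipped on purpose, since neither is a direct child along the selector path.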
Really appreciate the help. I do a lot of data analysis work in Python, but none of it requires any web knowledge, just parsing data sets from experiments I work on.
Will save this for future reference.
NP, I like stealing data :P
You can do the same sort of thing with the Python library Beautiful Soup. Or, if you like Node, check out my post for an example of doing it with Cheerio.
The 'requests' library is also excellent (a little more general purpose and nicer than Beautiful Soup imo, but both work great for little things like these).