If I have the URL of a page, how would I obtain the infobox information on the right using MediaWiki web services?

9 Answers

Use the MediaWiki API through this Python library: https://github.com/siznax/wptools

Usage:

import wptools

# Fetch and parse the page, then read the infobox fields from the parse data.
so = wptools.page('Stack Overflow').get_parse()
infobox = so.data['infobox']
print(infobox)

Output:

{'alexa': '{{Increase}} 34 ( {{as of|2019|12|15|lc|=|y}} )',
 'author': '[[Jeff Atwood]] and [[Joel Spolsky]]',
 'caption': 'Screenshot of Stack Overflow in February 2017',
 'commercial': 'Yes',
 'content_license': '[[Creative Commons license|CC-BY-SA]] 4.0',
 'current_status': 'Online',
 'language': 'English, Spanish, Russian, Portuguese, and Japanese',
 'launch_date': '{{start date and age|2008|9|15}}',
 'logo': 'Stack Overflow logo.svg',
 'name': 'Stack Overflow',
 'owner': '[[Stack Exchange]], Inc.',
 'programming_language': '[[C Sharp (programming language)|C#]]',
 'registration': 'Optional',
 'screenshot': 'File:Stack Overflow homepage, Feb 2017.png',
 'type': '[[Knowledge market]]',
 'url': '{{URL|https://stackoverflow.com}}'}
  • I used wptools, FTW! Oct 20, 2021 at 15:12
  • Unable to get wptools to install on Windows due to the pycurl dependency. Tried for hours and gave up. It works great on Linux, though. Dec 16, 2023 at 22:19

If you just want to parse the infobox, or you want some digested data, take a look at the DBpedia project: http://dbpedia.org

The DBpedia project scans the infoboxes in Wikipedia to create an RDF database from them: https://github.com/dbpedia/extraction-framework/
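
If you only need a few fields, you can also query DBpedia's public SPARQL endpoint directly. Here is a rough sketch using the requests library; the dbo:author property is just an illustrative field, so swap in your own resource and property names:

import requests

# Ask DBpedia for the infobox-derived "author" values of Stack Overflow.
query = """
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbr: <http://dbpedia.org/resource/>
SELECT ?author WHERE { dbr:Stack_Overflow dbo:author ?author }
"""
resp = requests.get(
    "https://dbpedia.org/sparql",
    params={"query": query, "format": "application/sparql-results+json"},
)
for row in resp.json()["results"]["bindings"]:
    print(row["author"]["value"])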

There is no trivial way to do that. You can try fetching the page content using action=raw, e.g. http://en.wikipedia.org/w/index.php?action=raw&title=Douglas_Jardine. Then find the start of the infobox by searching for {{Infobox, and find the end by locating the matching }}, taking into account that the infobox itself can also contain nested {{ }} and {{{ }}} pairs.
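
A minimal sketch of that brace-matching approach, assuming the requests library; the matching here is naive (it only counts {{ / }} pairs), so treat it as a starting point rather than a robust parser:

import requests

def extract_infobox(title):
    # Fetch the raw wikitext of the article.
    resp = requests.get(
        "https://en.wikipedia.org/w/index.php",
        params={"action": "raw", "title": title},
    )
    text = resp.text
    start = text.find("{{Infobox")
    if start == -1:
        return None
    # Walk forward, counting {{ and }} until the opening template closes.
    depth, i = 0, start
    while i < len(text) - 1:
        if text[i:i + 2] == "{{":
            depth += 1
            i += 2
        elif text[i:i + 2] == "}}":
            depth -= 1
            i += 2
            if depth == 0:
                return text[start:i]
        else:
            i += 1
    return None

print(extract_infobox("Douglas_Jardine"))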

Each Wikipedia page is associated with a Wikidata item, and those items include most of the parameters from the page's infobox templates. So you only need to access the data associated with your Wikipedia page through the Wikidata API.

An example of how to get the data for the Wikipedia page on Donald Trump from its Wikidata item:

https://www.wikidata.org/w/api.php?action=wbgetentities&sites=enwiki&props=claims&titles=Donald Trump

The response will include date and place of birth, image, religion, mother, father, children, height, signature, official website, and so on: all the main information about Donald Trump that appears in the Wikipedia infobox.
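
A small sketch of reading one claim out of that response with the requests library; the property ID P569 (date of birth) is just an illustrative example, other infobox fields live under other P-numbers:

import requests

resp = requests.get(
    "https://www.wikidata.org/w/api.php",
    params={
        "action": "wbgetentities",
        "sites": "enwiki",
        "titles": "Donald Trump",
        "props": "claims",
        "format": "json",
    },
)
# The response is keyed by the item's Q-id, so take the first (only) entity.
entity = next(iter(resp.json()["entities"].values()))
# P569 is the Wikidata property for date of birth.
birth = entity["claims"]["P569"][0]["mainsnak"]["datavalue"]["value"]["time"]
print(birth)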

  • Wikidata is probably the way to go to extract semantic information. It seems way more robust and maintainable than parsing Wikipedia pages. Apr 10, 2020 at 0:27

Tomxu - what you're talking about is a template, which is simply a page you can include on another page. For the infobox you need to start by looking at Template:Infobox. This gives you detailed instructions.

You can also press edit (or view code) and copy the contents to your own wiki. Bear in mind that templates tend to be in a hierarchy so you might need to copy other templates that Infobox uses (if you want to use them). Each template can be identified with {{}} so e.g. the Infobox template will look like this: {{Infobox}}.

I mentioned a hierarchy: you'll actually find multiple templates that all use Template:Infobox. To find them, just type Template:Infobox into Wikipedia's search field and you'll find multiple examples, e.g. Template:Infobox writer.

Update: if you mean Navboxes, then see this information.

  • The Template:Infobox page seems to be entirely about describing the data structure of an infobox, with no information on how to access that data on a specific page. Can you clarify how to use the information on that page? May 8, 2018 at 9:27

In our project we use queries like this for fetching data from Wiktionary:

http://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20html%20where%20url%3D%22http%3A%2F%2Fen.wiktionary.org%2Fwiki%2Flife%22%20and%20xpath%3D'%2F%2Fdiv%5B%40id%3D%22bodyContent%22%5D'&format=xml&diagnostics=false&env=store%3A%2F%2Fdatatables.org%2Falltableswithkeys&callback=recwiki

I have no comprehensive understanding of it, but it works. The output can be filtered using jQuery or something else.
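
The YQL query above essentially just runs an XPath selection over the page HTML, and the YQL service itself has since been retired, so here is a rough equivalent done directly with the requests and lxml libraries (a swapped-in approach, not the original service):

import requests
from lxml import html

# Fetch the Wiktionary page and select the same #bodyContent div that the
# YQL query targets with its XPath expression.
resp = requests.get("https://en.wiktionary.org/wiki/life")
tree = html.fromstring(resp.content)
body = tree.xpath('//div[@id="bodyContent"]')[0]
print(body.text_content()[:500])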

What about using the edit mode? You could just grab the correct textarea (most of the time it has id="wpTextbox1") and parse its content. The URL I used to find that out was (note: section=0):

https://de.wikipedia.org/w/index.php?title=Pelephone&action=edit&section=0
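
A rough sketch of that idea, assuming the requests and lxml libraries; it pulls the wikitext of section 0 out of the edit form's textarea:

import requests
from lxml import html

resp = requests.get(
    "https://de.wikipedia.org/w/index.php",
    params={"title": "Pelephone", "action": "edit", "section": "0"},
)
tree = html.fromstring(resp.content)
# The edit form's textarea usually has the id "wpTextbox1".
textarea = tree.get_element_by_id("wpTextbox1")
print(textarea.text_content())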

Greetings

It is possible using pandas too:

import pandas as pd

# read_html returns a list of DataFrames, one per matching table; here we keep
# only tables with class "infobox" and use the first column as the index.
page = 'https://pt.wikipedia.org/wiki/Python'
infoboxes = pd.read_html(page, index_col=0, attrs={"class": "infobox"})
print(infoboxes)
  • While this works, it's slower than other solutions and replaces the special characters that separate the items in the box. So under an artist infobox's 'Genre' it may say blues blues grass. How would you know how to correctly separate those, given that one is a two-word phrase? Dec 16, 2023 at 22:23

Using MediaWiki, you can view the infobox on the right of a Wikipedia page by using the link below. As you can see, the format is JSON (this can be changed), and by replacing the word "hydrogen" with the specific title you want, you will get a page with an infobox.

https://en.wikipedia.org/w/api.php?action=parse&page=Template:Infobox%20hydrogen&format=json
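
A small sketch of consuming that endpoint with the requests library; with this action the parsed infobox comes back as rendered HTML under parse.text:

import requests

resp = requests.get(
    "https://en.wikipedia.org/w/api.php",
    params={
        "action": "parse",
        "page": "Template:Infobox hydrogen",
        "format": "json",
    },
)
# The rendered HTML of the template lives under parse -> text -> "*".
html_text = resp.json()["parse"]["text"]["*"]
print(html_text[:500])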
