13

I have the following Wikipedia API search query:

http://en.wikipedia.org/w/api.php?&action=query&generator=search&gsrnamespace=0&gsrlimit=20&prop=pageimages|extracts&pilimit=max&exintro&exsentences=1&exlimit=max&continue&pithumbsize=100&gsrsearch=Albert%20Einstein

I just want to list famous people - is there a way to do that?

3
  • Also, if there is a suggestion with a limitation - for instance one that only works if a date of birth exists - I am still interested.
    – rybo111
    May 24, 2015 at 8:58
  • How do you define famous? You could argue that anyone who has a Wikipedia page dedicated to them is famous.
    – Adam Rice
    Apr 1, 2016 at 21:07
  • @AdamRice Since Wikipedia only allows "notable people", it would be any non-fictional person, I suppose.
    – rybo111
    Jan 17, 2021 at 22:34

4 Answers

7
+100

There isn't an exact way to limit your search results to only famous people. However, you can use a few different filters with Wikipedia's CirrusSearch to roughly narrow your results to people:

  • incategory: Can you find a category that includes the people you want? Categories may not be a great solution, since they may be inconveniently specific.
  • linksto: Do articles about people link to a common article?
  • hastemplate: Can you find a template that is used on biographies of famous people? The template {{birth date}} may be a good solution (if it's fine to limit your search to mostly non-fictional people with non-disputed known birthdates).
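
As a rough sketch of how these keywords combine with a free-text query, the Python below builds the gsrsearch string and the full request URL, reusing the parameters from the question. The helper names build_search and build_url are made up for illustration; they are not part of the API.

```python
from urllib.parse import urlencode

API_URL = "https://en.wikipedia.org/w/api.php"


def build_search(text, hastemplate=None, incategory=None, linksto=None):
    """Prefix a free-text query with optional CirrusSearch keyword filters."""
    parts = []
    if hastemplate:
        parts.append(f"hastemplate:{hastemplate}")
    if incategory:
        parts.append(f"incategory:{incategory}")
    if linksto:
        parts.append(f"linksto:{linksto}")
    parts.append(text)
    return " ".join(parts)


def build_url(search):
    """Build the request URL with the same parameters as the question."""
    params = {
        "action": "query",
        "format": "json",
        "generator": "search",
        "gsrnamespace": 0,
        "gsrlimit": 20,
        "gsrsearch": search,
        "prop": "pageimages|extracts",
        "pilimit": "max",
        "pithumbsize": 100,
        "exintro": "",
        "exsentences": 1,
        "exlimit": "max",
        "continue": "",
    }
    return f"{API_URL}?{urlencode(params)}"


search = build_search("Albert Einstein", hastemplate="Birth_date")
print(search)  # hastemplate:Birth_date Albert Einstein
print(build_url(search))
```

Values are passed through unchanged, so multi-word template or category names need underscores, as in Birth_date.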

For example, here is your same search with hastemplate:Birth_date added, which narrows the results to people:

https://en.wikipedia.org/w/api.php?&action=query&generator=search&gsrnamespace=0&gsrlimit=20&prop=pageimages|extracts&pilimit=max&exintro&exsentences=1&exlimit=max&continue&pithumbsize=100&gsrsearch=hastemplate%3ABirth_date+Albert%20Einstein

{
"batchcomplete": "",
"continue": {
    "gsroffset": 20,
    "continue": "gsroffset||"
},
"query": {
    "pages": {
        "92733": {
            "pageid": 92733,
            "ns": 0,
            "title": "Albert A. Michelson",
            "index": 14,
            "thumbnail": {
                "source": "https://upload.wikimedia.org/wikipedia/commons/thumb/9/9e/Albert_Abraham_Michelson2.jpg/71px-Albert_Abraham_Michelson2.jpg",
                "width": 71,
                "height": 100
            },
            "pageimage": "Albert_Abraham_Michelson2.jpg",
            "extract": "<p><b>Albert Abraham Michelson</b> (surname pronunciation anglicized as \"Michael-son\", December 19, 1852 \u2013 May 9, 1931) was an American physicist known for his work on the measurement of the speed of light and especially for the Michelson\u2013Morley experiment.</p>"
        },
        "736": {
            "pageid": 736,
            "ns": 0,
            "title": "Albert Einstein",
            "index": 1,
            "thumbnail": {
                "source": "https://upload.wikimedia.org/wikipedia/commons/thumb/3/3e/Einstein_1921_by_F_Schmutzer_-_restoration.jpg/76px-Einstein_1921_by_F_Schmutzer_-_restoration.jpg",
                "width": 76,
                "height": 100
            },
            "pageimage": "Einstein_1921_by_F_Schmutzer_-_restoration.jpg",
            "extract": "<p><b>Albert Einstein</b> (<span><span>/<span><span title=\"/\u02c8/ primary stress follows\">\u02c8</span><span title=\"/a\u026a/ long 'i' in 'tide'\">a\u026a</span><span title=\"'n' in 'no'\">n</span><span title=\"'s' in 'sigh'\">s</span><span title=\"'t' in 'tie'\">t</span><span title=\"/a\u026a/ long 'i' in 'tide'\">a\u026a</span><span title=\"'n' in 'no'\">n</span></span>/</span></span>; <small>German:</small> <span title=\"Representation in the International Phonetic Alphabet (IPA)\">[\u02c8alb\u025b\u0250\u032ft \u02c8a\u026an\u0283ta\u026an]</span>; 14 March 1879&#160;\u2013 18 April 1955) was a German-born theoretical physicist.</p>"
        },
        "1139788": {
            "pageid": 1139788,
            "ns": 0,
            "title": "Alfred Einstein",
            "index": 6,
            "thumbnail": {
                "source": "https://upload.wikimedia.org/wikipedia/en/thumb/1/12/Alfred_Einstein.jpg/70px-Alfred_Einstein.jpg",
                "width": 70,
                "height": 100
            },
            "pageimage": "Alfred_Einstein.jpg",
            "extract": "<p><b>Alfred Einstein</b> (December 30, 1880&#160;\u2013 February 13, 1952) was a German-American musicologist and music editor.</p>"
        },

        ...
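
Note that the pages object in this response is keyed by page ID, so the entries do not arrive in rank order; each page carries its search rank in the index field instead. A small Python sketch of restoring that order, using a trimmed-down copy of the response above (ranked_titles is a hypothetical helper name):

```python
def ranked_titles(response):
    """Return page titles ordered by the search rank stored in "index"."""
    pages = response.get("query", {}).get("pages", {})
    ordered = sorted(pages.values(), key=lambda page: page.get("index", float("inf")))
    return [page["title"] for page in ordered]


# Trimmed-down copy of the response shown above (only the fields used here).
sample = {
    "query": {
        "pages": {
            "92733": {"pageid": 92733, "title": "Albert A. Michelson", "index": 14},
            "736": {"pageid": 736, "title": "Albert Einstein", "index": 1},
            "1139788": {"pageid": 1139788, "title": "Alfred Einstein", "index": 6},
        }
    }
}

print(ranked_titles(sample))
# ['Albert Einstein', 'Alfred Einstein', 'Albert A. Michelson']
```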

Someday, you should be able to use Wikidata to search for entities on Wikipedia that are an instance of human. For now, we'll have to work with search filters.

3
  • Interesting. But a query with hastemplate:Birth_date does not find Tom Cruise, whereas the same query without it lists him first.
    – rybo111
    Mar 28, 2016 at 16:20
  • For now I have decided to skip articles where the revision content does not contain birth_date. I would prefer to exclude them in the query, if that's possible.
    – rybo111
    Mar 29, 2016 at 0:05
  • Once you filter by categories, I'd suggest looking at backlinks inside Wikipedia. That's easy because each article has a "What links here?" link. A famous person will probably have more backlinks.
    – derloopkat
    Mar 31, 2016 at 14:25
1

My workaround for now is to filter search results server-side, by only showing articles that have birth_date in their revision content.
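
A minimal sketch of that kind of filter in Python, assuming the wikitext of each candidate article has already been fetched (for example with prop=revisions and rvprop=content; the fetching step is left out here). The function name keep_people and the sample wikitext are made up for illustration.

```python
def keep_people(wikitext_by_title):
    """Keep only titles whose revision content mentions birth_date."""
    return [
        title
        for title, wikitext in wikitext_by_title.items()
        if "birth_date" in wikitext
    ]


# Hypothetical, heavily shortened wikitext for two search results.
samples = {
    "Albert Einstein": (
        "{{Infobox scientist\n"
        "| name = Albert Einstein\n"
        "| birth_date = {{Birth date|df=yes|1879|3|14}}\n"
        "}}"
    ),
    "Theory of relativity": "The theory of relativity usually encompasses two theories.",
}

print(keep_people(samples))  # ['Albert Einstein']
```

This is a plain substring check, so it would also keep any article that merely mentions the text birth_date outside an infobox.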

The bounty is still available if someone finds a way around this.

1

I think every person's article will have "... birthDate)" (if the person is still alive) or "birthDate - died)" in the first sentence of the extract. So I guess you can keep only the records whose extract matches this regex:

^[^.]*\d{4}\)[^.]*\..*

This only matches text with something like "2001)" in the first sentence, that is, before the first period.

If it's safe to assume that other records don't have it (I'm not sure that it is), then you can stop there. If not, you have at least filtered out a few more records before checking the revision.
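
Applied in Python, the check could look roughly like this. The helper name looks_like_person is hypothetical, and the third sample sentence is invented to show a limitation: a period before the year, such as a middle initial, makes the regex miss a real person.

```python
import re

# The regex from above: a four-digit year followed by ")" before the first period.
FIRST_SENTENCE_YEAR = re.compile(r"^[^.]*\d{4}\)[^.]*\..*")


def looks_like_person(extract):
    """Return True if the extract has a 'YYYY)' pattern before its first period."""
    return FIRST_SENTENCE_YEAR.match(extract) is not None


extracts = [
    "Alfred Einstein (December 30, 1880 \u2013 February 13, 1952) was a "
    "German-American musicologist and music editor.",
    "The theory of relativity usually encompasses two interrelated theories.",
    # Invented example: the period in the initial "Q." comes before the year.
    "John Q. Public (born 1 January 1970) is an example name.",
]

for extract in extracts:
    print(looks_like_person(extract))
# True
# False
# False
```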

1

There are two URLs for searching for famous people:

https://en.wikipedia.org/w/api.php?action=query&generator=search&format=json&exintro&exsentences=1&exlimit=max&gsrlimit=20&gsrsearch=hastemplate:Birth_date_and_age+Melanie_laurent&pithumbsize=100&pilimit=max&prop=pageimages%7Cextracts
https://en.wikipedia.org/w/api.php?action=query&generator=search&format=json&exintro&exsentences=1&exlimit=max&gsrlimit=20&gsrsearch=hastemplate:Birth_date+Melanie_laurent&pithumbsize=100&pilimit=max&prop=pageimages%7Cextracts

The only difference between the two URLs is the gsrsearch parameter:

To get living people, use hastemplate:Birth_date_and_age

To get dead people, use hastemplate:Birth_date

In my case, I have to make two requests.

In these example URLs, just replace Melanie_laurent with your query.
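
One way to handle the two requests is sketched below in Python: build one URL per template, then merge the pages of both JSON responses by page ID so that nothing is listed twice. The function names are made up for illustration, the HTTP fetching is left out, and the page IDs in the sample responses are placeholders.

```python
from urllib.parse import urlencode

API_URL = "https://en.wikipedia.org/w/api.php"
TEMPLATES = ("Birth_date_and_age", "Birth_date")


def people_search_urls(query):
    """Return one search URL per template, otherwise identical."""
    urls = []
    for template in TEMPLATES:
        params = {
            "action": "query",
            "generator": "search",
            "format": "json",
            "exintro": "",
            "exsentences": 1,
            "exlimit": "max",
            "gsrlimit": 20,
            "gsrsearch": f"hastemplate:{template} {query}",
            "pithumbsize": 100,
            "pilimit": "max",
            "prop": "pageimages|extracts",
        }
        urls.append(f"{API_URL}?{urlencode(params)}")
    return urls


def merge_pages(*responses):
    """Combine the pages of several parsed responses, keyed by page ID."""
    merged = {}
    for response in responses:
        pages = response.get("query", {}).get("pages", {})
        for pageid, page in pages.items():
            merged.setdefault(pageid, page)
    return merged


urls = people_search_urls("Melanie_laurent")

# Placeholder responses standing in for the parsed JSON of each request.
living = {"query": {"pages": {"1": {"pageid": 1, "title": "Person A"}}}}
dead = {"query": {"pages": {"1": {"pageid": 1, "title": "Person A"},
                            "2": {"pageid": 2, "title": "Person B"}}}}
print(sorted(merge_pages(living, dead)))  # ['1', '2']
```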

1
  • I'm a bit late, but how would I search for all people on a specific Wikipedia page? For example, I want any links to people (e.g. Xi Jinping) from the China Wikipedia page.
    – Alex
    Jan 21, 2021 at 13:10
