I am trying to extract links from the summary section of a Wikipedia page. I tried the methods below

[Python]

I am trying to extract links from the summary section of a Wikipedia page. I tried the methods below:

This URL lists links to a lot of pages: https://de.wikipedia.org/wiki/Liste_der_St%C3%A4dte_und_Gemeinden_in_Bayern#A

To extract the links belonging to a particular section, I can filter on the section id. For example,

for the Definition section of the same page I can use this URL:

for the Overview section of the same page I can use this URL: https://de.wikipedia.org/wiki/Liste_der_St%C3%A4dte_und_Gemeinden_in_Bayern#A

But I am unable to figure out how to extract only the links from the summary section, which has no section id of its own to filter on.

I even tried using pywikibot to extract linkedpages and adjusting the plnamespace variable, but I couldn't get the links for only the summary section.


Scrape the page and use a regex to search for <a> tags; that should work! But I'm sure there's a better way. You could even parse the scraped HTML with BeautifulSoup.
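
A rough sketch of the "parse the scraped HTML" idea, in case it helps. It uses Python's built-in `html.parser` in place of BeautifulSoup so it runs without extra installs; the same logic carries over to BeautifulSoup. It assumes the article body sits in a `<div class="mw-parser-output">` and that the summary is everything in that div before the first `<h2>`. Both are assumptions about Wikipedia's markup, not something confirmed in this thread. `LeadLinkParser`, `lead_links`, and `sample_html` are made-up names for illustration.

```python
from html.parser import HTMLParser


class LeadLinkParser(HTMLParser):
    """Collect hrefs of <a> tags in the summary (lead) section.

    Assumption: the article body is <div class="mw-parser-output"> and the
    summary is everything inside it before the first <h2>.
    """

    def __init__(self):
        super().__init__()
        self.in_content = False  # True once the article body div has opened
        self.done = False        # True once the first <h2> in the body is seen
        self.links = []

    def handle_starttag(self, tag, attrs):
        if self.done:
            return
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "div" and "mw-parser-output" in classes:
            self.in_content = True
        elif self.in_content and tag == "h2":
            # First section heading: the summary section ends here.
            self.done = True
        elif self.in_content and tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])


def lead_links(html):
    """Return the hrefs found in the summary section of the given HTML."""
    parser = LeadLinkParser()
    parser.feed(html)
    return parser.links


# Hypothetical miniature page standing in for the downloaded HTML.
sample_html = """
<div id="nav"><h2>Menu</h2><a href="/wiki/Main_Page">Main</a></div>
<div class="mw-parser-output">
  <p>Intro with <a href="/wiki/Bayern">Bayern</a> and
     <a href="/wiki/Gemeinde">Gemeinde</a>.</p>
  <h2 id="A">A</h2>
  <ul><li><a href="/wiki/Abensberg">Abensberg</a></li></ul>
</div>
"""

print(lead_links(sample_html))  # ['/wiki/Bayern', '/wiki/Gemeinde']
```

For a real page you would download the HTML first (for example with `urllib.request` or `requests`) and pass it to `lead_links`. Expect the result to include footnote anchors and any infobox links that appear before the first heading, so you may want to filter on hrefs starting with `/wiki/`.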