When you reach the last page, the "próximo" (next) button gets a disabled class:
<a data-pagina="2" href="?ss=4da73052cb8296b5&st=G1&q=incerteza+pol%C3%ADtica+economia&cat=a&species=not%C3%ADcias&page=2"
class="proximo fundo-cor-produto"> próximo</a>
# ok: no "disabled" class, so there is a next page
<a data-pagina="41" href="?ss=4da73052cb8296b5&st=G1&q=incerteza+pol%C3%ADtica+economia&cat=a&species=not%C3%ADcias&page=41"
class="proximo disabled">próximo</a>
# "disabled" class present: no more next pages
So just keep looping until the disabled button shows up:
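from itertools import count

import requests
from bs4 import BeautifulSoup

# Search URL from the question, with a placeholder for the page number.
url = "http://g1.globo.com/busca/?q=incerteza%20pol%C3%ADtica%20economia&cat=a&ss=4da73052cb8296b5&st=G1&species=not%C3%ADcias&page={}"
page_count = count(1)

# Fetch the first page; naming the parser explicitly avoids a bs4 warning.
soup = BeautifulSoup(requests.get(url.format(next(page_count))).content, "html.parser")
disabled = soup.select_one("#paginador ul li a.proximo.disabled")
print([a["href"] for a in soup.select("div.busca-materia-padrao a")])
print(soup.select_one("a.proximo.disabled"))

# Keep requesting pages until the "próximo" link carries the disabled class.
while not disabled:
    soup = BeautifulSoup(requests.get(url.format(next(page_count))).content, "html.parser")
    disabled = soup.select_one("#paginador ul li a.proximo.disabled")
    print([a["href"] for a in soup.select("div.busca-materia-padrao a")])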
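If you want to reuse that loop, one option is to pull the control flow into a generator. This is only a sketch: `iter_pages`, `fetch`, `is_last` and `max_pages` are names made up for illustration and are not part of requests or BeautifulSoup. The cap on the number of pages is an added safety net in case the site's markup changes and the disabled button is never found.

```python
from itertools import count


def iter_pages(fetch, is_last, max_pages=1000):
    """Yield each fetched page in order, stopping after the last one.

    fetch(n) returns the parsed page number n (hypothetical callable).
    is_last(page) returns True when the page has the disabled next button.
    max_pages is an assumed safety cap, not something the site defines.
    """
    for page_number in count(1):
        if page_number > max_pages:
            break
        page = fetch(page_number)
        yield page
        if is_last(page):
            break


# Offline stand-in for the site: three fake pages, the third being the last.
fake_pages = {1: "page one", 2: "page two", 3: "last page"}
visited = list(iter_pages(fake_pages.get, lambda page: page == "last page"))
print(visited)  # ['page one', 'page two', 'last page']
```

With the real site you would pass something like `lambda n: BeautifulSoup(requests.get(url.format(n)).content, "html.parser")` as `fetch` and `lambda soup: soup.select_one("a.proximo.disabled") is not None` as `is_last`.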
If you are using requests and want to check whether you were redirected, you can inspect the .history
attribute:
In [1]: import requests
In [2]: r = requests.get("http://g1.globo.com/busca/?q=incerteza%20pol%C3%ADtica%20economia&cat=a&ss=4da73052cb8296b5&st=G1&species=not%C3%ADcias&page=5000")
In [3]: print(r.history)
[<Response [301]>]
In [4]: r.history[0].status_code == 301
Out[4]: True
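That check can be folded into a small helper. This is a rough sketch: `was_redirected` and `REDIRECT_CODES` are hypothetical names, and the objects below are plain stand-ins that only mimic the `.status_code` and `.history` attributes of a requests response, so the snippet runs without touching the network.

```python
from types import SimpleNamespace

# Status codes treated as redirects in this sketch (an assumption; adjust as needed).
REDIRECT_CODES = {301, 302, 303, 307, 308}


def was_redirected(response):
    """Return True if any hop recorded in response.history was a redirect."""
    return any(hop.status_code in REDIRECT_CODES for hop in response.history)


# Stand-ins for response objects: one fetched directly, one reached via a 301.
direct = SimpleNamespace(status_code=200, history=[])
bounced = SimpleNamespace(status_code=200, history=[SimpleNamespace(status_code=301)])

print(was_redirected(direct))   # False
print(was_redirected(bounced))  # True
```

In the paging loop you would call it on the object returned by `requests.get(...)` and stop as soon as it returns True.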
Another way with requests is to disallow redirects and stop as soon as a 301 status code comes back:
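page_count = count(1)  # restart from the first page

# The first page always exists, so fetch it normally.
soup = BeautifulSoup(requests.get(url.format(next(page_count))).content, "html.parser")
print([a["href"] for a in soup.select("div.busca-materia-padrao a")])

while True:
    # Do not follow redirects: a 301 means we asked for a page past the end.
    r = requests.get(url.format(next(page_count)), allow_redirects=False)
    if r.status_code == 301:
        break
    soup = BeautifulSoup(r.content, "html.parser")
    print([a["href"] for a in soup.select("div.busca-materia-padrao a")])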