
Web Crawling Naver Cafe Subpages

풍경소리^^ 2019. 6. 23. 09:43

from selenium import webdriver
from bs4 import BeautifulSoup as bs
import info  # local module that holds your own credentials (info.ID, info.PW)

ie_webdriver=r'c:\webdriver\IEDriverServer.exe'
driver=webdriver.Ie(ie_webdriver)

driver.get('https://www.naver.com/')

driver.find_element_by_class_name('lg_local_btn').click()

# driver.find_element_by_id('id').send_keys(info.ID)
# driver.find_element_by_id('pw').send_keys(info.PW)

# Bypass the CAPTCHA: set the field values with JavaScript instead of typing them
driver.execute_script("document.getElementsByName('id')[0].value=arguments[0]", info.ID)
driver.execute_script("document.getElementsByName('pw')[0].value=arguments[0]", info.PW)

driver.find_element_by_class_name('btn_global').click()

# Destination (base URL of the cafe)
base_url='https://cafe.naver.com/jkitstudy'

# Category (board to crawl)
url="/ArticleList.nhn?search.clubid=29375370&search.menuid=24&search.boardtype=L"

driver.get(base_url+url)

# Switch to the iframe that holds the article list
driver.switch_to.frame('cafe_main')

soup=bs(driver.page_source,'html.parser')

# When the board has notices: skip the notice table and take only the regular articles
# soup=soup.find_all(class_='article-board m-tcol-c')[1]
# rows=soup.find_all('td',class_='td_article')
# for row in rows:
#     article_title=row.find('a',class_='article')
#     if len(article_title.find_all('span'))>0:
#         for s in article_title('span'):
#             s.extract()
#     article_title=article_title.get_text().strip()
#     print(article_title)


for i in range(1,6):
    # Pagination element (not used below; the page numbers are hard-coded)
    navi = soup.find(class_='Nnavi')
    article_url="/ArticleList.nhn?search.clubid=29375370&search.menuid=24&search.boardtype=L&search.questionTab=A&search.totalCount=72&search.page="+str(i)

    subpage_url=base_url+article_url
    print(subpage_url)
    driver.get(subpage_url)

    # Switch to the iframe again after loading the new page
    driver.switch_to.frame('cafe_main')
    soup_subpage = bs(driver.page_source, 'html.parser')

    rows=soup_subpage.find_all(class_='board-list')
    for row in rows:
        board_list=row.find(class_='m-tcol-c')
        if board_list is None:  # skip rows that have no title element
            continue
        article_title=board_list.get_text().strip()
        print(article_title)
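
The loop above builds each page URL by concatenating one long query string. As an optional alternative, here is a minimal sketch that builds the same URLs from a dictionary with the standard library's `urlencode`. The helper name `build_page_url` and its parameter names are hypothetical, made up for this illustration; the values (club ID 29375370, menu ID 24, total count 72) are copied from the script above and only apply to this particular cafe board.

```python
from urllib.parse import urlencode

BASE_URL = "https://cafe.naver.com/jkitstudy"


def build_page_url(page, club_id=29375370, menu_id=24, total_count=72):
    """Return the article-list URL for one page of the board (hypothetical helper)."""
    # Same query parameters, in the same order, as the hard-coded string in the loop
    params = {
        "search.clubid": club_id,
        "search.menuid": menu_id,
        "search.boardtype": "L",
        "search.questionTab": "A",
        "search.totalCount": total_count,
        "search.page": page,
    }
    return BASE_URL + "/ArticleList.nhn?" + urlencode(params)


# Pages 1 to 5, matching range(1, 6) in the loop above
page_urls = [build_page_url(i) for i in range(1, 6)]
for page_url in page_urls:
    print(page_url)
```

Each entry of `page_urls` could be passed to `driver.get()` in place of `subpage_url`. Keeping the parameters in a dictionary makes it easier to change the board (`menu_id`) or the page range without editing a long string.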

https://www.youtube.com/watch?v=jI4Q5jd0LW0

 
