Scraping: Naver Weather / News Headlines

천숭이 2021. 12. 4. 01:39
import requests
from bs4 import BeautifulSoup


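def create_soup(url):
    """Fetch url and return the parsed page as a BeautifulSoup object."""
    # Send a browser-like User-Agent header with the request.
    header = {"User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.45 Safari/537.36"}
    res = requests.get(url, headers=header)
    res.raise_for_status()  # raise an HTTPError on a 4xx/5xx response
    soup = BeautifulSoup(res.text, "lxml")  # the "lxml" parser needs the lxml package installed
    return soup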

def scrape_weather():
    url = "https://search.naver.com/search.naver?sm=tab_hty.top&where=nexearch&query=%ED%98%B8%EC%9B%90%EB%8F%99%EB%82%A0%EC%94%A8&oquery=%EC%84%9C%EC%9A%B8%EB%82%A0%EC%94%A8&tqi=hjoIdwp0YidssLEyuSdssssssSw-369936"

    soup = create_soup(url)

    print("[* 오늘의 호원동 날씨 *]")  # "Today's weather in Howon-dong"

    # Today's weather summary, e.g. "5° lower than yesterday / clear"
    cast = soup.find("p", attrs = {"class":"summary"})
    summary_text = cast.get_text()
    
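    # Current temperature ([1:] drops the first character of the text)
    curr_temp = soup.find("div", attrs = {"class":"temperature_text"}).get_text()[1:]

    # Lowest and highest temperature
    min_temp = soup.find("span", attrs = {"class":"lowest"}).get_text()
    max_temp = soup.find("span", attrs = {"class":"highest"}).get_text()

    # Chance of precipitation (morning / afternoon)
    weather_left = soup.find("span", attrs = {"class":"weather_left"})
    morning_rain_rate = weather_left.span.get_text()
    # Note: .next_element is the next node in document order, so check that
    # it really lands on the afternoon value for the current page layout.
    afternoon_rain_rate = weather_left.span.next_element.get_text()

    # Fine dust (PM10), ultrafine dust (PM2.5), and UV index
    dust = soup.find("ul", attrs = {"class":"today_chart_list"})
    dust_items = dust.find_all("li")  # look the list items up once
    pm10 = dust_items[0].get_text()
    pm25 = dust_items[1].get_text()
    uv = dust_items[2].get_text()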

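    # Output
    # [:11] and [12:] are fixed offsets, so this assumes the comparison
    # phrase in the summary is exactly 11 characters long.
    print(summary_text[:11]+"/"+summary_text[12:])
    print("{} ({} / {})".format(curr_temp, min_temp, max_temp))
    # "Precipitation: morning {} / afternoon {}"
    print("강수 : 오전 {} / 오후 {}".format(morning_rain_rate, afternoon_rain_rate))
    print()
    print(pm10[2:])
    print(pm25[2:])
    print(uv[2:])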

def scrape_headline_news():
    url = "https://news.naver.com"
    print("[ 오늘의 헤드라인 뉴스]")  # "Today's headline news"
    soup = create_soup(url)
    # Each <li> in the headline list holds one article.
    news_list = soup.find("ul", {"class":"hdline_article_list"}).find_all("li")
    for idx, news in enumerate(news_list):
        headline = news.find("div", {"class":"hdline_article_tit"}).a.get_text()
        headline = headline.strip()
        # The href is appended to the site root, so it is expected to be a relative path.
        link = url + news.find("a")["href"]
        print("{}. {}".format(idx + 1, headline))
        print("링크 : {}".format(link))  # "Link : {}"



if __name__ == "__main__":
    scrape_weather()  # fetch today's weather information
    scrape_headline_news()
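
The weather URL in `scrape_weather` was copied from a browser session: its `query` value is the percent-encoded form of 호원동날씨, and it also carries `sm`, `oquery`, and `tqi` parameters. As a minimal sketch, the same kind of URL could be built from a plain search term with the standard library. `build_weather_url` and `SEARCH_URL` are hypothetical names, and the sketch assumes Naver still returns the weather panel when only `where` and `query` are sent, which has not been verified here.

```python
from urllib.parse import urlencode

SEARCH_URL = "https://search.naver.com/search.naver"


def build_weather_url(query):
    """Return a Naver search URL for `query` (hypothetical helper)."""
    # Assumption: "where" and "query" are enough; the session-specific
    # parameters from the original URL (sm, oquery, tqi) are left out.
    params = {"where": "nexearch", "query": query}
    # urlencode percent-encodes the Korean text as UTF-8.
    return SEARCH_URL + "?" + urlencode(params)
```

Calling `build_weather_url("호원동날씨")` yields a URL whose `query` value matches the percent-encoded string in the original URL, so a different district could be looked up by changing only the search term.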

# scrape_weather output

[* 오늘의 호원동 날씨 *]
어제보다 4° 낮아요/ 맑음 
현재 온도-2°  (최저기온-6° / 최고기온5°)
강수 : 오전 0% / 오후 0%

미세먼지 좋음
초미세먼지 좋음
자외선 좋음
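
The first output line is produced by slicing the summary at fixed offsets (`[:11]` and `[12:]`), which only lines up when the comparison phrase is exactly 11 characters long, as in 어제보다 4° 낮아요. A rough alternative is to split on whitespace instead. `split_summary` is a hypothetical helper, and it assumes the weather condition is always the last whitespace-separated token of the summary text.

```python
def split_summary(summary_text):
    """Split the summary into (comparison phrase, condition).

    Hypothetical helper. Assumption: the condition (e.g. "맑음") is the
    last whitespace-separated token of the summary text.
    """
    parts = summary_text.split()
    if len(parts) < 2:
        # Nothing to separate; return the stripped text and no condition.
        return summary_text.strip(), ""
    return " ".join(parts[:-1]), parts[-1]
```

With this, the first print could become `print("{} / {}".format(*split_summary(summary_text)))`, which does not depend on the number of digits in the temperature difference.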

# scrape_headline_news output

[ 오늘의 헤드라인 뉴스]
1. 이재명, 조동연 사의 표명에 “모든 책임 제가 지겠다”
링크 : https://news.naver.com/main/read.naver?mode=LSD&mid=shm&sid1=100&oid=277&aid=0005009987
2. 이재명, 이재용에 "기본소득 얘기해보면 어떠냐" 제안한 이유는?
링크 : https://news.naver.com/main/read.naver?mode=LSD&mid=shm&sid1=100&oid=469&aid=0000644845
3. 중국판 우버 '디디추싱' 뉴욕증시 상장 폐지...中 당국 압력
링크 : https://news.naver.com/main/read.naver?mode=LSD&mid=shm&sid1=104&oid=052&aid=0001672809
4. 양제츠 "종전선언 추진 지지..코로나 안정때 시 주석 방한"
링크 : https://news.naver.com/main/read.naver?mode=LSD&mid=shm&sid1=100&oid=437&aid=0000282538
5. 돈 갚으라며 동창 딸 결혼식서 축의금 가져간 제약사 2세 송치
링크 : https://news.naver.com/main/read.naver?mode=LSD&mid=shm&sid1=102&oid=020&aid=0003397667
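
The links above come out correctly because each `href` on the page was a path relative to the site root, so `url + href` gives a valid address. If an `href` were already absolute, plain concatenation would produce a broken link. A small sketch using `urllib.parse.urljoin` handles both cases. `absolute_link` and `BASE_URL` are hypothetical names, and the absolute address in the usage note is made up for illustration.

```python
from urllib.parse import urljoin

BASE_URL = "https://news.naver.com"


def absolute_link(href, base=BASE_URL):
    """Resolve `href` against `base` (hypothetical helper).

    Relative paths are joined to the base; hrefs that are already
    absolute are returned unchanged.
    """
    return urljoin(base, href)
```

For example, `absolute_link("/main/read.naver?mode=LSD&mid=shm&sid1=100&oid=277&aid=0005009987")` reproduces the first link in the output, while `absolute_link("https://example.com/article")` is returned as is.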