Welcome to WuJiGu Developer Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

0 votes
277 views
in Technique[技术] by (71.8m points)

Get web article information (content, title, ...) from multiple web pages with Python

There is a Python library, Newspaper3k, that makes it easier to get the content of web pages. [newspaper][1]

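For title retrieval:

from newspaper import Article
a = Article(url)  # url is the address of the page
a.download()      # fetch the HTML
a.parse()         # extract title, text, ...
print(a.title)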

For content retrieval:

url = 'http://fox13now.com/2013/12/30/new-year-new-laws-obamacare-pot-guns-and-drones/'
article = Article(url)
article.download()
article.parse()
print(article.text)

I want to get information about web pages (sometimes the title, sometimes the actual content). Here is my code to fetch the content/text of the web pages:

from newspaper import Article
import nltk
nltk.download('punkt')
fil = open("laborURLsml2.csv", "r")
# Read every line of the file
Lines = fil.readlines()
for line in Lines:
    print(line)
    article = Article(line)
    article.download()
    article.html
    article.parse()
    print("[[[[[")
    print(article.text)
    print("]]]]]")

The contents of the "laborURLsml2.csv" file: [laborURLsml2.csv][2]

My issue: my code reads the first URL and prints its content, but fails to read the second URL onwards.
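
The post shows neither the error message nor the contents of the CSV file, so the actual cause is not known. Two things commonly go wrong in a loop like this: each line returned by readlines() still carries its trailing newline (and the file may contain blank lines), and one failed download raises an exception that stops the whole loop. The sketch below guards against both, assuming the file holds exactly one URL per line. The helper names clean_urls, fetch_text, fetch_all and print_articles are hypothetical and not part of Newspaper3k; only Article, download(), parse() and text come from the library.

```python
def clean_urls(lines):
    """Strip surrounding whitespace (including the trailing newline) and drop blank lines."""
    return [line.strip() for line in lines if line.strip()]


def fetch_text(url):
    """Download and parse one article with Newspaper3k and return its text."""
    # Imported here so the helpers above and below can be used without the library.
    from newspaper import Article

    article = Article(url)
    article.download()
    article.parse()
    return article.text


def fetch_all(urls, fetch=fetch_text):
    """Fetch every URL; record None for a failure instead of stopping the loop."""
    results = {}
    for url in urls:
        try:
            results[url] = fetch(url)
        except Exception as exc:  # e.g. a failed download or an unparsable page
            print(f"Failed to fetch {url}: {exc}")
            results[url] = None
    return results


def print_articles(path):
    """Read one URL per line from path (an assumption about the file layout) and print each text."""
    with open(path, "r") as fil:
        urls = clean_urls(fil.readlines())
    for url, text in fetch_all(urls).items():
        print(url)
        print("[[[[[")
        print(text if text is not None else "(could not be fetched)")
        print("]]]]]")
```

Calling print_articles("laborURLsml2.csv") would then process every URL in the file. If the file really is a multi-column CSV, the URL column would need to be extracted first, for example with the standard csv module.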


Whoever fights dragons for too long becomes a dragon himself; gaze too long into the abyss, and the abyss will gaze back…

1 Answer

0 votes
by (71.8m points)
Waiting for an expert to answer.
