
New AI Tool Searches Millions of Historical Newspaper Pages

Date: 2020-10-02 23:51  Source: Internet

A new search tool uses machine learning to search millions of U.S. newspaper pages for historical pictures.

The U.S. Library of Congress recently launched the tool, called Newspaper Navigator. The online search system is available for free to the public.

The Library of Congress is the world's largest library. It offers materials from the creative record of the United States. The library serves as the main research service for the U.S. Congress.

Newspaper Navigator currently permits users to search more than 16 million pages from newspapers across the country, from 1900 to 1963.

The newspaper pages were digitized for another Library of Congress project, called Chronicling America. This tool also permits searches across the library's 16 million newspaper pages. The pages contain more than 1.5 million images.

The Chronicling America system permits users to find and look at full newspaper pages as digitized images. Users can also search the collection by keyword, using optical character recognition (OCR). OCR is a technology that identifies printed characters in a scanned or photographed page, so the text can be searched or extracted.

Because keyword search covers only the text on a page, people using the Chronicling America site had to look through newspaper pages themselves when trying to find specific images. The new Newspaper Navigator tool lets users search the collection's image content directly.

This is where the machine-learning methods come in. The search system was trained to recognize different kinds of images. For example, it was designed to tell the difference between photos, maps, comics, advertisements, etc. It can also identify similar images and return these in search results.

Benjamin Lee created the system. He is a member of the Library of Congress' Innovator in Residence Program. The program was established to sponsor people from different fields to create new ways to present the library's huge historical collections to the public.

Lee trained a machine-learning model to identify the visual content and then ran the model over all 16 million pages in Chronicling America.

His model was trained on data from another Library of Congress experiment called Beyond Words. That project invited members of the public to help identify cartoons, drawings, pictures and advertisements in newspapers from the World War I era.

Lee said that after he learned of the Beyond Words experiment, he saw a great possibility to use that information to power his machine-learning tool. "I began to wonder whether this identified visual content was the key to throwing open the treasure chest of visual content, throughout all 16 million pages in Chronicling America."

Newspaper Navigator works like other search engines. Users enter a search term in the "keyword" box. They can also choose to limit search results by location, as well as by date.
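A keyword search with optional location and date limits can be sketched in a few lines of Python. This is only an illustration of the idea, not the tool's actual code: the `Clipping` record, its fields, the `search` function and the sample data are all hypothetical.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Clipping:
    """Hypothetical record for one image found on a newspaper page."""
    caption: str      # text associated with the image, e.g. from OCR
    state: str        # where the newspaper was published
    published: date   # date of the issue

def search(items, keyword, state=None, start=None, end=None):
    """Return items whose caption contains the keyword, within optional limits."""
    kw = keyword.lower()
    hits = []
    for item in items:
        if kw not in item.caption.lower():
            continue
        if state is not None and item.state != state:
            continue
        if start is not None and item.published < start:
            continue
        if end is not None and item.published > end:
            continue
        hits.append(item)
    return hits

# Made-up sample data for illustration only.
ITEMS = [
    Clipping("Baseball team wins pennant", "Ohio", date(1912, 9, 30)),
    Clipping("New baseball park opens", "New York", date(1923, 4, 18)),
    Clipping("Map of the flooded valley", "Ohio", date(1913, 3, 27)),
]

all_baseball = search(ITEMS, "baseball")
ohio_baseball = search(ITEMS, "baseball", state="Ohio")
early_baseball = search(ITEMS, "baseball", end=date(1915, 12, 31))
```

Leaving a limit unset means it is ignored, so the same function covers a plain keyword search and a narrowed one.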

But one of the most powerful tools in the system is the ability to search images by visual similarity. Users of the tool can save images to a personal "collection." They can then use those images as a basis for finding other visually similar images across the library's full collection.
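One common way to search by visual similarity is to describe each image with a list of numbers (often called an embedding) and then rank images by how closely their numbers match those of the saved collection. The sketch below assumes that approach and uses cosine similarity for the comparison. The article does not describe the tool's internals, and the image names and three-number vectors here are made up for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def mean_vector(vectors):
    """Element-wise average of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def most_similar(collection, library, top_k=2):
    """Rank library images (outside the collection) by similarity to the collection."""
    query = mean_vector([library[name] for name in collection])
    candidates = [
        (name, cosine(query, vector))
        for name, vector in library.items()
        if name not in collection
    ]
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return candidates[:top_k]

# Hypothetical embeddings; a real system would compute these with a trained model.
LIBRARY = {
    "map_1905": [0.9, 0.1, 0.0],
    "map_1912": [0.8, 0.2, 0.1],
    "comic_1920": [0.1, 0.9, 0.2],
    "photo_1943": [0.0, 0.2, 0.9],
}

results = most_similar(["map_1905"], LIBRARY)
for name, similarity in results:
    print(f"{name}: {similarity:.3f}")
```

With one map saved in the collection, the other map ranks first because its vector points in nearly the same direction.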

The system even permits users to "retrain" the machine learning tool for individual searches. This is done by examining the images that the search returns. By selecting whether images found were similar or not similar to the desired result, the user is "retraining" the system to improve its search performance.
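The feedback loop described above can be illustrated with Rocchio-style relevance feedback, a classic information-retrieval technique. It is used here as a stand-in and is not confirmed to be the method Newspaper Navigator uses. The query vector is moved toward images marked similar and away from images marked not similar. All vectors, weights and names below are hypothetical.

```python
def mean_vector(vectors, dim):
    """Element-wise average; a zero vector if the list is empty."""
    if not vectors:
        return [0.0] * dim
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def refine_query(query, similar, not_similar, alpha=1.0, beta=0.75, gamma=0.25):
    """Rocchio update: alpha*query + beta*mean(similar) - gamma*mean(not_similar)."""
    dim = len(query)
    positive = mean_vector(similar, dim)
    negative = mean_vector(not_similar, dim)
    return [
        alpha * q + beta * p - gamma * n
        for q, p, n in zip(query, positive, negative)
    ]

def score(query, vector):
    """Dot-product score; a higher value means a closer match to the query."""
    return sum(q * v for q, v in zip(query, vector))

# Made-up example: the first query cannot tell a map from a comic.
query = [0.5, 0.5, 0.0]
map_image = [0.8, 0.2, 0.1]
comic_image = [0.2, 0.8, 0.1]

# The user marks one result as similar and one as not similar.
marked_similar = [[0.9, 0.1, 0.0]]
marked_not_similar = [[0.1, 0.9, 0.2]]

refined = refine_query(query, marked_similar, marked_not_similar)
print(score(query, map_image), score(query, comic_image))
print(score(refined, map_image), score(refined, comic_image))
```

Before the feedback the two images score the same; afterward the map scores higher, which is the effect the "retraining" step is meant to have on later results.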

A demonstration of the Newspaper Navigator is available to help users learn more about the tool and how to carry out different searches. The creators hope the tool can be useful for historians, reporters, educators, professional researchers or anyone interested in learning about U.S. history through newspapers.

The Library of Congress notes that all images included in Newspaper Navigator and Chronicling America are in the public domain, meaning people are free to use them as they wish.

Words in This Story

page – n. one part of a website

digitize – v. to put information into the form of a series of numbers, usually so that it can be understood by a computer

character – n. a letter, number or other mark or sign used in writing or printing

comics – n. a series of pictures that tell a story

content – n. information contained in a piece of writing, a speech, a movie or on the internet

visual – adj. related to seeing

sponsor – v. to pay for someone to do something or for something to happen

location – n. place where something takes place

