Google opens cultural floodgate - CharlotteObserver.com
Posted: 16 Dec 2010 07:17 PM PST
Posted: Friday, Dec. 17, 2010

With little fanfare, Google has made available to the public a mammoth database selected from nearly 5.2 million digitized books for free downloads and online searches, opening a new landscape of possibilities for research and education in the humanities.

The digital storehouse, which comprises words and short phrases as well as a year-by-year count of how often they appear, marks the first time a data set of such magnitude, along with tools for searching it, has been put at the disposal of Ph.D.s, middle school students and anyone else who likes to spend time in front of a screen. It contains some 500 billion words from books published between 1800 and 2000 in English, French, Spanish, German, Chinese, Russian and Hebrew.

The intended audience is scholarly, but a simple online tool also allows anyone with a computer to plug in a string of up to five words and see a graph that charts the phrase's use over time, a diversion that can quickly become as addictive as the habit-forming video game Angry Birds.

With a click you can see that "women," in comparison with "men," is rarely mentioned until the early 1970s, when feminism gained a foothold. The two lines, moving in opposite directions, finally cross paths in about 1986. You can also learn that Mickey Mouse and Marilyn Monroe don't get nearly as much attention in print as Jimmy Carter; see how much more often "Tiananmen Square" is mentioned in English than in Chinese after 1989; or follow how "grilling" began a climb in the late 1990s until it outpaced "roasting," "baking" and "frying" in 2004.

"The goal is to give an 8-year-old the ability to browse cultural trends throughout history, as recorded in books," said Erez Lieberman Aiden, a junior fellow at the Society of Fellows at Harvard.
Lieberman Aiden and Jean-Baptiste Michel, a postdoctoral fellow at Harvard, assembled the data set with Google and spearheaded a research project to demonstrate how vast digital databases can transform our understanding of language, culture and the flow of ideas. Their study, to be published in the journal Science today, offers a tantalizing taste of the rich buffet of research opportunities now open to literature, history and other liberal arts professors who may have previously avoided quantitative analysis. Science is taking the unusual step of making the paper available online to nonsubscribers.

"We wanted to show what becomes possible when you apply very high-turbo data analysis to questions in the humanities," said Lieberman Aiden, whose expertise is in applied mathematics and genomics. He called the method "culturomics."

The data set can be downloaded, and users can build their own search tools. With the most powerful version of the data set, the researchers measured the endurance of fame, finding that written references to celebrities faded twice as quickly in the mid-20th century as they did in the early 19th. "In the future everyone will be famous for 7.5 minutes," they write. Looking at inventions, they discovered that technological advances took, on average, 66 years to be adopted by the larger culture in the early 1800s and only 27 years in the period between 1880 and 1920.
Cultural Evolution Could Be Studied in Google Books Database - Wired News
Posted: 16 Dec 2010 03:43 PM PST

Google's massive trove of scanned books could be useful for researchers studying the evolution of culture. In a paper published Dec. 16 in Science, researchers turned part of that vast textual corpus into a 500-billion-word database in which the frequency of words can be measured over time and space. Their initial subjects of analysis, including the cultural trajectories of popular modern thinkers and the conjugation of irregular verbs, hint at what might be done.

"There are many more questions, that we could never think of, that this data makes possible," said Harvard University evolutionary dynamicist Jean-Baptiste Michel. "What we present in the paper is our first explorations of what becomes possible when you have this dataset."

The new research is part of an emerging approach that applies rigorous statistical analysis, traditionally associated with the study of biological evolution, to cultural evolution. Biological evolution can be studied through the fossil record and genomic comparisons, but cultural evolution has proved difficult to study. Researchers have used archaeological documentation of Polynesian canoe shapes and records painstakingly assembled by comparative linguists, but rich, rigorously compiled datasets are rare.

One potential source is Google, which has scanned some 15 million books, or roughly 12 percent of every book ever published. Michel and his colleagues turned one-third of these, selected for legibility and fully documented origins, into a massive word database. Patterns queried from it are not necessarily answers in themselves, they say, but a way of illuminating subjects for further investigation.

"It's not just an answer machine. It's a question machine," said study co-author Erez Lieberman Aiden, a computational biologist at Harvard University. "Think of this as a hypothesis-generating machine."

In the new study, the researchers restricted their queries to single words and names, as more sophisticated querying raised the risk of copyright violation. (Google and book publishers are currently negotiating terms of access to copyrighted material, putting scientific access and legal restrictions at odds.)
They also traced the prominence of 20th-century thinkers (at least numerically, Freud overtook Darwin shortly after World War II) and quantified the public effects of censorship on intellectuals in China and Nazi Germany. Another analysis found that fame now both accrues and fades faster than it did a century ago, giving quantitative form to an intuitively held sentiment. That example is particularly instructive: the database identified a trend, but the social dynamics behind it need to be studied through non-quantitative approaches.

Cultural evolution researchers greeted the database with qualified enthusiasm. "There's a shortage of datasets. This might add another important database. But how valuable it's going to be is going to require a lot of thought about various biases in how the data is gathered," said Stanford University biologist Paul Ehrlich, whose investigations of Polynesian canoe design were among the first of the new cultural-evolution studies. Ehrlich cited the frequency of obscenity and the treatment of women as two off-the-cuff examples of topics for which a database of published books may not be a simple indicator of cultural trends. "How the books reflect society is a major issue that depends a lot on what particular research you're interested in," he said.

Mark Pagel, a University of Reading evolutionary biologist who has studied the evolution of language, called the database "thrilling." But like Ehrlich, he said the database's usefulness would become evident only with time and would require more sophisticated use. To describe the database's potential for studying cultural evolution, the study authors coined the term "culturomics," which echoes the modern field of genomics. "There was great promise to genomics, and enormous hype surrounding the completion of the Human Genome Project. It was a few years before people realized that having a list of genes wasn't very useful at all.
We now appreciate that it's not genes that matter, but how genes are expressed in bodies," said Pagel. "I'm not saying the data isn't useful. It's just that the database is not going to cough up simple answers," he said.

The database is freely available for online queries and complete download.

Images (Science): 1) Textual frequencies of influential Western thinkers during the 20th century. 2) Contrasting evolution of "burned" and "burnt" in the United States and United Kingdom. 3) Culinary trends.

Citation: "Quantitative Analysis of Culture Using Millions of Digitized Books." By Jean-Baptiste Michel, Yuan Kui Shen, Aviva Presser Aiden, Adrian Veres, Matthew K. Gray, The Google Books Team, Joseph P. Pickett, Dale Hoiberg, Dan Clancy, Peter Norvig, Jon Orwant, Steven Pinker, Martin A. Nowak, Erez Lieberman Aiden. Science, Vol. 330 Issue 6011, Dec. 17, 2010.