Planet Social Media Research

crowd

March 15, 2010

Intentialicious

International Workshop on Modeling Social Media 2010 (MSM'10)

I’d like to point you to a Call for Papers for a workshop I’m involved in organizing at Hypertext 2010 in Toronto this June. I’m really excited about the focus of this event, and I’m looking forward to lots of exciting discussions and presentations (check out the invited talks and panelists!).

International Workshop on

Modeling Social Media 2010 (MSM’10)

Website: http://kmi.tugraz.at/workshop/MSM10/

June 13, 2010, co-located with Hypertext 2010,

Toronto, Canada

Important Dates:

* Submission Deadline: April 9, 2010

* Notification of Acceptance: May 13, 2010

* Final Papers Due: May 20, 2010

* Workshop date: June 13, 2010, Toronto, Canada

Workshop Organizers:

  • Alvin Chin, Nokia Research Center, Beijing, China, alvin.chin (at) nokia.com
  • Andreas Hotho, University of Wuerzburg, Germany, hotho (at) informatik.uni-wuerzburg.de
  • Markus Strohmaier, Graz University of Technology, Austria, markus.strohmaier (at) tugraz.at

Format:

The workshop will be opened by an invited talk given by Ed Chi (Palo Alto Research Center). The talk will be followed by a number of peer-reviewed research and position paper presentations and a discussion panel including Barry Wellman (University of Toronto), Marti Hearst (University of California, Berkeley) and Ed Chi (Palo Alto Research Center).

Workshop’s Objectives and Goals:

The goal of this workshop is to focus the attention of researchers on the increasingly important role of modeling social media. The workshop aims to attract and discuss a wide range of modeling perspectives (such as justificative, explanative, descriptive, formative and predictive models, among others) and approaches (statistical modeling, conceptual modeling, temporal modeling, etc.). We want to bring together researchers and practitioners with diverse backgrounds interested in 1) exploring different perspectives and approaches to modeling complex social media phenomena and systems, 2) the different purposes and applications that models of social media can serve, 3) issues of integrating and validating social media models and 4) new modeling techniques for social media. The workshop aims to start a dialogue that reflects upon and discusses these issues.

Topics:

Topics may include, but are not limited to:

+ new modeling techniques and approaches for social media

+ models of propagation and influence in Twitter, blogs and social tagging systems

+ models of expertise and trust in Twitter, wikis, newsgroups, question and answering systems

+ modeling of social phenomena and emergent social behavior

+ agent-based models of social media

+ models of emergent social media properties

+ models of user motivation, intent and goals in social media

+ cooperation and collaboration models

+ software-engineering and requirements models for social media

+ adapting and adaptive hypertext models for social media

+ modeling social media users and their motivations and goals

+ architectural and framework models

+ user modeling and behavioural models

+ modeling the evolution and dynamics of social media

Preliminary Program Committee (confirmed):
  • Ansgar Scherp, Koblenz University, Germany
  • Roelof van Zwol, Yahoo! Research Barcelona, Spain
  • Marti Hearst, UC Berkeley, USA
  • Ed Chi, PARC, USA
  • Peter Pirolli, PARC, USA
  • Steffen Staab, Koblenz University, Germany
  • Barry Wellman, University of Toronto, Canada
  • Daniel Gayo-Avello, University of Oviedo, Spain
  • Jordi Cabot, INRIA, France
  • Pranam Kolari, Yahoo! Research, USA
  • Tad Hogg, Institute for Molecular Manufacturing, USA
  • Wai-Tat Fu, University of Illinois at Urbana-Champaign, USA
  • Thomas Kannampallil, University of Texas, USA
  • Justin Zhan, Carnegie Mellon University, USA
  • Marc Smith, ConnectedAction, USA
  • Mark Chignell, University of Toronto, Canada




Filed under: events

by Markus Strohmaier at March 15, 2010 09:26 AM

March 12, 2010

Complexity and Social Networks Blog

Cell Phone Data Collection - New Experiments and Existing Resources

Many researchers have asked me how they can get their hands on behavioral data logging programs that work on cell phones, such as the one used in Nathan Eagle and Sandy Pentland's landmark Reality Mining study. That study was back in 2004, and they were using old Nokia phones with the Symbian OS, which presented a host of problems. Below I'll go through the currently available data logging applications for phones, and I'll describe a new system being built on top of Android that will give social scientists a greatly enhanced platform. All of these applications log Bluetooth proximity information, call logs, and cell tower IDs, but some log additional information such as WiFi access points, SMS messages, and accelerometer data. Here are the dominant data logging applications available today:

Nokia

Only Nokia 6600 phones are officially supported, but the Context Group at the University of Helsinki has developed a number of behavior logging applications for these phones, available for download here (use mitv2).

iPhone

The iPhone is nice because a lot of people have them, but it's a poor choice for data logging because it does not allow processes to run in the background. This means you have to have jailbroken iPhones to run these applications, and it also means you can't offer them for download on the official app store. Anmol Madan from our group has made an iPhone app available for download here, and he also wrote a short tutorial on how to get this application running. Your iPhones have to have older versions of the firmware, however, and it doesn't work with the new 3G iPhones.

Windows Mobile

This is still a widely used phone OS, and Anmol has written a fairly robust data logging application that eclipses all of the previous versions in functionality with WiFi access point logging, survey launcher, and automatic updating tool. He hasn't made it available for download quite yet, but it should be appearing in the next few weeks on the Human Dynamics Social Evolution website. Unfortunately, this version will not be useful for new phones in a few months because Microsoft is releasing Windows Mobile 7.0, which is not compatible with the old 6.x version that this application is written for.

Android

Now the good news: Android phones are becoming increasingly popular and will most likely eclipse all other platforms as the dominant phone OS. Almost every cell phone manufacturer is producing Android phones, and with a unified and unrestricted app store there is an opportunity to reach millions of people after a short development period. Android also makes automatic updates easy.

Nadav Aharony from our group is spearheading the project for creating an Android data logging application, and he has already deployed it on over 50 phones in a new study of consumption patterns among family groups (rather than the usual study of college students in dorms). This application logs most of the usual suspects (Bluetooth, WiFi access points, call logs), but it also hashes the contents of text messages, allowing researchers to see not just who texts whom, but also get an idea of how topics spread (not the actual content, since the words are hashed, but just that topic A passed from person 1 to person 2). This application has been running on my phone for over a month with no real problems. The platform also comes with a special app store that allows researchers to log what applications people install, allowing you to look at how application usage spreads among friends. Soon Nadav is also planning to allow researchers to deploy their own apps over this new app store so that researchers can push surveys or more sophisticated logging tools to study participants. Instead of paying for apps, though, users will get paid to download apps so that they will participate (sort of like Mechanical Turk).
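To make the hashing idea concrete, here is a minimal sketch of how word-level hashing could reveal that a topic moved between people without exposing the words themselves. Everything in it is an assumption for illustration: the post does not say which hash the Android application uses, and the key, function names, and sample messages are hypothetical.

```python
import hashlib
import hmac

# Hypothetical per-study secret key; an assumption, not part of the platform.
STUDY_KEY = b"example-study-key"

def hash_token(word):
    """Return a keyed hash of one lowercased word (truncated for readability)."""
    digest = hmac.new(STUDY_KEY, word.lower().encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:12]

def hash_message(text):
    """Hash every whitespace-separated word; the plain text is not kept."""
    words = (w.strip(".,!?") for w in text.split())
    return {hash_token(w) for w in words if w}

# Two made-up messages: person 1 -> person 2, then person 2 -> person 3.
first = hash_message("Did you see the concert tickets?")
second = hash_message("Concert tickets are on sale!")

# Hashes present in both messages hint that a topic was passed along.
shared = first & second
print(len(shared))
```

A keyed hash (HMAC) is used here rather than a bare hash so that common words cannot be recovered by simply hashing a dictionary; whether the real application does this is not stated in the post.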

The Android Reality Mining platform promises to be extremely powerful, and the results from the current study should further push the boundaries of computational social science.

March 12, 2010 03:33 PM

Complexity and Social Networks Blog

Steve Strogatz writes about Math, Science, and the World

In a series of weekly pieces for The New York Times, Steve Strogatz, a marvelous network scientist, has been writing about math and science, and the pieces have been very good. My favorite so far, which gets right into balance theory and transitivity, is this one. Follow him on Mondays in the NYT, in the Opinionator Blog.

March 12, 2010 12:12 PM

Data Mining: Text Mining, Visualization and Social Media

MSN Gets Hyperlocal

Briefly, MSN is now surfacing hyperlocal blog content (originally found in the Local Lens application on Bing Maps) in the Local Edition area of the site. Currently this is only available in the ten cities that Local Lens covers, but that will change...


by Matthew Hurst at March 12, 2010 06:30 AM

March 11, 2010

Connected Action

Conference: NodeXL and Social Media Networks tutorial at CHI 2010

If you are attending the CHI 2010 conference in Atlanta and are interested in social media network analysis, consider attending this tutorial:

CN03: Introduction to Social Network Analysis

Time: Monday, 12 April 2010, 11:30 to 18:00

Organizers: Marc A. Smith, Panayiotis Zaphiris, C.S. Ang, Derek Hansen

Benefits

This course provides an overview of Social Network Analysis (SNA) and demonstrates through theory and practical case studies how it can be used in HCI (especially computer-mediated communication and CSCW) research and practise. This topic is of particular importance due to the popularity of social networking websites (e.g. YouTube, Facebook, MySpace etc.) and social computing. As people increasingly use online communities for social interaction, new methods are needed to study these phenomena. SNA is a valuable contribution to HCI research as it gives an opportunity to rigorously study the complex patterns of online communication.

Social network theory views a network as a group of actors who are connected by a set of relationships. Actors are often people, but can also be nations, organizations, objects etc. Social Network Analysis (SNA) focuses on patterns of relations between these actors. It seeks to describe networks of relations as fully as possible. This includes teasing out the prominent patterns in such networks, tracing the flow of information through them, and discovering what effects these relations and networks have on people and organizations. It can therefore be used to study network patterns of organizations, ideas, and people that are connected via various means in an online environment.



Audience

We welcome practitioners and academics interested in computer-mediated communication, universal design, and social software. No background knowledge about Social Network Analysis or statistics is required.

Origins

Versions of this course were delivered as tutorials at HCII 2007, 2009, Nordi-CHI 2009, INTERACT 2009, the INSNA Sunbelt Social Network Analysis Conference and the 2009 Communities and Technologies Conference, and the course is scheduled to be run this summer at the Stanford Media X workshop and the 2009 Social Computing Conference. The content of the course is also part of the Inclusive Design course at the Cyprus University of Technology and has been taught in the context of a class on Communities of Practice at the University of Maryland, College of Information Studies. The course is constantly being revised and modified to account for current trends in this dynamic and emerging research area.

Features

Upon completion of this course, participants will:

  • understand the basics of SNA, its terminology and background;
  • be able to transform communication data (e.g. YouTube, MySpace etc.) to network data;
  • understand the different possible presentations of social networks, e.g. in a matrix or a sociogram;
  • know practically how SNA can be applied to HCI (especially CMC) analysis;
  • get familiar with the use of standard SNA tools and software in general and the NodeXL social network analysis add-in for Excel in particular; and
  • be able to derive practical and useful information through SNA analysis that would help designers cultivate more successful online communities.
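
As an illustration of what transforming communication data to network data and presenting it as a matrix can look like, here is a small sketch. It is not taken from the course materials and does not use NodeXL; the reply records and actor names are made up.

```python
# Made-up reply records (sender, receiver); not data from the course.
replies = [("ana", "ben"), ("ben", "ana"), ("cy", "ana"),
           ("cy", "ben"), ("ana", "ben")]

# Actors are the distinct names appearing in any record.
actors = sorted({name for pair in replies for name in pair})
index = {name: i for i, name in enumerate(actors)}

# Adjacency matrix: matrix[i][j] counts replies from actor i to actor j.
matrix = [[0] * len(actors) for _ in actors]
for sender, receiver in replies:
    matrix[index[sender]][index[receiver]] += 1

# In-degree: how many distinct actors replied to each actor.
in_degree = {
    name: sum(1 for row in matrix if row[index[name]] > 0)
    for name in actors
}

for name, row in zip(actors, matrix):
    print(name, row)
print(in_degree)
```

The same matrix could be drawn as a sociogram by treating each nonzero cell as a directed edge between two actors.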

Instructors

  • Dr. Marc A. Smith is a sociologist and Chief Social Scientist at Connected Action Consulting Group, a provider of fine quality social media analysis platforms and systems. Smith specializes in the social organization of online communities and computer mediated interaction. He founded and managed the Community Technologies Group at Microsoft Research in Redmond, Washington and is now leading the development of social media reporting and analysis tools for Connected Action. Smith lives and works in Silicon Valley, California. [ Marc@connectedaction.net, http://www.connectedaction.net, http://delicious.com/marc_smith/]
  • Dr. Panayiotis Zaphiris is the departmental coordinator and an Associate Professor at the Department of Multimedia and Graphic Arts of the Cyprus University of Technology. His research interests lie in HCI with an emphasis on inclusive design and social aspects of computing. More information can be found at http://www.zaphiris.org .
  • Dr. C.S. Ang is a Lecturer at Kent University, UK. His research interests include social aspects of computer games, computer-mediated communication and agent-based simulation of social networks.
  • Dr. Derek Hansen is an assistant professor at the University of Maryland, College of Information Studies and the Director of the Center for the Advanced Study of Communities and Information. Hansen focuses his research and teaching on social computing, mass collaboration, consumer health informatics, and information services.



by Marc Smith at March 11, 2010 10:30 PM

Connected Action

Talk at IE University in Segovia, Spain: Transnational Connections, March 24-25, 2010

I will attend and speak at a symposium being held March 24-25, 2010 at the IE University Department of Communication in Segovia, Spain. The topic is: Transnational connections: Challenges and opportunities for communication.

“The Symposium aims to generate discussion on cutting-edge ideas in political communication, encourage international cooperation and unite scholars and practitioners.”

The symposium is organized with the Center for Global Communication Studies at the Annenberg School for Communication, University of Pennsylvania.

Organizer and founding Dean of IE School of Communication, Samuel Martín-Barbero notes that the event will gather:

“More than forty international panelists, moderators and speakers (who) will not only reflect on the state of the field, but will also discuss cutting-edge advances in theory, research and practice.”

I will attend along with my colleague John Kelly, from the Berkman Center for Internet & Society, Harvard University and Founder of Morningside Analytics.

Here is the Program, a link to the Registration, and the Twitter stream for the conference.

I will speak about “Analyzing Internet social media: visualizing the social networks in computer networks”, details follow:

Social media systems on the Internet are sociologically interesting: why do some online groups succeed where others fail?  How do different collections of online media and populations of authors differ from one another?  How do patterns of contribution vary and how do these differences illustrate the roles people play within their communities?  Several visualizations of patterns of contribution and connection in a range of Internet social media including web boards, enterprise social networks services, and personal email are presented to illustrate the range of variation among social media repositories and between types of contributors.  These images suggest that a more comprehensive overview of social media can generate sociologically relevant findings, improve community management tasks as well as provide features that can improve search and ranking of user generated content.  A freely available tool, NodeXL, will be demonstrated to perform basic social media analysis tasks.  Extending these tools to include mobile social software (“mososo”) data sets is a major new direction.   In the not too distant future, mobile devices will possess a range of sensors and become  more “socially aware”.  When phones routinely notice each other the nature of social interaction will change dramatically.  How will places and  locations change when machines become socially aware?  In this talk, sociologist Marc Smith, Chief Social Scientist for Connected Action Consulting Group, a provider of social media analysis platforms and services, will describe these new technologies and some ways of thinking about their implications.




by Marc Smith at March 11, 2010 09:55 PM

Connected Action

Talk at Israel Internet Association on February 22, 2010


The Annual Meeting of the Israel Internet Association (http://www.isoc.org.il (English)) was held February 22-23 2010. I spoke at this year’s meeting: http://www.isoc.org.il/conf2010/agenda.php?lang=en

Part 1

Part 2

The previous year’s conference website is at: http://www.isoc.org.il/conf2009/program.php

The Israel Internet Association is the official Israeli Chapter of the Internet Society.  Their annual meeting is a central event of academics (sociologists, psychologists, business and law) as well as industry participants from sectors including mobile cellular companies and internet service suppliers.

My talk title: Analyzing Internet social media: visualizing social networks in (mobile) computer networks
Abstract: Social media systems on the Internet are sociologically interesting: why do some online groups succeed where others fail? How do different collections of online media and populations of authors differ from one another? How do patterns of contribution vary and how do these differences illustrate the roles people play within their communities? Several visualizations of patterns of contribution and connection in a range of Internet social media including web boards, enterprise social networks services, and personal email are presented to illustrate the range of variation among social media repositories and between types of contributors. These images suggest that a more comprehensive overview of social media can generate sociologically relevant findings, improve community management tasks as well as provide features that can improve search and ranking of user generated content. A freely available tool, NodeXL, will be demonstrated to perform basic social media analysis tasks. Extending these tools to include mobile social software (“mososo”) data sets is a major new direction. In the not too distant future, mobile devices will possess a range of sensors and become more “socially aware”. When phones routinely notice each other the nature of social interaction will change dramatically. How will places and locations change when machines become socially aware? In this talk, sociologist Marc Smith, Chief Social Scientist for Connected Action Consulting Group, a provider of social media analysis platforms and services, will describe these new technologies and some ways of thinking about their implications.
Photos from the trip:



by Marc Smith at March 11, 2010 09:00 PM

Mike Love - influence and visualization

mikelove

The New York Times has an article, Disease Cause Is Pinpointed With Genome, by Nicholas Wade, which is a good overview of the status of whole genome sequencing for disease research. Some quotes:

Besides identifying disease genes, one team, in Seattle, was able to make the first direct estimate of the number of mutations, or changes in DNA, that are passed on from parent to child. They calculate that of the three billion units in the human genome, 60 per generation are changed by random mutation — considerably less than previously thought.

On genome-wide associational studies:

And in most diseases the culprit DNA was linked to only a small portion of all the cases of the disease. It seemed that natural selection has weeded out any disease-causing mutation before it becomes common. The finding implies that common diseases, surprisingly, are caused by rare, not common, mutations.

…implying we need to do more fine-grained studies of genomes. On the cost of whole genome sequencing:

The family whose genomes they report in Science were sequenced by a company with a new DNA sequencing method, Complete Genomics of Mountain View, Calif., at a cost of $25,000 each. Clifford Reid, the chief executive, said that the company was scaling up to sequence 500 genomes a month and that for large projects the price per genome would soon drop below $10,000. “We are on our way to the $5,000 genome,” he said.



by mikelove at March 11, 2010 12:27 PM

Mike Love - influence and visualization

mikelove

What does it mean that some data is Poisson? Here’s a quick refresher. The binomial distribution describes the number of heads out of n independent “coin flips” with probability p of the coin coming up heads. The Poisson can be described as a limiting case of the binomial distribution. If the number of coin flips n goes to infinity, but the probability for one flip coming up heads is λ/n, for a fixed parameter λ, this becomes a Poisson random variable with mean value λ. This means we expect a finite number of heads over an infinite number of coin flips because the mean value is n*p = n*(λ/n) = λ.
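
The limiting argument above can be checked numerically. The sketch below compares the binomial probability of seeing exactly k heads in n flips (with p = λ/n) against the Poisson probability with mean λ, for growing n. The values λ = 3 and k = 2 are arbitrary choices for illustration.

```python
from math import comb, exp, factorial

def binomial_pmf(k, n, p):
    """Probability of exactly k heads in n independent flips with P(heads) = p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """Probability of exactly k events for a Poisson variable with mean lam."""
    return lam**k * exp(-lam) / factorial(k)

lam, k = 3.0, 2  # arbitrary illustrative values
gaps = []
for n in (10, 100, 1000, 10000):
    # Keep the expected number of heads fixed at lam by shrinking p as n grows.
    gap = abs(binomial_pmf(k, n, lam / n) - poisson_pmf(k, lam))
    gaps.append(gap)
    print(n, gap)
```

The printed gap shrinks as n grows, which is the sense in which the Poisson is a limiting case of the binomial.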

The Poisson distribution is “a discrete probability distribution that expresses the probability of a number of events occurring in a period of time if these events occur with a known average rate and independently of the time since the last event.” Time is continuous (it can be viewed as the limit of ever finer discrete time steps, as in the binomial limit described above), but the process under study has a known average rate of events occurring in a time period.

Lots of real-life count data is Poisson. At Stanford, a biologist came to the department for consulting and wanted to compare the number of birds spotted in two locations. Since the counts were made over a fixed period of time, and the chance of seeing any one bird out of a scattered population of birds is fairly small, this data is probably Poisson distributed. This means the standard t-test shouldn’t be used, because it relies on the counts being normally distributed. A simple fix for this is to take the square root of the Poisson distributed data, which makes the variance roughly constant instead of growing with the mean, and run the test on the transformed values. Other more accurate transformations for Poisson data can be found on the Wikipedia page on Anscombe transformations.
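
Here is a rough sketch of the square root fix on simulated counts. The two sites, their rates (4 and 16 birds per period), and the sample sizes are made up, and the Poisson sampler is Knuth's simple multiplication method, chosen only to keep the example free of external libraries. The statistic computed is Welch's form of the t statistic, used here as a stand-in for whichever t-test one prefers; no p-value is computed.

```python
import math
import random
import statistics

def poisson_sample(lam, rng):
    """Draw one Poisson(lam) value with Knuth's multiplication method."""
    limit = math.exp(-lam)
    k, product = 0, rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k

def welch_t(a, b):
    """Welch's t statistic for two samples."""
    va, vb = statistics.variance(a), statistics.variance(b)
    diff = statistics.mean(a) - statistics.mean(b)
    return diff / math.sqrt(va / len(a) + vb / len(b))

rng = random.Random(0)
# Made-up bird counts per observation period at two hypothetical sites.
site_a = [poisson_sample(4.0, rng) for _ in range(2000)]
site_b = [poisson_sample(16.0, rng) for _ in range(2000)]
sqrt_a = [math.sqrt(x) for x in site_a]
sqrt_b = [math.sqrt(x) for x in site_b]

# Raw variances follow the means; after the square root they are similar.
var_raw = (statistics.variance(site_a), statistics.variance(site_b))
var_sqrt = (statistics.variance(sqrt_a), statistics.variance(sqrt_b))
t_stat = welch_t(sqrt_a, sqrt_b)
print(var_raw, var_sqrt, t_stat)
```

On the raw counts the second site's variance is several times the first's, while after the transformation both variances sit near one quarter, so the two groups are on a comparable scale when the test is run.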



by mikelove at March 11, 2010 11:48 AM

March 10, 2010

Augmented Social Cognition

Wikipedia's People-Ware Problem

Last week, we hosted a visit from the Wikimedia Foundation on issues relating to our work on community analytics, and what it tells us about Wikipedia's problems and possible solutions. Naoko Komura (pictured at right) of the Wikimedia Usability Initiative, as well as Erik Zachte, the staff data analyst (also pictured at right), spoke very eloquently about how we can create social tools to direct social attention to the parts of Wikipedia that need it.







Fundamentally, Wikipedia has always had a "people-ware" problem: the distribution of the expertise that is freely donated to the right places.  It has been and always will remain its greatest challenge. The amazing thing about Wikipedia is that it managed to do this for so long, such that a valuable knowledge repository can be built up as a result.  At first, people simply came because it was the place to be.  Now, we have to work a little harder.



We spent a lot of time talking about the best way to model this people-ware problem, either using biological metaphors (evolutionary systems with various forces), or economic models (see last post here).  However, one thing to be aware of is the danger of "analysis paralysis", where you spend so much time analyzing the problem that you forget that many ideas have already been generated for moving the great experiment forward.



For example, there are many places in Wikipedia that are not well populated. It's well-known that many scientific and math concept articles, for example, could use an expert eye to catch the errors and explain the concepts better. How can we build an expertise finder that would actually invite people to fix problems that we know exist in Wikipedia?



Another idea might be to have the whole system be more social. Chris Grams blogs about a part of this idea here. We suggested some time ago to have a system like WikiDashboard, where you actually show the readers what the social dynamics have been for a particular article.



Wikipedia was created in 2001, when the social web was still in its infancy. During the ensuing 9 years, it has changed very little, and I would argue Wikipedia has not kept up with the times. Lots of "Social Web" systems and new cultural norms have been built up already.  For example, I suspect that many of us would not mind at all revealing our identities on Wikipedia, and we might like to log in with our OpenIDs and even have verified email addresses so that the system can send us verification/clarification/notification messages. The system perhaps should connect with Facebook, so that my activities (such as editing an article on "Windburn") are automatically sent to my stream there. My friends, upon seeing that I have been editing that article, might even join in.



I think that Wikipedia is about to change, and it is going to become a much more socially-aware place. I certainly hope that they will tackle the People-Ware (instead of the Tool-Ware) problems, and we will see it become an exciting place again.



by Ed H. Chi (noreply@blogger.com) at March 10, 2010 12:11 AM

March 05, 2010

Complexity and Social Networks Blog

High Throughput Humanities

Along with Riley Crane (of Darpa Challenge and Colbert Report fame), physicist Gourab Ghoshal, and quantitatively minded art historian Max Schich, I'm putting together a workshop on High Throughput Humanities as a satellite meeting at this year's European Conference on Complex Systems in Lisbon this September. The general idea is to bring together people who ask interesting questions of massive data sets. More specifically - as the title implies - we want to figure out how to use computers to do research in the humanities in a way that extends beyond what can currently be accomplished by human beings.

Entire libraries are in the process of being scanned and we would like to begin to investigate questions like: Are there patterns in history that are currently 'invisible' due to the fact that humans have limited bandwidth - that we can only read a small fraction of all books in a lifetime?

We have an exciting program committee so it should be an interesting day!

Confirmed Programme Committee Members

Albert-László Barabási, CCNR Northeastern University, USA.

Guido Caldarelli, INFM-CNR Rome, Italy.

Gregory Crane, Tufts University, USA.

Lars Kai Hansen, Technical University of Denmark.

Bernardo Huberman, HP Laboratories, USA.

Martin Kemp, Trinity College, Oxford, UK.

Roger Malina, Leonardo/ISAST, France.

Franco Moretti, Stanford University, USA.

Didier Sornette, ETH Zurich, Switzerland.

Practical information can be found at the conference website. Oh, and did I mention that Lisbon is beautiful in September? Sign up and join us. The workshop abstract is reprinted below.

Abstract

The High Throughput Humanities satellite event at ECCS'10 establishes a forum for high throughput approaches in the humanities and social sciences, within the framework of complex systems science. The symposium aims to go beyond massive data acquisition and to present results beyond what can be manually achieved by a single person or a small group. Bringing together scientists, researchers, and practitioners from relevant fields, the event will stimulate and facilitate discussion, spark collaboration, as well as connect approaches, methods, and ideas.

The main goal of the event is to present novel results based on analyses of Big Data (see NATURE special issue 2009), focusing on emergent complex properties and dynamics, which allow for new insights, applications, and services.

With the advent of the 21st century, increasing amounts of data from the domain of qualitative humanities and social science research have become available for quantitative analysis. Private enterprises (Google Books and Earth, Youtube, Flickr, Twitter, Freebase, IMDb, among others) as well as public and non-profit institutions (Europeana, Wikipedia, DBPedia, Project Gutenberg, WordNet, Perseus, etc) are in the process of collecting, digitizing, and structuring vast amounts of information, and creating technologies, applications, and services (Linked Open Data, Open Calais, Amazon's Mechanical Turk, ReCaptcha, ManyEyes, etc), which are transforming the way we do research.

Utilizing a complex systems approach to harness these data, the contributors of this event aim to make headway into the territory of traditional humanities and social sciences, understanding history, arts, literature, and society on a global-, meso- and granular level, using computational methods to go beyond the limitations of the traditional researcher.

March 05, 2010 06:49 PM

March 04, 2010

Augmented Social Cognition

The problem of matching social attention and products...



Many people have already stolen the attention-scarcity ideas from Herb Simon and said that the most important problem in our information overloaded society is the efficient distribution of attention. What some have called the "attention economy" is nothing more than a re-packaging of this idea.



In business, of course, getting the consumers' attention is quickly becoming an important aspect of being successful. The traditional way of getting people's attention is through advertisement, and we have witnessed a dramatic transformation of how advertisements work in the online world in the last decade, from display advertising to search advertising and, more recently, further to action advertising. Increasingly, we can tie advertising dollars to direct consumer action.



For us, it was not a stretch, then, to start thinking about how consumer actions are starting to quickly feed back into product design. Thus, we now have people talking about crowdsourced product designs. The most agile companies now listen to consumers via channels such as Facebook, Twitter, and blog analytics. They do this via services such as brand management consultants and sentiment analysis tools, so much so that they are able to discern tiny changes in consumer awareness of product issues and their desires.



We also know that traditional economic models serve to optimize the distribution of products to people who want them. But these models have also recently been used to optimize the distribution of people's attention to products that might serve their needs. The two usages obviously go hand-in-hand.



If we can help companies to serve people's attention spots just-in-time with the best products, we would have a highly optimized economy that wastes little energy in distributing worthless advertisements (or spam). In fact, the existence of spam points to the inefficiencies in the economic system.



Turns out that versions of this problem exist everywhere in the Web 2.0 world:

  • The problem of efficiently distributing the best tweets to the people who want to view them is a version of this attention distribution problem. Any time you see a tweet that is worthless to you, there is an opportunity for optimization.

  • The problem of pointing experts to the most valuable articles that they can contribute to in Wikipedia is another version.

Solutions to these problems might take the form of recommendation systems or filtering systems, but might also be efficient interactive browsing systems (for products in an online store like Amazon, or articles in Wikipedia). Some thought experiments:

  • What if we can design an expertise finding system that recommends the best articles for you to contribute to in Wikipedia? Would it increase participation rates?

  • What if we analyze your social network every day and tell you the best tweets that you should spend five minutes on? Would more people retweet more often?

  • What if product designers were better tuned to trending topics and needs? Would that enable companies to succeed more often? Are companies like Zazzle and Cafepress the prototypical examples of lubricating this path?



Your thoughts?



by Ed H. Chi (noreply@blogger.com) at March 04, 2010 11:44 PM

Complexity and Social Networks Blog

Video from International Seminar on Network Theory: Network Multidimensionality in the Digital Age

Many readers of this blog will find the videos of the following conference that took place at USC a couple of weeks ago quite interesting-- it's a fabulous line up.



International Seminar on Network Theory:

Network Multidimensionality in the Digital Age

The international Network Theory Conference, organized by the ANN and SONIC research centers, took place on Feb 19-20 at the University of Southern California. Bruno Latour delivered the keynote speech titled "Networks, Societies, Spheres: Reflections of an Actor-network theorist." The four panels were focused on conceptual and methodological aspects of network theory, network inclusion and exclusion, network theories of power, and the semantic web. The list of presenters includes: Noshir Contractor, Peter Monge, Paul Leonardi, Yochai Benkler, Ernest J. Wilson III, Rahul Tongia, Karine Barzilai-Nahon, Wendy Hall, Nigel Shadbolt, David Grewal, and Manuel Castells.




March 04, 2010 09:23 PM

Data Mining: Text Mining, Visualization and Social Media

Seeing a Web of Data

TEDi master Flake gives a dynamic overview of the Live Labs Pivot demo.



by Matthew Hurst at March 04, 2010 03:28 PM

March 02, 2010

Joseph Reagle on Wikipedia

Wales and Objectivism

I just finished an excellent biography of Ayn Rand and her philosophy in the context of American political culture. While reading, I couldn't help but think of Wales' expressed interest in Objectivism, and the next-to-last page actually comments on this issue:

One of the many ironies of Rand's career is her latter-day popularity among entrepreneurs who are pioneering new forms of community. Among her high-profile fans is Wikipedia's founder Jimmy Wales, once an active participant in the listserv controversies of the Objectivist Center. A nonprofit that depends on charitable donations, Wikipedia may ultimately put its rival encyclopedias out of business. At the root of Wikipedia are warring sensibilities that seemed to both embody and defy Rand's beliefs. The website's emphasis on individual empowerment, the value of knowledge, and its own risky organizational model reflects Rand's sensibility. But its trust in the wisdom of crowds, celebration of the social nature of knowledge, and faith that many working together will produce something of enduring value contradict Rand's adage "all creation is individual." (Burns 2009, p. 284)

March 02, 2010 10:10 PM

Complexity and Social Networks Blog

Boston Ignite 7

For those in Cambridge/Boston, this post is a reminder that Ignite Boston 7 is taking place this Thursday evening at the Microsoft NERD office near Kendall Square in Cambridge, MA. For the uninitiated, Ignite talks work as follows: Presenters get 20 slides that are displayed 15 seconds each, for a grand total of five minutes to make their point. The results of these strict constraints are creative and (usually) exciting talks about a variety of subjects.

As always, the list of speakers is varied with many titles that should be interesting to readers of this blog. In particular, I look forward to Tim Hwang's talk On the Ecology of Awesome. Shameless plug: I will play a minor role in Max Schich's talk about the upcoming High Throughput Humanities Symposium at the ECCS'2010 conference this summer.

March 02, 2010 06:46 PM

Kevin Burton's feedBlog

burtonator

I spent the last couple of days playing with InnoDB page compression on the latest Percona build.

I’m pretty happy so far with Percona and the latest InnoDB changes.

Compression wasn’t living up to my expectations though.

I think the biggest problem is that the compression can only use one core in replication and ALTER TABLE statements.

We have an 80GB database that was running on 96GB boxes filled with RAM.

I wanted to try to run this on much smaller instances (32GB-48GB boxes) by compressing the database.

Unfortunately, after 24 hours of running an ALTER TABLE which would only use one core per table, the SQL replication thread went to 100% and started falling behind fast.

I think what might be happening is that the InnoDB page buffer is full because it can’t write to the disk fast enough which causes the insert thread to force compression of the pages in the foreground.

Having InnoDB only use one core / thread to compress pages seems like a very bad idea, especially on 8-16 core boxes. (I’m testing on an 8 core box now, but we have 16 core boxes in production.)

The InnoDB page compression documentation doesn’t seem to yield any hints about when InnoDB pages are compressed and in which thread. Nor do there seem to be any configuration variables that we can change in this regard.

Perhaps a ‘compressed buffer pool only’ option could be interesting.

This way InnoDB does not have to maintain an LRU for compressed/decompressed pages. Further, it can read pages off disk, decompress them, and then leave the pages decompressed in a small buffer. Then a worker thread (executing on another core) can compress the pages and move them back into the buffer pool where they can be stored and placed back on disk.

This process could still become disk bottlenecked but at least it would use multiple cores.
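To make the worker-thread idea concrete, here is a minimal Python sketch. It is not InnoDB code and says nothing about InnoDB's internals: `split_pages`, `compress_pages`, and the use of 16KB as a chunk size are illustrative assumptions, and zlib simply stands in for the engine's compression step. All it shows is pages being handed to several compression workers instead of one. In CPython, zlib releases the global interpreter lock while it compresses, so the workers can overlap.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

PAGE_SIZE = 16 * 1024  # assumption: 16KB chunks, matching InnoDB's default page size


def split_pages(data, page_size=PAGE_SIZE):
    """Cut a byte string into fixed-size pages (the last one may be shorter)."""
    return [data[i:i + page_size] for i in range(0, len(data), page_size)]


def compress_pages(pages, workers=4):
    """Compress pages on a pool of worker threads, preserving page order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(zlib.compress, pages))


# Hypothetical payload standing in for uncompressed table data.
pages = split_pages(b"row data " * 50_000)
compressed = compress_pages(pages, workers=4)

# Round trip: decompressing every page gives back the original bytes.
restored = b"".join(zlib.decompress(p) for p in compressed)
```

As the post says, a scheme like this can still end up disk bound; the sketch only covers the CPU side.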



by burtonator at March 02, 2010 05:53 PM

February 23, 2010

Connected Action

Video: Network Theory Documentary

The folks from the Gephi project have posted a copy of the video documentary that explores network theory.

It does not give much time to the history of network theory in the social sciences but does a nice job of conveying the value of the small worlds model.

http://vimeo.com/2477265

Connected: The Power of Six Degrees – Part 1 from gephi on Vimeo.


http://vimeo.com/2477361

Connected: The Power of Six Degrees – Part 2 from gephi on Vimeo.


http://vimeo.com/2477459

Connected: The Power of Six Degrees – Part 3 from gephi on Vimeo.




by Marc Smith at February 23, 2010 02:00 PM

February 22, 2010

Complexity and Social Networks Blog

Easley & Kleinberg Text on Networks ....


David Easley and Jon Kleinberg of Cornell have written what might be the first real undergraduate textbook on networks. They developed it as part of a very successful course (cross-listed as Econ/Soc/CS/InfoSci) at Cornell.

The course website is here.

The text, Networks, Crowds, and Markets (to be published shortly by Cambridge) is here.

The text (as well as the course website) has homework problems.



Even though the book has too much economics in it, it should be a big hit for many disciplines. Parts of it can certainly be understood by any social science major, and the more technical parts, on epidemics, diffusion, the internet, and dynamics can be used in math, stat, or physics courses.

February 22, 2010 07:59 PM

February 18, 2010

Data Mining: Text Mining, Visualization and Social Media

Bing Maps Updates Hyperlocal Application

Today we pushed out a modest update to our hyperlocal application Local Lens. Users won't see a huge difference, but behind the scenes we have improved our text mining systems which recognize and map entity names (e.g. restaurants, museums, coffee shops) and addresses. In addition, some other changes will have improved coverage in a few areas (such as Capitol Hill in Seattle).

Below, in a post from SF Appeal's The Alley, we pull out one of the more complex address expressions.


With the tight integration between Bing map apps and the full mapping platform, clicking on the address expression brings the user directly to the location mentioned, providing us with intimate context.


For a look at the future of mapping, take a look at Blaise's latest TED talk on the Bing Maps ecosystem.

 

by Matthew Hurst at February 18, 2010 05:52 AM

February 16, 2010

Joseph Reagle on Wikipedia

Diffing Word Files

For the most part, I wrote my dissertation and book manuscript using a simplified version of markdown complemented with biblatex citations. Because it was a simple text file, it made managing the edits to the manuscript very easy. I could do global textual replacements trivially. Also, obviously, it was trivial to generate PDFs, HTML, etc. Using Mercurial, I could take advantage of some nice features like the "attic" extension, which allows me to keep change sets on the side to be applied only when appropriate. So, for example, the changes necessary to generate HTML were kept in the attic and would only be applied when I wanted that.

Unfortunately, once the manuscript went into the MIT Press system, I had to use Microsoft Word. As much as the Word document format annoys me, I understand it is widely used, and I can't think of an easy alternative that also provides the capability for editorial annotations. Nonetheless, I had a difficult time seeing changes in Microsoft Word, and wanted to backport the changes into my source files. And there does not appear to be a nice textual difference tool for Word documents.

I have posted a small Python script that makes use of antiword and dwdiff but also gives me context on either side of the change. It, of course, doesn't work well with formatting, but is useful and will generate output like the following:

   reflects {-the-} [+a+] stabilization
   a {-number of pragmatic questions: it-} [+project was conceived. It+] would
   there {-will-} [+would+] be
   article {-will-} [+would+] be
   linked {-to from-} [+via+] a
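For readers who want to see how output of this shape can be produced, here is a minimal sketch using only Python's standard `difflib`. It is not the script mentioned above and does not call antiword or dwdiff; the function name `word_diff` and its `context` parameter are illustrative assumptions. It compares two plain strings word by word and prints each change with one word of context on either side.

```python
import difflib


def word_diff(old, new, context=1):
    """Return dwdiff-style change lines with `context` words on either side."""
    a, b = old.split(), new.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    lines = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        parts = a[max(0, i1 - context):i1]                   # words before the change
        if i2 > i1:
            parts.append("{-" + " ".join(a[i1:i2]) + "-}")   # deleted words
        if j2 > j1:
            parts.append("[+" + " ".join(b[j1:j2]) + "+]")   # inserted words
        parts.extend(a[i2:i2 + context])                     # words after the change
        lines.append(" ".join(parts))
    return lines


old = "this reflects the stabilization and there will be more"
new = "this reflects a stabilization and there would be more"
for line in word_diff(old, new):
    print(line)
# reflects {-the-} [+a+] stabilization
# there {-will-} [+would+] be
```

To use it on Word files, the text would first have to be extracted to plain strings, which is the job antiword does in the original script.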

February 16, 2010 05:06 PM

February 15, 2010

Mike Love - influence and visualization

pivotboot

The theory behind bootstrapping a pivotal quantity always eluded me. The simplest way to approach bootstrapping a statistic is to generate B bootstrap samples of the original data and calculate B statistics from these samples. Then take the alpha/2 and 1-alpha/2 quantiles of the statistic to form a confidence interval of level 1-alpha. But in some circumstances, more than alpha of the intervals might not contain the true value. This can be shown through a simple simulation.

Take 20 random exponential variables with mean 3.  In R this looks like:

x = rexp(20,rate=1/3)

Then generate B=1000 bootstrap samples of x, and calculate the mean for each bootstrap sample.

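B = 1000        # number of bootstrap samples
n = length(x)   # sample size (20 here)
s = numeric(B)
for (j in 1:B) {
  boot = sample(n,replace=TRUE)   # resample indices with replacement
  s[j] = mean(x[boot])
}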

Then, for an alpha = .05 / 95% confidence interval, look at the .025 and .975 quantiles of the bootstrap statistics in the vector s:

simple.ci = quantile(s,c(.025,.975))

If I repeat this process from the start (including drawing a new x of 20 random exponential variables of mean 3) I can see how often the intervals actually contain the true mean.  Here are 100 replicates of the whole interval creation process:

11 of the intervals, highlighted in red, do not contain the true mean 3, the blue vertical line.  On average we would expect 5 if these are 95% confidence intervals.  If I repeat this 1000 times, I get 88.4% of the intervals containing the true mean.

The bootstrap-t interval is a way of dealing with this problem.  In An Introduction to the Bootstrap by Efron and Tibshirani, the authors write,

The quantity (theta-hat – theta)/se-hat is called an approximate pivot: this means that its distribution is approximately the same for each value of theta….

Some elaborate theory shows that in large samples the coverage of the bootstrap-t interval tends to be closer to the desired level than the coverage of the standard interval or the interval based on the t table….

Notice also that the normal and t percentage points are symmetric about zero, and as a consequence the resulting intervals are symmetric about the point estimate theta-hat.  In contrast, the bootstrap-t percentiles can be asymmetric about 0, leading to intervals which are longer on the left or right.  This asymmetry represents an important part of the improvement in coverage it enjoys.

Note the authors here are not comparing the bootstrap-t interval to the bootstrap percentile method. Also, they add a caveat: “The bootstrap-t method, at least in its simple form, cannot be trusted for more general problems, like setting a confidence interval for a correlation coefficient.”

But it works well for calculating the mean, as in this case. This time we calculate a pivotal quantity as the bootstrapped statistic. For the vector x of random exponential variables of mean 3, we first calculate the mean and standard deviation of the original dataset:

true.mean = 3
n = 20
x = rexp(n,rate=1/true.mean)
mean.x = mean(x)
sd.x = sd(x)

Then for each bootstrap sample, calculate the difference between the original mean and the bootstrap mean, divided by the standard deviation of the bootstrap sample.

B = 1000   # number of bootstrap samples
z = numeric(B)
for (j in 1:B) {
  boot = sample(n,replace=TRUE)   # resample indices with replacement
  z[j] = (mean.x - mean(x[boot]))/sd(x[boot])   # approximate pivot
}

Then to form a confidence interval, take quantiles of our bootstrapped statistic, multiply by the standard deviation of the original data and add this to the mean of the original data:

pivot.ci = mean.x + sd.x*quantile(z,c(.025,.975))

Now there are only 4 out of 100 intervals that do not contain the true mean:

If I do this 1000 times, I get 95.7% of the intervals containing the true mean. The mean interval size has increased though, from 2.4 for the simple intervals to 3.2 for the bootstrap-t intervals.

Here is an R script for this simulation.
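For readers without R, the comparison can also be sketched in plain Python. This is a rough port under stated assumptions, not the script linked above: the helper names (`quantile`, `mean_sd`, `both_intervals`, `coverage`), the seed, and the smaller default replicate counts are all illustrative choices, and the quantile helper uses linear interpolation, which is R's default. With more replicates the two coverage rates should land in the neighbourhood of the figures quoted in this post, but exact values depend on the seed.

```python
import random


def quantile(sorted_vals, q):
    """Linear-interpolation quantile of an already sorted list."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])


def mean_sd(v):
    """Sample mean and sample standard deviation (n - 1 denominator)."""
    m = sum(v) / len(v)
    return m, (sum((t - m) ** 2 for t in v) / (len(v) - 1)) ** 0.5


def both_intervals(x, B, rng, alpha=0.05):
    """Percentile and bootstrap-t intervals for the mean of x from B resamples."""
    mean_x, sd_x = mean_sd(x)
    s, z = [], []
    for _ in range(B):
        boot = rng.choices(x, k=len(x))    # resample with replacement
        m, sd = mean_sd(boot)
        if sd == 0:                        # degenerate resample, skip it
            continue
        s.append(m)                        # plain bootstrap statistic
        z.append((mean_x - m) / sd)        # approximate pivot
    s.sort()
    z.sort()
    lo, hi = alpha / 2, 1 - alpha / 2
    simple = (quantile(s, lo), quantile(s, hi))
    pivot = (mean_x + sd_x * quantile(z, lo), mean_x + sd_x * quantile(z, hi))
    return simple, pivot


def coverage(reps=200, n=20, B=300, true_mean=3.0, seed=1):
    """Fraction of intervals containing the true mean, for both methods."""
    rng = random.Random(seed)
    hits = [0, 0]
    for _ in range(reps):
        x = [rng.expovariate(1 / true_mean) for _ in range(n)]
        for k, (lo, hi) in enumerate(both_intervals(x, B, rng)):
            hits[k] += lo <= true_mean <= hi
    return hits[0] / reps, hits[1] / reps


simple_cov, pivot_cov = coverage()
print(f"percentile: {simple_cov:.3f}  bootstrap-t: {pivot_cov:.3f}")
```

Raising `reps` and `B` to 1000 matches the scale of the simulation described above, at the cost of a longer run.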



by mikelove at February 15, 2010 05:34 PM

Complexity and Social Networks Blog

Reminder: submission to 2010 Political Networks Conference due today

Just a reminder that the deadline for submitting proposals and funding applications to the 2010 Political Networks Conference is today. Hope to see you all at Duke in May.

________________________________

2010 Political Networks Conference

Call for Papers

The Third Annual Political Networks Conference, formerly known as the Harvard Networks in Political Science Conference, will be held May 19-21, 2010 at Duke University. The conference calls for paper proposals on all aspects of political networks. Submissions are encouraged from a wide range of disciplines including, but not limited to, political science, sociology, economics, anthropology, psychology, business, information systems, and complex systems. Methodological approaches may include, but are not limited to, statistical studies, laboratory and field experiments, ethnography, formal models, and computer simulations. Papers may report on investigations of individual political behavior; interactions of individuals with traditional and online media; activist participation in interest groups, social movements, and political parties; interest group and corporate relationships with government agencies and officials; relationships within and among local, state, and federal government agencies; legislator communications and alliance formation; the political implications of international trade flows; ethnic cooperation and conflict; international alliances and conflict; and/or a wide range of other political topics. Please refer to the 2008 and 2009 conference programs for examples of what has been presented at events in previous years (2008: , 2009: .)

The first day of the conference will be devoted to didactic sessions on network methodology by Matt Jackson (Economics, Stanford) and Carter Butts (Sociology, UC-Irvine). The second and third days will be organized via panels organized from submissions. The conference will also feature a keynote address by Matt Jackson and a plenary address by James Moody (Sociology, Duke University).

Registration is open. February 15 is the deadline for proposals. Decisions will be announced on or before March 15, 2010. April 1 is the deadline for early registration.

NSF-funded fellowships are available to support approximately 40 graduate students and recent Ph. D.s to attend the conference, including attendees from outside the United States. Priority will be given to new attendees who plan to present at the 2010 conference. However, attendees who would like to attend without a presentation are eligible for funding, as are previously funded applicants.

A block of rooms at attractive rates at the Washington Duke Inn is available (until April 18) to conference attendees.

Details, including links to registration, application for aid, and accommodations, are available at .

Please submit your ideas for papers and panels, and plan to attend the Duke Conference.

Michael T. Heaney, Program Chair, mheaney@umich.edu

Michael D. Ward, Local Host, mw160@duke.edu

APSA Political Networks Organized Section:

Robert Huckfeldt, Chair

Scott McClurg, Chair-elect

John Scholz, Treasurer

Meredith Rolfe, Communications

Betsy Sinclair, 2010 APSA Organizer

Board Members -- Chris Ansell, Delia Baldassarri and Paul Thurner

-----------

February 15, 2010 01:50 PM

Data Mining: Text Mining, Visualization and Social Media

Apps are great, but Ecosystems are better

By now, everyone is pretty much on the same page with apps on mobile devices. The popularity of these programs, which range from simple widgets to full applications, is not surprising given a) the model has been in existence for years (i.e. 'applications' running on PCs) and b) the simplicity of putting together a website or service which backs what is, in many cases, a simple rendering of the data.

However, something really special happens when these apps live not in the flat world of the operating system, as most do, but in ecosystems. In this case, by ecosystem, I mean an environment in which applications can interact with each other in a seamless manner, one in which the user experience is at the fore. My favourite example of this right now is the adhoc-spontaneous-movie experience in the Bing ecosystem on the iPhone.

You start off at the Bing home. Hitting the 'movie' button brings up a listing of which movies are on near you in time and space.

Selecting a movie, I can read synopses and information about possible locations.

Selecting a location provides me with an option to get directions.

And then, there I am, watching the utterly disappointing Wolfman.


by Matthew Hurst at February 15, 2010 05:01 AM

February 13, 2010

Connected Action

University of Haifa and IBM Research talk on Social Media Networks, February 24, 2010

After speaking at the Israel Internet Association conference February 22, I will speak at IBM Research in Haifa at an event co-sponsored by the University of Haifa Department of Sociology and Anthropology and the M.A. program in Sociology of Technology.  My hosts are Professor Gustavo Mesch from the university and Dr. Adam Perer from IBM research.

Wednesday, February 24th, 1 pm

IBM Building

Haifa University

Title: Visualizing collections of social media connections: using social network analysis to assess, evaluate and measure social media engagement

Abstract: Social networks are created whenever people interact. These networks become more visible when interactions take place through social media. Social networks form when people link, reply, comment, edit, tag, and friend one another. Sub-populations are formed whenever people mention the same company, products, event, topic, or personality. Using social network analysis on collections of social media connections reveals important patterns: how are people clustered and grouped, where are the gaps, who plays the roles of bridge, hub, and isolate? In this talk I will display maps of Twitter, YouTube, Flickr, and enterprise email systems and demonstrate several tools that can be used to collect, analyze, map and monitor social media, including the free and open NodeXL (network overview, discovery and exploration) add-in for Excel 2007.

Here, for example, is a map of the connections among people who recently mentioned “haifa” in Twitter, sized by number of followers:

Here are some photos taken during the trip:

Tel Aviv (graffiti, street markets, the beach and skyscrapers); the ISOC-IL 10 conference at Airport City; Jerusalem, including the Western Wall; Haifa (IBM Research, the University of Haifa, the port); Yardenit, Lake Tiberias and the north of Israel; Jaffa.



by Marc Smith at February 13, 2010 04:07 PM

February 12, 2010

Connected Action

Meeting: Saving Our Present for the Future: Personal Archiving 2010, February 16th at the Internet Archive



I will attend an interesting discussion organized by Jeff Ubois on February 16th at the Internet Archive in San Francisco.

Saving Our Present for the Future: Personal Archiving 2010

From family photographs and personal papers to health and financial information, vital personal records are becoming digital. At the same time, creation and capture of new digital information has become a part of the daily routine for hundreds of millions of people. But what are the long term prospects for this data?

The combination of new capture devices (more than 1 billion camera phones will be sold in 2010) with the move from older forms of media is reshaping both our personal and collective memories. The size and complexity of personal collections are growing, these collections are spread across different media (including film and paper!), and the lines between personal and professional, published and unpublished are being redrawn.

Whether these issues are described as personal archiving, lifestreams, personal digital heritage, preserving digital lives, scrapbooking, or managing intellectual estates, they present major challenges for both individuals and institutions: data loss is a nearly universal experience, whether it is due to hardware failure, obsolescence, user error, lack of institutional support, or any one of many other reasons. Some of these losses may not matter; but the early work  of the Nobel prize winners of the 2030s is likely to be digital today, and therefore at risk in ways that previous scientific and literary creations were not. And it isn’t just Nobel winners that matter: the lives of all of us will be preserved in ways not previously possible.

On Tuesday, February 16, the Internet Archive will host a small conference for practitioners in personal digital archiving.

The morning sessions will be devoted to examples of current practice; the afternoon discussion will focus on developing recommendations for institutions and individuals, and on developing a research agenda. Among the questions we would like to discuss:

- What new social norms around preservation, access, and disclosure are emerging?

- How can we cope with the shift from simple (e.g. text) to rich media (e.g. moving images) in personal collections?

- What is the gap between current possibilities for preserving personal collections, and what is actually needed by both individuals and institutions?

- What tools and services are needed to better enable self-archiving?

- What new economic models to support personal archives may evolve?

- What are the long term rights management issues? Are there unrecognized stakeholders we should begin to account for now?

- Can we better anticipate (and measure) losses of personal material?

- Do libraries, museums, and archives have a new responsibility to collect personal materials?

- What has already failed? Can we generalize about approaches that are likely to fail over time?

- What are the options for cultural heritage institutions — libraries, museums, and archives — that want to preserve the personal collections of citizens and scholars, creators and actors?

- What might be the risks of building archiving systems that are “too good” or overly applied?

For individuals, institutions, investors, entrepreneurs, and funding agencies thinking about how best to address these issues, this meeting will include a variety of examples that may be replicated, and will sharpen the questions (technical, social, economic) around personal archiving.

For further information, please contact Jeff Ubois (jeff@ubois.com).




by Marc Smith at February 12, 2010 04:00 PM

Intentialicious

Markus

I want to share the abstract of our upcoming paper at WWW’2010 (here is a link to the full paper). In case you are interested in our research and going to WWW in Raleigh this year as well, I’d be happy if you’d get in touch.

C. Körner, D. Benz, A. Hotho, M. Strohmaier, G. Stumme, Stop Thinking, Start Tagging: Tag Semantics Emerge From Collaborative Verbosity, 19th International World Wide Web Conference (WWW2010), Raleigh, NC, USA, April 26-30, ACM, 2010.

Abstract: Recent research provides evidence for the presence of emergent semantics in collaborative tagging systems. While several methods have been proposed, little is known about the factors that influence the evolution of semantic structures in these systems. A natural hypothesis is that the quality of the emergent semantics depends on the pragmatics of tagging: Users with certain usage patterns might contribute more to the resulting semantics than others. In this work, we propose several measures which enable a pragmatic differentiation of taggers by their degree of contribution to emerging semantic structures. We distinguish between categorizers, who typically use a small set of tags as a replacement for hierarchical classification schemes, and describers, who are annotating resources with a wealth of freely associated, descriptive keywords. To study our hypothesis, we apply semantic similarity measures to 64 different partitions of a real-world and large-scale folksonomy containing different ratios of categorizers and describers. Our results not only show that ‘verbose’ taggers are most useful for the emergence  of tag semantics, but also that a subset containing only 40% of the most ‘verbose’ taggers can produce results that match and even outperform the semantic precision obtained from the whole dataset. Moreover, the results suggest that there exists a causal link between the pragmatics of tagging and resulting emergent semantics. This work is relevant for designers and analysts of tagging systems interested (i) in fostering the semantic development of their platforms, (ii) in identifying users introducing “semantic noise”, and (iii) in learning ontologies.

More details can be found in the full paper.

This work is funded in part by the Know-Center and the FWF Research Grant TransAgere. It is the result of a collaboration with the KDE group at University of Kassel and the University of Würzburg. You might also want to have a look at a related blog post on the bibsonomy blog.



Filed under: Uncategorized

by Markus Strohmaier at February 12, 2010 01:28 PM

Data Mining: Text Mining, Visualization and Social Media

Bing Maps Continues To Innovate

Take a moment to check out this video, which shows some of the content that Blaise presented at TED today. There are two key features: integration and matching of photographs to our human-scale imagery, and the integration of the WorldWide Telescope. These features are pretty cool, but ultimately it is the whole idea of the mapping ecosystem that is the real winner.

The Street Shots app, which matches images to our human-scale experience, has a couple of really nice emergent qualities. The human-scale imagery, in some sense, is more useful the more objective it is; matching images can really bring a place alive by capturing a human moment or event.

Flickr1

Secondly, when someone has uploaded a historical image, one can experience a location with a view to a different age. Here is a picture of Vancouver from 1890.

Flickr2 

Bing Maps - the ecology - is only just getting going!

 

by Matthew Hurst at February 12, 2010 04:56 AM

February 11, 2010

Complexity and Social Networks Blog

More (Steve) Jobs, Jobs, Jobs, Jobs (and Wozniaks...)

In a relatively recent New York Times op-ed, Thomas Friedman (http://www.nytimes.com/2010/01/24/opinion/24friedman.html) argues that: "Obama should make the centerpiece of his presidency mobilizing a million new start-up companies [...]." Friedman then argues that if: "[y]ou want more good jobs, spawn more Steve Jobs."

This emphasis on innovation and entrepreneurship is well placed. A substantial body of research going back to Adam Smith, Max Weber, and Joseph Schumpeter has demonstrated how innovation and/or entrepreneurship are associated with prevailing social and economic conditions. However, Friedman's emphasis on "spawning" more Steve Jobs is somewhat misplaced. Rather, as former Secretary of Labor Robert Reich noted in a 1987 Harvard Business Review article, it is often the entrepreneurial team that is the hero. Indeed, even Apple was initially founded by a team of entrepreneurs: Steve Jobs, Steve Wozniak, and Ronald Wayne.

More generally, research finds that entrepreneurial teams tend to outperform solo ventures. Research also suggests that including "weak ties", or social connections that can access distinct factor or knowledge markets, may be key to the novel (re)combination that is the essence of "Schumpeterian entrepreneurship." However, research by Ruef, Aldrich, and Carter (which appeared in the American Sociological Review) finds that very few founding teams include such ties. This implies that at their founding most ventures (in a representative sample of would-be US entrepreneurs) do not have the "social DNA" that is best suited to creating truly innovative products or services.

This finding presents unique challenges for those creating public policy intended to foster entrepreneurship. (It should be noted that some researchers believe that entrepreneurship implies innovation whereas others do not.) While tax policy may be important for entrepreneurship generally defined, serious consideration should also be devoted to issues concerning intellectual property and structural features--including social networks--that can be harnessed to create the conditions for innovation.

February 11, 2010 01:36 AM

February 09, 2010

Connected Action

Sparklines guide Dynamic Filters for social networks in NodeXL v.1.0.1.111

Time for another NodeXL update: sparklines!  Sparklines are a nifty and compact way of displaying a line chart in a small area.

Setting dynamic filters in NodeXL has been somewhat like rummaging around in the dark: without a way to see the distribution beneath a filter, the user only knows the max and min values, not where the bulk of the observed data is located.  This version of NodeXL (1.0.1.111) features an improvement to the Dynamic Filters feature used to limit the nodes and edges displayed in the network visualization pane.  Earlier versions of Dynamic Filters allowed users to select a range for each attribute associated with the Edges and Vertices worksheets.  In the last version of NodeXL we added an automated feature for creating distribution histograms and placing them in a stack on the Overall Metrics worksheet after the user runs the “Graph Metrics” feature.  That is helpful, but the worksheet is far from the user when they are setting the ranges within the Dynamic Filters dialog.  Now, the current release adds “sparklines” to the Dynamic Filters dialog box: as you set the upper and lower bounds for any network edge or vertex attribute, you can see how much of the distribution is included and excluded in the display.

This is one of several features added to NodeXL to make it easier for users to explore their networks and find actionable insights.  We have added sparklines in the dynamic filters interface in NodeXL so that you can now see the shape of a value’s distribution as you set the maximum and minimum values to be included in the filter.  Histograms also now appear in the Overall Metrics tab of the NodeXL worksheet where they convey the distribution of the major network attributes in the graph.
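
For readers who want to see the idea outside of Excel, the following is a minimal Python sketch of what a sparkline next to a range filter conveys. It is not NodeXL code: the function names, the ASCII rendering, and the sample degree values are assumptions made purely for illustration.

```python
def histogram(values, bins=8):
    """Count how many values fall into each of `bins` equal-width buckets."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid a zero width when all values are equal
    counts = [0] * bins
    for value in values:
        index = min(int((value - lo) / width), bins - 1)  # clamp the maximum value
        counts[index] += 1
    return counts


def sparkline(counts, levels="_.:-=+*#"):
    """Render bucket counts as one compact line of characters, lowest to highest."""
    peak = max(counts) or 1
    return "".join(levels[(count * (len(levels) - 1)) // peak] for count in counts)


def fraction_kept(values, lower, upper):
    """Share of values that a [lower, upper] range filter would keep."""
    kept = sum(1 for value in values if lower <= value <= upper)
    return kept / len(values)


# Hypothetical vertex degrees: most are small, one is a large hub.
degrees = [1, 1, 1, 2, 2, 3, 4, 5, 8, 20]

print(sparkline(histogram(degrees, bins=4)))
print(fraction_kept(degrees, 1, 5))
```

Knowing only the minimum (1) and maximum (20), a user might guess that a filter of 1 to 5 hides most of the graph; the sparkline and the kept fraction show that the bulk of the vertices sit at the low end of the range.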




by Marc Smith at February 09, 2010 11:37 PM

Connected Action

Conversations on Innovation, Power and Responsibility with Jeff Ubois

My good friend and colleague Jeff Ubois recently edited and released a volume entitled Conversations on Innovation, Power and Responsibility for the Fondazione Giannino Bassetti. Some of my comments on the topic of innovation from a conversation with Jeff are included in the volume which collects a wide range of thoughts about the nature and consequence of technical change.

Table Of Contents

Foreword

Introduction


About the Question

Related Concepts

Choosing Subjects: Where Does Responsibility Matter Now?

Genetics And Healthcare

Thomas Murray, The Hastings Center

Ignacio Chapela: Drawing a Boundary Around the Lab

Arthur Caplan: Innovation as Politics

David Magnus & Mildred Cho: True Fictions

Nanotechnology

Christine Peterson: Nanotechnology and Enhancement

Lawrence Gasman: Nanomarkets

Robotics And Computing

Ronald Arkin: Embedding Values in Machines

Jeff Jonas: Applying the UN declaration of human rights

Marc Smith: Invention, mitigation, accounting and externalities

Mikko Ahonen: Open Innovation … and Radiation Safety

Design

Roberto Verganti: Varieties of Design Innovation

Michael Twidale: IRBs, Design, Empowerment, Accountability, Sustainability

My comments from the volume are after the fold…


Marc Smith: Invention, Mitigation, Accounting and externalities

If responsibility is about effects, then systems of measurement and observation are key to any understanding of responsible innovation.

Dr. Marc Smith (http://www.connectedaction.net) is formerly a Senior Research Sociologist leading the Community Technologies Group at Microsoft Research in Redmond, Washington. His group focused on computer-mediated collective action, and studied and designed enhancements for social cyberspaces.  Smith now leads the Connected Action Consulting Group in Silicon Valley. In particular, he is interested in the emergence of social organizations like communities in online conversation and annotation environments. The goal is to identify the resources groups need in order to cooperate productively.

Smith is the co-editor of Communities in Cyberspace, which explores identity, social order and control, community structures, dynamics, and collective action in cyberspace, and has developed software called Netscan that measures and maps social spaces in the Internet, starting with the Usenet. A related effort, Project AURA, allows users to associate conversations (and more) with physical objects using mobile wireless devices and web services.

Responsibility in innovation often comes after the fact. “The history of the technology is seize new power—then mitigate, mitigate, mitigate all the pathologies,” Smith says. “How long was it from the Model T Ford to safety belts?”

Responsibility can be improved through measurement of effects. “May I suggest that your best method for mitigation is documentation of negative externalities?” Smith says. “You can ‘govern’ innovation when the language and the data to document negative consequences are more available, freely available, easily used. Then you have a technology regulation and negative externality problem, and that I think is one that is more tractable.”

Part of that is stakeholder identification. Smith argues that it is easy to identify stakeholders, at least after the fact, because, “They’re at the top of the legal documents that are served to you. That’s how you identify the stakeholders, the ones that actually get the job done, make themselves known, tell you that you’re creating negative externalities for them, and insist that you [provide] remedy.”

But it’s also important to recognize the initial position of most innovators. “Innovation typically comes from the people who are most squeezed out of the sweet solution space,” Smith says. “You don’t innovate unless you have to… Innovation is the behavior, I think, of marginal actors in an ecological landscape.”




by Marc Smith at February 09, 2010 08:50 PM

February 08, 2010

Data Mining: Text Mining, Visualization and Social Media

Visualizing Vultures

Driving back from the slopes on Saturday we got caught in a jam due to an accident on the 520 bridge over Lake Washington. Looking at the traffic we can see that while the crash was on the westbound lane (the top of the map), the impact on the flow of traffic on the eastbound lane due to rubbernecking was equally severe. For the record, we couldn’t find any tweets during our 30 mins in slow traffic detailing what had happened, but traditional sources and access to web cams did help.


by Matthew Hurst at February 08, 2010 04:34 PM

February 05, 2010

Complexity and Social Networks Blog

Our Homogeneous Social Networks

This past week I was at a workshop held at GDI on "The Social Data Revolution" with a number of executives from industry as well as entrepreneurs from around the world. The group had been explicitly chosen to gather a unique group of people to discuss and formulate key issues around the explosion of social data from online sources as well as data traces from phones and other sensors.

While the workshop was very engaging and had some interesting discussion, for me one of the most fascinating things was actually a discussion I had with a few of the participants over dinner.

I brought up the point that while on the face of it this group of people at the workshop seemed to be fairly diverse, with people from Asia, the US, and various countries around Europe participating, we were actually part of a very closed network and coming at the issues of the workshop from very similar perspectives. Most of us had graduated from top universities and had strong connections to academia (particularly around social media for this group), and most of us were heavily involved in the technology sector. Of the invited speakers, about half of us knew each other, and naturally the executives from the participating companies knew the other executives in their company.

This raises the question of exactly how many unique perspectives were actually being brought to the table. We can think about this as sort of a more unconstrained version of groupthink, and one sharpened by the fairly flawed assumption that respected academics and executives are smarter than everyone else.

When I brought up this point at dinner, there was wide agreement that this was an important problem, and many of us have found ways to break out of these closed-off communities. For example, every week I take a Brazilian Jiu-Jitsu class where I interact with many people who come from a completely different part of society. There are electricians, construction workers, immigrants, as well as graduate students from local universities and biotech researchers.

Then a German researcher at the table talked about how they actually had a public bus driver's license, and once a month they would actually drive a public bus around their city to get that interaction with other groups of people. This prompted a long stare from a Swiss academic at the table, who murmured "I also drive a public bus a few times a year."

This was so incredible that everyone at the table burst out laughing. We were such a homogeneous group that we even mirrored the obscure ways that we try to break out of this group! It's not like it's common practice in Europe for people to have bus driver's licenses on the side. This just really drove home for me the importance of branching out from your core network in many facets of your life, since we're actually so like the people that we know that we will probably attempt to branch out in very similar ways.

Of course now I wonder if all of the bus drivers on my ride to work actually moonlight as academics...

February 05, 2010 02:03 PM

February 04, 2010

Connected Action

NodeXL update: v.1.0.1.110 – New histograms of network metrics on Overall Metrics worksheet

In the most recent prior release of NodeXL we added new metrics that describe networks in terms of their number of components and the length of paths in those networks.  In this release we automate creation of histograms of network metrics.  It is useful to see the distribution of attributes like in-degree or betweenness to get a feel for the nature of a network.  Building a histogram in Excel is easy, but building seven (one for each of the metrics we create: degree, in-degree, out-degree, betweenness, closeness, eigenvector centrality, and clustering coefficient) is a chore.  Doing this repeatedly for several networks is too much work!  Now, when you calculate metrics in NodeXL we will create these charts for you and place them on the Overall Metrics worksheet.

We will add axis markings and titles soon, making these charts ready to use in a variety of network reports.  These histograms will also appear in the Dynamic Filters dialog to guide users as they select segments of the distribution to include or filter out of the displayed network graph.
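
As a rough illustration of the kind of per-metric summary described here, the sketch below computes in-degree and out-degree from a small directed edge list and then reports the minimum, maximum, average, median, and a value-count histogram for each. It is written in Python rather than taken from NodeXL, covers only two of the seven metrics, and its function names and sample edges are assumptions.

```python
from collections import Counter
from statistics import mean, median


def degree_metrics(edges):
    """Compute in-degree and out-degree per vertex from (source, target) pairs."""
    vertices = sorted({vertex for edge in edges for vertex in edge})
    in_degree = Counter(target for _, target in edges)
    out_degree = Counter(source for source, _ in edges)
    return {
        "in-degree": [in_degree[vertex] for vertex in vertices],
        "out-degree": [out_degree[vertex] for vertex in vertices],
    }


def summarize(values):
    """Summary statistics plus a histogram of how often each value occurs."""
    return {
        "min": min(values),
        "max": max(values),
        "average": mean(values),
        "median": median(values),
        "histogram": dict(sorted(Counter(values).items())),
    }


# Hypothetical directed network with vertices a, b, c, d.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("d", "c"), ("c", "a")]

for name, values in degree_metrics(edges).items():
    print(name, summarize(values))
```

Repeating the same summary for every metric and every network is exactly the chore that the automated histograms are meant to remove.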

Other updates:

1.0.1.110 (2010-02-03)

  • The Overall Metrics worksheet now includes more information about the degree, in-degree, out-degree, betweenness centrality, closeness centrality, eigenvector centrality, and clustering coefficient metrics when those metrics are computed. The additional information includes the minimum, maximum, average, and median metric values, and a histogram showing the metric value distribution.
  • The “Convert Old Workbook” item on the NodeXL, Data, Import menu in the Ribbon is now called “Import from NodeXL Workbook Created on Another Computer.” This menu item can be used to work around the following problem: NodeXL workbooks created on a 64-bit Windows computer cannot be opened directly in Excel on a 32-bit Windows computer, and vice-versa. (If you attempt to do so, you will get an error message whose details include “could not find a part of the path.”)
  • A Clear All Worksheet Columns Now button has been added to the Autofill Columns dialog box (NodeXL, Visual Properties, Autofill). Also, you can now clear an individual worksheet column by clicking a button in the dialog box’s Options column.
  • Bug fix: On large-font machines, the buttons at the bottom of the Autofill Columns dialog box didn’t fit within the dialog box.
  • Bug fix: In some circumstances, vertices were drawn below the bottom of the graph pane and were impossible to see. One such circumstance was when the selection was exported to a new workbook (NodeXL, Data, Export, Selection to New NodeXL Workbook). The graph pane in the new workbook acted as if it were taller than its real height, leading to vertices dropping off the bottom.



by Marc Smith at February 04, 2010 10:42 PM

February 03, 2010

Data Mining: Text Mining, Visualization and Social Media

Tree Map Visualization of Budget

The New York Times has an interesting visualization of the budget proposal. While this gets off to a good start, it doesn’t really deliver due to the lack of true zooming. When you zoom into a portion of the image, the labels detailing the new context are missing.


by Matthew Hurst at February 03, 2010 03:12 PM

UMBC Ebiquity

Kaggle aims to host data-driven machine learning competitions

Kaggle is a site for data-related competitions in machine learning, statistics and econometrics. Companies, researchers, government and other organizations will be able to post their modeling problems and invite researchers to compete to produce the best solutions. The Kaggle demo site currently has three example competitions to illustrate how it will work and expects to host the first real one in March. Kaggle’s competition hosting service will be free, but the site says that it plans to “offer paid-for services in addition to its free competition hosting.”

by Tim Finin at February 03, 2010 11:36 AM

Data Mining: Text Mining, Visualization and Social Media

Search and Social Media 2010

Tomorrow I will be attending Search and Social Media (at WSDM) - and am very much looking forward to it. While thinking about the workshop I realised that the title can lead to endless debates around the definition of 'search' and 'social media', so I think it makes a lot of sense to break things down by aspects of the social media space. In particular: search and real time content, search and community, search and influence, authority and popularity, search and geolocated content, search and social networks, etc.

When breaking things down in this manner, we can also reflect on the notion of 'search'. There are many interactions with social media that currently don't follow the traditional search dialogue. TopicFire and Techmeme, for example, are more focused on real time analytics than on search as a primary mode of interaction. Similarly, Bing, Google, Yelp and many others provide a 'what's nearby' interaction which uses context to parameterize content in the spatial dimension, much as TopicFire and Techmeme use time (what's nearby versus what's right now).

by Matthew Hurst at February 03, 2010 01:17 AM

January 31, 2010

Data Mining: Text Mining, Visualization and Social Media

Social Psychologists in Las Vegas

I've just returned from a brief trip to Las Vegas where I had been invited by Sam Gosling and Kate Niederhoffer to participate in a panel at the annual meeting of the Society for Personality and Social Psychology. I was very happy to participate with Tom Lento from Facebook and Winter Mason from Yahoo Research. The panel topic was Forging connections between Social Media and Social/Personality Psychology and was intended to help bridge the gap between researchers in the social-psych field and those of us in industry working with social and personal data.

Perhaps the most salient point that came out of the discussion was the difference between current social-psych methods and the data scale of the online social world. To help understand this new world, and for social-psych researchers to make the transition, an ability to code and to deal with large amounts of data is required. There was genuine interest in this (provoking questions about the specifics we had in mind, such as working with scripting languages) and I believe a real interest in the opportunity that lies in this data rich online world.

In terms of facilitating interactions, Winter probed the audience regarding knowledge of key conferences for web data analytics, such as WWW and ICWSM. I consider the latter to be a perfect forum for this discussion to continue, and with Sam's help I think this can really happen.

The conference itself is run in a manner quite different to that which computer scientists are used to. No papers are submitted and acceptance (which runs high) is based on short abstracts (generally paragraphs). I managed to see one of the poster sessions and was fascinated by the topics (extremely focused accounts of certain behaviours and traits) and the methodologies.

I feel that there is a lot to learn and that the ultimate winners in the social space will be those companies farsighted enough to really embrace this field and who figure out how to integrate such insights about users into their products.

by Matthew Hurst at January 31, 2010 06:57 AM

January 30, 2010

UMBC Ebiquity

UMBC global game jam live video feed

Via Marc Olano: The Global Game Jam is into its second day at UMBC with 41 registered participants working on seven games. Keep up from home with our live video feed and games list.

by Tim Finin at January 30, 2010 11:45 PM

Connected Action

Book in progress: “Analyzing Social Media Networks with NodeXL: Insights from a Connected World”


Along with Professors Ben Shneiderman (Computer Science/Human Computer Interaction Lab) and Derek Hansen (College of Information Studies) from the University of Maryland, I am writing and editing a book about analyzing the social media networks that form whenever people link or reply to one another, favorite, rate, read, or edit data about other people or their objects.  Social media networks can be analyzed using the methods of social network analysis, the mathematical application of graph and network theory to the social sciences.  Using social network analysis, collections of connections can be analyzed and compared to identify key people and groups and measure changes over time and following interventions.


I am pleased to announce that we have signed with Elsevier/Morgan Kaufmann to produce a book: Analyzing Social Media Networks with NodeXL: Insights from a Connected World for a Summer 2010 delivery!

2009 - October - NodeXL Facebook Network Marc Smith

A map of the relationships among the population of people who all tweet a particular keyword can lead to the discovery of the key hubs and influential people in the network.  A social network analysis of reply patterns in email collections displays clusters around projects and highlights key people and relationships.  Visualizing the connections among your friends in Facebook can reveal the various life stages and communities in which you have participated.  When you chart the links between videos and users in YouTube, content with interesting network properties is exposed based on well connected content creators and influential commentators.  A graph of the individual connections between flickr users illustrates the emergent formation of groups around social networks, locations, and topics.
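
The first of these examples, finding hubs among people who tweet a keyword, can be sketched in a few lines of Python for readers who do code. This is only an illustration under stated assumptions: the tweets are invented, the relationship is taken to be an @mention, and "hub" is approximated by the number of distinct mention edges pointing at a person. It is not how NodeXL collects or analyzes Twitter data.

```python
import re
from collections import Counter

MENTION = re.compile(r"@(\w+)")


def mention_edges(tweets, keyword):
    """Build directed (author, mentioned) edges from tweets containing `keyword`.

    `tweets` is an iterable of (author, text) pairs; matching is case-insensitive.
    """
    edges = []
    for author, text in tweets:
        if keyword.lower() not in text.lower():
            continue
        for mentioned in MENTION.findall(text):
            if mentioned.lower() != author.lower():  # ignore self-mentions
                edges.append((author.lower(), mentioned.lower()))
    return edges


def top_hubs(edges, n=3):
    """Rank people by how many mention edges point at them (in-degree)."""
    in_degree = Counter(target for _, target in edges)
    return sorted(in_degree.items(), key=lambda item: (-item[1], item[0]))[:n]


# Hypothetical sample of tweets.
tweets = [
    ("ann", "Trying NodeXL today, thanks @marc for the tip"),
    ("bob", "@marc @ann is there a NodeXL tutorial?"),
    ("cat", "Lunch time"),
    ("dan", "NodeXL graph of my inbox, cc @marc"),
    ("marc", "@bob yes, see the NodeXL site"),
]

edges = mention_edges(tweets, "NodeXL")
print(top_hubs(edges, n=2))
```

In-degree is the simplest possible notion of a hub; measures such as betweenness or eigenvector centrality capture other kinds of influence in the same network.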

These kinds of social media network data collection, scrubbing, analysis, and display tasks have historically required a remarkable collection of tools and skills.  A great example of the variety of tools that can be used in concert to extract, analyze and display social media networks can be found on Drew Conway’s blog.  This is a powerful set of tools for those who can master the demands of Python and API interfaces.  In contrast, the approach the NodeXL project has taken is to provide an end-user GUI application environment built within the framework of Excel 2007 for performing basic social media network analysis and visualization for non-programmers.  The Python path is certainly the high road for experts and those with demanding volumes or esoteric data requirements.  But for the non-coding user, NodeXL may be one of the easiest ways to both manipulate network graphs and get graphs from a variety of social media sources.

There are already some materials available to guide new users interested in learning about NodeXL, social networks, and social media.  A video tutorial for NodeXL demonstrates the extraction of the network of people in Twitter who mentioned the term “digg”.  A tutorial guide to NodeXL offers a step-by-step guide to features in the NodeXL toolkit (with supporting data sets).  But the book will capture the theory, history, domain and process of social media network analysis in a single volume.

The volume contains a broad introduction to social media, social networks and the operation of the NodeXL application and then features a series of chapters from leading researchers that focus on a particular social media system (email, Facebook, Twitter, YouTube, flickr, Wikis, the WWW hyperlink network) and the networks each contains (replies, friends, follows, subscribes, comments, favorites, edits, links, etc).  A final chapter outlines a programmer’s view of the NodeXL code, in contrast to the code-free approach of the remainder of the book.

Our intended audience is the mostly non-programming population that is interested in social media and the techniques of social network analysis.  The volume is largely in the form of a how-to guide, with examples that readers can follow and replicate.  Using your own free and open copy of NodeXL, you will be able to use sample data sets or create similar live queries that map relationships in social media systems.

We have an ambitious production schedule so the book may be on a book store shelf or online retailer search result in summer 2010.

Table of contents…

1. Introduction
2. Social media
3. Social Network Analysis
4. Hands on SNA: Learning by doing – Network Layout
5. Hands on SNA: Learning by doing – Network Metrics
6. Hands on SNA: Learning by doing – Network Filtering
7. Hands on SNA: Learning by doing – Network Clustering
8. Email
9. Lists, message boards, and communities
10. Twitter: Scott Golder and Vladimir Barash, Cornell University
11. Facebook: Bernie Hogan, Oxford Internet Institute
12. WWW: Robert Ackland, Australian National University
13. flickr: Eduarda Mendes-Rodriguez and Natasa Milic-Frayling, University of Porto and Microsoft Research, Cambridge
14. YouTube: Jen Golbeck and Dana Rotman, University of Maryland
15. Wikipedia: Ted Welser, Patrick Underwood, Dan Cosley, Derek Hansen, and Laura Black, Ohio University, Cornell University, and University of Maryland
16. NodeXL for programmers: Tony Capone, Microsoft Research



by Marc Smith at January 30, 2010 03:00 PM