In an oldish post on my blog, I got the comment pasted in below. I have derezzed the details so as not to give the bot an incentive to chat with me at tumblr. But damn, the tone was almost perfect to make me click “approve” without looking at the commenter URL. Spambots are getting better and better. I want to believe that nerds are arguing about the semantic web with their materials science-trained boyfriends, dammit. OK, I’ll come clean: this reminds me of an embarrassingly recent conversation with my materials science-trained boyfriend.

Commenter name: supratall
Commenter email address:
Commenter URL:
Commenter IP address: 78.185.131.XXX
I was going to call this “Citation in a ‘like’ button world” but I’ve heard that more serious titles get more attention. Ahem. Anyway, I was in a session at the Linked Open Data - Libraries and Museums unconference (LOD-LAM) yesterday on citation. It was a wide-ranging session, as unconferences tend to be. But a few things emerged for me that had previously been unclear.

First, the very idea of why citations are useful is shifting. It used to be that the main reasons a citation existed were simple - I would cite you to show that I used your work (and that I was a responsible member of the Academic-Industrial Complex). You would compile a list of citations to your work to show you were an Important Person in the Academic-Industrial Complex, helping advance your tenure there. But now there are new reasons. One point was to have a list of the 30 most downloaded papers on protein kinases, or most cited, so that one could complete one’s library (iTunesish). Another was to compile a course catalog out of citations. I like both, but there’s a lot of possibility for groupthink there (entrenching the paradigm).

Second, and the point that I both brought up and couldn’t set down, was the impact of the “like” button. It’s not just Facebook - Google is trying to wire the “like” feature into the very fabric of the web now. And it goes beyond liking. There are a number of ways we can make a weak, but explicit, connection between ourselves and some piece of content. Thus, we can retweet something either automatically or after editing. We can embed a video (under Creative Commons on YouTube even). Going oldschool, we can link to something.

But liking something is really different than citing something. Citing used to be imbued with rich meaning, not weak meaning. Citation counts are important enough to be the foundation of grantmaking and tenure. There are efforts to create rich-meaning citations for data. But we have zero idea what counts of weak ties mean. There’s an ontology of citation types that one can use to make this richer. And there’s definitely going to be some form of mix that judges an autoretweet as one level of citation, a retweet with comment another, and the moving impact of a tweet through time as another. How many likes, how many dislikes, how many mentions in blog posts (which in turn get liked or disliked). And on and on.

But I’m just unable to see where and how this fits into the tenure and granting process, which is, to put it kindly, petrified in amber. I don’t see how the like button culture penetrates into science. Maybe it doesn’t - maybe it grows up next to science, and then swallows it whole, like iTunes did to the music industry.
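To make the weak-tie weighting idea a little more concrete, here is a toy sketch of how such a mix might be scored. The event types, the weights, and the half-life are invented for illustration; nothing here is a proposed standard.

```python
# Toy sketch: fold "weak tie" signals into a single citation-like score.
# Weights, event names, and the half-life are hypothetical placeholders.

WEIGHTS = {
    "formal_citation": 10.0,      # traditional, rich-meaning citation
    "blog_mention": 3.0,
    "retweet_with_comment": 2.0,
    "auto_retweet": 0.5,
    "like": 0.2,
}

def weak_tie_score(events, half_life_days=180.0):
    """Sum weighted events, decaying each by its age so that the moving
    impact of a mention through time counts for something."""
    score = 0.0
    for kind, age_days in events:
        weight = WEIGHTS.get(kind, 0.0)
        decay = 0.5 ** (age_days / half_life_days)
        score += weight * decay
    return score

# Example: one formal citation, one blog mention, and a flurry of likes.
events = [("formal_citation", 30), ("blog_mention", 10)] + [("like", 5)] * 40
print(round(weak_tie_score(events), 2))
```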
The Radiators ended their 33-year run last night in New Orleans. The Rads were part of a season in my life that changed me forever, part of a transition that turned me from a first-semester molecular biology major at Tulane into a student of philosophy and languages. Their end as a touring outfit brings a bit of closure to that season even though I left the majority of that life behind me a long time ago.

My first shows were in September of 1990 at Tipitina’s in New Orleans. I’d heard their first CD as a kid in Knoxville in the late 80s (used to shoot baskets in my driveway with “Law of the Fish” on the boombox with one of my friends) and as soon as I got to campus I saw posters for a three-night run at the famous club. The drinking age in those days in New Orleans was 18, but in reality it was “if you are tall enough to order across the bar”, so my age, and the fact that I looked about 12, couldn’t stop me. I didn’t really have any friends yet. I was a skinny kid with big glasses from east Tennessee, a bit of a geek with a giant box of sci-fi books, so I went by myself. It was revelatory - I’d never seen musicians that in sync with one another in a rock and roll context, playing like jazz musicians, extending songs for solo after solo. They didn’t repeat a song in three nights.

Their fans took me in. I got adopted by a few of the older local Fishheads (because the Rads were about being funky, and there is nothing funkier than a day-old fish head) who in turn showed me more places to see more local music, like the Maple Leaf and Jimmy’s and Muddy Waters (where I later became a short-order cook), but also jazz places like Snug Harbor. They served as my network, before the Web: they helped me find out about private solo piano performances by Ellis Marsalis, turned me on to Earl King, to Henry Butler, to Professor Longhair, told me to listen to WWOZ, and taught me how to survive Jazzfest during final exams. They led me in a way to my friends in Juice.

They were a movement. And a private one. Very few outside New Orleans knew, or cared. We didn’t have email yet, at least, not unless you walked to the computer lab. By the time I was a senior, all my friends (I did make some eventually) were hooked. Rads runs at Tipitina’s were an excuse to spend a weekend together, suffering through the mornings as a group, anticipating the evenings, and on the best ones, watching football in the middle.

I saw the Rads in 20 states, all told. Saw ‘em in London, once, even. Spent my New Year’s with them a couple of times. Spent my Halloweens with them. Sweated out summer nights with them. Saw a moonrise on Cape Cod, dancing in the sand, speakers mounted outside at a clam shack. They’ve been part of the soundtrack of my life for more than 20 years now.

They taught me about sharing music, too. Taping shows was encouraged from the very beginning, and their space at the Internet Live Music Archive is full of shows, going back to the beginning. They are part of a thread that connects those days to my days at Creative Commons, the idea that giving something away for free can bind a group together and, in the end, create something bigger than could have existed if every note were monetized. It was about being there when liftoff happened, when a show made The Leap.

I didn’t see a great Rads show in the last few years. The last time they really, really blew the roof off a show I was at was the night before New Year’s in 2001, a private show at a tiny bar in Baltimore. They played great shows after that, I just didn’t make it.
My life’s changed. I have a kid now, and my vacations the last few years have trended towards visiting my family at home and abroad, not towards seeing live music. I was so tired from caring for the newborn that I left my tickets to the last Rads show in San Francisco untorn at will call. But there’s an oil painting of a Rads show at Tipitina’s by Frenchy, and my son spent his first few days in a crib underneath it, looking up at it, and I played him an acoustic version of Viva Las Vegas they played a long time ago in Rhode Island while he was still in the hospital after being born. They are gone, but not forgotten, and in no small part because they shared what they had. So long, boys, and thanks for all the fish heads.
Got an interesting email today and have decided to post my reply here as well, as I get this question a lot. I’ve expanded and linked the email out. We’ve been discussing the issue of upstream licensing, especially from universities. More specifically, if we received an offer for a patent that had not yet been translated into a product, would it be practically (and legally) feasible to draft a license agreement that would ensure access to any products that came from the patent?

To the first point, it is certainly possible to draft a patent license that requests access to products that ensue from a patent. This is done in the “closed” context all the time, and is known as a “reach-through” - it’s also regularly included in aggressive materials transfer agreements (MTAs). It is a fairly standard way to extract rents from downstream products. Here’s a nice analysis of the ways that reach-throughs get examined.

We spent several years on both MTAs and patent licenses at Science Commons, and eventually reached the conclusion that reach-through terms were not something that would work particularly well for “open” patent licenses, whether in attempting to simply retain rights to improvements to the patented technology (which would appear to be the simple case) or to products ensuing from the patented technology. The reasons have little to do with whether or not one could draft such a license, but with whether or not such a license would actually be effective in real life. There are several reasons for this.

The first is that lots of patents bear on most products, from lots of sources. It is almost never the case that one technology leads directly to one product. When there is a lot of money on the line, everyone is happy to negotiate, and it is not unusual to see anywhere from 15-20% of total revenues paid back in license fees in biotech products. But if one of those patents were to include a patent-left term, the odds are strong that it would never be part of the process or licensed, because it removes the economic incentive (or perceived economic incentive as the case may be) to include the technology in downstream product creation.

We have an example of patent-left in action in the BIOS patent license created at CAMBIA for agricultural biotechnology. It was a set of fundamental gene transfer technologies, patented in order to be made open, and released under a license that granted licensees the right to make and patent products without prejudice or reach-through, and only asked that if licensees patented improvements to the underlying technology that those improvements be made available back to all the other licensees. While many signed the license to use the technology, no improvements were licensed back. And that’s not even *product* reach-through, where the real money is. PIPRA is an example of something that is working in biotechnology, which is a pool that allocates resources and works to create greater access through transaction cost reduction rather than through the creation of reach-through or patent-left terms.

Another reason is that patents are not *enabling* rights in the same way that copyrights are. Although copyrights grant the right to prevent others from making copies, the creator who owns the rights owns all the rights needed to license the work fully to another person, including the right to require them to share alike (copyleft).
But in a patent context, especially in a medicines context, the odds are that the right to produce a product is dependent on dozens of patents and materials transfer agreements. This means that even if you could get a patent on a fundamental technology and convince the owner to license it openly, the production of the product can be blocked by someone outside the “open” transactional circle.

Think in terms of chairs. I have a patent on chairs generally, and four-legged chairs specifically. You patent a new kind of chair - a three-legged one - and license it openly under patent-left. Your patent keeps me from making a three-legged chair unless I sign your license, but my patent lets me keep you, and anyone who licenses your patent, from making any chairs at all unless you license my patent on my terms. (This example is owed to Richard Jefferson from CAMBIA, who has forgotten more about open patents than most will ever know.)

The third reason is basic economics, which is that unlike copyrights, patents cost a lot to acquire and thus tend to be acquired by those with little philosophical or economic incentive to make those patents available openly. Copyrights are free and instant, and the advent of digital networks and cheap PCs and cheap digital cameras means that it’s fast and easy to share stuff, and that we don’t need a large percentage of total content creators sharing in order to have a vast commons. Patents are far more scarce, far more pricey, and far less likely to be individually owned.

All of this taken together is why we worked on systems that reduced transaction costs, increased transparency, facilitated e-commerce-like effects, and in general encouraged people to think about patent rights as something that they should license non-exclusively, but did not attempt to include reach-throughs or patent-left.

The folks at CAMBIA have a new project called the Initiative for Open Innovation, which is one of the more cutting-edge projects in the open patent space. The idea here is that being able to see the forest of patents is job one, more so than licensing. Their Patent Lens project is going to change a lot of things. Go look around. It’s worth the time.
I’ve been re-reading Misha Angrist’s book Here Is A Human Being: At The Dawn Of Personal Genomics (please, please go read it if you haven’t) as I dig into the ways that our genomic data can be correlated to our clinical data. And it resonates deeply for me, especially after an experience I had last week.

I was at a conference, one held under the Chatham House Rule, so I won’t go into details on what other people said. But we were on the topic of how innovation might bring down health care costs. We got into a very abstract discussion of personal, direct-to-consumer testing for genomics, but it was so high level as to be worthless. So I jumped in with both feet in my mouth, and told everyone I had a marker for prostate cancer that increases my odds, that I’d found it via 23andme, and that I was gathering information about whether or not to start getting a PSA test earlier than the normal age.

I got savaged by the doctors in the room. My initial reaction was “how dare you be paternal to me - I’m EMPOWERED!” and all that. But once we got past the emotional reaction, it hit me that the doctors were advising me that getting into the medical system would be more likely to make me sicker than staying out of it: I get a false positive on a PSA test, I get a biopsy, I get a complication, I get a radical prostate removal order, and my cancer might well have been non-threatening in the first place. Pretty far from paternal. Downright scary about the power of the reimbursement workflow that doctors operate under.

Which brings me back to Misha. In his book, he quotes Amy McGuire (see representative toll access publication on consent) as defining the idea of consent in the modern age of primitive personal genomics as a process of consenting to uncertainty. We don’t know what will happen if we sequence ourselves, what decisions are best, what research is enabled, what harms we create. We’re uncertain. But that’s not a reason to bail. We need to get comfortable with consenting to uncertainty. Because the existing alternative - that workflow I mentioned above - sucks a lot more than being uncertain.
There’s a notice of a proposed rulemaking from the US Department of Health and Human Services that deals with the way that HIPAA “common rule” protections for individual human data conflict with the growing reality of networks and clinical studies. I can’t summarize it any better than Dan Vorhaus:

- Level of review does not match level of risk, particularly for non-invasive research;
- Multi-site IRB review is inefficient and ineffective (nobody takes responsibility);
- The informed consent process is broken and serves to protect institutions, not individuals;
- Increasing use of genetic information changes the nature of risks from physical to informational, privacy-based (and HIPAA is not adequate protection);
- There is no effective mechanism in place to determine whether the current system is/is not effective at protecting individuals;
- The current system does not reach all individuals, particularly those in research which is not federally funded and thus (generally) not subject to the Common Rule; and
- Overlapping & inconsistent regulatory requirements (HIPAA vs. Common Rule, in particular) make compliance painful, variable and sometimes simply impossible.

What he said.

This seems as good a time as any to tell the world that this is the problem I want to work on now. It seems insane to me that the consent process isn’t something that I can control as an individual, at least as an opt-in. I’m building out a project as part of my involvement at Sage Bionetworks, with help from some awesome people like Dan, to create a system of “portable” informed consent that builds on open consent models like those at the Personal Genome Project, but more modular and untethered from a specific project. If you get consented, that consent will travel with you. More to come soon. This is a project that has been consuming my nights and weekends for a while now, and doesn’t show any signs of stopping. If you’re interested, drop me a line.
Bruce Sterling, as usual, says it best. Remember that Heavy Weather came out in 1994, and then go read it. “You have to wonder now, not what happens when people accept the truth, but how people who deny the truth can maintain the psychosis. All the proper points were fully made years ago. Like, when you’re on the Governor’s staff in Austin right now, and you prayed for rain months ago and none arrived, and the soil is so dry that sewer pipes are cracking in the desiccated, shrinking turf, how do you go home and sleep? I mean, it’s your own home. It’s not like Murdoch’s propaganda waters your lawn.”
It’s been a bit more than seven years since I took a call from Larry Lessig about joining Creative Commons. It was my birthday in mid-August 2004, and I was sitting on a log near the beach at Brewster, a small town in Massachusetts. The signal was cutting in and out, but enough got through for Larry to ask me to come on board, and for me to say yes. It was the best decision I ever made.

Since that call, I’ve flown somewhere around 1,200,000 miles, spent about 800 nights on the road, and visited more than 30 countries, giving more than 400 lectures and talks. I’ve spent days with scientists in disciplines ranging from biodiversity to geospatial to chemistry to biology to physics to anthropology to neuroscience and on. I’ve danced to dubstep in Zagreb, lost my passport in Bogota, and been drunk under the table in Helsinki by our amazing international affiliates. I met my wife, Carolina Rossini, through Creative Commons (for this alone, I should buy the world a drink) and we recently had our first child, who may be the first baby to emerge from the movement.

CC’s been incredible for me, a challenging job that grew me and opened me and taught me. I’m a better person because of this job. I got to go to work, pretty much every day, loving what I did. And I got to work with the most amazing people, my fellow staff who picked me up in moments of incredible personal trial and kept me going, as well as creating the most demanding intellectual environment I’ve ever known.

Now it’s time for me to say yes to something else, and move on from my position as VP of Science at Creative Commons. I’ve launched a project called Consent to Research, which is being supported by the Kauffman Foundation, Sage Bionetworks, Lybba, and a few other organizations. The idea behind CtR is simple: make it easy for people who want to share data about themselves for scientific, medical, and health research to do so. It’s not centered on intellectual property, though it does touch on it. It’s more about privacy, and in particular, about making it possible for people to get informed about what is possible with their data and how beautiful research can emerge if enough genomes, enough biosamples, and enough other kinds of data can be shared and connected.

CC is not abandoning the field in science. If anything, the next iteration will have an even greater impact at national and international policy levels. We spent seven years building expertise, building networks (both professional and social), and releasing products that drive science towards an open, networked state. Those seven years are the foundation for the next round of CC science, and don’t let anybody fool you into thinking that CC is out of the open science world. They will (strange to use the third person!) be proving that point with a series of workshops and papers in the coming months.

I’m not leaving CC entirely. It means too much to me, and I’ll maintain a role as Senior Advisor so I can be actively involved with the next generation of CC’s scientific program. I’ve accepted a seat on the Board of Directors at iCommons, and I plan to be a project lead of the new CC-US jurisdiction as it emerges. I am therefore an affiliate now, and am looking forward to criticizing HQ for a lack of transparency ;-)

Thank you to everyone who made this such a great experience. It’s been a joy.
The advance of Open Access to the scholarly literature is pretty hard to miss at this point. The Directory of Open Access Journals lists more than 7000 titles now, and the percentage of global articles that are OA is now somewhere above 10%. Revenues on OA journals are in the tens of millions of dollars annually (and that’s just combining the numbers we actually know or can extrapolate from BioMed Central and Public Library of Science).

So this progress has been noted in some of the finest of what’s left of the mainstream press recently. The Guardian and the New York Times, among others, have run articles positive to the emergence of OA. Heavens to Betsy, the Internet transforms content profit models, and the press notices it! Someone notify the newspapers, the music industry, and Blockbuster Video.

There’ve been complaints about these articles from some in traditional publishing. Seeing these complaints doesn’t trigger sympathy in me, given the brutal attacks and false-front lobbying groups pushed on us by the traditionals. Remember that the strategy of equating peer review and the traditional publishing subscription model was created by a PR consultant named Eric Dezenhall (irony alert, toll access Nature article) and not by, yaknow, scientists or scholars. And though the outcry over the press daring to cover a significant trend is a spectacle itself, there’s a bigger thing to talk about than the outcry, which is what it tells us about those doing the crying.

This debate totally misses the point of the transformative shift underway in Open Access, from something that was political to something that is functional - from religion to strategic infrastructure. It reminds me of patterns I’ve seen again and again since I got onto the internet for the first time in the late 1980s. Though it’s easy to forget now, the internet used to be something of a religion, that zealots said would change the world, increase democracy, and create entire new industries. The world yawned, or at best, mocked. The same thing happened with the web. It’s full of cat pages and blink tags, said the content experts. It’s a lousy formatting language, said the formatters. No one will buy things online, said the brick and mortar stores. And there were failures, some spectacular, as new business models that were native to the medium of the network were tried.

But a funny thing happened in each of these cases. There was a move from religion to trend, and from trend to infrastructure. And those who sat around attacking the religion angle tended to miss the transitions the worst, whereas those who got in early on the infrastructure got the best of the situation: they got to be part of changing the system entirely, and many of them became extremely wealthy. Even companies, big ones, got in on the shift to the network, the web, open source software. And there’s a reason for that. It’s because the movements began around simple, weak, open, standardized infrastructure. That allowed the world to add complexity where appropriate. To add power when needed. To add enclosure, when needed. And it meant that companies who built that into their business could benefit from the crowd, whereas companies who didn’t had only their own employees to leverage.

That’s the transition that’s happening now in open access. It was a movement. Then it became a trend (that’s why the press is writing trend pieces, for those paying attention, not because we suddenly got Dezenhall to work for *us*). But it’s already undergoing the shift to infrastructure.
Funders are starting to get that paying for permanent access is smarter than paying, over and over, for subscriptions. Universities are starting to get that asserting distribution rights increases impact. And businesses built on open models are popping up, inside big companies like Springer and Nature Publishing Group as well as in small companies like Mendeley. It’s not about religion on the OA side, or stodginess on the traditional publisher side. It’s about totally missing the transition from movement to trend, and from trend to infrastructure. So don’t waste breath fighting with people on the internet. Keep driving train tracks into the ground, relentlessly. Never stop building infrastructure, never stop using existing standards, never stop creating new businesses and projects that recognize open as infrastructure. That’s how we win. And when the old guard is ready, we should welcome them. There is tremendous knowledge inside the traditional publishing industry that we don’t want to lose. And we don’t win by throwing the baby out with the bathwater. What’s wrong with the old model isn’t wrong because of bad people, or people who don’t know things. What’s wrong with the old model is simply that it’s analog, and we live in a digital world.
Technology has a way of creating unintended consequences. I have been reading Jaron Lanier’s stuff lately (I don’t agree with 90% of it, but it’s interesting and provocative, which is more than I can say for 90% of the stuff I read; I also notice that the digerati who bash him have almost never actually read him) and it seems a key undertone of his work. It’s easy to see in his Edge interview on the “local-global flip”. There, he’s talking about the outcomes of Wal-Mart, of Apple, of Google. He’s negative about them. And in many ways I share some of the negativity. I miss the mom and pop stores of my childhood in East Tennessee. I fear the implications of technology that make people passive consumers. And I am working full time on making data something that people own and control.

But he’s somehow lost the good parts of these systems, and there are good parts. They’re where these systems came from. Wal-Mart means not just the destruction of small stores, but the proliferation of warm, cheap clothing and cheap calories. That’s a benefit. Apple means my mom can send email and see her grandson (don’t tell me about other systems - I run Ubuntu - but Apple made it *easy*, and that’s what some people actually need). Google is, well, Google, for better or worse. The negative parts are the unintended long term consequences of the technologies and their implications. The link is for a lovely paper, with the lovely example of the microwave oven, whose inventor did not intend to be part of the long term destruction of the family meal (negative unintended consequence) or of the long term movement to liberate women (positive unintended consequence). Indeed, he was a guy fixing a radar system who noticed his chocolate had melted, so it’s safe to say he didn’t have much of a social agenda at all.

This is all a setup to my real point, which comes back to Open Access, and indeed to free culture generally. We are pursuing what Merton called a purposive social action in advocating for the liberation of scholarly, educational, and cultural content through free licensing (if you haven’t read the paper, do so now - it’s short, it’s beautiful, and it’s important). We are intentionally trying to change a system from closed to open under the belief that this effort will be more natural and native to a networked, digital world, and that the long term outcome will be a net positive to humanity. Part of our job should be to think about the consequences of that, for good and for bad. We tend to think of “undesired” results as negative ones, but Merton’s paper makes the key distinction that not all undesired results are undesirable. I think that’s the fundamental key to understanding innovation, actually.

A lot of the arguments against OA focus on the undesired but foreseeable outcomes: business models will have to change, filtering and quality control methods will have to change, some people in power will have less, some new people will have new power. I don’t really give a hoot about those: the internet is here, the king is dead, long live the king. Some of the more nuanced arguments focus on foreseeable and truly negative outcomes: the concentration of wealth among the large publishers who can afford the move to author-pays, the lack of funds to make author-pays work in many disciplines, the inequality of asking authors to pay in the developing world, and so forth. I am more sympathetic to these arguments by far, and we have to address them.
If we don’t, our failure to do so will cloud our ability to address the outcomes in the previous paragraph, which only really affect people in power. Lanier’s point about the flip is well taken here - we don’t want Elsevier, Springer, and Nature to concentrate publishing even more fully in their hands. They’ve already got more than half the market, and the local-global flip means we could easily see that skyrocket up, not down, as a consequence of openness. That’s a short term (10 year) view, however. I tend to think these things shake themselves out over time. There will be a pre-Cambrian explosion of models to address them, and a small number of the models will work, and will then mutate to address the needs. If we know these will be problems we can set up boundary conditions and facilitate the experimentation that needs to happen. Open Journal Systems is the kind of thing that helps here, by way of example.

But there are going to be “undesired” effects of this purposive action - as in, effects that weren’t part of our argument for the change, or part of the arguments against it. Donald Rumsfeld famously called these “unknown unknowns”. Some of the unknown unknowns are going to be positive. Some are going to be negative. The beauty of systems that are open at the core is that those who follow us will have the rights to amplify the good ones, and the rights to fix the bad ones.

And that’s in the end the point of Open Access, for me, as a purposive social action. It’s to guarantee first that the world has the right to read the literature of scholarship through the network, but the real goal is to make sure that whoever is reading has the knowledge to address the things we screw up, the negative consequences of Wal-Mart and Apple and Google. To fix the unknown unknowns. We have to deal with the foreseeable negative outcomes - especially the concentration of power that Lanier points out so well, which is looming over scholarly publishing like a wave at Mavericks. But we can’t lose sight of the goal in attempting to fix all the negative outcomes that we can predict: since we have a massive set of scientific problems to deal with, if we charge the world $30 an article, we are statistically less likely to have the right brains filled with the right knowledge at the right time to fix the problems we’ve left them. Work back from that, not forward from problems we can already see.
As one of our commenters recently asked, what would have happened “if a big commercial publisher had pioneered the high-volume, low-bar, author-pays mega-journal”? If PLoS ONE was instead Big Company ONE, would it have succeeded and been as adored as it is now by OA advocates, or would it have been seen as a “cynical exploitation of the publish-or-perish climate in academia?” Would publishing traditionalists see it as an exciting new market to serve rather than the end of quality? From The Scholarly Kitchen.

A deeply good point, and one worth contemplating. I tend to think that if it were Big Company ONE, it would neither have been received very well, nor worked as well as it has. PLoS is a brand built on abundance, not scarcity, with a good association with impact thanks to its loss leader print journals. PLoS ONE is a natural product for that kind of brand, not to mention a natural one for its audience (the transitions I talked about last week are at best unevenly distributed - there’s a lot of religious fervor left out there). It also had the benefit of good timing and great technical implementation, and good staff. All of those things go into making a new venture a success.

Don’t forget that YouTube worked not just because of the team and founders, but the timing. Flash hit maturity. Broadband penetration went mainstream. Video got inserted into everything that had a camera. Imagine being the last video sharing platform to die before that trio of confluences lined up, and watching YouTube skyrocket to the heavens from the ground. Depressing.

But if your brand is built on exclusivity and scarcity, it’s going to be much harder to execute the PLoS ONE model. You won’t have the faithful lined up, because they’re burned by years of being attacked by you (thus the perception of cynicism is probable). And you won’t have a team that understands abundance, that designs for it. And you’ll likely have a management structure that will be looking at it with skepticism, not with hope.
I am in Abu Dhabi for the World Economic Forum’s annual Summit on the Global Agenda. This is my second year as a member of the Global Agenda Council that’s looking at consumer goods (think: things that come in boxes that you buy in person) and how to create a culture of consuming those goods in a way that doesn’t destroy the planet and society along the way. It’s not a bunch of hippies. Indeed, as far from that as is possible. Big companies, big thinkers, big themes. And the irony of doing this here, where a liter of water takes four liters of diesel to produce, isn’t lost on anyone (if it were, having the meeting in the middle of a Formula One race track would pound it in).

Anyway. I had two ideas today. One is that if we can trade emissions at a corporate level, we should be able to trade consumption. So if we can track consumption of goods, and the sustainability of those goods, we have the rudiments of a market for consumption. So why not offer (wealthy, western, northern) people the chance to pay extra for an offset for their iPad like they do with their plane ticket?

My other idea was based on the ever present loyalty cards for grocery stores, pharmacies, and even cupcake shops in the US. You give away your personal data in return for lower prices (although I often use the algorithm of [local area code of store] + 867-5309). Why not something similar for sustainable goods? Either you pay the full price, or you pony up your data to save the world. Also you get a sticker to put on your computer to show how much better you are than other people - and that’s big, because being proud of being a sustainable consumer is currently, and unfortunately, closely tied to being one.

Doubt either of these works in the real world. Don’t know enough economics to judge by dead reckoning. But it’s fun to poke a stick in the hive and see what comes buzzing out.
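Here’s a toy sketch of what the first idea might look like as arithmetic: price a voluntary offset for a product the way airlines price offsets for a ticket. The footprint figure and offset price below are made-up placeholders, not real data.

```python
# Toy sketch of a "consumption offset": a voluntary surcharge sized to a
# product's embodied footprint. All numbers below are hypothetical.

OFFSET_PRICE_PER_KG_CO2E = 0.02  # assumed price in dollars per kg CO2-equivalent

def offset_surcharge(footprint_kg_co2e, fraction=1.0):
    """Surcharge needed to offset some fraction of a product's embodied footprint."""
    return footprint_kg_co2e * fraction * OFFSET_PRICE_PER_KG_CO2E

# Example: a tablet with an assumed embodied footprint of 130 kg CO2e.
print(f"${offset_surcharge(130):.2f} to offset all of it")
print(f"${offset_surcharge(130, 0.5):.2f} to offset half")
```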
Check out the new White House Requests for Information on Digital Data and Public Access. We need to rally hundreds of comments on these RFIs if we don’t want to see the progress of open approaches to data beaten back by loud, well funded publishers. If you profess to be an Open Access person and you choose not to file a comment, I’ll come throwing tomatoes at your next talk. More on this - much more - to come.
I got a little worn out keeping up with the explosion of open science email over the holidays, especially the traffic on the open science list run by the OKF. For the most part I deleted without reading under the belief that time with the family is more important than obsessive attention to email during the holiday season. But I have to respond to one point from Peter Murray-Rust, with whom I often agree - and often disagree. Peter writes in his blog, on the idea that Open Science means loss of confidentiality:
This is my response to the White House RFI on Public Access.

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>

John Wilbanks
January 10, 2012
Response to Request for Information: Public Access to Peer-Reviewed Scholarly Publications Resulting From Federally Funded Research

There are two kinds of markets for the access and analysis of peer-reviewed publications emerging from federally funded research. One is the “mental” market, or the size of the readership base. This current market for the results of scientific research is limited, artificially, to those researchers who sit inside wealthy institutions whose libraries can afford to subscribe to the majority of scientific journals. This excludes researchers at many state educational systems, community colleges, middle and high schools, state and local employees, the American taxpayer, and the American entrepreneur. By implementing a robust public access policy for federally funded research outputs, each of these groups will have access to the literature and, if the policy is crafted correctly, the right to begin creating new knowledge and experimenting with new businesses atop it.

This leads to the second kind of market – the economic one. At the moment, there is at best a sputtering startup culture built atop the scholarly literature, with a few text-mining companies here and there, mainly in the life sciences. A small number of publishing houses exploit their gatekeeper function to impose prices on elemental services like abstracting that in the consumer world would cause revolt, and the American venture capital industry invests instead in social media. The lack of robust public access to the literature – and the relentless focus on asserting and controlling copyright – means that economically it remains a content industry and not a knowledge industry. We will not see meaningful job creation in secondary markets as long as the primary secondary use of digital literature is informal file transfer via Twitter (using the #icanhaspdf hashtag).

The scientific enterprise would clearly be better served through some creative destruction. We have replicated the analog production and distribution system digitally, realizing few of the cost benefits, few of the speed benefits, and none of the innovation benefits of the transition. iTunes came out more than a decade ago. Netflix, more than 15 years ago. Content industries are disrupted by technology, and should respond with innovation, creating new jobs that are durable against outsourcing. Yet we have seen none of this in the scholarly publishing industry, which given its enviable almost-monopoly on the outputs, has little incentive in the absence of policy to make the admittedly difficult transition to a knowledge industry.

The intellectual property interests of the stakeholders must be aligned with the scientific goals of the government and taxpayers, which is easily done through the use of open copyright licenses such as those provided by Creative Commons. Open copyright licenses protect the rights of the author or legal copyright owner while providing for conditional access to the public – for example, copying and republishing may be allowed, even for commercial purposes, but attribution back to the author and original journal, including a link to a free copy of the paper, would be required and if not present the full power of copyright remedy could be brought to bear on the violator.
Open copyright licenses can also be phased in alongside an embargo in a way that both protects the economic interests of publishers and the long term public interest in access to research literature. For example, during an embargo period, no open license might be used, switching to a license like Creative Commons Attribution-Non-Commercial for a second intermediate embargo period, and then eventually decaying to a Creative Commons Attribution license that is fully compliant with community definitions of Open Access. One could easily imagine using real data about economic usage of the literature to set these times in a noncontroversial fashion, creating a truly open corpus of literature both in terms of technical access and legal rights, without an emotional argument unfounded in data or the reality of modern web-based copyright licensing.

The pros of a centralized approach to managing public access are fairly straightforward. First, a single point of access to the research, with stable and common identifiers, radically decreases the cognitive burden to find and download the research. Second, the centralized approach raises the odds of common standards being applied to link the research to data (as we see in the vastly popular PubMed links to both internal and external data sources). And third, the centralized approach relieves the publishers of the need to perform these infrastructural functions, which should lower economic demands on the industry. However, it is important that a centralized repository be accompanied by open copyright licenses, so that additional copies of the open corpus can be maintained in libraries and research institutions, providing additional security to the preservation of the scholarly record. This mixture of a centralized resource with open licensing and standard technologies mirrors that of the internet itself, which runs on a small set of centralized resources (the domain name system, for example).

Centralization of resources also radically lowers the burdens on the researchers and their host institutions. A single upload interface to learn, a single interface for libraries to manage, and the comfort of a persistent repository rather than the funding of local repositories at library after library, together reduce the burden of compliance not just on the publisher but on the other key stakeholders in the process. The cons of a centralized approach are also straightforward. It must be funded (and thus can be defunded in a crisis) and it takes a certain amount of control out of the hands of the publisher – but since the goal is to remove access controls, removal of control may in fact be a pro rather than a con.

To encourage interoperable search, discovery, and analysis capability (and the small business, venture-backed job creation that innovation in each of those spaces will bring) the federal government should make a commitment to clear standards in document format, metadata, structured vocabulary and taxonomy, and commit to using its procurement power to only pay for articles that carry the designated metadata. Standards building is a long and cumbersome process, and any standard that doesn’t have adoption may be worth less than the (digital) paper on which it is printed. Having a stable customer for metadata in the person of the government creates a defined and clear market for startup businesses to serve, and creates potential for top-line economic growth at more established publishers as well.
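A minimal sketch of the phased embargo described above, assuming placeholder boundaries of six and twelve months (the response itself suggests setting the real boundaries from usage data):

```python
# Sketch: map an article's age to the license stage described in the response.
# The 6- and 12-month boundaries are illustrative assumptions, not proposals.

def license_for_age(age_months, embargo_months=6, intermediate_months=12):
    """Return the license an article would carry at a given age."""
    if age_months < embargo_months:
        return "All rights reserved (initial embargo)"
    if age_months < intermediate_months:
        return "CC Attribution-Non-Commercial (intermediate embargo)"
    return "CC Attribution (fully open access)"

for age in (3, 9, 24):
    print(age, "months:", license_for_age(age))
```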
It is vital as well to ensure that the metadata associated with the research is itself public. While the copyright status of metadata has not been extensively tested in court, there is reason to believe (from cases involving medical procedure codes among others) that at least some metadata, especially vocabularies and ontologies, may carry copyright obligations. The federal government should authorize the use of open copyright licenses such as the Creative Commons licenses on metadata, and preferentially select vendors who use the most open of copyright licenses and tools.

While scholarly articles are the traditional focus, and should be the first order of business in a federal open access policy, book chapters and conference proceedings (and even perhaps more novel forms of communication, like blogs and wikis and social media) should be evaluated for inclusion in the policy. However, careful attention should be paid to the level of effort required to create the work, and different rules might be applied to works that require a bit less effort (a conference poster might be required to be open immediately, no embargo) compared to those that require significant effort (a book chapter might receive a longer embargo than an article).

About me: I am a Senior Fellow at the Ewing Marion Kauffman Foundation, and a Fellow at Lybba. I’ve worked at Harvard Law School, MIT’s Computer Science and Artificial Intelligence Laboratory, the World Wide Web Consortium, the US House of Representatives, and Creative Commons. I also started a bioinformatics company called Incellico, which is now part of Selventa. I sit on the Board of Directors for Sage Bionetworks, iCommons, and 1DegreeBio, as well as the Advisory Board for Boundless Learning and Genomera. I have been creating and funding jobs since 1999.
My response to the White House RFI on Digital Data.

>>>>>>>>>>>>>>>>>>>>>>>>>>

While the advent of data sharing plan submission requirements at the NIH and the NSF is a welcome development, encouraging the reuse of scientific data needs far more policy intervention. First, standards should be developed that can be used to grade data sharing plans, so that grant review panels can know both whether or not a specific data sharing plan is satisfactory and so that for any given call for submissions the reviewers have a sense of how important data sharing is versus the scientific goals of the project. Second, data sharing plans should be made public alongside the notices of awards and contact information for the principal investigators, so that both taxpayers and scientists know what promises were made and how to contact a scientist and ask for data under the plan approved. Third, tracking should be possible to begin to estimate compliance: annual grant review forms should contain fields where the researcher is obliged to place URLs to data shared under the plan (or if left blank, explain why), for example. It should also be easy to create a data request system in which those asking for data send a copy of their request to the grants database, which can then be cross-referenced against the review forms to provide at least a rough estimate of compliance. And fourth, scientists with a record of subpar execution against data sharing plans should be downgraded in their applications for new funding.

Taken together, these four elements create a structure that would significantly increase the incentive for scientists to provide public access to the digital data resulting from federally funded research. In tandem, the funding agencies might develop financial models for the preservation of these digital data in much the same way that models exist for estimating overhead and other baseline costs as a percentage of the grant. This could fund not only new library services and jobs in the research enterprise but also serve as a non-dilutive funding source for a new breed of data science startup companies focused on preservation, governance, querying, integration, and access to digital data.

However, we should be careful not to treat data as property by default. Intellectual property is a useful frame through which to view creative works and inventions in science, as well as to protect valuable “marks” and secrets. But in the United States at least, data is typically in the public domain already, and therefore the extension of intellectual property rights to it would represent a vast expansion of rights in a space where there is zero empirical evidence that it is needed. Typically data is treated more as a secret, which is at odds with the public nature of the idea of data access, and the obstacles to data sharing are less legal than they are professional and economic.

The ugly reality is that sharing data represents a net economic loss in the eyes of many researchers: it takes time and effort to make the data useful to third parties (through annotation and metadata) and that is time that could be spent exploiting the data to make new discoveries. On top of this, there is a twin incentive problem. Scientists see no benefit to sharing data and are not punished if they fail to share data, while there is a pervasive fear that other scientists will “scoop” them if their data are available before being fully explored.
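As a rough illustration of the cross-referencing idea in the third point above, here is a sketch that compares data requests logged against a grant with the data URLs reported on that grant’s annual review form. The grant identifiers, field names, and scoring rule are assumptions made for the example.

```python
# Sketch: estimate data-sharing compliance by cross-referencing data requests
# against the URLs reported on annual review forms. Grant IDs and field names
# are hypothetical.

def compliance_estimate(reported_urls_by_grant, requests_by_grant):
    """Of the grants that received at least one data request, return the
    fraction that reported at least one shared-data URL."""
    requested = [g for g, reqs in requests_by_grant.items() if reqs]
    if not requested:
        return None  # nothing to measure yet
    satisfied = sum(1 for g in requested if reported_urls_by_grant.get(g))
    return satisfied / len(requested)

reported = {"GRANT-001": ["https://example.org/dataset1"], "GRANT-002": []}
requests = {"GRANT-001": ["request from lab A"], "GRANT-002": ["request from lab B"]}
print(compliance_estimate(reported, requests))  # 0.5
```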
That twin incentive problem creates a collective action problem that can be overcome most easily by clear funder policy as enumerated above: data sharing plan mandates with transparency, accountability, tracking, and impact on future funding.

One policy action that would be very welcome would be an unambiguous signal that publicly funded science data is in the public domain worldwide, not just in the United States. This could be accomplished either through the use of a copyright waiver, such as the Creative Commons Zero tool, or through other means. But it is vital to make it unambiguous and clear when and where data are free to reuse, because applying conditions imported from creative works and inventions to a class of information that is fundamentally far less like “property” can have serious unintended consequences. Easily imaginable consequences include vast cascades of attribution requirements, so that a query to 40,000 data sets requires 40,000 attributions – every time – or worse, the poisoning of data for use in job creation by small companies who wish to build atop data as a platform or infrastructure.

The intellectual property status of data does differ across the scholarly disciplines, and with how far the data have been processed. Some sciences rely on inherently copyrightable “containers” for data, from field books to recordings to photographs. And raw data converted to beautiful information by visualizations will touch on copyright. Policy should be flexible enough to account for this, but start with a default bias that public domain data is the most reusable, while providing “opt-out” capacity for data and disciplines where the public domain is simply not the best solution.

There is an obvious problem with this set of policy recommendations. They rely on money to work. We do not yet know the true costs of storing digital data over the same time frames that we store the scholarly literature. As our capacity to generate data explodes, we must invest at the same time in our capacity to steward it. Research projects into large data information science should be a priority, with specific attention paid to when and where it is possible to compress data, move data to secure “cold storage”, jettison data (either because it is duplicative, or because it can be regenerated later), and more. We do not have the sociotechnical infrastructure required to answer questions of data stewardship with any authority, and we must create it on the fly at the same moment that the data creation burden is hitting exponential heights.

Solving these stewardship problems might be best achieved through a coalition of research institutions, the library community, publishers, and funders. Taken together these groups already heavily regulate the daily life of a federally funded scientist. It is a small extension to imagine leveraging that regulatory power to provide new services to the scientist – a university and its library might keep an archive of standard data sharing plans, and standard budget items to implement them, which together would take the guesswork out of filing and operating a data sharing plan. Even better would be a federal program to certify a small number of such plans for each discipline.

Missing from the set of stakeholders mentioned in the RFI is, notably, the business community, both the large scientific companies and the vast potential of startup firms.
In an ideal world, the stewardship conversation will bring in actors from those industries, from pharma to venture capital, as we are missing an entire professional class of data stewards and data engineers (not just data scientists) who could serve the needs of the research enterprise while creating stable jobs. Even better, because the data stewards must be close to the researchers to serve them, these jobs are less likely to move offshore. An investment in small business grants, job training (and retraining) vouchers, and the creation of community college pedagogy for data stewardship functions could go a long way towards stimulating the emergence of this professional class.

In order to stimulate the interaction among these stakeholders and the emergence of a new class of data stewardship jobs, agencies could take additional steps to stimulate use of data. Contests are one obvious route, where a prize is posted in return for solving a problem (or simply for coming up with innovative ideas and/or applications that run on government data). Another route is the expansion of SBIR grants to create a track focused specifically on data startups, which lowers the risk of company formation and job creation as well as creating non-dilutive funding sources for entrepreneurs.

A route that is vital, but less obvious, is investment in and commitment to the emergence of standards that enable interoperability of, and thus reuse of, digital data. Standards lie at the heart of the Internet and the World Wide Web, and together lower the cost of failure to such a low point that companies built on the web and the internet can begin in garages. Such is not the case in the sciences. And it will not spontaneously emerge, even if data flow onto the web. As long as those data are in a tower of babel of formats, incoherent names, and might move about every day, they will be a slippery surface on which to build value and create jobs. Federal policy could call for a standard method for providing names and descriptions both for digital data and for the entities represented in digital data, like the proposed standard of the Shared Names project at http://sharedname.org . Standards also make it far easier to provide credit back to scientists who make data available, as well as increasing the odds that a user gets enough value from data to decide to give credit back. Embracing a standard identifier system for data posters will make it easier to link back unambiguously to a researcher as well as to make it easier for grant review committees and universities to receive a full picture of a scientist’s impact, not just their publication list.

About me: I am a Senior Fellow at the Kauffman Foundation, the Group D Commons Leader at Sage Bionetworks, and a Research Fellow at Lybba. I’ve worked at Harvard Law School, MIT’s Computer Science and Artificial Intelligence Laboratory, the World Wide Web Consortium, the US House of Representatives, and Creative Commons. I also started a bioinformatics company called Incellico, which is now part of Selventa. I sit on the Board of Directors for Sage Bionetworks, iCommons, and 1DegreeBio, as well as the Advisory Board for Boundless Learning and Genomera. I have been creating and funding jobs since 1999.
I’ve been watching the growth of The Cost Of Knowledge with fascination since it launched last week. If you’re following the kerfuffle around the Research Works Act, and the uncanny similarities between Elsevier press releases and the phrasing of Congressional responses to input on the Act, let me explain a little bit.

Scientists are the labor on which the scientific publishing industry is built. They do the science, they do the writing of papers, and they decide where to submit their papers for publication. Then the publisher turns right back around and asks other scientists to do more work: read the submissions, review them for accuracy, review them for how important the science is, and decide if the paper is worth publishing or not. Then the publishers format the paper and sell it right back to the scientist via punishingly high subscription costs. With this labor system, traditional subscription-based publishers can (unsurprisingly) clear profit margins that would make Bill Gates jealous, upwards of 30%. They’ve increased prices, fought open access policies, and paid for false-front lobbying groups to maintain the status quo.

But there was a fundamental fault line in scientific publishing, one those of us in the open science world have always watched, waiting for the first earthquake to strike: the willingness of scientists to be the volunteer labor in the equation of publication. Seeing 2600 (and increasing as of 1 February 2012) scientists state they won’t review, and won’t publish, at Elsevier journals in response to the RWA, is that earthquake. It’s a gorgeous example of nonviolent resistance.

But it’s not enough. Scientists who won’t publish or review with Elsevier need to make a second commitment: to take the same amount of labor they used to give to the Dutch giant and give it to a true Open Access publisher. We cannot make the change we want through telling Elsevier “I Prefer Not To” - we must make a separate commitment to devote time and sweat to open journals. Remember the parable of Bartleby. His passive nature did create change, but eventually he wound up too passive to eat, and he died. There are limits to the power of saying you won’t do the wrong thing - there are few limits when you commit to doing the right thing as well.

And, because this was a way too thinky post, here’s a nice pop culture reference to Bartleby.