The Automation of the Publishing Industry

We are automating the business of publishing, and if you’re anyone other than a publisher, the results are largely positive. For readers, the cost of reading has dropped while selection has increased dramatically. Employment in newspaper, magazine and book publishing has definitely suffered, though the impact on the broader employment market for writers and editors has remained largely unchanged. What we have automated, in other words, is the publishing process – not writing or editing, which, so far at least, remain very much a human endeavor.

The automation of publishing results in a fusion of human and machine processes. Today that fusion shows up as search engines and social media networks that harness the attention of billions of people to perform many of the functions once handled by the publishing industry. Tomorrow, this fusion will revolutionize the gift of knowledge by embedding it increasingly in the logical processing power of a new generation of intelligent machines.

Automation, Not Just Digitalization

First, let’s get one thing straight: the publishing sector isn’t just digitizing – it’s automating. It’s just that digitalization is an integral aspect of automation in this particular industry. Digital information is information that is easily handled by computing machines. It can be routed, copied, filtered, augmented, and processed in countless ways limited only by the programming ingenuity of the underlying software.

Of course, software doesn’t always lead to automation; it can just as easily route digital information to humans in cases where the work is still too complex or too subtle for machines. Digital information acts as a kind of lingua franca for connecting machines and humans into increasingly integrated systems, where work moves easily between automated and manual processes. Once it’s digital, information can be easily routed to either a machine or a human, depending on which is able to add the most value for the least cost. Digital formats are therefore an essential stepping-stone to automation in the publishing industry. As the software behind the publishing systems improves, more of the work will be routed away from humans and toward machines.

In short: digitalization translates publishing into a language easily understood by automation.

The Great Unbundling of Publishing

Another way of saying all this is that digitalization acts as a powerful “unbundler” – separating processes that are easily automated from those that are not. Most of the disruption now happening in the publishing sector stems from businesses that know how to automate what can be automated and build powerful networks that outsource what cannot be automated to people who will do it for free.

One way to think about all this is to look at the set of processes traditionally carried out by publishers, most of which were typically done in-house. This is the old vertical integration model, pioneered in the automotive industry and applied to the packaging of information. The amount of vertical integration and bundling varied by publishing business, of course; magazine and newspaper publishers tended to retain stables of writers, for example, while book publishers did not. Nevertheless, there are general industry patterns worth understanding.

The great unbundling of publishing is essentially a story of two related technology transformations: automated publishing tools and automated distribution and editorial selection technologies. Later, we’ll briefly look at the future of automated writing.

Automated Publishing

When I was five years old, my dad and his older brother ran a newspaper in Salt Lake City. I remember playing in his office, the smell of ink in the air, and the sound of the presses cranking out papers in the back. It was messy and loud, and it took a lot of work and money just keeping it all going. By the time I was out of college and in my first job, I was able to do a fair amount of what my dad’s printing press had once done just using an early word processing application called XyWrite on an early IBM desktop computer. Right around that same time, Paul Brainerd and the crew at Aldus were designing PageMaker, the first commercially viable desktop publishing system, as a way to close the gap between computers and my dad’s printing press.

With the web, publishing output moved increasingly from paper to screens. First, we hand-coded HTML or used funky software packages like Microsoft FrontPage, but as Content Management Systems emerged in the late 1990s, web publishing software became quite sophisticated, allowing a whole new class of organizations to publish by themselves for the first time. Blogging software then emerged in 1999, streamlining the publishing process still further. Today, a solution like WordPress powers big sites such as CNN, TechCrunch and UPS, while also making this same power accessible to some 75 million other WordPress websites (including this one).

Social media is actually another step in the automation of publishing, where the ease with which we compose and publish content often obscures the fact that we are even publishing in the first place. Some people use that power simply to stay in touch with friends, while others use it as their own personal printing press. What we do every day on Facebook, Twitter, LinkedIn and Google+ is nothing short of revolutionary when we see it in the long arc of the history of publishing.

We’re now witnessing another round of automation: this time in book publishing. New book publishing and distribution processes are quickly taking off. The number of self-published books nearly quadrupled between 2009 and 2012, and though ebooks have increased from 11% to 40% of that total, print still accounts for the majority of all self-published books:

Automated Distribution

Distribution of content used to be expensive. Now it’s not. Once in a digital format, information can be seamlessly and automatically transferred from one place to another at virtually no cost: no need for delivery bikes, bookshops, newsstands or direct mailings. The Internet takes care of all of that.

Once we connected our automated publishing tools to the Internet, the challenge shifted to making things discoverable in the resulting flood of information. Now we needed filters with the kind of automation and scale to match the output of our automated publishing tools.

The first step in this direction was the search engine. Google’s breakthrough was mapping websites against various topics and then ranking them based on the number and importance of links pointing to them from other sites. Remember – one of the things automation is good at is unbundling processes that are easily automated from those that are not. Assessing content quality is difficult for software (for now), so Google built a massive-scale platform for harnessing the collective wisdom of millions of people, and used people’s decisions about what was worth linking to as a proxy for quality content.
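To make the idea of ranking by "the number and importance of links" a little more concrete, here is a minimal Python sketch of link-based scoring in the spirit of PageRank. It is a textbook simplification, not Google's actual system: the four-page link graph, the function name rank_pages, the 0.85 damping factor and the iteration count are all assumptions chosen for illustration.

```python
def rank_pages(links, damping=0.85, iterations=50):
    """Score pages by incoming links (simplified PageRank-style iteration).

    links maps each page to the list of pages it links to.
    All names and parameter values here are illustrative assumptions.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start with every page equally important.
    scores = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page keeps a small baseline score.
        new_scores = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its importance on to the pages it links to.
                share = damping * scores[page] / len(targets)
                for target in targets:
                    new_scores[target] += share
            else:
                # A page with no outgoing links spreads its score evenly.
                share = damping * scores[page] / n
                for target in pages:
                    new_scores[target] += share
        scores = new_scores
    return scores


# Hypothetical link graph: each human decision to link is a "vote".
example_links = {
    "blog": ["wiki"],
    "news": ["wiki"],
    "forum": ["wiki", "news"],
    "wiki": ["news"],
}

scores = rank_pages(example_links)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

In this toy graph "wiki" ends up on top because it collects the most links, and "news" follows because it is linked from the highly ranked "wiki". The judgment about quality still comes from the people who chose what to link to; the software only tallies those choices at scale.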

Once we started sharing and liking things on social networks, audiences became distributors, and the filtering algorithms of the social networks joined Google’s search algorithms as a kind of collective attention-focusing system. Google, Facebook and the other social networks meet our information filtering needs by automating what they can and farming out what they can’t to millions of volunteers who aren’t paid a dime. These powerful, automated platforms are designed to engage end users with as little employee intervention as possible. Like all automation, they lower labor costs and improve quality, but in this case, their most important contribution is the massive scale of engagement they enable.

The Impacts of Automated Publishing

Automation shapes each industry it touches with subtle differences: wear patterns of efficiency, carved by currents of technological change.

For the producers of published works, the story of automation is fairly complicated. Automation has wreaked havoc on certain segments of the publishing business, but it has also created opportunities for new publishers and countless new writers. In short, automation has brought pain and suffering to publishers, while also generating an outpouring of new forms of creative expression. Joseph Schumpeter described this phenomenon as “creative destruction,” the incessant cycles of death and birth, driven by technological innovation, in a market economy.

The impact of automation on publishing customers, which is to say, on readers, has been largely positive. This should come as no surprise, as automation is frequently much kinder to consumers than it is to producers. Automation generally improves quality and expands our choices, even offering us a kind of personalization unseen since the early days of craftsmen. One of the biggest impacts is automation’s ability to cut costs and lower prices.

The Impact on Readers

It does appear that people now spend less money reading published materials than they did on average twenty years earlier:

Over the twenty years between 1990 and 2009, average U.S. consumer expenditures on reading dropped 28%. By way of contrast, spending on entertainment increased 89% over this same period. The U.S. Bureau of Labor Statistics tracks various consumer-spending categories, and aside from insurance, reading was the only spending category to have actually declined over this twenty-year period.¹

This kind of drop in consumer spending isn’t always the result of price declines though; it could just as easily have resulted from people reading less. Men’s average reading time did drop a bit between 2003 and 2012, as younger men in particular spent more time using computers for leisure. Because the Bureau of Labor Statistics doesn’t break out time expenditure categories in detail, it is not clear exactly how much of what it defines as “using a computer or the Internet for personal interest” actually includes online reading. Presumably, gaming accounts for most of the time in this “computers for leisure” bucket, but if just fifteen percent of this total includes time reading on websites or social media, men’s reading time would remain unchanged.

In the case of women, reading time has held largely steady, even as time spent using a computer for leisure activities increased.

Even if none of men’s computer leisure time includes online reading, the total drop in reading time for men and women between 2003 and 2009 is less than five percent, compared to a 19% drop in reading expenditures for roughly the same period. In short, there may be some drop in our total reading time, but not enough to explain the much bigger drop in reading expenditures.

So there does appear to be some price drop at work here, but the question is how much of it is related to automation’s impact? Here, it’s very hard to compare aggregate prices of print versus online magazines and newspapers. One-to-one pricing analogies start to break down. In the old publishing model, readers paid a base-level subscription fee or a higher newsstand rate to get access to published material. This price was heavily subsidized by advertising, of course. Today, since most of this material is freely available online, and easily discoverable through search and social media, readers have become used to not paying anything for most online content.

Book publishing is becoming increasingly digital and automated, and it doesn’t have advertising revenues to distort the pricing picture. Fully half of adults in the U.S. now own digital readers, and digital books now account for 27% of all book revenues. Unable to find meaningful price comparisons between digital and print books, I decided to pull my own sample data straight from Amazon. To ensure a representative showing for paperbacks, I used the list of the top 100 best-selling books on Amazon from 2012, rather than the best-sellers from this year where many still do not yet have paperback editions.

As the chart above shows, digital prices (shown as “Kindle”) are, on average, lower than those of either hardcover or paperback print editions. The average price for Kindle editions was $8.56, versus $18.45 for hardcover and $11.96 for paperback. The average ratio of digital-to-hardcover pricing was 49%, and the digital-to-paperback pricing was 84%. Relying on best-selling books may introduce some sampling bias (though it’s unclear which direction it might skew), but it does suggest that, despite recent controversies over digital book pricing, readers are paying less today for digital editions.
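A note on reading these figures: dividing the average Kindle price by the average print price does not give the same number as averaging the digital-to-print ratio title by title, which is presumably how the 49% and 84% figures were computed. The short Python sketch below shows the difference. The three averages are the ones quoted above; the per-title prices are made up purely for illustration and are not the Amazon sample.

```python
from statistics import mean

# Average prices quoted in the text.
kindle_avg, hardcover_avg, paperback_avg = 8.56, 18.45, 11.96

# Ratio of the averages.
print(round(kindle_avg / hardcover_avg, 2))  # about 0.46
print(round(kindle_avg / paperback_avg, 2))  # about 0.72

# Hypothetical (kindle, paperback) prices for three titles.
# These numbers are invented to show the effect; they are not real data.
sample = [(4.99, 9.99), (12.99, 14.99), (9.99, 16.00)]

# Average of the per-title ratios: every title counts equally.
mean_of_ratios = mean(kindle / paper for kindle, paper in sample)

# Ratio of the average prices: expensive titles weigh more heavily.
ratio_of_means = mean(k for k, _ in sample) / mean(p for _, p in sample)

print(round(mean_of_ratios, 3), round(ratio_of_means, 3))
```

The two measures answer slightly different questions. The per-title average describes the discount a reader sees on a typical book, while the ratio of averages describes total spending across the whole list. Titles that lack one of the editions would widen the gap further.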

The Impact on Publishers

Automation isn’t the only force affecting the publishing sector. Financial markets, changing customer preferences and numerous other factors are clearly at play. But automation acts as a behind-the-scenes magnetic force, disrupting the economics of publishing by unbundling processes, and revenue streams, once held exclusively by publishers.

The most dramatic case of publisher unbundling is the newspaper industry. This was an industry that was extremely lucrative at one point because many papers held what were essentially local monopolies. In 1920, 42.6 percent of U.S. cities had two or more newspapers competing with each other. By 2000, only 1.4 percent did, mostly because afternoon newspapers had disappeared. In the 1980s, Wall Street-owned newspaper chains started gobbling up local independent papers, using “harvesting strategies” to extract maximum profits from each paper.

Not long afterwards, in the mid-nineties, many of the most lucrative advertising buckets once dominated by local papers started slipping into the hands of more focused and technologically savvy web-based, e-commerce businesses. I actually started one of those businesses at Microsoft, a car-buying service called CarPoint that served some seven million users a month. I remember meeting with some of the big newspaper chains at the time and being shocked at just how blind they were to what was happening to cornerstones of their advertising models; not just automotive, but real estate, job listings and wanted advertisements.

When you compare newspapers with books and periodicals, it’s clear that the papers have been hardest hit. Between 1997 and 2012, the total number of newspaper businesses dropped 13%:

The problems are even more obvious when you look at revenues…

…and employment:

My interpretation of these numbers is that the impacts of publishing automation have fallen most heavily on newspapers and magazines (periodicals). Without advertising to support it, book publishing never really achieved the same scale of revenues as these other two publishing sectors. When advertising moved online, magazines and papers had the most to lose. As form factors, magazines and newspapers also lent themselves to easier methods of automated publishing, which in turn stimulated the supply of new online writers and website owners and increased their overall competition.

The Impact on Writers and Editors

The nuance here is that the number of people working for publishers in the newspaper, magazine and book sectors has declined significantly over the last fifteen years, but the number of overall editor and writer jobs has not. Below is the number of writers and editors working for companies, and as you can see, it doesn’t exhibit any of the significant drop-off of the graphs above.

These are writers and editors working for companies, so clearly some of the job losses we’re seeing amongst book, magazine and newspaper publishers are being compensated for by job growth in other firms.

Not all writers and editors work for companies though. While 95% of technical writers (people who write equipment manuals, operating instructions, etc.), and 90% of editors do work for companies, authors and writers are a much more independent bunch. In 2012, there were an estimated 83,000 self-employed writers and authors, compared to 42,000 wage and salary-based positions. So roughly two-thirds of writers and authors are self-employed.

Did the recent turmoil in the publishing business cause publishing firms to simply outsource writing to independent contractors? One way to tell is to look at the ratio of self-employed to total employment, and when you do, you see that it’s held quite steady over the last dozen years:

When you add those independent writers and editors into the total employment picture, it remains much the same. Though the Bureau of Labor Statistics’ data on self-employed writers is sparser, you can see that overall employment for writers and editors remains largely unchanged over the last twelve years:

Automation has killed jobs in the publishing sector, streamlining and eliminating what were once human tasks, transforming them into virtual processes more efficiently handled by machines. So far though, writing and editorial professions have proven surprisingly resilient, shaking off layoffs by publishers, and shifting their skills to new opportunities.

Of course, one could argue that employment might be strong but that writers have simply agreed to lower incomes in the face of the increased supply of writers. But this isn’t exactly the case either. When you look at pay increases amongst book, magazine and newspaper publishers, it’s clear that after largely keeping up with the cost of living between 1997 and 2007, newspaper wages were essentially frozen between 2007 and 2012, with magazine wages not keeping up with the cost of living, while wages in book publishing grew some 32% over that five-year period:

The problem with the above figures is that they include all publishing jobs, and not just the incomes of writers and editors. So while they illustrate another dimension of the woes plaguing the publishing sector, they don’t specifically say anything about the impact on writers and editors.

When you look at Bureau of Labor Statistics’ data on hourly wages for writing and editorial jobs, what you see is that they have more than kept up with the cost of living over the last sixteen years:

As new tools have simplified the publishing process and invited a whole new class of writers to the fray, demand for content in the Information Age has proven powerful enough that, despite many concerns, the demand for professional writers remains strong. The skills of writing and editing have thus far resisted succumbing to automation. But for how long?

Automating Writing?

As with many forms of automation, the first places to look for successful examples of writing automation are within fields where information and processes are already highly structured. That’s precisely the niche that Narrative Science targets with its “automated narrative generation platform” called Quill. Quill translates data into stories; it takes company performance metrics and sports statistics and automatically translates them into written reports. Narrative Science relies on “meta writers” to customize the platform for new topic areas, such as generating summaries of baseball games or restaurant reviews – even using the system to mimic the writing style of specific sports writers.

The first applications of automated writing are already tackling jobs that are simply too menial for humans. Right now Narrative Science generates recaps of thousands of Little League baseball games that would never receive the attention of a human reporter, but which now have nice little stories churned out to parents within minutes of the games’ finishes. These are data-driven stories for the long tail, which is to say, writing where demand is so fragmented that automation is the only way to cost-effectively serve it.
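To give a rough feel for how structured data can become a story, here is a tiny Python sketch of template-based recap generation. It is only an illustration of the general idea and says nothing about how Quill actually works: the function name recap, the field names, the verb thresholds and the sample game are all hypothetical.

```python
def recap(game):
    """Turn a box score into a one- or two-sentence recap.

    The dictionary fields used here are hypothetical, chosen for illustration.
    """
    home, away = game["home"], game["away"]
    home_runs, away_runs = game["home_runs"], game["away_runs"]

    if home_runs == away_runs:
        return f"{home} and {away} played to a {home_runs}-{away_runs} tie."

    winner, loser = (home, away) if home_runs > away_runs else (away, home)
    high, low = max(home_runs, away_runs), min(home_runs, away_runs)

    # Pick a verb from the margin of victory so the prose varies with the data.
    margin = high - low
    if margin <= 2:
        verb = "edged"
    elif margin <= 5:
        verb = "beat"
    else:
        verb = "routed"

    story = f"{winner} {verb} {loser} {high}-{low}."

    # Add a detail sentence only when the data supports it.
    star = game.get("top_hitter")
    if star:
        story += f" {star['name']} led the way with {star['hits']} hits."
    return story


# Invented sample game.
sample_game = {
    "home": "Tigers",
    "away": "Cubs",
    "home_runs": 5,
    "away_runs": 4,
    "top_hitter": {"name": "Sam Lee", "hits": 3},
}

print(recap(sample_game))
```

Even a toy like this shows why highly structured domains come first: once the score, the teams and the standout player are fields in a record, choosing words becomes a matter of rules. The hard part, and presumably where the “meta writers” come in, is deciding which rules make a story worth reading.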

That’s just the beginning though.

Transforming Human Knowledge

Attention Filters

Automating publishing has dramatically increased the supply of information. Barring Matrix-style downloading of knowledge into our minds, we run across a bottleneck in our information-processing throughput that turns our attention into a precious commodity.

Our current solution to this problem is filtering, and here our algorithms have replaced manual editorial selection processes as a more scalable means of keeping up with the overwhelming sea of information made possible through automation in the first place. These information filters aren’t completely technological and neither are they completely human. They’re a hybrid, a fusion, uniquely suited to an Information Age.

Publishing to Virtual Personal Assistants

What comes next extends this fusion of technology and humanity in ways that are unlike anything we’ve seen to date. This new fusion will grow out of what we today call the publishing industry, and result in a reinvention of the way we experience human knowledge.

Today’s information filters form a kind of feedback loop between people and algorithms. What happens next takes this fusion to a whole new level. As I have explained elsewhere on a few occasions, all this knowledge that we collectively publish onto the Internet serves as a seedbed from which artificial general intelligence seems most likely to arise. Google is experimenting with methods of teaching its algorithms how to extract facts from millions of websites, then assess the accuracy of those facts by comparing them to other websites and to its Knowledge Graph. It’s not just building a knowledge base, but the systems to continually feed that knowledge base with the latest outputs of our automated information publishing.

The result is likely to be some kind of artificial intelligence in the form of a virtual personal assistant, building from what Google Now does today. We will simply ask our questions using natural language and a computer will answer us – just like the computer on Star Trek.

In other words, computer code, in the form of something like a virtual personal assistant, is about to form a new publishing medium. It will wrap our knowledge in code, code that we will converse with in order to answer our questions. This is the next generation of publishing, made possible by the massive automation of publishing that has already preceded it.

It’s hard to convey just how big a change something like this will be for humanity. Think back to the wonder you may have felt after first using Google Search. Millions of valuable, but largely inaccessible, sources of information suddenly materialized out of thin air. In this next phase, we will simply ask what we want to know and receive a highly customized response – just as though we’d asked a knowledgeable expert. This new layer of logic will enable machines to understand the meaning of our information. It will map our questions against massive pools of information, assessing various answers and even providing us with levels of confidence for each answer’s accuracy.

Closing Thoughts

We live in a time when automation is creeping from one industry to another. Each time it does, its signature is a little different. In publishing, the impact is complex. If you come from the publishing world, the effects have been quite painful. For most of us though, automation has largely been a good thing. Our access to written material has exploded. We now have far more information and knowledge at our fingertips than at any time in history.

I started this exploration assuming I would find much more widespread evidence of the destructive aspects of Schumpeter’s creative destruction. What I found surprised me. The fact that writers and editors have experienced no widespread downturn in employment suggests that the footprints before us in the snow are those of a different creature than what I had assumed. I had envisioned a voracious beast, devouring everything in its path. But instead, what I’ve seen is something much more selective in its appetite for our work. It is a creature capable of bringing about much creativity in our company, but for those of us who once specialized in the manual processes of selecting and filtering as well as preparing and distributing written material, this creature has brought economic destruction.

Pierre Teilhard de Chardin referred to what’s now emerging as the noosphere. Andy Clark and David Chalmers describe it as an extended mind. Whatever you call it, we are entering a new phase in human intelligence, where more and more of our cognitive capacity is embedded in machines – a great big, collective brain in the cloud. Wikipedia and other sites are its vast stores of knowledge, Google Search our primary method for information recall. The streams of information flowing through news outlets and social media services could similarly be said to constitute a kind of collective stream of consciousness; the filtering algorithms of Facebook and other social networks our methods for focusing attention within that stream.

Once we are able to layer these pools and streams of information with the kind of machine understanding outlined above, we will unleash a huge boon to humanity. Where there is light, there is also darkness, however, and so we should expect these same systems to also bring us many new problems. The one I will close with relates to how much of our humanity remains in the future of our automated publishing and knowledge systems.

Right now, the vast majority of the knowledge held in our search engines and social media streams originated from human minds. What I wonder about are all those Little League baseball stories being automatically generated by Narrative Science software. That kind of software will get better and better with time, and through the proliferation of digital sensors enabled by the Internet of Things, that software will have access to massive pools of raw data from which it will generate many new reports and other publications without any help from humans.

The question that therefore arises is what happens when the publisher and the writer are no longer human. Surely, there will continue to be fields where humans retain their role not merely as consumers but as generators of knowledge. Just how prevalent that will be is hard to say right now, and the answer will say a great deal about the future importance of humanity in the continued discovery of new knowledge.



¹ The U.S. Bureau of Labor Statistics reading category includes “subscriptions for newspapers and magazines; books through book clubs; e-books and digital reading material; and the purchase of single-copy newspapers, magazines, newsletters, books, and encyclopedias and other reference books.”

