Sunday, January 27, 2013

My mobile systems research wish list

Working on mobile systems at Google gives me some insight into what the hard open problems are in this space. Sometimes I am asked by academic researchers what I think these problems are and what they should be working on. I've got a growing list of projects I'd really like to see the academic community try to tackle. This is not to say that Google isn't working on some of these things, but academics have fewer constraints and might be able to come up with some radically new ideas.

Disclaimer: Everything in this post is my personal opinion and does not represent the view of my employer, or anyone else. In particular, sending a grant proposal to Google on any of the following topics will by no means guarantee it will be funded!

First, a few words on what I think academics shouldn't be working on. I help review proposals for Google's Faculty Research Awards program, and (in my opinion) we get too many proposals for things that Google can do (or is already doing) -- such as energy measurements on mobile phones, tweaks to Android or the Dalvik VM to improve performance or energy efficiency, or building a new mobile app to support some specific domain science goal (such as a medical or environmental study). These aren't very good research proposal topics -- they aren't far-reaching enough, and aren't going to yield a dramatic change five to ten years down the line.

I also see too many academics doing goofy things that make no sense. A common example these days is dusting off the whole peer-to-peer networking area from the late 1990s and trying to apply it in some way to smartphones. Most of these papers start off with the flawed premise that using P2P would help reduce congestion in the cellular network. A similar flawed argument is made for some of the "cloud offload" proposals that I have seen recently. What this fails to take into account is where cellular bandwidth is going: About half is video streaming, and the other half is things like Web browsing and photo sharing. None of the proposed applications for smartphone P2P and cloud offload are going to make a dent in this traffic.

So I think it would help academics to understand what the real -- rather than imagined -- problems are in mobile systems. Some of the things on my own wish list are below.

Understanding the interaction between mobile apps and the cellular network. It's well known that cellular networks weren't designed for things like TCP/IP, Web browsing, and YouTube video streaming. And of course most mobile apps have no understanding of how cellular networks operate. I feel that there is a lot of low-hanging fruit in the space of understanding these interactions and tuning protocols and apps to perform better on cellular networks. Ever noticed how a video playback might stall a few seconds in when streaming over 3G? Or that occasionally surfing to a new web page might take a maddening few extra seconds for no apparent reason? Well, there's a lot of complexity there and the dynamics are not well understood.

3G and 4G networks have very different properties from wired networks, or even WiFi, in terms of latency, the impact of packet loss, energy consumption, and overheads for transitioning between different radio states. Transport-layer loss is actually rare in cellular networks, since there are many layers of redundancy and HARQ that attempt to mask loss in lower layers of the network stack. This of course throws TCP's congestion control algorithms for a loop, since they typically rely on packet loss to signal congestion. Likewise, the channel bandwidth can vary dramatically over short time windows. (By the way, any study that tries to understand this using simple benchmarks such as bulk downloads is going to get it wrong -- bulk downloads don't look anything like real-world mobile traffic, even video streaming, which is paced above the TCP level.)
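To make the loss-masking point concrete, here is a toy, round-based sketch of a loss-driven AIMD sender over a single bottleneck. It is not a model of any real TCP stack or cellular network. The function name, the parameters, and all of the numbers are assumptions chosen for illustration, and the "cellular-like" case simply assumes that masked loss shows up as unbounded queueing instead of drops.

```python
def simulate_aimd(rounds, capacity_pkts_per_rtt, base_rtt_ms, buffer_pkts):
    """Toy round-based model of a loss-driven AIMD sender (illustrative only).

    Returns a list of (cwnd, rtt_ms, lost) tuples, one per round.
    """
    cwnd = 1.0
    history = []
    for _ in range(rounds):
        # Packets beyond what the link drains in one base RTT sit in a queue.
        queued = max(0.0, cwnd - capacity_pkts_per_rtt)
        # The sender only sees a loss when the queue overflows the buffer.
        lost = queued > buffer_pkts
        if lost:
            queued = buffer_pkts
        # Queueing delay is the time needed to drain the queue at link rate.
        rtt_ms = base_rtt_ms * (1.0 + queued / capacity_pkts_per_rtt)
        history.append((cwnd, rtt_ms, lost))
        # AIMD: halve the window on loss, otherwise grow by one packet.
        cwnd = max(1.0, cwnd / 2.0) if lost else cwnd + 1.0
    return history


# Hypothetical numbers: same link rate and base RTT, different buffering.
wired_like = simulate_aimd(200, capacity_pkts_per_rtt=20, base_rtt_ms=50,
                           buffer_pkts=10)
# Assumption: lower layers hide every loss, so the sender never sees a drop.
cellular_like = simulate_aimd(200, capacity_pkts_per_rtt=20, base_rtt_ms=50,
                              buffer_pkts=float("inf"))

print("wired-like max RTT (ms):", max(rtt for _, rtt, _ in wired_like))
print("cellular-like max RTT (ms):", max(rtt for _, rtt, _ in cellular_like))
```

In the second run the sender never gets a loss signal, so its window keeps growing and the extra packets turn into delay. That is the flavor of problem a loss-based algorithm runs into when the lower layers hide loss.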

The lifetime of a cellular network connection is also fairly complex. Negotiating a dedicated cellular channel can take several seconds, and there are many variables that affect how the cell network decides which state the device should be in (and yes, it's usually up to the network). These parameters are often chosen to balance battery lifetime on the device; signaling overhead in the cell network; user-perceived delays; and overall network capacity. You can't expect to fix this just by hacking the device firmware.

To make things even more hairy, mobile carriers often use different network tuning parameters in different markets, based on what kind of equipment they have deployed and how much (and what kinds) of traffic they see there. So there is no one-size-fits-all solution; you can't just solve the problem for one network on one carrier and assume you're done.
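As a rough illustration of why these parameters matter, here is a minimal sketch of a two-state radio model (idle versus dedicated channel). Real networks have more states and the network, not the device, drives the transitions. The class name, field names, timer values, and request trace below are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RadioProfile:
    """Hypothetical per-network tuning; real values are carrier-specific."""
    promotion_delay_s: float     # time to set up a dedicated channel from idle
    inactivity_timeout_s: float  # quiet time before the channel is released


def added_latency(request_times_s, profile):
    """Return the channel-setup delay (seconds) that each request pays."""
    delays = []
    last_activity = None  # time the radio last carried traffic
    for t in request_times_s:
        idle = (last_activity is None
                or t - last_activity > profile.inactivity_timeout_s)
        # A request that finds the radio idle waits for channel setup.
        delay = profile.promotion_delay_s if idle else 0.0
        delays.append(delay)
        last_activity = t + delay
    return delays


# Made-up trace of request start times (seconds) and two made-up networks.
trace = [0.0, 3.0, 20.0, 22.0, 60.0]
short_timer = RadioProfile(promotion_delay_s=2.0, inactivity_timeout_s=5.0)
long_timer = RadioProfile(promotion_delay_s=2.0, inactivity_timeout_s=20.0)

print(added_latency(trace, short_timer))
print(added_latency(trace, long_timer))
```

The same app traffic pays the setup delay more often on the network with the shorter timer, while the longer timer keeps the radio in its high-power state for longer. An app tuned against one set of timers can behave quite differently on another.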

Understanding the impact of mobile handoffs on application performance. This is an extension to the above, but I haven't seen much academic work in this space. Handoffs are a complex beast in cellular networks and nobody really understands what their impact is on what a user experiences, at least for TCP/IP-based apps. (Handoff mechanisms are often more concerned with not dropping voice calls.) Also, with the increased availability of both WiFi and cellular networks, there's a lot to be done to tune when and how handoffs across network types occur. I hate it when I'm trying to get driving directions when leaving my house, only to find that my phone is trying in vain to hang onto a weak WiFi connection that is about to go away. Lots of interesting problems there.
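For the WiFi case, one naive heuristic is to give up on WiFi when the signal is both weak and fading, rather than waiting for the connection to die. The sketch below is only that, a heuristic with made-up thresholds. The function name and default values are assumptions, and it is not how any particular phone decides.

```python
def should_leave_wifi(rssi_dbm_samples, weak_dbm=-75.0,
                      min_drop_db_per_sample=1.0):
    """Return True when WiFi looks weak and is getting weaker.

    rssi_dbm_samples: recent signal readings in dBm, oldest first.
    The thresholds are hypothetical defaults, not measured values.
    """
    if len(rssi_dbm_samples) < 2:
        return False  # not enough history to estimate a trend
    latest = rssi_dbm_samples[-1]
    # Average change per sample across the window; negative means fading.
    slope = (latest - rssi_dbm_samples[0]) / (len(rssi_dbm_samples) - 1)
    return latest <= weak_dbm and slope <= -min_drop_db_per_sample


# Walking out of the house: weak and dropping fast.
print(should_leave_wifi([-60.0, -66.0, -72.0, -78.0]))
# Sitting at the edge of coverage: weak but stable.
print(should_leave_wifi([-78.0, -77.0, -78.0, -78.0]))
```

Requiring a downward trend as well as a weak reading avoids flapping for a user who is stationary at the edge of coverage. The interesting research questions are in what such a rule misses, for example what the cellular side looks like at that moment.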

Why doesn't my phone last all day? This is a hot topic right now but I think the research community's approach tends to be to change the mobile app SDK, which feels like a non-starter to me. Unfortunately, the genie is out of the bottle with respect to mobile app development, in the sense that any proposal that suggests we should just get all of the apps to use a new API for saving energy is probably not going to fly. In the battle between providing more power and flexibility to app developers versus constraining the API to make apps more efficient, the developer wins every time. A lot of the problems with apps causing battery drainage are simply bugs -- but app developers are going to continue to have plenty of rope to hang themselves (or their users) with. There needs to be a more fundamental approach to solving the energy management issue in mobile. This can be solved at many layers -- the OS, the virtual machine, the compiler -- and understanding how apps interact with the network would go a long way towards fixing things.

Where is my data and who has access to it? Let's be frank: Many apps turn smartphones into tracking devices, by collecting lots of data on their users: location, network activity, and so forth. Some mobile researchers even (unethically) collect this data for their own research studies. Once this data is "in the cloud", who knows where it's going and who has access to it. Buggy and malicious apps can easily leak sensitive data, and currently there's no good way to keep tabs on what information is being collected, by whom, for what purpose. There's been some great research on this (including the unfortunately-named TaintDroid) but I think there's lots more to be done here -- although we are sadly in an arms race with developers who are always finding new and better ways to track users.

What should a mobile web platform look like 10 years from now? I think that the research community fails to appreciate the degree of complexity and innovation that goes into building a really good, fast web browser. Unfortunately, the overlap between the research and web dev communities is pretty small, and most computer scientists think that JavaScript is a joke. But make no mistake: The browser is basically an operating system in its own right, and is rapidly getting features that will make it possible to do everything that native apps can do (and more). On the other hand, I find the web development community to be pretty short-sighted, and unlikely to come up with really compelling new architectures for the web itself. Hell, the biggest breakthroughs in the web community right now are a sane layout model for CSS and using sockets from JavaScript. In the mobile space, we are stuck in the stone ages in terms of exploiting the web's potential. So I think there is a lot the research community can offer here.

In ten years, the number of mobile web users will outstrip desktop web users by an order of magnitude. So the web is going to be primarily a mobile platform, which suggests a bunch of new trends: ubiquitous geolocation; users carrying (and interacting with) several devices at a time; voice input replacing typing; using the camera and sensors as first-class input methods; enough compute power in your pocket to do things like real-time speech translation and machine learning to predict what you will do next. I think we take a too-narrow view of what "the web" is, and we still talk about silly things like "pages" and "links" when in reality the web is a full application development platform with some amazing features. We should be thinking now about how it will evolve over the next decade.

5 comments:

  1. Very good post.

    Regarding TCP over wireless, I think there is a lot of research in this community already, and folks are aware that packet loss can mean much more than congestion. However, I think they may ignore RTT variation and its bigger impact on TCP performance, since slow start is much worse than fast recovery.

    The fundamental question is whether TCP can cope with so many new scenarios. And will TCP need more information than packet loss and RTT?

    As you mentioned, although both are "wireless", WiFi and cellular are very different, and a cell phone may frequently switch between them. Should TCP know about the underlying issues in the PHY/MAC layer? This breaks the layered design but seems to be the only way to handle all those differences. Do we expect a new TCP scheme with an API open to different lower layers and application scenarios? Will this overwhelm TCP?

    Probably the realistic solution will be more private "TCP-like" solutions for specific applications or media. But then what about fairness with TCP?

    I agree this can definitely be a research topic.

  2. But I think the web APIs are stuck in the past even more than mobile apps. There is almost nothing (actually nothing) to get mobile optimized apps with HTML5/JS - battery management, running only when needed.

    What Chrome does is terrible (saying this as a user, I did not read its code). It just stops a background tab. Firefox keeps it running, but it is somewhat random when the OS will destroy the tab. The web app has no way to even know about this.

  3. There has been a lot of research on TCP over wireless since the 1990s. Perhaps a bit more in the wireless (IEEE) community than within ACM, but regardless, the state of the art is quite good. Similarly, the handover issue has been studied quite a lot, and when 3G systems were introduced there were even measurement studies to understand the effects. This is not to say that there is no place for further studies, but it is not virgin territory. One problem in the way forward is that experimental work in this domain is very difficult (and rarely valued in the peer-review process).

  4. Generally I agree with your thought process, but I think the focus should shift INTO the device space more than the network space AND up from link/network performance to application/system performance. There's been tons of work done by the likes of Ericsson, Motorola, ALU, Huawei, and Samsung and their associated University partners on network/link performance - you can see it in the conferences and even in the standards contributions.

    For example relating to TCP over wireless, check out papers from IEEE VTC from the past 15 years. VTC is a huge conference, so there may be hundreds of papers in this space. The research started as early as 2G (mostly related to GPRS/TCP interactions) and migrated to more advanced scheduled systems (including some fairly advanced scheduling approaches in LTE that involve managing/adapting to the time-varying nature of the radio link).

    In the handover realm, modern systems (HSPA and beyond, for example) tend to forward non-acknowledged or non-transmitted data (RLC can be acknowledged or unacknowledged) to the target cell on handover, but that's done as a mitigation strategy for both TCP performance and backhaul throughput.

    One thing that's hard for the network builders (the above-mentioned source) to understand is the impact of this "chunky variance" on application performance. Similarly, it's difficult for those same people to really understand what's going on in the device - it's largely treated as a dumb sink/source. For example, if you look at a lot of the modeling in standards contributions you'll see the most unrealistic traffic model you could imagine - an infinite amount of data ready to go or capable of being consumed, the so-called "full buffer" model. It's a nice model for working out math and understanding peak/maximal values for network capacity, etc., but it has little to do with the real world. Better analytic models for real traffic flows (as you note - ~50% video the rest photo sharing/web browsing) will help to engineer better systems. Until then, systems will continue to be designed with the "full buffer" model.

  5. Congratulations Matt Welsh! Thank you so much for taking the time to share this exciting information.

