Writing with Me

This is a post on how I write research papers.

It’s okay to disagree, since there are many approaches to writing. But if you disagree a lot, then it likely means we shouldn’t collaborate.

Prewriting

Before writing, you should be able to answer the following questions about your work (this is Kent Beck’s four-sentence abstract):

  1. What is the problem?
  2. Why is the problem interesting?
  3. What are you doing to solve the problem?
  4. What follows from this work?

You should ask yourself, “Why is this science?” (from Mladen Vouk). There are many interesting things you can write about, but not all of them are considered science.

Next, identify what is interesting about your work. A good guide is Murray S. Davis, “That’s interesting! Towards a phenomenology of sociology and a sociology of phenomenology,” Philosophy of the Social Sciences (1971).

“Little is known” is not an interesting reason. Maybe little is known because the topic is so boring that no one cares. “Increasingly interested” is also not an interesting reason, since what goes up must come down.

If your research is on software engineering, you should first run your idea by a few software engineers. If your research is on data science, you should do the same with data scientists.

Don’t write papers on non-problems.

Pick a Conference

I pick a conference before I write the paper. This gives me a target date and also a sense of the work that needs to be done.

I select a set of template papers from this conference and model my own writing after them.

These template papers are your nearest neighbors: papers with a similar methodology and topic area.

Best paper awards and honorable mention awards are good places to start. I restrict myself to papers from the last few years, since expectations at conferences have shifted over time.

Pick a Director (or Dictator)

Every paper needs a director, and you should agree on who this will be early on. The director has a holistic perspective of what the paper should be.

The director takes input from other co-authors, allows for some discussion, but ultimately makes the final decision about what to do (in other words, the “disagree and commit” management strategy).

Writing Process

Science is storytelling. HCI is the science of telling stories about people.

Clarity of writing follows clarity of thought. Bullet points are easy to rearrange and refactor. But a string of incoherent words, even if there are a lot of them, is unfixable.

Writers are sometimes described as plotters or pantsers, at the extremes. I’m a plotter, so it’s useful if we can nail down an outline of what we want to say for each section and paragraph before we write it.

I do all of my writing in Scrivener. I do all of my editing in other tools. Writing and editing are different processes, and require different tools. If you intermingle them, you will do neither well.

I collaborate in Overleaf. Collaborative editing is indispensable. Use track changes.

E-mail is too slow to iterate effectively. Use Slack, Skype, Google Hangouts, or nearly anything else that allows for both real-time and asynchronous group chat.

The writing process includes more than the words on paper: in addition to outlining, it includes activities like building concept maps and annotating related work. Save these intermediate artifacts somewhere.

Make a task list and put all tasks in this central place. A task list should be easy to get to. A TASKS.md file is often good enough. Trello works too. A task list makes sure that things won’t get lost.

If all of the user studies are done, and all the data is analyzed, and there is a co-author who can deal with the figures, then historically it has taken me two weeks of focused work to prepare a paper.

Figure Rules

Figures matter, especially in HCI conferences where the figures are the only way for readers to see how users will interact with your system.

Use vector-based figures when possible. I have standardized on Figma for this workflow. Even if you’ve built a real system, I will still prepare Figma mock-ups for the paper. They just look better.

Render all plots through a TikZ engine (for example, matplotlib2tikz), so that they use the same font as the rest of the text. Font sizes should be consistent across all figures.
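A minimal sketch of how an exported plot might be pulled into the paper, assuming matplotlib2tikz (since renamed tikzplotlib) has written a pgfplots `tikzpicture` to `figures/results.tex`. The file path, caption, and label are hypothetical:

```latex
\documentclass{article}
\usepackage{pgfplots}        % the exported file is a pgfplots axis inside a tikzpicture
\pgfplotsset{compat=newest}

\begin{document}
\begin{figure}
  \centering
  % TikZ typesets the axis and tick labels at compile time,
  % so they pick up the same font as the body text.
  \input{figures/results.tex}  % hypothetical file written by matplotlib2tikz
  \caption{Placeholder caption.}
  \label{fig:results}
\end{figure}
\end{document}
```

Because every plot is typeset by the same engine, keeping font sizes consistent becomes a matter of not overriding them per figure.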

Section Rules

The Introduction should tell a single, coherent story. I use a variation of the assertion-evidence approach. The most common problem I see with Introductions is too many detours. Don’t bury the lede.

Methodology should be written as a recipe, such that another researcher can rerun your study process and understand why you made the choices you did. Sometimes you aren’t able to execute your recipe exactly as you wanted: save that for Limitations.

Limitations should be actual limitations. If you mitigated it, then it isn’t a limitation. As Tyrion Lannister says, “once you’ve accepted your flaws, no one can use them against you.”

Related work should not be a laundry list of papers. Related work describes how related ideas have evolved over time, and how your problem fits into and differs from existing work (read “The Related Work Section” in Thoughts on the Structure of CS Dissertations by Spencer Rugaber).

Discussion is another tricky section. The goal of Discussion is to provide a data-inspired interpretation of the Results, through the lens of the authors. A good Discussion section provides actionability. It is not a summary of the Results.

Expect to rewrite. A lot. Sometimes from scratch. It takes me three to four days to write an Introduction, and only after many false starts. It takes me one to two days for the Discussion.

The rest of the writing I can usually do the day of the deadline if there is an existing outline that I can follow.

Style Rules

Write in plain style (sometimes called Classic Style). This is a conversational style, much like how you would speak to another academic colleague.

Show, don’t tell. Concrete examples are more important than abstract ideas. “Eat more vegetables” is concrete. The concept of “nutrition” is abstract.

People are more important than things.

Insights are more important than data.

It’s fine to use contractions: don’t, won’t, can’t.

It’s also fine—and encouraged—to use --- (em-dash) in your sentences.

But do write out: for example (instead of e.g.), that is (instead of i.e.), and with respect to (instead of w.r.t.).

If it’s easier, you can write everything in active voice, and I’ll patch it to passive voice in the rare cases that it makes more sense.

Every paragraph should have a purpose. You should be able to state it. One tip is boldification (from Margaret Burnett): at the beginning of every paragraph, add a comment that states what the paragraph is about.

Punctuation goes inside quotation marks.

Footnote markers go after punctuation.

I use the Chicago Manual of Style to resolve any remaining bikeshedding.

Chris Parnin wrote up some Error Codes for Introductions. I agree with them.

LaTeX Rules

Use booktabs and tabularx (or tabulary) for tables. Use threeparttable where it makes sense.

Use minted for code snippets.

Use cleveref to reference figures and tables.

Use \citet{paper} to insert the author names.
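A minimal sketch that exercises these packages together in a plain `article`. The acmart class already loads natbib, so the `natbib` line is only needed outside that class. The citation key, table contents, and label are placeholders:

```latex
\documentclass{article}
\usepackage[numbers]{natbib} % \citet prints the author names, \citep only the number
\usepackage{booktabs}        % \toprule, \midrule, \bottomrule
\usepackage{tabularx}        % X columns stretch to fill a fixed table width
\usepackage{hyperref}
\usepackage{cleveref}        % load after hyperref

\begin{document}

\begin{table}
  \centering
  \caption{Placeholder table.}
  \label{tab:example}
  \begin{tabularx}{\linewidth}{lXr}
    \toprule
    ID & Description     & Count \\
    \midrule
    A  & Placeholder row & 0     \\
    B  & Placeholder row & 0     \\
    \bottomrule
  \end{tabularx}
\end{table}

% placeholder2020 is a hypothetical bibliography key.
As \citet{placeholder2020} noted, \cref{tab:example} lists the conditions.

\end{document}
```

With cleveref, the word "table" is generated for you, so changing a table into a figure never leaves a stale label in the prose.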

Don’t hardcode participants. Make all of your participants into macros. This way you can rename or renumber them if needed.
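One way to do this, sketched with hypothetical macro names. LaTeX command names can’t contain digits, so the macros use letters while the printed labels carry the numbers:

```latex
\documentclass{article}

% One macro per participant: renaming or renumbering P2 means editing one line.
\newcommand{\pA}{P1}
\newcommand{\pB}{P2}
\newcommand{\pC}{P3}

\begin{document}
% The trailing {} keeps TeX from swallowing the space after each macro.
\pA{} and \pC{} preferred the second design, while \pB{} did not.
\end{document}
```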

Your document should always compile, because LaTeX error messages are cryptic and hard to debug.

Submission

I submit papers to the following conferences:

  • CHI and UIST. For CHI, I pick the Engineering Interactive Systems and Technologies (EIST) subcommittee and nothing else.
  • ICSE and FSE (but only if another author leads the work). I pick the closest research area to human-centered computing and nothing else.
  • For early-stage work, sometimes I submit to VL/HCC because I like the smaller community and know it well.

Confusingly, the goal of picking a subcommittee or research area is to route your paper to appropriate reviewers in your community. It is not intended to accurately describe what your paper is about.

It takes a lot of effort to learn the norms of a community. Bouncing between different conferences (or even subcommittees) doesn’t help the paper.

FSE and ICSE are interchangeable. CHI and UIST seem interchangeable, but they have subtle differences in expectations.

I used to get away with not submitting the optional video, but now it appears that they are effectively required. I use Camtasia.

I can only give full attention to one submission at a time.

For authorship, I always list the student researchers before full-time researchers, regardless of contribution. Generally, the intern is the first author and the intern’s mentor is the last author.

As an industry researcher, I try to write about topics that have a shorter timeline to product. I appreciate theory papers, but I don’t have the cycles to write them.

Submission Checklist

Pilots use checklists to prevent human error, because small mistakes in aviation can compound into big problems. The same is true of papers. Inevitably, we will need to submit a paper under pressure. Here’s a checklist for that:

  • Use the correct options for submission. These are: \documentclass[manuscript,review,anonymous]{acmart}. Don’t forget to anonymize the paper.
  • Run spellcheck.
  • Search for “the the” in the paper. This is my most common repeated word.
  • Search for ". You probably meant `` or ''.
  • Check for ?? in the paper. You are missing citations or figure references.
  • Update the abstract in PCS to match the abstract in the paper, and make the abstract a compelling one. Reviewers will make fast decisions about whether they want to bid on your paper based on the title and abstract.
  • Check the overall paper gestalt. Scroll through every page at a distance. Does it look like a real paper?
  • Check for cliffs and potholes. A cliff is something that will cause your paper to be immediately rejected (like submitting an unfinished section). A pothole is something that accumulates damage to the paper (a confusing sentence or awkward phrasing, or missing an important related work).
  • Do a best-effort pass to make sure that the citations look “reasonable” (for example, delete “New York, New York, USA”). It doesn’t have to be perfect, as you can fix the remaining issues in your camera-ready copy.
  • Check the citation list for correct usage of Amy J. Ko (though it would be nice if there were a general list of names to check).

References

The following materials have influenced how I write:

  • The Sense of Style: The Thinking Person’s Guide to Writing in the 21st Century by Steven Pinker
  • Writing Science: How to Write Papers That Get Cited and Proposals That Get Funded by Joshua Schimel
  • Making Comics: Storytelling Secrets of Comics, Manga and Graphic Novels by Scott McCloud
  • How to Write a Great Research Paper: Seven Simple Suggestions by Simon Peyton Jones

I don’t recommend The Elements of Style by William Strunk Jr. and E.B. White. It’s outdated and much of it is based on folklore.

Social Media Schedule: Taking Social Media Breaks

I’d like to be more intentional about how I use social media. There are many reasons for this: to embrace deep work, escape the dark playground, and generally reduce my day-to-day anxieties from fear of missing out and being driven to distraction. 👻

Strategies

  • I have deactivated Facebook.
  • I have deleted Slack from my phone.
  • I have removed Twitter from my phone and tablet.
  • I have deleted other distracting sites like Reddit from my browser history, and logged out of these accounts.

These actions were easy.

Twitter Schedule

Unfortunately, removing Twitter entirely has been difficult, because so much of the academic community is intertwined with it. My approach is to introduce artificial latency by checking Twitter less frequently: Monday, Wednesday, Friday, and Saturday are Twitter-free days.

I also periodically delete tweets (usually once a year) and consider them to be ephemeral. This reduces the “social media debt” otherwise required to engage with older, less relevant tweets.

Productive Procrastination

I’m not at a point where I can avoid procrastination entirely, but at least I can be productive about it through reasonable alternatives:

  • GetAbstract offers 10-minute summaries of popular nonfiction books. These summaries feel “tweet-sized” and help me fill in otherwise short and unproductive gaps in my day.
  • Libby provides access to public library eBooks.
  • Similarly, O’Reilly provides access to a large catalog of technical content.
  • I have subscriptions to the New York Times and the Wall Street Journal.
  • If I don’t want to read, PluralSight has content in video form.

Microsoft Company Store

Every year the Microsoft Company Store lets employees offer up to 10 Friends and Family passes. If you’d like one, reach out to me. The most useful discounts are for Office, Xbox, and Windows.

Restrictions

Each pass allows you to spend your own money on $250 of digital downloads at employee prices. Passes can’t be distributed to any Microsoft FTE or Intern, or anyone employed by any government agency.

Passes Remaining

I have 8 passes remaining for 2020.

An Opinionated Onboarding Setup for New Hires at Microsoft

Welcome to Microsoft! I’ve written this guide to reduce decision fatigue about what to do during onboarding. This information complements what you already receive during New Employee Orientation (NEO). As I get recurring questions from new hires, I’ll update this guide accordingly. (If you’re not at Microsoft, this page isn’t particularly useful for you.)

Some assumptions:

  • You work at Microsoft in Redmond or the Puget Sound area.
  • You will primarily invest in tax-sheltered accounts, with checking, savings, or money market (SPRXX) accounts for everything else.
  • You have an emergency fund for 3-6 months of expenses in place (if not, prioritize building your emergency fund before investing).
  • You will pay for your day-to-day expenses almost entirely through bonuses and stock (both RSU and ESPP) until you hit your tax-sheltered limits, redirecting your base salary to retirement.
  • You prefer a set-and-forget approach to managing your life instead of day-to-day management.

Initial Day-to-Day Life Setup

Rename your alias. Usually, you’ll need to change your automatically-generated alias (for example, tibari to tbarik, internal link). You’ll want to follow the Microsoft account naming conventions (“intuitive aliases”), which basically means some combination of letters from your first and last name. The naming conventions aren’t strictly enforced (and Microsoft Research has an explicit exemption), but colleagues find it pretentious when people choose exotic aliases. Do this as early as possible.

Your canonical e-mail address is usually [email protected]. Although you can be e-mailed at your alias, the canonical e-mail address is what you should give to others outside Microsoft.

Machine setup. On the first day, you’ll need to reformat your workstation and laptop using PXE boot, through the corporate network. Alternatively, you can install a fresh Windows 10 Enterprise image, and then join the domain. Don’t try to join the Microsoft domain using an OEM Windows installation. Name your computers such that they contain your alias (for example, tbarik-ws).

Activate subscriptions. As a Microsoft employee, you have complimentary access to some amazing learning resources, including PluralSight and O’Reilly Online Learning. You should also go ahead and activate your subscriptions to the New York Times, Wall Street Journal, and a few others (Microsoft Library, internal link).

Professional membership dues. For researchers (technically I think anyone can do this though), you should expense your ACM and IEEE membership fees. Microsoft’s digital library also includes access to the ACM and IEEE libraries, so don’t pay for those again.

Corporate American Express card. This card takes a while to receive, so go ahead and activate it early (internal link). You only need this card if you travel somewhat frequently.

Visual Studio Online. Log in to Visual Studio Online with your Microsoft account and activate your $150 personal monthly credit. Do the same for Office.

Blind. You can finally join Blind (yay?), which has an anonymous community for Microsoft. Yes, Blind is toxic, but it’s sometimes the only way to get an unfiltered lens of the company. Use sparingly.

Outlook rules. You will receive far more e-mail than you can actually manage. I only have three types of simple Outlook rules. First, for each Discussion list, I create a separate rule and folder. Second, I create a priority rule such that any e-mail from my manager or above goes directly to my INBOX (which triggers notifications). Third, everything else goes to a folder called Quiet, which I only check periodically (notifications suppressed).

Personal use. This is a surprisingly common question, so I just looked it up directly: “You may use Microsoft devices, networks and systems for personal use if it does not interfere with your job responsibilities or negatively impact corporate resources.” A separate policy says no pornography or pirated software. There are certain workstations, called Secure Access Workstations (SAW), which have a much stricter usage policy.

Matching donations. Each year, Microsoft will match up to $15000 of your donations, dollar-for-dollar. You can set up recurring donations (deducted directly from payroll) towards your most important causes through the giving portal.

Financial

Here’s how I set up my finances, if you want to do the same. Consult a financial planner for your specific situation.

This section reads like a Fidelity advertisement, but that’s mostly because Microsoft offers their 401k, ESPP, and stock awards through Fidelity.

Fidelity 401k. You will fund your 401k through both pre-tax and after-tax dollars. The pre-tax contribution limit is $19500 with a 50% match by Microsoft ($9750, also in pre-tax dollars). You should also make after-tax contributions up to $28750, and for these you will do a daily Roth in-plan conversion to minimize tax burden.
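The arithmetic behind these targets, using only the figures in the paragraph above:

```latex
\[
\underbrace{\$19{,}500}_{\text{pre-tax}}
+ \underbrace{0.5 \times \$19{,}500}_{\text{match} \,=\, \$9{,}750}
+ \underbrace{\$28{,}750}_{\text{after-tax}}
= \$58{,}000
\]
```

So if you hit every target in a full year, about $58000 goes into the 401k, which is why so much of your paycheck is redirected.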

The maximum total payroll deduction allowed for all sources is 65% of gross income per pay period. Microsoft will stop contributing automatically once you hit the IRS limits. Here’s a reasonable allocation:

To reach these targets, you will rely on selling Microsoft stock and using your cash bonuses for day-to-day income. For at least the first part of the year, your actual paycheck will be relatively tiny, since a significant chunk of your paycheck is redirected towards retirement.

If you start later in the year, for example September, you may need to be even more aggressive: ramp up the pre-tax contributions all the way to 65% until you reach the maximum Microsoft match, then swing back to 65% for after-tax. Depending on when you start, it may not be possible to reach the after-tax limit in the first year.

The target date funds offered in this plan are really good (0.06% expense ratio). Pick one of the BlackRock LifePath index funds (based on your expected retirement age, mine is BTC LP IDX 2050 N) and put 100% in that fund. If you want to do something more sophisticated, join Microsoft’s invclub.

Employee stock purchase plan (ESPP). The ESPP plan is through Fidelity NetBenefits. Set your ESPP contributions to 15%. Your discounted (10%) stock purchases are limited to $25000 per year (well, essentially). You will automatically be refunded if you overfund your ESPP plan.

Sell your Microsoft stock the day you receive it and use it for income. The reason is that you are already overleveraged in Microsoft just by virtue of working for them.

Backdoor Roth IRA. Your income will likely be too high to contribute to a Roth IRA directly. But you can do a conversion, which basically works like this: open a Traditional IRA and a Roth IRA, then put $6000 in your Traditional IRA. As soon as the funds settle, convert the balance from your Traditional IRA to your Roth IRA.

If you have a non-employed spouse and like them, they too should have a Traditional IRA and a Roth IRA. Repeat the process ($6000) for a total of $12000 a year. But note that the money becomes theirs (the “I” in IRA is “Individual”).

There are some hurdles if you already have pre-tax accounts, in which case you’ll have to shuffle some accounts around first.

I do Bogleheads-style investing for the Roth IRA, with the funds FZROX (0% expense ratio), FZILX (0% expense ratio), and FXNAX (0.025% expense ratio).

Fidelity supports a backdoor Roth IRA out of the box: just do a Transfer from the Traditional IRA account to the Roth IRA account.

Other Fidelity odds and ends. Open a Fidelity Cash Management Account (basically, it acts like a checking account) and use it as your primary account. Direct deposit your pay to this account. You’ll have access to your pay a day early if you use Fidelity. Fidelity also has a nice Rewards Visa Signature card, for which you would now be eligible. Set up extra login security with the Symantec VIP access app.

Annual stock awards and special stock awards. You have a choice of two brokers, Morgan Stanley or Fidelity. Pick Fidelity. Like ESPP, sell your stock the day you receive it and use it as income.

These are announced in September. If you join Microsoft after March 31st, you won’t be eligible for rewards or bonuses until next year’s cycle. You will be eligible for a prorated merit increase (basically, cost of living adjustment).

529 plan. If you still have income left over, consider a 529 plan.

Open Enrollment

Just a few more items left around open enrollment. For almost all of these, I think the defaults are okay.

ARAG. Microsoft offers a group legal service, ARAG, which has gotten me out of a few traffic violations already. You also get credit monitoring through ARAG.

Health insurance. Both health insurance plans are reasonable. The HSA works a bit better if you’re single and don’t expect to have any health issues, while the Kaiser Permanente plan works a bit better if you have a family and have a Kaiser Permanente hospital nearby.

FSA. If you pick the Kaiser Permanente plan, you can utilize an FSA. If you don’t know how much to contribute, set this to $500, and adjust the following year as needed.

Other Benefits

Xbox Game Pass. A free 12-month subscription to Xbox Game Pass Ultimate.

How Should Compilers Explain Problems to Developers?

This week I presented the final part of my dissertation, How should compilers explain problems to developers?, at the Foundations of Software Engineering conference in Lake Buena Vista, Florida. This work is relevant to communities, such as Elm and Rust, that are invested in improving the usability of the error messages their tools produce.

The abstract of the paper follows:

Compilers primarily give feedback about problems to developers through the use of error messages. Unfortunately, developers routinely find these messages to be confusing and unhelpful. In this paper, we postulate that because error messages present poor explanations, theories of explanation—such as Toulmin’s model of argument—can be applied to improve their quality. To understand how compilers should present explanations to developers, we conducted a comparative evaluation with 68 professional software developers and an empirical study of compiler error messages found in Stack Overflow questions across seven different programming languages.

Our findings suggest that, given a pair of error messages, developers significantly prefer the error message that employs proper argument structure over a deficient argument structure when neither offers a resolution—but will accept a deficient argument structure if it provides a resolution to the problem. Human-authored explanations on Stack Overflow converge to one of the three argument structures: those that provide a resolution to the error, simple arguments, and extended arguments that provide additional evidence for the problem. Finally, we contribute three practical design principles to inform the design and evaluation of compiler error messages.

One λ at a time

Our full paper, One λ at a time: What do we know about presenting human-friendly output from program analysis tools?, has been accepted to the 8th Workshop on Evaluation and Usability of Programming Languages and Tools (PLATEAU) at SPLASH 2017, Vancouver, British Columbia. The paper is useful for human-computer interaction researchers who want to understand the design space of error messages from program analysis tools.

The abstract of the paper follows:

Program analysis tools perform sophisticated analysis on source code to help programmers resolve compiler errors, apply optimizations, and identify security vulnerabilities. Despite the utility of these tools, research suggests that programmers do not frequently adopt them in practice—a primary reason being that the output of these tools is difficult to understand. Towards providing a synthesis of what researchers know about the presentation of program analysis output to programmers, we conducted a scoping review of the PLDI conference proceedings from 1988-2017. The scoping review serves as interim guidance for advancing collaborations between research disciplines. We discuss how cross-disciplinary communities, such as PLATEAU, are critical to improving the usability of program analysis tools.

Expressions on the Nature and Significance of Programming and Play

All is play.

My single-author full paper, Expressions on the Nature and Significance of Programming and Play, has been accepted to the IEEE Symposium on Visual Languages and Human-Centric Computing. I presented the paper in Raleigh, North Carolina.

The abstract of the paper follows:

Play is all around us, an essential and innate phenomenon that serves as an important mediator in creativity, interest, learning, and drive. Though play is thought to be universal, the way in which it materializes is situationally-dependent and not well-understood, particularly in software engineering. To understand how programmers express the concept of play, we conducted a qualitative study on the online social news website, Hacker News—a venue for software practitioners. From Hacker News, we qualitatively analyzed nearly 1,000 user-submitted comments containing the terms “programming” and “play.” The contribution of this work is a contemporary synthesis of how software practitioners interpret programming and play in experiential terms. Our findings suggest how programming and play can be understood through rich metaphors, among them, play as: art, playgrounds, spontaneity, and tinkering. Hacker News authors reflect about childhood experiences as a catalyst for learning programming, and contrast play against work.

Microsoft in Redmond

I’m delighted to announce that as of today, I’ve started the next phase of my research career at Microsoft. I’ll be working with the PROSE team in a dual role: as a Researcher at the intersection of programming languages and human-computer interaction, and as a Research Software Engineer to improve developer experiences across Microsoft’s software development tools.

Do Developers Read Compiler Error Messages?


Our full paper, Do Developers Read Compiler Error Messages?, has been accepted to the International Conference on Software Engineering. I will be presenting the paper at the conference in Buenos Aires, Argentina.

The abstract of the paper follows:

In integrated development environments, developers receive compiler error messages through a variety of textual and visual mechanisms, such as popups and wavy red underlines. Although error messages are the primary means of communicating defects to developers, researchers have a limited understanding on how developers actually use these messages to resolve defects. To understand how developers use error messages, we conducted an eye tracking study with 56 participants from undergraduate and graduate software engineering courses at our university. The participants attempted to resolve common, yet problematic defects in a Java code base within the Eclipse development environment. We found that: 1) participants read error messages and the difficulty of reading these messages is comparable to the difficulty of reading source code, 2) difficulty reading error messages significantly predicts participants’ task performance, and 3) participants allocate a substantial portion of their total task to reading error messages (13%-25%). The results of our study offer empirical justification for the need to improve compiler error messages for developers.

FSE Student Research Competition: How should static analysis tools explain anomalies to developers?

I presented my thesis proposal work, “How should static analysis tools explain anomalies to developers?”, at the Foundations of Software Engineering Student Research Competition in Seattle, Washington. I was awarded second place in the competition. The abstract of the short paper follows:

Despite the advanced static analysis tools available within modern integrated development environments (IDEs) for detecting anomalies, the error messages these tools produce to describe these anomalies remain perplexing for developers to comprehend. This research postulates that tools can computationally expose their internal reasoning processes to generate assistive error explanations that more closely align with how developers explain errors to themselves. My work demonstrates that tools stand to significantly benefit if they incorporate explanation principles in their design.

The associated poster for the paper is also available.