On Reading CS Papers – Thoughts & Reflections

Be forewarned:

  • This is not an advice post. There are tons of people out there who desperately want to give people advice on reading papers. Read theirs, please.
  • This post is a continuous reflection on the topic “how to read a CS paper”, drawn from my personal practice. I will list my academic status before each point, so that I can later see how my view on the matter has changed over time.

2018

The first year of my CS master’s program. I had just gotten started on CS research.

  • It’s OK to not like a paper

In my first semester, I mainly read papers on Human Computation and Crowdsourcing. Very occasionally, I read papers on NLP. Some of the NLP papers were extra readings from Greg’s course; others were related to my final project for Greg’s course, which dealt with both code and language. I didn’t really like or want to read papers back then. In the NLP class, I preferred textbooks (Jurafsky’s) and tutorial posts that I could find online. One roadblock was that I had a background knowledge gap to fill, and I simply didn’t know how to read a paper. So, for Greg’s NLP course, I only read a few papers related to my final project. This paper is the base paper for my final project. I got it from professors in linguistics and software engineering, who wanted me to try out the same idea with a neural network model instead. I read this paper several times, and the more I read, the more I wanted to throw up. I think the paper hides many critical implementation details, and the 95% score is just too high for me to believe. The authors open-sourced their code, but it has some nasty Maven dependencies and won’t compile in my environment. Their evaluation metric is non-standard in NLP, and many “junk words” are wrapped around their results. Of course, the result of my own experiment was quite negative. I often think it is a waste of life to spend your precious time on a paper you dislike. Here, I’m mostly talking about a paper’s writing style and the reproducibility of its results. I probably want to count shying away because of a background gap as a legitimate reason not to like a paper, too.

  • Try to get the most out of the paper and go from there

I got this message from Matt’s Crowdsourcing class. In the class, I read a very math-heavy paper that applies a combination of PGM and variational inference to the credibility of fake news. I was worried about how to approach a paper like this one, where I severely lacked the background and the mathematical formulas looked daunting. I posted my doubts on Canvas, and Matt responded in class with this message. I think it really gave me some courage to keep reading papers.

  • It’s OK to skip (most) parts of a paper. Remember: a paper is not a textbook!

This semester I’m taking a distributed systems class. To be honest, distributed systems papers can be extremely boring if they come from industry. Even worse, systems papers can be quite long: usually around 15 pages, double column. If I read every word from beginning to end, I would be super tired, and that is not feasible for a four-papers-per-week class. So I have to skip. Some papers are useful for just one or two paragraphs. Some are useful for just one figure. As long as your expectation of a paper is met, you can stop wherever you want.

  • Multiple views of reading a paper

I didn’t get this point until very recently. I did quite terribly on the first midterm of my distributed systems class. The exam was about how to design a system to meet a certain requirement. In the first half of the course, I focused on the knowledge presented in the papers, but that didn’t work out well. Only then did I realize that I need to read those systems papers from a system design point of view: what problems they need to solve, what challenges they face, and how they solve those challenges. Of course, those papers are also valuable from a knowledge perspective: how consistent hashing works, for example. But depending on my goal in reading a paper, I can prioritize different angles. If I need to implement the system described in the paper, I probably need to switch to yet another reading style.

  • Get every bit of detail from the paper if you need to

It’s time again for final course projects. Again, I need to generate some ideas and find some baseline papers. In this case, the “skip parts” and “get the most out of the paper and move on” strategies probably won’t work well. All in all, I need to understand the paper, and that relies on its details. So I need to sit through the whole journey and remove any blockers I encounter.


Does teaching matter?

I really hesitated over whether I should spend precious working-day hours composing this blog post. However, I feel I should. I wrote down the title several days ago, but I felt some pieces were missing to form a relatively concrete post. Today, however, the miracle happened, and I can finally complete my puzzle.

Several days ago, I felt quite frustrated because a homework was due for one of my classes and I had no clue how to finish it. I dug into the books on the subject and tried to research my way to a solution. The most frustrating part wasn’t the process of seeking answers. It was the lectures. The class is quite popular among CS graduate students: no matter what their research area is, everyone I know in the program takes this class sooner or later. The professor is quite famous for his research, but I have to say that the quality of his teaching is controversial. By controversial, I mean there is a debate in my head about whether his style of teaching is good or not. If you are familiar with Prof. Andrew Ng’s CS229 lecture videos, his style is exactly the opposite. Unlike Prof. Ng’s mathematical teaching style, the professor in my class skips most of the derivations of the formulas; in some cases, he reads through the slides and talks out loud about some steps of a derivation. He usually ends the 90-minute lecture 30 minutes early, and in between he may make some jokes or digress into research areas of his that seem related to the lecture topic. The good side of his teaching is that he may offer intuitions or insights into why we perform those steps, and sometimes those few words help you connect the dots. His teaching style may be a good fit for someone with a solid background in the field, but if you are relatively new to it, you may have a hard time. This “twisted” class partially led to the question in my title: “Does teaching matter?” For me, in the context of trying to finish the homework, I could not see any good in my professor’s lecture style.

The reason I now look quite peaceful in accepting his lecture style is some new insights into research. In a nutshell, you simply don’t have enough time to get everything figured out all at once. Once you’re in graduate courses, you start reading research papers immediately. There can be a lot of background knowledge to clear up, especially if you are new to a field. However, can you say “let me pause and get everything figured out first”? No! Unstoppable piles of papers keep coming at you, and all you can do is iteratively make the best of each paper. If there are mathematical formulas you don’t understand, in most cases that’s OK as long as you get the big picture of the paper. The formulas matter most when you actually start to build your own models. But even then, I don’t have to be super clear about every variable that appears in the formulas. Many times, you can take them as given and use them directly as basic bricks to build your own building. This feels a lot like playing with LEGO: you don’t care how each piece is made. You simply use the pieces to build your stuff. This way of looking at knowledge is totally different from undergraduate study, where exams test you on every bit of information taught in class. The observation may look simple, but it is really hard from a psychological perspective, especially when you are a strict person who holds tight to your knowledge system. This psychological barrier is hard to break when you have relatively plenty of time to read through a single paper. You may get hung up on the background or related work section, and you may feel there is always a piece of information that is unclear to you. Then you take several months to study the material just to move a few words ahead to the next sentence of the paragraph. That’s exactly the beauty of graduate school, where you get bombarded by papers. You simply don’t have enough time to get everything cleared up before moving on.

Classes are heavily centered around papers, and you are more or less expected to figure things out on your own by taking an iterative approach to understanding. Take the PCA algorithm as an example. On the first pass through the material, you may simply learn how to follow the algorithm and implement it. The second pass may involve understanding the intuition behind the method and some of the mathematical derivations. The third pass may actually require diving in to figure out every bit of information, and so on.
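To make the “first pass” concrete, here is a minimal sketch of what following the PCA recipe and implementing it might look like, without understanding much of the theory yet. It is only an illustration under stated assumptions: it uses pure Python, finds just the first principal component, and swaps in power iteration for a full eigendecomposition. The helper names (`center`, `covariance`, `top_component`, `project`) and the toy data are made up for this example.

```python
import math
import random


def center(rows):
    """Subtract the per-column mean from every row."""
    n = len(rows)
    dims = len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(dims)]
    return [[row[j] - means[j] for j in range(dims)] for row in rows]


def covariance(centered):
    """Sample covariance matrix of centered data (divides by n - 1)."""
    n = len(centered)
    dims = len(centered[0])
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(dims)]
        for i in range(dims)
    ]


def top_component(cov, iterations=200, seed=0):
    """Approximate the leading eigenvector of cov by power iteration.

    Assumption for this sketch: power iteration stands in for a full
    eigendecomposition, so only the first component is recovered.
    """
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in cov]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(len(v))) for i in range(len(cov))]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def project(centered, component):
    """Coordinate of each centered row along the given component."""
    return [sum(a * b for a, b in zip(row, component)) for row in centered]


# Hypothetical toy data lying roughly along the line y = x.
points = [[2.0, 1.9], [1.0, 1.1], [0.0, 0.1], [-1.0, -0.9], [-2.0, -2.2]]
centered = center(points)
pc1 = top_component(covariance(centered))
scores = project(centered, pc1)
```

On data like this, `pc1` should point roughly along the diagonal. Why the covariance matrix and its eigenvectors are the right objects is exactly the kind of question that can wait for the second and third passes.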

Now, let’s get back to the question: “Does teaching matter?” It is a yes-and-no question, depending on the perspective. From the undergraduate perspective, a hand-holding strategy is probably a must, because that’s how we help students build a solid knowledge foundation and give them the basic skills to survive in the water. For graduate students, it’s debatable whether we should teach freestyle, like the professor of my class, or still proceed with something like hand-holding, with modifications. I guess that depends on what the instructor wants to deliver: the knowledge itself, or how research is done.

P.S. The miracle that happened to me today came during the calculus discussion section: a bunch of freshmen chatted out loud while I was trying to explain the solution to a problem to the class. That made me wonder whether the relatively weak education quality of the public system compared with private institutions is due to a difference in the quality of the students. People may think that faculty in public universities don’t care much about teaching because of a lack of incentives. But I’m now starting to wonder whether another party is involved as well: the students, who, in short, give the wrong signals to faculty who try hard to achieve teaching excellence. That’s probably another post for the future.

 

“Research” Interest

This Friday, I met my future roommate in Beijing. Over lunch, we talked about each other’s research interests. My roommate, like me, is also a CS graduate student at Austin. However, unlike me, he has a clear vision of the direction he is going to pursue in graduate school. He just finished his undergraduate degree in the Automation department at Tsinghua University. The Automation department, as he explained, is similar to a mixture of mechanical engineering and electrical engineering. He has been interested in mathematics since high school, and naturally he wants to work on machine learning theory in graduate school, with an emphasis on computer vision (CV).

Then came my turn. That’s a hard question I have been thinking about for a while. I don’t have a clear vision of what I’m going to pursue next. I think maybe I’m too greedy and want to keep everything. However, I also realize that I may not be as greedy as I initially thought. I know I don’t want to work on computer architecture, computation theory, algorithms, compilers, or networks. So my options really come down to choosing among operating systems, databases, and machine learning. Within machine learning, I even know I probably won’t choose computer vision eventually (I still want to try a course, though), and I lean more towards natural language processing (NLP). However, picking one of those areas is just too hard for me now, even after I did some analysis in my last post trying to talk myself into picking machine learning only. There is always a question running in my head: why do I have to pick one? Sometimes I just envy people like my future roommate, who don’t have this torture in their minds (maybe he does? I don’t know).

This feeling, to be honest, isn’t new to me. When I was an undergraduate facing the pressure of getting a job, my naive approach was to lock myself in my room and keep thinking about what profession might suit me best. After two years of working, I have grown up enough to know that this way of making choices is stupid, and also enough to know that “giving up is a practice of art”. So why am I in such a rush to pick a direction before I have even taken a graduate course? Why can’t I sit down and try out several courses first? Because I want so badly to get a PhD at a good school. Let’s face the fact that people get smarter and smarter with each generation. Here “smarter and smarter” doesn’t necessarily mean that people won’t repeat past mistakes. It means that people have a better capacity to improve themselves. Machine learning was not hot in 2014, from my experience in college. Back then, LeetCode had only around 100 problems. I had no particular emotional attachment to the machine learning material when I took the AI class. Maybe because Wisconsin has a tradition in the systems area? I don’t know. However, in 2017, everyone, even my mother, who is a retired accountant, can say a few words about AI and machine learning. Isn’t that crazy?

On my homepage, I wrote the following words:

I like to spend time on both systems and machine learning: systems programming is so deeply rooted in my heart that I cannot easily get rid of it; machine learning is like a magic trick that the audience always wants to know the workings of. I came back to academia in the hope of finding the spark between these two fascinating fields.

Trust me, I really mean it. Maybe because I graduated from Wisconsin, I have a natural passion for system-level programming, whether it comes from operating systems or databases. Professor Remzi’s systems class is a blast for anyone who wants to know what’s really going on under the software application layer. Professor Naughton’s DB course is full of insights that I kept referring to even after I began working on a DBMS in the real world. Wisconsin is just too good in the systems field, and that is something I can hardly say no to, even though I have worked so hard to lie to my own face that “systems are not worth your time”. What about machine learning? To be honest, the great AI dream may never be accomplished. The undergraduate AI course surveys almost every corner of AI development, but only machine learning has become the hottest nowadays. Almost every AI-related field nowadays (e.g. NLP, robotics, CV) relies on machine learning techniques. Why am I attracted to machine learning? Because it’s so cool. I’m like a kid who is eager to know what is going on behind a magic trick. Machine learning is a technique for solving un-programmable tasks. We cannot come up with a procedure to teach a machine to read text, identify objects in images, and so on. We can solve these tasks only because of the advancement of machine learning. Isn’t this great? Why both? I think machine learning and systems are becoming more and more inseparable. Without good knowledge of systems, one can hardly build a good machine learning system. Implementing batch gradient descent using MapReduce is a good example.
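For readers who have not seen the batch gradient descent example, here is a rough sketch of the idea under some loud assumptions: the model is a one-variable linear regression, the “cluster” is simulated in a single Python process, and the names `map_partial_gradient`, `reduce_gradients`, and `batch_gradient_descent` are invented for this illustration rather than taken from any real MapReduce framework. Each shard plays the role of one worker’s slice of the data.

```python
from functools import reduce


def map_partial_gradient(shard, w, b):
    """Map step: one worker sums the squared-error gradient over its shard."""
    grad_w = sum((w * x + b - y) * x for x, y in shard)
    grad_b = sum((w * x + b - y) for x, y in shard)
    return grad_w, grad_b, len(shard)


def reduce_gradients(left, right):
    """Reduce step: element-wise sum of two partial results."""
    return tuple(a + c for a, c in zip(left, right))


def batch_gradient_descent(shards, learning_rate=0.05, epochs=2000):
    """Fit y = w * x + b, combining per-shard gradients once per epoch."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        # In a real cluster the map calls would run in parallel on separate
        # machines; here a list comprehension simulates them sequentially.
        partials = [map_partial_gradient(shard, w, b) for shard in shards]
        grad_w, grad_b, count = reduce(reduce_gradients, partials)
        w -= learning_rate * grad_w / count
        b -= learning_rate * grad_b / count
    return w, b


# Hypothetical toy data generated from y = 3x + 1, split across three "workers".
data = [(float(x), 3.0 * x + 1.0) for x in range(8)]
shards = [data[0:3], data[3:6], data[6:8]]
w, b = batch_gradient_descent(shards)
```

The systems side shows up in the shape of the computation: the gradient is a sum over examples, so it can be computed shard by shard and then added together, which is what lets the learning algorithm ride on top of distributed infrastructure.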

I just realized that I haven’t answered the question about rushing to make a decision. To get into a good graduate school for a PhD, you need to demonstrate that you can do research. This is done by publishing papers. Most undergraduates nowadays have papers under their belt. That’s huge pressure for me. A master’s program lasts only two years. I cannot afford the time to look around. I need to get started on research immediately in order to be in good standing when I apply for PhD programs in 2018.

So, as you can tell, I have a problem. And, as a future researcher, I need to solve it. Here is what I’m planning to do:

  • Take courses in machine learning in the first semester and begin working on a research project as soon as I can. I’ll give NLP problems a chance.
  • Meanwhile, sit in on the OS class and begin reading papers produced by the Berkeley Database group. People there seem to be interested in the intersection between machine learning and systems. This paper looks like a promising one.
  • Talk to more people in the area and seek advice from others.
  • Start reading “How to Stop Worrying and Start Living”

Will this solve the problem eventually? I don’t know. Only time can tell.

Thoughts on PhD

Preface

This post serves as a record of my thoughts regarding a PhD. It comes from a person who is about to embark on the journey of getting one. The thoughts here may look stupid or naive to someone who has already gone through that phase. However, based on my past experience, if you don’t have a baseline for something you decide to fight for, you can barely measure how much you have progressed once you actually start to fight, and you are highly likely to fall into the same trap over and over again in similar situations in the future.

Let’s dive into …

I have been considering getting a PhD since my sophomore year at university. This page summarizes some commonly seen motivations for getting a PhD. I think mine can partly be described as “Dr. Hu - sounds cool!” and “Eternal quest for knowledge (yeah right!)”. Another part, I guess, is probably the childhood dream of becoming a scientist. However, now, after two years of working in industry, I realize that my most important motivation for getting a PhD is that I want the ability to solve problems that nobody has explored before. This is different from being a quick learner, because being a quick learner means quickly grasping material that has already been studied. That doesn’t mean one can handle unexplored areas well. I like to ask “why” when I face a problem, but I have gradually realized that I don’t have enough knowledge and, more importantly, the confidence to work out some of the crazy ideas in my mind. So, getting a PhD means building a good knowledge foundation in a specific area and possessing the ability to tackle any open question, even one from an area I haven’t explored before.

However, getting a PhD is a non-trivial task that demands a lot of commitment. I personally view a PhD and marriage as the two most serious commitments a person can make in their entire life. I tried to play the “rational” card here by doing some evaluation beforehand, because I’m a do-something-that-can-be-successful type of person, and several years of studying economics have made me more and more “risk-averse”. So I did various RA jobs in various departments (e.g. Math, Psychology, Biostatistics) to get a sense of what PhD life might look like. The result was not good for me, because I found out that working on topics you have the least interest in can be a lot like being in jail. However, that undergraduate research experience also had its positive side. Imagine if I had jumped into graduate school directly and chosen some direction I have no interest in (e.g. medical imaging): I am sure I could not have survived till the end. Sometimes I feel that making a decision is a two-way street: one way is to pick something that fits you from the pool; the other is to get rid of the choices that definitely do not work for you and then see what is left in the pool. Apparently, for me, the latter strategy works slightly better.

So, I chose to work in industry for two years to find the things I am passionate about. There wasn’t much left in my choice pool by that time: I would pick either AI or systems. For AI, my focus is mainly on applications of ML techniques, such as CV and NLP. For systems, my choice is distributed systems (e.g. distributed databases, distributed file systems). So I need to think carefully about the pros and cons of each track. As I said in my offer choice post, there can hardly be a perfect choice that meets your needs by every measure. Preference ranking in economics may be too idealized. There is always a trade-off. After spending two years working on databases, I realize that I’m not really a hardcore systems guy. The most attractive feature of systems is that I can do lots of coding. The coding here is naturally different from coding in, say, ML. In systems, most of the coding involves implementing data structures, data processing models, and so on. For ML, however, coding is more like a direct translation of mathematical formulas. The problem I see with systems research is that it is hard to propose problems that link directly to industry-level production. This problem became clear to me after I attended DTCC 2017 last week. The key element of success in building a system is the production pain points or user scenarios. Alibaba and Tencent build systems just to cater to their specific business scenarios. In my view, a system has value once it can solve specific problems that do not come from someone’s imagination. This can be very hard for newcomers who have just joined the academic circle. In this case, an advisor may work like an offer manager who regularly visits companies to see what kinds of problems they are trying to solve, brings those practical problems back to the research group, and hopes that his students can resolve them. Research is all about solving problems, and great research comes from problems that have, or potentially have, great impact on industry or people’s daily lives.

In addition, when I recall the fun course experiences from my undergraduate years, I realize that I had much more fun manipulating formulas and working out problems that have a strong connection to people’s daily lives. The biggest trend right now is big data. However, to be honest, as a database system developer, I can barely get in touch with actual big data in my daily work. So, whether the system I build is robust enough to handle actual big data, I don’t know. The only thing I can say is that I implement the design correctly. So I feel it is really hard to work out a good system while spending most of the time in school. This idea is partially confirmed by the trend of people leaving academia and heading to industry, like this. However, even though I have spent almost all the space so far talking about the “problems” I have observed in systems research, I do enjoy the “traditional” programming scheme that systems research possesses. Compared with taking some data and training a somewhat black-box network to achieve an outcome, traditional if-else programming feels more rewarding to a hardcore programmer.

For AI, things can be radically different from systems. Specifically for ML, one thing I learned is that ML is used for tasks that can hardly be solved by traditional programming, like autonomous driving and pattern recognition. Those things have a strong connection to people’s daily lives, which means they can make a lot more impact. There is a historical pattern that can be easily observed: serving individual people is a lot more profitable than serving big companies. Doing research on systems is a lot like serving big companies if we consider the question: who needs to build infrastructure from scratch? Working on AI, however, is a lot like serving people by making iPhones. If we observe the trends of companies like IBM and Apple, this analogy works easily. So, even though programming in ML is less satisfying to me, we just need to embrace the future to better maximize our utility. Of course, quite a bit of mathematics is involved in the field of AI, and tweaking the parameters of learning models can feel quite subjective. However, I guess those are obstacles I need to face. The rationale is the same as before: there is no perfect choice, and we just need to try even if we have only 10% confidence of success.

Last word …

The motivation for considering this issue right now is that I need to start planning my course schedule for the upcoming semester. The schedule can be balanced between systems and AI. But it can also be AI-focused. So I really need to evaluate myself to see which direction I want to go. There is a famous saying in China: “Choice matters!”

Nano-thought on Research

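During my lunch break, I opened Zhihu on my phone and read a post or two. One author’s answer caught my attention:

What are the hallmarks of filler papers at top CV/ML conferences? How can you quickly tell whether a top-conference paper is filler? - answer by 李沐 (Mu Li)

One passage in the answer struck me deeply: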

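Besides exchanging research results, academic conferences have another important function: training newcomers. Although there are superstars whose debut is groundbreaking work, the vast majority of people start with beginner papers and slowly accumulate experience.

A beginner paper generally builds on previous work, makes a small change, and then writes up the results in a well-reasoned, well-supported way. On the one hand, this gets you familiar with the field through hands-on work; on the other, it is writing practice. But from the reader’s point of view, nine out of ten of these beginner papers are filler.

So, for researchers, you should neither feel that writing filler is shameful nor treat it as the goal. The healthier attitude is to always make sure that the next paper is better than the last one.

This passage reminded me of something Professor Kenneth West said to us when I took Econ 580 at Wisconsin (roughly as follows):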

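Building a new theoretical model is certainly innovation and definitely counts as research. However, applying an old model to a completely new data set can also be called research and innovation. By all means, you shouldn’t feel ashamed of it.

For someone like me, who had just come into contact with research, had no idea how little I knew, and yet also tended to underestimate myself, these words neither made sense nor sank in at the time. My thinking back then was that I had to produce some earth-shattering model through this course; even if I could not produce a standalone model, I would at least produce a derivative of one. Deep down, I somewhat looked down on the practice of taking a linear regression model for oil prices from the original paper and applying it to one’s own insurance cost data. At the same time, though, there was another voice inside me: this looks like a very challenging task; can you really do it?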

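After a semester, I found that doing research is both hard and easy. On the one hand, the idea that applying the same model to different data can count as research makes research as a whole look less daunting. On the other hand, I came to feel how difficult it is to do research whose changes do not look so “small” to me. In the end, I became the kind of person I had looked down on, and wrote this study on soccer players’ transfer fees.

Also, as the Zhihu author said, don’t feel that writing these so-called “filler papers” is shameful. Everything takes a process, and “the vast majority of people start with beginner papers and slowly accumulate experience”. Take yourself as the benchmark and “make sure that the next paper is better than your own last one”; that is enough.

Finally, here is a quote I like, which is more or less related to this topic: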

“There are two kinds of gifts. First, there is the innate gift of a given skill. This is a minor gift. If you have this gift, a skill such as doing math or playing the piano comes naturally to you. There are millions of people with minor gifts of all kinds who never do anything great with their gifted skills, because they lack the major gift.

The major gift is the love of the work. This might seem backward. How can love of using a skill be more important than the skill itself? It is for this simple reason: if you have a major gift, you will do things with the skills you have. And keep doing them. And your love of the work will shine through. And through practice, your skills will grow and become more powerful, until your skills are as great or greater than someone who only has the minor gift.

There is only one way to find out if you have the major gift. Start down the path, and see if it makes your heart sing.”

From “The Art of Game Design” [Schell ’08]