Leaving IBM

To be honest, this is probably the most difficult post I have ever written. That is mainly because there is a ton of stuff I want to say, but I’m unsure whether I should make it public or keep it to myself. Another factor that makes this post hard to write is how long I have been drafting it. I have been drafting this post since April 2016, right after I decided to start the whole quit-IBM-and-get-a-PhD project. I used this post as a log to record events and feelings whenever something happened around me at IBM. Frankly, when I look back at what I recorded (mostly rants), a lot of it still holds, but the anger has passed with time. So that year-long drafting makes me hesitate even more, because the mood in which those things were written is gone. However, two years is a significant amount of time, quitting IBM can be called “the end of an era”, and I should give closure to my happy-and-bitter experience with IBM anyway. So, here it goes.

 

Thank you, IBM!

I’m really thankful for the opportunity to work at IBM. The experience made me grow both technically and mentally. Technically, I had the opportunity to get hands-on experience with DB2 development. DB2 as a database engine is extremely complex. It has over 10 million lines of code, way beyond the scope of any school project. Working on a project like that is quite challenging because there is no way you can get a clear understanding of every part of it. I still remember that when I attended the new hire education on DB2, one guy said: “I have been working on the DB2 optimizer for over 10 years but I cannot claim with certainty that I know every bit of the component I own.” This really shocked me and, based on my experience so far, his claim still holds, but with one subtle assumption, which I’ll talk about later. Lots of tools are developed internally, and reading through both the code and the tool chains is a great fortune for any self-motivated developer. I picked up a lot of skills along the way: C, C++, Makefile, Emacs, Perl, Shell, AIX and many more. I really appreciate this opportunity, and I feel my knowledge of databases and operating systems has grown a lot since my graduation from college.

Mentally, there were also lots of gains. Being a fresh grad is not easy. Lots of people get burned out because they are like people who are thrown into the water to learn to swim: either swim or drown. I’m lucky that my first job was with IBM because the atmosphere is so relaxed: people expect you to learn on your own, but they (the majority of them) are also friendly enough to give you a hand when you need help. I still remember that my first customer ticket was a severity one issue, which requires you to update the customer on your progress daily. There was a lot of pressure on me because I really had no clue about the product at the very beginning. I’m thankful to those who helped me at that time and in many difficult moments afterwards. That made me realize how important it is to be nice and stay engaged with the people around you, because no matter how good you are with the technology and the product, there is always stuff you don’t know. Staying engaged with the people around you may help you get through difficult moments like this by giving you a thread that you can at least start pulling. In addition, participating in the Toastmasters club really improved my communication and leadership skills and, more importantly, I made tons of friends inside the club. Without working at IBM, I probably wouldn’t even know that Toastmasters exists. If you happen to follow my posts, you’ll see that a lot went on around me while I worked at IBM. Every experience you go through offers you a great opportunity to learn and improve yourself. Some people may look at them as setbacks, but I look at them as opportunities.

[Image: toastmasters1]

(The picture on the left shows all the comments people gave me about my speeches, and the one on the right shows the awards I earned in the club over these two years.)

With the help of all those experiences, I have developed the good habits of writing blog posts (both technical and non-technical), reading books, and working out six days a week. None of this would be possible if I worked at a place where overtime is common. I’m very thankful to IBM for this because staying healthy both physically and mentally is critical for one’s career. Even though these things don’t come directly from IBM, IBM does provide the environment that nurtures them.

 

IBM has its own problems, and they are centered around people. There are many things I want to say, but I think I’ll keep them to myself. Instead, I want to make my point with a picture:

[Image: ibm_survey]

I don’t know why IBM’s term “resource action” for firing employees and the sentence “IBM recognizes that our employees are our most valuable resources.” bother me so much. I probably just hate the word “resource” as a way to describe people directly, and how much this word gets thrown around at IBM. I know everyone working for a big corporation is like a cog in a machine. However, what I feel, based on lots of things that happened around me, is that IBM, with its attitude represented by its first-line managers (because those are the people I commonly work with), makes this fact very explicit. It hurts, to be honest. No matter how hard you work and no matter how many prizes you have earned for yourself and your first-line manager, you are nothing more than a cog in a machine that is not worth a high price to keep around, because there are many cogs behind you ready to replace you. They are much cheaper, much younger, and can more or less work like you, because your duty in the machine is so precisely specified that it doesn’t really depend on how much experience you have under your belt. To me, that’s devastating.

This leads to the problem that talented people are reluctant to stay with the company. My mentor and the people who are really good with DB2 have bid farewell to the team. That’s really sad to me because they are the true assets of the company and the product. The consequence is that crucial knowledge leaves with people. Some quirks in the product are known only to a few people, and once they leave the company, the knowledge is gone with them. That makes mastering the product even harder. That’s the subtle assumption the person made during the new hire education, and it’s also part of the problem when working with legacy code. The whole legacy code issue is worth another post, but one thing I now strongly believe is that any technical problem has its root cause in company culture and management style. As for me, I’m not a guru now, and I cannot see a way to become one in my current position, which scares me the most.

That’s it for this section and I’ll leave the rest to my journal.


First time ever working as a host

I’m not sure how things work in other parts of the world, but in China, we usually throw a company-wide “party” near the end of the year. By “party” I don’t mean that people get hammered and act silly. The “party” is sponsored by the company as a thank-you event for all the employees. During the “party”, we usually do multiple rounds of lucky draws for prizes and we also put on some shows: dancing, solo singing, rap, and so on. This is what we call “年会” (annual party) in Chinese. Even international companies like IBM are no exception to this convention.

Probably due to the really early Spring Festival (in January), we put off our “party” until today (Feb. 18th). However, I have to say that the preparation for this event started a long time ago. The audition for the hosts was on Jan. 14th, and I’m guessing the preparation really started near the end of 2016. The reason I wanted to participate this year is that I want to make the most of the IBM experience. Usually, I’m not a huge fan of attending a big, crowded entertainment event as an audience member. I want to be in it and try to learn something or have a unique kind of fun from my participation if possible. In addition, I have been in the Toastmasters club for more than a year now and I wanted to see whether I have really made some progress in public speaking. So the day I received the email calling for host auditions, I didn’t hesitate and clicked the “signup” button.

Boy, I have to say that show business can be quite energy-consuming. Unlike working as a programmer dealing with physical or virtual devices, the central task in show business revolves around people. You need to try really hard to get people to like you and enjoy whatever you deliver (voice, words, sentences, gestures). The first big checkpoint for me was passing the audition. When I got the agenda for the audition, I realized the competition was somewhat fiercer than I had thought. There were around 15 people competing for the host spots: two boys and two girls. Someone even told me that the returning hosts from the previous year were also in the game. The task for the audition was to read part of the previous year’s script while staring at the camera. This made me super nervous because a camera is really like a black hole: you never get any feedback on what you send into it. I had to push all my energy out and make myself super hyped for this job even though that’s really not who I am in daily life. I raised my voice super loud and wove over-dramatic gestures in between. I walked around the stage like some rap star and tried to make myself the owner of the stage. At some point, I felt like I had blacked out. I didn’t care what lines I was reading; I just pushed my boundaries to make the whole atmosphere super hot. On my way home, my brain couldn’t function properly and I fell fast asleep in the cab. Luckily, after almost two weeks, I got the callback from the casting director.

The next huge part of the task was working on the lines and rehearsing repeatedly with the other three hosts. Just like the documentaries or behind-the-scenes footage on TV showing what actors do on set, we needed to sit around and go through the script line by line to make sure we worked as a cohesive group, so that the actual event would go as smoothly as possible. Another big reason for this grinding is to get to know your partner: what your partner is comfortable with and what not. Probably unlike actors, we had almost full control over what we said on stage. Usually, we sat together and came up with a draft, sent it out for review, and then repeated the same cycle for the next draft. By the time we went on stage, we were on draft number 21. The hardest part for me was separating the role from my actual self. In other words, I had to repeatedly tell myself that who I play on stage is really not who I am in daily life, and that people will not judge me for it. This can be super hard, and I can sense that actors may need to remind themselves of this, especially when they are really into the part they are playing.

In my situation, since my partner and I were rookies at hosting, we were in charge of the pre-show warmup and the transitions between programs. The director definitely did not want to leave the lucky draw part to us because in that part, you need to deal with many big bosses who come on stage as guests to draw the numbers. You cannot get their titles wrong, you cannot get their names wrong, and there is no script for the bosses, so you have to think on your feet to handle any unexpected incident. To warm up the atmosphere before the show, the casting crew’s idea was to play games. In one of the games, the audience has to make a facial expression based on the host’s description, and I was the one who gave the demo: the trademark facial expression of the famous Chinese comedian “小岳岳” (Xiao Yueyue):

[Image: the comedian’s trademark facial expression]

I have to say it was a big challenge for me because I don’t use Weibo, which is the major repository for those GIFs, and I’m not really into this meme style. But I somehow still made it, with a somewhat self-loathing mindset.

Rehearsal is really time-consuming and frustrating. We rehearsed yesterday from 1pm to 10pm and this morning from 9am to 12pm. Most of that time had nothing to do with hosting but with the programs themselves. We were stuck there partly because we needed to work on the actual transitions and keep refining the lines. Some problems with the lines can hardly be noticed when reading off stage. For example, originally, my partner was supposed to frame my face with her arms in parallel to make it look like I was doing the facial expression on TV. However, we realized that this was impossible to do on stage because she was holding a microphone in one hand and I was beyond her reach. Another example is from the other group. Since they were mainly in charge of the lucky draw part, lots of props needed to be put on the stage, for instance, the box containing all the numbers that the guest picks from. We hadn’t considered that there would be a chunk of time in which we needed to say something to avoid awkward silence while the staff put the box on the stage. There were many examples like this during rehearsal and, to my surprise, new problems kept appearing no matter how many times we had rehearsed on stage.

1:30pm today was when we needed to actually get the job done. Without doubt, problems happened. My partner is really slim and the dress she chose was a little too big. It took almost 20 minutes to somewhat fix the dress on her, and we were supposed to be on stage for the pre-show warmup at 1:30pm. In the end, the warmup was cut down to only one game and the order of the lines got messed up. I don’t blame my partner because things always happen in a live show. This made me realize that there is a lot to practice and learn to be a professional host, and that experience in this profession is really, really important. One lesson from this scenario is that you need to be smart and keep your rhythm on stage even when things don’t go as expected.

Overall, this was a really unique experience for me and I actually learned something from it. The following are some notes I took during a training session with a professor from the Communication University of China. The session was conducted in Chinese and I don’t know the corresponding terms in English. So bear with me:

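  • Generally aim your gaze at the main camera position. If you are nervous, you can look upward at an angle of about 25 degrees.
  • Your gaze should generally take in every audience member. With the main camera position as the axis, the fan-shaped area spanning 45 degrees to the left and right is the area your gaze should cover.
  • Your gaze should generally cover the whole venue, that is, the left-hand area, the middle area, and the right-hand area, resting on each area for about 5 seconds.
  • On stage, keep four points in one line: head, shoulders, hips, and heels. Stand straight. No fidgeting or swaying.
  • For men, the hand holding the microphone hangs down naturally. Keep the same posture when walking onto the stage.
  • Keep the microphone about two hand-widths from your lower lip to ensure good sound pickup.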

Lastly, let me post some photos from the event as a good memory for this incredible journey.

[Slideshow: photos from the event]

 

 

First Time Ever Hackathon

I had never been to a hackathon before. In my imagination, a hackathon is like a festival for people who are passionate about creating cool stuff in a limited amount of time. A hackathon brings tons of fun if you can work with dedicated people on an interesting idea and try to turn it into reality. Luckily, the Early Professional Hire (EPH) Hackathon hosted by the China Development Laboratory (CDL) met my expectations perfectly.

Experience Recount

When I first heard about the hackathon on EPH Day 5, I was a little intimidated. This is because the solution demo you present at the end needs to fit one of the hackathon topics. The topics included business platform innovation, light-weight e-commerce reinvention, blockchain, application of Watson technology to medical services, and integration of Watson technology with 3D demonstration. All these topics are super cool and very advanced, and I barely get a chance to touch them in my day-to-day work. So I wasn’t sure whether I could handle them well. However, I wanted to give this hackathon a shot: not only because it is part of the EPH program but, more importantly, because one of the topics really caught my eye, namely the integration of Watson technology with 3D demonstration.

This topic actually has a name: Watson Introspector. It is a cognitive tool for understanding software, answering questions, and interacting with software architecture and data flows in 3D. This topic suits my interests perfectly. It has always been a challenge for newcomers to study a code base, especially one that has evolved over several years. Which functions or data flows are involved in a certain feature of the software is the type of question we ask all the time, especially when a bug fix or enhancement request kicks in. Conventionally, there are tools that help us visualize the code path, like debug traces, UML diagrams, and so on. However, none of them is straightforward or fun to use in the context of newcomer education. I can hardly imagine anyone choosing to stare at a debug trace on a Friday night over going out on a date. So, I think visualizing the code path in 3D and integrating a question answering system (like Watson) may make software developers’ lives much easier.

After settling on the topic I wanted to work on the most, the next step was to get a team. Originally, there were four members on the team, including me. None of the others had any experience with any technology involved in the topic. This was perfect because neither did I. However, all of them were testers, which made my situation a little difficult: I was the only developer on the team and I would take on much more responsibility than I had thought. But that was OK, because my teammates wanted to grow with me and give the hackathon a try. So, I became the captain of the team.

Everything went quite well for the first two weeks. We settled on the architecture of the solution: we were going to make a 3D space game modeled on the classic snake game. In study mode, the player is free to explore the 3D world, which is constructed from the classes and functions parsed from a randomly selected Java project. Inside the world, the player can tell Watson what kind of feature he wants to learn about, and Watson returns the code path that best matches the request. The player can then spend time getting familiar with the call stack of the functions along with the purpose of each function. In test mode, the player has to visit the functions he just studied in the correct call order to win the game. Our goal in making this game was to offer a fun way to learn about a source code project, and we believed an educational game best suited our needs.
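To make the test mode rule concrete, here is a minimal Python sketch of how a "visit the functions in call order" check could work. It is not the code from our demo: the function name `visited_in_call_order` and the sample function names are made up for illustration, and I assume that wandering into functions outside the studied code path is harmless while touching a studied function out of turn loses the round.

```python
from typing import Iterable, Sequence


def visited_in_call_order(code_path: Sequence[str], visits: Iterable[str]) -> bool:
    """Return True if every function of code_path was visited in call order.

    Assumption for this sketch: visits to functions outside the code path
    are ignored, but visiting a code-path function out of turn fails.
    """
    expected = 0  # index of the next function the player must reach
    on_path = set(code_path)
    for name in visits:
        if name not in on_path:
            continue  # exploring unrelated functions does not count
        if expected < len(code_path) and name == code_path[expected]:
            expected += 1  # correct next step on the code path
        else:
            return False  # a studied function was hit at the wrong time
    return expected == len(code_path)


# Hypothetical code path returned for a studied feature.
path = ["main", "parse", "plan", "execute"]
print(visited_in_call_order(path, ["main", "helper", "parse", "plan", "execute"]))
print(visited_in_call_order(path, ["main", "plan", "parse", "execute"]))
```

The check is deliberately strict; a friendlier game could forgive revisiting a function that was already completed.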

Then, on September 14th, everything changed. Right after we settled on the architecture of the solution, two members decided to quit, citing limited time. I had a feeling that they were going to quit someday, but I didn’t expect this timing. They refused to work on the solution outside of work hours. This frustrated me quite a bit because anyone who attends a hackathon should expect to spend a fair amount of time outside of work to finish a demo. Even worse, this meant we probably didn’t have enough people to finish our solution. It looked like mission impossible with only one developer and one tester left on the team. But then a second thought came to my mind. I work as the president of the IBM Diamond & Ring Toastmasters Club. One important lesson I have learned from it is that, as a leader, the first priority is to take responsibility and get the job done no matter how difficult the situation is. In my situation, my goal was to at least finish this hackathon, and I needed to make that happen. Plus, I was not alone: I still had a teammate, Rachel, who wanted to give everything she had in order to succeed in the event. I just could not let her and myself down.

In the final week, we worked super hard with our adviser, Trent, to get a demo working. Even during the Mid-Autumn Festival holiday, we still came to the office around 2 pm and hacked through the rest of the day until 1-2 am. That was the theme for the whole week. On Tuesday, September 20th, right before the final day, we worked for over 30 hours, until 4 am on September 21st, on bug fixes and 3D modeling. Rachel and Trent live closer to the office, so they rushed home to get some sleep. I, however, live really far away from the company (near the southern 4th Ring Road of Beijing), so unfortunately I had to take a nap on the office couch to avoid being late for the draw that decided the team presentation order, which took place four hours later.

Even though we almost lived inside the company, we still hadn’t finished our demo on the morning of the final day. There was a performance issue with our game during the launch phase: since we talked to Watson at the same time the game assets were loading, the frame rate of the game dropped significantly. In addition, we hadn’t figured out a way to grow our character’s body as in the classic snake game. This put a lot of pressure on me because there wasn’t enough time to fix everything in a nice, clean way. However, we somehow managed to finish all this by demo time. We adopted an agile approach: maybe we could not fix these problems nicely, but we could definitely work around them for the demo’s sake. We did incremental world object construction during the game loading phase: we only loaded the objects that the player could actually see through the camera, and we used a big skybox to hide the unloaded part of the world from the user. For the snake body problem, each time the character hit the target, we put a sphere behind the character and somehow managed to make the newly added part follow the character’s movement. Maybe the movement didn’t mimic a snake’s body nicely, but for the sake of the demo, that was enough.
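For the curious, here is one common way to make added body parts follow a character, sketched in Python. This is not our demo code: it is a generic "position history" trick I am using for illustration, and the class name `SnakeTrail`, the `spacing` parameter, and the tuple-based positions are all assumptions of this sketch. The idea is to remember where the head has been and place segment number i at the spot the head occupied `i * spacing` updates ago.

```python
from collections import deque
from typing import Deque, List, Tuple

Vec3 = Tuple[float, float, float]


class SnakeTrail:
    """Body segments replay the head's past positions (illustrative sketch)."""

    def __init__(self, start: Vec3, spacing: int = 3) -> None:
        self.spacing = spacing      # updates between consecutive segments
        self.segment_count = 0      # spheres attached so far
        self.history: Deque[Vec3] = deque([start])  # newest position first

    def grow(self) -> None:
        """Attach one more sphere, e.g. when the character hits a target."""
        self.segment_count += 1

    def move_head(self, position: Vec3) -> None:
        """Record the head's new position and drop history nobody needs."""
        self.history.appendleft(position)
        keep = self.segment_count * self.spacing + 1
        while len(self.history) > keep:
            self.history.pop()

    def segment_positions(self) -> List[Vec3]:
        """Segment i sits where the head was i * spacing updates ago."""
        positions = []
        for i in range(1, self.segment_count + 1):
            # Clamp to the oldest known position until enough history exists.
            index = min(i * self.spacing, len(self.history) - 1)
            positions.append(self.history[index])
        return positions


trail = SnakeTrail((0.0, 0.0, 0.0), spacing=2)
trail.grow()
for x in range(1, 5):
    trail.move_head((float(x), 0.0, 0.0))
print(trail.segment_positions())
```

A freshly added segment starts on top of the oldest remembered position and trails out as the head keeps moving, which is rough but good enough for a demo.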

During the demo, everything worked well. Thanks to the public speaking practice I have kept up at the Toastmasters club, I delivered a successful presentation to the lab’s senior management, and we took 2nd place in the hackathon with the fewest team members among all the competing teams.

Here are some interesting stats that are worth mentioning:

  • We had 0 experience with the technology stack of the hackathon
  • We originally had 4 team members but were down to 2 halfway through the event
  • We ended up with only 1 developer and 1 tester
  • We had only 1 person with sufficient programming experience
  • We worked late into the night for at least 3 days
  • The longest non-stop hacking session lasted 30 hours
  • We consumed 50+ bottles of water and bags of snacks
  • We watched 60+ hours of video tutorials on YouTube and safaribooksonline.com
  • We wrote 2500+ lines of code for the demo

 

Hackathon Takeaway

Always remember you’re the captain

There were a couple of times I wanted to quit the event. Thankfully, I didn’t, because I always reminded myself that I was the captain of the two-man squad. When you set a goal, you have got to do whatever you can to reach it and get the job done. Thanks to this hackathon, I can now clearly see this point.

Stay positive during the difficult situation

There were low points throughout the hackathon: facing technology we had never actually experimented with before entering the contest; two of the original team members leaving the team; being unfamiliar with the development tools; debugging code late into the night... All these things can drag morale down pretty quickly. However, I was the leader, and I could not let that happen. So, even in these difficult moments, I tried to call the team together for a short break, and we entertained ourselves by coming up with jokes or having some random chats. These techniques worked amazingly well because we didn’t feel stressed and could actually enjoy the whole problem-solving process. Without the fun, the hackathon would never have been the same to me.

Don’t be afraid of making the tough call

To be honest, I was the chief solution architect of our demo and sometimes I had to make tough calls, especially when both options looked tempting. For instance, implementing the game like the classic snake game or like Super Mario were both good options. However, once we took time and various other resource limits into consideration, the two options were not equal. Super Mario has its advantage in 3D exploration, while the snake game reflects the core idea of our solution: make the body grow with the code path. I had to make the tough call on which path to pursue and, I have to say, it’s never easy.

Motivation and hard work are the keys to success

Motivation gives you the courage to take the first step, but only hard work can get you to the goal. In this hackathon, I feel lucky that I followed my interest and chose Watson Introspector as my topic. My interest gave me enough motivation to power through the whole event. However, I knew deep in my heart that I wanted to win and I wanted my demo to reflect the technical expertise we have. That requires hard work. Thankfully, we didn’t let ourselves down and we worked super hard towards our goal. I’m so glad we finally made it.

Public speaking is crucial

I feel our solution may not use all the fancy technologies that some other groups did, but I feel our public speaking and presentation skills compensated well for this “disadvantage”. During the presentation, I used the numbers above to define our hackathon experience, in a speech style like that of Steve Jobs and our beloved CEO, Ginni Rometty. This delivered a concrete message to the audience: we came here to win and we deserve it. I wanted to give the judges the feeling that we were 120% confident in our solution and, more importantly, that we were proud of it.

All in all, this hackathon has become one of the moments in which I’m extremely proud of myself as an IBMer, and I learned a ton from it: not just technical stuff but how to be a leader as well.

I want to end my post with the slogan of our project:

Evolve ourselves, beyond the limit!

Sqoop2 7 Minutes Demo with DB2

In this post, I’m walking you through the Sqoop 1.99.6 “5 Minutes Demo” with an emphasis on working with DB2. I hope this post saves you some time googling (believe me, I spent quite some time getting everything figured out when doing this hands-on practice on my own). Let’s get started.

Preparation phase

Installation

I highly recommend using Cloudera’s Hadoop distribution, CDH, for this hands-on lab because CDH comes with Sqoop2 pre-installed and well-configured. I have tried installing Sqoop2 on Hortonworks HDP, and the experience was not fun. So, I will assume you work with CDH for the rest of the walkthrough.

I’m working with Cloudera CDH 5.7.0 and “sqoop-1.99.6-bin-hadoop200”.

  1. Once you have Sqoop downloaded, untar it on the machine where the DB2 instance is installed.
  2. Download the JDBC 4.0 driver from IBM that matches your DB2 version.
  3. Install the driver into CDH’s Sqoop and restart the Sqoop service:
sudo cp db2jcc4.jar db2jcc.jar /var/lib/sqoop2/
sudo /sbin/service sqoop2-server stop
sudo /sbin/service sqoop2-server start

Create a test database and table in DB2 to work with

$ db2 list db directory

System Database Directory

Number of entries in the directory = 1

Database 1 entry:

Database alias                       = HZY
Database name                        = HZY
Local database directory             = /home/iidev20
Database release level               = 14.00
Comment                              =
Directory entry type                 = Indirect
Catalog database partition number    = 0
Alternate server hostname            =
Alternate server port number         =

$ db2 connect to hzy

  Database Connection Information

Database server        = DB2/LINUXX8664 11.1.0
SQL authorization ID   = IIDEV20
Local database alias   = HZY

$ db2 list tables

Table/View                      Schema          Type  Creation time
------------------------------- --------------- ----- --------------------------
TEST1                           IIDEV20         T     2016-07-12-20.33.38.801694

  1 record(s) selected.

$ db2 "select * from test1"

C1
-----------
          4

  1 record(s) selected.

Creating Link Object

Check for the registered connectors on your sqoop server:

sqoop:000> show connector
+----+------------------------+-----------------+------------------------------------------------------+----------------------+
| Id |          Name          |     Version     |                        Class                         | Supported Directions |
+----+------------------------+-----------------+------------------------------------------------------+----------------------+
| 1  | kite-connector         | 1.99.5-cdh5.7.0 | org.apache.sqoop.connector.kite.KiteConnector        | FROM/TO              |
| 2  | kafka-connector        | 1.99.5-cdh5.7.0 | org.apache.sqoop.connector.kafka.KafkaConnector      | TO                   |
| 3  | hdfs-connector         | 1.99.5-cdh5.7.0 | org.apache.sqoop.connector.hdfs.HdfsConnector        | FROM/TO              |
| 4  | generic-jdbc-connector | 1.99.5-cdh5.7.0 | org.apache.sqoop.connector.jdbc.GenericJdbcConnector | FROM/TO              |
+----+------------------------+-----------------+------------------------------------------------------+----------------------+

The Generic JDBC Connector in our example has persistent id 4, and we will use this value to create a new link object for this connector:

sqoop:000> create link -c 4
Creating link for connector with id 4
Please fill following values to create new link object
Name: Fist Link

Link configuration

JDBC Driver Class: com.ibm.db2.jcc.DB2Driver
JDBC Connection String: jdbc:db2://9.112.250.80:50591/HZY
Username: iidev20
Password: ********
JDBC Connection Properties:
There are currently 0 values in the map:
entry#
New link was successfully created with validation status OK and persistent id 4

There are two things worth mentioning about DB2:

  1. “JDBC Driver Class” for DB2 is “com.ibm.db2.jcc.DB2Driver”.
  2. In “JDBC Connection String”, I specify the IP address of the machine where the DB2 instance resides, then the port number DB2 listens on, and lastly the name of the database we just created. The port number “50591” can be obtained by:
$ db2 get dbm cfg | grep SVCE
 TCP/IP Service name                          (SVCENAME) = iidev20
 SSL service name                         (SSL_SVCENAME) =

$ grep xiidev20 /etc/services
xiidev20                      50591/tcp
xiidev20_int                  50592/tcp

Our new link object was created with assigned id 4.
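If you prefer to script the port lookup and the connection string instead of running grep by hand, the following Python sketch shows the idea. The helper names `find_service_port` and `db2_jdbc_url` are my own for this example and are not part of Sqoop or DB2; I only assume the usual `name port/protocol` layout of /etc/services and the `jdbc:db2://host:port/database` format used above.

```python
from typing import Optional


def find_service_port(services_text: str, service_name: str) -> Optional[int]:
    """Return the TCP port registered for service_name, or None if absent."""
    for line in services_text.splitlines():
        line = line.split("#", 1)[0]  # drop trailing comments
        fields = line.split()
        if len(fields) >= 2 and fields[0] == service_name:
            port, _, protocol = fields[1].partition("/")
            if protocol == "tcp" and port.isdigit():
                return int(port)
    return None


def db2_jdbc_url(host: str, port: int, database: str) -> str:
    """Build a connection string in the format used for the link above."""
    return f"jdbc:db2://{host}:{port}/{database}"


# Sample lines copied from the grep output shown earlier in this post.
services = """
xiidev20                      50591/tcp
xiidev20_int                  50592/tcp
"""
port = find_service_port(services, "xiidev20")
if port is not None:
    print(db2_jdbc_url("9.112.250.80", port, "HZY"))
```

On a real machine you would read the text from /etc/services and take the service name from the SVCENAME value of your own instance.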

Let us create another link object but this time for the hdfs-connector instead:

sqoop:000> create link -c 3
Creating link for connector with id 3
Please fill following values to create new link object
Name: Second Link

Link configuration

HDFS URI: hdfs://quickstart.cloudera:8020/
New link was successfully created with validation status OK and persistent id 5

“quickstart.cloudera” is the hostname, which you can obtain by running the hostname command on the CDH machine.

Now, we have created two links. You can check by running show link --all:

sqoop:000> show link --all
2 link(s) to show:
link with id 4 and name Fist Link (Enabled: true, Created by iidev20 at 8/4/16 11:57 AM, Updated by iidev20 at 8/4/16 11:57 AM)
Using Connector generic-jdbc-connector with id 4
  Link configuration
      JDBC Driver Class: com.ibm.db2.jcc.DB2Driver
      JDBC Connection String: jdbc:db2://9.112.250.80:50591/HZY
      Username: iidev20
      Password:
      JDBC Connection Properties:
link with id 5 and name Second Link (Enabled: true, Created by iidev20 at 8/4/16 12:36 PM, Updated by iidev20 at 8/4/16 12:36 PM)
Using Connector hdfs-connector with id 3
  Link configuration
    HDFS URI: hdfs://quickstart.cloudera:8020/

Creating Job Object

Now, we create the job object:

sqoop:000> create job -f 4 -t 5
Creating job for links with from id 4 and to id 5
Please fill following values to create new job object
Name: sqoopy

From database configuration

Schema name: IIDEV20
Table name: TEST1
Table SQL statement:
Table column names:
Partition column name: C1
Null value allowed for the partition column:
Boundary query:

ToJob configuration

Override null value:
Null value:
Output format:
  0 : TEXT_FILE
  1 : SEQUENCE_FILE
Choose: 0
Compression format:
  0 : NONE
  1 : DEFAULT
  2 : DEFLATE
  3 : GZIP
  4 : BZIP2
  5 : LZO
  6 : LZ4
  7 : SNAPPY
  8 : CUSTOM
Choose: 0
Custom compression format:
Output directory: /cloudera/

Throttling resources

Extractors: 2
Loaders: 2
New job was successfully created with validation status OK  and persistent id 3

sqoop:000> show job
+----+--------+----------------+--------------+---------+
| Id |  Name  | From Connector | To Connector | Enabled |
+----+--------+----------------+--------------+---------+
| 3  | sqoopy | 4              | 3            | true    |
+----+--------+----------------+--------------+---------+

The idea here is quite simple. We use the JDBC connector link to read data from our DB2 table (indicated by -f 4), and then we write the data to HDFS through the HDFS connector link (indicated by -t 5).

Here, you need to pay attention to “Partition column name: C1”. If you don’t specify the partition column, you may hit the following error when starting the job:

Exception has occurred during processing command
Exception: org.apache.sqoop.common.SqoopException Message: CLIENT_0001:Server has returned exception

Our new job object was created with assigned id 3.
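
Why does the partition column matter? My understanding is that the generic JDBC connector uses the minimum and maximum values of this column to split the table into ranges, so that several extractors can read in parallel. Here is a rough sketch of that idea, assuming an integer column and an even split. split_range is a hypothetical helper and the generated SQL is only illustrative; this is not Sqoop's actual partitioner.

```python
def split_range(low, high, parts):
    """Split the inclusive integer range [low, high] into at most `parts` contiguous ranges."""
    if parts < 1:
        raise ValueError("parts must be at least 1")
    if low > high:
        raise ValueError("low must not exceed high")
    total = high - low + 1
    parts = min(parts, total)                 # never create empty ranges
    size, extra = divmod(total, parts)
    ranges = []
    start = low
    for i in range(parts):
        # The first `extra` ranges take one additional value each.
        end = start + size - 1 + (1 if i < extra else 0)
        ranges.append((start, end))
        start = end + 1
    return ranges


# Hypothetical example: C1 values between 1 and 10, two extractors.
for lo, hi in split_range(1, 10, 2):
    print(f"SELECT * FROM IIDEV20.TEST1 WHERE C1 >= {lo} AND C1 <= {hi}")
```

Without a partition column there is nothing to compute such ranges from, which would explain why the job refuses to start.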

Start Job (Data Transfer)

Now we are ready to start the sqoop job we just created:

sqoop:000> start job -j 3
Submission details
Job ID: 3
Server URL: http://9.181.139.59:12000/sqoop/
Created by: iidev20
Creation date: 2016-08-04 11:57:43 CDT
Lastly updated by: iidev20
External ID: job_1469730693462_0125
        http://quickstart.cloudera:8088/proxy/application_1469730693462_0125/
2016-08-04 11:57:43 CDT: BOOTING  - Progress is not available

You can iteratively check the status of your running job with the status job command:

sqoop:000> status job -j 3
Submission details
Job ID: 3
Server URL: http://9.181.139.59:12000/sqoop/
Created by: iidev20
Creation date: 2016-08-04 11:57:43 CDT
Lastly updated by: iidev20
External ID: job_1469730693462_0125
        http://quickstart.cloudera:8088/proxy/application_1469730693462_0125/
2016-08-04 11:58:30 CDT: RUNNING  - 0.00 %

sqoop:000> status job -j 3
Submission details
Job ID: 3
Server URL: http://9.181.139.59:12000/sqoop/
Created by: iidev20
Creation date: 2016-08-04 11:57:43 CDT
Lastly updated by: iidev20
External ID: job_1469730693462_0125
        http://quickstart.cloudera:8088/proxy/application_1469730693462_0125/
2016-08-04 11:58:47 CDT: RUNNING  - 50.00 %

sqoop:000> status job -j 3
Submission details
Job ID: 3
Server URL: http://9.181.139.59:12000/sqoop/
Created by: iidev20
Creation date: 2016-08-04 11:57:43 CDT
Lastly updated by: iidev20
External ID: job_1469730693462_0125
        http://quickstart.cloudera:8088/proxy/application_1469730693462_0125/
2016-08-04 12:00:02 CDT: SUCCEEDED
Counters:
        org.apache.hadoop.mapreduce.FileSystemCounter
                FILE_LARGE_READ_OPS: 0
                FILE_WRITE_OPS: 0
                HDFS_READ_OPS: 1
                HDFS_BYTES_READ: 110
                HDFS_LARGE_READ_OPS: 0
                FILE_READ_OPS: 0
                FILE_BYTES_WRITTEN: 373985
                FILE_BYTES_READ: 17
                HDFS_WRITE_OPS: 2
                HDFS_BYTES_WRITTEN: 2
        org.apache.hadoop.mapreduce.lib.output.FileOutputFormatCounter
                BYTES_WRITTEN: 0
        org.apache.hadoop.mapreduce.lib.input.FileInputFormatCounter
                BYTES_READ: 0
        org.apache.hadoop.mapreduce.JobCounter
                MB_MILLIS_MAPS: 20448256
                TOTAL_LAUNCHED_MAPS: 1
                VCORES_MILLIS_REDUCES: 82496
                TOTAL_LAUNCHED_REDUCES: 2
                VCORES_MILLIS_MAPS: 19969
                SLOTS_MILLIS_REDUCES: 82496
                MB_MILLIS_REDUCES: 84475904
                SLOTS_MILLIS_MAPS: 19969
                MILLIS_REDUCES: 82496
                MILLIS_MAPS: 19969
                OTHER_LOCAL_MAPS: 1
        org.apache.sqoop.submission.counter.SqoopCounters
                ROWS_READ: 1
                ROWS_WRITTEN: 1
                Shuffle Errors
                CONNECTION: 0
                WRONG_LENGTH: 0
                BAD_ID: 0
                WRONG_MAP: 0
                WRONG_REDUCE: 0
                IO_ERROR: 0
         org.apache.hadoop.mapreduce.TaskCounter
                MAP_OUTPUT_MATERIALIZED_BYTES: 17
                MERGED_MAP_OUTPUTS: 2
                SPILLED_RECORDS: 2
                REDUCE_INPUT_RECORDS: 1
                VIRTUAL_MEMORY_BYTES: 4516757504
                MAP_INPUT_RECORDS: 0
                SPLIT_RAW_BYTES: 110
                FAILED_SHUFFLE: 0
                REDUCE_SHUFFLE_BYTES: 17
                MAP_OUTPUT_BYTES: 3
                PHYSICAL_MEMORY_BYTES: 466186240
                GC_TIME_MILLIS: 1305
                REDUCE_INPUT_GROUPS: 1
                COMBINE_OUTPUT_RECORDS: 0
                SHUFFLED_MAPS: 2
                REDUCE_OUTPUT_RECORDS: 1
                MAP_OUTPUT_RECORDS: 1
                COMBINE_INPUT_RECORDS: 0
                CPU_MILLISECONDS: 7470
                COMMITTED_HEAP_BYTES: 393216000
Job executed successfully
  1. start job -j 3 -s lets you start a sqoop job and watch its running status. stop job -j 3 stops a running job at any time.
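
If you would rather script the check than read the output by eye, the status text can be parsed. This is a minimal sketch, assuming the output format shown above; parse_status is a hypothetical helper and not part of Sqoop.

```python
import re

# Matches lines such as "2016-08-04 11:58:47 CDT: RUNNING  - 50.00 %".
STATUS_RE = re.compile(
    r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2} \w+: (?P<state>[A-Z]+)(?:\s+-\s+(?P<progress>.+))?$"
)
# Matches counter lines such as "ROWS_READ: 1".
COUNTER_RE = re.compile(r"^\s*(?P<name>[A-Z_]+): (?P<value>\d+)$")


def parse_status(text):
    """Return (state, progress, counters) from the text printed by `status job`."""
    state, progress, counters = None, None, {}
    for line in text.splitlines():
        m = STATUS_RE.match(line.strip())
        if m:
            state, progress = m.group("state"), m.group("progress")
            continue
        m = COUNTER_RE.match(line)
        if m:
            counters[m.group("name")] = int(m.group("value"))
    return state, progress, counters


# Excerpt of the final status output above.
sample = """\
2016-08-04 12:00:02 CDT: SUCCEEDED
Counters:
        org.apache.sqoop.submission.counter.SqoopCounters
                ROWS_READ: 1
                ROWS_WRITTEN: 1
"""

state, progress, counters = parse_status(sample)
print(state, counters["ROWS_READ"] == counters["ROWS_WRITTEN"])
```

Comparing ROWS_READ with ROWS_WRITTEN is a quick sanity check that nothing was dropped during the transfer.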

 

Check Result

We can check the result in CDH:

[cloudera@quickstart ~]$ hdfs dfs -ls /cloudera/
Found 2 items
-rw-r--r--   1 sqoop2 supergroup          0 2016-08-04 09:59 /cloudera/33895be5-a670-4e25-aada-a66fc2cf1919.txt
-rw-r--r--   1 sqoop2 supergroup          2 2016-08-04 09:59 /cloudera/ffe359d6-afe9-40e9-baf9-d2e29937a86c.txt
[cloudera@quickstart ~]$ hdfs dfs -cat /cloudera/33895be5-a670-4e25-aada-a66fc2cf1919.txt
[cloudera@quickstart ~]$ hdfs dfs -cat /cloudera/ffe359d6-afe9-40e9-baf9-d2e29937a86c.txt
4

 

Extra 2 Minutes …

In this section, we transfer the data back from HDFS to the DB2 table:

sqoop:000> create job -f 5 -t 4
Creating job for links with from id 5 and to id 4
Please fill following values to create new job object
Name: h2d

From Job configuration

Input directory: /cloudera/
Override null value:
Null value:

To database configuration

Schema name: IIDEV20
Table name: TEST1
Table SQL statement:
Table column names:
Stage table name:
Should clear stage table:

Throttling resources

Extractors: 2
Loaders: 2
New job was successfully created with validation status OK  and persistent id 4

Then, we start the job:

sqoop:000> start job -j 4 -s
Submission details
Job ID: 4
Server URL: http://9.181.139.59:12000/sqoop/
Created by: iidev20
Creation date: 2016-08-04 14:53:20 CDT
Lastly updated by: iidev20
External ID: job_1469730693462_0134
        http://quickstart.cloudera:8088/proxy/application_1469730693462_0134/
2016-08-04 14:53:20 CDT: BOOTING  - Progress is not available
2016-08-04 14:53:37 CDT: BOOTING  - 0.00 %
2016-08-04 14:53:48 CDT: BOOTING  - 0.00 %
2016-08-04 14:53:58 CDT: BOOTING  - 0.00 %
2016-08-04 14:54:11 CDT: RUNNING  - 0.00 %
2016-08-04 14:54:21 CDT: RUNNING  - 0.00 %
2016-08-04 14:54:32 CDT: RUNNING  - 0.00 %
2016-08-04 14:54:43 CDT: RUNNING  - 0.00 %
2016-08-04 14:54:54 CDT: RUNNING  - 0.00 %
2016-08-04 14:55:05 CDT: RUNNING  - 0.00 %
2016-08-04 14:55:16 CDT: RUNNING  - 50.00 %
2016-08-04 14:55:27 CDT: RUNNING  - 50.00 %
2016-08-04 14:55:38 CDT: RUNNING  - 50.00 %
2016-08-04 14:55:49 CDT: RUNNING  - 50.00 %
2016-08-04 14:55:59 CDT: RUNNING  - 50.00 %
2016-08-04 14:56:10 CDT: RUNNING  - 50.00 %
2016-08-04 14:56:21 CDT: RUNNING  - 100.00 %
2016-08-04 14:56:35 CDT: SUCCEEDED
Counters:
      org.apache.hadoop.mapreduce.FileSystemCounter
              FILE_LARGE_READ_OPS: 0
              FILE_WRITE_OPS: 0
              HDFS_READ_OPS: 8
              HDFS_BYTES_READ: 427
              HDFS_LARGE_READ_OPS: 0
              FILE_READ_OPS: 0
              FILE_BYTES_WRITTEN: 494580
              FILE_BYTES_READ: 17
              HDFS_WRITE_OPS: 0
              HDFS_BYTES_WRITTEN: 0
      org.apache.hadoop.mapreduce.lib.output.FileOutputFormatCounter
              BYTES_WRITTEN: 0
      org.apache.hadoop.mapreduce.lib.input.FileInputFormatCounter
              BYTES_READ: 0
      org.apache.hadoop.mapreduce.JobCounter
              MB_MILLIS_MAPS: 109466624
              TOTAL_LAUNCHED_MAPS: 2
              VCORES_MILLIS_REDUCES: 135729
              TOTAL_LAUNCHED_REDUCES: 2
              VCORES_MILLIS_MAPS: 106901
              SLOTS_MILLIS_REDUCES: 135729
              MB_MILLIS_REDUCES: 138986496
              SLOTS_MILLIS_MAPS: 106901
              MILLIS_REDUCES: 135729
              MILLIS_MAPS: 106901
              OTHER_LOCAL_MAPS: 2
      org.apache.sqoop.submission.counter.SqoopCounters
              ROWS_READ: 1
              ROWS_WRITTEN: 1
              Shuffle Errors
              CONNECTION: 0
              WRONG_LENGTH: 0
              BAD_ID: 0
              WRONG_MAP: 0
              WRONG_REDUCE: 0
              IO_ERROR: 0
      org.apache.hadoop.mapreduce.TaskCounter
              MAP_OUTPUT_MATERIALIZED_BYTES: 29
              MERGED_MAP_OUTPUTS: 4
              SPILLED_RECORDS: 2
              REDUCE_INPUT_RECORDS: 1
              VIRTUAL_MEMORY_BYTES: 6016950272
              MAP_INPUT_RECORDS: 0
              SPLIT_RAW_BYTES: 420
              FAILED_SHUFFLE: 0
              REDUCE_SHUFFLE_BYTES: 29
              MAP_OUTPUT_BYTES: 3
              PHYSICAL_MEMORY_BYTES: 670957568
              GC_TIME_MILLIS: 4852
              REDUCE_INPUT_GROUPS: 1
              COMBINE_OUTPUT_RECORDS: 0
              SHUFFLED_MAPS: 4
              REDUCE_OUTPUT_RECORDS: 1
              MAP_OUTPUT_RECORDS: 1
              COMBINE_INPUT_RECORDS: 0
              CPU_MILLISECONDS: 9510
              COMMITTED_HEAP_BYTES: 453255168
Job executed successfully

Finally, we verify the result from DB2’s perspective:

$ db2 "select * from test1"

C1
-----------
          4
          4

  2 record(s) selected.

A recap on EPH program

This week (05/16 – 05/20), I attended the 2016 GCG Early Professional Hire (EPH) Program offered by the company. The following is a recap of the whole program with some of my thoughts.

Pre-show

The GCG Early Professional Hire (EPH) program run by IBM is a 2-year program aimed specifically at new employees with less than two years of working experience. It aims to develop core, valuable skills for new IBMers. When I first receive the advertisement email, my instinct tells me not to attend, even though it is required for new hires (you can opt out by obtaining approval from your manager). However, I figure it is a good chance to take a break from work and meet some people (some beauties if I’m lucky, and in fact there are some), so I withdraw my request not to attend.

Kick-off Event (05/16 – 05/19)

Day 1

Morning

The kick-off event is held at the Marco Polo Parkside Hotel in Beijing. It is a really fancy hotel, and I’m surprised that my company would spend so much money hosting an event in a hotel like this, especially since these have been rough years for GCG. The agenda for the first day consists of a bunch of speeches, BU introductions, and a welcome dinner. The 9 a.m. sign-in is quite tough for me, as the hotel is 11.5 miles from my apartment! However, “Watson Coffee” (some fruit and yogurt) helps me get through the wait for the event to start at 10.

The morning speeches are not very impressive. The first is delivered by Shally Wang, GM of GCG. I can hardly recall what she talked about, but her opening remark about moving the start time earlier to make it up to the people who got there early is quite thoughtful at some level.

The next speech is delivered by Anita Sabatino, a senior leader at IBM. I have to say her speech is the only shining point of Day 1. She recaps her career at IBM:

She starts as a software engineer at IBM and switches to sales once she is an advisory software engineer. She then works for JP Morgan for a couple of years before rejoining IBM. After that she moves to China with her daughter, who was adopted from China, and works with Bank of China. She gives examples of how to build trust with clients: for instance, meet with the leaders from BOC weekly and always be on time, and build a personal relationship to a reasonable degree that facilitates collaboration. She also shares some stories about her daughter.

This is quite a good speech because it has real substance. It is not hollow words without any meaning. It feels like a friend talking to you directly about her career. Plus, I’m always interested in people sharing their career stories and how they make decisions.

The last part of the morning consists of people from GCG sharing their stories with the new hires. They are not as senior as the previous speakers, but they are experienced. I don’t really listen to their talks because it’s already 12:30pm when they start and I’m starving. All in all, it’s just some show-value stuff so that they can brag to their bosses. Nothing new.

Afternoon & Evening

The lunch is a buffet, and I hear it costs around 200 RMB per person. It is quite good, and I have tons of steak and ice cream. The key word for the afternoon is “BORING”. It consists of speeches from different BU leaders (GTS, CAMSS, GBS, Technology Partnership), who essentially want you to get the big picture of their BU and appreciate its business value. The rest of the day is a nice welcome dinner and some shows from fellow IBMers. The shows are quite nice, but unfortunately I cannot stay until the last minute because they are still going on at 8:30pm and I’m afraid of missing the last subway home.

Day 2 – 4

These three days consist of four main parts: building your professional reputation@IBM, workplace etiquette, delivering quality work with agility, and business writing. Overall it is quite boring, but there are indeed some shining points worth mentioning:

[Image: mmexport1463657100629]

BU Session (05/20)

There are two great speeches delivered today: one by Ge Song, CDL cloud leader, and the other by Zhong Tian, the only Distinguished Engineer (DE) at CDL.

Ge Song’s Speech

Ge Song’s speech mainly focuses on takeaways she got from Things I Wish I Knew Before Working in Industry (I infer this source from her references during the speech, but it could be wrong as she didn’t explicitly cite it). The following are the key points she mentioned (I wrote them down based on my audio recording):

  1. Attitude is everything; be willing to do more. She draws on her own experience and offers an example: when she started her career at IBM, her manager sometimes got challenging tasks and asked if anyone was willing to take them. The most courageous sentence she could say at that time was “I can try it!” even when she knew she was totally capable of doing it. So she suggests that if you are pretty sure you can handle the task, always say “I can do it!” This is because it is the manager’s responsibility to help you succeed at your task. They will do whatever they can to help you (frequent reviews …) and to control the risk. They will not blame you for a failure, because it is their failure if you fail. Also, be willing to take on more tasks whenever possible and necessary. Don’t be the kind of person who cannot hang around any longer after 5pm and can’t wait to catch the first shuttle home. So, always remember “No pain, no gain”!
  2. Be visible (show value). The example she gives here is the global conference call. Usually, Chinese participants barely say anything during the call except “Hi! I’m Mark.” and “Bye Bye!”. That doesn’t work, because you don’t show your value. Here is a tip: if you know beforehand that the conference call will discuss some difficult problem, you can prepare for it. When the global team leader asks for input, you should speak up (because you’re already prepared).
  3. Find your mentor. Everybody knows what a mentor means for a person’s career. Here, she emphasizes that you should build solid skills (a foundation) before you ask to change mentors.
  4. Be yourself and build your identity (build your personal brand). You need to strive for excellence in the area you are working on (become a go-to person). However, you don’t have to care how people treat you. Build your expertise and keep learning! “Stay true to your passion for technology!”
  5. Think big and start small (aim high, and start with the small things). Don’t reach for what is beyond your grasp! Don’t keep thinking about how strong some leader is while forgetting the years of hard work he or she has put into the technical field. Again, she offers a tip regarding conference calls. You need to focus on two points during the call: (1) why is she asking this kind of technical question? (2) Develop your English speaking skills.
  6. Manage your time. You become what you spend the most time on.
  7. Priority. Briefly mentioned.
  8. Manage the risk. Briefly mentioned.
  9. Have the courage to say “No”! Briefly mentioned.
  10. Plan not only your career but also your life. Briefly mentioned.

Zhong Tian’s Speech

Zhong Tian’s speech focuses on sharing his technical career. I list some of the inspirational sentences he mentioned:

  • Keep learning!
  • Excel at what you do, and the world is yours!
  • Don’t think that because you are band 6 you should only do band 6 work. If you are band 6 and already doing band 7 work, your promotion is not far away!

Source code security

Well, this is a post that I started on 2016-04-15 and finally finished today…

Yesterday morning (04/15/16), when I came to the office, I got bad news from my manager: security had informed him that I had an abnormal checkout of code on Monday, 04/11/16. The way source code security works in our lab, and probably in other IBM labs, is that security tracks the frequency and quantity of each developer’s checkouts every day. They collect statistics and alert the first-line manager when something potentially terrible happens. For instance, if I usually check out code twice per day, around 20 source files each time, then checking out 3456 files in a single day, as I did on 04/11/16, will certainly set off the alarm. Believe me, 3456 is exactly the number my manager gave me. What did I do on that day? It turns out that I needed to make a special build on top of a GA build for a client, and I needed to include all the past code changes made specifically for this client plus my new code. To make a special build, we use scripts to check out the source files that need to change, merge the code, and run test buckets on them. That involves tons of checkouts and checkins. In the end, I successfully explained this to my manager and everything worked out.
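
The kind of check security runs can be imagined as a simple threshold on daily checkout counts. This is purely my own illustration, assuming a mean-plus-k-standard-deviations rule. I have no knowledge of the actual tooling, and is_abnormal, the history values, and the thresholds are all hypothetical.

```python
from statistics import mean, pstdev


def is_abnormal(history, today, k=3.0, min_margin=50):
    """Flag today's checkout count if it exceeds mean + max(k * stdev, min_margin)."""
    if not history:
        return False                          # no baseline yet, nothing to compare against
    baseline = mean(history)
    spread = pstdev(history)
    return today > baseline + max(k * spread, min_margin)


# Hypothetical baseline: about two checkouts of 20 files each per day.
history = [40, 38, 42, 40, 41, 39, 40]
print(is_abnormal(history, 45))     # an ordinary day -> False
print(is_abnormal(history, 3456))   # the special-build day -> True
```

The min_margin floor keeps a developer with a very steady history from being flagged over a handful of extra files.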

What interests me about this incident is that it is the first time I realized the power of ClearCase. I had never heard of ClearCase until I joined IBM. Back in college, I worked solely with Git, and I felt extremely uncomfortable when I first worked with ClearCase. However, after this incident, I personally started to feel that ClearCase is probably more powerful than Git at the security level. In the Git world, I need to fork or clone the repository so that I have a local copy of ALL the source code before I can start working on my branch. This is a problem in terms of security because I literally need to have all the code locally before I can work on my stuff. Making a branch on the remote repository has the same issue. In ClearCase, however, I only need to make a dynamic view and check out only the files I need to modify. If I check out too many files, that raises a warning, as it did this time. This security checking mechanism works well with ClearCase because:

  • There is a central server that holds all the source code. A corporation can simply monitor the checkout behavior on this central code repository.
  • The quantity of checkouts differs from person to person. In Git, it feels like the standard way is for everyone to check out all the source files even if you only need to modify one. With ClearCase, the number varies from person to person, which makes statistical monitoring of checkouts meaningful.

I’m not saying Git is bad. In fact, at IBM, we are starting to have GitHub Enterprise hosted on SoftLayer behind the IBM firewall. That is really great news for me because I can finally have the “social coding” experience that I have been enjoying outside of work. It will make some of the work I have done specifically for fellow IBMers more organized and easier to get. I don’t need to attach the code to emails sent one by one to each member of the team we collaborate with. I can simply send the Git repo to their team lead and every member of their team can access it simultaneously. Plus, having GitHub inside IBM also helps me track issues with the code I own and, again, saves a ton of communication cost.

Quick primer on checking database object privileges in DB2 LUW

db2talk

If you are a DBA, you will inevitably work on troubleshooting/ granting / revoking object privileges to database users. In this blog post, I am going to share how to check for privileges that have been granted to an object in a DB2 LUW database. This post is an introductory level post for new DBAs. Database level authorities are not discussed in this post.
