
Category Archives: Uncategorized

Zookeeper experience

While working on Kafka, a distributed pub/sub system (more on that later) at LinkedIn, I needed to use Zookeeper (ZK) to implement the load-balancing logic, and I'd like to share my experience of using it. First of all, for those of you who don't know, Zookeeper is an Apache project that implements a consensus service based [...]
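The excerpt doesn't describe the load-balancing logic itself. Below is a hypothetical sketch of one common pattern. Each consumer reads the same list of live consumer ids and partition ids (in a ZooKeeper-based design these might be registered as znodes) and deterministically computes its own contiguous share, so everyone agrees once they see the same membership. The function name `assign_partitions` and the range-splitting rule are assumptions for illustration. This is not Kafka's rebalancing code, and ZooKeeper itself is not called here.

```python
def assign_partitions(consumer_id, consumers, partitions):
    """Return the contiguous slice of partitions owned by consumer_id.

    Hypothetical range assignment: every consumer sorts the same inputs,
    so all of them compute the same split without extra coordination.
    """
    consumers = sorted(consumers)
    partitions = sorted(partitions)
    per_consumer, extra = divmod(len(partitions), len(consumers))
    idx = consumers.index(consumer_id)
    # The first `extra` consumers each take one additional partition.
    start = idx * per_consumer + min(idx, extra)
    count = per_consumer + (1 if idx < extra else 0)
    return partitions[start:start + count]


consumers = ["c2", "c0", "c1"]
for c in sorted(consumers):
    print(c, assign_partitions(c, consumers, list(range(8))))
```

Because the result depends only on the sorted membership lists, a consumer only has to recompute its share when it learns that membership changed. A ZooKeeper watch is one way to receive that notification.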

Kamikaze version 3.0.0 is released

Kamikaze is a utility package wrapping set implementations on sorted integer arrays. Search indexes, graph algorithms and certain sparse matrix representations tend to make heavy use of sorted integer arrays.
For example, in a search engine the index (also called an inverted index) contains, for each term t, an inverted list, which is typically a sequence [...]
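Set operations over sorted integer arrays are what make inverted lists useful. For example, answering a two-term AND query means intersecting the two terms' lists of document ids. Here is a minimal merge-style intersection, assuming each list is strictly increasing. The function name and the example ids are hypothetical, and this is not Kamikaze's API.

```python
def intersect_sorted(a, b):
    """Intersect two strictly increasing lists of doc ids in O(len(a) + len(b))."""
    i, j = 0, 0
    result = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            result.append(a[i])  # doc id present in both lists
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1  # advance whichever list is behind
        else:
            j += 1
    return result


# Hypothetical inverted lists for two terms.
postings_hadoop = [1, 4, 7, 9, 12]
postings_pig = [2, 4, 9, 13]
print(intersect_sorted(postings_hadoop, postings_pig))
```

Keeping the lists sorted is what lets a single forward pass suffice. Neither pointer ever moves backwards.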


LinkedIn Faceted Search

Faceted search was fully rolled out late last year. We want to give you some insight into how it came to be, some of its challenges, and what the future holds.

At scale and with relevance, faceted search makes a lot of sense on the rich structured data we have here [...]
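Faceted search pairs a keyword query with counts of matching documents for each value of a structured field, such as company or location. It also lets the user drill down by selecting one of those values. The following is a minimal sketch over hypothetical documents, using hypothetical helper names (`facet_counts`, `drill_down`). It does not reflect LinkedIn's implementation; a real engine would typically compute counts from index structures rather than by scanning dictionaries.

```python
from collections import Counter

# Hypothetical documents with structured fields.
DOCS = {
    1: {"company": "LinkedIn", "location": "Mountain View"},
    2: {"company": "LinkedIn", "location": "New York"},
    3: {"company": "Acme", "location": "Mountain View"},
    4: {"company": "Acme"},
}


def facet_counts(docs, matching_ids, field):
    """Count how many matching documents carry each value of `field`."""
    return Counter(
        docs[doc_id][field] for doc_id in matching_ids if field in docs[doc_id]
    )


def drill_down(docs, matching_ids, field, value):
    """Keep only the matching documents whose `field` equals `value`."""
    return [doc_id for doc_id in matching_ids if docs[doc_id].get(field) == value]


matches = [1, 2, 4]  # pretend these ids came back from a keyword query
print(facet_counts(DOCS, matches, "company"))
print(drill_down(DOCS, matches, "company", "LinkedIn"))
```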


When Pigs Fly: Apache Pig, Open Source and Understanding Systems

Pig at LinkedIn
Hadoop drives many of our most powerful features at LinkedIn. About half of our Hadoop jobs are submitted by Apache Pig. This means that, along with Azkaban and Voldemort, Pig is a large part of LinkedIn's data cycle: the process behind features like People You May Know and Who Viewed My Profile.
I have used Pig [...]


JNA

There are two groups of people in CS who want to control your program's interaction with the outside world (the operating system people and the programming language people), and they are always fighting over who will have this honor. The operating system people want to provide a set of functionality that is available to any programming language, and [...]

Beating Binary Search

A search exponentially faster than binary search, and a use for it.
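The excerpt doesn't name the technique. One well-known search that is exponentially faster than binary search is interpolation search. Binary search needs O(log n) probes. Interpolation search needs O(log log n) probes in expectation, provided the keys are roughly uniformly distributed; uniformly distributed hash values are a natural case. The sketch below assumes sorted integer keys. It is a generic illustration, not necessarily the method from the original post.

```python
def interpolation_search(keys, target):
    """Return an index of target in the sorted list `keys`, or -1 if absent.

    Instead of probing the midpoint, guess the position by linear
    interpolation between the values at the ends of the current range.
    Expected O(log log n) probes on uniform keys; O(n) worst case on skewed data.
    """
    lo, hi = 0, len(keys) - 1
    while lo <= hi and keys[lo] <= target <= keys[hi]:
        if keys[hi] == keys[lo]:
            return lo  # the whole range holds one value, and it equals target
        # Estimate where target should sit, assuming values grow linearly.
        pos = lo + (target - keys[lo]) * (hi - lo) // (keys[hi] - keys[lo])
        if keys[pos] == target:
            return pos
        if keys[pos] < target:
            lo = pos + 1
        else:
            hi = pos - 1
    return -1


keys = list(range(0, 3000, 3))
print(interpolation_search(keys, 2997), interpolation_search(keys, 1))
```

The guard `keys[lo] <= target <= keys[hi]` keeps every probe inside the current range and stops early once the target falls outside it.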

SOCC 2010 updates

We just came back from the 1st ACM Symposium on Cloud Computing (SOCC) in Indianapolis. The conference was co-located with SIGMOD and lasted a day and a half. A total of 7 people from LinkedIn attended SOCC, and the post below reflects the notes we took collectively. There were three keynote speeches, all of which are [...]

New docs for Norbert

We finally added some documentation for Norbert, our open source cluster management and RPC system.

Introducing the NIO SocketServer Implementation

Users of Voldemort have the option of using a binary protocol for efficient network communication between clients and nodes. On the server side, this is implemented using an abstraction known as a SocketServer. Previously, the only implementation of voldemort.server.socket.SocketServer used the classic thread-per-socket, blocking I/O approach to network communication.
Recently my NIO implementation [...]
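Non-blocking I/O (Java's NIO `Selector`) lets one thread watch many sockets and service only those that are ready, instead of parking one thread per connection. The sketch below shows the same idea with Python's `selectors` module as a stand-in for Java NIO; it is not Voldemort's code. The echo behaviour and the helper names `run_event_loop` and `demo` are hypothetical.

```python
import selectors
import socket


def run_event_loop(sel, idle_timeout=0.2):
    """Service every ready socket from one thread; stop after an idle period."""
    while True:
        events = sel.select(timeout=idle_timeout)
        if not events:
            return  # nothing became ready within idle_timeout seconds
        for key, _mask in events:
            conn = key.fileobj
            data = conn.recv(4096)
            if data:
                # Simplification: assume the reply fits in the send buffer.
                # A real non-blocking server would queue it and wait for EVENT_WRITE.
                conn.sendall(data)
            else:
                # An empty read means the peer closed the connection.
                sel.unregister(conn)
                conn.close()


def demo(n_clients=3):
    """Echo one request per client using a single selector-driven thread."""
    sel = selectors.DefaultSelector()
    clients = []
    for i in range(n_clients):
        server_end, client_end = socket.socketpair()
        server_end.setblocking(False)
        sel.register(server_end, selectors.EVENT_READ)
        client_end.sendall(f"request {i}".encode())
        clients.append(client_end)
    run_event_loop(sel)
    replies = [c.recv(4096) for c in clients]
    for c in clients:
        c.close()
    for key in list(sel.get_map().values()):
        sel.unregister(key.fileobj)
        key.fileobj.close()
    sel.close()
    return replies


print(demo())
```

All three requests are answered by the same thread. The number of threads no longer grows with the number of open connections, which is the main motivation for moving away from thread-per-socket.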

Building Voldemort read-only stores with Hadoop

A well-known lesson in scalability is that writes are 40x more expensive than reads. If your application becomes write-intensive, as easily happens when you are dealing with a sufficiently large number of users, you will be in trouble unless you design for scale. For example, if you are using MySQL, [...]
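One way to sidestep expensive online writes is to do all the writing in a batch job and serve reads from immutable files. The following simplified sketch of that idea assumes a hypothetical layout. Keys are hashed with MD5 and sorted into an index of (hash, offset) entries, and values are stored length-prefixed in a data blob. Here `build_store` stands in for the offline (for example, Hadoop) build, and `lookup` stands in for the serving side. Neither is Voldemort's actual code or file format.

```python
import bisect
import hashlib
import struct


def build_store(pairs):
    """Offline build: hash and sort keys once, then lay out index and data.

    Returns (hashes, offsets, data). Keys are assumed to be unique.
    """
    entries = sorted((hashlib.md5(k).digest(), v) for k, v in pairs)
    hashes, offsets, data = [], [], bytearray()
    for key_hash, value in entries:
        hashes.append(key_hash)
        offsets.append(len(data))
        # Length-prefix each value so a reader knows where it ends.
        data += struct.pack(">I", len(value)) + value
    return hashes, offsets, bytes(data)


def lookup(store, key):
    """Serving side: binary-search the sorted hashes, then read one value."""
    hashes, offsets, data = store
    key_hash = hashlib.md5(key).digest()
    i = bisect.bisect_left(hashes, key_hash)
    if i == len(hashes) or hashes[i] != key_hash:
        return None
    (size,) = struct.unpack_from(">I", data, offsets[i])
    start = offsets[i] + 4
    return data[start:start + size]


store = build_store([(b"alice", b"profile-1"), (b"bob", b"profile-2")])
print(lookup(store, b"bob"), lookup(store, b"carol"))
```

Because the files never change after the build, a refresh can build a complete new copy offline and swap it in, rather than applying individual writes to a live store.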
