
> (True story: in my interview I was asked how I would extract entities from an HTML page. I suggested using OpenCalais (a free-as-in-beer API that does just that, and returns Freebase identifiers). If someone in the Freebase community wanted to do something like that, that’s exactly what I would have recommended. But the interviewer wanted to know how I would implement it myself. I told him I wouldn’t — that that’s why I was leaving the Search group for Developer Relations! Wrong answer, apparently.)

No opinion on the rest of the article; never worked for Google, never interviewed for a job there. But I wanted to respond to this graf.

This seems to be a trope in job interview postmortems: the "technical problem that the candidate responds to with the name of a library because why would you reinvent the wheel". I doubt that this approach will come in handy. The interviewer isn't simply asking how you'd extract entities from an HTML page. They're using entity extraction as a model problem, a stand-in for the kinds of problems they expect you might deal with at that company, most of which won't have library solutions. Answering with the name of a library is about as productive as saying "I'd post on Craigslist to find someone to write the solution for me".

Meanwhile, coming up with a solution for extracting entities from a page isn't that hard; it's a weird place to stick to your guns on needing a library. And Calais is a lot of machinery to involve just to extract entities from a page of text.



Uh, it is easy to make an entity extraction program that will get you a passing grade for a CS course.

It's not easy to make one that is good enough for commercial use unless somebody is hand curating the results. For the record, OpenCalais isn't good enough for commercial use (I've tried), and like most text analysis vendors, the people there blame their customers for the lack of adoption, not their product.

If you look at the leading entity extraction products they tend to be by huge companies like BBN and IBM; the state of the art open source product is UIMA, which out of the box does precisely nothing. What it does do is make it possible to coordinate the work of 100+ developers, linguists, scientists and other people in a bunch of different timezones so you can run a sweat shop that piles up a pyramid of heuristics at a price that can only be afforded by large organizations that expect to pay a lot and get very little for it.

Now the question of how you can cheaply build a knowledge base that can do the same is an interesting one that I've thought about a lot.

You can answer 90% of the questions that show up in a data science interview with "look it up in a hash table" or "look it up in the literature". Off the top of my head I'd have a hard time explaining how to make a Bloom filter or how to turn text into a suffix tree, although I probably could derive an algorithm to train an HMM. I can look up all this stuff in the literature, so why bother?
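
For reference, the Bloom filter mentioned above is small enough to sketch. This is a minimal toy, not a tuned implementation: the class name, the default sizes, and the trick of salting SHA-256 with an index to get several hash positions are all assumptions made for illustration.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter: k hash positions over an m-slot bit array."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, item):
        # Derive k positions by salting a single hash function with the index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos] = 1

    def __contains__(self, item):
        # False means definitely absent; True only means "probably present".
        return all(self.bits[pos] for pos in self._positions(item))
```

Membership tests can give false positives but never false negatives, which is the whole trade: far less memory than a hash table of the full strings, at the cost of an occasional wrong "yes".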


What's your point? Are you suggesting that an interviewer expects you to write a production quality DOM parser in the course of a one-hour interview on a blackboard? No, they don't. So, move on: citing a library is an inadequate answer. You need to write code.


My point is that I'm an expert on the above problem, working on stuff that is beyond state-of-the-art.

If somebody hires me to work on that I can produce exceptional results.

If somebody hires me to fill in for the last guy who burned out on a project that is two years behind schedule, my performance is average.

Asking an average person a question like that will give you an average answer and, at best, average results in real life.


My problem with that answer is that it doesn't tell me much about the engineer. It's a one-sentence response and, other than showing me that you know the right library to use, doesn't show any other depth. It's one step above "I would google it". As you said, libraries require a good deal of machinery to integrate into your project, so just saying you'd use a library doesn't touch on any of the potential problems that project might have. If you happened to know that library's API and could explain in some more depth how you would use that library, I would accept it, but I highly doubt most people would have API knowledge off the top of their heads, and I would be suitably impressed if they did.

Ignoring all the other potential problems with Google's interview process, and the question of whether he should have been asked this for the position he was interviewing for in the first place (I can't say without knowing more details), saying "I would use library X" is a terrible answer to a question. The point of the interview is to get to know more about your skills and abilities, and a one-sentence answer tells me almost nothing. Of course it is partly the interviewer's responsibility to ask the right questions, look for the right qualities, and lead the interview in a way where the interviewee can properly demonstrate their skills, but it's equally the interviewee's responsibility to demonstrate them.


The point, in this case, being that they weren't going for a coding position. They would not expect to solve that kind of problem as part of their job, because they were going for a community liaison/management role, not a technical one.


I can't help you do a better job getting a job at Google, but I can help you with tech job interviews in general: don't suggest big libraries as solutions to small programming problems, and especially don't stick to your guns after the interviewer asks for the DIY solution.


I think the gist of the problem is that when Google acquires a company the way they handle the non-technical staff can be a bit off-putting. Instead of working hard to find them a place within Google during the acquisition process, or just laying them off with some kind of severance, they put the burden on the individual. You have a year to find a place? Basically that just means, you have a year to find a new place to work.


> when Google acquires a company the way they handle the non-technical staff can be a bit off-putting.

True enough.

Depending on the acquisition and the acquiring company, non-technical staff might not even get an interview -- they're just sent on their way with some thank you money, sometimes.

The standard year is the "don't say bad things about us" move done by companies that are sensitive to that kind of negative PR.

But really, finding a place for staff members that you would not have hired involves a lot of energy which can almost certainly be put to better use elsewhere in the business.


Yeah, I think you can open your answer with, "well, I'd use libfoo, because there are a bunch of hairy edge cases", but then follow on immediately with a toy implementation to show you understand the problem domain. There's a certain sort of nerd-arrogance to the "I'm too good to write it myself" sort of answer.
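
Something like the following is the kind of toy implementation meant here. It is a whiteboard-level sketch only: it assumes the page has already been reduced to plain text, the gazetteer entries and their identifiers are made up (they are not real Freebase IDs), and `extract_entities` is a hypothetical name.

```python
import re

# Hypothetical gazetteer: surface form -> made-up identifier.
GAZETTEER = {
    "Google": "org/google",
    "Hacker News": "site/hacker_news",
}

# A run of one or more capitalized words, e.g. "Hacker News".
CANDIDATE = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*\b")


def extract_entities(text):
    """Return (surface form, identifier) pairs for candidates found in the gazetteer."""
    found = []
    for match in CANDIDATE.finditer(text):
        name = " ".join(match.group().split())  # normalize internal whitespace
        if name in GAZETTEER:
            found.append((name, GAZETTEER[name]))
    return found
```

The hairy edge cases show up immediately: a sentence-initial "The Google" becomes one candidate and misses the lookup, and all-caps names like IBM never match the pattern at all. Being able to point those out is most of what the question is after.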


Where you hear "I'm too good to write it myself" I hear "I'm too bad to write it myself".

I mean, working on a whiteboard with no reference material and no test data I'd be glad if I had time in an interview to write a bug-free DOM parser that would work on a modern, standards-compliant web page - I wouldn't expect to even start on dealing with malformed inputs, javascript, images, nested documents...


Sure. But writing a reliable bug-free DOM parser isn't the point of the question.


Do you mean you'd answer the question with a DOM parser that wasn't reliable and bug-free, that you wouldn't use a DOM parser at all, or that you'd use a DOM parser from a library (but consider such a library more acceptable than Calais, which is understandable)?

Seems to me you're going to have to get a handle on the page to figure out which entities are going to be treated as such and which are e.g. nested inside comments, or within javascript strings, or within inline CSS, or in the head, or after the body close tag, or resemble entities but aren't valid ones, and so on.
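
To make "getting a handle on the page" concrete, here is a rough sketch using Python's standard-library `html.parser`. It assumes reasonably well-formed markup, and the `VisibleText` class and `visible_text` helper are names invented for this example. It only drops comments and the contents of script, style and head elements; it does nothing about content after the body close tag or strings that merely resemble entities.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect text nodes, skipping script, style and head content."""

    SKIP = {"script", "style", "head"}

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # > 0 while inside any skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Comments go to handle_comment, which is not overridden, so they are dropped.
        if self.skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def visible_text(html):
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

The output of `visible_text` is what you would hand to whatever does the actual entity spotting, so names that only occur in JavaScript strings, inline CSS or comments never reach it.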


This is in an interview context. The basic point of the question is at the highest level: how would you write such a parser, what might be some of the steps, what exactly is an entity.

It seems like you are addressing a question of total implementation. But in an interview, I would expect a candidate to be able to answer--not fully implement--what some of the basic steps are. As opposed to "use a library".


More to the point, if I'm asking you in an interview how to build a DOM parser, take it as given that, if you get the job, and we need a DOM parser, you will be actively encouraged to go use a library rather than build from scratch.


And if I am asking you in an interview, it would be a given that you would be actively encouraged to figure out how to break it. One key skill in knowing how to break it is knowing how one might be built.

Learning how to think abstractly is one thing, but the ability to "puncture" abstractions is quite another. The latter is more valuable in this context.



