These pages rank at the top of the site: search because they have a lot of backlinks. Why? Because someone created the profile, then went out and created links to their profile page (e.g. http://www.opensiteexplorer.org/links?site=http%3A%2F%2Fnews...). Why? Because they were probably planning to drop a link on their HN profile page to a page that sells Zithromax.
This is a common strategy on YouTube or really any social site that gives you a profile page and the ability to drop a followed link. Even if you're building junky links to your HN profile page, the overall domain authority and trust of HN is relatively high, so it's a way to launder link juice and create a quality link to whatever it is you're selling.
The sad thing is that it's not just spammers who need to do this. Even people who are genuinely creating great content in an attempt to help others still need to create backlinks from social sites if they want to show up in Google. I've made all sorts of resources that are far better than anything else that currently exists, but without actively going out and building a few backlinks these pages would get literally zero hits.
While the core of what you say is true, I think the overall message, in the context of the comments above you, is slightly misleading. Yes, you cannot just create a great website and magically get hits; you do have to tell people it exists! This isn't news, and "SEO", while much maligned these days, is still a valid, useful and in many cases necessary skill. However, it is not necessary to engage in blackhat SEO practices, and it is far from true to say that you must build spammy backlinks from social networking sites in order to succeed. I don't think you mean that it is, but it came across a little like that.
Oh wow. Thank you for that explanation. It's impressive how many of these profiles were made. This means the total number of real HN accounts is probably significantly less than 28k.
We've seen people who post a link to HN, then have a dozen or so of their fake user accounts upvote that link. Given the economic incentive of reaching (and lingering on) the front page, there will be folks who try to game it.
I'd be interested to see some analytics, such as how many users have 100+ karma, because that's likely a better measure of the number of active users than the number of registered users (many of whom are likely inactive). However, that doesn't account for lurkers.
Or the number of users who have logged in within the past week, though this is probably something pg would have to give us, because I can't think of any way to determine that without the backend data.
100+ karma won't exactly be a good parameter. I have been a regular at Hacker News for around 3 months now, mostly just to read the content and upvote the posts I really like. But my karma has been at 1 forever, maybe because I don't comment much, not sure. Still, karma does not say much.
Yeah, I agree. Karma shouldn't be an indicator of how often one visits HN; it's merely an indicator of how much one participates in the HN community.
For me, getting karma is not desirable at all. What's the point of it? I only comment when I think I have something useful to say, which is not often the case. I never felt the urge to post just so that my karma increases. I think this really reflects the good design of this whole system by pg.
It's certainly not a direct correlation, but it does help to sort the spam bots and people who created an account but never use it from the rest.
However, I did mention that the karma approach would not account for lurkers, which was why I proposed the logged-in method. There are many other interesting data points I'd love, but without pg releasing a lot of the info, it's tough (or in some cases impossible) to gather from public data.
I have to say that I did not know about the inurl Google search parameter.
However, this is indeed a very rough estimate because the parameter 100..100000 also captures the number that indicates how long ago a user account was created, so it would also count accounts that were created a while ago (over 100 days) but were inactive.
for a numerical range find, which is VERY useful for date searches. A couple years ago, I gave a lesson to a class of mostly senior citizens on how to use Google to do genealogy work. We covered the range find, the inurl: and a couple other obscure but useful search enhancers. Pretty interesting stuff (here's the ODP, if interested) http://www.zentu.net/fmt/searchpresentation.odp
Apparently there are a bunch of pages in the index that just go to "No such user" pages (looks like killed spam accounts).
Unfortunately this search also includes people with accounts created between 100 and 100000 days ago, regardless of karma level. Not sure how to filter that out; you can't use the range syntax in between double quotes, it seems.
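If you had the profile text itself rather than a search result, one way around this would be to match the number that follows the karma label instead of any number on the page. A rough sketch in Python, assuming a profile renders as plain text with "created: N days ago" and "karma: N" fields (that layout, the helper names, and the sample profiles are all made up for illustration):

```python
import re

# Assumed layout: each profile has a "karma:" label followed by an integer.
KARMA_RE = re.compile(r"karma:\s*(\d+)")

def karma_of(profile_text):
    """Return the karma value from a profile's text, or None if absent."""
    match = KARMA_RE.search(profile_text)
    return int(match.group(1)) if match else None

def count_with_min_karma(profiles, minimum=100):
    """Count profiles whose karma (not account age) is at least `minimum`."""
    return sum(1 for text in profiles if (karma_of(text) or 0) >= minimum)

# Hypothetical sample profiles, not real accounts.
sample_profiles = [
    "user: alice created: 450 days ago karma: 3",    # old account, low karma
    "user: bob created: 20 days ago karma: 250",     # new account, high karma
    "user: carol created: 900 days ago karma: 1200",
]
print(count_with_min_karma(sample_profiles))
```

Here alice is skipped even though her account is over 100 days old, which is exactly the case the range search gets wrong.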
It would be really nice to have a dataset available for researchers with usernames, their posts and their comments. As an example, we could use the "SPEAR Algorithm"[1] to find expertise in specific domains per HN user.
If pg wouldn't mind me scraping the site, in a manner that would not impact the site's performance, I will write an app to collect this data and publish it to the public domain.
wget does this automatically, in case grandparent doesn't know. I'm not going to try it (because I'm not going to analyze the results myself, so it would be wasteful), but I think "wget -r news.ycombinator.com" would take care of this... It will not visit external links without the --span-hosts parameter (I believe).
My account page exists, but I rarely post (4 comments); adding my username to the Google search finds zero results, so there are at least 28,701, and likely many more than that.
My intuition is that there are many people that don't post at all or only post infrequently.
I also mess up the numbers. HN has so many ridiculous ways to log in, that I forgot a couple of times how I logged in (native, google, open auth, etc.) so I've created four or so accounts now. If I add them up I'm > 100, but no single user has 100 karma.
This doesn't account for users who have never submitted or commented, because there would be no way for their profile URLs to end up in Google's index. Or am I missing something? I know that there's the possibility for things to get into the index through the Google Toolbar, but I suspect that the fraction of HN users who use the Google Toolbar is quite low.
If someone creates an account but never posts, will there ever be a public-facing link to the userid page? If not, then Googlebot could be doing its job perfectly but have a totally inaccurate number of registered users using this method.
All of those people have posted comments. That's how. If you have never posted a comment or submitted a link, yet have an account, Google won't find you.
The last big poll (the one about hiding comment scores) had about 4500 votes, IIRC. Everybody who voted is probably somewhat active on here; and I'd guess that most active users voted.
If (as the results of my Google search revealed) there are 28,600 members, then 90,000 uniques is roughly a 2:1 lurker-to-member ratio (almost certainly much higher, given member != active member, and not all members visit every day).
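For anyone who wants to check that arithmetic, here is a minimal sketch in Python. It assumes every unique beyond the member count is a lurker and that every member shows up among the uniques, which is the most conservative reading; the function name is just for illustration:

```python
def lurker_ratio(uniques, members):
    """Lurkers per member, treating every unique beyond the member count as a lurker."""
    lurkers = uniques - members
    return lurkers / members

# Figures quoted in the thread: 90,000 uniques, 28,600 members.
ratio = lurker_ratio(90_000, 28_600)
print(f"{ratio:.2f} lurkers per member")  # prints "2.15 lurkers per member"
```

If only a fraction of members visit on a given day, the lurker count goes up and so does the ratio.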
True, but don't forget that number of uniques is not the same as number of users, especially if you aren't logged in.
For example: In any given month I visit HN from a laptop, desktop, and work computer using both Chrome and Firefox, in and out of incognito mode, as well as from two different browsers on my iPad and often from other people's computers. Obviously I am not always logged in to all of these browsers, so depending on how pg counted uniques (my understanding is that cookies are the common practice, though I could be totally off base on this) I could be counted as anywhere from 3 to 15 uniques a month.
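To make the overcounting concrete, here is a small sketch in Python. It assumes uniques are counted per cookie, that each device and browser pair holds its own cookie, and that visits are not tied back to a login. The visit list is hypothetical, and incognito sessions (which would each start with a fresh cookie) are left out to keep it short:

```python
# Each visit: (person, device, browser). All values are hypothetical.
visits = [
    ("me", "laptop", "chrome"),
    ("me", "laptop", "firefox"),
    ("me", "desktop", "chrome"),
    ("me", "work", "firefox"),
    ("me", "ipad", "safari"),
    ("me", "laptop", "chrome"),  # repeat visit, same cookie as the first
]

def cookie_uniques(visits):
    """Count distinct (device, browser) pairs, i.e. distinct cookies."""
    return len({(device, browser) for _, device, browser in visits})

def actual_people(visits):
    """Count distinct people behind those visits."""
    return len({person for person, _, _ in visits})

print(cookie_uniques(visits), "cookie uniques for", actual_people(visits), "person")
```

One person, five "uniques", and that is before counting incognito windows or borrowed computers.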
On a related note, how much of a slashdotting can you expect from posting a link here? There aren't as many registered users as I expected, but that doesn't account for lurkers.
When my niche website is finished, I'd like to share it with the hacker news crowd, but it's not built to scale to a bajillion users.
Google was showing 27,200 results, so if they are all valid user IDs without repetitions, I would assume around 27,200 registered users, but that seems pretty low, doesn't it?
Based on the referrals the site sends (often 4,000+) I would have thought it was much higher than that. My guess is that there are a lot of lurkers & only something like 2% of site visitors even create an account.