id 0
title 0
url 2440
num_points 0
num_comments 0
author 0
created_at 0
dtype: int64
hn.describe().T
                count          mean            std         min         25%         50%         75%         max
id            20099.0  1.131755e+07  696453.087424  10176908.0  10701720.0  11284523.0  11926127.0  12578975.0
num_points    20099.0  5.029663e+01     107.110322         1.0         3.0         9.0        54.0      2553.0
num_comments  20099.0  2.480303e+01      56.108639         1.0         1.0         3.0        21.0      1733.0
How many times is Python mentioned in the titles of stories in our Hacker News dataset?
len([title for title in hn.title.to_list() if re.search('[Pp]ython', title)])
160
hn.title.str.contains('[Pp]ython').sum()
160
Titles that mention the programming language Ruby
hn.title[hn.title.str.contains('[Rr]uby')]
190 Ruby on Google AppEngine Goes Beta
484 Related: Pure Ruby Relational Algebra Engine
1388 Show HN: HTTPalooza Ruby's greatest HTTP clie...
1949 Rewriting a Ruby C Extension in Rust: How a Na...
2022 Show HN: CrashBreak Reproduce exceptions as f...
2163 Ruby 2.3 Is Only 4% Faster than 2.2
2306 Websocket Shootout: Clojure, C++, Elixir, Go, ...
2620 Why Startups Use Ruby on Rails?
2645 Ask HN: Should I continue working a Ruby gem f...
3290 Ruby on Rails and the importance of being stup...
3749 Telegram.org Bot Platform Webhooks Server, for...
3874 Warp Directory (wd) unix command line tool for...
4026 OS X 10.11 Ruby / Rails users can install ther...
4163 Charles Nutter of JRuby Banned by Rubinius for...
4602 Quiz: Ruby or Rails? Matz and DHH were not abl...
5832 Show HN: An experimental Python to C#/Go/Ruby/...
6180 Shrine A new solution for handling file uploa...
7171 JRuby+Truffle: Why its important to optimise t...
7235 Ruby or Rails?
7671 How I hunted the most odd ruby bug
7776 Elixir obsoletes Ruby, Erlang and Clojure in o...
7870 Elixir and Ruby Comparison
8502 Show HN: Di-ary a math note-taking app built ...
10212 Ruby has been fast enough for 13 years
11060 Show HN: VeryAnts: Probabilistic Integer Arith...
11534 The Ruby Code of Conduct
11622 FasterPath: Faster Pathname Handling for Ruby ...
12061 Ask HN: What's your favorite ruby HTTP client?
12091 Show HN: Automated Bundle Update with Descript...
12114 Awesome Ruby
12543 Ruby Bug: SecureRandom should try /dev/urandom...
12987 Show HN: Klipse code evaluator pluggable on a...
13550 Matz: I cannot accept the CoC for the Ruby com...
13650 Programs that rewrite Ruby programs
14798 Ruby Wrapper for Telegram's Bot API
14980 A Ruby gem for genetic algorithms
16093 Master Ruby Web APIs Is Out
16149 Ruru: native Ruby extensions written in Rust
16327 Make Ruby Great Again [transcript]
16422 Object Oriented Ruby
16536 Ruby Deoptimization Engine
16875 Video: Make Ruby Great Again
17072 A coupon/deals site built using Roda gem for Ruby
17510 Table Flip on Ruby Exceptions
18877 Using Rust with Ruby, a Deep Dive with Yehuda ...
19077 Python is Better than Ruby
19224 Modern concurrency tools for Ruby
19743 Using a Neural Network to Train a Ruby Twitter...
Name: title, dtype: object
How many titles in our dataset mention email or e-mail?
hn.title[hn.title.str.contains('e-?mail')]
119 Show HN: Send an email from your shell to your...
313 Disposable emails for safe spam free shopping
1361 Ask HN: Doing cold emails? helps us prove this...
1750 Protect yourself from spam, bots and phishing ...
2421 Ashley Madison hack treating email
...
18098 House panel looking into Reddit post about Cli...
18583 Mailgen Generates clean, responsive HTML for ...
18847 Show HN: Crisp iOS keyboard for email and text...
19303 Ask HN: Why big email providers don't sign the...
19446 Tell HN: Secure email provider Riseup will run...
Name: title, Length: 86, dtype: object
How many titles in our dataset have tags?
hn.title[hn.title.str.contains(r'\[\w+\]')]
66 Analysis of 114 propaganda sources from ISIS, ...
100 Munich Gunman Got Weapon from the Darknet [Ger...
159 File indexing and searching for Plan 9 [pdf]
162 Attack on Kunduz Trauma Centre, Afghanistan I...
195 [Beta] Speedtest.net HTML5 Speed Test
...
19763 TSA can now force you to go through body scann...
19867 Using Pony for Fintech [video]
19947 Swift Reversing [pdf]
19979 WSJ/Dowjones Announce Unauthorized Access Betw...
20089 Users Really Do Plug in USB Drives They Find [...
Name: title, Length: 444, dtype: object
We were able to calculate that 444 of the 20,099 Hacker News stories in our dataset contain tags. What if we wanted to find out what the text of these tags was, and how many of each are in the dataset? To do this, we’ll need to use capture groups.
# extract all of the tags from the Hacker News titles and build a frequency table of those tags.
pdf 276
video 111
2015 3
audio 3
slides 2
Name: 0, dtype: int64
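The code that produced the frequency table above is not shown. A minimal sketch of one way to build such a table with a capture group follows; the `sample_titles` Series is a small stand-in for `hn.title`, and the names `tag_pattern`, `tags` and `tag_freq` are illustrative assumptions rather than the original notebook's code.

```python
import pandas as pd

# Small stand-in for hn.title (titles taken from the listings in this notebook).
sample_titles = pd.Series([
    "Swift Reversing [pdf]",
    "Using Pony for Fintech [video]",
    "File indexing and searching for Plan 9 [pdf]",
    "[Beta] Speedtest.net HTML5 Speed Test",
    "Ruby or Rails?",
])

# The parentheses form a capture group, so only the text between
# the square brackets is returned, not the brackets themselves.
tag_pattern = r"\[(\w+)\]"

# expand=False returns a Series; titles without a tag become NaN.
tags = sample_titles.str.extract(tag_pattern, expand=False)

# value_counts() drops NaN by default and sorts by descending frequency.
tag_freq = tags.value_counts()
print(tag_freq)
```

Against the real data the same three steps would be applied to `hn.title` instead of `sample_titles`.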
# titles is assumed to be the title column of hn
titles = hn.title

def first_10_matches(pattern):
    """Return the first 10 story titles that match the provided regular expression."""
    return titles[titles.str.contains(pattern)].head(10)
Titles that contain Java
hn.title[hn.title.str.contains(r'[Jj]ava[^Ss]')]
436 Unikernel Power Comes to Java, Node.js, Go, an...
811 Ask HN: Are there any projects or compilers wh...
1840 Adopting RxJava on the Airbnb App
1972 Node.js vs. Java: Which Is Faster for APIs?
2093 Java EE and Microservices in 2016
2367 Code that is valid in both PHP and Java, and p...
2493 Ask HN: I've been a java dev for a couple of y...
2751 Eventsourcing for Java 0.4.0 released
2910 2016 JavaOne Intel Keynote 32mn Talk
3452 What are the Differences Between Java Platform...
4273 Ask HN: Is Bloch's Effective Java Still Current?
4624 Oracle Discloses Critical Java Vulnerability i...
5461 Lambdas (in Java 8) Screencast
5847 IntelliJ IDEA and the whole IntelliJ platform ...
5947 JavaFX is dead
6268 Oracle deprecating Java applets in Java 9
7436 Forget Guava: 5 Google Libraries Java Develope...
7481 Ask HN: Beside Java what languages have a stro...
8100 Advantages of Functional Programming in Java 8
8135 Show HN: Rogue AI Dungeon, javacript bot scrip...
8447 Show HN: Java multicore intelligence
8487 Why IntelliJ IDEA is hailed as the most friend...
8984 Ask HN: Should Learn/switch to JavaScript Prog...
8987 Last-khajiit/vkb: Java bot for vk.com competit...
10529 Angular 2 coming to Java, Python and PHP
11454 Ask HN: Java or .NET for a new big enterprise ...
11902 The Java Deserialization Bug
12382 Ask HN: Why does Java continue to dominate?
12582 Java Memory Model Examples: Good, Bad and Ugly...
12711 Oracle seeks $9.3B for Googles use of Java in ...
13048 A high performance caching library for Java 8
13105 Show HN: Backblaze-b2 is a simple java library...
13150 Java Tops TIOBE's Popular-Languages List
13170 Show HN: Tablesaw: A Java data-frame for 500M-...
13272 Java StringBuffer and StringBuilder performance
13620 1M Java questions have now been asked on Stack...
13839 Ask HN: Hosting a Java Spring web application
13843 Var and val in Java?
13844 Answerz.com Java and J2ee Programming
13930 Java 8s new Optional type doesn't solve anything
13934 Java 6 vs. Java 7 vs. Java 8 between 2013 201...
15257 Oracle and the fall of Java EE
15868 Java generics never cease to impress
16023 Will you use ReactJS with a REST service inste...
16932 Swift versus Java: the bitset performance test
16948 Show HN: Bt 0-hassle BitTorrent for Java 8
17579 Java Lazy Streamed Zip Implementation
18407 Show HN: Scala idioms in Java: cases, patterns...
19481 Show HN: Adding List Comprehension in Java - E...
19735 Java Named Top Programming Language of 2015
Name: title, dtype: object
hn.title[hn.title.str.contains(r'\b[Jj]ava\b')]
436 Unikernel Power Comes to Java, Node.js, Go, an...
811 Ask HN: Are there any projects or compilers wh...
1023 Pippo Web framework in Java
1972 Node.js vs. Java: Which Is Faster for APIs?
2093 Java EE and Microservices in 2016
2367 Code that is valid in both PHP and Java, and p...
2493 Ask HN: I've been a java dev for a couple of y...
2751 Eventsourcing for Java 0.4.0 released
3228 Comparing Rust and Java
3452 What are the Differences Between Java Platform...
3627 Friends don't let friends do Java
4273 Ask HN: Is Bloch's Effective Java Still Current?
4624 Oracle Discloses Critical Java Vulnerability i...
5461 Lambdas (in Java 8) Screencast
5847 IntelliJ IDEA and the whole IntelliJ platform ...
6268 Oracle deprecating Java applets in Java 9
7436 Forget Guava: 5 Google Libraries Java Develope...
7481 Ask HN: Beside Java what languages have a stro...
7686 Insider: Oracle has lost interest in Java
8100 Advantages of Functional Programming in Java 8
8447 Show HN: Java multicore intelligence
8487 Why IntelliJ IDEA is hailed as the most friend...
8984 Ask HN: Should Learn/switch to JavaScript Prog...
8987 Last-khajiit/vkb: Java bot for vk.com competit...
10529 Angular 2 coming to Java, Python and PHP
11454 Ask HN: Java or .NET for a new big enterprise ...
11902 The Java Deserialization Bug
12382 Ask HN: Why does Java continue to dominate?
12582 Java Memory Model Examples: Good, Bad and Ugly...
12711 Oracle seeks $9.3B for Googles use of Java in ...
12730 Show HN: Shazam in Java
13048 A high performance caching library for Java 8
13105 Show HN: Backblaze-b2 is a simple java library...
13150 Java Tops TIOBE's Popular-Languages List
13170 Show HN: Tablesaw: A Java data-frame for 500M-...
13272 Java StringBuffer and StringBuilder performance
13620 1M Java questions have now been asked on Stack...
13839 Ask HN: Hosting a Java Spring web application
13843 Var and val in Java?
13844 Answerz.com Java and J2ee Programming
13930 Java 8s new Optional type doesn't solve anything
13934 Java 6 vs. Java 7 vs. Java 8 between 2013 201...
14393 JavaScript is immature compared to Java
14847 Show HN: TurboRLE: Bringing Turbo Run Length E...
15257 Oracle and the fall of Java EE
15868 Java generics never cease to impress
16023 Will you use ReactJS with a REST service inste...
16932 Swift versus Java: the bitset performance test
16948 Show HN: Bt 0-hassle BitTorrent for Java 8
17458 Super Mario clone in Java
17579 Java Lazy Streamed Zip Implementation
18407 Show HN: Scala idioms in Java: cases, patterns...
19481 Show HN: Adding List Comprehension in Java - E...
19735 Java Named Top Programming Language of 2015
Name: title, dtype: object
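The two listings differ because `[^Ss]` must consume a character after "Java", while `\b` only asserts a position. The sketch below runs both patterns over five titles copied from the listings above; the variable names are illustrative assumptions.

```python
import pandas as pd

# Titles copied from the two listings above.
sample_titles = pd.Series([
    "Pippo Web framework in Java",
    "Java EE and Microservices in 2016",
    "JavaFX is dead",
    "Adopting RxJava on the Airbnb App",
    "JavaScript is immature compared to Java",
])

# [^Ss] needs one more character after "Java", so a title that
# ends with "Java" cannot match, but "JavaFX" and "RxJava " do.
negated = sample_titles.str.contains(r"[Jj]ava[^Ss]")

# \b matches the empty position between a word character and a
# non-word character (or the string edge), so "Java" must stand alone.
bounded = sample_titles.str.contains(r"\b[Jj]ava\b")

comparison = pd.DataFrame({
    "title": sample_titles,
    "negated_set": negated,
    "word_boundary": bounded,
})
print(comparison)
```

The word-boundary version picks up titles ending in "Java" and drops substring matches such as "JavaFX" and "RxJava".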
How many titles have tags at the start versus the end of the story title in our Hacker News dataset?
hn.title.str.contains(r'^\[\w+\]').sum()
15
hn.title.str.contains(r'\[\w+\]$').sum()
417
Count the number of times that email is mentioned in story titles.
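No code or result is recorded for this step. As a rough sketch, one possible pattern allows an optional hyphen or space after the "e", an optional plural "s", and ignores case. The pattern, the sample titles (the last four are hypothetical) and the variable names are assumptions, not the notebook's original code.

```python
import re
import pandas as pd

# Stand-in for hn.title; the first two titles come from the listing
# further up, the remaining four are hypothetical.
sample_titles = pd.Series([
    "Disposable emails for safe spam free shopping",
    "Ashley Madison hack treating email",
    "E-Mail etiquette for engineers",
    "Sending e mail from the shell",
    "Gmail outage",
    "Ruby or Rails?",
])

# \b     word boundary, so the "e" must start a word
# [-\s]? optional hyphen or whitespace between "e" and "mail"
# s?     optional plural
email_pattern = r"\be[-\s]?mails?\b"

# flags=re.I makes the match case-insensitive; summing the boolean
# mask counts the titles that contain at least one match.
email_mentions = sample_titles.str.contains(email_pattern, flags=re.I).sum()
print(email_mentions)
```

Note that this counts titles containing a match, not the total number of matches across all titles.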
We’ll continue to analyze and count mentions of different programming languages in the dataset, and then we’ll finish by extracting the different components of the URLs submitted to Hacker News.
Count the number of times that SQL is mentioned in story titles.
Titles that mention the C programming language; note the false positives such as "Series C":
221 MemSQL (YC W11) Raises $36M Series C
365 The new C standards are worth it
444 Moz raises $10m Series C from Foundry Group
521 Fuchsia: Micro kernel written in C by Google
1307 Show HN: Yupp, yet another C preprocessor
...
18549 Show HN: An awesome C library for Windows
18649 Python vs. C/C++ in embedded systems
18689 Philz Coffee raises $45M Series C
19151 Ask HN: How to learn C in 2016?
19933 Lightweight C library to parse NMEA 0183 sente...
Name: title, Length: 105, dtype: object
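The listing above includes titles such as "Moz raises $10m Series C", where the C is a funding round rather than the language. One way to filter these out is with lookarounds, sketched below under the assumption that "Series C" and "C++" should both be excluded. The same block also shows a case-insensitive count for SQL. The patterns and variable names are illustrative, and the last sample title is hypothetical.

```python
import re
import pandas as pd

# Titles from the listing above, plus one hypothetical C++ title.
sample_titles = pd.Series([
    "MemSQL (YC W11) Raises $36M Series C",
    "The new C standards are worth it",
    "Python vs. C/C++ in embedded systems",
    "Philz Coffee raises $45M Series C",
    "Ask HN: How to learn C in 2016?",
    "Why C++ is hard",
])

# (?<!Series\s) negative lookbehind: not preceded by "Series "
# \b[Cc]\b      a standalone letter C
# (?![+.])      negative lookahead: not followed by "+" or "."
c_pattern = r"(?<!Series\s)\b[Cc]\b(?![+.])"
c_mask = sample_titles.str.contains(c_pattern)
c_mentions = c_mask.sum()

# SQL in any capitalisation, including inside names such as MemSQL.
sql_mentions = sample_titles.str.contains(r"sql", flags=re.I).sum()

print(sample_titles[c_mask])
print(c_mentions, sql_mentions)
```

The "C/C++" title still matches because its first C is followed by a slash, which the lookahead allows.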
Make all the different variations of “email” in the dataset uniform.
119 Show HN: Send an email from your shell to your...
161 Computer Specialist Who Deleted Clinton emails...
174 email Apps Suck
261 emails Show Unqualified Clinton Foundation Don...
313 Disposable emails for safe spam free shopping
...
19303 Ask HN: Why big email providers don't sign the...
19395 I used HTML email when applying for jobs, here...
19446 Tell HN: Secure email provider Riseup will run...
19838 Petition to Open Sourcemailbox
19905 Gmail Will Soon Warn Users When emails Arrive ...
Name: title, Length: 151, dtype: object
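The replacement code for this step is not shown. A minimal sketch using `Series.str.replace` follows; the pattern, variable names and sample titles are assumptions. A leading `\b` is included so that the "e" ending one word is not merged with a following "Mail", which is what appears to have happened in "Open Sourcemailbox" above.

```python
import re
import pandas as pd

# Hypothetical titles covering several spellings of "email".
sample_titles = pd.Series([
    "E-Mail Apps Suck",
    "Disposable e-mails for safe shopping",
    "Sending E mail from the shell",
    "Petition to Open Source Mailbox",
    "Email is not dead",
])

# \b requires the "e" to start a word; a trailing plural "s" is not
# part of the match, so it survives the replacement untouched.
email_pattern = r"\be[-\s]?mail"

normalized = sample_titles.str.replace(
    email_pattern, "email", flags=re.I, regex=True
)
print(normalized)
```

Passing `regex=True` explicitly avoids depending on the default, which has changed between pandas versions.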
Extract components of URLs from our dataset.
Most stories on Hacker News contain a link to an external resource. Once we have extracted the domains, we will build a frequency table so we can determine the most popular domains. There are over 7,000 unique domains in our dataset, so to make the frequency table easier to analyze, we’ll look at only the top 20 domains.
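As a rough sketch of that step, the domain can be captured as the run of word characters, hyphens and dots that follows the protocol, then counted. The URLs below are hypothetical placeholders for `hn.url`, and the pattern and variable names are assumptions. The missing value mirrors the null entries reported for the `url` column at the top of these notes.

```python
import re
import pandas as pd

# Hypothetical stand-in for hn.url, including one missing value.
sample_urls = pd.Series([
    "https://www.example.com/a/b",
    "http://blog.example.org/post?id=1",
    "https://www.example.com/other",
    None,
    "HTTPS://Example.NET/x",
])

# https?://    protocol, with an optional "s"
# ([\w\-\.]+)  capture group: the domain, up to the first "/" or end
domain_pattern = r"https?://([\w\-\.]+)"

# Missing URLs stay NaN after extraction.
domains = sample_urls.str.extract(domain_pattern, flags=re.I, expand=False)

# Frequency table, limited to the 20 most common domains.
top_domains = domains.value_counts().head(20)
print(top_domains)
```

Subdomains are kept as part of the domain here, so `www.example.com` and `example.com` would be counted separately.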