Thursday, 24 July 2014

Grabbing Website Content Using cURL

There are already plenty of tutorials that explain how to grab website content using cURL. I was inspired by what Mr. Rosihan Ari (rosihanari) wrote on his blog; here I just want to rewrite it in my own words and, of course, with a different case.

What's Grabbing?

Grabbing can be understood as a technique for taking the final output (the data or results) that another website displays and showing it on our own website.

Getting to Know cURL:

PHP supports libcurl, a library created by Daniel Stenberg that allows us to connect and communicate with many different types of servers using many different types of protocols. libcurl currently supports the http, https, ftp, gopher, telnet, dict, file, and ldap protocols. libcurl also supports HTTPS certificates, HTTP POST, HTTP PUT, FTP uploading (this can also be done with PHP's ftp extension), HTTP form-based upload, proxies, cookies, and user+password authentication. These functions were added in PHP 4.0.2.
source: http://us3.php.net/manual/en/intro.curl.php
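If you want to check which of these protocols your own PHP installation can actually use, PHP's curl_version() function reports the libcurl version and the protocols it was compiled with. The sketch below assumes the cURL extension is enabled; curlSupports() is a hypothetical helper name, and the list you see depends on how libcurl was built on your server, so it may differ from the list above.

```php
<?php
/**
 * Hypothetical helper: returns true if the local libcurl build
 * supports the given protocol (e.g. "https"). curl_version() is a real PHP function.
 */
function curlSupports(string $protocol): bool
{
    $info = curl_version();               // array with keys such as 'version' and 'protocols'
    return in_array(strtolower($protocol), $info['protocols'], true);
}

$info = curl_version();
echo "libcurl " . $info['version'] . "\n";
echo "Protocols: " . implode(', ', $info['protocols']) . "\n";
echo curlSupports('https') ? "HTTPS is available\n" : "HTTPS is NOT available\n";
```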

CASE

In this case we will take content from the website at http://detik.com, but only the "Top News" section.

OK, next we write a script to read the HTML. The following function reads a page's HTML using cURL:

<?php
function bacaHTML($url){
    // initialize cURL
    $data = curl_init();
    // cURL settings: return the response as a string instead of printing it
    curl_setopt($data, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($data, CURLOPT_URL, $url);
    // run cURL to read the page content
    $hasil = curl_exec($data);
    curl_close($data);
    return $hasil;
}
?>
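bacaHTML() quietly returns false when something goes wrong, which makes problems hard to spot. Below is a minimal, more defensive sketch, assuming PHP 7.0 or later. fetchHTML(), the timeout default, and the user-agent string are illustrative assumptions, not part of the original tutorial: it follows redirects, sets timeouts, and throws an exception when cURL fails or the server answers with an HTTP error status.

```php
<?php
/**
 * Hypothetical, more defensive variant of bacaHTML().
 * Returns the page body as a string, or throws RuntimeException on failure.
 */
function fetchHTML(string $url, int $timeoutSeconds = 10): string
{
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,   // return the body as a string instead of printing it
        CURLOPT_FOLLOWLOCATION => true,   // follow HTTP redirects (301/302)
        CURLOPT_MAXREDIRS      => 5,      // but give up after 5 hops
        CURLOPT_CONNECTTIMEOUT => $timeoutSeconds,
        CURLOPT_TIMEOUT        => $timeoutSeconds,
        // Hypothetical user-agent string; some sites may reject requests without one.
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (compatible; ExampleGrabber/1.0)',
    ]);

    $html = curl_exec($ch);
    if ($html === false) {
        throw new RuntimeException('cURL error: ' . curl_error($ch));
    }

    // HTTP status code; typically 0 for non-HTTP protocols such as file://.
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    if ($status >= 400) {
        throw new RuntimeException("HTTP status $status for $url");
    }

    // No curl_close() needed: the handle is released when $ch goes out of scope.
    return $html;
}
```

In use, the call would be wrapped in a try/catch block so that a failed request prints the exception message instead of an empty page.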


Here is how to use the function above:

<?php
echo bacaHTML("http://www.detik.com");
?>

The code above grabs the entire HTML of the http://detik.com homepage. So how do we pull out only the "Top News"? Good question. The first thing to do is find the part of the HTML that holds "Top News". Open http://detik.com in your browser and press CTRL + U to display the HTML source of the homepage. The "Top News" HTML sits roughly in this section:

<ul id="beritautama">
--------
--------
</ul>


Once we know where the HTML for "Top News" is, we split the page's HTML with the explode() function, using the string <ul id="beritautama"> as the delimiter. explode() returns the pieces as an array.
Here is the script:

$bacaHTML = bacaHTML('http://www.detik.com');
$pecah = explode('<ul id="beritautama">', $bacaHTML);
echo $pecah[1];


Index 0 ($pecah[0]) holds all the HTML of the http://detik.com homepage that comes before <ul id="beritautama">.
Index 1 ($pecah[1]) holds the HTML that comes after <ul id="beritautama">.
We explode index 1 ($pecah[1]) again to get only the HTML between the tags
<ul id="beritautama"> and </ul>.

Here is the script:


$pecah2 = explode('</ul>', $pecah[1]);
echo $pecah2[0];

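The two explode() steps can be wrapped in a small helper so that a missing marker (for example, after the site changes its layout) gives a clear result instead of an "undefined offset" warning. This is a sketch under assumptions: extractBetween() is a hypothetical name, it needs PHP 7.1+ for the ?string return type, and the sample HTML stands in for the real detik.com page.

```php
<?php
/**
 * Hypothetical helper: returns the text between $start and the first $end
 * that follows it, or null when either marker is missing.
 */
function extractBetween(string $html, string $start, string $end): ?string
{
    $parts = explode($start, $html, 2);   // limit 2: [before $start, everything after it]
    if (count($parts) < 2) {
        return null;                      // start marker not found
    }
    $inner = explode($end, $parts[1], 2); // [before $end, the rest]
    if (count($inner) < 2) {
        return null;                      // end marker not found
    }
    return $inner[0];
}

// Hypothetical sample markup standing in for the real homepage HTML.
$sample = '<div>header</div><ul id="beritautama"><li>News A</li><li>News B</li></ul><p>footer</p>';
$topNews = extractBetween($sample, '<ul id="beritautama">', '</ul>');
echo "<ul>" . $topNews . "</ul>";
```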
FULL CODE:

<?php
function bacaHTML($url){
    // initialize cURL
    $data = curl_init();
    // cURL settings: return the response as a string instead of printing it
    curl_setopt($data, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($data, CURLOPT_URL, $url);
    // run cURL to read the page content
    $hasil = curl_exec($data);
    curl_close($data);
    return $hasil;
}

$kodeHTML = bacaHTML('http://www.detik.com/');
// split on the opening tag: index 1 is everything after <ul id="beritautama">
$pecah = explode('<ul id="beritautama">', $kodeHTML);
if (isset($pecah[1])) {
    // split on </ul>: index 0 is the list items of "Top News"
    $pecahLagi = explode('</ul>', $pecah[1]);
    echo "<ul>".$pecahLagi[0]."</ul>";
} else {
    echo 'The "Top News" section was not found.';
}
?>
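Splitting on strings is fragile: it breaks if the site writes the attribute differently (for example id='beritautama' or with extra attributes) or nests another <ul> inside the list. A more robust alternative, swapping explode() for PHP's DOM extension (DOMDocument plus DOMXPath), is sketched below. It assumes the dom extension is enabled; topNewsTitles() and the sample markup are hypothetical, and it returns the text of each <li> rather than the raw HTML. Note that loadHTML() treats input as ISO-8859-1 unless the page declares its charset, so non-ASCII text may need extra handling.

```php
<?php
/**
 * Hypothetical helper: returns the trimmed text of each <li>
 * directly inside <ul id="$listId">, using a real HTML parser instead of explode().
 */
function topNewsTitles(string $html, string $listId = 'beritautama'): array
{
    if ($html === '') {
        return [];                        // loadHTML() rejects an empty string
    }

    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // $listId must not contain a single quote, or the XPath expression breaks.
    $items = $xpath->query("//ul[@id='" . $listId . "']/li");
    if ($items === false) {
        return [];                        // malformed expression
    }

    $titles = [];
    foreach ($items as $li) {
        $titles[] = trim($li->textContent);
    }
    return $titles;
}

$sample = '<html><body><ul id="beritautama"><li><a href="#">News A</a></li><li>News B</li></ul></body></html>';
print_r(topNewsTitles($sample));
```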