Thursday, 22 December 2016

Scrape a site that uses a CAPTCHA using PHP (Laravel)

I am trying to scrape a webpage that is protected by a CAPTCHA, using the following code in Laravel (PHP):

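        // Fetch the start page with cURL and dump the raw response body.
        $ch = curl_init();
        curl_setopt($ch, CURLOPT_URL, $startPage);
        curl_setopt($ch, CURLOPT_RETURNTRANSFER, true); // return the body instead of printing it
        curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
        $output = curl_exec($ch); // false if the request itself failed
        curl_close($ch);
        dd($output); // Laravel's dump-and-die helper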
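
For comparison, here is a slightly more defensive sketch of the same request. It is not part of the original code: the function name `fetchPage` and the `$cookieFile` parameter are hypothetical, introduced only for illustration. It uses standard cURL options to keep cookies between requests and to report the HTTP status code, so a failed or blocked request is easier to tell apart from a successful one. It does not solve or get past a CAPTCHA.

```php
<?php
// Hypothetical helper (not in the original post): fetch a URL with cURL,
// keep cookies in a file, and return the status code along with the body.
function fetchPage(string $url, string $cookieFile): array
{
    $ch = curl_init();
    curl_setopt_array($ch, [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true, // return the body instead of printing it
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13',
        CURLOPT_COOKIEJAR      => $cookieFile, // cookies are written here when the handle is closed
        CURLOPT_COOKIEFILE     => $cookieFile, // cookies are read from here before the request
    ]);

    $body = curl_exec($ch);
    if ($body === false) {
        // The request itself failed (DNS, timeout, TLS, ...).
        $error = curl_error($ch);
        curl_close($ch);
        throw new RuntimeException('cURL request failed: ' . $error);
    }

    // 0 for non-HTTP transfers; otherwise the HTTP status of the response.
    $status = (int) curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);

    return ['status' => $status, 'body' => $body];
}
```

In a controller this might be called as `fetchPage($startPage, storage_path('app/cookies.txt'))`, where the cookie path is again only an assumed example.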

But the output I get is the following:

<!DOCTYPE html>\n
<!--[if lt IE 7]> <html class="no-js ie6 oldie" lang="en-US"> <![endif]-->\n
<!--[if IE 7]>    <html class="no-js ie7 oldie" lang="en-US"> <![endif]-->\n
<!--[if IE 8]>    <html class="no-js ie8 oldie" lang="en-US"> <![endif]-->\n
<!--[if gt IE 8]><!--> <html class="no-js" lang="en-US"> <!--<![endif]-->\n
<head>\n
<title>Attention Required! | CloudFlare</title>\n
<meta charset="UTF-8" />\n
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />\n
<meta http-equiv="X-UA-Compatible" content="IE=Edge,chrome=1" />\n
<meta name="robots" content="noindex, nofollow" />\n
<meta name="viewport" content="width=device-width,initial-scale=1,maximum-scale=1" />\n
<link rel="stylesheet" id="cf_styles-css" href="/cdn-cgi/styles/cf.errors.css" type="text/css" media="screen,projection" />\n
<!--[if lt IE 9]><link rel="stylesheet" id='cf_styles-ie-css' href="/cdn-cgi/styles/cf.errors.ie.css" type="text/css" media="screen,projection" /><![endif]-->\n
<style type="text/css">body{margin:0;padding:0}</style>\n
<!--[if lte IE 9]><script type="text/javascript" src="/cdn-cgi/scripts/jquery.min.js"></script><![endif]-->\n
<!--[if gte IE 10]><!--><script type="text/javascript" src="/cdn-cgi/scripts/zepto.min.js"></script><!--<![endif]-->\n
<script type="text/javascript" src="/cdn-cgi/scripts/cf.common.js"></script>\n
\n
\n
</head>\n
<body>\n
  <div id="cf-wrapper">\n
    <div class="cf-alert cf-alert-error cf-cookie-error" id="cookie-alert" data-translate="enable_cookies">Please enable cookies.</div>\n
    <div id="cf-error-details" class="cf-error-details-wrapper">\n
      <div class="cf-wrapper cf-header cf-error-overview">\n
        <h1 data-translate="challenge_headline">One more step</h1>\n
        <h2 class="cf-subheadline"><span data-translate="complete_sec_check">Please complete the security check to access</span> http://ift.tt/2hhjSQC</h2>\n
      </div><!-- /.header -->\n
\n
      <div class="cf-section cf-highlight cf-captcha-container">\n
        <div class="cf-wrapper">\n
          <div class="cf-columns two">\n
            <div class="cf-column">\n
              <div class="cf-highlight-inverse cf-form-stacked">\n
                <form class="challenge-form" id="challenge-form" action="/cdn-cgi/l/chk_captcha" method="get">\n
  <script type="text/javascript" src="/cdn-cgi/scripts/cf.challenge.js" data-type="normal"  data-ray="3155d0e9e7183162" async data-sitekey="6LfOYgoTAAAAAInWDVTLSc8Yibqp-c9DaLimzNGM" data-stoken="w544lymazwGjwIrzS5FkbUOmI_S-R0ayyGrufDpBHB0r9RUgxfcbbqQFA06YCTh9vgvumCj1JE6IT8QGMsSv6Fy5SElxzE39mNI3WmwDqpE"></script>\n
  <div class="g-recaptcha"></div>\n
  <noscript id="cf-captcha-bookmark" class="cf-captcha-info">\n
    <div><div style="width: 302px">\n
      <div>\n
        <iframe src="http://ift.tt/2ikpKEW" frameborder="0" scrolling="no" style="width: 302px; height:422px; border-style: none;"></iframe>\n
      </div>\n
      <div style="width: 300px; border-style: none; bottom: 12px; left: 25px; margin: 0px; padding: 0px; right: 25px; background: #f9f9f9; border: 1px solid #c1c1c1; border-radius: 3px;">\n
        <textarea id="g-recaptcha-response" name="g-recaptcha-response" class="g-recaptcha-response" style="width: 250px; height: 40px; border: 1px solid #c1c1c1; margin: 10px 25px; padding: 0px; resize: none;"></textarea>\n
        <input type="submit" value="Submit"></input>\n
      </div>\n
    </div></div>\n
  </noscript>\n
</form>\n
\n
              </div>\n
            </div>\n
\n
            <div class="cf-column">\n
              <div class="cf-screenshot-container">\n
              \n
                <span class="cf-no-screenshot"></span>\n
              \n
              </div>\n
            </div>\n
          </div><!-- /.columns -->\n
        </div>\n
      </div><!-- /.captcha-container -->\n
\n
      <div class="cf-section cf-wrapper">\n
        <div class="cf-columns two">\n
          <div class="cf-column">\n
            <h2 data-translate="why_captcha_headline">Why do I have to complete a CAPTCHA?</h2>\n
\n
            <p data-translate="why_captcha_detail">Completing the CAPTCHA proves you are a human and gives you temporary access to the web property.</p>\n
          </div>\n
\n
          <div class="cf-column">\n
            <h2 data-translate="resolve_captcha_headline">What can I do to prevent this in the future?</h2>\n
\n
            <p data-translate="resolve_captcha_antivirus">If you are on a personal connection, like at home, you can run an anti-virus scan on your device to make sure it is not infected with malware.</p>\n
\n
            <p data-translate="resolve_captcha_network">If you are at an office or shared network, you can ask the network administrator to run a scan across the network looking for misconfigured or infected devices.</p>\n
          </div>\n
        </div>\n
      </div><!-- /.section -->\n
\n
      <div class="cf-error-footer cf-wrapper">\n
  <p>\n
   \n
    \n
  </p>\n
</div><!-- /.error-footer -->\n
\n
\n
    </div><!-- /#cf-error-details -->\n
  </div><!-- /#cf-wrapper -->\n
\n
  <script type="text/javascript">\n
  window._cf_translation = {};\n
  \n
  \n
</script>\n
\n
</body>\n
</html>
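
The dump above is CloudFlare's CAPTCHA interstitial, not the page that was requested. As a minimal sketch, a scraper could at least recognise that situation before trying to parse the response. The function name `looksLikeCloudflareChallenge` is hypothetical, and the two markers it checks are simply copied from the output above; they are an assumption about what the challenge page looks like and may differ on other sites or change over time.

```php
<?php
// Hypothetical helper: guess whether a response body is the CloudFlare
// CAPTCHA page shown above rather than the page that was requested.
function looksLikeCloudflareChallenge(string $html): bool
{
    // Both markers are taken from the dumped output: the page title
    // and the id of the challenge form.
    $hasTitle = stripos($html, '<title>Attention Required! | CloudFlare</title>') !== false;
    $hasForm  = strpos($html, 'id="challenge-form"') !== false;

    // Require both to reduce false positives on unrelated pages.
    return $hasTitle && $hasForm;
}
```

With the code from the question, this could be called as `looksLikeCloudflareChallenge($output)` before passing `$output` to any parsing step.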

How can I get past the CAPTCHA page from code? Is there a free library for this?



via Chebli Mohamed
