Sunday, November 20, 2016

Preparing a Captcha Dataset


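In February 2015 I tried decoding Captchas and failed. Lately I have been steadily absorbing machine-learning knowledge, and there are solutions out there that use the Google Cloud Vision API against Google's Captcha, so I want to test my own ability in this area too. Before that, I need a set of Captcha images and their labels to train the "brain". The simplest way to get them is to generate them with a program. I wrote two programs in PHP: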

The first one generates a Captcha image and its label:
<?php
//----------------------------------------------------------------------------------------
//  Captcha Generator
//----------------------------------------------------------------------------------------
//  Platform: CentOS7 + PHP + Apache
//  Written by Pacess HO
//  Copyright 2016 Pacess Studio.  All rights reserved.
//----------------------------------------------------------------------------------------

header("Access-Control-Allow-Origin: https://home.pacess.com");
header("Access-Control-Allow-Methods: POST");

date_default_timezone_set("Asia/Hong_Kong");
mb_internal_encoding("UTF-8");
ini_set("memory_limit", "-1");
set_time_limit(0);

session_start();

//----------------------------------------------------------------------------------------
require_once "./securimage/securimage.php";

//========================================================================================
if (isset($_REQUEST["code"]) && $_REQUEST["code"] == "1")  {

   header("Content-Type: text/html");
   $codeArray = $_SESSION["securimage_code_disp"];
   echo("Captcha:".$codeArray["default"]);
   exit(0);
}

$securimage = new Securimage();
$securimage->show();

?>

The second one fetches the image and saves it with the label as its filename:
<?php
//----------------------------------------------------------------------------------------
//  Captcha Dataset Generator
//----------------------------------------------------------------------------------------
//  Platform: CentOS7 + PHP + Apache
//  Written by Pacess HO
//  Copyright 2016 Pacess Studio.  All rights reserved.
//----------------------------------------------------------------------------------------

header("Content-type: text/html");
header("Cache-Control: no-cache, must-revalidate");
header("Expires: Tue, 10 Mar 1987 00:00:00 GMT");

date_default_timezone_set("Asia/Hong_Kong");
mb_internal_encoding("UTF-8");
ini_set("memory_limit", "-1");
set_time_limit(0);

//----------------------------------------------------------------------------------------
$path = "./files/";
$cookieFile = $path."_cookie.txt";
$count = 1;

//========================================================================================
//  Main program
if (isset($_REQUEST["count"]))  {$count = intval($_REQUEST["count"]);}
for ($i=0; $i<$count; $i++)  {

   //  Get a Captcha image
   $curl = curl_init();
   curl_setopt($curl, CURLOPT_URL, "http://sitachan.local/captcha/getCode.php");
   curl_setopt($curl, CURLOPT_POST, 1);
   curl_setopt($curl, CURLOPT_POSTFIELDS, "code=0");
   curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
   curl_setopt($curl, CURLOPT_COOKIEJAR, $cookieFile); 
   curl_setopt($curl, CURLOPT_COOKIEFILE, $cookieFile); 
   $pngContent = curl_exec($curl);
   curl_close($curl);

   //  Get a Captcha value
   $curl = curl_init();
   curl_setopt($curl, CURLOPT_URL, "http://sitachan.local/captcha/getCode.php");
   curl_setopt($curl, CURLOPT_POST, 1);
   curl_setopt($curl, CURLOPT_POSTFIELDS, "code=1");
   curl_setopt($curl, CURLOPT_RETURNTRANSFER, 1);
   curl_setopt($curl, CURLOPT_COOKIEJAR, $cookieFile); 
   curl_setopt($curl, CURLOPT_COOKIEFILE, $cookieFile); 
   $string = curl_exec($curl);
   curl_close($curl);

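   //  String: "Captcha:nYY6FF"
   //  Split once on the colon and trim whitespace; fall back when nothing came back
   $array = explode(":", $string, 2);
   $code = isset($array[1]) ? trim($array[1]) : "";
   if (strlen($code) == 0)  {$code = "default";}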
   $filename = $code.".png";

   //  Save image
   $filePath = $path.$filename;
   $file = fopen($filePath, "wb");
   fwrite($file, $pngContent);
   fclose($file);

   //----------------------------------------------------------------------------------------
   //  Output
   echo("<img src='$filePath' /><br />");
   echo("Image size: ".strlen($pngContent)."<br />");
   echo("String: $string<br />");
   echo("Filename: $filename<br />");
}
?>
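
Because the label doubles as the filename, two Captchas with the same text would overwrite each other, and so would every image that falls back to "default". Below is a minimal sketch of one way to handle that. The helper names captchaLabel, uniqueFilename and labelFromFilename are hypothetical and not part of the two programs above, and the "_N" suffix convention is only an assumption made for this illustration.

```php
<?php
//  Hypothetical helper: turn the raw "Captcha:xxxx" response into a safe label.
function captchaLabel($response)  {

   $parts = explode(":", (string)$response, 2);
   $label = isset($parts[1]) ? trim($parts[1]) : "";

   //  Keep letters and digits only, so the label is always a valid filename
   $label = preg_replace("/[^A-Za-z0-9]/", "", $label);
   return ($label === "") ? "default" : $label;
}

//  Hypothetical helper: add a "_N" suffix when the filename is already taken,
//  so that repeated labels do not overwrite earlier images.
function uniqueFilename($label, array $existing)  {

   $filename = $label.".png";
   $n = 1;
   while (in_array($filename, $existing, true))  {
      $filename = $label."_".$n.".png";
      $n++;
   }
   return $filename;
}

//  Hypothetical helper: recover the label from a filename produced above.
//  Labels contain no underscore, so a trailing "_N" can be stripped safely.
function labelFromFilename($filename)  {

   $name = pathinfo($filename, PATHINFO_FILENAME);
   return preg_replace("/_[0-9]+$/", "", $name);
}
```

In the loop of the second program, the list of existing names could come from scandir($path), and the training code would call labelFromFilename() to get the label back.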
