I'm trying to use MODI to OCR a program's window. It works fine for screenshots I grab programmatically using Win32 interop like this:
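// Requires: using System; using System.Drawing; using System.Drawing.Imaging;
//           using System.Runtime.InteropServices;
// _hWnd is the handle of the target window (a field of the enclosing class).
[StructLayout(LayoutKind.Sequential)]
private struct RECT { public int left, top, right, bottom; }

[DllImport("user32.dll", SetLastError = true)]
private static extern bool GetWindowRect(IntPtr hWnd, out RECT lpRect);

[DllImport("user32.dll", SetLastError = true)]
private static extern bool PrintWindow(IntPtr hWnd, IntPtr hdcBlt, uint nFlags);

public string SaveScreenShotToFile()
{
    RECT rc;
    GetWindowRect(_hWnd, out rc);   // window bounds in screen coordinates
    int width = rc.right - rc.left;
    int height = rc.bottom - rc.top;
    using (Bitmap bmp = new Bitmap(width, height))
    {
        using (Graphics gfxBmp = Graphics.FromImage(bmp))
        {
            IntPtr hdcBitmap = gfxBmp.GetHdc();
            try { PrintWindow(_hWnd, hdcBitmap, 0); }   // ask the window to paint itself into the bitmap's DC
            finally { gfxBmp.ReleaseHdc(hdcBitmap); }
        }
        string fileName = @"c:\temp\screenshots\" + Guid.NewGuid().ToString() + ".bmp";
        bmp.Save(fileName, ImageFormat.Bmp);   // explicit format so the contents match the .bmp extension
        return fileName;
    }
}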
The image is then saved to a file and run through MODI like this:
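// Requires a COM reference to the Microsoft Office Document Imaging (MODI) type library
// and: using System.Text;
private string GetTextFromImage(string fileName)
{
    MODI.Document doc = new MODI.DocumentClass();
    doc.Create(fileName);                                   // load the image file
    doc.OCR(MODI.MiLANGUAGES.miLANG_ENGLISH, true, true);   // language, auto-orient, straighten
    MODI.Image img = (MODI.Image)doc.Images[0];
    MODI.Layout layout = img.Layout;
    // Join the recognized words with single spaces.
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < layout.Words.Count; i++)
    {
        MODI.Word word = (MODI.Word)layout.Words[i];
        sb.Append(word.Text);
        sb.Append(" ");
    }
    if (sb.Length > 0)
        sb.Length--;                                        // drop the trailing space
    return sb.ToString();
}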
This part works fine; however, I don't want to OCR the entire screenshot, just portions of it. I try cropping the image programmatically like this:
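// Requires: using System; using System.Drawing; using System.Drawing.Imaging;
private string SaveToCroppedImage(Bitmap original)
{
    // Copy the top-left 250 x 250 region. Clone throws OutOfMemoryException
    // if the rectangle extends outside the source bitmap.
    using (Bitmap result = original.Clone(new Rectangle(0, 0, 250, 250), original.PixelFormat))
    {
        var fileName = "c:\\" + Guid.NewGuid().ToString() + ".bmp";
        // Save as BMP explicitly; the RawFormat of an in-memory bitmap is not necessarily Bmp.
        result.Save(fileName, ImageFormat.Bmp);
        return fileName;
    }
}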
I then OCR this smaller image, but MODI throws an exception, 'OCR running error', with error code -959967087.
Why can MODI handle the original bitmap but not the smaller version taken from it?
It looks as though the answer is to give MODI a bigger canvas. I was also trying to take a screenshot of a control and OCR it, and ran into the same problem. In the end I took the image of the control, copied it into a larger bitmap, and OCRed the larger bitmap.
Another issue I found is that the image file must have a proper image extension; in other words, .tmp doesn't cut it.
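Since Path.GetTempFileName() always produces a file ending in .tmp, one alternative is to build a unique temp path that already ends in an image extension. A minimal sketch; the class and method names are hypothetical, and the .bmp default simply follows the choice made in this answer:

```csharp
using System;
using System.IO;

// Hypothetical helper: builds a unique path in the temp folder that ends in an
// image extension, because MODI appears to reject files named *.tmp.
// Unlike Path.GetTempFileName(), this does not create any file on disk.
public static class OcrTempFiles
{
    public static string NewImagePath(string extension = ".bmp")
    {
        if (string.IsNullOrEmpty(extension) || extension[0] != '.')
            throw new ArgumentException("Extension must start with '.'", nameof(extension));
        // "N" format gives 32 hex digits with no dashes.
        return Path.Combine(Path.GetTempPath(), Guid.NewGuid().ToString("N") + extension);
    }
}
```

In ExtractText this could stand in for Path.GetTempFileName() plus the appended ".bmp", leaving only one file to delete in the finally block.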
I do the work of creating the larger canvas inside my OCR method, which looks something like this (I deal directly with Image objects):
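// Requires: using System; using System.Drawing; using System.Drawing.Imaging; using System.IO;
// plus a COM reference to MODI. As an extension method, this must live in a static class.
public static string ExtractText(this Image image)
{
    // Creates an empty .tmp file; MODI needs a real image extension, hence the ".bmp" below.
    var tmpFile = Path.GetTempFileName();
    string text;
    try
    {
        // Paint the image onto a canvas of at least 1024 x 768 so MODI has enough room.
        using (var bmp = new Bitmap(Math.Max(image.Width, 1024), Math.Max(image.Height, 768)))
        {
            using (var gfxResize = Graphics.FromImage(bmp))
            {
                gfxResize.DrawImage(image, new Rectangle(0, 0, image.Width, image.Height));
            }
            bmp.Save(tmpFile + ".bmp", ImageFormat.Bmp);
        }
        var doc = new MODI.Document();
        doc.Create(tmpFile + ".bmp");
        doc.OCR(MODI.MiLANGUAGES.miLANG_ENGLISH, true, true);
        var img = (MODI.Image)doc.Images[0];
        var layout = img.Layout;
        text = layout.Text;
    }
    finally
    {
        // Remove both the placeholder .tmp file and the .bmp that was OCRed.
        File.Delete(tmpFile);
        File.Delete(tmpFile + ".bmp");
    }
    return text;
}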
I'm not sure exactly what the minimum size is, but it appears as though 1024 x 768 does the trick.
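Combining the crop from the question with this padding, the geometry reduces to clamping the crop rectangle to the screenshot (Bitmap.Clone throws if the rectangle falls outside the bitmap) and growing the canvas to at least 1024 x 768. The sketch below only does that arithmetic and leaves the pixel work to Clone and DrawImage as in the code above; the names are hypothetical, and the 1024 x 768 floor is the empirical value from this answer, not a documented MODI limit:

```csharp
using System;

// Hypothetical helper: computes the crop rectangle and canvas size for MODI; no pixels involved.
public static class OcrCanvasPlan
{
    // Empirical floor from the answer; the real MODI minimum is unknown.
    public const int MinWidth = 1024;
    public const int MinHeight = 768;

    private static int Clamp(int value, int min, int max) => Math.Min(Math.Max(value, min), max);

    // Clamps the requested crop (x, y, w, h) to a srcW x srcH source so Bitmap.Clone
    // receives an in-bounds rectangle, then grows the canvas to at least MinWidth x MinHeight.
    public static (int X, int Y, int W, int H, int CanvasW, int CanvasH) Plan(
        int srcW, int srcH, int x, int y, int w, int h)
    {
        if (srcW <= 0 || srcH <= 0)
            throw new ArgumentException("Source image must be non-empty.");

        int left = Clamp(x, 0, srcW);
        int top = Clamp(y, 0, srcH);
        int right = Clamp(x + w, 0, srcW);
        int bottom = Clamp(y + h, 0, srcH);
        int cropW = right - left;
        int cropH = bottom - top;
        if (cropW <= 0 || cropH <= 0)
            throw new ArgumentException("Crop rectangle does not overlap the source image.");

        return (left, top, cropW, cropH, Math.Max(cropW, MinWidth), Math.Max(cropH, MinHeight));
    }
}
```

For the 250 x 250 crop in the question, Plan(width, height, 0, 0, 250, 250) on any screenshot at least that large yields a 250 x 250 crop to be painted onto a 1024 x 768 canvas.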