This is my first question here as I'm fairly new to this world! I've spent a few days trying to figure this out for myself, but haven't so far been able to find any useful info.
I'm trying to retrieve a byte range from a file stored in S3, using something like:
S3Key.get_contents_to_file(tempfile, headers={'Range': 'bytes=0-100000'})
The file that I'm trying to restore from is a video file, specifically an MXF. When I request a byte range, I get back more info in the tempfile than requested. For example, using one file, I request 100,000 bytes and get back 100,451.
One thing to note about MXF files is that they legitimately contain 0x0A (ASCII line feed) and 0x0D (ASCII carriage return).
I had a dig around and it appears that any time a 0x0A byte is present in the file, the retrieved data contains 0x0D 0x0A instead of just 0x0A, therefore appearing to retrieve more data than required.
As an example, the original file contains the hex string:
02 03 00 00 00 00 3B 0A 06 0E 2B 34 01 01 01 05
But the file downloaded from S3 has:
02 03 00 00 00 00 3B 0D 0A 06 0E 2B 34 01 01 01 05
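A quick way to confirm this hypothesis, assuming the original file is also available locally (the local path below is hypothetical), is to count the LF (0x0A) bytes in the requested range; the byte-count discrepancy should match that count:

with open('c:/tmp/original.mxf', 'rb') as f:  # hypothetical local copy of the source file
    data = f.read(100000)  # same range as requested from S3
print 'LF (0x0A) bytes in range = ' + str(data.count('\x0a'))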
I've tried to debug the code and work my way through the Boto logic, but I'm relatively new at this, so I get lost very easily.
I created this test, which shows the issue:
from boto.s3.connection import S3Connection
from boto.s3.connection import Location
from boto.s3.key import Key
import boto
import os
## AWS credentials
AWS_ACCESS_KEY_ID = 'access key'
AWS_SECRET_ACCESS_KEY = 'secret key'
## Bucket name and path to file
bucketName = 'bucket name'
filePath = 'path/to/file.mxf'
#Local temp file to download to
tempFilePath = 'c:/tmp/tempfile'
## Setup the S3 connection and create a Key to access the file specified
## in filePath
conn = S3Connection(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)
bucket = conn.get_bucket(bucketName)
S3Key = Key(bucket)
S3Key.key = filePath
def testRangeGet(bytesToRead=100000): # default read of 100K
tempfile = open(tempFilePath, 'w')
rangeString = 'bytes=0-' + str(bytesToRead -1) #create byte range as string
rangeDict = {'Range': rangeString} # add this to the dictionary
S3Key.get_contents_to_file(tempfile, headers=rangeDict) # using Boto
tempfile.close()
bytesRead = os.path.getsize(tempFilePath)
print 'Bytes requested = ' + str(bytesToRead)
    print 'Bytes received = ' + str(bytesRead)
print 'Additional bytes = ' + str(bytesRead - bytesToRead)
I guess there is something in the Boto code that is looking out for certain ASCII escape characters and modifying them, and I can't find any way to tell it to just treat the file as binary.
Has anyone had a similar problem and can share a way around it?
Thanks,
Tim
Open your output file as a binary file. Otherwise writing into that file will convert LF to CR/LF automatically.
tempfile = open(tempFilePath, 'wb')
That is of course only necessary on Windows systems. Unix systems won't convert anything, regardless of whether the file has been opened as text or binary.
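For reference, a minimal sketch of the question's testRangeGet with only the file mode changed; everything else stays the same:

def testRangeGet(bytesToRead=100000):
    tempfile = open(tempFilePath, 'wb')  # binary mode: bytes are written verbatim, even on Windows
    rangeString = 'bytes=0-' + str(bytesToRead - 1)
    S3Key.get_contents_to_file(tempfile, headers={'Range': rangeString})
    tempfile.close()
    print 'Bytes received = ' + str(os.path.getsize(tempFilePath))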
You should take care when uploading as well, so that you don't get similarly corrupted data into S3 in the first place.
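For example, a minimal upload sketch (the local path is hypothetical); opening the source file with 'rb' avoids the same text-mode conversion on read:

uploadKey = Key(bucket)
uploadKey.key = 'path/to/file.mxf'
with open('c:/tmp/file.mxf', 'rb') as f:  # 'rb' so Windows does not translate line endings on read
    uploadKey.set_contents_from_file(f)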