Boto "Get Byte Range" returns more than expected bytes

2023-09-11 09:46:50 Author: 夜の未殇

This is my first question here as I'm fairly new to this world! I've spent a few days trying to figure this out for myself, but haven't so far been able to find any useful info.

I'm trying to retrieve a byte range from a file stored in S3, using something like:

S3Key.get_contents_to_file(tempfile, headers={'Range': 'bytes=0-100000'})

The file that I'm trying to retrieve from is a video file, specifically an MXF. When I request a byte range, I get back more data in the tempfile than requested. For example, using one file, I request 100,000 bytes and get back 100,451.

One thing to note about MXF files is that they legitimately contain 0x0A (ASCII line feed) and 0x0D (ASCII carriage return).

I had a dig around, and it appears that any time a 0A byte is present in the file, the retrieved data contains 0D 0A instead of just 0A, therefore appearing to retrieve more data than required.

As an example, the original file contains the hex string:

02 03 00 00 00 00 3B 0A 06 0E 2B 34 01 01 01 05

But the file downloaded from S3 has:

02 03 00 00 00 00 3B 0D 0A 06 0E 2B 34 01 01 01 05

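For illustration, here is a minimal sketch (the filenames are hypothetical, and the rewrite only happens on Windows) showing that plain text-mode file I/O performs the same LF-to-CRLF conversion, with no boto involved:

import os

data = '\x02\x03\x3b\x0a\x06'  # five bytes, one of them an LF (0x0A)

f = open('text_mode.bin', 'w')  # text mode: Windows rewrites 0x0A as 0x0D 0x0A
f.write(data)
f.close()

f = open('binary_mode.bin', 'wb')  # binary mode: bytes are written verbatim
f.write(data)
f.close()

print 'Text mode size = ' + str(os.path.getsize('text_mode.bin'))  # 6 on Windows
print 'Binary mode size = ' + str(os.path.getsize('binary_mode.bin'))  # 5 everywhere
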
I've tried to debug the code and work my way through the Boto logic, but I'm relatively new at this, so I get lost very easily.

I created the following test, which shows the issue:

from boto.s3.connection import S3Connection
from boto.s3.key import Key
import os


## AWS credentials
AWS_ACCESS_KEY_ID = 'access key'
AWS_SECRET_ACCESS_KEY = 'secret key'

## Bucket name and path to file
bucketName = 'bucket name'
filePath = 'path/to/file.mxf'

#Local temp file to download to
tempFilePath = 'c:/tmp/tempfile'


## Setup the S3 connection and create a Key to access the file specified
## in filePath
conn = S3Connection(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)
bucket = conn.get_bucket(bucketName)
S3Key = Key(bucket)
S3Key.key = filePath

def testRangeGet(bytesToRead=100000): # default read of 100K
    tempfile = open(tempFilePath, 'w') # note: opened in text mode
    rangeString = 'bytes=0-' + str(bytesToRead - 1) # create byte range as string (ranges are inclusive)
    rangeDict = {'Range': rangeString} # add this to the headers dictionary
    S3Key.get_contents_to_file(tempfile, headers=rangeDict) # using Boto
    tempfile.close()
    bytesRead = os.path.getsize(tempFilePath)
    print 'Bytes requested = ' + str(bytesToRead)
    print 'Bytes received = ' + str(bytesRead)
    print 'Additional bytes = ' + str(bytesRead - bytesToRead)

I guess there is something in the Boto code that is looking out for certain ASCII escape characters and modifying them, and I can't find any way to specify to just treat it as a binary file.

Has anyone had a similar problem and can share a way around it?

Thanks

Recommended Answer

Open your output file as a binary file. Otherwise, writing to that file will convert LF to CR/LF automatically.

tempfile = open(tempFilePath, 'wb')
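
Applied to the test function from the question, the mode string is the only change needed; a sketch with the fix in place:

def testRangeGet(bytesToRead=100000):
    tempfile = open(tempFilePath, 'wb')  # 'wb': binary mode, no newline translation
    rangeString = 'bytes=0-' + str(bytesToRead - 1)  # Range header is inclusive
    S3Key.get_contents_to_file(tempfile, headers={'Range': rangeString})
    tempfile.close()
    bytesRead = os.path.getsize(tempFilePath)
    print 'Bytes requested = ' + str(bytesToRead)
    print 'Bytes received = ' + str(bytesRead)  # now equals bytesToRead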

That, of course, is only necessary on Windows systems. Unix systems won't convert anything, regardless of whether the file has been opened as text or binary.

You should take care when uploading as well, so that you don't get corrupted data like this into S3 in the first place.
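
For example, a minimal upload sketch using the boto 2 Key API from the question (the local path is hypothetical); opening the source file in binary mode keeps the uploaded bytes identical to the bytes on disk:

source = open('c:/tmp/source.mxf', 'rb')  # 'rb': read raw bytes, no translation
try:
    S3Key.set_contents_from_file(source)  # upload the file contents to the key
finally:
    source.close()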